Python

Scaling Data Pipelines @Magenta Telekom

Magenta Telekom ingests many terabytes of new data every day, and every downstream consumer wants it immediately. The real bottleneck turned out to be not hardware but humans wrestling with hidden, hard-wired dependencies across hundreds of heterogeneous pipelines and, in places, tool silos.

Nov 4, 2025

Upskilling data engineers

A comprehensive guide to modern data engineering with local-first development practices

Mar 14, 2025

Local data stack template

Jumpstart your data processing with this local modern data stack template

Oct 25, 2024

Cost efficient alternative to databricks lock-in

Spark-based data PaaS solutions are convenient, but they come with their own challenges, such as high vendor lock-in and obscured costs. We show how a dedicated orchestrator (Dagster, via [dagster-pipes](https://docs.dagster.io/guides/dagster-pipes)) can turn Databricks into an implementation detail, reduce cost, and improve developer productivity, so that you take back control.

Sep 12, 2024

Cloud arbitrage for spark pipelines

Spark-based data PaaS solutions are convenient, but they come with their own challenges, such as high vendor lock-in and obscured costs. We show how a dedicated orchestrator (Dagster, via [dagster-pipes](https://docs.dagster.io/guides/dagster-pipes)) can turn Databricks into an implementation detail, reduce cost, and improve developer productivity, so that you take back control.

Jun 21, 2024

Cost efficient alternative to databricks lock-in

Save money 💰 and increase developer productivity 👩‍💻👨‍💻 by limiting the scope creep of Spark-based data PaaS solutions: 🌐 turn them into an implementation detail 🔧.

Jun 21, 2024

Dagster, dbt, duckdb as new local MDS

A lean and efficient modern data stack (MDS) experience: the new local MDS, composed of Dagster, dbt, and DuckDB, brings better software engineering practices to the data ecosystem and improves developer productivity by making the end-to-end pipeline easier to test.

Dec 11, 2023

Unlocking Advanced Metadata Extraction with the New DBT API in Dagster

📊 Unleash the power of metadata extraction in your data engineering pipelines with the new dbt API in Dagster! 🚀 Learn how to integrate dbt transformations seamlessly while enriching your data catalog with advanced metadata, and take your data governance and collaboration to new heights!

Jun 13, 2023

Governance and pipelines in the modern data stack

The data orchestrator is at the heart of data pipelines. We start by exploring how a modern data orchestrator drastically eases pipeline development. Then we look at how governance can be conducted efficiently in an MDS-based setup.

Dec 8, 2022

AI-based root cause analysis of CPD interference sources in DOCSIS networks

Good-quality network connectivity is ever more important. For hybrid fiber coaxial (HFC) networks, searching for upstream "high noise" used to be cumbersome and time-consuming. Even with machine learning, the task remains challenging because of the heterogeneity of the network and its topological structure. We present the automation of a simple business rule (largest change of a specific value), compare its performance with state-of-the-art machine-learning methods, and conclude that precision@1 can be improved by a factor of 2.3. Because it is best if a fault does not occur in the first place, we also evaluate multiple approaches to forecasting network faults, which would enable predictive maintenance on the network.

May 10, 2022