Spark-based data PaaS solutions are convenient, but they come with their own challenges, such as high vendor lock-in and obscured costs. We show how a dedicated orchestrator ([dagster-pipes](https://docs.dagster.io/guides/dagster-pipes)) can make Databricks an implementation detail while also saving cost and improving developer productivity. It allows you to take back control.
Sep 12, 2024
Spark-based data PaaS solutions are convenient, but they come with their own challenges, such as high vendor lock-in and obscured costs. We show how a dedicated orchestrator ([dagster-pipes](https://docs.dagster.io/guides/dagster-pipes)) can make Databricks an implementation detail while also saving cost and improving developer productivity. It allows you to take back control.
Jun 21, 2024
Good-quality network connectivity is ever more important. For hybrid fiber coaxial (HFC) networks, searching for upstream *high noise* used to be cumbersome and time-consuming. Even with machine learning, the task remains challenging due to the heterogeneity of the network and its topological structure. We present the automation of a simple business rule (largest change of a specific value), compare its performance with state-of-the-art machine-learning methods, and conclude that precision@1 can be improved by a factor of 2.3. Because it is best when a fault does not occur in the first place, we also evaluate multiple approaches to forecasting network faults, which would allow predictive maintenance on the network.
May 10, 2022
Comparing established and up-and-coming streaming approaches for an integrated real-time data model
Apr 1, 2022
Mar 14, 2022
Getting started with simple dagster pipelines.
Mar 4, 2022
Using Apache Spark for **sparse** matrix multiplication
Aug 6, 2021
Combining the power of Scala and Python to make the calculation of percentiles in Spark easy and fast
Nov 21, 2020
Finally, nested types in Arrow.
Nov 20, 2020
Data preparation using Spark without ACID tables
Nov 19, 2020