Spark-based data PaaS solutions are convenient, but they come with their own challenges, such as high vendor lock-in and obscured costs. We show how a dedicated orchestrator ([dagster-pipes](https://docs.dagster.io/guides/dagster-pipes)) can turn Databricks into an implementation detail while also cutting costs and improving developer productivity. It lets you take back control.
Sep 12, 2024
Jun 21, 2024
Nov 10, 2023
Feb 8, 2023
Oct 19, 2022
Good-quality network connectivity is ever more important. For hybrid fiber coaxial (HFC) networks, searching for upstream *high noise* used to be cumbersome and time-consuming. Even with machine learning, the task remains challenging because of the heterogeneity of the network and its topological structure. We present the automation of a simple business rule (largest change of a specific value), compare its performance with state-of-the-art machine-learning methods, and conclude that precision@1 can be improved by a factor of 2.3. Because it is best when a fault does not occur in the first place, we also evaluate multiple approaches to forecasting network faults, which would enable predictive maintenance on the network.
May 10, 2022
Mar 14, 2022
Figure description: (a) Probability $p(s|c)$ of finding a supply link, $s_{ij}$, given that a communication link, $c_{ij}$, exists between firms $i$ and $j$, for communication links exceeding a given call duration, $d_{ij}$. Error bars denote the quartiles of a bootstrap simulation described in SI Text 1.
Oct 13, 2021
Nov 16, 2020
Run the latest version of Spark on HDP (Hortonworks Data Platform).
Aug 31, 2020