Apache-Spark

Cost-efficient alternative to Databricks lock-in

Spark-based data PaaS solutions are convenient, but they come with their own set of challenges, such as strong vendor lock-in and opaque costs. We show how to use a dedicated …

Dr. Georg Heiler

Cloud arbitrage for Spark pipelines

Spark-based data PaaS solutions are convenient, but they come with their own set of challenges, such as strong vendor lock-in and opaque costs. We show how to use a dedicated …

Dr. Georg Heiler

AI-based root cause analysis of CPD interference sources in DOCSIS networks

Good-quality network connectivity is ever more important. For hybrid fiber-coaxial (HFC) networks, searching for upstream high noise used to be cumbersome and …

Dr. Georg Heiler

Comparing SQL-based streaming approaches

Comparing established and up-and-coming streaming approaches for an integrated real-time data model

Dr. Georg Heiler

Identifying the root cause of cable network problems with machine learning

Good-quality network connectivity is ever more important. For hybrid fiber-coaxial (HFC) networks, searching for upstream high noise used to be cumbersome and time-consuming. …

Dr. Georg Heiler

Scalable data pipelines from Dagster with PySpark

Getting started with simple Dagster pipelines.

Dr. Georg Heiler

Scalable sparse matrix multiplication

Using Apache Spark for sparse matrix multiplication

Dr. Georg Heiler

Exact percentiles in Spark

Combining the power of Scala and Python to make the calculation of percentiles in Spark easy and fast

Dr. Georg Heiler

Arrow 2.0.0 - structs in pandas

Finally, nested types in Arrow.

Dr. Georg Heiler

Sparkling SCD2

Data preparation using Spark without ACID tables

Dr. Georg Heiler