Big-Data

Cloud arbitrage for spark pipelines
Spark-based data PaaS solutions are convenient, but they come with their own challenges, such as strong vendor lock-in and opaque costs. We show how a dedicated orchestrator (Dagster with dagster-pipes) can turn Databricks into an implementation detail while also cutting costs and improving developer productivity. It lets you take back control.
Production grade pyspark jobs
Use additional python packages with pyspark