apache-spark

Arrow 2.0.0 - structs in pandas
Finally, nested types in Arrow.
Sparkling SCD2
Data preparation using Spark without ACID tables
Run the latest version of spark
Execute the latest version of Spark on HDP.
Production grade pyspark jobs
Use additional Python packages with PySpark
Spark and Hive 3
Get Spark and Hive to play nice again on HDP 3.1
Parallel aggregation of dataframes
Use the idempotency of RDDs to your advantage