Combining the power of Scala and Python to make the calculation of percentiles in Spark easy and fast
Finally, nested types in Arrow.
Data preparation using Spark without ACID tables
Execute the latest version of Spark on HDP.
Use additional Python packages with PySpark
Make Spark jobs scale reliably using iteration
Get Spark and Hive to play nice again on HDP 3.1
Use the idempotency of RDDs to your advantage
Bring hexagons to Spark for efficient spatial operations
Recent history of data processing.