apache-spark

Exact percentiles in Spark

Combining the power of Scala and Python to make the calculation of percentiles in Spark easy and fast

Arrow 2.0.0 - structs in pandas

Finally, nested types in Arrow.

Sparkling SCD2

Data preparation using Spark without ACID tables

Run the latest version of Spark

Execute the latest version of Spark on HDP.

Production-grade PySpark jobs

Use additional Python packages with PySpark

Deterministic scale-out for Spark jobs under increased load

Make Spark jobs scale reliably using iteration

Spark and Hive 3

Get Spark and Hive to play nice again on HDP 3.1

Parallel aggregation of dataframes

Use the idempotency of RDDs to your advantage

Geospatial binning with hexagons on spark

Bring efficient hexagon-based spatial operations to Spark

data processing

A recent history of data processing.