big-data

Run the latest version of Spark
Execute the latest version of Spark on HDP.
Production-grade PySpark jobs
Use additional Python packages with PySpark.
Parallel aggregation of DataFrames
Use the idempotency of RDDs to your advantage.
Data processing
Recent history of data processing.