Big-Data

Deterministic scale-out for Spark jobs under increased load

Make Spark jobs scale reliably using iteration

Georg Heiler

Dec 13, 2019 2 min read

Parallel aggregation of dataframes

Use the idempotency of RDDs to your advantage

Georg Heiler

Last updated on Jun 13, 2019 1 min read

Geospatial binning with hexagons on Spark

Bring efficient hexagon-based spatial operations to Spark

Georg Heiler

Last updated on Jun 2, 2019 2 min read

data processing

Recent history of data processing.

Aug 4, 2019, 9:00 AM to 11:00 AM, Yogyakarta, Indonesia

Georg Heiler

Spark: descriptive names for cached DataFrames

Display user-friendly names for cached tables in the Spark web UI

Georg Heiler

Jul 23, 2019 1 min read

Solve data skew issues for array columns in Spark

Preventing data skew issues for array columns.

Georg Heiler

Jun 13, 2019 4 min read

Ultimate open vector geoprocessing on Spark

Combine the strengths of GeoMesa and GeoSpark for ultimate geoprocessing capabilities on Spark

Georg Heiler

Last updated on Jun 2, 2019 4 min read

Analyze OSM data in Spark

Analyze the OSM community and extract geometries from the graph.

Georg Heiler

May 7, 2019 2 min read

OSM to Spark

Processing OSM in a scalable, Hadoop-native way.

Georg Heiler

May 3, 2019 6 min read

Headless Spark on YARN

Sometimes you want to run a different version of Spark than the one bundled with your Hadoop distribution. This is easily possible.

Georg Heiler

May 1, 2019 2 min read
