Georg Heiler

PhD candidate & data scientist

Complexity Science Hub

TU Wien


Vienna Data Science Group


Georg Heiler is a PhD candidate at the Vienna University of Technology and Complexity Science Hub Vienna.

Georg obtained a bachelor’s and a master’s degree in business informatics from the Vienna University of Technology. In his master’s thesis, titled “Cost-based statistical methods for fraud detection”, he showed that a machine-learning-based credit check process that accounts for individual costs outperformed the traditional methodology used at the partner company.

As a data scientist, Georg works on end-to-end (E2E) analytical pipelines.


  • Geo-spatial analytics
  • Time series
  • Network analytics
  • Large and fast data


  • MSc in Business Informatics, 2018

    TU Wien

  • BSc in Business Informatics, 2015

    TU Wien

Recent Posts

Production grade pyspark jobs

Use additional Python packages with PySpark

blazing-fast data science on GPUs

Fast calculation of ego networks using RAPIDS-AI.

Deterministic scale-out for spark jobs under increased load

Make Spark jobs scale reliably using iteration

Spark and Hive 3

Get Spark and Hive to play nicely again on HDP 3.1

Parallel aggregation of dataframes

Use idempotency of RDDs to your advantage

Recent & Upcoming Talks

data processing

Recent history of data processing.

Geospatial data processing

Introduction to processing geospatial data from classical small data applications to big data workloads

R for HPC and big data

Introduction to processing large quantities of data in R

Make your ML app rock

Moving a draft ML application to a real world production environment can be hard. This talk outlines several tools and strategies to do …



H3 conda-forge

conda-forge offers effortless installation of various well-tested Python packages. I am a maintainer of H3 and H3-py on conda-forge.

Datalake for the enterprise & large geospatial data

Reverse engineering old data pipelines ;) and analyzing huge quantities of spatial data.

Music streaming Analytics

Anomaly detection for music streaming royalties

Predictive credit scoring

An individual-cost-based classification model outperforms classical processes.


PredictR is a fintech startup that turns personal transaction lists into cash flow forecasts. It allows customers to explore their financial future to put life decisions into context. http://predictr.eu/.


Vienna Data Science Group [VDSG] is a nonprofit association promoting knowledge about data science. I am a member here and help newcomers find their way into data science. https://viennadatasciencegroup.at/.

Recent Publications

Comparing Implementation Variants Of Distributed Spatial Join on Spark

As an increasing number of sensor devices (Internet of Things) is used, more and more spatio-temporal data becomes available. Being …

Cost-based statistical methods for fraud detection. Prediction of never-paying customers considering individual risk

Telecommunication providers not only offer services but increasingly finance consumer devices. Credit scoring and the detection of …

Clustering time-series. An overview about different application contexts of time-series clustering

Time series are becoming more and more important in the digitized Industry 4.0, from forecasting of sales to increase the profit in …