
Datalake for the enterprise & large geospatial data

Apr 27, 2016 · 1 min read

Building the new analytical platform, reverse engineering age-old data pipelines, and building new use cases by analyzing huge quantities of spatial data.

Last updated on Nov 20, 2019
Machine-Learning
Authors
Georg Heiler
senior data expert
My research interests include large geo-spatial time and network data analytics.


© 2025 Georg Heiler. This work is licensed under CC BY NC ND 4.0
