Data links, calendar week 20
Some useful and interesting links:
- Apache Spark Data Validation
- Submitting jobs to Spark in parallel
- Architecting Structured Streaming Pipelines the right way
- Understanding query plans and Spark UIs, with great tips on SQL tuning
- Modular Apache Spark: Transform Your Code in Pieces
- Partition handling in Spark and Hadoop, plus the small-files problem and possible solutions
- Apache Kafka Data Access Semantics: Consumers and Membership
- Persistable HyperLogLog in Spark using swoop-inc/spark-alchemy
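To illustrate what makes a HyperLogLog sketch "persistable", here is a minimal, stdlib-only Python sketch. It is not spark-alchemy's implementation or API. The class and method names (`HyperLogLog`, `merge`, `to_bytes`, `from_bytes`) are hypothetical, and the 64-bit hash taken from SHA-1 is an assumption made for simplicity. The key property is that the whole state is a fixed-size array of small registers, so it can be serialized to bytes, stored, and later merged with another sketch by taking the element-wise maximum, without rescanning the raw data.

```python
import hashlib
import math


class HyperLogLog:
    """Minimal HyperLogLog cardinality sketch (illustrative only, not spark-alchemy)."""

    def __init__(self, p=12):
        # This simple version only supports p >= 7, where the alpha formula below is valid.
        if not 7 <= p <= 16:
            raise ValueError("precision p must be in [7, 16]")
        self.p = p
        self.m = 1 << p  # number of registers
        self.registers = [0] * self.m

    @staticmethod
    def _hash64(value):
        # Assumption: 64-bit hash from the first 8 bytes of SHA-1 of the value's string form.
        digest = hashlib.sha1(str(value).encode("utf-8")).digest()
        return int.from_bytes(digest[:8], "big")

    def add(self, value):
        x = self._hash64(value)
        q = 64 - self.p
        idx = x >> q                   # top p bits select a register
        w = x & ((1 << q) - 1)         # remaining q bits
        rank = q - w.bit_length() + 1  # 1-based position of the leftmost 1-bit in w
        if rank > self.registers[idx]:
            self.registers[idx] = rank

    def estimate(self):
        m = self.m
        alpha = 0.7213 / (1 + 1.079 / m)
        raw = alpha * m * m / sum(2.0 ** -r for r in self.registers)
        zeros = self.registers.count(0)
        if raw <= 2.5 * m and zeros:
            return m * math.log(m / zeros)  # linear counting for small cardinalities
        return raw

    def merge(self, other):
        # Union of two sketches: element-wise max of registers (same precision required).
        if self.p != other.p:
            raise ValueError("can only merge sketches with the same precision")
        out = HyperLogLog(self.p)
        out.registers = [max(a, b) for a, b in zip(self.registers, other.registers)]
        return out

    def to_bytes(self):
        # Each register is at most 64 - p + 1 <= 58, so one byte per register suffices.
        return bytes(self.registers)

    @classmethod
    def from_bytes(cls, data):
        p = len(data).bit_length() - 1
        if len(data) != 1 << p:
            raise ValueError("register array length must be a power of two")
        sketch = cls(p)
        sketch.registers = list(data)
        return sketch


# Usage: build two overlapping sketches, persist one, restore it, and merge.
a = HyperLogLog()
b = HyperLogLog()
for i in range(6000):
    a.add(i)
for i in range(4000, 10000):
    b.add(i)
restored = HyperLogLog.from_bytes(a.to_bytes())  # e.g. after storing the bytes somewhere
union = restored.merge(b)
print(round(union.estimate()))  # an approximation of the 10000 distinct values
```

The merge step is what enables re-aggregation: the union estimate is computed from the stored registers alone. The precision `p` trades memory (2^p bytes in this sketch) for accuracy, with a typical relative standard error of about 1.04/sqrt(2^p).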