Pixi powering Telekom data cloud

Jan 31, 2025
Aleksandar Milicevic
Georg Heiler
Abstract
At Magenta Telekom in Austria, we are building our new data platform around metadata, strong governance, and an explicit graph of data dependencies. Pixi is a tool that gives us efficient dependency handling. In this talk, we share our experience with Pixi and how it empowers our data infrastructure.
Date
Jan 31, 2025 11:30 AM
Event
Location

Livestream

Pixi is a tool that enables efficient dependency handling. It is developed by prefix.dev, written in Rust, and very fast. Pixi benefits us at Magenta Telekom in Austria because we are building our new data platform around metadata, strong governance, and an explicit graph of data dependencies. In this talk, we share our experience with Pixi and how it empowers our data infrastructure in conjunction with Dagster.

Key principles

Explicit Modeling of Data Dependencies as a Graph

  • Graph-Based Dependencies: By representing data dependencies as a graph, the data orchestrator Dagster manages pipelines efficiently and handles the dependencies between transformations automatically, much as a calculator handles the arithmetic so that you do not have to.
  • Event-Based Notifications: The dependency graph enables immediate notifications when upstream sources change, allowing downstream consumers to update promptly, reducing wait times.
  • Comprehensive Integration: Incorporating data ingestion, transformation, BI/reporting, and AI into the dependency graph makes all relationships transparent, fostering cross-tool & cross-departmental collaboration by breaking down silos.
  • Strong Governance: We adhere to robust governance principles by collecting metadata during ingestion and propagating it throughout the graph, ensuring data integrity and compliance.
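
To make the graph idea concrete, here is a minimal sketch in plain Python. It is not Dagster's API and not our production code: the asset names, the metadata fields, and all three helper functions are hypothetical and only illustrate the principle. The sketch derives an execution order from the declared dependencies, finds the downstream assets to notify when a source changes, and carries metadata collected at ingestion along the graph.

```python
from graphlib import TopologicalSorter

# Hypothetical asset names; each key maps to the set of upstream assets it depends on.
DEPENDENCIES = {
    "raw_events": set(),
    "cleaned_events": {"raw_events"},
    "daily_report": {"cleaned_events"},
    "churn_model": {"cleaned_events"},
}


def execution_order(dependencies):
    """Return an order in which every asset comes after all of its upstream assets."""
    return list(TopologicalSorter(dependencies).static_order())


def downstream_of(asset, dependencies):
    """Return every asset that directly or transitively depends on `asset`."""
    affected = set()
    frontier = [asset]
    while frontier:
        current = frontier.pop()
        for name, upstream in dependencies.items():
            if current in upstream and name not in affected:
                affected.add(name)
                frontier.append(name)
    return affected


def propagate_metadata(dependencies, source_metadata):
    """Carry metadata collected at ingestion along the graph to downstream assets."""
    metadata = {}
    for name in execution_order(dependencies):
        merged = dict(source_metadata.get(name, {}))
        for upstream in sorted(dependencies[name]):
            # Upstream metadata is inherited; values set on the asset itself win.
            merged = {**metadata[upstream], **merged}
        metadata[name] = merged
    return metadata


if __name__ == "__main__":
    print(execution_order(DEPENDENCIES))
    print(sorted(downstream_of("raw_events", DEPENDENCIES)))
    print(propagate_metadata(DEPENDENCIES, {"raw_events": {"owner": "ingestion-team"}}))
```

In a real orchestrator the graph is declared alongside the transformations themselves, so the ordering, the notifications, and the metadata lineage all come from one definition instead of being maintained by hand.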

Advantages Over the Previous System (Conda)

  • Optimized Lockfiles: Dependencies are resolved into a lockfile only during development, so deployments reuse the locked versions instead of resolving again, which streamlines the deployment process.
  • Faster Dependency Resolution: Resolving dependencies is considerably faster, which accelerates pipeline execution.
  • Integrated Task Runner: A built-in task runner replaces cumbersome Makefiles, improving workflow efficiency.
  • Environment-Specific Dependencies: Supports feature-based environments (development, production, linting), ensuring appropriate configurations across different stages.
  • Multi-Language Support: Facilitates environments that include both Python and Java, catering to diverse development needs.
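
The idea behind feature-based environments can be sketched as a toy model in plain Python. This is neither Pixi's manifest format nor its implementation: the feature names, the environment names, and the `resolve_environment` helper are assumptions made for illustration, and real resolution also has to reconcile version constraints, which this sketch ignores. It only shows how one shared set of dependencies can be combined with optional groups to form stage-specific environments.

```python
# Toy model: a feature is a named group of packages (hypothetical selection).
FEATURES = {
    "default": {"python", "dagster"},
    "dev": {"pytest"},
    "lint": {"ruff"},
    "java": {"openjdk"},
}

# An environment lists the extra features it is built from (hypothetical names).
ENVIRONMENTS = {
    "production": ["java"],
    "development": ["java", "dev", "lint"],
    "linting": ["lint"],
}


def resolve_environment(name, include_default=True):
    """Return the set of packages an environment would contain in this toy model."""
    packages = set(FEATURES["default"]) if include_default else set()
    for feature in ENVIRONMENTS[name]:
        packages |= FEATURES[feature]
    return packages


if __name__ == "__main__":
    for env in ENVIRONMENTS:
        print(env, sorted(resolve_environment(env)))
```

The point of the design is that production never carries test or lint tooling, while development gets everything, and both mix Python and Java packages from the same definition.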

Summary

Try It Out: Explore our Local Data Stack Template to enhance your own data platform.

A second recording, possibly with improved audio, is also available.

Connect With Us: We welcome your feedback and questions! Reach out to us to discuss how these principles and the template can add value to your data infrastructure.

Join us for this live stream and learn how to empower your data platform with Pixi!

Announcement of talk

Authors
Aleksandar Milicevic
Data Engineer
I enjoy working with Rust and making data platforms faster and more reliable.
Georg Heiler
Senior Data Expert
My research interests include large geo-spatial time and network data analytics.