Name: Pixi powering Telekom data cloud
Start: 2025-01-31T11:30:00Z
Location: Livestream

Pixi powering Telekom data cloud

Jan 31, 2025 · Aleksandar Milicevic, Georg Heiler

Slides

Abstract

At Magenta Telekom in Austria, we are building our new data platform around metadata, strong governance, and an explicit graph of data dependencies. Pixi is a tool that gives us efficient dependency handling. In this talk, we share our experience with Pixi and how it empowers our data infrastructure.

Date

Jan 31, 2025 11:30 AM

Event

Prefix.dev community show and tell

Location

Livestream

Pixi is a tool that enables efficient dependency handling. It is created by prefix.dev, written in Rust, and very fast. Pixi benefits us at Magenta Telekom in Austria because we are building our new data platform around metadata, strong governance, and an explicit graph of data dependencies. In this talk, we share our experience with Pixi and how it empowers our data infrastructure in conjunction with Dagster.

Key principles

Explicit Modeling of Data Dependencies as a Graph

Graph-Based Dependencies: By representing data dependencies as a graph, the data orchestrator Dagster manages pipelines efficiently and handles the dependencies between transformations automatically, much as a calculator takes care of the arithmetic once it is given the numbers.
Event-Based Notifications: The dependency graph enables immediate notifications when upstream sources change, allowing downstream consumers to update promptly, reducing wait times.
Comprehensive Integration: Incorporating data ingestion, transformation, BI/reporting, and AI into the dependency graph makes all relationships transparent. This breaks down silos and fosters collaboration across tools and departments.
Strong Governance: We adhere to robust governance principles by collecting metadata during ingestion and propagating it throughout the graph, ensuring data integrity and compliance.
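
To make the principles above concrete, here is a minimal sketch in plain Python. It is not Dagster code and not our production setup: the asset names, the "pii" tag, and the helper functions are all hypothetical and chosen only for illustration. It shows how an explicit dependency graph yields a valid execution order, the set of downstream assets to notify when a source changes, and metadata that is propagated from ingestion through the graph.

```python
from graphlib import TopologicalSorter

# Hypothetical asset graph: each key depends on the assets in its value set.
DEPENDENCIES = {
    "raw_customers": set(),
    "raw_usage": set(),
    "clean_customers": {"raw_customers"},
    "usage_per_customer": {"clean_customers", "raw_usage"},
    "churn_report": {"usage_per_customer"},
}


def execution_order(deps):
    """Return an order in which every asset runs after its upstreams."""
    return list(TopologicalSorter(deps).static_order())


def downstream_of(deps, changed):
    """Return every asset that directly or transitively depends on `changed`."""
    affected = set()
    frontier = [changed]
    while frontier:
        current = frontier.pop()
        for asset, upstreams in deps.items():
            if current in upstreams and asset not in affected:
                affected.add(asset)
                frontier.append(asset)
    return affected


def propagate_metadata(deps, source_metadata):
    """Propagate tags collected at ingestion to every downstream asset."""
    metadata = {asset: set(tags) for asset, tags in source_metadata.items()}
    # Walking in topological order guarantees upstream tags are final first.
    for asset in execution_order(deps):
        tags = metadata.setdefault(asset, set())
        for upstream in deps[asset]:
            tags |= metadata[upstream]
    return metadata


order = execution_order(DEPENDENCIES)
affected = downstream_of(DEPENDENCIES, "raw_usage")
metadata = propagate_metadata(DEPENDENCIES, {"raw_customers": {"pii"}})
```

In this sketch, a change to raw_usage marks usage_per_customer and churn_report as needing an update, and the "pii" tag attached to raw_customers at ingestion reaches churn_report without being declared again. An orchestrator such as Dagster provides these capabilities out of the box. The sketch only illustrates why modeling the graph explicitly makes them possible.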

Advantages Over the Previous System (Conda)

Optimized Lockfiles: Lockfile resolution occurs only during development, streamlining deployment processes.
Faster Dependency Resolution: Dependencies resolve faster than with Conda, which speeds up environment setup and, in turn, pipeline execution.
Integrated Task Runner: Replaces cumbersome Makefiles with a built-in task runner, improving workflow efficiency.
Environment-Specific Dependencies: Supports feature-based environments (development, production, linting), ensuring appropriate configurations across different stages.
Multi-Language Support: Facilitates environments that include both Python and Java, catering to diverse development needs.

Dates

Live stream 2025-01-31 at 11:30
- https://www.youtube.com/live/QM-QTGa4b8U
- To interact, join directly on Discord: https://discord.gg/MdEYuYhtyd?event=1331572690904944742

A recording will eventually be available here: https://www.youtube.com/watch?v=Z0M8h0xeHRM

Further Resources

Enterprise Implementation: Explore the key ideas we implement for enterprises by trying out our Local Data Stack Template, which encompasses all the main elements discussed. Check out this post for more detail.
Upcoming Events: Stay tuned! On April 9th, we will share more details about our architecture at the Vienna Data Engineering Meetup.

Summary

Try It Out: Explore our Local Data Stack Template to enhance your own data platform.

Connect With Us: We welcome your feedback and questions! Reach out to us to discuss how these principles and the template can add value to your data infrastructure.