<?xml version="1.0" encoding="utf-8" standalone="yes"?><rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom"><channel><title>Data-Science |</title><link>https://georgheiler.com/tags/data-science/</link><atom:link href="https://georgheiler.com/tags/data-science/index.xml" rel="self" type="application/rss+xml"/><description>Data-Science</description><generator>HugoBlox Kit (https://hugoblox.com)</generator><language>en-us</language><lastBuildDate>Fri, 29 May 2026 18:00:00 +0000</lastBuildDate><image><url>https://georgheiler.com/media/icon_hu_2b4e40339850646.png</url><title>Data-Science</title><link>https://georgheiler.com/tags/data-science/</link></image><item><title>VDSG: Optimizing Multimodal AI Pipelines with Metaxy</title><link>https://georgheiler.com/event/vdsg-optimizing-multimodal-ai-pipelines-with-metaxy/</link><pubDate>Fri, 29 May 2026 18:00:00 +0000</pubDate><guid>https://georgheiler.com/event/vdsg-optimizing-multimodal-ai-pipelines-with-metaxy/</guid><description>&lt;h2 id="abstract"&gt;Abstract&lt;/h2&gt;
&lt;p&gt;The AI era has changed the economics of data pipelines. Multimodal workflows often fan out into transcription, image understanding, embeddings, classification, extraction, review, and downstream analytics. Without precise metadata, a small change can invalidate too much of the pipeline and force costly reruns.&lt;/p&gt;
&lt;p&gt;This talk introduces Metaxy, an open-source Python framework for sample-level metadata versioning and field-level provenance. Metaxy acts as a control layer for incremental data pipelines: it records which fields depend on which upstream fields, computes what became stale, and lets the execution layer process only the affected records.&lt;/p&gt;
&lt;p&gt;We focus on practical examples across startup, enterprise, and research settings:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;avoiding wasteful recomputation in multimodal AI workflows,&lt;/li&gt;
&lt;li&gt;using field-level lineage to decide what can be skipped,&lt;/li&gt;
&lt;li&gt;keeping provenance queryable across document, audio, image, and tabular data,&lt;/li&gt;
&lt;li&gt;connecting Metaxy with orchestrators and compute engines such as Dagster, Ray, and Slurm.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;The core idea is simple: if an audio file changes, recompute transcription. Do not rerun face recognition if it only depends on the video stream.&lt;/p&gt;
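&lt;p&gt;The audio and video example can be sketched in plain Python. This is an illustration only and does not use Metaxy's API: the dependency map, the field names, and the helpers field_version and stale_fields are all hypothetical, and the versioning scheme (hashing a field's name together with its upstream versions) is an assumption chosen for brevity.&lt;/p&gt;

```python
import hashlib

# Hypothetical field-level dependency map (not Metaxy's API):
# each derived field lists the upstream fields it reads.
DEPENDS_ON = {
    "transcription": ["audio"],
    "face_recognition": ["video"],
    "summary": ["transcription", "face_recognition"],
}


def field_version(field, input_versions, cache=None):
    """Version of a field: a hash over the versions of everything it depends on."""
    cache = {} if cache is None else cache
    if field in input_versions:  # raw input field, version supplied by the caller
        return input_versions[field]
    if field not in cache:
        parts = [field_version(dep, input_versions, cache) for dep in DEPENDS_ON[field]]
        cache[field] = hashlib.sha256("|".join([field] + parts).encode()).hexdigest()
    return cache[field]


def stale_fields(old_inputs, new_inputs):
    """Derived fields whose version differs between two input snapshots."""
    return sorted(
        f for f in DEPENDS_ON
        if field_version(f, old_inputs) != field_version(f, new_inputs)
    )


old = {"audio": "a1", "video": "v1"}
new = {"audio": "a2", "video": "v1"}  # only the audio stream changed
print(stale_fields(old, new))  # ['summary', 'transcription']
```

&lt;p&gt;In this sketch, face recognition keeps its version because its only upstream field, the video stream, is unchanged, so an execution layer could skip it and recompute only transcription and the summary built on top of it.&lt;/p&gt;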
&lt;h2 id="links"&gt;Links&lt;/h2&gt;
&lt;ul&gt;
&lt;li&gt;Deck:
&lt;/li&gt;
&lt;li&gt;Event:
&lt;/li&gt;
&lt;li&gt;Metaxy docs:
&lt;/li&gt;
&lt;/ul&gt;</description></item></channel></rss>