
Perfecting the Art of Doing Nothing: Incremental Multimodal AI Pipelines with Metaxy

Thu, 07 May 2026 18:00:00 +0000

Abstract

AI pipelines are now expensive enough that recomputing more than necessary is the dominant cost. Tokens and GPU hours change the economics, and agentic workflows branch and converge in ways the traditional single-state data platform was never designed for.

This talk introduces Metaxy, an open-source metadata control plane that:

tracks lineage at the level of individual fields per record (not per dataset, not per asset),
computes a precise diff when something changes — rows to add or recompute, rows to retire,
hands that diff to whichever orchestrator or compute engine you already run.
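The three steps above can be sketched in plain Python. This is a minimal illustration of the idea, not Metaxy's API: `field_version`, `diff`, and the version labels are hypothetical names chosen for the example, and it assumes a field's version is a hash of its own code version plus the versions of the fields it reads.

```python
import hashlib


def field_version(code_version: str, upstream_versions: list[str]) -> str:
    """Hash a field's code version together with the versions of the fields it reads.

    Hypothetical helper: any change upstream yields a new version downstream.
    """
    payload = "|".join([code_version, *sorted(upstream_versions)])
    return hashlib.sha256(payload.encode()).hexdigest()[:12]


def diff(stored: dict[str, dict[str, str]], desired: dict[str, dict[str, str]]):
    """Compare per-record, per-field versions and split records into three groups."""
    to_add = sorted(desired.keys() - stored.keys())
    to_retire = sorted(stored.keys() - desired.keys())
    to_recompute = {}
    for record_id in desired.keys() & stored.keys():
        stale = sorted(
            name
            for name, version in desired[record_id].items()
            if stored[record_id].get(name) != version
        )
        if stale:
            to_recompute[record_id] = stale
    return to_add, to_recompute, to_retire


# The audio field is unchanged; only the transcription code moved from @1 to @2.
audio_v = field_version("extract-audio@1", [])
old_transcript = field_version("transcribe@1", [audio_v])
new_transcript = field_version("transcribe@2", [audio_v])

stored = {
    "a": {"audio": audio_v, "transcript": old_transcript},
    "c": {"audio": audio_v, "transcript": old_transcript},
}
desired = {
    "a": {"audio": audio_v, "transcript": new_transcript},
    "d": {"audio": audio_v, "transcript": new_transcript},
}

to_add, to_recompute, to_retire = diff(stored, desired)
print(to_add, to_recompute, to_retire)
```

In this sketch, record `a` needs only its `transcript` field recomputed, `d` is new, and `c` is retired. The resulting three groups are what an orchestrator would receive as its work list.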

We walk through field-versioning, field-level dependencies, and selective recompute, then look at two production applications:

— Cara 3 training-data workflows (face detection and cropping, audio extraction, transcription, embedding generation) over millions of multimodal samples since December 2025.
— structured patent intelligence built on Docling for parsing, Metaxy as the incremental control plane, Ray and Dagster for execution, and a reviewer workspace that turns expert corrections into signal for prompts, models, and evaluation.

The second half of the talk zooms out to platform anatomy — building blocks vs. domain products, quality that lives in the graph, executable specifications as the seam between platform and domain teams, and compute flexibility from laptop to HPC cluster.

Two takeaways on what Metaxy actually buys you:

Topological caching of expensive AI work. Field-level lineage and a precise diff mean GPU and token spend is scoped to what truly changed downstream — the usual incremental-recompute story, but at sample-and-field granularity instead of asset granularity.
Efficient metadata access over multimodal data enables intelligent routing. Once per-sample, per-field metadata is queryable, the platform can route work conditionally — pick a transcription model by detected language, pick a vision model by document type, send only the slices that need a heavy VLM through the expensive path, and keep the rest on cheap defaults.
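The routing idea in the second takeaway could look roughly like the sketch below. Everything here is an assumption made for illustration: the `SampleMeta` fields, the model labels, and the rule that only patents with figures take the heavy path are hypothetical, and none of it reflects how Metaxy stores or exposes metadata.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SampleMeta:
    """Hypothetical per-sample metadata that a routing step might query."""
    sample_id: str
    language: str      # detected language code
    doc_type: str      # e.g. "patent" or "memo"
    has_figures: bool


# Hypothetical model labels; real deployments would use their own registry.
TRANSCRIBERS = {"en": "asr-en", "de": "asr-de"}
FALLBACK_TRANSCRIBER = "asr-multilingual"


def route(meta: SampleMeta) -> dict[str, str]:
    """Choose models per sample from metadata alone, without touching the payload."""
    transcriber = TRANSCRIBERS.get(meta.language, FALLBACK_TRANSCRIBER)
    needs_heavy = meta.doc_type == "patent" and meta.has_figures
    vision = "heavy-vlm" if needs_heavy else "cheap-default"
    return {"transcription": transcriber, "vision": vision}


samples = [
    SampleMeta("s1", "en", "patent", True),
    SampleMeta("s2", "ja", "memo", False),
]
plan = {s.sample_id: route(s) for s in samples}
print(plan)
```

Only `s1` is sent through the expensive vision path; `s2` falls back to the multilingual transcriber and the cheap default.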

Docling + Metaxy: Patent Intelligence at Scale

Sun, 26 Apr 2026 16:00:00 +0000

Recording

Abstract

Patent corpora are an unforgiving stress test for document AI: documents are long, multilingual, dense with figures, tables, formulas and claim hierarchies, and the corpus keeps changing as new filings, translations, and corrections arrive. A naive batch pipeline either reprocesses everything (expensive on GPU and tokens) or silently goes stale.

This Docling Community Office Hours session shows how builds structured patent intelligence by combining:

Docling for parsing: turning heterogeneous patent PDFs into a structured, reproducible representation,
Metaxy as an incremental metadata control plane: tracking lineage at the level of individual fields per record, computing a precise diff when prompts, models, or upstream documents change, and handing that diff to whichever orchestrator or compute engine you already run,
Ray and Dagster for execution at scale,
a reviewer workspace that closes an active-learning loop: uncertain or high-value samples are routed to humans first, and their corrections feed back into prompts, fine-tunes, and evaluation sets.
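One way to picture the reviewer routing in the last item is a priority queue over extractions. The sketch below is an assumption, not the stack's actual logic: the `Extraction` fields and the scoring rule (uncertainty multiplied by a value weight) are hypothetical choices for illustration.

```python
import heapq
from dataclasses import dataclass


@dataclass(frozen=True)
class Extraction:
    """Hypothetical record for one extracted result awaiting possible review."""
    sample_id: str
    confidence: float  # model confidence in [0, 1]
    value: float       # assumed weight for how much a correction is worth


def review_priority(extraction: Extraction) -> float:
    """Higher score means review sooner: uncertain and high-value samples first."""
    return (1.0 - extraction.confidence) * extraction.value


def pick_for_review(extractions: list[Extraction], budget: int) -> list[Extraction]:
    """Return the `budget` highest-priority extractions, best first."""
    return heapq.nlargest(budget, extractions, key=review_priority)


queue = pick_for_review(
    [
        Extraction("p1", confidence=0.95, value=1.0),
        Extraction("p2", confidence=0.40, value=1.0),
        Extraction("p3", confidence=0.80, value=5.0),
    ],
    budget=2,
)
print([e.sample_id for e in queue])
```

With a review budget of two, the high-value `p3` and the uncertain `p2` reach a human first, while the confident `p1` stays on the automatic path.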

Three takeaways on what this stack actually buys you:

Topological caching of expensive AI work. Field-level lineage and a precise diff mean GPU and token spend is scoped to what truly changed downstream — incremental recompute at sample-and-field granularity instead of asset granularity.
Efficient metadata access over multimodal data enables intelligent routing. Once per-sample, per-field metadata is queryable, the platform can route work conditionally — pick a parser configuration by document type, pick a vision model where it is needed, and keep the rest on cheap defaults.
Provenance is non-negotiable in a world of AI slop and hallucinations. Every extracted field must be traceable back to the exact source document, page, and region, and to the code and model versions that produced it, so reviewers, downstream consumers, and auditors can verify claims instead of trusting a chat-style summary.
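As a rough picture of the provenance requirement, each extracted field can carry a small record naming its source and the versions that produced it. The class names, field names, and example values below are hypothetical and shown only to make the idea concrete; they are not taken from Docling or Metaxy.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Provenance:
    """Hypothetical provenance record attached to every extracted field."""
    document_id: str
    page: int
    region: tuple[float, float, float, float]  # bounding box (x0, y0, x1, y1)
    code_version: str
    model_version: str


@dataclass(frozen=True)
class ExtractedField:
    name: str
    value: str
    provenance: Provenance


def citation(field: ExtractedField) -> str:
    """Render a human-checkable pointer from a field back to its source."""
    p = field.provenance
    return (
        f"{field.name} <- {p.document_id} p.{p.page} region={p.region} "
        f"code={p.code_version} model={p.model_version}"
    )


claim = ExtractedField(
    name="independent_claim_1",
    value="A method for ...",
    provenance=Provenance(
        document_id="doc-001",  # placeholder identifier
        page=12,
        region=(72.0, 140.5, 523.0, 310.0),
        code_version="extract-claims@3",
        model_version="example-model@2026-01",
    ),
)
print(citation(claim))
```

Because the record is immutable and travels with the value, a reviewer can open the cited page and region and check the claim directly.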
