Docling + Metaxy: Patent Intelligence at Scale

Date
Apr 26, 2026 4:00 PM
Location
Online (Zoom, Linux Foundation)
Recording
Abstract
Patent corpora are an unforgiving stress test for document AI: documents are long, multilingual, dense with figures, tables, formulas and claim hierarchies, and the corpus keeps changing as new filings, translations, and corrections arrive. A naive batch pipeline either reprocesses everything (expensive on GPU and tokens) or silently goes stale.
This Docling Community Office Hours session shows how Jubust builds structured patent intelligence by combining:
- Docling for parsing — turning heterogeneous patent PDFs into a structured, reproducible representation,
- Metaxy as an incremental metadata control plane — tracking lineage at the level of individual fields per record, computing a precise diff when prompts, models, or upstream documents change, and handing that diff to whichever orchestrator or compute engine you already run,
- Ray and Dagster for execution at scale,
- a reviewer workspace that closes an active-learning loop: uncertain or high-value samples are routed to human experts first, and their corrections feed back into prompts, fine-tunes, and evaluation sets.
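The field-level diff that Metaxy hands to the orchestrator can be pictured as follows. This is an illustrative sketch in plain Python, not the Metaxy API: lineage is modeled as a mapping from (record, field) to the versions of everything that produced it, and the diff is exactly the set of fields whose recorded provenance no longer matches.

```python
# Illustrative sketch (not the Metaxy API): field-level lineage as a mapping
# from (record_id, field) to the provenance that produced the field. When a
# prompt, model, or upstream document changes, only the mismatching fields
# are handed to the orchestrator for recompute.
from dataclasses import dataclass

@dataclass(frozen=True)
class Provenance:
    doc_hash: str   # content hash of the source document
    model: str      # model version that produced the field
    prompt: str     # prompt version that produced the field

def diff_fields(lineage, current):
    """Return the (record_id, field) pairs whose provenance is stale.

    lineage: {(record_id, field): Provenance} recorded at the last run
    current: {(record_id, field): Provenance} desired for this run
    """
    return sorted(
        key for key, want in current.items()
        if lineage.get(key) != want
    )

# Example: bumping only the "claims" prompt invalidates only that field,
# so the "abstract" extraction is never re-run.
old = {
    ("US123", "claims"):   Provenance("h1", "gpt-4o", "claims-v1"),
    ("US123", "abstract"): Provenance("h1", "gpt-4o", "abs-v1"),
}
new = {
    ("US123", "claims"):   Provenance("h1", "gpt-4o", "claims-v2"),
    ("US123", "abstract"): Provenance("h1", "gpt-4o", "abs-v1"),
}
print(diff_fields(old, new))  # [('US123', 'claims')]
```

The resulting list of stale (record, field) pairs is what gets fanned out to Ray tasks or Dagster runs, which is how GPU and token spend stays scoped to what actually changed.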
Three takeaways on what this stack actually buys you:
- Topological caching of expensive AI work. Field-level lineage and a precise diff mean GPU and token spend is scoped to what truly changed downstream — incremental recompute at sample-and-field granularity instead of asset granularity.
- Efficient metadata access over multimodal data enables intelligent routing. Once per-sample, per-field metadata is queryable, the platform can route work conditionally — pick a parser configuration by document type, pick a vision model where it is needed, and keep the rest on cheap defaults.
- Provenance is non-negotiable in a world of AI slop and hallucinations. Every extracted field must be traceable back to the exact source document, page, region, and code/model version that produced it — so reviewers, downstream consumers, and auditors can verify claims instead of trusting a chat-style summary.
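The second takeaway, metadata-driven routing, can be sketched as a simple rule over per-sample metadata. The configuration names and thresholds below are hypothetical, not Jubust's actual setup; the point is that once metadata is cheaply queryable, the expensive vision path is chosen only where the document warrants it.

```python
# Hypothetical routing rule (parser/model names and thresholds are
# illustrative): pick a parser configuration and model per sample from its
# queryable metadata, keeping everything else on cheap defaults.

def route(meta):
    """meta: per-sample metadata dict -> (parser_config, model) choice."""
    # Formula-heavy or figure-dense documents justify the expensive path.
    if meta.get("has_formulas") or meta.get("figure_count", 0) > 10:
        return ("docling-full-ocr", "vision-large")
    # Non-English filings get a multilingual model.
    if meta.get("language") not in ("en", None):
        return ("docling-default", "multilingual-base")
    # Everything else stays on the cheap default.
    return ("docling-default", "text-small")

print(route({"has_formulas": True}))  # ('docling-full-ocr', 'vision-large')
print(route({"language": "de"}))      # ('docling-default', 'multilingual-base')
print(route({"figure_count": 2}))     # ('docling-default', 'text-small')
```

Because routing is just a function of stored metadata, the same diff machinery applies: changing a routing rule invalidates only the samples whose chosen path changes.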
Links
- Deck: jubust.com/decks/ibm-docling
- Recording: youtube.com/watch?v=7eyoqVMoguY
- Community office hours event: LinkedIn · Zoom (Linux Foundation)
- Docling: github.com/docling-project/docling
- Metaxy: docs.metaxy.io · github.com/anam-org/metaxy
- Jubust: jubust.com

Authors
Senior data expert
Georg is a co-founder @Jubust, a senior data expert at Magenta, and an ML-ops engineer at ASCII.
He solves challenges with data; his interests include geospatial graphs
and time series. Georg is transitioning Magenta's data platform to the cloud
and handles large-scale multimodal ML-ops challenges at ASCII.