<?xml version="1.0" encoding="utf-8" standalone="yes"?><rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom"><channel><title>Patents |</title><link>https://georgheiler.com/tags/patents/</link><atom:link href="https://georgheiler.com/tags/patents/index.xml" rel="self" type="application/rss+xml"/><description>Patents</description><generator>HugoBlox Kit (https://hugoblox.com)</generator><language>en-us</language><lastBuildDate>Sun, 26 Apr 2026 16:00:00 +0000</lastBuildDate><image><url>https://georgheiler.com/media/icon_hu_2b4e40339850646.png</url><title>Patents</title><link>https://georgheiler.com/tags/patents/</link></image><item><title>Docling + Metaxy: Patent Intelligence at Scale</title><link>https://georgheiler.com/event/docling--metaxy-patent-intelligence-at-scale/</link><pubDate>Sun, 26 Apr 2026 16:00:00 +0000</pubDate><guid>https://georgheiler.com/event/docling--metaxy-patent-intelligence-at-scale/</guid><description>&lt;h2 id="recording"&gt;Recording&lt;/h2&gt;
&lt;div style="position: relative; padding-bottom: 56.25%; height: 0; overflow: hidden;"&gt;
&lt;iframe allow="accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture; web-share; fullscreen" loading="eager" referrerpolicy="strict-origin-when-cross-origin" src="https://www.youtube.com/embed/7eyoqVMoguY?autoplay=0&amp;amp;controls=1&amp;amp;end=0&amp;amp;loop=0&amp;amp;mute=0&amp;amp;start=0" style="position: absolute; top: 0; left: 0; width: 100%; height: 100%; border:0;" title="YouTube video"&gt;&lt;/iframe&gt;
&lt;/div&gt;
&lt;h2 id="abstract"&gt;Abstract&lt;/h2&gt;
&lt;p&gt;Patent corpora are an unforgiving stress test for document AI: documents are long, multilingual, dense with figures, tables, formulas and claim hierarchies, and the corpus keeps changing as new filings, translations, and corrections arrive.
A naive batch pipeline either reprocesses everything (expensive on GPU and tokens) or silently goes stale.&lt;/p&gt;
&lt;p&gt;This Docling Community Office Hours session shows how to build structured patent intelligence by combining:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;Docling&lt;/strong&gt; for parsing: turning heterogeneous patent PDFs into a structured, reproducible representation,&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Metaxy&lt;/strong&gt; as an incremental metadata control plane: tracking lineage at the level of individual fields per record, computing a precise diff when prompts, models, or upstream documents change, and handing that diff to whichever orchestrator or compute engine you already run,&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Ray&lt;/strong&gt; and &lt;strong&gt;Dagster&lt;/strong&gt; for execution at scale,&lt;/li&gt;
&lt;li&gt;a &lt;strong&gt;reviewer workspace&lt;/strong&gt; that closes an &lt;strong&gt;active-learning&lt;/strong&gt; loop: uncertain or high-value samples are routed to humans first, and their corrections feed back into prompts, fine-tunes, and evaluation sets.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Three takeaways on what this stack actually buys you:&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;&lt;strong&gt;Topological caching of expensive AI work.&lt;/strong&gt; Field-level lineage and a precise diff mean GPU and token spend is scoped to what truly changed downstream — incremental recompute at sample-and-field granularity instead of asset granularity.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Efficient metadata access over multimodal data enables intelligent routing.&lt;/strong&gt; Once per-sample, per-field metadata is queryable, the platform can route work conditionally — pick a parser configuration by document type, pick a vision model where it is needed, and keep the rest on cheap defaults.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Provenance is non-negotiable in a world of AI slop and hallucinations.&lt;/strong&gt; Every extracted field must be traceable to the exact source document, page, and region, and to the code and model version that produced it, so reviewers, downstream consumers, and auditors can verify claims instead of trusting a chat-style summary.&lt;/li&gt;
&lt;/ol&gt;
&lt;h2 id="links"&gt;Links&lt;/h2&gt;
&lt;ul&gt;
&lt;li&gt;Deck:
&lt;/li&gt;
&lt;li&gt;Recording: &lt;a href="https://www.youtube.com/watch?v=7eyoqVMoguY"&gt;YouTube&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;Community office hours event:
&lt;/li&gt;
&lt;li&gt;Docling:
&lt;/li&gt;
&lt;li&gt;Metaxy:
&lt;/li&gt;
&lt;li&gt;Jubust:
&lt;/li&gt;
&lt;/ul&gt;</description></item></channel></rss>