Kymata Labs / The Living Indexes · Built by tekvisions
Recomputed daily from live GitHub signals

Turn any document into structured data.

A living index of document-AI tooling: OCR, PDF extraction, document parsing, layout and table analysis, and vision-language understanding. These are the tools that read a page so your pipeline doesn't have to, ranked by momentum, not marketing.

About the Document Index

The Document Index is a living, self-updating directory of the open-source tools that turn documents into structured, LLM-ready data — OCR engines, PDF extraction, document parsing, layout and table analysis, and vision-language understanding. It tracks the libraries builders actually run to read PDFs, scans and forms — and ranks every entry by momentum, recomputed daily from live GitHub signals. It is one of The Living Indexes, a fleet built and operated end-to-end by Kymata Labs' AI agents.

What is document AI?

Turning unstructured documents — PDFs, scans, images, forms — into structured, machine-readable data: OCR, layout and reading-order analysis, table extraction, and vision-language models that read a page like a person. It's the first step that feeds clean text into RAG and agents.

How is momentum scored?

A 0–100 score blending log-scaled stars (55%), push-recency (32%, decaying to zero by ~180 days), and rising-newness (13%). A tool that shipped this week can outrank a bigger one that's gone quiet.
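
As a rough illustration of how such a blend could work, here is a minimal Python sketch. Only the three weights and the ~180-day recency cutoff come from the description above. The star cap, the linear decay shape, the one-year newness window, and all names (`momentum`, `STAR_CAP`, `NEWNESS_WINDOW_DAYS`) are assumptions made for this example, not the index's actual formula.

```python
import math

# Weights stated in the index description.
W_STARS, W_RECENCY, W_NEWNESS = 0.55, 0.32, 0.13

# Source: push-recency decays to zero by roughly 180 days.
RECENCY_WINDOW_DAYS = 180
# ASSUMPTION: star count at which the log-scaled stars term saturates.
STAR_CAP = 100_000
# ASSUMPTION: repos older than this earn no "rising-newness" credit.
NEWNESS_WINDOW_DAYS = 365


def _clamp01(x: float) -> float:
    """Clamp a value into the range [0, 1]."""
    return max(0.0, min(1.0, x))


def momentum(stars: int, days_since_push: float, repo_age_days: float) -> float:
    """Return an illustrative 0-100 momentum score."""
    # Log scaling keeps very large projects from dominating outright.
    stars_term = _clamp01(math.log10(stars + 1) / math.log10(STAR_CAP + 1))
    # ASSUMPTION: linear decay; the real curve shape is not published here.
    recency_term = _clamp01(1.0 - days_since_push / RECENCY_WINDOW_DAYS)
    newness_term = _clamp01(1.0 - repo_age_days / NEWNESS_WINDOW_DAYS)
    blended = (
        W_STARS * stars_term
        + W_RECENCY * recency_term
        + W_NEWNESS * newness_term
    )
    return round(100 * blended, 1)


# A small tool pushed two days ago versus a large one that has gone quiet.
fresh_small = momentum(stars=800, days_since_push=2, repo_age_days=60)
quiet_large = momentum(stars=20_000, days_since_push=200, repo_age_days=2_000)
print(fresh_small > quiet_large)
```

Under these assumed parameters the recently active small project scores higher than the larger dormant one, which matches the behaviour the description claims. Different caps or decay curves would shift the exact numbers.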

What's included?

OCR engines, PDF extraction, document parsing, layout & structure, table extraction and vision-language understanding — the document-AI stack. Extraction tooling, not document-management apps.

Part of The Living Indexes

A fleet of self-updating maps of the AI-builder ecosystem — from RAG and diffusion to voice, agents, gateways and fine-tuning. Explore them all at indexes.kymatalabs.com.