<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0">
  <channel>
    <title>The Document Index</title>
    <link>https://document.kymatalabs.com</link>
    <description>The living index of document-AI tooling — OCR, PDF extraction, document parsing, layout, tables and VLM understanding.</description>
    <item><title>PaddlePaddle/PaddleOCR — momentum 87</title><link>https://document.kymatalabs.com/p/paddlepaddle-paddleocr/</link><guid isPermaLink="false">PaddlePaddle/PaddleOCR</guid><description>Turn any PDF or image document into structured data for your AI. A powerful, lightweight OCR toolkit that bridges the gap between images/PDFs and LLMs. Supports 100+ languages.</description></item>
    <item><title>opendatalab/MinerU — momentum 86</title><link>https://document.kymatalabs.com/p/opendatalab-mineru/</link><guid isPermaLink="false">opendatalab/MinerU</guid><description>Transforms complex documents like PDFs and Office docs into LLM-ready markdown/JSON for your Agentic workflows.</description></item>
    <item><title>docling-project/docling — momentum 86</title><link>https://document.kymatalabs.com/p/docling-project-docling/</link><guid isPermaLink="false">docling-project/docling</guid><description>Get your documents ready for gen AI</description></item>
    <item><title>tesseract-ocr/tesseract — momentum 85</title><link>https://document.kymatalabs.com/p/tesseract-ocr-tesseract/</link><guid isPermaLink="false">tesseract-ocr/tesseract</guid><description>Tesseract Open Source OCR Engine (main repository)</description></item>
    <item><title>ocrmypdf/OCRmyPDF — momentum 83</title><link>https://document.kymatalabs.com/p/ocrmypdf-ocrmypdf/</link><guid isPermaLink="false">ocrmypdf/OCRmyPDF</guid><description>OCRmyPDF adds an OCR text layer to scanned PDF files, allowing them to be searched</description></item>
    <item><title>datalab-to/marker — momentum 82</title><link>https://document.kymatalabs.com/p/datalab-to-marker/</link><guid isPermaLink="false">datalab-to/marker</guid><description>Convert PDF to markdown + JSON quickly with high accuracy</description></item>
    <item><title>opendataloader-project/opendataloader-pdf — momentum 81</title><link>https://document.kymatalabs.com/p/opendataloader-project-opendataloader-pdf/</link><guid isPermaLink="false">opendataloader-project/opendataloader-pdf</guid><description>PDF Parser for AI-ready data. Automate PDF accessibility. Open-source.</description></item>
    <item><title>datalab-to/surya — momentum 80</title><link>https://document.kymatalabs.com/p/datalab-to-surya/</link><guid isPermaLink="false">datalab-to/surya</guid><description>OCR, layout analysis, reading order, table recognition in 90+ languages</description></item>
    <item><title>naptha/tesseract.js — momentum 78</title><link>https://document.kymatalabs.com/p/naptha-tesseract-js/</link><guid isPermaLink="false">naptha/tesseract.js</guid><description>Pure JavaScript OCR for more than 100 languages 📖🎉🖥</description></item>
    <item><title>Unstructured-IO/unstructured — momentum 78</title><link>https://document.kymatalabs.com/p/unstructured-io-unstructured/</link><guid isPermaLink="false">Unstructured-IO/unstructured</guid><description>Convert documents to structured data effortlessly. Unstructured is an open-source ETL solution for transforming complex documents into clean, structured formats for language models. Visit our website to learn more about our enterprise-grade Platform product for production-grade workflows, partitioning</description></item>
    <item><title>pymupdf/PyMuPDF — momentum 77</title><link>https://document.kymatalabs.com/p/pymupdf-pymupdf/</link><guid isPermaLink="false">pymupdf/PyMuPDF</guid><description>PyMuPDF is a high performance Python library for data extraction, analysis, conversion &amp; manipulation of PDF (and other) documents.</description></item>
    <item><title>run-llama/liteparse — momentum 77</title><link>https://document.kymatalabs.com/p/run-llama-liteparse/</link><guid isPermaLink="false">run-llama/liteparse</guid><description>A fast, helpful, and open-source document parser</description></item>
    <item><title>RapidAI/RapidOCR — momentum 75</title><link>https://document.kymatalabs.com/p/rapidai-rapidocr/</link><guid isPermaLink="false">RapidAI/RapidOCR</guid><description>📄 Awesome OCR toolkits for multiple programming languages, based on ONNX Runtime, OpenVINO, MNN, PaddlePaddle, TensorRT and PyTorch.</description></item>
    <item><title>Zipstack/unstract — momentum 75</title><link>https://document.kymatalabs.com/p/zipstack-unstract/</link><guid isPermaLink="false">Zipstack/unstract</guid><description>LLM-Driven Extraction of Unstructured Data — Built for API Deployments &amp; ETL Pipeline Workflows</description></item>
    <item><title>PaddlePaddle/PaddleX — momentum 74</title><link>https://document.kymatalabs.com/p/paddlepaddle-paddlex/</link><guid isPermaLink="false">PaddlePaddle/PaddleX</guid><description>All-in-One Development Tool based on PaddlePaddle</description></item>
    <item><title>mindee/doctr — momentum 74</title><link>https://document.kymatalabs.com/p/mindee-doctr/</link><guid isPermaLink="false">mindee/doctr</guid><description>docTR (Document Text Recognition) - a seamless, high-performing &amp; accessible library for OCR-related tasks powered by Deep Learning.</description></item>
    <item><title>DayBreak-u/chineseocr_lite — momentum 73</title><link>https://document.kymatalabs.com/p/daybreak-u-chineseocr-lite/</link><guid isPermaLink="false">DayBreak-u/chineseocr_lite</guid><description>Ultra-lightweight Chinese OCR with support for vertical text recognition and for ncnn, mnn, and tnn inference (dbnet (1.8M) + crnn (2.5M) + anglenet (378KB)); the total model size is only 4.7M.</description></item>
    <item><title>deepdoctection/deepdoctection — momentum 71</title><link>https://document.kymatalabs.com/p/deepdoctection-deepdoctection/</link><guid isPermaLink="false">deepdoctection/deepdoctection</guid><description>A Repo For Document AI</description></item>
    <item><title>shipfastlabs/parsel — momentum 71</title><link>https://document.kymatalabs.com/p/shipfastlabs-parsel/</link><guid isPermaLink="false">shipfastlabs/parsel</guid><description>A fast, helpful, and open-source document parser for PHP</description></item>
    <item><title>UglyToad/PdfPig — momentum 70</title><link>https://document.kymatalabs.com/p/uglytoad-pdfpig/</link><guid isPermaLink="false">UglyToad/PdfPig</guid><description>Read and extract text and other content from PDFs in C# (port of PDFBox)</description></item>
    <item><title>datalab-to/chandra — momentum 68</title><link>https://document.kymatalabs.com/p/datalab-to-chandra/</link><guid isPermaLink="false">datalab-to/chandra</guid><description>OCR model that handles complex tables, forms, handwriting with full layout.</description></item>
    <item><title>Yuliang-Liu/MonkeyOCR — momentum 68</title><link>https://document.kymatalabs.com/p/yuliang-liu-monkeyocr/</link><guid isPermaLink="false">Yuliang-Liu/MonkeyOCR</guid><description>A lightweight LMM-based Document Parsing Model</description></item>
    <item><title>run-llama/llama_cloud_services — momentum 68</title><link>https://document.kymatalabs.com/p/run-llama-llama-cloud-services/</link><guid isPermaLink="false">run-llama/llama_cloud_services</guid><description>Knowledge Agents and Management in the Cloud</description></item>
    <item><title>jingsongliujing/OnnxOCR — momentum 68</title><link>https://document.kymatalabs.com/p/jingsongliujing-onnxocr/</link><guid isPermaLink="false">jingsongliujing/OnnxOCR</guid><description>A lightweight OCR system refactored from PaddleOCR and decoupled from the PaddlePaddle deep learning training framework, with ultra-fast inference speed.</description></item>
    <item><title>shcherbak-ai/contextgem — momentum 67</title><link>https://document.kymatalabs.com/p/shcherbak-ai-contextgem/</link><guid isPermaLink="false">shcherbak-ai/contextgem</guid><description>ContextGem: Effortless LLM extraction from documents</description></item>
    <item><title>kotaro-kinoshita/yomitoku — momentum 67</title><link>https://document.kymatalabs.com/p/kotaro-kinoshita-yomitoku/</link><guid isPermaLink="false">kotaro-kinoshita/yomitoku</guid><description>YomiToku is a Python package that provides an AI-powered document image analysis engine designed specifically for the Japanese language.</description></item>
    <item><title>zai-org/GLM-OCR — momentum 66</title><link>https://document.kymatalabs.com/p/zai-org-glm-ocr/</link><guid isPermaLink="false">zai-org/GLM-OCR</guid><description>GLM-OCR: Accurate × Fast × Comprehensive</description></item>
    <item><title>firecrawl/pdf-inspector — momentum 66</title><link>https://document.kymatalabs.com/p/firecrawl-pdf-inspector/</link><guid isPermaLink="false">firecrawl/pdf-inspector</guid><description>Fast Rust library for PDF inspection, classification, and text extraction. Intelligently detects scanned vs text-based PDFs to enable smart routing decisions.</description></item>
    <item><title>unjs/unpdf — momentum 66</title><link>https://document.kymatalabs.com/p/unjs-unpdf/</link><guid isPermaLink="false">unjs/unpdf</guid><description>📄 PDF extraction and rendering across all JavaScript runtimes</description></item>
    <item><title>YaoFANGUK/video-subtitle-extractor — momentum 65</title><link>https://document.kymatalabs.com/p/yaofanguk-video-subtitle-extractor/</link><guid isPermaLink="false">YaoFANGUK/video-subtitle-extractor</guid><description>A GUI tool for extracting hard-coded subtitles (hardsubs) from videos and generating srt files. Text recognition runs locally, with no third-party API required. A deep-learning-based video subtitle extraction framework that covers subtitle region detection and subtitle content extraction.</description></item>
  </channel>
</rss>
