CatchTheTornado/text-extract-api

by CatchTheTornado · PDF Extraction · updated 6mo ago

Document (PDF, Word, PPTX, ...) extraction and parsing API using state-of-the-art OCRs and Ollama-supported models. Anonymize documents, remove PII, and convert any document or picture to structured JSON or Markdown.

3,104 stars · 276 forks · rank #114

anonymization · api · extract · json · llm · ocr · ocr-python · pdf · pii
