PaddleOCR is an OCR toolkit and document AI engine for converting PDFs and images into structured data. It is aimed at extracting text, tables, formulas, charts, and layout from documents so the output can be used by LLM and RAG workflows.
- Converts PDFs and images into Markdown or JSON
- Document parsing for text, tables, formulas, and charts
- A single PP-OCRv6 model supports 50 languages
- Browser inference with PaddleOCR.js
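
Since the Markdown output is meant to feed LLM and RAG workflows, a common next step is to split it into sections before indexing. The sketch below is one possible way to do that and uses only the Python standard library; it is not part of PaddleOCR. The `Chunk` class, the `chunk_markdown` function, and the sample document are hypothetical names and data chosen for illustration.

```python
import re
from dataclasses import dataclass

# Matches ATX-style Markdown headings such as "# Title" or "### Section".
# Assumption: the input has no fenced code blocks containing lines that
# start with "#"; such lines would be mistaken for headings here.
HEADING = re.compile(r"^#{1,6}\s+(.*)$")


@dataclass
class Chunk:
    heading: str  # nearest heading above the text ("" if there is none)
    text: str     # body text that belongs to that heading


def chunk_markdown(markdown: str) -> list[Chunk]:
    """Split a Markdown string into one chunk per heading section."""
    # Start with an unnamed section for any text before the first heading.
    sections: list[tuple[str, list[str]]] = [("", [])]
    for line in markdown.splitlines():
        match = HEADING.match(line)
        if match:
            sections.append((match.group(1).strip(), []))
        else:
            sections[-1][1].append(line)

    chunks: list[Chunk] = []
    for heading, lines in sections:
        text = "\n".join(lines).strip()
        if text:  # skip sections that contain only whitespace
            chunks.append(Chunk(heading=heading, text=text))
    return chunks


# Hypothetical Markdown, standing in for the output of a parsed document.
sample = (
    "# Invoice\n"
    "Total: 42 USD\n"
    "\n"
    "## Items\n"
    "| Item | Price |\n"
    "|---|---|\n"
    "| Pen | 2 |\n"
)

chunks = chunk_markdown(sample)
for chunk in chunks:
    print(chunk.heading, len(chunk.text))
```

Splitting on headings keeps each table or paragraph together with its section title, which tends to give a retriever more context than fixed-size character windows. Whether that trade-off suits a given pipeline depends on the documents involved.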










