https://github.com/adrn-mm/LLM-OCR

This project introduces a paradigm shift in document processing by combining traditional extraction with the cognitive reasoning of Large Language Models (LLMs). By treating document extraction not just as a vision problem, but as a multimodal language problem, this application can “read” a document much like a human does. It understands the spatial layout, infers the relationship between key-value pairs, and intelligently corrects extraction errors based on the surrounding context.