Show HN: Zerox – document OCR with GPT-mini
16 by themanmaran | 6 comments on Hacker News.
This started out as a weekend hack with gpt-4-mini, using the very basic strategy of "just ask the ai to ocr the document". But this turned out to be better performing than our current implementation of Unstructured/Textract. At pretty much the same cost. I've tested almost every variant of document OCR over the past year, especially trying things like table / chart extraction. I've found the rules based extraction has always been lacking. Documents are meant to be a visual representation after all. With weird layouts, tables, charts, etc. Using a vision model just make sense! In general, I'd categorize this solution as slow, expensive, and non deterministic. But 6 months ago it was impossible. And 6 months from now it'll be fast, cheap, and probably more reliable!
Post Top Ad
Responsive Ads Here
Tuesday, July 23, 2024
New top story on Hacker News: Show HN: Zerox – document OCR with GPT-mini
Subscribe to:
Post Comments (Atom)
Post Bottom Ad
Responsive Ads Here
Author Details
India Hindi News App Brings You The Latest News And Videos From The Hindi Top Breaking News Studios In India. Stay Tuned To The Latest News Stories From India And The World. Access Videos And Photos On Your Device With The Hindi Top Breaking News India News App
No comments:
Post a Comment