New top story on Hacker News: Show HN: Zerox – document OCR with GPT-mini - Hindi Top Breaking News - Hindi News, Latest News in Hindi, Breaking News

Hindi Top Breaking News - Hindi News, Latest News in Hindi, Breaking News

India Hindi News app brings you the latest news and videos from the Hindi Top Breaking News studios in India. Stay tuned to the latest news stories from India and the world. Access videos and photos on your device with the Hindi Top Breaking News India News app.

Breaking

Home Top Ad

Post Top Ad

Responsive Ads Here

Tuesday, July 23, 2024

New top story on Hacker News: Show HN: Zerox – document OCR with GPT-mini

Show HN: Zerox – document OCR with GPT-mini
16 by themanmaran | 6 comments on Hacker News.
This started out as a weekend hack with gpt-4-mini, using the very basic strategy of "just ask the ai to ocr the document". But this turned out to be better performing than our current implementation of Unstructured/Textract. At pretty much the same cost. I've tested almost every variant of document OCR over the past year, especially trying things like table / chart extraction. I've found the rules based extraction has always been lacking. Documents are meant to be a visual representation after all. With weird layouts, tables, charts, etc. Using a vision model just make sense! In general, I'd categorize this solution as slow, expensive, and non deterministic. But 6 months ago it was impossible. And 6 months from now it'll be fast, cheap, and probably more reliable!

No comments:

Post a Comment

Post Bottom Ad

Responsive Ads Here

Pages