Past Webinar
What Matters in Document Parsing—and How to Measure It

Why our current OCR-era benchmarks fall short and how SCORE helps you evaluate what really counts.

Oct 8, 2025

Speakers

Daniel Schofield
Principal Solutions Architect
Antonio Jimeno Yepes
Principal Data Scientist, Unstructured

Recorded

Wednesday, Oct 8, 2025
30 mins on Zoom Events

Overview:

Most benchmarks for document parsing were designed for the OCR era. They reward clean text extraction but overlook the subtle errors that break downstream RAG, search, and agentic workflows, such as hallucinations, broken structure, and missing context. The result? Generative models and parsing tools that look strong on paper but fail in production.

This webinar introduces SCORE (Structural and COntent Robust Evaluation), a new multi-dimensional evaluation framework recently published on arXiv by Unstructured's R&D team. SCORE was built to measure what truly matters: fidelity, structure, and retrieval performance. You'll see how SCORE surfaces meaningful differences between models, helping you choose the right parser for real-world applications.

Technical Details:

In this session, we walked through:

  • Why traditional benchmarks mislead practitioners working with modern parsers
  • How SCORE evaluates document parsing strategies across fidelity, structure, and retrieval utility
  • How SCORE enables better parsing-strategy selection, even among state-of-the-art models
  • How to think about document parsing in production
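To make the idea of multi-dimensional evaluation concrete, here is a minimal sketch of scoring a parser's output along the three dimensions the session covers. This is a hypothetical illustration, not the actual SCORE implementation: the function names, the use of sequence similarity for fidelity, and the weights are all assumptions, and the structure and retrieval scores are assumed to come from separate evaluators.

```python
from difflib import SequenceMatcher

def text_fidelity(reference: str, parsed: str) -> float:
    """Similarity of parsed text to the reference (1.0 = identical)."""
    return SequenceMatcher(None, reference, parsed).ratio()

def evaluate_parser(reference: str, parsed: str,
                    structure_score: float, retrieval_score: float,
                    weights=(0.4, 0.3, 0.3)) -> dict:
    """Combine per-dimension scores into a weighted composite.

    structure_score and retrieval_score would come from separate
    evaluators (e.g., table-structure matching, retrieval hit rate);
    they are passed in directly here to keep the sketch small.
    """
    fidelity = text_fidelity(reference, parsed)
    dims = (fidelity, structure_score, retrieval_score)
    composite = sum(w * s for w, s in zip(weights, dims))
    return {
        "fidelity": fidelity,
        "structure": structure_score,
        "retrieval": retrieval_score,
        "composite": composite,
    }

# Example: a parse that preserved the text exactly but lost some structure.
scores = evaluate_parser(
    "Total revenue: $5.2M", "Total revenue: $5.2M",
    structure_score=0.9, retrieval_score=0.8,
)
```

The point of a composite like this is that two parsers with identical text fidelity can still diverge sharply on structure or retrieval utility, which is exactly the gap the webinar argues OCR-era benchmarks miss.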