Call for Participation - HIPE-OCRepair 2026 - ICDAR Competition on LLM-Assisted OCR Post-Correction

Maud Ehrmann

2:14 AM
to ai4lam
(apologies for cross-postings)
====
HIPE-OCRepair 2026 - Historical OCR Post-Correction Shared Task 
Task: LLM-Assisted OCR Post-Correction for Multilingual Historical Documents
Venue: ICDAR 2026 (31 Aug - 4 Sep 2026)
====

We invite participation in HIPE-OCRepair 2026, the ICDAR 2026 Competition on LLM-Assisted OCR Post-Correction for Historical Documents.

Large-scale digitized historical collections still contain substantial OCR errors. Re-processing millions of pages with improved engines is rarely feasible, making post-correction the most viable strategy for addressing the OCR debt accumulated in digital heritage collections. Recent progress in large language models opens promising new directions, but their effectiveness varies across languages and error types, and they may introduce hallucinations.

To what extent can modern large language models address the OCR debt accumulated in large-scale digitized historical collections?

HIPE-OCRepair 2026 addresses this question through HIPE-OCRepair-Bench, a unified multilingual benchmark comprising curated datasets, a standardised evaluation protocol, baseline systems, and an open leaderboard.

Task

Participants correct noisy OCR transcripts of historical documents without access to the original images. For each text chunk (typically a paragraph or article), the dataset provides:

  • one OCR hypothesis
  • document metadata (language, date, publication title)
  • OCR quality indicators (CER, WER, lexicon-based quality score)

Systems must output a corrected version of each text chunk. Generative (LLM-based), discriminative, and hybrid approaches are all welcome.
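For readers unfamiliar with the quality indicators listed above, here is a minimal sketch of how character error rate (CER) and word error rate (WER) are commonly computed from a Levenshtein edit distance. It is an illustration only: the function names are hypothetical, and the official HIPE-OCRepair scorer may apply different normalisation or tokenisation.

```python
from typing import Sequence


def edit_distance(ref: Sequence, hyp: Sequence) -> int:
    """Levenshtein distance: insertions, deletions and substitutions all cost 1."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def cer(reference: str, hypothesis: str) -> float:
    """Character error rate: character-level edits divided by reference length."""
    if not reference:
        raise ValueError("reference must be non-empty")
    return edit_distance(reference, hypothesis) / len(reference)


def wer(reference: str, hypothesis: str) -> float:
    """Word error rate on whitespace tokens (an assumption, not the official tokenisation)."""
    ref_words = reference.split()
    if not ref_words:
        raise ValueError("reference must contain at least one word")
    return edit_distance(ref_words, hypothesis.split()) / len(ref_words)


if __name__ == "__main__":
    ground_truth = "the quick brown fox"
    ocr_output = "tbe qnick brown fox"  # two character substitutions
    print(f"CER = {cer(ground_truth, ocr_output):.3f}")
    print(f"WER = {wer(ground_truth, ocr_output):.3f}")
```

A post-correction system improves on the OCR hypothesis when the CER and WER of its output against the ground truth are lower than those of the original OCR text.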

Data

The benchmark consists of parallel OCR and ground truth data drawn from multiple curated historical collections, covering English, French, and German materials from the 17th to the 20th century, including newspapers and printed works. It consolidates existing resources alongside newly curated materials.

Important dates 
  • 10 Dec 2025: Sample data release
  • 02 Mar 2026: Training and development data release; scorer release
  • 23 Mar 2026: Hugging Face leaderboard release
  • 06-08 Apr 2026: Evaluation phase (test release and submission)
  • 10 Apr 2026: Results publication
  • 31 Aug-4 Sep 2026: Presentation at ICDAR 2026

HIPE-OCRepair addresses a central challenge for the document analysis, NLP, and digital humanities communities: improving the usability of large historical text collections at scale. It offers a reproducible evaluation framework, openly available data and tools, and a leaderboard for benchmarking beyond the competition itself.

We look forward to your participation!

Best regards,
HIPE-OCRepair 2026 Organizers
