DKDS: A Benchmark Dataset of Degraded Kuzushiji Documents with Seals for Detection and Binarization

arXiv, cs.CV · Thursday, November 13, 2025 at 5:00:00 AM
The Degraded Kuzushiji Documents with Seals (DKDS) dataset, introduced on November 13, 2025, is a new benchmark for Optical Character Recognition (OCR) of Kuzushiji, the pre-modern Japanese cursive script. Kuzushiji is understood by only a few thousand experts in Japan, and existing OCR methods struggle with two features of these documents: degradation and the presence of seals. DKDS provides a benchmark for both problems through two defined tracks: (1) text and seal detection and (2) document binarization. The dataset was constructed with the assistance of a trained Kuzushiji expert to help ensure its relevance and accuracy. Baseline results were established with various YOLO models for the detection track and, for the binarization track, with traditional algorithms, K-means clustering, and GAN-based methods. The initiative is intended to improve OCR for such documents and to support preserving and transcribing hi…
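To make the K-means binarization baseline mentioned above more concrete, here is a minimal sketch of one common way to binarize a page with two-cluster K-means on pixel intensity. It is an illustration only: the summary does not say how the DKDS authors configured their baseline, and the function name `kmeans_binarize`, the `n_iter` parameter, and the toy `page` values are all assumptions made for this example.

```python
def kmeans_binarize(gray, n_iter=20):
    """Binarize a grayscale image (list of rows of 0-255 ints).

    Hypothetical helper, not the DKDS baseline: runs two-cluster K-means
    on intensity and returns 1 for the darker cluster (assumed to be ink)
    and 0 for the brighter cluster (assumed to be background).
    """
    pixels = [p for row in gray for p in row]
    # Start the two centroids at the darkest and brightest intensities.
    c_dark, c_bright = float(min(pixels)), float(max(pixels))
    if c_dark == c_bright:
        # A flat image has no contrast, so nothing is labelled as ink.
        return [[0] * len(row) for row in gray]

    for _ in range(n_iter):
        # Assignment step: each pixel joins the nearer centroid.
        dark = [p for p in pixels if abs(p - c_dark) < abs(p - c_bright)]
        bright = [p for p in pixels if abs(p - c_dark) >= abs(p - c_bright)]
        # Update step: move each centroid to the mean of its cluster.
        new_dark = sum(dark) / len(dark)
        new_bright = sum(bright) / len(bright)
        if (new_dark, new_bright) == (c_dark, c_bright):
            break  # converged
        c_dark, c_bright = new_dark, new_bright

    # Final labelling with the converged centroids.
    return [
        [1 if abs(p - c_dark) < abs(p - c_bright) else 0 for p in row]
        for row in gray
    ]


# Toy 4x4 "page": bright, slightly noisy background with a dark 2x2 stroke.
page = [
    [200, 210, 190, 200],
    [200,  30,  40, 200],
    [190,  40,  30, 210],
    [200, 200, 200, 200],
]
mask = kmeans_binarize(page)
```

On real degraded pages, stains and seals often occupy intermediate intensities or distinct colors, so a practical variant might cluster on color channels with more than two clusters; that choice is outside what the summary describes.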
— via World Pulse Now AI Editorial System
