Cost Analysis of Human-corrected Transcription for Predominately Oral Languages
Authors: Yacouba Diarra, Nouhoum Coulibaly, and Michael Leventhal
Affiliation: RobotsMali AI4D Lab (robotsmali.org)
Published: October 2025
Abstract
Creating speech datasets for low-resource languages is a critical yet poorly understood challenge, particularly with respect to the human cost of producing high-quality annotated data. This study focuses on Bambara, a Manding language of Mali, as an example of a Predominately Oral Language (POL): a language in which oral communication is far more common than written expression.
Through a one-month field study involving ten native-speaker transcribers, the researchers analyzed the time and complexity of correcting ASR-generated transcriptions of 53 hours of Bambara speech data.
The study finds that accurately transcribing one hour of speech data requires 30 hours of human labor under laboratory conditions, rising to 36 hours under typical field conditions.
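The two reported ratios lend themselves to simple budgeting arithmetic. The following is a minimal Python sketch that projects total correction labor for a corpus of a given size, under the strong assumption that the per-hour ratios scale linearly with corpus size. The constant and function names are hypothetical, and the 53-hour projection is illustrative arithmetic only, not a labor total reported by the study.

```python
# Reported ratios: hours of human labor per hour of transcribed speech.
LAB_HOURS_PER_AUDIO_HOUR = 30
FIELD_HOURS_PER_AUDIO_HOUR = 36


def estimate_labor_hours(audio_hours: float, hours_per_audio_hour: float) -> float:
    """Hypothetical linear projection of correction effort.

    Assumes the per-hour ratio holds unchanged at any corpus size.
    """
    if audio_hours < 0:
        raise ValueError("audio_hours must be non-negative")
    return audio_hours * hours_per_audio_hour


if __name__ == "__main__":
    corpus_hours = 53  # size of the Bambara corpus in the study
    lab = estimate_labor_hours(corpus_hours, LAB_HOURS_PER_AUDIO_HOUR)
    field = estimate_labor_hours(corpus_hours, FIELD_HOURS_PER_AUDIO_HOUR)
    # Relative extra labor of field conditions over laboratory conditions.
    print(f"lab: {lab:.0f} h, field: {field:.0f} h, overhead: {field / lab - 1:.0%}")
```

Running it prints `lab: 1590 h, field: 1908 h, overhead: 20%`. The overhead figure follows directly from 36 / 30 = 1.2 and does not depend on corpus size; the absolute totals hold only if the linearity assumption does.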