Cost Analysis of Human-corrected Transcription for Predominately Oral Languages
Authors: Yacouba Diarra, Nouhoum Coulibaly, and Michael Leventhal
Affiliation: RobotsMali AI4D Lab (robotsmali.org)
Published: October 2025
Abstract
Creating speech datasets for low-resource languages is a critical yet poorly understood challenge, especially with respect to the human cost of producing high-quality annotated data. This study focuses on Bambara, a Manding language of Mali, as an example of a Predominately Oral Language (POL): a language in which oral communication is far more common than written expression.
Through a one-month field study involving ten native-speaking transcribers, the researchers analyzed the time and complexity required to correct ASR-generated transcriptions of 53 hours of Bambara speech data.
The study finds that it takes 30 hours of human labor to accurately transcribe one hour of speech data under laboratory conditions.
Under typical field conditions, that number increases to 36 hours.
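To make the scale of these ratios concrete, the short Python sketch below applies them to a corpus of a given size. It assumes that the reported figures (30 and 36 labor hours per hour of speech) can be treated as constant per-hour rates and scaled linearly, which the study does not state; the function and constant names are hypothetical, and the 53-hour input is used only as an illustration, not as a total reported by the authors.

```python
# Labor-hour ratios taken from the abstract (hours of human work per hour of speech).
LAB_HOURS_PER_AUDIO_HOUR = 30
FIELD_HOURS_PER_AUDIO_HOUR = 36


def estimate_labor_hours(audio_hours: float, hours_per_audio_hour: float) -> float:
    """Linearly extrapolate transcription labor for a corpus.

    Assumption (not from the study): the ratio stays constant regardless
    of corpus size, speaker, or audio quality.
    """
    if audio_hours < 0 or hours_per_audio_hour < 0:
        raise ValueError("durations and ratios must be non-negative")
    return audio_hours * hours_per_audio_hour


corpus_hours = 53  # illustrative corpus size, matching the dataset in the study

lab_total = estimate_labor_hours(corpus_hours, LAB_HOURS_PER_AUDIO_HOUR)
field_total = estimate_labor_hours(corpus_hours, FIELD_HOURS_PER_AUDIO_HOUR)

# Relative extra cost of field conditions compared with laboratory conditions.
field_overhead = (
    FIELD_HOURS_PER_AUDIO_HOUR - LAB_HOURS_PER_AUDIO_HOUR
) / LAB_HOURS_PER_AUDIO_HOUR

print(f"Laboratory estimate: {lab_total} labor hours")
print(f"Field estimate:      {field_total} labor hours")
print(f"Field overhead:      {field_overhead:.0%}")
```

Under these assumptions, field conditions add 20% to the labor cost relative to laboratory conditions, regardless of corpus size.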