Cost Analysis of Human-corrected Transcription for Predominately Oral Languages

Authors: Yacouba Diarra, Nouhoum Coulibaly, and Michael Leventhal
Affiliation: RobotsMali AI4D Lab (robotsmali.org)
Published: October 2025

Abstract

Creating speech datasets for low-resource languages is a critical yet poorly understood challenge, especially with respect to the human cost of producing high-quality annotated data.
This study focuses on Bambara, a Manding language of Mali, as an example of a Predominately Oral Language (POL): a language in which oral communication is far more common than written expression.

Through a one-month field study involving ten native-speaker transcribers, the researchers analyzed the time and complexity required to correct ASR-generated transcriptions of 53 hours of Bambara voice data.

  • It takes 30 hours of human labor to accurately transcribe one hour of speech data under laboratory conditions.
  • Under typical field conditions, that number increases to 36 hours.
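
To make the two ratios above concrete, the following minimal Python sketch scales them to a corpus of a given size. The function name, the constants, and the decision to apply the ratios to the full 53 hours are illustrative assumptions; the resulting totals are simple arithmetic on the reported ratios, not figures reported by the study.

```python
# Illustrative arithmetic only: names and the choice of inputs are assumptions,
# not part of the study. The two rates are the ratios quoted in the bullets above.

LAB_RATE = 30.0     # labor hours per hour of speech, laboratory conditions
FIELD_RATE = 36.0   # labor hours per hour of speech, typical field conditions
AUDIO_HOURS = 53.0  # size of the Bambara corpus mentioned in the abstract


def labor_hours(audio_hours: float, rate: float) -> float:
    """Estimate total human labor (hours) for a corpus at a given rate."""
    if audio_hours < 0 or rate < 0:
        raise ValueError("audio_hours and rate must be non-negative")
    return audio_hours * rate


lab_total = labor_hours(AUDIO_HOURS, LAB_RATE)
field_total = labor_hours(AUDIO_HOURS, FIELD_RATE)

# Relative increase in labor when moving from laboratory to field conditions.
field_overhead = (FIELD_RATE - LAB_RATE) / LAB_RATE

print(f"Laboratory estimate: {lab_total:.0f} labor hours")
print(f"Field estimate:      {field_total:.0f} labor hours")
print(f"Field overhead:      {field_overhead:.0%}")
```

Under these assumptions, field conditions add 20% to the labor estimate relative to laboratory conditions. The same function can be reused to budget a planned corpus of a different size.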