Cost Analysis of Human-corrected Transcription for Predominately Oral Languages

Authors: Yacouba Diarra, Nouhoum Coulibaly, and Michael Leventhal
Affiliation: RobotsMali AI4D Lab — robotsmali.org
Published: October 2025

Abstract

Creating speech datasets for low-resource languages is a critical yet poorly understood challenge, especially regarding the human cost of producing high-quality annotated data.
This study focuses on Bambara, a Manding language of Mali, as an example of a Predominately Oral Language (POL), that is, a language in which oral communication is far more common than written expression.

Through a one-month field study involving ten native transcribers, the researchers analyzed the time and complexity required to correct ASR-generated transcriptions of 53 hours of Bambara voice data.

Key findings:

  • It takes 30 hours of human labor to accurately transcribe one hour of speech data under laboratory conditions.
  • Under typical field conditions, that number increases to 36 hours.
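
The two rates above lend themselves to a quick back-of-the-envelope calculation. The following is a minimal sketch, not the authors' method: it assumes labor scales linearly with audio length, which the abstract does not state, and the function name `labor_hours` and the constant names are hypothetical. Only the rates (30 and 36) and the 53-hour corpus size come from the text.

```python
# Rates quoted in the abstract: hours of human labor per hour of speech.
LAB_HOURS_PER_SPEECH_HOUR = 30
FIELD_HOURS_PER_SPEECH_HOUR = 36


def labor_hours(speech_hours: float, rate: float) -> float:
    """Estimate total human labor for a corpus.

    Assumption (not from the source): labor scales linearly with audio length.
    """
    if speech_hours < 0 or rate < 0:
        raise ValueError("speech_hours and rate must be non-negative")
    return speech_hours * rate


corpus_hours = 53  # size of the Bambara corpus mentioned in the abstract

lab_total = labor_hours(corpus_hours, LAB_HOURS_PER_SPEECH_HOUR)
field_total = labor_hours(corpus_hours, FIELD_HOURS_PER_SPEECH_HOUR)

# Relative extra labor under field conditions compared with the laboratory rate.
field_overhead = (
    FIELD_HOURS_PER_SPEECH_HOUR - LAB_HOURS_PER_SPEECH_HOUR
) / LAB_HOURS_PER_SPEECH_HOUR

print(f"Laboratory estimate: {lab_total} hours of labor")
print(f"Field estimate: {field_total} hours of labor")
print(f"Field overhead: {field_overhead:.0%}")
```

Under the linear-scaling assumption, a corpus of this size would correspond to roughly 1,590 hours of labor at the laboratory rate and 1,908 hours at the field rate, a 20% increase. These totals are illustrative extrapolations, not figures reported by the study.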