Benchmarking Wav2Vec and Traditional Speech Recognition in Speech Transcription
This study presents an empirical benchmark comparing two speech-to-text approaches under identical conditions: the Speech Recognition module, which relies on the online Google Web Speech API, and the offline Wav2Vec model developed by Facebook AI. Both approaches convert spoken language into written text, but they differ in their reliance on an internet connection, their speed, and their accuracy. The evaluation uses the LJ Speech dataset, which consists of short audio clips of a single speaker paired with their corresponding transcriptions. Both models transcribe the same audio clips, and each generated transcription is then compared against the reference transcription provided in the dataset. Our analysis shows that Wav2Vec outperforms the Speech Recognition module in both accuracy and performance, which supports the use of Wav2Vec in speech-to-text applications.
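The abstract does not name the measure used to compare each generated transcription with its reference. Word error rate (WER), the word-level edit distance divided by the number of reference words, is one common choice for this kind of comparison. The following is a minimal sketch of it, assuming a simple normalization (lowercasing and stripping punctuation); the function names `normalize`, `edit_distance`, `word_error_rate`, and `corpus_wer` are illustrative and not taken from the study.

```python
import re

# Assumption: the study's similarity metric is not specified; this sketch uses
# word error rate (WER) as one common way to score a transcription against a reference.


def normalize(text: str) -> list[str]:
    """Lowercase, turn everything except ASCII letters, digits and apostrophes into spaces, split into words."""
    text = text.lower()
    text = re.sub(r"[^a-z0-9']+", " ", text)
    return text.split()


def edit_distance(ref: list[str], hyp: list[str]) -> int:
    """Minimum number of word substitutions, deletions and insertions turning ref into hyp."""
    prev = list(range(len(hyp) + 1))  # distances from an empty reference prefix
    for i, r in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, h in enumerate(hyp, start=1):
            curr[j] = min(
                prev[j] + 1,             # delete reference word r
                curr[j - 1] + 1,         # insert hypothesis word h
                prev[j - 1] + (r != h),  # substitute, or match at no cost when equal
            )
        prev = curr
    return prev[-1]


def word_error_rate(reference: str, hypothesis: str) -> float:
    """WER for a single (reference, hypothesis) pair."""
    ref, hyp = normalize(reference), normalize(hypothesis)
    if not ref:
        raise ValueError("reference transcription is empty")
    return edit_distance(ref, hyp) / len(ref)


def corpus_wer(pairs: list[tuple[str, str]]) -> float:
    """Total word edits divided by total reference words over all (reference, hypothesis) pairs."""
    edits = words = 0
    for reference, hypothesis in pairs:
        ref, hyp = normalize(reference), normalize(hypothesis)
        edits += edit_distance(ref, hyp)
        words += len(ref)
    if words == 0:
        raise ValueError("no reference words")
    return edits / words


if __name__ == "__main__":
    # Hypothetical example pair, not taken from LJ Speech or the study's results.
    print(word_error_rate("The quick brown fox.", "the quick brown box"))  # 0.25
```

`corpus_wer` pools edits across all clips rather than averaging per-clip rates, so longer clips carry proportionally more weight. Either aggregation is defensible, and the abstract does not say which one, if either, was used.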