FADIYAH, NAHDAH ABRAR and Hendy, Santosa and Ika, Novia Anggraini (2024) DEEPSPEECH ARCHITECTURE BASED NON-NATIVE ENGLISH SPEAKER ASR USING FINE-TUNING METHOD. Undergraduate thesis, Universitas Bengkulu.
Archive (Thesis)
SKRIPSI_FADIYAH NAHDAH ABRAR - nada yay.pdf - Bibliography (2MB). Restricted to Repository staff only. Available under License Creative Commons GNU GPL (Software).
Abstract
This research focuses on optimizing automatic speech recognition (ASR) for non-native English speakers, specifically Mandarin L1 speakers. The aim is to identify the best dataset, hyperparameters, and evaluation methods. The study uses secondary datasets and evaluates models using Word Error Rate (WER), Character Error Rate (CER), and learning curves. It applies both fine-tuning and training from scratch to improve ASR performance. The fine-tuned model achieved a WER of 0.45, a CER of 0.19, and a loss of 35.29, outperforming the model trained from scratch. A batch size of 8 for training and 2 for testing and validation gave the most efficient resource usage. The fine-tuned model performed best over 150 epochs, showing stable learning and minimized loss. Overall, fine-tuning outperformed training from scratch, with lower WER and loss. Optimal GPU usage was achieved with specific batch sizes, and the best models achieved a 0% WER/CER, ensuring accurate speech transcription.
Keywords: ASR, Non-native Speaker, Transfer Learning
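The two metrics named in the abstract are conventionally defined as an edit distance divided by the reference length, counted over words for WER and over characters for CER. The sketch below illustrates that standard definition only; it is not taken from the thesis. The function names, the whitespace tokenization, and the choice to count spaces as characters in CER are all assumptions made for illustration.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two sequences.

    Counts the minimum number of substitutions, insertions, and
    deletions needed to turn `ref` into `hyp`.
    """
    prev = list(range(len(hyp) + 1))  # distances from an empty reference
    for i, r in enumerate(ref, start=1):
        curr = [i]  # deleting the first i reference items
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # match or substitution
        prev = curr
    return prev[-1]


def wer(reference, hypothesis):
    """Word Error Rate: word-level edit distance / reference word count."""
    ref_words = reference.split()  # assumption: whitespace tokenization
    if not ref_words:
        raise ValueError("reference must contain at least one word")
    return edit_distance(ref_words, hypothesis.split()) / len(ref_words)


def cer(reference, hypothesis):
    """Character Error Rate: char-level edit distance / reference length."""
    if not reference:
        raise ValueError("reference must not be empty")
    # Assumption: spaces count as characters; some toolkits strip them.
    return edit_distance(reference, hypothesis) / len(reference)


if __name__ == "__main__":
    # Hypothetical sentences, not drawn from the thesis data.
    ref = "the cat sat on the mat"
    hyp = "the cat sit on mat"
    print(f"WER = {wer(ref, hyp):.2f}")
    print(f"CER = {cer(ref, hyp):.2f}")
```

Under this definition, a WER of 0.45 corresponds to roughly 45 word-level edits per 100 reference words. Both metrics can exceed 1.0 when the hypothesis contains many insertions.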
Item Type: | Thesis (Undergraduate) |
---|---|
Subjects: | T Technology > T Technology (General) |
Divisions: | Faculty of Engineering > Department of Electrical Engineering |
Depositing User: | Lili Haryanti, S.IPust |
Date Deposited: | 03 Oct 2024 04:01 |
Last Modified: | 03 Oct 2024 04:01 |
URI: | http://repository.unib.ac.id/id/eprint/21890 |