Commit a62fc43f authored by asd@asd.fi
parents 7cce986c 1e5a49a3
# CommonVoice-TH Recipe
A commonvoice-th recipe for training an ASR engine using Kaldi. The recipe follows the `commonvoice` recipe with slight modifications.
# CommonVoice-FI Recipe
Forked from [commonvoice-th](https://github.com/vistec-AI/commonvoice-th)
A commonvoice-fi recipe for training an ASR engine using Kaldi. The recipe follows the `commonvoice` and `commonvoice-th` recipes with slight modifications.
## Installation
The author uses Docker to run the container. **GPU is required** to train `tdnn_chain`; without one, the script can train only up to `tri3b`.
### Building Docker
```bash
$ docker build -t <docker-name> .
```
### Run docker and attach command line
Use docker to run the container. **GPU is required** to train `tdnn_chain`, else the script can train only up to `tri3b`.
## Downloading SRILM
Before building the Docker image, the SRILM archive needs to be downloaded. You can download it from [here](http://www.speech.sri.com/projects/srilm/download.html). Once the file is downloaded, remove the version from its name (e.g. rename `srilm-1.7.3.tar.gz` to `srilm.tar.gz`) and place it inside the `docker` directory. Your `docker` directory should contain 2 files: `dockerfile` and `srilm.tar.gz`.
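The rename-and-place step can be scripted. The following is a minimal sketch, assuming the archive was saved under a versioned name such as `srilm-1.7.3.tar.gz`; `stage_srilm` is a hypothetical helper and is not part of the recipe.

```bash
#!/usr/bin/env bash
# Sketch: copy a downloaded SRILM archive into the docker directory under the
# version-free name srilm.tar.gz described above.
# stage_srilm is a hypothetical helper, not something the recipe ships.
stage_srilm() {
    local tarball="$1"      # e.g. ~/Downloads/srilm-1.7.3.tar.gz
    local docker_dir="$2"   # the repository's docker directory
    if [ ! -f "$tarball" ]; then
        echo "SRILM archive not found: $tarball" >&2
        return 1
    fi
    mkdir -p "$docker_dir"
    cp "$tarball" "$docker_dir/srilm.tar.gz"
}

# Example call (both paths are placeholders):
# stage_srilm "$HOME/Downloads/srilm-1.7.3.tar.gz" docker
```

Copying rather than moving keeps the original download intact in case the build needs to be repeated.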
## Run docker and attach command line
Since a GPU is required, you will need the Kaldi GPU image (`kaldiasr/kaldi:gpu-latest`).
```bash
$ docker run -it -v <path-to-repo>:/opt/kaldi/egs/commonvoice-th -v <path-to-labels>:/mnt/labels -v <path-to-cv-corpus>:/mnt --gpus all --name <container-name> <built-docker-name> bash
$ docker run -it --runtime=nvidia -v <path-to-repo>:/opt/kaldi/egs/commonvoice-th -v <path-to-labels>:/mnt/labels -v <path-to-cv-corpus>:/mnt --gpus all --name <container-name> kaldiasr/kaldi:gpu-latest bash
```
Once you finish this step, you should be in a bash shell inside the Docker container.
## Usage
To run the training pipeline, go to the recipe directory and run the `run.sh` script:
```bash
$ cd /opt/kaldi/egs/commonvoice-th
$ cd /opt/kaldi/egs/commonvoice-fi
$ ./run.sh --stage 0
```
## Experiment Results
Here are some experiment results evaluated on the dev set:
|Model|dev WER|
|:----|:----:|
|mono|-%|
|tri1|-%|
|tri2a|-%|
|tri2b|-%|
|tri3b|-%|
|tdnn-chain|-%|
Here is the final `test` set result for `tdnn-chain`:
|Model|dev WER|test WER|
|:----|:------|:------:|
|tdnn-chain|-%|-%|
## Author
Chompakorn Chaksangchaichot
Building the model takes roughly 4 hours with the voice dataset from [Mozilla Common Voice](https://commonvoice.mozilla.org/fi/datasets).
Since the dataset is only 14 hours long, it does not contain enough words for the resulting dictionary to be usable for practical speech recognition.
## Constructing a working VOSK-model
Vosk is a higher-level library that uses Kaldi internally for speech recognition. It requires a particular type of Kaldi model in order to work.
The [Vosk website](https://alphacephei.com/vosk/models#training-your-own-model) lists what the model folder should contain.
Find these files among the outputs of the scripts and put them in the right folders to create a working model. NOTE: take the files from the `tdnn_chain` directories. Files from `tri4b` or from models created by earlier stages won't work.
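Once the files are copied, the assembled folder can be checked for gaps before loading it in Vosk. The sketch below is only illustrative: `check_model_dir` is a hypothetical helper, and the entries in `EXPECTED_FILES` are an assumed subset of the layout, so replace them with the full list from the Vosk page.

```bash
#!/usr/bin/env bash
# Sketch: report which expected files are missing from an assembled model folder.
# check_model_dir is a hypothetical helper. EXPECTED_FILES is an assumed,
# partial list; substitute the complete list from the Vosk documentation.
EXPECTED_FILES="am/final.mdl conf/mfcc.conf graph/HCLG.fst graph/words.txt ivector/final.ie"

check_model_dir() {
    local model_dir="$1"
    local missing=0
    local f
    for f in $EXPECTED_FILES; do
        if [ ! -f "$model_dir/$f" ]; then
            echo "missing: $f"
            missing=1
        fi
    done
    # Returns 0 when every listed file is present, 1 otherwise.
    return "$missing"
}

# Example call (the folder name is a placeholder):
# check_model_dir model-fi
```

The check only confirms that files exist at the expected paths; it does not verify that they came from the `tdnn_chain` stage.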
## Prebuilt models
Prebuilt models built with this recipe can be downloaded from Google Drive [here](https://drive.google.com/drive/folders/1orMXB84d9EXpHrNaI5wlynkvXCOCzwvJ?usp=sharing).
FROM kaldiasr/kaldi:gpu-latest
# make sox compat with mp3 as commonvoice is in mp3 format
RUN apt update && \
    apt install -y libsox-fmt-mp3
# install SRILM
# install python3.8
# install python dependencies