Commit b1048999 authored by Aleksi Papalitsas

Update README.md

parent 73632e38
# CommonVoice-TH Recipe
A commonvoice-th recipe for training an ASR engine using Kaldi. The recipe follows the `commonvoice` recipe with slight modifications.
# CommonVoice-FI Recipe
Forked from [commonvoice-th](https://github.com/vistec-AI/commonvoice-th)
A commonvoice-fi recipe for training an ASR engine using Kaldi. The recipe follows the `commonvoice` and `commonvoice-th` recipes with slight modifications.
## Installation
The author uses Docker to run the recipe in a container. **A GPU is required** to train `tdnn_chain`; without one, the script can train only up to `tri3b`.
### Building Docker
```bash
$ docker build -t <docker-name> .
```
### Downloading SRILM
Before building the Docker image, the SRILM archive needs to be downloaded. You can download it from [here](http://www.speech.sri.com/projects/srilm/download.html). Once the file is downloaded, remove the version from its name (e.g. rename `srilm-1.7.3.tar.gz` to `srilm.tar.gz`) and place it inside the `docker` directory. Your `docker` directory should then contain 2 files: `dockerfile` and `srilm.tar.gz`.
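The rename-and-place step can be sketched as a small shell helper. This is only an illustration: `stage_srilm` is a hypothetical name that is not part of the recipe, and the archive path and `docker` directory location are whatever applies on your machine.

```bash
# Hypothetical helper (not part of the recipe): copy a versioned SRILM
# archive into the docker directory under the version-free name.
stage_srilm() {
  local archive="$1"      # e.g. srilm-1.7.3.tar.gz, downloaded manually
  local docker_dir="$2"   # the repository's docker directory
  if [ ! -f "$archive" ]; then
    echo "archive not found: $archive" >&2
    return 1
  fi
  mkdir -p "$docker_dir"
  cp "$archive" "$docker_dir/srilm.tar.gz"
}
```

For example, `stage_srilm ~/Downloads/srilm-1.7.3.tar.gz ./docker` would leave `./docker/srilm.tar.gz` in place, assuming the archive was saved to `~/Downloads`.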
### Run docker and attach command line
Since a GPU is required, you will need the Kaldi GPU image.
```bash
$ docker run -it -v <path-to-repo>:/opt/kaldi/egs/commonvoice-th -v <path-to-labels>:/mnt/labels -v <path-to-cv-corpus>:/mnt --gpus all --name <container-name> <built-docker-name> bash
$ docker run -it --runtime=nvidia -v <path-to-repo>:/opt/kaldi/egs/commonvoice-th -v <path-to-labels>:/mnt/labels -v <path-to-cv-corpus>:/mnt --gpus all --name <container-name> kaldiasr/kaldi:gpu-latest bash
```
Once you finish this step, you should be in a bash shell inside the Docker container.
......@@ -20,21 +25,12 @@ $ cd /opt/kaldi/egs/commonvoice-th
$ ./run.sh --stage 0
```
## Experiment Results
Here are some experiment results evaluated on the dev set:
|Model|dev WER|
|:----|:----:|
|mono|-%|
|tri1|-%|
|tri2a|-%|
|tri2b|-%|
|tri3b|-%|
|tdnn-chain|-%|
Here is the final `test` set result for `tdnn-chain`:
|Model|dev WER|test WER|
|:----|:------|:------:|
|tdnn-chain|-%|-%|
## Author
Chompakorn Chaksangchaichot
Building the model takes roughly 4 hours with the voice dataset from [Mozilla Common Voice](https://commonvoice.mozilla.org/fi/datasets).
Since the dataset is only 14 hours long, it does not contain enough words for the dictionary to be used for actual voice recognition.
## Constructing a working VOSK-model
Vosk is a higher-level library that uses Kaldi internally for voice recognition. It requires a certain type of Kaldi model in order to work.
There is a list on [Vosk's own website](https://alphacephei.com/vosk/models#training-your-own-model) of what the model folder should contain.
Find these files among those produced by the scripts and put them in the right folders to create a working model. NOTE: take the files from the nnet directories. Files from tri3b or from models created by earlier stages won't work.
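As a rough sketch of the collection step, the helper below copies a few of the files into an `am/`, `graph/`, `ivector/` layout. Everything here is an assumption to verify: the function names are hypothetical, the four directories passed in depend on where `run.sh` writes its chain model, decoding graph, and i-vector extractor, and the target layout should be checked against Vosk's list. The `conf/` files and any remaining entries on that list are not handled.

```bash
# Hypothetical helper: copy SRC to DEST if SRC exists, otherwise report it,
# so that missing pieces are easy to spot.
copy_if_present() {
  if [ -f "$1" ]; then
    mkdir -p "$(dirname "$2")"
    cp "$1" "$2"
  else
    echo "missing: $1" >&2
  fi
}

# assemble_vosk_model CHAIN_DIR GRAPH_DIR EXTRACTOR_DIR OUT_DIR
# All four directories are assumptions; check where run.sh writes them.
assemble_vosk_model() {
  local chain="$1" graph="$2" extractor="$3" out="$4"
  local f
  # Acoustic model from the nnet (chain) directory, not from tri3b.
  copy_if_present "$chain/final.mdl" "$out/am/final.mdl"
  # Decoding graph and word table.
  copy_if_present "$graph/HCLG.fst" "$out/graph/HCLG.fst"
  copy_if_present "$graph/words.txt" "$out/graph/words.txt"
  copy_if_present "$graph/phones/word_boundary.int" \
    "$out/graph/phones/word_boundary.int"
  # i-vector extractor files.
  for f in final.ie final.dubm final.mat global_cmvn.stats; do
    copy_if_present "$extractor/$f" "$out/ivector/$f"
  done
}
```

Any `missing:` lines printed to stderr point at files that were not found in the directories given, which usually means a different directory name or a stage that has not finished.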