# CommonVoice-FI Recipe
Forked from [commonvoice-th](https://github.com/vistec-AI/commonvoice-th).

A commonvoice-fi recipe for training an ASR engine with Kaldi. It follows the `commonvoice` and `commonvoice-th` recipes with slight modifications.


## Installation
Use Docker to run the recipe in a container. **A GPU is required** to train `tdnn_chain`; without one, the script can train only up to `tri3b`.

## Downloading SRILM
Before building the Docker image, you need to download the SRILM archive. You can get it from [here](http://www.speech.sri.com/projects/srilm/download.html). Once the file is downloaded, remove the version from its name (e.g. rename `srilm-1.7.3.tar.gz` to `srilm.tar.gz`) and place it inside the `docker` directory. Your `docker` directory should then contain two files: `dockerfile` and `srilm.tar.gz`.
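
The renaming step could be scripted roughly as follows. This is only a sketch: the `stage_srilm` helper name is hypothetical, and the archive name is the example version mentioned above, so substitute whichever version you actually downloaded.

```bash
# Hypothetical helper: copy a downloaded SRILM archive into the docker
# build directory under the version-free name srilm.tar.gz.
stage_srilm() {
    local archive="$1"      # e.g. ~/Downloads/srilm-1.7.3.tar.gz
    local docker_dir="$2"   # the repository's docker directory
    if [ ! -f "$archive" ]; then
        echo "archive not found: $archive" >&2
        return 1
    fi
    cp "$archive" "$docker_dir/srilm.tar.gz"
}

# Example call, run from the repository root:
#   stage_srilm ~/Downloads/srilm-1.7.3.tar.gz docker
```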

## Run docker and attach command line
Since a GPU is required, you need the Kaldi GPU image (`kaldiasr/kaldi:gpu-latest`).

```bash
$ docker run -it --runtime=nvidia -v <path-to-repo>:/opt/kaldi/egs/commonvoice-fi -v <path-to-labels>:/mnt/labels -v <path-to-cv-corpus>:/mnt --gpus all --name <container-name> kaldiasr/kaldi:gpu-latest bash
```
Once this step finishes, you should be in a bash shell inside the Docker container.
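
Before starting a long run, it may be worth checking whether the container can actually see a GPU. The following is a minimal sketch, assuming `nvidia-smi` is available inside the container (this depends on your NVIDIA container setup); the `max_trainable_stage` function name is hypothetical and not part of the recipe.

```bash
# Hypothetical helper: report how far training can get on this machine,
# based on whether nvidia-smi can list at least one GPU.
max_trainable_stage() {
    if command -v nvidia-smi >/dev/null 2>&1 \
        && nvidia-smi -L 2>/dev/null | grep -q '^GPU'; then
        echo "tdnn_chain"   # GPU visible: the chain model can be trained
    else
        echo "tri3b"        # no GPU visible: training stops at tri3b
    fi
}

max_trainable_stage
```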

## Usage
To run the training pipeline, go to the recipe directory and run the `run.sh` script:
```bash
$ cd /opt/kaldi/egs/commonvoice-fi
$ ./run.sh --stage 0
```

Building the model takes roughly 4 hours with the voice dataset from [Mozilla Common Voice](https://commonvoice.mozilla.org/fi/datasets).

Because the dataset contains only about 14 hours of speech, the resulting dictionary covers too few words for practical speech recognition.

## Constructing a working VOSK-model

Vosk is a higher-level library that uses Kaldi internally for speech recognition. It only works with a Kaldi model laid out in a specific way.
The [Vosk website](https://alphacephei.com/vosk/models#training-your-own-model) lists what the model folder should contain.
Find these files among the outputs of the scripts and place them in the corresponding folders to create a working model. NOTE: take the files from the `tdnn_chain` directories. Files from `tri4b` or from models created by earlier stages won't work.
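
Once the files are copied, a quick check that nothing is missing can save a failed load later. The sketch below is an illustration only: the `check_vosk_model` helper is hypothetical, and the file list is an assumption based on the layout described on the Vosk models page, which remains the authoritative reference and may list further files.

```bash
# Hypothetical helper: print every expected file that is missing from a
# model directory and return non-zero if any are absent.
# The file list is an assumption; verify it against the Vosk models page.
check_vosk_model() {
    local model_dir="$1" missing=0 f
    for f in \
        am/final.mdl \
        conf/mfcc.conf \
        conf/model.conf \
        graph/HCLG.fst \
        graph/words.txt \
        graph/phones/word_boundary.int \
        ivector/final.ie \
        ivector/final.dubm \
        ivector/final.mat
    do
        if [ ! -f "$model_dir/$f" ]; then
            echo "missing: $f"
            missing=1
        fi
    done
    return "$missing"
}

# Example call, with a placeholder model directory:
#   check_vosk_model <path-to-model-dir>
```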

## Prebuilt models
Prebuilt models built with this recipe are available on [Google Drive](https://drive.google.com/drive/folders/1orMXB84d9EXpHrNaI5wlynkvXCOCzwvJ?usp=sharing).