diff --git a/README.md b/README.md
index 83840f0517f92b3c41bfb49ed10f2b50981b97b8..73650d90f891f5b4c3aac9517245fcbd763149ad 100644
--- a/README.md
+++ b/README.md
@@ -1,40 +1,39 @@
-# CommonVoice-TH Recipe
-A commonvoice-th recipe for training ASR engine using Kaldi. The following recipe follows `commonvoice` recipe with slight modification
+# CommonVoice-FI Recipe
+Fork from [commonvoice-th](https://github.com/vistec-AI/commonvoice-th)
+
+A commonvoice-fi recipe for training ASR engine using Kaldi. The following recipe follows `commonvoice` and `commonvoice-th` recipe with slight modification.
+
 ## Installation
-The author use docker to run the container. **GPU is required** to train `tdnn_chain`, else the script can train only up to `tri3b`.
-### Building Docker
-```bash
-$ docker build -t <docker-name> .
-```
-### Run docker and attach command line
+Use docker to run the container. **GPU is required** to train `tdnn_chain`, else the script can train only up to `tri3b`.
+
+## Downloading SRILM
+Before building docker, SRILM file need to be downloaded. You can download it from [here](http://www.speech.sri.com/projects/srilm/download.html). Once the file is downloaded, remove version name (e.g. from `srilm-1.7.3.tar.gz` to `srilm.tar.gz` and place it inside `docker` directory. Your `docker` directory should contains 2 files: `dockerfile`, and `srilm.tar.gz`.
+
+## Run docker and attach command line
+Since gpu is required you are going to need the kaldi-gpu-image.
+
 ```bash
-$ docker run -it -v <path-to-repo>:/opt/kaldi/egs/commonvoice-th -v <path-to-labels>:/mnt/labels -v <path-to-cv-corpus>:/mnt --gpus all --name <container-name> <built-docker-name> bash
+$ docker run -it --runtime=nvidia -v <path-to-repo>:/opt/kaldi/egs/commonvoice-th -v <path-to-labels>:/mnt/labels -v <path-to-cv-corpus>:/mnt --gpus all --name <container-name> kaldiasr/kaldi:gpu-latest bash
 ```
 Once you finish this step, you should be in a docker container bash shell now
 ## Usage
 To run the training pipeline, go to recipe directory and run `run.sh` script
 ```bash
-$ cd /opt/kaldi/egs/commonvoice-th
+$ cd /opt/kaldi/egs/commonvoice-fi
 $ ./run.sh --stage 0
 ```
-## Experiment Results
-Here are some experiment results evaluated on dev set:
-|Model|dev WER|
-|:----|:----:|
-|mono|-%|
-|tri1|-%|
-|tri2a|-%|
-|tri2b|-%|
-|tri3b|-%|
-|tdnn-chain|-%|
-
-Here is final `test` set result evaluated on `tdnn-chain`
-|Model|dev WER|test WER|
-|:----|:------|:------:|
-|tdnn-chain|-%|-%|
-
-## Author
-Chompakorn Chaksangchaichot
+Building the model takes roughly 4 hours with the voice dataset from [Mozilla Common Voice](https://commonvoice.mozilla.org/fi/datasets).
+
+Since the dataset is only 14 hours long, it does not contain enough words for the dictionary to be used for actual voice recognition.
+
+## Constructing a working VOSK-model
+
+Vosk is a higher level library that uses Kaldi internally for voice recognition. It requires certain type of Kaldi-model in order for it to work.
+There is a list in [VOSKs own website](https://alphacephei.com/vosk/models#training-your-own-model) about what the model folder should contain.
+Find these files produced by the scripts and put them in right folders to create a working model. NOTE: take the files from `tdnn_chain` directories. Using files from `tri4b` or models created by earlier stages won't work.
+
+## Prebuilt models
+You can download prebuilt models from google drive built with this recipe from [here](https://drive.google.com/drive/folders/1orMXB84d9EXpHrNaI5wlynkvXCOCzwvJ?usp=sharing).
diff --git a/docker/dockerfile b/docker/dockerfile
deleted file mode 100644
index 411dd89b54a7ccf2c0eb8a0063d91dbae17acf8b..0000000000000000000000000000000000000000
--- a/docker/dockerfile
+++ /dev/null
@@ -1,12 +0,0 @@
-FROM kaldiasr/kaldi:gpu-latest
-
-# make sox compat with mp3 as commonvoice is in mp3 format
-apt update
-apt install -y libsox-fmt-mp3
-
-# install SRILM
-
-# install python3.8
-
-# install python dependencies
-