Trainer script for a GPT-2-like model with Hugging Face and DeepSpeed.

Step 1)
Clone the folders testing/venv_trainer/ and tokenizer/.

Step 2)
Create a virtual environment, upgrade pip (pip3 install --upgrade pip) and install deepspeed as Iiro instructed earlier (just search this channel with "deepspeed"). Also install transformers and datasets.

Step 3)
In trainer.py, set line 100:
tokenizer_path = your_tokenizer_dir

Step 4)
Set up your own configuration in trainer.bash.
Point the paths to your own locations; you may also want to uncomment --cpus-per-task=10 to make data tokenization faster, etc.

Step 5)
sbatch trainer.bash


Notes:
* The script isn't finished yet and still has lots of tweaks.
* Conflicts between ds_config.json and TrainingArguments result in crashes. Use "auto" values in ds_config so the correct values are propagated to the DeepSpeed engine (see the example config at the end of this readme).
* If you experience slow startups, you may want to use the singularity module to load torch. It may hurt performance, but imports are fast. The process is roughly as follows:
1. module load torch
2. pip install deepspeed datasets transformers --user
3. Make sure your slurm script contains the following:

```
# where deepspeed builds its JIT extensions
export TORCH_EXTENSIONS_DIR=/PATH/TO/SOME/DIR/

module load pytorch
module load gcc/9.1.0
module load cuda/11.1.0

# make the --user installed launchers visible inside the container
export SINGULARITYENV_APPEND_PATH="/users/$USER/.local/bin"
# python headers needed when deepspeed compiles its extensions
export CPATH=/appl/spack/install-tree/gcc-9.1.0/python-3.6.8-ecovls/include/python3.6m:$CPATH
```
4. You may need to change the python path in your deepspeed launcher (located at /users/$USER/.local/bin/deepspeed) so that it points to the singularity python given by `which python`.
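
Example ds_config.json:
For reference, a minimal ds_config.json built around "auto" values might look like the sketch below. The ZeRO stage, optimizer and scheduler choices here are only illustrative and not the project's actual config; the point is that anything marked "auto" is filled in from TrainingArguments instead of being duplicated by hand.

```
{
  "train_batch_size": "auto",
  "train_micro_batch_size_per_gpu": "auto",
  "gradient_accumulation_steps": "auto",
  "fp16": {
    "enabled": "auto"
  },
  "zero_optimization": {
    "stage": 2
  },
  "optimizer": {
    "type": "AdamW",
    "params": {
      "lr": "auto",
      "betas": "auto",
      "eps": "auto",
      "weight_decay": "auto"
    }
  },
  "scheduler": {
    "type": "WarmupLR",
    "params": {
      "warmup_min_lr": "auto",
      "warmup_max_lr": "auto",
      "warmup_num_steps": "auto"
    }
  }
}
```

The Trainer picks the file up via the deepspeed argument of TrainingArguments and resolves the "auto" entries from its own values when the DeepSpeed engine is initialized, which avoids the config/argument conflicts mentioned in the notes above.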