Nix flake sample for for Machine Learning (whisper)

This is a Nix flake environment for developing a sample Python project scaffolding for Machine Learning using whisper and the following dependencies:

Python 3.10
ffmpeg
loguru
flask
torch
pandas

The mkShell function is used to create a development shell that includes all these dependencies, as well as any dependencies listed in the requirements.txt file. The shell is created in a virtual environment located in the ./.venv directory.

To use this flake, you need to have Nix installed. Once you have Nix installed, you can clone this repository and run the following command in the repository directory:

nix develop

This will create a development shell with all the required dependencies installed and You will have venv shell automaticly.

You can use this environment to develop a machine learning project using whisper. Whisper provides a set of tools for building, training and testing neural networks in PyTorch.

The src directory in this project includes a sample Python module nix_python that uses whisper to run a neural network. You can modify this module to build and train your own neural network models.

The flake also includes a buildPythonPackage for a package named "nix-python", which is built using the pyproject format. The package includes the dependencies listed above, as well as the setuptools package.

Enjoy your machine learning development environment with whisper!

Usage

If you have our recommended configuration of Nix data science VM, then to use this setup:

Check out this repository and open the folder
direnv allow to permit direnv to operate
Automatic: direnv and Nix will ensure exact python version, virtual environment and binary dependencies.
pip install --editable . - install python project for editing. This will also create serve command.

Further in the example will be used a Bashkir Langiage speech-to-text transcription model

You can execute serve and should get the output like:

You can open your browser at http://127.0.0.1:5001 to view a web UI:

Select any ogg and mp3 files that contain language sample by Choose the file button:

Cick on Upload button and wait for a moments for resutl:

GPU is not required, demo app will run on CPU, but on GPU the application will run much faster. If GPU support is available, the application will switch to using the GPU automatically.

More training details

The model is fine-tuning with the HuggingFace scripts: https://github.com/huggingface/community-events/tree/main/whisper-fine-tuning-event
The openai/medium model with 769M parameters was used as a basis. Our model was the winner among Bashkir language models in the Whisper Event.
We used Common Voice dataset: https://commonvoice.mozilla.org/ba/datasets
To train the model we used our TrustBit server: AMD EPYC 7452 32-Core Processor, RTX 3090 GPU, 128 GB RAM

Implementation Details

Using direnv and nix flakes for automated dependency management
Using pyproject.toml for project metadata (PEP621 and Project Metadata)
Setuptools as the default build system
Modules go into src folder
Dockerfile that support install flakes from Github

Name		Name	Last commit message	Last commit date
Latest commit History 32 Commits
img		img
src/nix_python		src/nix_python
.dockerignore		.dockerignore
.envrc		.envrc
.gitignore		.gitignore
Dockerfile		Dockerfile
LICENSE		LICENSE
LICENSE.txt		LICENSE.txt
MANIFEST.in		MANIFEST.in
README.md		README.md
audio_2023-02-20_11-34-56.ogg		audio_2023-02-20_11-34-56.ogg
flake.lock		flake.lock
flake.nix		flake.nix
pyproject.toml		pyproject.toml
requirements.txt		requirements.txt
sample.ogg		sample.ogg

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Licenses found

Repository files navigation

Nix flake sample for for Machine Learning (whisper)

Usage

More training details

Implementation Details

About

Licenses found

Releases

Packages

Contributors 3

Languages

License

Licenses found

trustbit/nix-whisper-transcription-demo

Folders and files

Latest commit

History

Repository files navigation

Nix flake sample for for Machine Learning (whisper)

Usage

More training details

Implementation Details

About

Resources

License

Licenses found

Stars

Watchers

Forks

Releases

Packages 0

Contributors 3

Languages

Packages