🤗 HuggingFace - Candle
- https://github.com/huggingface/candle
- https://github.com/nogibjj/candle-cookbook
A minimalist ML framework for Rust with a focus on performance (including GPU support) and ease of use.
Examples
Examples include in-browser (WASM) demos and command-line models:
- yolo: pose estimation and object recognition.
- whisper: speech to text (speech recognition).
- LLaMA2: text generation.
- LLaMA and LLaMA-v2: general LLM.
- Falcon: general LLM.
- StarCoder: LLM specialized to code generation.
- Quantized LLaMA: quantized version of the LLaMA model using the same quantization techniques as llama.cpp.
- Stable Diffusion: text to image generative model, support for the 1.5, 2.1, and SDXL 1.0 versions.
- segment-anything: image segmentation model with prompt.
- Whisper: speech recognition model.
- Bert: useful for sentence embeddings.
- DINOv2: computer vision model trained with self-supervision (usable for ImageNet classification, depth estimation, and segmentation).
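Beyond the prebuilt examples, candle's core tensor API is compact. A minimal sketch of creating two tensors and multiplying them on the CPU, mirroring the crate's getting-started snippet (assumes `candle-core` has been added as a dependency with `cargo add candle-core`):

```rust
use candle_core::{Device, Tensor};

fn main() -> Result<(), Box<dyn std::error::Error>> {
    // Run on the CPU; with the `cuda` feature enabled, Device::new_cuda(0)
    // would target the GPU instead.
    let device = Device::Cpu;

    // Two random tensors: 2x3 and 3x4.
    let a = Tensor::randn(0f32, 1., (2, 3), &device)?;
    let b = Tensor::randn(0f32, 1., (3, 4), &device)?;

    // Matrix multiply -> 2x4 result.
    let c = a.matmul(&b)?;
    println!("{c}");
    Ok(())
}
```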
Setup
llama2-c on Windows 11 WSL2 Ubuntu, RTX4090
# Choose example llama2-c
cd candle-wasm-examples/llama2-c
# Setup raw
wget https://huggingface.co/spaces/lmz/candle-llama2/resolve/main/model.bin
wget https://huggingface.co/spaces/lmz/candle-llama2/resolve/main/tokenizer.json
# Setup tools
cargo install --locked trunk
cargo install --locked wasm-bindgen-cli
sudo apt install libssl-dev
# Build
. ./build-lib.sh
# Serve
trunk serve --release --port 8081
open http://127.0.0.1:8081
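Before building, it can be worth confirming that the two files fetched by `wget` above actually exist and are non-empty; a truncated download is a common cause of a silently broken demo. A small helper sketch (`check_assets` is hypothetical, not part of candle):

```shell
# Sanity-check that downloaded assets exist and are non-empty.
check_assets() {
  for f in "$@"; do
    if [ -s "$f" ]; then
      echo "ok: $f"
    else
      echo "missing or empty: $f"
    fi
  done
}

check_assets model.bin tokenizer.json
```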
Mistral on Windows 11 WSL2 Ubuntu, RTX4090
# Update & upgrade
sudo apt update && sudo apt upgrade
# Remove previous NVIDIA installation
sudo apt autoremove --purge 'nvidia*'  # quote the glob so the shell does not expand it against local files
# Setup cuda ref: https://gist.github.com/denguir/b21aa66ae7fb1089655dd9de8351a202
sudo wget https://developer.download.nvidia.com/compute/cuda/repos/wsl-ubuntu/x86_64/cuda-wsl-ubuntu.pin
sudo mv cuda-wsl-ubuntu.pin /etc/apt/preferences.d/cuda-repository-pin-600
sudo wget https://developer.download.nvidia.com/compute/cuda/12.4.0/local_installers/cuda-repo-wsl-ubuntu-12-4-local_12.4.0-1_amd64.deb
sudo dpkg -i cuda-repo-wsl-ubuntu-12-4-local_12.4.0-1_amd64.deb
sudo cp /var/cuda-repo-wsl-ubuntu-12-4-local/cuda-*-keyring.gpg /usr/share/keyrings/
sudo apt-get update
sudo apt-get -y install cuda-toolkit-12-4
sudo apt-get -y install cuda-nvcc-12-4
# Note: on WSL2 the GPU driver is supplied by the Windows host. Do not install
# Linux NVIDIA drivers (e.g. ubuntu-drivers autoinstall, nvidia-driver-*) or the
# distro's nvidia-cuda-toolkit package here; they can conflict with the CUDA
# 12.4 toolkit installed above.
# Check NVIDIA Drivers
nvidia-smi
# Check CUDA
nvcc --version
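If a script needs just the CUDA release number rather than the full `nvcc --version` banner, it can be extracted with a small sed filter; `cuda_release` here is a hypothetical helper, and the sample line it matches looks like `Cuda compilation tools, release 12.4, V12.4.131`:

```shell
# Print only the CUDA release number from `nvcc --version` output.
cuda_release() {
  sed -n 's/^Cuda compilation tools, release \([0-9.]*\),.*/\1/p'
}

# Guarded so the snippet degrades gracefully when nvcc is not installed.
command -v nvcc >/dev/null && nvcc --version | cuda_release || echo "nvcc not on PATH"
```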
# Setup cuDNN
sudo apt install libcudnn8
sudo apt install libcudnn8-dev
# Check cuDNN
/sbin/ldconfig -N -v $(sed 's/:/ /' <<< $LD_LIBRARY_PATH) 2>/dev/null | grep libcudnn
# Source
echo 'export CUDA_HOME=/usr/local/cuda-12.4' >> ~/.bashrc
echo 'export PATH=/usr/local/cuda-12.4/bin:$PATH' >> ~/.bashrc
echo 'export LD_LIBRARY_PATH=/usr/local/cuda-12.4/lib64:$LD_LIBRARY_PATH' >> ~/.bashrc
source ~/.bashrc
sudo ln -fs /usr/local/cuda-12.4/bin/nvcc /usr/bin/nvcc
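After sourcing `~/.bashrc`, the exported variables can be verified from the current shell before moving on; `check_cuda_env` is a hypothetical helper, not part of any toolkit:

```shell
# Confirm the CUDA environment set in ~/.bashrc is active in this shell.
check_cuda_env() {
  if [ -n "${CUDA_HOME:-}" ]; then
    echo "CUDA_HOME=$CUDA_HOME"
  else
    echo "CUDA_HOME is unset"
  fi
  case ":$PATH:" in
    *:"${CUDA_HOME:-/usr/local/cuda-12.4}/bin":*) echo "PATH includes CUDA bin" ;;
    *) echo "PATH is missing CUDA bin" ;;
  esac
}

check_cuda_env
```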
# Setup raw
wget https://huggingface.co/lmz/candle-mistral/resolve/main/pytorch_model-00001-of-00002.safetensors
wget https://huggingface.co/lmz/candle-mistral/resolve/main/pytorch_model-00002-of-00002.safetensors
wget https://huggingface.co/lmz/candle-mistral/resolve/main/tokenizer.json
# Or macOS
curl -LO https://huggingface.co/lmz/candle-mistral/resolve/main/pytorch_model-00001-of-00002.safetensors
curl -LO https://huggingface.co/lmz/candle-mistral/resolve/main/pytorch_model-00002-of-00002.safetensors
curl -LO https://huggingface.co/lmz/candle-mistral/resolve/main/tokenizer.json
# Run
cargo run --example mistral --release --features cuda,cudnn -- --prompt "Write helloworld code in Rust" --weight-files=pytorch_model-00001-of-00002.safetensors,pytorch_model-00002-of-00002.safetensors --tokenizer-file=tokenizer.json --sample-len 150
Whisper v3
cargo run --example whisper --release -- --model=large-v3