LLM Fine-Tuning
Raw
- RLHF in 2024 with DPO & Hugging Face
- Direct Preference Optimization (DPO): A Simplified Approach to Fine-tuning Large Language Models
- Fine-tuning SeaLLM on Your Own Dataset with QLoRA on RTX4090
- MLX: Quantize, LoRA, QLoRA, Fuse
Supervised Fine-tuning (SFT) with Unsloth
(Recommended)
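A minimal SFT sketch with Unsloth plus TRL's SFTTrainer, assuming a 4-bit Mistral base and a dataset with a plain text column. The model name, dataset path, and hyperparameters below are placeholders (not taken from the notebooks above), and the SFTTrainer argument names shift between trl versions:

```python
from unsloth import FastLanguageModel
from trl import SFTTrainer
from transformers import TrainingArguments
from datasets import load_dataset

# Load a 4-bit quantized base model (model name is an assumption, swap in your own).
model, tokenizer = FastLanguageModel.from_pretrained(
    model_name="unsloth/mistral-7b-instruct-v0.2-bnb-4bit",
    max_seq_length=2048,
    load_in_4bit=True,
)

# Attach LoRA adapters so only a small set of weights is trained.
model = FastLanguageModel.get_peft_model(
    model,
    r=16,
    lora_alpha=16,
    lora_dropout=0,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
)

# Any dataset with a "text" column works; this JSONL path is a placeholder.
dataset = load_dataset("json", data_files="train.jsonl", split="train")

trainer = SFTTrainer(
    model=model,
    tokenizer=tokenizer,
    train_dataset=dataset,
    dataset_text_field="text",
    max_seq_length=2048,
    args=TrainingArguments(
        per_device_train_batch_size=2,
        gradient_accumulation_steps=4,
        num_train_epochs=1,
        learning_rate=2e-4,
        output_dir="outputs",
    ),
)
trainer.train()
```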
Direct Preference Optimization (DPO) (Not Recommended)
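For reference anyway, a minimal DPO sketch with TRL's DPOTrainer, assuming a preference dataset with prompt/chosen/rejected columns. Paths and hyperparameters are placeholders, and this follows the older trl argument layout (newer versions move beta into DPOConfig):

```python
from datasets import load_dataset
from transformers import AutoModelForCausalLM, AutoTokenizer, TrainingArguments
from trl import DPOTrainer

# Start from an SFT checkpoint; the path here is a placeholder.
model = AutoModelForCausalLM.from_pretrained("outputs/sft-checkpoint")
tokenizer = AutoTokenizer.from_pretrained("outputs/sft-checkpoint")

# Preference data needs "prompt", "chosen", and "rejected" columns.
dataset = load_dataset("json", data_files="preferences.jsonl", split="train")

trainer = DPOTrainer(
    model=model,
    ref_model=None,   # None: trl clones the model as the frozen reference policy
    beta=0.1,         # strength of the penalty for drifting from the reference
    train_dataset=dataset,
    tokenizer=tokenizer,
    args=TrainingArguments(
        output_dir="dpo-outputs",
        per_device_train_batch_size=2,
        gradient_accumulation_steps=4,
        learning_rate=5e-6,
    ),
)
trainer.train()
```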
Brain dump (WIP)
A Beginner’s Guide to Fine-Tuning Mistral 7B Instruct Model
The fixed notebook is Mistral_7B_qLora_Finetuning.ipynb, but the prompt formatting is still in doubt.
- Colab: https://adithyask.medium.com/a-beginners-guide-to-fine-tuning-mistral-7b-instruct-model-0f39647b20fe
- Source: https://github.com/adithya-s-k/CompanionLLM
- ⚠️ This notebook needs pad_token_id=2 added when calling merged_model.generate() in "Test the merged model" (see the sketch below): outputs = merged_model.generate(input_ids=input_ids, pad_token_id=2, max_new_tokens=100, do_sample=True, top_p=0.9, temperature=0.5)
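A sketch of that call in context, assuming merged_model and tokenizer are already loaded as in the notebook's "Test the merged model" cell; the prompt is a placeholder, and 2 is Mistral's eos token id, which stands in for the missing pad token:

```python
# Mistral's tokenizer has no pad token by default, so generate() complains
# unless pad_token_id is set explicitly (2 is Mistral's </s> / eos token id).
prompt = "[INST] Who are you? [/INST]"  # placeholder prompt in Mistral Instruct format
input_ids = tokenizer(prompt, return_tensors="pt").input_ids.to(merged_model.device)

outputs = merged_model.generate(
    input_ids=input_ids,
    pad_token_id=2,        # the fix: silences the missing-pad-token issue
    max_new_tokens=100,
    do_sample=True,
    top_p=0.9,
    temperature=0.5,
)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```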