Fine-tuning cheatsheet.
When to fine-tune
- Need specific output format consistently.
- Reduce prompt length (saves cost).
- Stylistic / tone consistency.
- Domain-specific terminology.
- You want a smaller model to match a larger one on a narrow task.
When NOT to fine-tune
- Better prompting already gets you there.
- Very little training data (well under 500 examples).
- You need to inject new facts or knowledge (use RAG instead).
Data format (OpenAI)
{"messages": [{"role": "system", "content": "..."}, {"role": "user", "content": "..."}, {"role": "assistant", "content": "..."}]}
{"messages": [...]}
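One JSON object per line (JSONL). OpenAI accepts as few as 10 examples, but 500-1000 is a more realistic minimum for consistent results.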
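Catching malformed lines before upload saves a failed job. A minimal sketch of a pre-upload check, assuming only the three roles shown above and plain string content; `validate_example` and `ALLOWED_ROLES` are illustrative names, and real files may use features (tool calls, for example) that this simplified check would reject.

```python
import json

# Assumption: only the roles used in the format above are expected.
ALLOWED_ROLES = {"system", "user", "assistant"}


def validate_example(line: str) -> list:
    """Return a list of problems found in one JSONL line (empty list means OK)."""
    try:
        record = json.loads(line)
    except json.JSONDecodeError as exc:
        return [f"invalid JSON: {exc.msg}"]

    messages = record.get("messages") if isinstance(record, dict) else None
    if not isinstance(messages, list) or not messages:
        return ["missing or empty 'messages' list"]

    problems = []
    for i, msg in enumerate(messages):
        if not isinstance(msg, dict):
            problems.append(f"message {i}: not an object")
            continue
        if msg.get("role") not in ALLOWED_ROLES:
            problems.append(f"message {i}: unexpected role {msg.get('role')!r}")
        if not isinstance(msg.get("content"), str):
            problems.append(f"message {i}: content must be a string")

    # The model learns to produce the final assistant turn, so it must exist.
    last = messages[-1]
    if not isinstance(last, dict) or last.get("role") != "assistant":
        problems.append("last message should be from the assistant")
    return problems


def validate_file(path: str) -> dict:
    """Map 1-based line numbers to their problems for every bad line in a file."""
    bad = {}
    with open(path, encoding="utf-8") as f:
        for lineno, line in enumerate(f, start=1):
            if line.strip():
                problems = validate_example(line)
                if problems:
                    bad[lineno] = problems
    return bad
```

Run `validate_file("train.jsonl")` before uploading; an empty dict means every line passed this check.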
OpenAI fine-tune
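# Upload data (the CLI ships with the openai Python package; API commands live under `openai api`)
openai api files.create --file train.jsonl --purpose fine-tune
# Start job (pass the file ID returned by the upload)
openai api fine_tuning.jobs.create --model gpt-4o-mini-2024-07-18 --training-file file-abc
# Monitor
openai api fine_tuning.jobs.list
# Use (the finished job reports the full fine-tuned model name)
openai api chat.completions.create -m ft:gpt-4o-mini-2024-07-18:org:custom:abc123 -g user "Hello"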
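The same upload, start, and monitor steps can be scripted with the Python SDK. A minimal sketch, assuming `client` is an `openai.OpenAI()` instance; the helper names `start_fine_tune` and `fine_tuned_model_name` are illustrative, not part of the SDK.

```python
from typing import Any, Optional


def start_fine_tune(client: Any, train_path: str, base_model: str) -> str:
    """Upload a JSONL training file and start a fine-tuning job; return the job ID."""
    with open(train_path, "rb") as f:
        uploaded = client.files.create(file=f, purpose="fine-tune")
    job = client.fine_tuning.jobs.create(
        training_file=uploaded.id,
        model=base_model,
    )
    return job.id


def fine_tuned_model_name(client: Any, job_id: str) -> Optional[str]:
    """Return the fine-tuned model name once the job has succeeded, else None."""
    job = client.fine_tuning.jobs.retrieve(job_id)
    return job.fine_tuned_model if job.status == "succeeded" else None


# Usage (needs `pip install openai` and OPENAI_API_KEY in the environment):
#   from openai import OpenAI
#   client = OpenAI()
#   job_id = start_fine_tune(client, "train.jsonl", "gpt-4o-mini-2024-07-18")
#   name = fine_tuned_model_name(client, job_id)  # None until the job finishes
```

Passing the client in as an argument keeps the helpers easy to exercise with a stub before spending money on a real job.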
LoRA (Low-Rank Adaptation)
Adds small trainable low-rank matrices to a frozen base model. Typically trains well under 1% of the parameters.
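from peft import LoraConfig, get_peft_model
config = LoraConfig(
    r=16,  # rank of the update matrices
    lora_alpha=32,  # updates are scaled by lora_alpha / r
    target_modules=["q_proj", "v_proj"],  # attention projections to adapt
    lora_dropout=0.05,
    bias="none",
    task_type="CAUSAL_LM",
)
model = get_peft_model(base_model, config)  # base_model: an already-loaded transformers model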
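To see how small the trainable part is, count the parameters directly. A rough sketch, assuming illustrative shapes for an 8B-class model (32 layers, hidden size 4096, a 1024-wide `v_proj` output as in grouped-query attention); the shapes and the 8B figure are assumptions, so check your model's config for real numbers.

```python
def lora_param_count(d_in: int, d_out: int, r: int) -> int:
    """Trainable parameters LoRA adds to one (d_out x d_in) weight matrix.

    LoRA learns B (d_out x r) and A (r x d_in); the weight update is B @ A.
    """
    return r * (d_in + d_out)


# Assumed, illustrative shapes (not read from any real checkpoint).
layers, hidden, kv_dim, rank = 32, 4096, 1024, 16

per_layer = (
    lora_param_count(hidden, hidden, rank)    # q_proj: 4096 -> 4096
    + lora_param_count(hidden, kv_dim, rank)  # v_proj: 4096 -> 1024
)
total = layers * per_layer
base_params = 8_000_000_000  # assumed base model size

print(f"{total:,} trainable LoRA params ({total / base_params:.3%} of base)")
# 6,815,744 trainable LoRA params (0.085% of base)
```

Raising the rank or targeting more modules (the MLP projections, for example) grows this count linearly.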
QLoRA
LoRA on a 4-bit quantized base. Fits a 70B model on a single 80 GB A100/H100.
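import torch
from peft import get_peft_model, prepare_model_for_kbit_training
from transformers import AutoModelForCausalLM, BitsAndBytesConfig
bnb = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_use_double_quant=True,  # also quantize the quantization constants
    bnb_4bit_quant_type="nf4",  # 4-bit NormalFloat
    bnb_4bit_compute_dtype=torch.float16,  # dtype used for the matmuls
)
# Model ID is an example; substitute any causal LM you have access to
model = AutoModelForCausalLM.from_pretrained("meta-llama/Llama-3.1-8B-Instruct", quantization_config=bnb)
model = prepare_model_for_kbit_training(model)  # readies the quantized model for training
model = get_peft_model(model, config)  # config: the LoraConfig from the LoRA section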
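The single-GPU claim follows from simple arithmetic on weight storage. A back-of-the-envelope sketch that counts weights only; activations, LoRA adapters, optimizer state, and quantization constants add overhead on top, so treat these numbers as lower bounds.

```python
def weight_memory_gb(n_params: float, bits_per_param: float) -> float:
    """Approximate memory for the model weights alone, in GB (10^9 bytes)."""
    return n_params * bits_per_param / 8 / 1e9


n_params = 70e9  # a 70B-parameter model

fp16_gb = weight_memory_gb(n_params, 16)
nf4_gb = weight_memory_gb(n_params, 4)

print(f"fp16 weights: {fp16_gb:.0f} GB")   # fp16 weights: 140 GB
print(f"4-bit weights: {nf4_gb:.0f} GB")   # 4-bit weights: 35 GB
```

At 16 bits the weights alone exceed one 80 GB card; at 4 bits they leave room for the adapter and activations.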
Unsloth (faster)
from unsloth import FastLanguageModel
model, tokenizer = FastLanguageModel.from_pretrained(
    model_name="unsloth/llama-3.1-8b-instruct-bnb-4bit",
    max_seq_length=2048,
    load_in_4bit=True,
)
model = FastLanguageModel.get_peft_model(model, r=16, target_modules=[...])
# Train with HF TRL SFTTrainer
Unsloth advertises roughly 2x faster training and lower memory use than a stock Hugging Face setup.
Training loop
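from transformers import TrainingArguments
from trl import SFTTrainer
# Written against older TRL releases; newer ones expect trl.SFTConfig in place of
# TrainingArguments, with the sequence-length cap set there instead of on the trainer.
trainer = SFTTrainer(
    model=model,
    train_dataset=dataset,
    args=TrainingArguments(
        per_device_train_batch_size=4,
        gradient_accumulation_steps=4,  # effective batch size of 16 per device
        warmup_steps=10,
        num_train_epochs=3,
        learning_rate=2e-4,
        fp16=True,  # prefer bf16=True on GPUs that support it
        logging_steps=10,
        output_dir="./output",
    ),
    max_seq_length=2048,
)
trainer.train()
trainer.save_model("./adapter")  # with a PEFT model this saves only the adapter weights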
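It helps to know how many optimizer steps those settings produce, since `warmup_steps` and `logging_steps` are counted in steps. A rough sketch; the dataset size of 1,000 is an assumption, and the trainer's own count can differ slightly because of how it rounds the last partial batch.

```python
import math


def training_schedule(n_examples, per_device_batch, grad_accum, epochs, n_devices=1):
    """Return (effective batch size, approx. steps per epoch, approx. total steps)."""
    effective_batch = per_device_batch * grad_accum * n_devices
    steps_per_epoch = math.ceil(n_examples / effective_batch)
    return effective_batch, steps_per_epoch, steps_per_epoch * epochs


# Settings from the training loop above, with an assumed 1,000-example dataset.
effective, per_epoch, total = training_schedule(
    n_examples=1000, per_device_batch=4, grad_accum=4, epochs=3
)
print(effective, per_epoch, total)  # 16 63 189
```

With about 189 steps in total, 10 warmup steps cover roughly the first 5% of training.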
DPO (preference tuning)
from trl import DPOTrainer
trainer = DPOTrainer(model, args=..., train_dataset=preference_dataset)
Data: (prompt, chosen, rejected) triples. Trains the model to prefer the chosen response over the rejected one, measured against a frozen reference model.
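For intuition about what the trainer optimizes, here is the DPO loss for a single triple in plain Python. A sketch only: the inputs are summed log-probabilities of each response under the policy and the reference model, the example record and the numbers are made up, and `beta=0.1` is a common default used here as an assumption.

```python
import math

# One preference record in the (prompt, chosen, rejected) shape; content is made up.
example = {
    "prompt": "Summarize: the meeting moved to 3pm.",
    "chosen": "The meeting is now at 3pm.",
    "rejected": "Meetings are important.",
}


def dpo_loss(policy_chosen_logp, policy_rejected_logp,
             ref_chosen_logp, ref_rejected_logp, beta=0.1):
    """DPO loss for one triple: -log(sigmoid(beta * (chosen gain - rejected gain)))."""
    chosen_gain = policy_chosen_logp - ref_chosen_logp
    rejected_gain = policy_rejected_logp - ref_rejected_logp
    margin = beta * (chosen_gain - rejected_gain)
    # -log(sigmoid(m)) == log(1 + exp(-m)); branch to avoid overflow for large |m|.
    if margin >= 0:
        return math.log1p(math.exp(-margin))
    return -margin + math.log1p(math.exp(margin))


# Policy identical to the reference: loss is log(2), the starting point.
print(round(dpo_loss(-10.0, -12.0, -10.0, -12.0), 4))  # 0.6931
# Policy has shifted probability toward the chosen response: loss drops.
print(dpo_loss(-8.0, -14.0, -10.0, -12.0) < math.log(2))  # True
```

The loss only falls when the policy raises the chosen response relative to the rejected one by more than the reference does.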
Eval after fine-tune
Always compare the base and fine-tuned models on a held-out test set. Watch for:
- Catastrophic forgetting (lost general ability).
- Overfitting (memorized the training set).
- Bias amplification.
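A minimal comparison harness for the checks above, assuming each model is wrapped in a `generate(prompt) -> str` callable and that exact match is a reasonable metric for the task. All names here are hypothetical; swap in whatever scoring fits your outputs.

```python
from typing import Callable, Dict, Sequence, Tuple

Example = Tuple[str, str]  # (prompt, expected answer)


def exact_match_rate(generate: Callable[[str], str], examples: Sequence[Example]) -> float:
    """Fraction of examples where the model output equals the expected answer."""
    if not examples:
        raise ValueError("need at least one example")
    hits = sum(1 for prompt, expected in examples
               if generate(prompt).strip() == expected.strip())
    return hits / len(examples)


def compare_models(base_generate: Callable[[str], str],
                   tuned_generate: Callable[[str], str],
                   task_set: Sequence[Example],
                   general_set: Sequence[Example]) -> Dict[str, float]:
    """Score both models on the target task and on a general sanity set.

    A drop from base_general to tuned_general hints at catastrophic forgetting.
    """
    return {
        "base_task": exact_match_rate(base_generate, task_set),
        "tuned_task": exact_match_rate(tuned_generate, task_set),
        "base_general": exact_match_rate(base_generate, general_set),
        "tuned_general": exact_match_rate(tuned_generate, general_set),
    }
```

Comparing the tuned model's score on the training data against `tuned_task` on held-out data gives a quick overfitting signal as well.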
Merge LoRA back
merged = model.merge_and_unload()
merged.save_pretrained("./merged")
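Lets you deploy without the PEFT runtime. If you trained with QLoRA, reload the base model in 16-bit before merging rather than merging into the 4-bit weights.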
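Conceptually, merging folds the low-rank update into each adapted weight: W' = W + (lora_alpha / r) * B @ A. A toy sketch with nested lists in place of tensors; the matrices are made-up values chosen so the arithmetic is easy to follow.

```python
def matmul(a, b):
    """Multiply two matrices given as nested lists."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def merge_lora(W, A, B, lora_alpha, r):
    """Return W + (lora_alpha / r) * (B @ A), the merged weight matrix."""
    scale = lora_alpha / r
    delta = matmul(B, A)
    return [[w + scale * d for w, d in zip(w_row, d_row)]
            for w_row, d_row in zip(W, delta)]


# Toy example: a 2x2 weight with a rank-1 adapter.
W = [[1, 0], [0, 1]]
B = [[1], [2]]   # d_out x r
A = [[3, 4]]     # r x d_in
merged = merge_lora(W, A, B, lora_alpha=2, r=1)
print(merged)  # [[7.0, 8.0], [12.0, 17.0]]
```

After merging, inference is one matrix multiply per layer, the same as the base model, with no adapter overhead.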
Cost
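- OpenAI fine-tune: billed per training token and varies by model (roughly $3 per 1M tokens for gpt-4o-mini up to ~$25 per 1M for gpt-4o at the time of writing); fine-tuned inference costs more than the base model, about 1.5-2x for those models. Check the current pricing page.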
- LoRA on RunPod / Modal: $5-50 for a typical run.
- QLoRA on a local M-series Mac: no cloud cost, but slow (bitsandbytes targets CUDA, so this usually means an MLX-based stack).
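Hosted fine-tuning is billed on trained tokens, meaning tokens in the file multiplied by epochs. A quick estimator; the dataset size, tokens per example, and price below are placeholder assumptions, so plug in current numbers from the provider's pricing page.

```python
def training_cost_usd(tokens_per_epoch: int, epochs: int, usd_per_million_tokens: float) -> float:
    """Estimated training cost: trained tokens (file tokens x epochs) times the rate."""
    trained_tokens = tokens_per_epoch * epochs
    return trained_tokens / 1_000_000 * usd_per_million_tokens


# Assumed: 1,000 examples of ~500 tokens each, 3 epochs, $25 per 1M training tokens.
tokens_per_epoch = 1_000 * 500
cost = training_cost_usd(tokens_per_epoch, epochs=3, usd_per_million_tokens=25.0)
print(f"${cost:.2f}")  # $37.50
```

Training cost is usually small next to the ongoing inference premium, so estimate both before committing.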
Common mistakes
- Too little data → overfitting.
- No validation set.
- Skipping the baseline eval of the base model.
- Long sequences (>2048 tokens) → OOM on smaller GPUs.
- Mismatched chat template between training and inference.
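The missing validation set is the cheapest of these to fix. A minimal sketch of a reproducible split, assuming the examples are already loaded into a list; `split_train_val` and the 10% default are illustrative choices.

```python
import random


def split_train_val(examples, val_fraction=0.1, seed=42):
    """Shuffle deterministically and split into (train, validation) lists."""
    shuffled = list(examples)
    random.Random(seed).shuffle(shuffled)
    n_val = max(1, int(len(shuffled) * val_fraction))  # keep at least one example
    return shuffled[n_val:], shuffled[:n_val]


# Stand-in data; in practice these would be the parsed JSONL records.
examples = [{"id": i} for i in range(1000)]
train, val = split_train_val(examples)
print(len(train), len(val))  # 900 100
```

Fixing the seed keeps the split stable across runs, so validation numbers stay comparable between experiments.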
Read this next
If you want my QLoRA + Unsloth template, it’s at rajpoot.dev.