Fine-Tuning Approach for Slips Immune

Summary: Task-specific fine-tuning of compact models (1.5B parameters) using LoRA + Unsloth, exported to GGUF for CPU inference on the Raspberry Pi 5. The same training pipeline applies across tasks; only the dataset and system prompt are task-specific.


Index


Motivation

The Raspberry Pi 5 can run small quantized models (1.5B–3B parameters) via Ollama/llama.cpp, but untuned models at this scale perform poorly on domain-specific tasks like security incident summarization or decision making. Fine-tuning on task-specific data allows a 1.5B model to match or exceed the quality of a larger untuned 3B model — a meaningful gain on constrained hardware.

Fine-tuning is performed off-device on GPU hardware, and the resulting model is exported to GGUF for direct deployment on the RPi5.


General Pipeline

Every fine-tuning run follows the same four-stage pipeline regardless of task:

Raw dataset
    │
    ▼
1. Quality filtering      filter low-quality examples, produce 90/10 train/eval split
    │
    ▼
2. Ground truth selection  pick best response per incident, format as SFT conversations
    │
    ▼
3. SFT training           LoRA fine-tuning via Unsloth, config driven by config.yaml
    │
    ▼
4. Export                 merge adapters → 16-bit weights + GGUF (q4_k_m) for RPi

What varies per task: the dataset source, the filtering criteria, and the system prompt used to format conversations.


Framework and Hardware

Fine-tuning uses Unsloth for its integrated GGUF export, memory-efficient LoRA training, and direct Hugging Face model compatibility. See Fine-Tuning Frameworks for the full framework comparison rationale.

Fixed training setup across tasks:

Parameter

Value

Base model family

Qwen2.5-Instruct (1.5B)

Training mode

SFT (Supervised Fine-Tuning)

Adapter method

LoRA (rank 16, alpha 16)

LoRA targets

q/k/v/o projections, MLP gate/up/down

Optimizer

AdamW 8-bit

Precision

FP16

Hardware

GPU with 12 GB VRAM

Task-specific parameters (learning rate, epochs, sequence length, batch size) are configured in config.yaml.


Output Format

After training, the pipeline produces:

  • Merged 16-bit weights — for GPU inference and evaluation

  • GGUF (q4_k_m) — for direct deployment on Raspberry Pi 5 via Ollama or llama.cpp

The core training script is train_qwen.py. Both outputs are generated automatically based on config.yaml settings.


Task-Specific Procedures

Task

Dataset

Procedure

Model

Incident Summarization

summarization_dataset_v3

Summarization Procedure

stratosphere/qwen2.5-1.5b-slips-immune

Risk Assessment & Cause Analysis

risk_dataset_v2

Risk Procedure

stratosphere/qwen2.5-1.5b-slips-immune-risk

Unified (Summary + Cause + Risk)

unified_train_dataset_augmented

Unified Procedure

stratosphere/qwen2.5-1.5b-slips-immune-unified