Slips Immune
This is the main guide to the documentation for the changes made to Slips to incorporate ideas from immunology.
Architecture
RPI performance
Updating Slips
Security & Network Configuration
Datasets & LLM Training
Report Documents:
Summarization Dataset Report - Event summarization and behavior analysis
Risk Analysis Dataset Report - Root cause and risk assessment
Workflow Guides:
Summarization Workflow Implementation - Step-by-step guide for generating summarization datasets
Risk Analysis Workflow Implementation - Step-by-step guide for generating risk datasets
Alert DAG Parser Documentation - DAG structural analysis reference
Datasets Evaluation (LLM-as-a-judge):
LLM Evaluation Guide - How to evaluate and compare LLM models
LLM-as-Judge Rubric - Scoring criteria for cause analysis and risk assessment evaluation (two-stage rubric, max 60 pts)
Summarization Dataset Evaluation Results - Performance metrics for summarization models
Risk Analysis Dataset Evaluation Results - Performance metrics for risk assessment models
LLM Fine-Tuning
LLM RPI Finetuning Frameworks - Framework selection rationale (Unsloth vs alternatives)
Fine-Tuning Approach - General pipeline: LoRA, GGUF export, hardware setup
Fine-Tuning Evaluation Methodology - LLM-as-judge pipeline, metrics, and breakdown dimensions
Quantization and Deployment - GGUF conversion, Ollama publication, and quantization performance analysis
Incident Summarization task:
Summarization Training Procedure - Dataset filtering, ground truth selection, system prompt
Summarization Fine-Tuned Model: Evaluation Results - Benchmark results for the fine-tuned Qwen2.5-1.5B summarization model
Risk Assessment & Cause Analysis task:
Risk Assessment Training Procedure - Dataset filtering, best-of-N selection, combined adapter training
Risk Fine-Tuned Model: Evaluation Results - Benchmark results for the fine-tuned Qwen2.5-1.5B risk model
Unified model (Summary + Cause + Risk):
Unified Training Procedure - Single adapter for all three tasks: dataset merging, lora_r=128 + RSLoRA, version history
Unified Fine-Tuned Model: Evaluation Results - Benchmark results compared with the standalone models, and the impact of quantization