Add project README and reason/quote fields to classifier response

- README.md: full project overview with setup, training, API, and RSpamd integration docs - server.py: add reason (human-readable explanation) and quote (suspicious snippet) to response - spamllm.lua: pass reason and quote through to RSpamd symbol description for logs/UI Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-19 22:32:54 +01:00 · 2026-03-19 22:32:54 +01:00 · f05320a8cb
commit f05320a8cb
parent 38efd20b4d
3 changed files with 203 additions and 3 deletions
--- a/README.md
+++ b/README.md
@ -0,0 +1,129 @@
 # SpamLLM
 A multilingual spam classifier for RSpamd using fine-tuned DistilBERT. Classifies emails in German and English, with automatic language detection that flags unexpected languages as suspicious.
 ## How it works
 ```
 Incoming Mail → RSpamd → HTTP POST → SpamLLM Service → Score + Reason + Quote → RSpamd
 ```
 RSpamd sends mail metadata (subject, body, sender) to the SpamLLM FastAPI service. The service runs the text through a fine-tuned `distilbert-base-multilingual-cased` model and returns:
 - **score** (0-15, RSpamd-compatible)
 - **reason** (human-readable explanation, e.g. "High spam confidence (94%)")
 - **quote** (the most suspicious snippet from the mail)
 - **language** detection with a score bonus for non-DE/EN mails
 ## Project structure
 ```
 spamllm/
 ├── train.py                 # Fine-tune DistilBERT on spam/ham data
 ├── server.py                # FastAPI classification service
 ├── test_classify.py         # Local model validation (DE/EN/foreign samples)
 ├── export_rspamd_data.py    # Export Maildir folders to CSV training data
 ├── requirements.txt         # Python dependencies
 └── rspamd/
    ├── local.d/
    │   └── external_services.conf   # RSpamd config
    └── lua/
        └── spamllm.lua              # RSpamd Lua plugin
 ```
 ## Setup
 ```bash
 python3 -m venv venv
 source venv/bin/activate
 pip install -r requirements.txt
 ```
 ## Training
 ### Option A: Demo dataset (quick start)
 Uses the public SMS Spam Collection (~5,500 English messages):
 ```bash
 python train.py --epochs 3
 ```
 ### Option B: Your own mail data (recommended for production)
 1. Export your existing mails from Maildir:
 ```bash
 python export_rspamd_data.py \
  --spam-dir /var/vmail/user/.Junk/cur \
  --ham-dir /var/vmail/user/.INBOX/cur \
  --max-per-class 5000
 ```
 2. Train on the exported data:
 ```bash
 python train.py --custom-data --epochs 5
 ```
 You can also place a German-only dataset at `data/train_de.csv` (columns: `text`, `labels`) to supplement the English demo data when training without `--custom-data`.
 ## Running the service
 ```bash
 uvicorn server:app --host 127.0.0.1 --port 8000
 ```
 ### API
 **POST /classify**
 ```json
 {
  "subject": "Sie haben gewonnen!",
  "body": "Klicken Sie hier um Ihren Preis abzuholen...",
  "from_addr": "spam@example.com"
 }
 ```
 Response:
 ```json
 {
  "is_spam": true,
  "confidence": 0.94,
  "score": 14.1,
  "language": "de",
  "foreign_lang_bonus": 0.0,
  "reason": "High spam confidence (94%)",
  "quote": "...Klicken Sie hier um Ihren Preis abzuholen. Senden Sie uns Ihre Bankdaten..."
 }
 ```
 **GET /health** — returns `{"status": "ok", "model_loaded": true}`
 ## RSpamd integration
 1. Copy `rspamd/lua/spamllm.lua` to `/etc/rspamd/plugins.d/`
 2. Copy `rspamd/local.d/external_services.conf` to `/etc/rspamd/local.d/`
 3. Reload RSpamd: `rspamadm reload`
 The plugin registers three symbols:
 | Symbol | Weight | Description |
 |--------|--------|-------------|
 | `SPAMLLM_SPAM` | +5.0 | Spam detected by classifier |
 | `SPAMLLM_HAM` | -2.0 | Ham detected by classifier |
 | `SPAMLLM_FOREIGN_LANG` | +4.0 | Unexpected language (not DE/EN) |
 The Lua plugin only sends mails in the RSpamd grey zone (score 3-12) to the service, so obvious spam/ham is handled by RSpamd's built-in rules without extra latency.
 ## Language detection
 Mails are classified by language using `langdetect`. Expected languages (German, English) are scored normally. All other languages receive a +4 point spam bonus and a lowered spam threshold (0.3 instead of 0.5), since non-DE/EN mails are almost always spam in this environment.
 ## Performance
 - Inference: ~20-50ms per mail on CPU
 - Model size: ~500MB (distilbert-base-multilingual-cased)
 - Training on demo dataset: ~10-15 min on CPU, ~2 min on GPU
--- a/rspamd/lua/spamllm.lua
+++ b/rspamd/lua/spamllm.lua
@ -75,9 +75,14 @@ local function check_spamllm(task)
    local result = parser:get_object()
    if result.is_spam then
-      task:insert_result(settings.symbol_spam, result.confidence, "SpamLLM")
+      -- Reason und Quote als Options an das Symbol anhängen
-      rspamd_logger.infox(task, "SpamLLM: SPAM (confidence=%.2f, score=%.2f, lang=%s)",
+      local description = result.reason or "SpamLLM"
-        result.confidence, result.score, result.language or "?")
+      if result.quote and #result.quote > 0 then
        description = description .. " | " .. result.quote
      end
      task:insert_result(settings.symbol_spam, result.confidence, description)
      rspamd_logger.infox(task, "SpamLLM: SPAM (confidence=%.2f, score=%.2f, lang=%s, reason=%s)",
        result.confidence, result.score, result.language or "?", result.reason or "?")
    else
      task:insert_result(settings.symbol_ham, -result.confidence, "SpamLLM")
    end
--- a/server.py
+++ b/server.py
@ -62,6 +62,8 @@ class ClassifyResponse(BaseModel):
    score: float  # RSpamd-kompatibler Score (0-15)
    language: str  # Erkannte Sprache
    foreign_lang_bonus: float  # Zusätzlicher Score für Fremdsprache
    reason: str  # Menschenlesbare Begründung
    quote: str  # Verdächtigster Textausschnitt
 def detect_language(text: str) -> tuple[str, bool]:
@ -79,6 +81,65 @@ def detect_language(text: str) -> tuple[str, bool]:
        return "unknown", False
 # Spam-Signalwörter für Quote-Extraktion (DE + EN)
 SPAM_PATTERNS = [
    "click here", "klicken sie", "jetzt bestellen", "order now",
    "act now", "sofort", "dringend", "urgent", "verify your",
    "bestätigen sie", "gewonnen", "you won", "congratulations",
    "herzlichen glückwunsch", "free", "gratis", "kostenlos",
    "100%", "guarantee", "garantie", "limited time", "nur heute",
    "unsubscribe", "abmelden", "no risk", "kein risiko",
    "bank details", "bankdaten", "password", "passwort",
    "account suspended", "konto gesperrt", "credit card", "kreditkarte",
    "viagra", "cialis", "pharmacy", "apotheke", "discount", "rabatt",
    "million", "prize", "preis", "winner", "gewinner",
 ]
 def find_spam_quote(subject: str, body: str) -> str:
    """Findet den verdächtigsten Textausschnitt in der Mail."""
    full_text = f"{subject} {body}".lower()
    for pattern in SPAM_PATTERNS:
        pos = full_text.find(pattern)
        if pos != -1:
            # Kontext um das Match herum extrahieren (max 120 Zeichen)
            original = f"{subject} {body}"
            start = max(0, pos - 30)
            end = min(len(original), pos + len(pattern) + 60)
            snippet = original[start:end].strip()
            if start > 0:
                snippet = "..." + snippet
            if end < len(original):
                snippet = snippet + "..."
            return snippet
    # Kein Pattern gefunden -> ersten Satz des Bodys als Fallback
    if body:
        first_sentence = body.split(".")[0].strip()
        return first_sentence[:120] + ("..." if len(first_sentence) > 120 else "")
    return subject[:120] if subject else ""
 def build_reason(spam_prob: float, is_foreign: bool, language: str) -> str:
    """Baut eine menschenlesbare Begründung zusammen."""
    reasons = []
    if spam_prob > 0.8:
        reasons.append(f"High spam confidence ({spam_prob:.0%})")
    elif spam_prob > 0.5:
        reasons.append(f"Moderate spam confidence ({spam_prob:.0%})")
    elif spam_prob > 0.3:
        reasons.append(f"Low spam confidence ({spam_prob:.0%})")
    else:
        reasons.append(f"Likely ham ({1 - spam_prob:.0%} confidence)")
    if is_foreign:
        reasons.append(f"Unexpected language: {language} (not DE/EN)")
    return "; ".join(reasons)
@app.post("/classify", response_model=ClassifyResponse)
 async def classify(request: ClassifyRequest):
    # Kombiniere Mail-Felder zu einem Text
@ -106,12 +167,17 @@ async def classify(request: ClassifyRequest):
    # Spam-Schwelle nach Bonus neu bewerten
    effective_spam = spam_prob > 0.5 or (is_foreign and spam_prob > 0.3)
    reason = build_reason(spam_prob, is_foreign, language)
    quote = find_spam_quote(request.subject, request.body) if effective_spam else ""
    return ClassifyResponse(
        is_spam=effective_spam,
        confidence=spam_prob,
        score=round(rspamd_score, 2),
        language=language,
        foreign_lang_bonus=lang_bonus,
        reason=reason,
        quote=quote,
    )