Initial commit: SpamLLM - DistilBERT spam classifier for RSpamd

Multilingual spam classifier (DE/EN) with language detection.
Non-DE/EN emails receive an additional spam score bonus.

- train.py: Fine-tune distilbert-base-multilingual-cased on spam/ham data
- server.py: FastAPI service with langdetect integration
- rspamd/: Lua plugin and config for RSpamd integration
- export_rspamd_data.py: Export Maildir folders to CSV training data
- test_classify.py: Local model validation with DE/EN/foreign test cases

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
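The commit message only says that non-DE/EN mail gets an extra spam score. Below is a minimal sketch of how such a bonus could be combined with the classifier output. It assumes the model yields a spam probability in [0, 1] and that the language arrives as an ISO 639-1 code, such as the `"de"`/`"en"` strings returned by langdetect's `detect()`. The names `score_mail`, `Verdict`, `FOREIGN_LANGUAGE_BONUS`, the bonus value, and the scaling factor are all hypothetical and are not taken from server.py.

```python
# Hypothetical sketch: names, bonus value and score scaling are assumptions,
# not the actual server.py implementation.
from dataclasses import dataclass

# Languages the model was fine-tuned on (DE/EN per the commit message).
SUPPORTED_LANGUAGES = frozenset({"de", "en"})
# Assumed extra score added for mail in any other language.
FOREIGN_LANGUAGE_BONUS = 2.0


@dataclass(frozen=True)
class Verdict:
    language: str          # ISO 639-1 code, e.g. from langdetect.detect()
    model_score: float     # spam probability scaled to a score
    language_bonus: float  # 0.0 for DE/EN, FOREIGN_LANGUAGE_BONUS otherwise

    @property
    def total(self) -> float:
        # Final score reported back to the mail filter.
        return self.model_score + self.language_bonus


def score_mail(spam_probability: float, language: str,
               max_model_score: float = 5.0) -> Verdict:
    """Scale the model's spam probability and add the foreign-language bonus."""
    if not 0.0 <= spam_probability <= 1.0:
        raise ValueError("spam_probability must be in [0, 1]")
    bonus = 0.0 if language in SUPPORTED_LANGUAGES else FOREIGN_LANGUAGE_BONUS
    return Verdict(language, spam_probability * max_model_score, bonus)
```

For example, `score_mail(0.9, "en")` carries no bonus, while `score_mail(0.2, "ru")` adds `FOREIGN_LANGUAGE_BONUS` on top of the scaled model score. Keeping the bonus as a separate field, rather than folding it into the probability, makes it visible on its own when inspecting why a message scored high.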
2026-03-19 22:27:05 +01:00 · 2026-03-19 22:27:05 +01:00 · 38efd20b4d
commit 38efd20b4d
7 changed files with 671 additions and 0 deletions
--- a/requirements.txt
+++ b/requirements.txt
@@ -0,0 +1,9 @@
+torch>=2.0.0
+transformers>=4.36.0
+fastapi>=0.104.0
+uvicorn>=0.24.0
+pydantic>=2.0.0
+datasets>=2.16.0
+scikit-learn>=1.3.0
+accelerate>=0.25.0
+langdetect>=1.0.9