Initial commit: SpamLLM - DistilBERT spam classifier for RSpamd
Multilingual spam classifier (DE/EN) with language detection. Non-DE/EN
mails receive an additional spam score bonus.

- train.py: Fine-tune distilbert-base-multilingual-cased on spam/ham data
- server.py: FastAPI service with langdetect integration
- rspamd/: Lua plugin and config for RSpamd integration
- export_rspamd_data.py: Export Maildir folders to CSV training data
- test_classify.py: Local model validation with DE/EN/foreign test cases

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
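The language bonus described above can be sketched roughly as follows. This is a minimal illustration, not the code from server.py: the names `score_mail`, `Verdict`, `base_weight`, and `FOREIGN_LANGUAGE_BONUS`, as well as the numeric values, are assumptions made for the example. The language code is assumed to come from a detector such as `langdetect.detect`, which returns codes like "de" or "en".

```python
from dataclasses import dataclass

# Languages the classifier was trained on, per the commit message.
SUPPORTED_LANGUAGES = {"de", "en"}

# Hypothetical bonus added for mails in any other language.
FOREIGN_LANGUAGE_BONUS = 2.0


@dataclass
class Verdict:
    spam_probability: float  # model output in [0, 1]
    language: str            # detected language code, e.g. "de"
    score: float             # final score handed to the mail filter


def score_mail(
    spam_probability: float,
    language: str,
    base_weight: float = 10.0,            # hypothetical scaling factor
    bonus: float = FOREIGN_LANGUAGE_BONUS,
) -> Verdict:
    """Scale the model probability and add a bonus for non-DE/EN mails."""
    if not 0.0 <= spam_probability <= 1.0:
        raise ValueError("spam_probability must be between 0 and 1")

    score = spam_probability * base_weight
    if language.lower() not in SUPPORTED_LANGUAGES:
        # Mail is neither German nor English: apply the extra spam score.
        score += bonus

    return Verdict(spam_probability, language, score)
```

With these assumed values, a German mail with probability 0.5 keeps its scaled score, while a French mail with the same probability is scored higher by the bonus amount.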
commit 38efd20b4d
7 changed files with 671 additions and 0 deletions
9
requirements.txt
Normal file
@@ -0,0 +1,9 @@
+torch>=2.0.0
+transformers>=4.36.0
+fastapi>=0.104.0
+uvicorn>=0.24.0
+pydantic>=2.0.0
+datasets>=2.16.0
+scikit-learn>=1.3.0
+accelerate>=0.25.0
+langdetect>=1.0.9