Show HN: Spam classifier in Go using Naive Bayes (github.com)

🤖 AI Summary
A lightweight Naive Bayes spam classifier implemented in Go is now available (go get github.com/igomez10/nspammer). It exposes a simple API that trains from a map[string]bool dataset (true = spam) and classifies new messages with a boolean result. The project is geared toward prototyping and learning, but it includes real-world evaluation: unit tests, accuracy measurements on train/test splits, and integration with the Kaggle Spam Mails Dataset via an init.sh script (which requires the Kaggle CLI).

Technically, the classifier computes the priors P(spam) and P(not spam), builds a vocabulary, and tallies word counts per class. Classification compares log scores to avoid numerical underflow: log(P(class)) + Σ log(P(word|class)). Likelihoods use Laplace (additive) smoothing, P(word|class) = (count + α) / (total + α × V), with a default α = 1.0 so unseen tokens never zero out a score.

Usage takes only a few lines, and go test -v reproduces the evaluation. Because it relies on the Naive Bayes independence assumption and bag-of-words features, it is best suited for fast, interpretable spam filters, education, or as a baseline before moving to more complex sequence- or embedding-based models.
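
For a sense of the "few lines" of usage, here is a minimal sketch. The summary only specifies the shapes (a map[string]bool of labeled messages in, a bool out), so the constructor and method names (NewClassifier, Classify) and the tiny training set are assumptions; check the repository for the exact API.

    package main

    import (
        "fmt"

        "github.com/igomez10/nspammer" // import path from the post
    )

    func main() {
        // Training data as described in the summary: map[string]bool, true = spam.
        dataset := map[string]bool{
            "win a free prize now":       true,
            "meeting rescheduled to 3pm": false,
        }

        // NewClassifier and Classify are assumed names, not confirmed by the post.
        c := nspammer.NewClassifier(dataset)
        fmt.Println(c.Classify("free prize waiting for you")) // expected: true (spam)
    }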
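
The log-space scoring and Laplace smoothing can also be sketched directly. This toy Go snippet is independent of the library and uses made-up counts purely to illustrate the formula log(P(class)) + Σ log((count + α) / (total + α·V)); the real classifier derives these counts from its training pass.

    package main

    import (
        "fmt"
        "math"
        "strings"
    )

    // score computes log P(class) + Σ log P(word|class), where each likelihood
    // uses Laplace smoothing: P(word|class) = (count + α) / (total + α·V).
    func score(words []string, prior float64, counts map[string]int, total, vocabSize int, alpha float64) float64 {
        s := math.Log(prior)
        for _, w := range words {
            likelihood := (float64(counts[w]) + alpha) / (float64(total) + alpha*float64(vocabSize))
            s += math.Log(likelihood)
        }
        return s
    }

    func main() {
        // Toy per-class word counts (assumed values for illustration only).
        spamCounts := map[string]int{"free": 3, "prize": 2, "win": 2}
        hamCounts := map[string]int{"meeting": 3, "report": 2, "free": 1}
        spamTotal, hamTotal := 7, 6
        vocab := 5 // |V|: distinct words seen across both classes in this toy example

        // "a" is unseen in both classes; smoothing keeps its likelihood nonzero.
        msg := strings.Fields("win a free prize")
        spamScore := score(msg, 0.5, spamCounts, spamTotal, vocab, 1.0)
        hamScore := score(msg, 0.5, hamCounts, hamTotal, vocab, 1.0)
        fmt.Println("spam?", spamScore > hamScore)
    }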