Improving the effectiveness and accuracy of SpamAssassin by updating its rules automatically on Debian

You can improve SpamAssassin's default effectiveness and accuracy on Debian systems by having it update its rules automatically, both from the official channel (updates.spamassassin.org) and from an additional suggested channel (Sought).

This tutorial shows how to update the rules and how to add the Sought rules, which are generated automatically every day from messages caught in spam traps.

Also see the other articles about antispam and SpamAssassin linked under "Related Content" on this site.

 

First, enable the nightly cron job by setting CRON=1 in /etc/default/spamassassin:

vi /etc/default/spamassassin

# Cronjob
# Set to anything but 0 to enable the cron job to automatically update
# spamassassin's rules on a nightly basis
#AFM 20150723 https://wiki.apache.org/spamassassin/ImproveAccuracy
CRON=1
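
If you prefer not to edit the file by hand, the same change can be scripted. This is a minimal sketch, assuming GNU sed (the Debian default); the function name enable_sa_cron is made up for this example and is not part of SpamAssassin.

```shell
# Set CRON=1 in a Debian-style defaults file without opening an editor.
# Hypothetical helper; assumes GNU sed for the -i option.
# Usage: enable_sa_cron /etc/default/spamassassin
enable_sa_cron() {
    defaults_file=$1
    # Keep a copy of the original next to it before changing anything.
    cp -p "$defaults_file" "$defaults_file.bak" || return 1
    if grep -q '^CRON=' "$defaults_file"; then
        # Replace the existing setting in place.
        sed -i 's/^CRON=.*/CRON=1/' "$defaults_file"
    else
        # No CRON= line yet: append one.
        printf 'CRON=1\n' >> "$defaults_file"
    fi
}
```

The original file is kept as /etc/default/spamassassin.bak, so the change is easy to undo.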

 

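# Working directory for the downloaded GPG keys
mkdir -p ~/spamassassin
cd ~/spamassassin/

# Keyring directory used by sa-update; keep it private to its owner
mkdir -p /etc/spamassassin/sa-update-keys
chmod go-rx /etc/spamassassin/sa-update-keys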

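# Back up the current configuration before changing anything
# (copy without cd, so the key downloads below still land in ~/spamassassin)
mkdir -p ~/temp/etc
cp -pr /etc/spamassassin ~/temp/etc/

# Inspect the installed rules; 3.004000 is the directory for SpamAssassin 3.4.0, so adjust it to your version
ls -lh /var/lib/spamassassin/
ls -lh /var/lib/spamassassin/sa-update-keys/
ls -lh /var/lib/spamassassin/3.004000/
ls -lh /var/lib/spamassassin/3.004000/updates_spamassassin_org/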

mkdir -p ~/temp/var/lib
cp -pr /var/lib/spamassassin ~/temp/var/lib/
ls -lah ~/temp/var/lib/spamassassin
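
Before touching the live rules, it may be worth confirming that the backup copies are complete. A small sketch that relies only on diff -r; backup_matches is a hypothetical helper name introduced here for illustration.

```shell
# Succeed only when COPY has the same files and contents as SRC.
# Hypothetical helper built on diff -r.
# Usage: backup_matches /etc/spamassassin ~/temp/etc/spamassassin
#        backup_matches /var/lib/spamassassin ~/temp/var/lib/spamassassin
backup_matches() {
    diff -r "$1" "$2" > /dev/null
}
```

For example, `backup_matches /etc/spamassassin ~/temp/etc/spamassassin && echo "backup OK"` prints the message only when nothing differs.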

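# Download and import the GPG key that signs the official rule updates
wget http://spamassassin.apache.org/updates/GPG.KEY
sa-update --import GPG.KEY
mv GPG.KEY spamassassinGPG.KEY
# Check whether an update is available without installing it
sa-update --checkonly -v

# Download and install the rules from the official channel
sa-update -v --channel updates.spamassassin.org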
ls -lah /var/lib/spamassassin/3.004000/updates_spamassassin_org
invoke-rc.d spamassassin reload
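
Reloading is only needed when sa-update actually installed something, and its exit status says whether it did. The following is a sketch, not the Debian cron script: the function names are invented for this example, and the status meanings are taken from my reading of the sa-update(1) manual page for 3.4, so check the man page of your version.

```shell
# Map an sa-update exit status to a short description.
# Meanings as described in sa-update(1) for SpamAssassin 3.4 (assumption:
# older releases may not distinguish statuses 2 and 3).
describe_sa_update_status() {
    case $1 in
        0) echo "updated" ;;
        1) echo "no fresh updates" ;;
        2) echo "lint failed" ;;
        3) echo "partial update" ;;
        *) echo "error" ;;
    esac
}

# Run sa-update with the given options and reload the daemon only when
# new rules were installed (exit status 0).
# Usage: update_and_reload -v --channel updates.spamassassin.org
update_and_reload() {
    sa-update "$@"
    status=$?
    echo "sa-update: $(describe_sa_update_status "$status")"
    if [ "$status" -eq 0 ]; then
        invoke-rc.d spamassassin reload
    fi
    return "$status"
}
```

Whether to reload on status 3 (some channels updated, others failed) is a judgment call; this sketch errs on the side of not reloading.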

 

 

# You can now install the Sought rules:

wget http://yerp.org/rules/GPG.KEY
sa-update --import GPG.KEY
mv GPG.KEY soughtGPG.KEY
sa-update --checkonly -v

# 6C6191E3 is the ID of the Sought key imported above
sa-update -v --gpgkey 6C6191E3 --channel sought.rules.yerp.org --channel updates.spamassassin.org
ls -lah /var/lib/spamassassin/3.004000/sought_rules_yerp_org/
invoke-rc.d spamassassin reload
#sa-update && /etc/init.d/spamassassin reload
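
With more than one channel, the command line gets long. sa-update also accepts a --channelfile option that reads the channel names from a file, one per line. A minimal sketch; write_channel_file and the file path in the usage comment are hypothetical names chosen for this example.

```shell
# Write one channel name per line to a file for sa-update --channelfile.
# Hypothetical helper.
# Usage: write_channel_file FILE CHANNEL...
write_channel_file() {
    channel_file=$1
    shift
    : > "$channel_file"            # start from an empty file
    for channel in "$@"; do
        printf '%s\n' "$channel" >> "$channel_file"
    done
}

# Example (the file name is an assumption, not a Debian default):
#   write_channel_file /etc/spamassassin/sa-update-channels.txt \
#       updates.spamassassin.org sought.rules.yerp.org
#   sa-update -v --gpgkey 6C6191E3 \
#       --channelfile /etc/spamassassin/sa-update-channels.txt
```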

less /var/lib/spamassassin/3.004000/sought_rules_yerp_org/20_sought.cf

cat /var/lib/spamassassin/3.004000/updates_spamassassin_org/STATISTICS-set0-72_scores.cf.txt

##### WITH NEW RULES AND SCORES #####

# SUMMARY for threshold 5.0:
# Correctly non-spam: 135863  39.432%  (97.611% of non-spam corpus)
# Correctly spam:     149688  43.444%  (72.889% of spam corpus)
# False positives:      3325  0.965%  (2.389% of nonspam, 146801 weighted)
# False negatives:     55677  16.159%  (27.111% of spam, 139536 weighted)
# Average score for spam:  10.0    nonspam: 1.0
# Average for false-pos:   6.0  false-neg: 2.5
# TOTAL:              344553  100.00%

Reading scores from "tmprules"...
Reading per-message hit stat logs and scores...

# SUMMARY for threshold 5.0:
# Correctly non-spam:  16997  97.42%
# Correctly spam:      18797  73.13%
# False positives:       450  2.58%
# False negatives:      6908  26.87%
# TCR(l=50): 0.874082  SpamRecall: 73.126%  SpamPrec: 97.662%

##### WITHOUT NEW RULES AND SCORES #####
Reading scores from "../rules-base"...
Reading per-message hit stat logs and scores...

# SUMMARY for threshold 5.0:
# Correctly non-spam: 135534  97.37%
# Correctly spam:      56405  27.47%
# False positives:      3654  2.63%
# False negatives:    148960  72.53%
# TCR(l=50): 0.619203  SpamRecall: 27.466%  SpamPrec: 93.916%
Reading scores from "../rules-base"...
Reading per-message hit stat logs and scores...

# SUMMARY for threshold 5.0:
# Correctly non-spam:  17011  97.50%
# Correctly spam:       7152  27.82%
# False positives:       436  2.50%
# False negatives:     18553  72.18%
# TCR(l=50): 0.637003  SpamRecall: 27.823%  SpamPrec: 94.254%
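
The SpamRecall, SpamPrec and TCR figures in these summaries can be recomputed from the raw counts, which helps when comparing the "with" and "without" blocks. The sketch below is my own reading of the numbers, not code from SpamAssassin: recall is correctly-spam over all spam, precision is correctly-spam over everything flagged as spam, and the TCR values appear to follow (all spam) / (l x false positives + false negatives). The function name spam_metrics is made up.

```shell
# Recompute SpamRecall (%), SpamPrec (%) and TCR from raw counts.
# Hypothetical helper; the TCR formula is inferred from the figures above.
# Usage: spam_metrics CORRECT_SPAM FALSE_POSITIVES FALSE_NEGATIVES [LAMBDA]
spam_metrics() {
    awk -v tpos="$1" -v fpos="$2" -v fneg="$3" -v lambda="${4:-50}" 'BEGIN {
        recall    = 100 * tpos / (tpos + fneg)             # share of all spam that was caught
        precision = 100 * tpos / (tpos + fpos)             # share of spam verdicts that were right
        tcr       = (tpos + fneg) / (lambda * fpos + fneg) # false positives weighted by lambda
        printf "%.3f %.3f %.6f\n", recall, precision, tcr
    }'
}
```

For instance, `spam_metrics 18797 450 6908` prints `73.126 97.662 0.874082`, matching the first "tmprules" summary, and `spam_metrics 7152 436 18553` prints `27.823 94.254 0.637003`, matching the last "rules-base" summary.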

 

cat /var/lib/spamassassin/3.004000/updates_spamassassin_org/STATISTICS-set1-72_scores.cf.txt

##### WITH NEW RULES AND SCORES #####

# SUMMARY for threshold 5.0:
# Correctly non-spam: 154663  41.631%  (99.548% of non-spam corpus)
# Correctly spam:     106767  28.739%  (49.397% of spam corpus)
# False positives:       703  0.189%  (0.452% of nonspam,  57031 weighted)
# False negatives:    109374  29.441%  (50.603% of spam, 220677 weighted)
# Average score for spam:  8.9    nonspam: -0.5
# Average for false-pos:   5.8  false-neg: 2.0
# TOTAL:              371507  100.00%

Reading scores from "tmprules"...
Reading per-message hit stat logs and scores...

# SUMMARY for threshold 5.0:
# Correctly non-spam:  19456  99.51%
# Correctly spam:      13315  49.17%
# False positives:        95  0.49%
# False negatives:     13766  50.83%
# TCR(l=50): 1.462573  SpamRecall: 49.167%  SpamPrec: 99.292%

##### WITHOUT NEW RULES AND SCORES #####
Reading scores from "../rules-base"...
Reading per-message hit stat logs and scores...

# SUMMARY for threshold 5.0:
# Correctly non-spam: 154853  99.67%
# Correctly spam:      87475  40.47%
# False positives:       513  0.33%
# False negatives:    128666  59.53%
# TCR(l=50): 1.400639  SpamRecall: 40.471%  SpamPrec: 99.417%
Reading scores from "../rules-base"...
Reading per-message hit stat logs and scores...

# SUMMARY for threshold 5.0:
# Correctly non-spam:  19484  99.66%
# Correctly spam:      10975  40.53%
# False positives:        67  0.34%
# False negatives:     16106  59.47%
# TCR(l=50): 1.391910  SpamRecall: 40.527%  SpamPrec: 99.393%

 

ls -lah /var/lib/spamassassin/3.004000/updates_spamassassin_org

less /var/lib/spamassassin/3.004000/updates_spamassassin_org/50_scores.cf
less /var/lib/spamassassin/3.004000/updates_spamassassin_org/72_scores.cf

 

Now that you have tested the update manually, adjust the ownership of the rule and key directories so that the debian-spamd user can write to them and the daily SpamAssassin cron job can update the rules automatically for you.

chown -R debian-spamd:debian-spamd /var/lib/spamassassin
chown -R debian-spamd:debian-spamd /etc/spamassassin/sa-update-keys/
chown -R debian-spamd:debian-spamd /etc/spamassassin/sa-update-hooks.d/
su - debian-spamd -c "/usr/bin/sa-update -v --gpghomedir /var/lib/spamassassin/sa-update-keys"
sh -x /etc/cron.daily/spamassassin
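
To confirm that the chown commands covered everything, you can list whatever is still owned by someone else. A minimal sketch using find; not_owned_by is a hypothetical helper name.

```shell
# Print every path under DIR that is NOT owned by USER (name or numeric uid).
# No output means the whole tree has the expected owner.
# Hypothetical helper.
# Usage: not_owned_by /var/lib/spamassassin debian-spamd
not_owned_by() {
    find "$1" ! -user "$2" -print
}
```

Running `not_owned_by /var/lib/spamassassin debian-spamd` should print nothing once the ownership is correct.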

 

Verify that it will run daily.

If your machine is not running 24 hours a day, install anacron so that daily jobs missed while the machine was off still run.

apt-get install anacron
run-parts -v --report /etc/cron.daily
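
One common reason a script in /etc/cron.daily is silently skipped is its file name: by default run-parts only runs files whose names consist of ASCII letters, digits, underscores and hyphens, so a name containing a dot is ignored. The check below is a sketch of that rule; run_parts_name_ok is a hypothetical helper, not part of debianutils.

```shell
# Succeed when NAME uses only the characters run-parts accepts by default:
# ASCII letters, digits, underscores and hyphens.
# Hypothetical helper.
# Usage: run_parts_name_ok spamassassin
run_parts_name_ok() {
    case $1 in
        ''|*[!A-Za-z0-9_-]*) return 1 ;;   # empty, or contains a disallowed character
        *) return 0 ;;
    esac
}
```

To see what would run without executing anything, `run-parts --test /etc/cron.daily` lists the scripts that run-parts would pick up.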

 

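The next day, at 06:25 by default on Debian (the time at which /etc/crontab starts the cron.daily jobs; later if anacron is handling them), your rules will be updated automatically.

Verify this the next day by reading /var/log/syslog and /var/log/cron.log: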

less /var/log/syslog
less /var/log/cron.log
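
Rather than paging through the whole log, you can filter for the relevant lines. A rough sketch; sa_cron_log_lines is a hypothetical helper, and which messages actually appear (and whether /var/log/cron.log exists at all) depends on your syslog configuration.

```shell
# Print the lines of a log file that mention cron.daily or sa-update.
# Hypothetical helper; the search terms are a guess at what is useful.
# Usage: sa_cron_log_lines /var/log/syslog
sa_cron_log_lines() {
    grep -E 'cron\.daily|sa-update' "$1"
}
```

An empty result does not prove the job failed; it may only mean that cron messages are logged elsewhere on your system.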

 

 

 

Bibliography

http://www.vivaolinux.com.br/dica/SpamAssassin-Melhorando-a-eficacia-do-...
https://forums.cpanel.net/threads/spamassassin-3-4-improvement-with-upda...
http://taint.org/2007/08/15/004348a.html
http://taint.org/2007/08/04/200125a.html
http://taint.org/2007/04/17/132339a.html

https://wiki.apache.org/spamassassin/ImproveAccuracy
https://wiki.apache.org/spamassassin/NightlyMassCheck
https://wiki.apache.org/spamassassin/UploadedCorpora

http://forums.contribs.org/index.php?topic=42454.0
https://wiki.apache.org/spamassassin/WritingRules

https://wiki.apache.org/spamassassin/InstallingDCC
http://www.rhyolite.com/dcc/
http://debian.dev-zero.nl/blog/archives/315
http://debian.dev-zero.nl/blog/about
http://debian.dev-zero.nl/debian/dists/

http://dspam.nuclearelephant.com/
http://sourceforge.net/projects/dspam/files/
http://sourceforge.net/projects/dspam/
 

http://ubuntuforums.org/showthread.php?t=2233556

http://serverfault.com/questions/705040/spamassassin-sa-update-failed-fo...

https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=739489

http://unix.stackexchange.com/questions/7053/how-can-get-a-list-of-all-s...

http://serverfault.com/questions/99584/not-all-cron-jobs-in-etc-cron-dai...

 

https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=743872

https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=623861

https://bugs.launchpad.net/ubuntu/+source/spamassassin/+bug/1373560

https://forum.directadmin.com/showthread.php?t=36125

https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=59647

http://stackoverflow.com/questions/14647447/cronjob-cron-daily-does-not-...

https://packages.debian.org/jessie/anacron
