SpamAssassin

SpamAssassin
	Screenshot: E-mail recognized as spam by SpamAssassin, here in the Novell Evolution email client.
Developer(s)	Apache Software Foundation
Stable release	3.4.1 / April 29, 2015
Development status	Active
Written in	Perl
Operating system	Cross-platform
Type	Email spam filter
License	Apache License 2.0
Website	spamassassin.apache.org

SpamAssassin is a computer program, released under the Apache License 2.0, that filters e-mail spam using content-matching rules. It is now a project of the Apache Software Foundation.

SpamAssassin uses a variety of spam-detection techniques, including DNS-based and fuzzy-checksum-based spam detection, Bayesian filtering, external programs, blacklists and online databases.

The program can be integrated with the mail server to automatically filter all mail for a site. It can also be run by individual users on their own mailbox and integrates with several mail programs. SpamAssassin is highly configurable; if used as a system-wide filter it can still be configured to support per-user preferences.

SpamAssassin was awarded the Linux New Media Award 2006 as the "Best Linux-based Anti-spam Solution".^[3]

History

SpamAssassin was created by Justin Mason, who had maintained a number of patches against an earlier program named filter.plx by Mark Jeftovic, which in turn was begun in August 1997. Mason rewrote all of Jeftovic's code from scratch and uploaded the resulting codebase to SourceForge.net on April 20, 2001. In summer 2004 the project became an Apache Software Foundation project and was later officially renamed Apache SpamAssassin. The project involved algorithms developed in part by Gary Robinson and others.^[4]^[5]^[6]

Methods of usage

SpamAssassin is a Perl-based application (Mail::SpamAssassin in CPAN) which is usually used to filter all incoming mail for one or several users. It can be run as a standalone application or as a subprogram of another application (such as Milter, SA-Exim, Exiscan, MailScanner, MIMEDefang, Amavis) or as a client (spamc) that communicates with a daemon (spamd). The client/server or embedded mode of operation has performance benefits, but under certain circumstances may introduce additional security risks.

Typically, either variant of the application is set up in a generic mail filter program, or it is called directly from a mail user agent that supports this, whenever new mail arrives. Mail filter programs such as procmail can be made to pipe all incoming mail through SpamAssassin with an adjustment to the user's .procmailrc file.
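The piping step described above can be sketched in a few lines of Python. This is a minimal illustration, not part of SpamAssassin: it assumes a spamc client on the PATH and a running spamd daemon, and the names filter_message and PASSTHROUGH_COMMAND are hypothetical. The pass-through command simply copies standard input to standard output, so the function can be tried on a machine without SpamAssassin installed.

```python
import subprocess
import sys

# Hypothetical stand-in filter: copies stdin to stdout unchanged.
# Useful for trying filter_message() where spamc is not installed.
PASSTHROUGH_COMMAND = (
    sys.executable,
    "-c",
    "import sys; sys.stdout.buffer.write(sys.stdin.buffer.read())",
)


def filter_message(raw_message: bytes, command=("spamc",)) -> bytes:
    """Pipe one raw message through a filter command and return its output.

    The default command assumes a `spamc` client on PATH talking to a
    running `spamd` daemon; both are assumptions about the local setup.
    """
    result = subprocess.run(
        list(command),
        input=raw_message,        # the message is written to the filter's stdin
        stdout=subprocess.PIPE,   # the (possibly rewritten) message comes back on stdout
        check=True,               # raise CalledProcessError on a non-zero exit status
    )
    return result.stdout
```

A delivery script would call filter_message() on each incoming message and hand the returned bytes to the next delivery step.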

Operation

Spam mail recognized by SpamAssassin.

SpamAssassin comes with a large set of rules which are applied to determine whether an email is spam or not. Most rules are based on regular expressions that are matched against the body or header fields of the message, but SpamAssassin also employs a number of other spam-fighting techniques. The rules are called "tests" in the SpamAssassin documentation.

Each test has a score value that will be assigned to a message if it matches the test's criteria. The scores can be positive or negative, with positive values indicating "spam" and negative "ham" (non-spam messages). A message is matched against all tests and SpamAssassin combines the results into a global score which is assigned to the message. The higher the score, the higher the probability that the message is spam.

SpamAssassin has an internal (configurable) score threshold to classify a message as spam. Usually a message will only be considered as spam if it matches multiple criteria; matching just a single test will not usually be enough to reach the threshold.
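The scoring model described above amounts to summing the scores of all matching tests and comparing the total against the threshold. The sketch below illustrates that idea only; the rule names and score values are invented for the example, and the threshold of 5.0 is assumed here rather than taken from any particular installation.

```python
from typing import Dict, Iterable, Tuple

# Hypothetical rule names and scores, for illustration only.
# Positive scores point towards spam, negative scores towards ham.
EXAMPLE_SCORES: Dict[str, float] = {
    "SUBJECT_ALL_CAPS": 1.5,
    "MENTIONS_FREE_MONEY": 2.5,
    "SENDER_ON_BLOCKLIST": 3.0,
    "SENDER_IN_ADDRESS_BOOK": -4.0,
}


def classify(
    hits: Iterable[str],
    scores: Dict[str, float] = EXAMPLE_SCORES,
    threshold: float = 5.0,  # assumed threshold; configurable in practice
) -> Tuple[float, bool]:
    """Return (total score, is_spam) for the names of the tests that matched."""
    total = sum(scores[name] for name in hits)
    return total, total >= threshold
```

With these example values a single matching test stays below the threshold, while three spam-leaning tests together exceed it, and a negative-scoring test can pull the total back down.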

If SpamAssassin considers a message to be spam, the message can optionally be rewritten. In the default configuration, the original mail is attached as a MIME attachment to a new message whose body contains a brief excerpt and a description of the tests that caused the mail to be classified as spam. If the score is below the configured threshold, information about the tests that matched and the total score is, by default, still added to the email headers and can be used in post-processing for less severe actions, such as tagging the mail as suspicious.

SpamAssassin allows for a per-user configuration of its behaviour, even if installed as a system-wide service; the configuration can be read from a file or a database. In their configuration users can specify individuals whose emails are never considered spam, or change the scores for certain rules. The user can also define a list of languages which they want to receive mail in, and SpamAssassin then assigns a higher score to all mails that appear to be written in another language.
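Post-processing of this kind typically reads the X-Spam-Status header that SpamAssassin adds. The following sketch parses a header of the common form "Yes, score=7.2 required=5.0 tests=A,B autolearn=no version=3.4.1". The function name parse_spam_status is hypothetical, and the exact header layout may differ between installations, so treat the regular expressions as assumptions.

```python
import re
from email import message_from_string
from typing import Optional


def parse_spam_status(raw_message: str) -> Optional[dict]:
    """Extract the verdict, score, threshold and matched tests, if present."""
    status = message_from_string(raw_message).get("X-Spam-Status")
    if status is None:
        return None  # the message was not processed, or the header was removed

    # The header may be folded across lines after a comma; undo the folding.
    status = re.sub(r",\s+", ",", status)

    def number(key: str) -> Optional[float]:
        match = re.search(key + r"=(-?\d+(?:\.\d+)?)", status)
        return float(match.group(1)) if match else None

    tests_match = re.search(r"tests=(\S*)", status)
    tests = tests_match.group(1).split(",") if tests_match else []

    return {
        "is_spam": status.split(",", 1)[0].strip().lower() == "yes",
        "score": number("score"),
        "required": number("required"),
        # "none" is treated as an empty list of tests.
        "tests": [t for t in tests if t and t.lower() != "none"],
    }
```

A mail filter could use the returned score to, for example, file messages that are close to the threshold into a "suspicious" folder.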

SpamAssassin is based on heuristics (pattern recognition), and such software produces some false positives, blocking email that may be entirely innocent; hence the need for the software to go through a "learning" exercise. This is similar to the heuristic software used by credit-card-issuing banks, which blocks a credit card number based upon "suspicious" usage patterns, such as a large number of purchases made within a short time period. As there is no way to tell the "bad guys" from the "good guys" with one-hundred percent accuracy, some email will inevitably be placed in the wrong category.^[7]

Network-based filtering methods

SpamAssassin also supports:

DNS-based blackhole lists and DNS-based whitelists
Fuzzy-checksum-based spam detection filters such as the Distributed Checksum Clearinghouses, Vipul's Razor and the Cloudmark Authority plug-in (commercial)
Hashcash email stamps based on proof-of-work
Sender Policy Framework and DomainKeys Identified Mail
URI blacklists such as SURBL or URIBL.com which track spam websites

More methods can be added reasonably easily by writing a Perl plug-in for SpamAssassin.
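As an illustration of the first item in the list above, a DNS-based list is conventionally queried by reversing the octets of the sender's IPv4 address and looking the result up under the list's DNS zone; an answer means the address is listed. The sketch below shows that convention only. The zone dnsbl.example.org is a placeholder rather than a real list, and the function names are hypothetical.

```python
import ipaddress
import socket


def dnsbl_query_name(ip: str, zone: str) -> str:
    """Build the DNS name to query for an IPv4 address in a DNS-based list."""
    address = ipaddress.IPv4Address(ip)  # raises ValueError on malformed input
    reversed_octets = ".".join(reversed(str(address).split(".")))
    return f"{reversed_octets}.{zone}"


def is_listed(ip: str, zone: str) -> bool:
    """Return True if the list answers for this address (needs network access)."""
    try:
        socket.gethostbyname(dnsbl_query_name(ip, zone))
        return True
    except socket.gaierror:
        # A failed lookup conventionally means "not listed".
        return False
```

For example, the address 192.0.2.99 checked against the placeholder zone gives the query name 99.2.0.192.dnsbl.example.org.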

Bayesian filtering

SpamAssassin by default tries to reinforce its own rules through Bayesian filtering, but Bayesian learning is most effective with actual user input. Typically, the user is expected to "feed" example spam mails and example "ham" (useful) mails to the filter, which can then learn the difference between the two. For this purpose, SpamAssassin provides the command-line tool sa-learn, which can be instructed to learn a single mail or an entire mailbox as either ham or spam.

Typically, the user will move unrecognized spam to a separate folder for a while, and then run sa-learn on the folder of non-spam and on the folder of spam separately. Alternatively, if the mail user agent supports it, sa-learn can be called for individual emails. Regardless of the method used to perform the learning, SpamAssassin's Bayesian test will assign a higher score to e-mails that are similar to previously received spam (or, more precisely, to those emails that are different from non-spam in ways similar to previously received spam e-mails).
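The learning process can be illustrated with a toy classifier. The sketch below is a textbook naive Bayes filter with add-one smoothing, written only to show how feeding spam and ham examples shifts the verdict. It is not SpamAssassin's implementation, whose tokenizer and probability-combining method differ, and the class name ToyBayes is hypothetical.

```python
import math
from collections import Counter


class ToyBayes:
    """Toy naive Bayes spam filter (illustrative; not SpamAssassin's algorithm)."""

    def __init__(self) -> None:
        self.token_counts = {True: Counter(), False: Counter()}
        self.message_counts = {True: 0, False: 0}

    @staticmethod
    def _tokens(text: str) -> set:
        # Naive whitespace tokenizer; each token counts once per message.
        return set(text.lower().split())

    def learn(self, text: str, is_spam: bool) -> None:
        """Record one example message as spam (True) or ham (False)."""
        self.token_counts[is_spam].update(self._tokens(text))
        self.message_counts[is_spam] += 1

    def spam_probability(self, text: str) -> float:
        """Estimate the probability that a message is spam."""
        n_spam, n_ham = self.message_counts[True], self.message_counts[False]
        log_odds = math.log((n_spam + 1) / (n_ham + 1))  # smoothed prior
        for token in self._tokens(text):
            p_spam = (self.token_counts[True][token] + 1) / (n_spam + 2)
            p_ham = (self.token_counts[False][token] + 1) / (n_ham + 2)
            log_odds += math.log(p_spam / p_ham)
        # Numerically stable logistic function.
        if log_odds >= 0:
            return 1.0 / (1.0 + math.exp(-log_odds))
        odds = math.exp(log_odds)
        return odds / (1.0 + odds)
```

After learning a few spam and ham examples, messages that share words mainly with the spam examples receive a probability above 0.5, and those resembling the ham examples fall below it.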

Licensing

SpamAssassin is free/open source software, licensed under the Apache License 2.0. Versions prior to 3.0 are dual-licensed under the Artistic License and the GNU General Public License.

sa-compile

sa-compile is a utility distributed with SpamAssassin as of version 3.2.0. It compiles a SpamAssassin ruleset into a deterministic finite automaton that allows SpamAssassin to use processor power more efficiently.

Testing SpamAssassin

Most implementations of SpamAssassin will trigger on the GTUBE, a 68-byte string similar to the antivirus EICAR test file. If this string is inserted in an RFC 5322 formatted message and passed through the SpamAssassin engine, SpamAssassin will trigger with a weight of 1000.
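A test message of this kind can be generated with Python's standard email package. The sketch below is one possible way to do it: the addresses are placeholders, the function name build_gtube_message is hypothetical, and the message still has to be passed to a SpamAssassin installation by some other means to observe the result.

```python
from email.message import EmailMessage

# The 68-byte GTUBE test string; it must appear intact on a single line.
GTUBE = "XJS*C4JDBQADN1.NSBN3*2IDNEN*GTUBE-STANDARD-ANTI-UBE-TEST-EMAIL*C.34X"


def build_gtube_message(
    sender: str = "sender@example.net",        # placeholder address
    recipient: str = "recipient@example.net",  # placeholder address
) -> EmailMessage:
    """Build a minimal RFC 5322 message whose body is the GTUBE string."""
    message = EmailMessage()
    message["From"] = sender
    message["To"] = recipient
    message["Subject"] = "GTUBE test"
    message.set_content(GTUBE + "\n")
    return message
```

Calling bytes(build_gtube_message()) yields the serialized message, which can then be fed to the filter under test.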

Notes

↑ http://svn.apache.org/repos/asf/spamassassin/trunk/CREDITS

External links

SpamAssassin official homepage
SpamAssassin Wiki
sa-update Automatically updating SA
SpamAssassin Rules Emporium (SARE) containing many very good rules for filtering with SA (not updated any more since early 2008).
OpenProtect's SpamAssassin sa-update channel to automatically update SA with the newest and best SARE rules (not updated any more, see above).
Linux New Media Awards 2006 showing that SpamAssassin received 69% of the vote for "best Linux-based anti-spam solution"
Vipul's Razor (SourceForge)
Pyzor (SourceForge)
Questions about sa-compile
