How does content-filtering anti-spam software work?

Many criteria come into play when an anti-spam tool decides whether a message belongs in the inbox or in spam. These criteria fall into two categories: the sender’s behavior (also known as “reputation”) and the content of the message.

When we talk about sender behavior, we are mostly talking about technical matters: how tracking is carried out, the sending history of the IP addresses, SPF records, reverse DNS, the stability over time of sending volume and bad-address rates, compliance with best practices during the SMTP transaction, the reputation of the sender and tracking domains, and so on. Sometimes, however, all of this is managed perfectly and the e-mail still goes to spam. In that case, it is the content of the message itself that needs to be examined.

Many criteria are involved in content analysis, and some of them are quite complex. So if you are sending messages that recipients want to receive, you are sure that the technical aspects are correct, and you have a decent reputation, a content analysis may be what you need to improve deliverability.

The aim is for your e-mails to look “respectable”, to be sent correctly and to stand out from spam, viruses, phishing e-mails and the like. The point is not just to please the anti-spam filter, but also to convey trustworthiness, honesty and competence to recipients.

In a nutshell, a content-based anti-spam filter can tell spam from legitimate mail by the way the message is put together. Some spammers try to hide their identity, and sometimes even their content (have you ever received mail with blanks between each letter, such as “v i a g r a”?).

The quality of the message source code also matters. A commonly accepted rule of thumb among anti-spam vendors is that personal e-mails (written in an e-mail client such as Outlook) and marketing e-mails written by professionals in specialized software (Dreamweaver, for example) are well formed and comply with current standards (RFCs, etc.). Spammers, on the other hand, tend to assemble their messages by hand, aiming for volume rather than quality.

Here is a non-exhaustive list of things to consider:

MIME format

In theory, an e-mail should contain an HTML version and a text version, both encapsulated in a multipart message.

This rule should not be applied blindly. If the text version differs from the HTML version, deliverability will be greatly reduced. It’s better to set this criterion aside if you get poor results with it (this is common with Orange).
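
As an illustration, here is a minimal sketch of a message carrying both versions, built with Python’s standard email package. The addresses, subject and bodies are placeholders invented for the example, and build_newsletter is a hypothetical helper name, not something taken from a particular sending platform.

```python
from email.message import EmailMessage


def build_newsletter(subject: str, sender: str, recipient: str,
                     text_body: str, html_body: str) -> EmailMessage:
    """Build a multipart/alternative message with a text and an HTML version."""
    msg = EmailMessage()
    msg["Subject"] = subject
    msg["From"] = sender
    msg["To"] = recipient
    # The plain-text version goes first...
    msg.set_content(text_body)
    # ...and adding the HTML version turns the message into multipart/alternative.
    msg.add_alternative(html_body, subtype="html")
    return msg


# Placeholder values: both versions say the same thing, as recommended above.
msg = build_newsletter(
    subject="Our monthly news",
    sender="news@example.com",
    recipient="reader@example.org",
    text_body="Hello,\nHere is our monthly news.\n",
    html_body="<html><body><p>Hello,<br>Here is our monthly news.</p></body></html>",
)
print(msg.get_content_type())
```

Keeping the two bodies in sync is the part the library cannot do for you: the text version should say the same thing as the HTML version.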

Character encoding

Spammers try to prevent spam filters from analyzing their content. A common way of doing this is to use base64 encoding (intended for attachments) for what is really plain text. The mail is displayed correctly because e-mail clients can decode it, but the content filter won’t be able to perform any semantic analysis. A filter’s reaction in this case is often to automatically classify the message as spam.

This type of behavior is not always malicious. For a developer, it’s quicker to encode everything in base64 so as not to have to worry about finding the best encoding for each part of the e-mail.

However, in the eyes of content filters and e-mail administrators, this gives an impression of incompetence or dishonesty.
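
To check your own messages for this, a rough sketch using Python’s standard email package is shown below. The function name base64_text_parts and the sample message are made up for the example; the check only looks at the declared Content-Transfer-Encoding of text parts, which is a simplification of what a real filter does.

```python
from email import message_from_string
from email.message import EmailMessage


def base64_text_parts(msg) -> list:
    """Return the content types of text parts declared as base64-encoded."""
    flagged = []
    for part in msg.walk():
        if part.get_content_maintype() != "text":
            continue  # attachments and containers are not the concern here
        encoding = str(part.get("Content-Transfer-Encoding", "")).strip().lower()
        if encoding == "base64":
            flagged.append(part.get_content_type())
    return flagged


# Sample message (invented): plain text needlessly sent as base64.
RAW = (
    "From: news@example.com\n"
    "To: reader@example.org\n"
    "Subject: Hello\n"
    "MIME-Version: 1.0\n"
    'Content-Type: text/plain; charset="utf-8"\n'
    "Content-Transfer-Encoding: base64\n"
    "\n"
    "SGVsbG8gd29ybGQ=\n"
)
print(base64_text_parts(message_from_string(RAW)))

# The same kind of text sent as quoted-printable stays readable on the wire.
clean = EmailMessage()
clean.set_content("Hello, this is plain text.", cte="quoted-printable")
print(base64_text_parts(clean))
```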

Images

Another way of thwarting anti-spam analysis is to use images. The classic case is to use just one large image instead of any text.

Of course, this type of message doesn’t get past spam filters very well, because a minimum of “normal” text is expected in an e-mail:

– an unsubscribe link
– a box with the advertiser’s contact details to comply with most legislation
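
A quick self-check for the “one big image” case can be sketched with Python’s standard HTML parser. The 200-character threshold is an arbitrary assumption chosen for the example, not a value published by any filter, and looks_image_only is a hypothetical helper name.

```python
from html.parser import HTMLParser


class TextAndImageCounter(HTMLParser):
    """Count images and visible text characters in an HTML body."""

    def __init__(self):
        super().__init__()
        self.text_chars = 0
        self.images = 0
        self._skip = 0  # depth inside <script>/<style>, whose text is not visible

    def handle_starttag(self, tag, attrs):
        if tag == "img":
            self.images += 1
        elif tag in ("script", "style"):
            self._skip += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip:
            self._skip -= 1

    def handle_data(self, data):
        if not self._skip:
            self.text_chars += len(data.strip())


def looks_image_only(html: str, min_chars: int = 200) -> bool:
    """True when the body has images but less visible text than min_chars."""
    counter = TextAndImageCounter()
    counter.feed(html)
    counter.close()
    return counter.images > 0 and counter.text_chars < min_chars


print(looks_image_only('<html><body><img src="promo.jpg"></body></html>'))
```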

HTML coding

An unreadable e-mail with broken images or links is not usually sent by a serious advertiser. Each image must have its own alt attribute, so that people who don’t display images can see what’s going on, but also so that filters find the message more intelligible from their point of view.

Generally speaking, the HTML code must be valid.
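
Missing alt attributes are easy to spot automatically. The sketch below uses Python’s standard HTML parser; images_without_alt is a hypothetical helper name, and treating an empty alt as missing is an assumption made for this example.

```python
from html.parser import HTMLParser


class MissingAltFinder(HTMLParser):
    """Collect the src of every <img> whose alt attribute is absent or empty."""

    def __init__(self):
        super().__init__()
        self.missing = []

    def handle_starttag(self, tag, attrs):
        if tag != "img":
            return
        attributes = dict(attrs)
        if not (attributes.get("alt") or "").strip():
            self.missing.append(attributes.get("src") or "(no src)")


def images_without_alt(html: str) -> list:
    finder = MissingAltFinder()
    finder.feed(html)
    finder.close()
    return finder.missing


print(images_without_alt('<img src="a.png" alt="Logo"><img src="b.png">'))
```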

Phishing

Some phishing messages include a link whose visible text is a domain such as Ma-Banque.com, to make the recipient think they’re visiting their bank, when in fact the link takes them to a malicious site.

Of course, anti-spam software heavily penalizes this type of message. Unfortunately, e-mails sent via a professional routing platform can be affected too: in order to provide you with statistics, routers replace your links with links to their tracking platform.

So avoid using URLs as the visible text of links in messages sent through a router, or disable tracking. Otherwise the link will look like this:
<a href="http://www.eml-srv.com/tracking?id=45454515151">http://www.votre-url-initiale.com</a>
Some anti-spam software will block messages containing a link of this type. This seems to have been the case with Wanadoo / Orange for several weeks now.
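
You can look for this pattern in your own HTML before sending. The sketch below flags links whose visible text looks like a URL but points to a different host than the href. The regular expression and the helper names (host_of, mismatched_links) are assumptions made for this example; real filters apply their own, unpublished rules.

```python
import re
from html.parser import HTMLParser
from urllib.parse import urlparse

# Rough test for "this link text looks like a URL or a bare domain".
URL_LIKE = re.compile(
    r"^(?:https?://)?(?:[a-z0-9-]+\.)+[a-z]{2,}(?:/\S*)?$", re.IGNORECASE
)


def host_of(url: str) -> str:
    """Lower-cased host of a URL or bare domain, without a leading 'www.'."""
    if "://" not in url:
        url = "http://" + url
    host = (urlparse(url).hostname or "").lower()
    return host[4:] if host.startswith("www.") else host


class LinkCollector(HTMLParser):
    """Collect (href, visible text) pairs for every <a href=...> element."""

    def __init__(self):
        super().__init__()
        self.links = []
        self._href = None
        self._text = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self._href = dict(attrs).get("href")
            self._text = []

    def handle_data(self, data):
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            self.links.append((self._href, "".join(self._text).strip()))
            self._href = None


def mismatched_links(html: str) -> list:
    """Links whose text looks like a URL on another host than the real target."""
    collector = LinkCollector()
    collector.feed(html)
    collector.close()
    return [
        (href, text)
        for href, text in collector.links
        if URL_LIKE.match(text) and host_of(text) != host_of(href)
    ]


# The tracked link from the example above is flagged; "Read more" is not.
html = (
    '<a href="http://www.eml-srv.com/tracking?id=45454515151">'
    "http://www.votre-url-initiale.com</a> "
    '<a href="http://www.eml-srv.com/tracking?id=2">Read more</a>'
)
print(mismatched_links(html))
```

The second link shows the safe way out: with ordinary words as link text, there is no URL for the tracking link to contradict.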

Form and content

Even if all your recipients are eagerly awaiting your newsletter, some words can cause you more problems than others. “Looking like a spammer” is heavily penalized. If you really must use words like “Paypal”, “VISA”, “Viagra”, “Pharmacy” or “Porn” in your newsletter, expect to spend a lot of time optimizing the rest…
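
To get a feel for how such a check might work, here is a deliberately naive sketch. The word list contains only the five words quoted above (the lists used by real filters are not public), and the de-obfuscation step only handles single letters separated by single spaces, like the “v i a g r a” trick. The function names are hypothetical.

```python
import re

# Only the words named in this article; real filters use their own lists.
RISKY_WORDS = {"paypal", "visa", "viagra", "pharmacy", "porn"}


def collapse_spaced_letters(text: str) -> str:
    """Join runs of three or more single letters separated by single spaces."""
    return re.sub(
        r"\b(?:[A-Za-z] ){2,}[A-Za-z]\b",
        lambda match: match.group(0).replace(" ", ""),
        text,
    )


def risky_words_in(text: str) -> list:
    """Sorted list of risky words found once spaced-out letters are rejoined."""
    normalized = collapse_spaced_letters(text).lower()
    words = set(re.findall(r"[a-z]+", normalized))
    return sorted(words & RISKY_WORDS)


print(risky_words_in("Cheap v i a g r a, pay with VISA"))
```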

Reputation of sending domains and landing pages

If two e-mails look the same and a recipient reports one of them as spam, it’s highly likely that the other will go to spam too.

Detecting the “similarity” of two e-mails is complex, so a common and simple method is to rely on a “fingerprint” or “signature” of the message. Other considerations may come into play, but sometimes it simply comes down to the domain names used in the message.
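
As a toy illustration of the fingerprint idea, the sketch below normalizes a body and hashes it. This is an assumption-laden simplification: it uses an exact SHA-256 hash of a crudely normalized text, whereas the signatures computed by real filters are their own, generally more tolerant, designs.

```python
import hashlib
import re


def fingerprint(body: str) -> str:
    """Hash a body after removing details that vary between recipients."""
    normalized = body.lower()
    normalized = re.sub(r"\d+", "", normalized)    # drop digits (order numbers, IDs)
    normalized = re.sub(r"\s+", " ", normalized).strip()  # collapse whitespace
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()


first = fingerprint("Hello  Bob,\nyour order 123")
second = fingerprint("hello bob, your order 456")
print(first == second)
```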

Domain names appear in two places: in the sender’s address (and/or the reply-to address) and in the links in the e-mail. Public lists of domains to block exist (www.uribl.com, for example) and are used by some anti-spam software.
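
The sketch below shows how those two places can be inspected with Python’s standard library. The message, the domains and the LOCAL_BLOCKLIST set are all invented for the example; a real filter would query a public list rather than a hard-coded set, and the href regular expression is a shortcut, not a full HTML parser.

```python
import re
from email.message import EmailMessage
from email.utils import parseaddr
from urllib.parse import urlparse

HREF = re.compile(r'href="([^"]+)"', re.IGNORECASE)


def domains_in_message(msg: EmailMessage) -> set:
    """Domains found in the From/Reply-To addresses and in the body links."""
    domains = set()
    for header in ("From", "Reply-To"):
        _, address = parseaddr(str(msg.get(header, "")))
        if "@" in address:
            domains.add(address.rsplit("@", 1)[1].lower())
    body = msg.get_body(preferencelist=("html", "plain"))
    content = body.get_content() if body is not None else ""
    for url in HREF.findall(content):
        host = urlparse(url).hostname
        if host:
            domains.add(host.lower())
    return domains


# Invented example message and blocklist.
msg = EmailMessage()
msg["From"] = "Shop News <news@shop.example>"
msg["Reply-To"] = "support@shop.example"
msg.set_content(
    '<p><a href="http://tracker.example/c?id=1">Offers</a></p>', subtype="html"
)

LOCAL_BLOCKLIST = {"tracker.example"}  # hypothetical stand-in for a public list
listed = domains_in_message(msg) & LOCAL_BLOCKLIST
print(sorted(listed))
```

Note that the tracking domain counts just as much as your own: if your router’s link domain is listed, your message inherits the problem.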

Free translation. Find the original version of this Fiddly Trivia article on wordtothewise.com
