Why Do We Get Nonsense Spam?

I imagine we’ve all been annoyed by spam selling Viagra, watches, penny stocks and penis enlargement creams, but at least those messages make a certain sort of sense. Spammers send out millions of ads, get tens of sales, and make some money while annoying everyone else.

But what about the spam that isn’t selling anything? What is the point of sending out spam with a string of unrelated words that doesn’t even mention a product name, let alone claim that it will make your stock portfolio larger and more satisfying? To understand this kind of spam we need to think about the entire spamming process and how it has developed over the years.

Sending Spam

Originally, spam was sent fairly directly. You’d sit (virtually at least) at your internet server or PC and send out your ads to unsuspecting mail servers all over the world. This approach didn’t work for long – blacklists were created to block known spam servers, anti-spamming clauses were written into internet service providers’ contracts, and anti-spam laws were passed in a number of countries. The spammers had to go under the radar.

At the same time this was happening, PCs all over the world were being infected by spyware, trojans, adware and other malware (a catch-all name for ‘malicious software’). Some of these were just annoyances that generated popup ads everywhere, but others would take over your PC and hand control of it to someone else.

The spammers saw this happening and realised that they could write malware that would send out spam – what could be stealthier than getting someone else’s PC to send your spam for you? You’d infect their PC with spam-bot software, the spam-bot would connect to the spam server to get the latest ad campaign, and off it would go, merrily sending spam out to all and sundry. If you could infect thousands of PCs with your software, it didn’t matter if some got shut down; there were always plenty more to keep pumping it out. Spam was not only back, it was back in enormous volume.

Blocking Spam

However, the battle against spam wasn’t solely concentrated on stopping spam being sent. The other major front was stopping spam getting in by blocking incoming email that matched certain rules. Originally these rules were fairly simple, looking for keywords like “viagra” in combination with links to websites. These rules worked up to a point, but they were easy to evade (“Hey, let’s spell it as v1agra!”) and blocked too many real messages.
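To make the weakness concrete, here is a minimal sketch of that kind of rule. The keyword list, the function name and the example messages are all made up for illustration; real filters of the time will have differed in the details.

```python
import re

# Hypothetical blocklist; a real filter would have had many more entries.
BLOCKED_KEYWORDS = {"viagra"}


def is_blocked(message: str) -> bool:
    """Block a message that contains a blocked keyword and a web link."""
    lowered = message.lower()
    words = set(re.findall(r"[a-z0-9]+", lowered))
    has_keyword = bool(words & BLOCKED_KEYWORDS)
    has_link = "http://" in lowered or "https://" in lowered
    return has_keyword and has_link


spam = "Cheap viagra at http://example.com"
evasive_spam = "Cheap v1agra at http://example.com"
real_email = "Our pharmacology notes on Viagra are at http://example.edu/notes"
```

The first message is caught, the respelled one slips through, and the perfectly legitimate third message is blocked along with the spam.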

The anti-spam filters had to get more sophisticated, and the new technique was something called Bayesian filtering. Simply put, this technique works by taking a large body of email that has already been sorted into spam and non-spam. When a new email arrives, the Bayesian filter is used to ask a simple question – does this new email look more like the emails in the spam group or more like the emails in the non-spam group? This method proved to be much more effective at filtering out spam and the anti-spammers were once again winning the battle.
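For the curious, the idea can be sketched in a few lines as a toy naive Bayes classifier. This is only an illustration: the class name, the tiny training sets and the add-one smoothing are my own assumptions, and production filters were trained on far more mail and used more careful tokenising.

```python
import math
from collections import Counter


def tokenize(text):
    """Split a message into lowercase words (a deliberately crude tokenizer)."""
    return text.lower().split()


class NaiveBayesFilter:
    """Toy filter trained on pre-sorted spam and non-spam messages."""

    def __init__(self, spam_examples, ham_examples):
        self.spam_counts = Counter(w for m in spam_examples for w in tokenize(m))
        self.ham_counts = Counter(w for m in ham_examples for w in tokenize(m))
        self.vocab = set(self.spam_counts) | set(self.ham_counts)
        self.spam_total = sum(self.spam_counts.values())
        self.ham_total = sum(self.ham_counts.values())
        self.spam_prior = len(spam_examples) / (len(spam_examples) + len(ham_examples))

    def _log_likelihood(self, words, counts, total):
        # Add-one smoothing so a word unseen in one group never gives log(0).
        size = len(self.vocab)
        return sum(
            math.log((counts[w] + 1) / (total + size))
            for w in words
            if w in self.vocab  # words never seen in training are ignored
        )

    def spam_score(self, message):
        """Log-odds of spam versus non-spam; positive means 'looks more like spam'."""
        words = tokenize(message)
        spam = math.log(self.spam_prior) + self._log_likelihood(
            words, self.spam_counts, self.spam_total
        )
        ham = math.log(1 - self.spam_prior) + self._log_likelihood(
            words, self.ham_counts, self.ham_total
        )
        return spam - ham


# Made-up training mail, already sorted into the two groups.
SPAM = ["buy cheap watches now", "cheap pills buy now"]
HAM = ["meeting notes for monday", "lunch on monday with the team"]

spam_filter = NaiveBayesFilter(SPAM, HAM)
```

Calling `spam_filter.spam_score("buy cheap pills")` gives a positive score, while `spam_filter.spam_score("notes for the monday meeting")` gives a negative one, which is the "which group does it look more like?" question in numerical form.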

Naturally the spammers fought back, this time by adding extra bits and pieces to their ads. A typical spam message would have the ad followed by a few paragraphs of pseudo-randomly generated text, in the hope that the email would look more like a real email and therefore get past the Bayesian filters. (The pseudo-random text was quite surreally pretty at times and some geek-littérateurs got quite excited and ran off to write learned papers about it.)

Tying it All Together

So the pieces are in place now, but how does this explain the nonsense spam? Simply put, the spam-bot software isn’t very well written. It works something along these lines:

1. Infect PC.
2. Connect to spam-server and download the latest ad campaign.
3. Add nonsense text and other anti-anti-spam measures.
4. Start sending spam.

I believe that the nonsense spams happen when step 2 fails, either because of a bug in the spam-bot or because the spam-server has itself been shut down.

Well-written software would just stop at this point, but spam-bots don’t have to be good and the software just marches on, adding the anti-Bayesian text to the non-existent ad and sending the resulting ad-free nonsense spam out to the world.
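My guess about the failure can be shown with a harmless simulation that only builds strings and never touches the network. Everything here is hypothetical: the function names, the filler word list and the `check_campaign` flag are invented to illustrate the theory, not taken from any real spam-bot.

```python
import random

# Made-up filler vocabulary standing in for the pseudo-random text.
FILLER_WORDS = ["garden", "river", "yesterday", "window", "quietly", "lantern"]


def fetch_campaign(server_up: bool) -> str:
    """Stand-in for step 2: returns the ad, or nothing if the server is gone."""
    return "Buy cheap watches today!" if server_up else ""


def add_filler(body: str, rng: random.Random, n_words: int = 8) -> str:
    """Stand-in for step 3: append a run of random filler words to the body."""
    filler = " ".join(rng.choice(FILLER_WORDS) for _ in range(n_words))
    return (body + "\n\n" + filler).strip()


def build_message(server_up: bool, rng: random.Random, check_campaign: bool = False):
    """Compose one message; only a careful bot checks that step 2 succeeded."""
    ad = fetch_campaign(server_up)
    if check_campaign and not ad:
        return None  # what well-written software would do: stop here
    return add_filler(ad, rng)  # the sloppy path: carry on regardless


rng = random.Random(0)
nonsense = build_message(server_up=False, rng=rng)
```

With the server down and no check, `nonsense` contains nothing but filler words, which is exactly the kind of message that ends up puzzling people.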

And, for a final ironic twist, because the nonsense spams don’t have ads in them they’re more likely to get through the Bayesian anti-spam filters and end up in your inbox!
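A toy scoring table shows why. The per-word weights below are invented purely for illustration (positive means a word leans spammy, negative means it leans legitimate); a real filter would learn its own values from sorted mail.

```python
# Hypothetical per-word log-odds, as a trained filter might hold them.
WORD_LOG_ODDS = {
    "buy": 1.8,
    "cheap": 2.0,
    "watches": 1.5,
    "garden": -0.9,
    "river": -0.7,
    "yesterday": -1.1,
    "window": -0.8,
}


def spam_score(message: str) -> float:
    """Sum the log-odds of each known word; unknown words count as neutral."""
    return sum(WORD_LOG_ODDS.get(w, 0.0) for w in message.lower().split())


ad = "buy cheap watches"
filler = "garden river yesterday window"

ad_only = spam_score(ad)
ad_with_filler = spam_score(ad + " " + filler)
filler_only = spam_score(filler)
```

Under these assumed weights the bare ad scores highest, the padded ad scores lower, and the filler on its own scores lowest of all, so the message with no ad is the one most likely to be waved through.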

1 Response to “Why Do We Get Nonsense Spam?”

  1. Chris Chesher on Dec 5, 2007 at 12:06 pm:

    My understanding is that the random text is an effort to poison the databases that train spam filters to distinguish spam from non-spam. If a message gets past the filter, it is categorised as legitimate. This successful but barren intrusion slightly changes the definition of legitimacy and makes it more likely that a similar message, including an advertising payload, will get through next time.