Dissecting Comment Spam

I’ve been getting a lot of comment spam over the last few days, and it all has one key element in common: every comment comes via the following referrer:

The IPs may differ, but the referrer has consistently been the same address.

I doubt I am the only one being spammed in comments by people coming in via that referrer.

And their “User Agent” always seems to say: “Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.2; .NET CLR 1.1.4322)”

I don’t think the User Agent part is that interesting, though.

It suggests that the spammer is running Internet Explorer 6 on a Windows 2003 machine (Windows NT 5.2) with .NET 1.1 installed.
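That guess comes from reading the User Agent string token by token. Here is a minimal Python sketch of that reading, assuming the old IE-style format; describe_user_agent and the NT_VERSIONS table are hypothetical names of my own. Since the header is supplied by the client, the result only tells you what the spammer claims to be running.

```python
import re

# Hypothetical lookup table. NT 5.2 also covers Windows XP x64,
# so this is a best guess, not proof.
NT_VERSIONS = {
    "5.0": "Windows 2000",
    "5.1": "Windows XP",
    "5.2": "Windows Server 2003 / XP x64",
}

def describe_user_agent(ua: str) -> dict:
    """Pull the browser, OS and .NET tokens out of an old IE-style User Agent."""
    info = {}
    msie = re.search(r"MSIE (\d+\.\d+)", ua)
    if msie:
        info["browser"] = f"Internet Explorer {msie.group(1)}"
    nt = re.search(r"Windows NT (\d+\.\d+)", ua)
    if nt:
        info["os"] = NT_VERSIONS.get(nt.group(1), f"Windows NT {nt.group(1)}")
    clr = re.findall(r"\.NET CLR ([\d.]+)", ua)
    if clr:
        info["dotnet"] = clr  # every .NET CLR version the client advertises
    return info

ua = "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.2; .NET CLR 1.1.4322)"
print(describe_user_agent(ua))
```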

It would be really easy to speculate without knowing what is going on with this virtually non-existent address. What could it be?

I could take a few guesses, but they would be pure speculation.

The good thing about comment moderation is that none of the spam comments have actually made it past moderation and onto the blog.

While searching Google to find out what is so special about that address, and why IPs referred from there are spamming my site, I came across the following blog entry by Thomas Strömberg.

He noticed the same referral pattern, and his solution was to use mod_rewrite (an Apache module) with the following rules, placed in the .htaccess file:

RewriteCond %{HTTP_REFERER} ^
RewriteRule ^(.*) /asshole-bot

That rewrites any request whose referrer is the above address to /asshole-bot, effectively blocking those visitors.
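The RewriteCond is a regular expression tested against the Referer header, anchored with ^ so it only matches referrers that start with the spam address. A minimal Python sketch of the same test, assuming a made-up placeholder address (spam-referrer.example) standing in for the real referrer; should_redirect is a hypothetical helper. Note that ^ on its own matches any string, so it is the address after it that narrows the match, and that without the [NC] flag the comparison is case-sensitive.

```python
import re

# Hypothetical placeholder for the spam referrer address.
BLOCKED_REFERRER = "http://spam-referrer.example"

# Anchored at the start (^) like the RewriteCond. re.escape stops the dots in
# the address from acting as "any character" wildcards.
BLOCKED_PATTERN = re.compile("^" + re.escape(BLOCKED_REFERRER))

def should_redirect(referer: str) -> bool:
    """True when the Referer header starts with the blocked address (case-sensitive)."""
    return BLOCKED_PATTERN.search(referer) is not None

print(should_redirect("http://spam-referrer.example/some/page"))  # True
print(should_redirect("http://www.google.com/search?q=spam"))     # False
print(should_redirect(""))                                         # False
```

Escaping matters more than it looks: an unescaped dot in a mod_rewrite pattern would also match unrelated hosts that differ only in that character.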

I’ve just put that into my .htaccess file, and I’ll see how it goes.

I only decided to investigate this further because it got to the point where emails kept coming in asking me to approve comments. This morning alone I had 7 when I opened my mail client.

Ah well, if enough people block or redirect visitors who arrive via that referrer, the spammers will probably just change the referrer to something else, and the battle with comment spam will start all over again.

It also looks like some blogs out there have that address as the “trackback” URL for their posts.

My guess is that there are “infected” computers out there that are forcibly pulling down these pages and adding comments any way they can.

Infected with what, you ask? Honestly, I have no idea.

Judging by the User Agent (which could be totally bogus), it may be targeting a vulnerability in Windows 2003 machines with .NET 1.1.

Again, this is all just pure speculation.

Looking at something like Mitch Denny’s RoryCom, you can see just how easy it would be to develop such a monster.

OK, it would take a bit of modification of what Mitch developed, but still, with just a few lines of C# we’d have ourselves a monster.

There are a million and one ways this comment spam monster could be implemented… (I’ll call it a monster; what type? Use your imagination :P)

Perhaps it sets up a little web server on the host computer with an IP address of The hosted page then uses some form of GET request to pull down one or more URLs.

Hrm, some of that may be a little too complex.

But hey, without knowing what is going on, anything is possible!

Update: Cindy noticed that they are also using the following two User Agents to harvest comment link URLs:

“Fetch API Request” as well as “Microsoft Scheduled Cache Content Download Service”.

RewriteCond %{HTTP_USER_AGENT} (Fetch\ API\ Request) [NC,OR]
RewriteCond %{HTTP_USER_AGENT} (Microsoft\ Scheduled\ Cache\ Content\ Download\ Service) [NC,OR]
RewriteRule .* - [F]

Update: OK, so apparently the above doesn’t quite work the way it should.

After removing the second condition, things did work. Looking through my server logs, I noticed that I had never actually received a request with a User Agent containing “Microsoft Scheduled Cache Content Download Service”, but I did get Fetch API Request. So now I’ve blocked that one.
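Checking the server logs for those two agents is easy to script. A rough Python sketch, assuming Apache’s standard “combined” log format (which ends with the quoted Referer and User-Agent fields); count_suspect_agents is a hypothetical helper, and the two sample lines are invented for illustration. The substring match is case-insensitive, roughly mirroring the [NC] flag.

```python
import re
from collections import Counter

# Apache "combined" format ends with: "referer" "user-agent"
COMBINED_TAIL = re.compile(r'"(?P<referer>[^"]*)" "(?P<agent>[^"]*)"\s*$')

SUSPECT_AGENTS = (
    "Fetch API Request",
    "Microsoft Scheduled Cache Content Download Service",
)

def count_suspect_agents(lines):
    """Count log lines whose User-Agent contains one of the suspect strings."""
    counts = Counter()
    for line in lines:
        m = COMBINED_TAIL.search(line)
        if not m:
            continue  # not in combined format, so skip it
        agent = m.group("agent").lower()
        for suspect in SUSPECT_AGENTS:
            if suspect.lower() in agent:
                counts[suspect] += 1
    return counts

# Invented sample lines, not real log data.
sample = [
    '192.0.2.1 - - [01/Jan/2006:13:55:36 +1000] "GET /blog/ HTTP/1.1" 200 5120 "-" "Fetch API Request"',
    '192.0.2.2 - - [01/Jan/2006:13:56:01 +1000] "GET /blog/ HTTP/1.1" 200 5120 "-" "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1)"',
]
print(count_suspect_agents(sample))
```

In practice you would feed it the lines of your access log file instead of the sample list.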

So, right now, this is what my .htaccess file contains:

RewriteCond %{HTTP_REFERER} ^
RewriteRule ^(.*) /asshole-bot

RewriteCond %{HTTP_USER_AGENT} ^Fetch\ API\ Request
RewriteRule ^.* - [F,L]
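The Fetch API Request condition in that file differs from Cindy’s version in two ways: it is anchored with ^, so the User Agent must start with that text, and it has no [NC], so it is case-sensitive. A small Python sketch comparing the two, using hypothetical names; the Apache patterns escape spaces as “\ ”, while Python’s re is happy with plain spaces.

```python
import re

# The rule now in .htaccess: anchored at the start, case-sensitive (no [NC]).
ANCHORED = re.compile(r"^Fetch API Request")
# Cindy's version: matches anywhere in the string, case-insensitive ([NC]).
ANYWHERE_NC = re.compile(r"Fetch API Request", re.IGNORECASE)

def forbidden(agent: str, pattern: re.Pattern) -> bool:
    """True if this User-Agent would be sent the 403 from the [F] rule."""
    return pattern.search(agent) is not None

for agent in ("Fetch API Request", "fetch api request", "Foo/1.0 (Fetch API Request)"):
    print(repr(agent), forbidden(agent, ANCHORED), forbidden(agent, ANYWHERE_NC))
```

The anchored, case-sensitive rule is less likely to catch a legitimate agent that merely mentions the phrase, at the cost of missing trivially altered variants.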

Oh, and for the curious, I found the following list handy.

And for those running IIS, you can try ISAPI_Rewrite, an ISAPI module for IIS that lets you do much of the above on IIS rather than Apache (basically bringing Apache-like functionality to your IIS server).

Final Update: It seems that WordPress likes to strip the backslash (\) off the end of words. So the two mod_rewrite conditions posted by Cindy should have a trailing backslash after each word, except for the last one.

11 thoughts on “Dissecting Comment Spam”

  1. I found that they are using two other agents to harvest the comment link URLs. I have blocked them too. Check your server logs and I bet you find them too.

    RewriteCond %{HTTP_USER_AGENT} (Fetch\ API\ Request) [NC,OR]
    RewriteCond %{HTTP_USER_AGENT} (Microsoft\ Scheduled\ Cache\ Content\ Download\ Service) [NC,OR]
    RewriteRule .* - [F]

    It may block some valid offline page readers but better that than spam. I’m not even running MT or WordPress and they still got me… until I blocked them.

  2. Hi Cindy, cool! Thanks for the tip! I hadn’t noticed those two user agents until now. I checked through my server logs, and so far I’ve only found the Fetch API Request one, not the Microsoft Scheduled Cache Content Download Service.

    I’ll update the post with your advice!

  3. re. WP stripping words – what is a trailing?
    And thanks for the tip – that comment spam is getting totally out of hand.

  4. Will, the code Cindy sent you has an error – there should be no OR in the line before the RewriteRule.

    In other words it should look like:
    RewriteCond %{HTTP_USER_AGENT} (Fetch\ API\ Request) [NC,OR]
    RewriteCond %{HTTP_USER_AGENT} (Microsoft\ Scheduled\ Cache\ Content\ Download\ Service) [NC]
    RewriteRule .* - [F]

    In my own .htaccess file I took a different approach: I blocked the IP, and it has worked so far. Here’s what I entered in my .htaccess file:
    RewriteCond %{REMOTE_ADDR} ^12\.163\.72\.13$
    RewriteRule .* - [F,L]

    Hope this helps,
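The fix in comment 4 comes down to how mod_rewrite chains conditions: RewriteCond lines are ANDed by default, and the [OR] flag joins a condition to the next one with OR instead. Below is a minimal Python model of that rule, assuming well-formed chains; Cond and conditions_match are hypothetical names, and this is an illustration of the logic, not how Apache itself is implemented.

```python
import re
from dataclasses import dataclass

@dataclass
class Cond:
    """Hypothetical model of one RewriteCond line."""
    var: str               # server variable, e.g. "HTTP_USER_AGENT"
    pattern: str           # regex tested against that variable
    nocase: bool = False   # the [NC] flag
    or_next: bool = False  # the [OR] flag

def conditions_match(conds, request):
    """Return True if a RewriteRule guarded by `conds` would fire.

    Conditions are ANDed by default; [OR] links a condition to the next one,
    so a run of [OR]-linked conditions passes if any of them matches, and
    every run must pass. A trailing [OR] on the last condition is simply
    treated as absent here; this sketch makes no claim about Apache's
    behaviour in that case.
    """
    group_ok = False
    for i, cond in enumerate(conds):
        flags = re.IGNORECASE if cond.nocase else 0
        value = request.get(cond.var, "")
        group_ok = group_ok or re.search(cond.pattern, value, flags) is not None
        is_last = i == len(conds) - 1
        if not cond.or_next or is_last:
            if not group_ok:
                return False  # this AND-ed group failed, so the rule is skipped
            group_ok = False  # start the next group fresh
    return True

# Comment 4's corrected chain: [OR] only on the first condition.
fixed = [
    Cond("HTTP_USER_AGENT", r"(Fetch\ API\ Request)", nocase=True, or_next=True),
    Cond("HTTP_USER_AGENT",
         r"(Microsoft\ Scheduled\ Cache\ Content\ Download\ Service)", nocase=True),
]

print(conditions_match(fixed, {"HTTP_USER_AGENT": "Fetch API Request"}))
print(conditions_match(fixed, {"HTTP_USER_AGENT": "Mozilla/4.0 (compatible; MSIE 6.0)"}))
```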

