Comments

Okay, these spambots are getting scary now — 17 Comments

  1. My guess [^un^educated] in this instance is that there’s minimal software out there that searches for keywords in a blog post, and then inserts words related to a found keyword into the spam post — and then neo gets to deal with the spam post.

    Computer geeks out there, I would not at all mind being better educated in this. But the above is my guess (worth what you paid for it).

  2. M J R: yes, there is. But I think you miss my point.

    A lot of bots are like that; it’s nothing new. A lot of bot comments are at least tangentially related to the subject of the post, or a word that appears in it. But only tangentially, and only in passing. These comments I highlighted here are related to the total post—the message of the post rather than a word here and there. It is much more as though a real human being has read the post and is responding.

  3. Perhaps the dawning of the Singularity will appear first in spambots. After that… SkyNet. And after that we will not be spambotted but “Schwarzeneggered!”

  4. On another note, I’ve noticed the same “intelligence” creep in the spambots hitting my site. I used to be able to spot them all just by their content (or lack thereof). Now I often have to open the original post to see exactly what is going on.

    Last week these and other spam attacks numbered in the thousands, and so many were getting through the filters that I had to disable commenting until the storm passed.

    Even now the assault continues. The spam filter in the last two days has intercepted 2,300 spambots. The leakage has, however, gone down substantially. What was hundreds of spambots getting through over the weekend is now down to a manageable dozen or so a day.

  5. And looking at the contents of the spam filter right now I note that the latest one caught reads:

    “Several of these replies on this post are garbage, You should delete them.”

  6. neo, 1:59 pm —

    I did miss the entire point; I saw where you were going, but I got sidetracked in my search for a coherent explanation. Now I got it, having needed to be hit over the head with it. Thanks for your forbearance.

  7. neural net and genetic algorithms…

    the same tools behind the non player characters in games, expert systems, etc..

  8. Some of these “bots” (and I have to wonder if that is indeed what they are) are getting close to passing-the-Turing-test territory.

    Most interesting.

    Jamie Irons

  9. }}} the non player characters in games,

    I play computer games. Most times, the bots just CHEAT.

    They use knowledge the user doesn’t have, speed the user can’t have, and/or have their “stats” enhanced to make them tougher, “recover” faster, or attack harder… So, if you presume any “character type” has a certain number of “hit points”, ‘z’, which recover at 1/second, and an attack with item ‘x’ causes ‘y’ damage, then the bot will have 1.5z hit points vs a player, recover hit points 2x faster, and/or either do 2y damage or strike 1.5x as often… things like that.

    Game AI has a long way to go to really get close to the singularity… and there’s at least as much impetus to make good game AI as there is to make better spambots.

    My guess would be that the selection scripts have gotten much more sophisticated: say you start with 10,000 snippets of text, cross-correlated on keywords. The bot then looks at the primary post and picks the best snippet based on how many of its keywords appear in that post.

    Neo: Keep in mind that you are actually seeing a potential “observation bias” in your selection, there. You’re noting all the hits, but you have no idea how many misses came from the same bot(s), I’d bet.

    That is — I can’t imagine you scanned the obviously bad ones to see if they pointed to the same sites as some of the good ones.

    A guess, but natural language recognition is really NOT that easy a problem.

    That was, BTW, one of the key things about Watson on Jeopardy… it was actually parsing the questions, not being told what to look for. And that’s a VERY controlled, specific, scenario. And it still got it laughably wrong sometimes, ignoring key words that made the answer provided notably clear — example: Final Jeopardy in the first game:

    Watson was the only contestant to miss the Final Jeopardy! response in the category U.S. CITIES (“Its largest airport was named for a World War II hero; its second largest, for a World War II battle”). Rutter and Jennings gave the correct response of Chicago, but Watson’s response was “What is Toronto????”

    See the wiki entry for guesses as to how it screwed that one up.

    P.S., Watson won it handily, since it only wagered a small amount, and was up quite a bit at that point.

    Now, sounds promising for spambots? Nope:

    Watson is made up of a cluster of ninety IBM Power 750 servers (plus additional I/O, network and cluster controller nodes in 10 racks) with a total of 2880 POWER7 processor cores and 16 Terabytes of RAM. Each Power 750 server uses a 3.5 GHz POWER7 eight core processor, with four threads per core.

    That probably doesn’t mean a lot to you non-techies, but it’s a bit more processing power than any spambot is likely to manage to have.

  10. Actually, Jamie, no — even Watson couldn’t get close to the Turing test. See above for my guess as to what the spambots are actually doing.

  11. The coding for something like this is actually very simple, but it’s also very tedious, and it wouldn’t run very fast. But if the payoff is that they get a post up longer and more people read it…

  12. Baltimoron — yes, you spend a lot more time on creating the database you pull entries from, and adding keywords that don’t appear in its own text.

  13. It wouldn’t surprise me in the least if the new economic reality generated jobs for out of work college grads paid by websites to write commentary that includes links back to the website. Vanderleun’s site was getting strange encrypted types of messages that were only letters and numbers. I was trying to imagine just how bad it must have been for the incoherent ones to get through like that.

  14. Bupkis,

    I figure someone took a dozen or more common subjects for blog postings and identified three or four keywords that could identify each one. Once you’ve categorized a blog, you build a string using a series of if/else statements that have searches for a second set of more specific keywords as their arguments.
    As I said, it would be tedious, but a smart person could write a program that gives a very good approximation of a real post.

  15. I confess. It was me that wrote:

    “Speaking of Zero Mostel, we just threw out a toilet seat he had autographed. It said “Zero Mostel shat here” and had a cartoon sketch of himself. He “created” it when he played my father-in-law’s tent theater, loafing through Tevye, and still bringing the house down.”

    It must have been in a blog comment. Only about 5 people in the world would know about that toilet seat and its history. I think the bot is mining your blog’s comments and matching keywords.

  16. Pat: aha! I think you are correct. Makes sense.

    Sneaky, clever bots. But not so brilliant as I’d thought.
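
The keyword-matching guess that several commenters converge on above can be sketched in a few lines of Python. This only illustrates that guess and is not how any real spambot is known to work: the snippet pool, the `tokenize` and `pick_snippet` names, and the fallback text are all hypothetical, made up for the example.

```python
import re

# Hypothetical snippet pool: canned comments (perhaps mined from earlier
# comment threads), each tagged with keywords that need not appear in its text.
SNIPPETS = [
    {"text": "Zero Mostel was unforgettable as Tevye.",
     "keywords": {"mostel", "tevye", "theater", "fiddler"}},
    {"text": "Watson still cannot pass the Turing test.",
     "keywords": {"watson", "jeopardy", "turing", "ibm"}},
]

# Hypothetical generic comment used when nothing in the pool matches.
FALLBACK = "Great post, thanks for sharing!"


def tokenize(text):
    """Lowercase the text and return the set of words it contains."""
    return set(re.findall(r"[a-z']+", text.lower()))


def pick_snippet(post, snippets=SNIPPETS, fallback=FALLBACK):
    """Return the snippet text whose keywords overlap the post the most."""
    words = tokenize(post)
    # Score each snippet by how many of its keywords appear in the post.
    best = max(snippets, key=lambda s: len(s["keywords"] & words))
    if not best["keywords"] & words:
        # No keyword matched at all, so fall back to a generic comment.
        return fallback
    return best["text"]


if __name__ == "__main__":
    print(pick_snippet("I once saw Zero Mostel play Tevye in a tent theater."))
    print(pick_snippet("My cat sleeps all day."))
```

A selector like this never parses the post; it only counts shared words. That would fit the observation that some comments look uncannily on-topic while many others from the same source miss entirely.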
