Month: September 2016

Markov Spam

After a long hiatus from blog posts, I returned to laugh at the comments on some of my posts, most notably:


Unless crimsonchilla is translating from Holikachuk to Klingon to Erebonian to English (I want to borrow his translator if he is!), I'm pretty sure this is spam.


(That Monty Python skit is, by the way, the origin of the word 'spam' meaning undesirable mass-generated messages. Look it up if you don't believe me!)


But what's more interesting than the hilarious words in my comment is ... how sensible it is. The sentence structure is elaborate, and the sentences almost make sense. How do spammers automatically create such elaborate spam? Math, of course!


Spam messages like this one are created one word at a time by a probabilistic process called a Markov chain. Markov chains are simple and remarkably versatile tools, with the key property that what happens next depends only on the current state. What this means is that once the spam generator has picked some words to write, the next word is picked at random, with probabilities that depend only on the previous word. This explains why my spam message has so many sensible phrases, such as "electric cords", "athlete's foot", "litter box", and "boa constrictor". These are incredibly common word pairings in the English language, so once the message has printed "litter", there's a very good chance the next word will be "box" and not "cords". Feed your Markov chain algorithm tons of English text and it will "read" them, then be able to construct sentences with sensible word pairings, if not a sensible overall message.
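
To make that concrete, here is a minimal sketch of a word-level Markov chain in Python. It is not the code any actual spammer uses; the function names (build_chain, generate) and the tiny corpus are made up for illustration, and a real generator would be fed far more text.

```python
import random
from collections import defaultdict


def build_chain(text):
    """Map each word to the list of words that follow it in the text.

    Repeated followers are kept on purpose, so a word that follows
    often is proportionally more likely to be picked later.
    """
    chain = defaultdict(list)
    words = text.split()
    for current, following in zip(words, words[1:]):
        chain[current].append(following)
    return dict(chain)


def generate(chain, start, length, seed=None):
    """Walk the chain: each next word depends only on the current word."""
    rng = random.Random(seed)
    word = start
    output = [word]
    for _ in range(length - 1):
        followers = chain.get(word)
        if not followers:  # dead end: this word was never followed by anything
            break
        word = rng.choice(followers)
        output.append(word)
    return " ".join(output)


# Hypothetical toy corpus, just big enough to show the word pairing effect.
corpus = ("the cat sat in the litter box and the boa constrictor "
          "chewed the electric cords near the litter box")
chain = build_chain(corpus)
print(generate(chain, "the", 12, seed=1))
```

In this toy corpus "litter" is only ever followed by "box", so the generator can never write "litter cords", even though it has no idea what either word means.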


An excerpt of a mock Shakespeare passage written using this technique, from Princeton:

I care not for meed!
	This I must woo yours: your request than your father: the time,
	That ever love I broke
	my sword upon some kind of men
	Then, heigh-ho! sing, heigh-ho! sing, heigh-ho! sing, heigh-ho! unto the needless stream;
	'Poor deer,' quoth he,
	'Call me not so keen,
	Because thou the creeping hours of the sun,
	As man's feasts and women merely players:
	Thus we may rest ourselves and neglect the cottage, pasture?

If I had been bored and reading that in 11th grade, I might not have noticed the difference!

One famous parody built with this technique is Josh Millard's Garkov, which auto-generates Garfield comics.


Interestingly, the tool that generates these spam messages is the same tool used to detect and prevent them! Markov chains are also built to weed out spam and protect your inbox; see, for example, this undergraduate's poster: Durham. This turns the scene into a Butter Battle arms race between spammers and spam blockers (splockers?), with each side trying to further its revenues.
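
I can't say exactly how the poster's filter works, but one simple way to use a Markov chain for detection is to train one chain on known spam and one on normal mail, then ask which chain makes a new message look more likely. Here is a rough sketch of that idea, assuming add-one smoothing for word pairs never seen in training. The function names and the toy training messages are invented for illustration.

```python
import math
from collections import Counter, defaultdict


def train(messages):
    """Count how often each word is followed by each other word."""
    counts = defaultdict(Counter)
    for message in messages:
        words = message.lower().split()
        for current, following in zip(words, words[1:]):
            counts[current][following] += 1
    return counts


def log_likelihood(counts, message, vocab_size):
    """Log-probability of the message's word pairs under one chain.

    Add-one smoothing (an assumption of this sketch) keeps unseen
    pairs from getting probability zero.
    """
    words = message.lower().split()
    total = 0.0
    for current, following in zip(words, words[1:]):
        followers = counts.get(current, Counter())
        seen = followers[following]
        total += math.log((seen + 1) / (sum(followers.values()) + vocab_size))
    return total


def looks_like_spam(message, spam_counts, ham_counts, vocab_size):
    """Flag the message if the spam chain explains it better."""
    spam_score = log_likelihood(spam_counts, message, vocab_size)
    ham_score = log_likelihood(ham_counts, message, vocab_size)
    return spam_score > ham_score


# Hypothetical toy training data; a real filter needs thousands of messages.
spam = ["win free money now", "free money click now"]
ham = ["see you at lunch tomorrow", "the meeting is at noon tomorrow"]
vocab_size = len({w for m in spam + ham for w in m.lower().split()})
spam_counts, ham_counts = train(spam), train(ham)

print(looks_like_spam("free money now", spam_counts, ham_counts, vocab_size))
print(looks_like_spam("lunch at noon tomorrow", spam_counts, ham_counts, vocab_size))
```

With this toy data the first message is flagged and the second is not, because each one reuses word pairs that only one of the two chains has seen.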


For more details about Markov chains and spam, see Markov Chains and Spam

If you just want to have some fun with it yourself, you can jump in without any coding at: Demo


Until next time, Consider their curls got entangled together at one of these cakes!