Author: Anthony Bonifonte

Math Comics Part 1: Saturday Morning Breakfast Cereal

I like math.  And I like humor.  Especially humor that involves math.  So, this is my inaugural post highlighting some of my favorite math based webcomic strips.  I'll start with my personal favorite, Saturday Morning Breakfast Cereal.  There will be a lot of these, as the strip is consistently clever and witty (despite there being over FOUR thousand strips, dating back to 2012.)  Since I have so many I enjoy, I'll start with just those from 2016.  If you've taken a class with me, you've seen some of these on your homework.

The Boy Who Cried Wolf

Math Education

That's Why I'm a Frequentist

What Researchers Study

Markov Spam

After a long hiatus from blog posts, I returned to laugh at the comments on some of my posts, most notably:

spam-comment

Unless crimsonchilla is translating from Holikachuk to Klingon to Erebonian to English (I want to borrow his translator if he is!), I'm pretty sure this is spam.

02may-monty-python-spam-sketch

(That Monty Python skit is, by the way, the origin of the word 'spam' meaning undesirable mass generated messages.  Look it up if you don't believe me!)

 

But what's more interesting than the hilarious words in my comment is ... how sensible it is.  The sentence structure is elaborate, the sentences almost make sense.  How do they automatically create such elaborate spam?  Math, of course!

 

Spam messages are created one word at a time with a probability process called Markov chains.  Markov chains are simple and remarkably versatile tools, with the key property that future development depends only on the current state.  What this means is once the spam has picked some words to write, the next word is randomly picked depending only on the previous word.  This explains why my spam message has so many sensible phrases, such as "electric cords", "athlete's foot", "litter box", and "boa constrictor".  These are incredibly common coupled phrases in the English language, so once the message has printed "litter", there's a very good chance the next word will be "box" and not "cords".  Feed your Markov chain algorithm tons of English text, it will "read" them, and be able to construct sentences with sensible word pairing, if not overall message.

 

An excerpt of a mock Shakespeare passage written using this technique, from Princeton :

I care not for meed!
	This I must woo yours: your request than your father: the time,
	That ever love I broke
	my sword upon some kind of men
	Then, heigh-ho! sing, heigh-ho! sing, heigh-ho! sing, heigh-ho! unto the needless stream;
	'Poor deer,' quoth he,
	'Call me not so keen,
	Because thou the creeping hours of the sun,
	As man's feasts and women merely players:
	Thus we may rest ourselves and neglect the cottage, pasture?

If I was bored and reading that in 11th grade, I may not have noticed the difference!

One famous parody of this technique is Josh Millard's Garkov, which auto-generates Garfield comics: Garkov

 

Interestingly, the tool that generates these spam messages is the same tool used to detect and prevent them!  Markov chains are built to weed out spam and protect your inbox,  see for example this undergraduate's poster:  Durham This turns the scene into a butter-battle arms race between spammers and spam blockers (splockers?) to further their revenues.

 

For more details about Markov chains and spam, see Markov Chains and Spam

If you just want to have some fun with it yourself, you can jump in without any coding at : Demo

 

Until next time, Consider their curls got entangled together at one of these cakes!

 

Elevator Optimization

Attending a conference recently, I stayed on the 6th floor of a hotel that had 8 floors.  2 elevators scuttled visitors to their destinations, although 1 was closed for renovation.  Despite the elevator travelling at a good speed, I noticed some long waits for the elevator to arrive.  I also noticed an inefficient operating plan: when the elevator deposited me on my 6th floor, it would stop and wait there until called again.  As a mathematician and efficiency expert is bound to do, this got me thinking: surely it would be more efficient for the idle elevator to move nearer to floors that it was more likely to be called from.  Exactly half the elevator calls must originate from the lobby, so shouldn't the elevator's resting state be much closer to the ground floor so as to minimize waiting times?  As Professor Farnsworth said, "I'm afraid we're going to have to use ... math!"

 

 A Stochastic Model of Elevators

Assume the structure in question (hotel, skyscraper, etc.)  has 1 ground floor (or "lobby") and m additional floors.  For simplicity, assume an equal amount of traffic visits each of the m floors.  Therefore, exactly half the elevator calls will originate in the lobby, and the remaining calls will be spread uniformly across the m floors. Furthermore, for simplicity assume the elevator travels at unit speed (one floor per time unit). Under the elevator operation method described in the introduction, which I shall call the "lazy model", when idle the elevator will stay where it is. Let us first consider the case of a single elevator:  If we let X denote which floor the elevator is on, where X takes values in {1,..m+1}, the pdf of X (equivalently, where the elevator is called from) is given by

f(x) = \begin{cases} \frac{1}{2}, & \text{ for } x=1; \\\frac{1}{2m}, & \text{ for } 2 \leq x \leq m+1 \end{cases}

If a guest at floor x calls the elevator, and the elevator is currently at floor y, the guest must wait |x-y| time units.  Therefore a guest at floor x waits, in expectation,

W_x=\sum_{y=1}^{m+1} |x-y| f(y) = \sum_{y=1}^{x-1}(x-y)f(y) + \sum_{y=x+1}^{m+1}(y-x)f(y)

For the lobby (x = 1), this simplifies to

W_1=\sum_{y=2}^{m+1}(y-x)\frac{1}{2m}=\frac{1}{2m} \sum_{a=1}^m a = \frac{m+1}{4}

And for other x,

W_x=(x-1)\frac{1}{2}+\sum_{y=2}^{x-1}(x-y)\frac{1}{2m}+\sum_{y=x+1}^{m+1}(y-x)\frac{1}{2m}

=(x-1)\frac{1}{2}+\sum_{a=1}^{x-2}a+\sum_{a=1}^{m+1-x}a

=\frac{2x^2-6x+\color{red}{m^2+m+4}}{\color{red}{4m}}

Don't worry, this expression is not as complicated as it looks, and for a constant building size, all the red terms are a constant.  Note that the wait time is quadratic in x.  To examine the expected average wait time of a guest, we calculate

E[W]=\sum_{x=1}^{m+1}W_xf(x)=\frac{1}{2}W_1+\frac{1}{2m}\sum_{x=2}^{m+1}W_x

=\frac{1}{3}m+\frac{1}{4}-\frac{1}{12m}

So the expected average wait time from calling an elevator until its arrival is linearly increasing in m.  If m=5 the average wait is 1.9 time units; if m=10 the average wait is 3.575 time units; and if m=50 the average wait is 16.915 time units.  But can we do better?

 

Minimizing wait time

An optimal solution is for the elevator to relocate itself to the expected spot it will be called from.  This expected call is

E[f(x)]=\frac{1}{2} + \frac{1}{2m}\sum_{i=2}^{m+1}i = \frac{5}{4} + \frac{m}{4}

Note that at this spot, the elevator is not in general at an integer floor!  That is, the elevator may stop halfway between floors.  This will slightly complicate the analysis.  We shall denote the wait under these conditions at floor x by W_x^{'}.  Let \bar{x}=\sup\{x:x\leq\frac{5}{4}+\frac{m}{4},x \in Z\} denote the highest floor at or below the elevator.  Then

W_x^{'}=\begin{cases} \frac{5}{4}+\frac{m}{4}-x, & \text{ for } x\leq\bar{x}; \\x-\frac{5}{4}-\frac{m}{4}, & \text{ for } x > \bar{x} \end{cases}

Compared to the previous case, wait time is only linear in x.  Calculating the expected average wait time of a guest:

E[W^{'}]=\sum_{x=1}^{m+1}W_x^{'}f(x)=\frac{1}{2}W_1^{'}+\frac{1}{2m}\sum_{x=2}^{\bar{x}}W_x^{'}+\sum_{x=\bar{x}+1}^{m+1}W_x^{'}

=\frac{1}{2}(\frac{1}{4}+\frac{m}{4})+\frac{1}{2m}\sum_{x=2}^{\bar{x}}(\frac{5}{4}+\frac{m}{4}-x)+\frac{1}{2m}\sum_{x=\bar{x}+1}^{m+1}(x-\frac{5}{4}-\frac{m}{4})

If m=5 the average wait is 1.6 time units; if m=10 the average wait is 3 time units; and if m=50 the average wait is 14.25 time units.  This represents a 16% improvement in expected wait time over the naive approach.  This may not seem like a lot, but is a significant savings for no extra cost by simply reprogramming the elevators!  Next time, I will examine the case of multiple elevators.

 

Lottery Math

Mention the word "lottery" around any mathematician, and you're likely to hear a disparaging outcry to the effect of "lotteries are a tax on the math illiterate!"  But can it ever make sense to buy a ticket?  Let's use some concepts from basic probability to examine this question.

 

For this analysis, I will be examining Mega Millions, a popular national lottery played in 43 states.  In the game, you pick 5 numbers (consistent with their description, I shall call these "white numbers") from 1 to 75 without replication and one number (the "yellow number") from 1 to 15.  A ticket costs $1, and the goal is to match as many numbers as possible to the random numbers drawn and declared winners (How they ensure the true randomness of these winners is a post for another day).  From the Mega Millions website, the payout structure is as follows:

White matched Yellow matched Prize
5 1 Jackpot
5 0 $1000000
4 1 $5000
4 0 $500
3 1 $50
3 0 $5
2 1 $5
1 1 $2
0 1 $1

I will analyze the jackpot and non-jackpot prizes separately, starting with the latter.

 

Non-jackpot Prizes

To begin, consider the case where a player chooses only a single white number and a single white number is declared the winner.  Of course, the probability the chosen number matches the winning number is \frac{1}{75}.  For the general case in which k white numbers from 1 to n are chosen/drawn, using basic counting principles from combinatorics, the number of possible outcomes in which numbers are selected correctly is  {k \choose c}{n-k \choose k-c}. In particular, for our problem we have n = 75, k = 5.  Independently, we have to consider the yellow number (of which there is 1 outcome in which it is matched, and 14 outcomes in which it is no matched).  Dividing by the total number of outcomes {75 \choose 5}{15 \choose 1}, the probability of selecting  correct white numbers and the yellow number is

\frac{{5 \choose c}{70 \choose 5-c}}{{75 \choose 5}{15 \choose 1}}

and the probability of selecting c correct white numbers but not the yellow number is 

\frac{14{5 \choose c}{70 \choose 5-c}}{{75 \choose 5}{15 \choose 1}}

.

These probabilities are calculated for every value of c in the table below.  To calculate the expected value of the non-jackpot prizes, we multiply the probabilities of each outcome by the prize for that outcome, and sum across all outcomes.

White matched Yellow matched Prize Chances of Outcome Expected Value of Outcome
5 0 $1000000 1 in 18492204 $0.0541
4 1 $5000 1 in 739688 $0.00676
4 0 $500 1 in 52835 $0.00946
3 1 $50 1 in 10720 $0.00466
3 0 $5 1 in 765.7 $0.00653
2 1 $5 1 in 472.9 $0.01057
1 1 $2 1 in 56.5 $0.0354
0 1 $1 1 in 21.4 $0.0467
Sum $0.174

So, the expected value from non-jackpot prizes on a $1 ticket is 17 cents.  Hrm, not much reason to buy a ticket.  Let's see if we can do any better with the jackpot.

 

Jackpot Prizes

For our expected value of the ticket to equal $1, we must recover 83 cents ($1 - 17 cents expected from non-jackpot prizes). An immediate calculation in the same style as the previous section reveals the chances of winning the jackpot by matching all 5 white numbers and the yellow number are 1 in 258,890,850. To achieve the 83 cents, the jackpot needs to be ~$215 million.  You might be tempted to think that for jackpots over this $215 million, your total expected value is greater than $1, so you should buy a ticket.  Not so fast!

The complication is that multiple entries can match all 6 numbers and be declared winners.  In such a situation, the pot is split evenly between all winners.  From anecdotal evidence, more people play when the jackpot goes higher, thus reducing your chances of being the only winner if you do win.  To calculate these probabilities, we need some data on the number of entries to the lottery.  The image below shows a scatter plot of the tickets purchased against jackpot prize for 12/03/13-2/18/14, data courtesy of www.lottoreport.com.

Prize vs Tickets

 

This data follows a near perfect exponential curve.  Interpolating from the plot, this means we expect ~43,000,000 tickets to be purchased at the ~$215 million jackpot level.  Using the probability density function of the binomial distribution, we know that the probability of obtaining k successes from trials each with probability is given by  {n \choose k}p^k(1-p)^{n-k} Thus, if you win the jackpot, the probability you are the only winner is ~0.847, the probability there is one other winner is ~0.141, two winners ~0.0117, three winners ~0.000647, four winners ~0.0000269, and insignificant for five or more winners.  Thus we have to modify the expected value of our jackpot winning by these probabilities, and the actual expected value is ~$198 million, below our cutoff to declare purchasing a ticket a rational decision.  If we repeat this analysis for a ~$250 million prize, we interpolate ~51,000,000 tickets are purchased at this jackpot level.  In this case, we find the discounted expected value is ~$227 million, sufficient to declare purchasing a ticket a rational decision!

 

The Conclusion

Regardless of the jackpot, buying a $1 lottery ticket has an expected value of $0.17 in non-jackpot prizes.  If the jackpot prize is ~$250 million, based on interpolating the number of tickets purchased following an exponential curve, we have that the expected value of the jackpot prize is ~$0.87.  If the prize goes higher, the increase in ticket sales will not be enough to lower the expected value of the jackpot prize.  So, I declare if the jackpot is $250 million or more, it is a rational decision to buy a lottery ticket.  Eat that, fellow mathematicians!