This has to be the most pointless science “experiment” that I have ever come across. The article appeared in today’s print edition of The Times of India; the original report is by the researcher, Jesse Anderson. The claim is that a bunch of computer-simulated “monkeys” have typed all of Shakespeare’s works, something that is widely claimed to be theoretically possible.
The reason it annoys me is that probability theory is already confusing enough, not just to lay people but even to experts, that there is no need for headlines like this to mess things up more. The probability of monkeys, typing insanely fast, reproducing a single page of Shakespeare accurately — let alone his entire oeuvre — is vanishingly small. If it is not likely to happen in the age of the universe, it is fair to say that it is impossible. This equally applies to virtual monkeys, on present-day computers (and any imaginable future computers).
And, if you read the TOI article, it turns out that this is not what is happening. The virtual monkeys are generating random text. Any sequence of 9 characters that happens to appear in Shakespeare is deemed to be “correct”. Once all “9-mers” in a Shakespeare work have been typed (in arbitrary order), that work is deemed to be complete.
Let’s simplify things and reduce the Shakespeare works to the uppercase and lowercase letters; the ten digits; the space; and nine punctuation marks (single and double quotes, full stop, comma, semicolon, colon, dash, question mark, exclamation point). That gives us 72 characters. How many 9-mers can be constructed of these characters? The answer is 72^9 = 51,998,697,814,228,992. If the monkeys typed a million characters a second, they would need 1648 years to reproduce a single string of 9 characters.
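The arithmetic above can be checked in a few lines of Python. This is only a rough sketch: it assumes that every keystroke completes a new 9-character attempt (so a million attempts per second) and that a year is 365.25 days. The variable names are made up for illustration.

```python
# 72-character alphabet: 52 letters + 10 digits + 1 space + 9 punctuation marks.
ALPHABET_SIZE = 52 + 10 + 1 + 9
K = 9  # length of each string (a "9-mer")

# Assumption: one 9-character attempt per keystroke, a million keystrokes a second.
ATTEMPTS_PER_SECOND = 1_000_000
SECONDS_PER_YEAR = 365.25 * 24 * 3600  # assumption: 365.25-day year

total_kmers = ALPHABET_SIZE ** K
years = total_kmers / ATTEMPTS_PER_SECOND / SECONDS_PER_YEAR

print(total_kmers)   # 51998697814228992
print(round(years))  # 1648
```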
So how do Anderson’s “monkeys” do it? By simplifying even further. Anderson considers only the 26 lowercase letters and no punctuation (not even spaces). Then there are 26^9 ≈ 5.4 trillion possible 9-mers, a feasible number to explore exhaustively, which is all his monkeys are doing. Every time a 9-mer “agrees” with a 9-mer in Shakespeare, it is deemed a “hit”, and a Shakespeare work is deemed reproduced if it is entirely covered in “hits”.
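To make the “hit” criterion concrete, here is a toy sketch of it in Python. This is not Anderson’s code: the function names `kmers` and `monkey_run` are hypothetical, the target is a single line rather than a whole play, and it uses 2-mers instead of 9-mers so that it finishes in a moment.

```python
import random
import string

def kmers(text, k):
    """Return the set of all length-k substrings of the lowercased, letters-only text."""
    letters = "".join(c for c in text.lower() if c in string.ascii_lowercase)
    return {letters[i:i + k] for i in range(len(letters) - k + 1)}

def monkey_run(target, k, rng, max_attempts=1_000_000):
    """Type random k-letter strings until every k-mer of target has been hit.

    Returns the number of attempts used, or None if max_attempts runs out.
    """
    remaining = kmers(target, k)
    for attempt in range(1, max_attempts + 1):
        guess = "".join(rng.choice(string.ascii_lowercase) for _ in range(k))
        remaining.discard(guess)  # a "hit" removes that k-mer from the to-do set
        if not remaining:
            return attempt  # the text now counts as "reproduced"
    return None

rng = random.Random(0)  # fixed seed so the run is repeatable
line = "To be, or not to be"
print(sorted(kmers(line, 2)))
print(monkey_run(line, 2, rng))
```

Note that the order in which the hits arrive plays no part in the result; the text counts as done as soon as the set of outstanding k-mers is empty.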
In a little over a month, over 5 trillion of these 5.4 trillion 9-mers have been reproduced by the monkeys. Why 9-mers? Obviously to make it interesting. On the same computers, all possible 8-mers would have been produced in about 1-2(*) days, hardly very newsworthy. (And, to take a trivial example, all possible 1-mers or 2-mers would have taken a few milliseconds.) All possible 10-mers would have taken a couple of years(*); perhaps the media would have lost interest, or perhaps the computer time would have been too expensive.
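These estimates come from simple scaling: each extra letter multiplies the number of possible strings, and hence the running time, by 26. A rough sketch, assuming the 9-mer run takes 30 days in total (my own round figure standing in for “a little over a month”) and using a hypothetical helper `estimated_days`:

```python
ALPHABET_SIZE = 26
BASE_K = 9
BASE_DAYS = 30.0  # assumption: all 9-mers take about 30 days on the same computers

def estimated_days(k):
    """Scale the 9-mer running time by the ratio of the two search-space sizes."""
    return BASE_DAYS * ALPHABET_SIZE ** (k - BASE_K)

for k in (7, 8, 9, 10):
    print(k, ALPHABET_SIZE ** k, estimated_days(k))
```

Under that assumption, the 8-mers come out at a little over a day and the 10-mers at roughly two years.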
Having produced each one of the 5.4 trillion possible sequences of 9 letters, the monkeys will, by the author’s definition of “reproduced”, have reproduced not only all of Shakespeare, but all of the literature ever written in the English language (and other languages in the Roman script) since the beginning of time, and done that in barely a month. And if Anderson had chosen 7-mers instead of 9-mers, it would have taken only an hour or two. And by typing “a b c d e f g h i j k l m n o p q r s t u v w x y z”, I have reproduced all of Shakespeare in 1-mers: just strike off every character there against Shakespeare’s folio, ignoring case, space, punctuation and all non-letter symbols, and see what is left.
The only thought that occurs to me is — what a waste of computer resources.
(*)edit — these numbers corrected from first draft