The Secret Message within Wikipedia

At the University of Taiwan, researchers are taking message encoding to the next level. Funded by Taiwan’s Ministry of Education under the “Aiming for the Top University” (sic) program, two researchers have found a way to hide secret messages in the revision logs of Wikipedia. Their findings are detailed in the paper “A New Data Hiding Method via Revision History Records on Collaborative Writing Platforms.”

By analyzing revision logs of Wikipedia, the researchers have found that certain word pairs are often switched during revisions. For instance, an author may write “U.S.”, while an editor rewrites it to “United States.” The Taiwanese team then constructed a database of the word replacements, and stored them for use in their algorithm.
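To make the database idea concrete, here is a minimal sketch of how word replacements could be harvested from pairs of revisions. It is not the researchers' code: the function names `extract_replacements` and `build_database` and the sample sentences are made up for illustration, and it simply diffs the two word lists with Python's standard `difflib`.

```python
from collections import Counter
from difflib import SequenceMatcher


def extract_replacements(old_text, new_text):
    """Return (old phrase, new phrase) pairs that were swapped between two revisions."""
    old_words, new_words = old_text.split(), new_text.split()
    matcher = SequenceMatcher(a=old_words, b=new_words, autojunk=False)
    pairs = []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        # "replace" marks a run of old words that was rewritten as a run of new words.
        if tag == "replace":
            pairs.append((" ".join(old_words[i1:i2]), " ".join(new_words[j1:j2])))
    return pairs


def build_database(revision_pairs):
    """Count how often each replacement occurs across many revision pairs."""
    database = Counter()
    for old_text, new_text in revision_pairs:
        database.update(extract_replacements(old_text, new_text))
    return database


# Hypothetical revision pairs standing in for real Wikipedia history.
revisions = [
    ("The U.S. economy grew last year.", "The United States economy grew last year."),
    ("He moved to the U.S. in 1990.", "He moved to the United States in 1990."),
    ("The film was very good.", "The film was excellent."),
]
database = build_database(revisions)
for (old, new), n in database.most_common():
    print(f"{old!r} -> {new!r}: seen {n} time(s)")
```

The counts matter as much as the pairs themselves, since frequent replacements are the ones that look most natural when they show up in a revision.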

Instead of focusing on the actual words in the final article, like most steganographic techniques, the team encodes messages based on the number of words changed during a revision and the words that were changed. For instance, one author may change 5 words, while another may change 8. The author who changed a word, the word they changed it to, and the number of words changed are all combined using a Huffman code to create a binary message.
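The role of the Huffman code can be shown with a simplified sketch. It covers only one of the three ingredients, the choice of replacement word, and assumes a made-up frequency table for the candidates; the paper's actual scheme also folds in the author and the number of changes. Each candidate gets a prefix-free codeword, and the sender picks whichever candidates spell out the secret bits.

```python
import heapq
from itertools import count


def huffman_code(weights):
    """Build a prefix-free code {symbol: bitstring} from symbol weights (2+ symbols)."""
    tiebreak = count()  # unique counter so the heap never has to compare dicts
    heap = [(w, next(tiebreak), {sym: ""}) for sym, w in weights.items()]
    heapq.heapify(heap)
    while len(heap) > 1:
        w1, _, c1 = heapq.heappop(heap)
        w2, _, c2 = heapq.heappop(heap)
        merged = {s: "0" + b for s, b in c1.items()}
        merged.update({s: "1" + b for s, b in c2.items()})
        heapq.heappush(heap, (w1 + w2, next(tiebreak), merged))
    return heap[0][2]


def embed(bits, code):
    """Choose the sequence of symbols whose codewords concatenate to `bits`."""
    by_codeword = {b: s for s, b in code.items()}
    chosen, buffer = [], ""
    for bit in bits:
        buffer += bit
        if buffer in by_codeword:
            chosen.append(by_codeword[buffer])
            buffer = ""
    if buffer:
        raise ValueError("bits do not end on a codeword boundary; pad the message")
    return chosen


def extract(chosen, code):
    """Recover the bits from the observed sequence of symbols."""
    return "".join(code[s] for s in chosen)


# Hypothetical replacement candidates for "U.S." with made-up frequencies.
candidates = {"United States": 5, "America": 2, "USA": 1, "the States": 1}
code = huffman_code(candidates)
secret = "100011"
choices = embed(secret, code)
print(choices)
print(extract(choices, code))
```

Because common replacements receive short codewords, they end up being chosen more often, which keeps the resulting edits close to what a real editor would plausibly do.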

As a result of the encoding, a document with four revisions needs approximately 1.44 words to encode one secret bit. For example, encoding the infamous King Edward II assassination letter, “Eduardum occidere nolite timere bonum est”, with an extra byte for the comma, would take 495 words, or approximately the length of this article.
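The estimate is simple arithmetic, sketched here under two assumptions that are mine rather than the paper's: each character of the message costs 8 bits, and the 1.44 words-per-bit rate applies uniformly. The function name `words_needed` is hypothetical.

```python
def words_needed(message_bytes, words_per_bit=1.44, bits_per_byte=8):
    """Estimate the cover-text words needed to hide a message of the given size."""
    return round(message_bytes * bits_per_byte * words_per_bit)


message = "Eduardum occidere nolite timere bonum est"
payload = len(message.encode("ascii")) + 1  # one extra byte for the comma
print(payload, "bytes ->", words_needed(payload), "words")
print(43, "bytes ->", words_needed(43), "words")
```

Counted this way the phrase plus one comma comes to 42 bytes and an estimate in the 480s, while a 43-byte payload lands at about 495, so the figure quoted above presumably counts one more byte than this sketch does. Either way, the hidden message costs roughly a full article's worth of words.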

In addition to encoding in the English language, secret messages can be encoded in Chinese and other languages. Unfortunately, the system does have several weaknesses.

First, as with most encoding systems, once the technique and secret keys are known, the messages are easy to decode. And because the hidden messages are publicly available and stored for all eternity, as with Wikipedia, all previous messages can be cracked as well.

Next, the presented use case for the algorithm was a little weak. Perhaps in a nod to the Ministry of Education, which funded the grant, the authors imagined a world where teachers would transmit secret messages to parents through a school-hosted wiki, so that students would not know what was said about them. When I went to school, that was called a “parent-teacher conference,” and I usually found out exactly what was discussed afterward from my parents, often accompanied by the disappearance of my video games and TV privileges.

Finally, the actual text replacement database seems suspect, since in the authors’ primary example, the phrase “you are not wrong” was changed to “you is right.”

There are a few interesting next steps, however. For one, this algorithm could be applied to images or video. For instance, tyrannous creative directors could use a host of seemingly sadistic change requests as a way to transfer secret messages to their designers. What would really take the algorithm to the next level, however, is if the authors making the revisions were real authors, making real revisions, and unknowingly part of the plot. This could be accomplished with a rootkit infection on their PC that turns off their computer or pushes the “submit” button at just the right time to encode a secret message in their document.

All in all, the concept is a fun read, although not entirely practical. Possibly the paper bears more significance near China, where government censorship is commonplace and Internet communication is filtered. When totalitarian authorities prevent individuals from expressing their true thoughts and feelings, even a silly concept like this provides a breath of fresh air.

Written by Andrew Palczewski

About the Author
Andrew Palczewski is CEO of apHarmony, a Chicago software development company. He holds a Master's degree in Computer Engineering from the University of Illinois at Urbana-Champaign and has over ten years' experience in managing development of software projects.