In the late 1940s Alan Turing wrestled with the thorny problem of how we’d recognize (computer) intelligence if it walked up and bit us. His answer was his famous Turing Test, which essentially takes the position that if a computer program can consistently fool us into thinking it’s human, then we have to agree that it is intelligent.
Some clever ducks at MIT have turned this on its head to demonstrate the lack of intelligence among certain groups of humans. Three MIT grad students wrote a program (SCIgen – An Automatic CS Paper Generator) which:
…is a program that generates random Computer Science research papers, including graphs, figures, and citations. It uses a hand-written context-free grammar to form all elements of the papers. Our aim here is to maximize amusement, rather than coherence.
The papers so generated are filled with long, vaguely coherent phrases of buzzwords and jargon, as can be seen from this splendid abstract:
Many physicists would agree that, had it not been for congestion control, the evaluation of web browsers might never have occurred. In fact, few hackers worldwide would disagree with the essential unification of voice-over-IP and public-private key pair. In order to solve this riddle, we confirm that SMPs can be made stochastic, cacheable, and interposable.
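That “hand-written context-free grammar” is the whole trick. SCIgen’s real grammar is far bigger (and I haven’t looked at its source), but a toy sketch of the technique goes something like this: keep replacing non-terminal symbols with randomly chosen productions until only words are left. The rules below are made up purely for illustration.

```python
import random

# Toy context-free grammar in the spirit of SCIgen. These productions are
# invented for illustration; the real SCIgen grammar is hand-written and
# far larger.
GRAMMAR = {
    "SENTENCE": [
        ["Many", "RESEARCHERS", "would agree that", "CLAIM"],
        ["In order to solve this riddle, we confirm that", "CLAIM"],
    ],
    "RESEARCHERS": [["physicists"], ["hackers worldwide"], ["cyberneticists"]],
    "CLAIM": [
        ["SYSTEM", "can be made", "ADJ", "and", "ADJ"],
        ["the evaluation of", "SYSTEM", "might never have occurred"],
    ],
    "SYSTEM": [["SMPs"], ["web browsers"], ["voice-over-IP"]],
    "ADJ": [["stochastic"], ["cacheable"], ["interposable"], ["trainable"]],
}

def expand(symbol):
    """Expand a symbol: non-terminals pick a random production and recurse,
    anything not in the grammar is treated as a terminal and returned as-is."""
    if symbol not in GRAMMAR:
        return symbol
    production = random.choice(GRAMMAR[symbol])
    return " ".join(expand(s) for s in production)

if __name__ == "__main__":
    print(expand("SENTENCE") + ".")
```

Run it a few times and you get pleasantly vacuous sentences like “Many hackers worldwide would agree that SMPs can be made stochastic and interposable.” Pile on enough rules for sections, figures, and citations and you have a whole paper.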
On its own, this would be reasonably clever, the nerds’ equivalent of a nifty party trick. These folks, however, took it to the next level and actually submitted two SCIgen papers to conferences for review. They cleverly picked on WMSCI 2005, where one can assume the reviewing standards can’t be too high given that they had 2904 papers at their Orlando conference last year! (How in the world can they possibly provide a comprehensive review of that many papers?!? The working assumption is that the conference exists primarily as a money-making venture for its organizers and a vacation opportunity for its participants.)
They in fact succeeded in bamboozling WMSCI and got one of their SCIgen papers accepted! The SCIgen folks immediately started celebrating their victory and raising funds to attend the conference, where they intended to give a randomly generated talk. WMSCI has, not surprisingly, pulled their paper after learning of its provenance. Their web site has this lengthy response to the situation, with the primary “justification” being that the paper was accepted as a non-reviewed paper. Apparently none of the reviewers assigned this paper actually responded, and the organizers seem to have a policy of accepting papers that receive no reviews.
They justify this at some length by going on about the value of not having reviews and letting posterity and reference counts sort it all out. This is an idea that has some merit, and plays a key role here in the world of blogs, where there is no peer review before the fact (anyone can publish whatever they want). In this case, though, it comes off as a fairly lame excuse, especially since they tried to get reviews and just failed to receive any.
Apparently a letter from a conference organizer included this remarkable statement:
I am not sure how unethical are these bogus submissions, and if there is some way to detect all of them in a large conference.
Leaving the ethics aside for the moment, I would think that if one’s reviewing process was unable to successfully identify wackiness of this order, then there are fairly serious problems. I’ve been marking calls for submissions from these conferences as junk mail for several years now, and this certainly makes me feel better about that :).
All this reminds me of a cool CACM article from 1997 entitled “The Ultimately Publishable Computer Science Paper for the Latter ’90s: A Tip for Authors”. The conclusion from that gem was:
Blah blah blah Internet. Blah blah blah blah. blah blah blah blah Web. Blah blah. Blah blah Java. Blah blah blah blah. Yadda yadda yadda.
A key difference, however, is that the CACM editors understood the joke when they accepted that article for publication.
Emily started all this by pointing me at this CNN article. Thanks!
The SCIgen folks will even allow you to generate your own papers. Sub-Evil Boy and I now have a paper entitled The Influence of “Smart” Technology on Algorithms co-authored with Albert Einstein, Paul Erdös, and Isaac Newton. I now have an Erdös number of 1 – Hurrah!
So, if one pulls suspicious phrases from the paper and googles them, will they lead you to a slashdot post or the like? Where DID they get the phrases that they put in?
This is way better than the insight a member of my Senior Marketing project team had. She had it on good authority that the prof did most of her project grading with a ruler. We gathered lots of charts and clippings and survey data, then I spent a Sunday weaving some appropriate prose around the whole mess. It measured almost 40 mm thick, and we got an A for our insightful analysis of the Oregon wine industry’s marketing. We also did plenty of sampling.
Sheer genius. I have written software to randomly generate web sites – as an amusing experiment. I have found that, if tweaked toward a topic, the sites are difficult to prove made up. Of course, they do contribute to the data pollution which is the vast majority of the internet :(