William Beutler on Wikipedia

Archive for August 2010

Charted Territory: When Good Infographics Go Bad

on August 12, 2010 at 8:32 pm

I will be blunt: the new infographic from David McCandless (Information is Beautiful), called “Articles of War: Wikipedia’s lamest edit wars”, is so lazy as to be misleading, so glib as to be condescending, and so generally unhelpful that I’m inclined to say it sets back the public understanding of how Wikipedia works all by itself.

Up front: I respect McCandless and like what he does, which includes some interesting and thoughtful work, especially his print of Left vs. Right (U.S. and Rest of the World editions) that is better than most professional political analysts could produce. Separately, I am collaborating with friends on a Wikipedia visualization project of our own, so call me an interested observer, but note also that I’ve been thinking about this kind of thing lately.

I have reproduced only the top section of “Articles of War” below, for the purposes of commentary (click through to see the full thing on McCandless’ site):

Articles of War (excerpt)

The first thing to know about “Articles of War” is that it was based on an essay to be found in the recesses of Wikipedia called “Lamest edit wars” that is specifically kept in the site’s intra-wiki space because, as it states at the top: “This page contains material that is kept because it is considered humorous.” McCandless & Co. do give credit where it is due, but that Wikipedia page surely does not and never did intend to be definitive — it’s just a series of cheekily-written paragraphs about various arguments occurring over time, so there is nothing like meaningful numbers to be gleaned from it.

Instead, McCandless and his researchers decided to generate data to visualize these edit wars by counting the total number of edits over each article’s lifetime: not just the edits related to that particular dispute (a difficult and time-consuming thing to research, it goes without saying) but every single edit, ever. That gives a grossly distorted view of each article’s history. I’ll grant that the legend in the top lefthand corner indicates that the number listed (and, I presume, the size of each box) relates to the “Total no. of edits,” but even if readers do notice that, it is at best confusing.

Likewise, each article’s position on the chart corresponds to when it was created, not when the described dispute took place. If you think 2,000+ edits were expended on a photograph in the Cow-tipping article in the middle of 2001, that’s too bad, but you were reasonably misled. Nor would you know that the article did not include a photograph until several years later.
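To make the distinction concrete, here is a minimal Python sketch of the two ways of counting. Everything in it is an assumption for illustration: the revision list is invented (it is not real Wikipedia data), and the function names and the dispute window are hypothetical. It is not the method McCandless’ researchers used, only a picture of why “every edit ever, plotted at the creation date” differs from “edits made during the dispute, plotted when the dispute happened.”

```python
from datetime import date

# Hypothetical revision history for a single article: (date, edit summary).
# Invented for illustration only; not real Wikipedia data.
revisions = [
    (date(2001, 6, 1), "create article"),
    (date(2003, 2, 10), "expand history section"),
    (date(2006, 4, 2), "add photograph"),
    (date(2006, 4, 3), "remove photograph"),
    (date(2006, 4, 3), "restore photograph"),
    (date(2009, 8, 15), "fix typo"),
]


def total_edits(revs):
    """Count every edit over the article's lifetime."""
    return len(revs)


def edits_in_window(revs, start, end):
    """Count only the edits made between start and end, inclusive."""
    return sum(1 for edit_date, _ in revs if start <= edit_date <= end)


def creation_date(revs):
    """Return the date of the earliest revision."""
    return min(edit_date for edit_date, _ in revs)


# Assumed dispute window (hypothetical): April 2006.
dispute_start = date(2006, 4, 1)
dispute_end = date(2006, 4, 30)

lifetime_count = total_edits(revisions)
dispute_count = edits_in_window(revisions, dispute_start, dispute_end)
created = creation_date(revisions)
```

In this made-up history the lifetime count is 6 and the article dates to 2001, while only 3 edits fall inside the 2006 dispute window. Plotting the first pair of numbers and labeling it with the dispute is the mismatch described above.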

What you are left with is a decent visualization of how frequently edited some randomly selected articles — some popular, some timely, some but not all controversial — happen to be. Why not simply show that? Focusing on this alone we can see that the following articles have attracted tens of thousands of edits over the years:

  • The Beatles
  • Jesus
  • Wikipedia
  • Christianity
  • Ann Coulter
  • Star Wars
  • Wii

That’s not linkbait enough for you? Then please do the research.

Meanwhile, the infographic is also a little too snarky for its own good, especially toward its chosen subject. Color-coding is used to categorize certain types of edit wars; one is labeled “American Cultural Superiority” and exists mainly to identify debates between U.S. and British spellings. Which I find a little… superior itself, but hey, I suppose it’s a misdemeanor violation. Worse is that edit wars involving Wikipedia and site co-founder Jimmy Wales are coded as “Religion.” Too cute. Or maybe just an oversight?

Another oversight concerns an on-wiki debate about whether the most famous Palin was, at the time of its occurrence, Monty Python’s Michael or Alaska’s former governor Sarah. (Since then, I believe the one with decades of contributions to comedy has been definitively usurped by the mavericky one’s more recent, er, contributions.) According to “Articles of War” this happened in 2003. But if you think about it, this makes no sense at all — of course this happened in 2008, when John McCain chose Sarah Palin as his running mate. And the Lamest edit wars essay itself mentions that this happened in 2008. Pure oversight to be sure, but I have to wonder what other mistakes the research team made.

To their partial credit, they have opened their Google Spreadsheets for public inspection, so it’s clear they at least intended to impart real information. And there you can see that they are indeed using the total number of edits over time and that their “Palin” error was made early on. That seems to put the responsibility on the researchers, rather than McCandless himself, but of course it’s a total package.

I hold McCandless to a standard that I don’t hold the jokers at Cracked* or Something Awful to, because their job is to make you laugh, while McCandless’ job, according to his website’s own tagline, is to take “issues, ideas, knowledge, data” and make them easier to understand by visualizing them. There are certainly issues and ideas to be found in “Articles of War,” but knowledge and data, not so much. And though I am getting a little more rant-y than usual about this, I do aim to be constructive, so I would very much like to see this infographic re-done with some extra research. This blog post may serve as a guide if they so choose. I hope they do.

P.S. The Gizmodo thread on this (where I found it) is hilarious, with many people re-fighting the same disputes that once arose on Wikipedia. However, only one commenter that I saw came anywhere near noticing that the methodology was suspect.

P.P.S. Am I being nitpicky to add that “Articles of War” appears to convey that Wikipedia’s articles about The Beatles and Jesus were created prior to 2001? That is to say before Wikipedia itself began? I don’t actually think so.

*Actually, about Cracked (a.k.a. Digg’s favorite website): as I have seen a prominent Wikipedian point out elsewhere, it often does a pretty good job using information from Wikipedia responsibly. Among their articles about Wikipedia, the title of “5 Terrifying Bastardizations of the Wikipedia Model” alone gives away that it’s implicitly pro-Wikipedia, as does “5 Celebrity Wikipedia Entries they Clearly Wrote Themselves”. Even “8 Most Needlessly Detailed Wikipedia Entries” knows what’s good about Wikipedia, even when it isn’t. Cracked writers clearly know their way down through a history page (like, say, Corey Feldman’s), but it doesn’t appear that McCandless and his researchers looked as closely.

They Send You a Cease and Desist Letter, You Send One of Theirs to the Morgue

on August 4, 2010 at 6:46 am

Apparently the Federal Bureau of Investigation, the nation’s top cops, the G-Men, the public enemies of all public enemies, have found a new target: Wikipedia! The New York Times ran a short article yesterday about a funny-if-it-wasn’t-serious situation whereby the FBI recently sent a letter to the San Francisco offices of the Wikimedia Foundation

demanding that it take down an image of the F.B.I. seal accompanying an article on the bureau, and threatened litigation: “Failure to comply may result in further legal action. We appreciate your timely attention to this matter.”

But the Foundation won’t budge:

The problem, those at Wikipedia say, is that the law cited in the F.B.I.’s letter is largely about keeping people from flashing fake badges or profiting from the use of the seal, and not about posting images on noncommercial Web sites. Many sites, including the online version of the Encyclopedia Britannica, display the seal.

Other organizations might simply back down. But Wikipedia sent back a politely feisty response, stating that the bureau’s lawyers had misquoted the law. “While we appreciate your desire to revise the statute to reflect your expansive vision of it, the fact is that we must work with the actual language of the statute, not the aspirational version” that the F.B.I. had provided.

The relevant statute, helpfully linked by the New York Times, states:

§ 701. Official badges, identification cards, other insignia

Whoever manufactures, sells, or possesses any badge, identification card, or other insignia, of the design prescribed by the head of any department or agency of the United States for use by any officer or employee thereof, or any colorable imitation thereof, or photographs, prints, or in any other manner makes or executes any engraving, photograph, print, or impression in the likeness of any such badge, identification card, or other insignia, or any colorable imitation thereof, except as authorized under regulations made pursuant to law, shall be fined under this title or imprisoned not more than six months, or both.

I do find it ironic, considering that Wikipedia and other projects administered by its parent organization are among the most scrupulous on the whole of the Internet about respecting copyright law.

In most circumstances, Wikipedia requires that images used on the site be in the public domain or released under a free license explicitly permitting such use. Only where there is no hope a suitable alternative may be available does the site allow copyrighted images, and even then under very limited circumstances. If you want to use the Nike swoosh on your user page or the article about Michael Jordan, no such luck, but you will certainly find it on the company’s corporate profile.

The FBI seal, as a work of the United States government, falls under the first category — it is considered public domain — but its use is nevertheless limited to pages about certain FBI-specific subjects. And the photo’s page on the Wikipedia server even includes this helpful advisory:

[Image: fbi_logo_wikipedia_licensing]

With no sources inside The House J. Edgar Hoover Built, I’m puzzled as to why they would do this. Perhaps they got the site confused with WikiLeaks?