William Beutler on Wikipedia

Search for “wikigroaning”

Verifiability and Truth: What John Siracusa Doesn’t Get About Wikipedia

Tagged as , , , , , , , , ,
on February 2, 2012 at 6:50 pm

One of my favorite podcasts is Hypercritical, co-hosted by and principally featuring the thoughtful criticisms of John Siracusa, a sometime columnist for Ars Technica and Internet-famous Apple pundit. The show’s tagline calls it: “A weekly talk show ruminating on exactly what is wrong in the world of Apple and related technologies and businesses. Nothing is so perfect that it can’t be complained about.” Last week’s edition—“Marked for Deletion”—was about something far from perfect, but of great interest to this blog: Wikipedia.

If you want to listen for yourself, jump to about 1:11:55 (yes, more than an hour into the show) where Siracusa and co-host Dan Benjamin turn the discussion to Wikipedia. And a warning: this is going to be long. Consider it homage.

♦     ♦     ♦

Promisingly, Siracusa begins by asking his co-host to answer, if he can, “what Wikipedia is”. The answer is pretty good for an outsider: it’s a place for sharing information and collaboratively building a resource for (hopefully) accurate information on almost any topic. In general, this will do. But it’s not quite right, as Siracusa explains by recounting his personal experience of trying, in vain, to defend an article from deletion. With five years to reflect on it, Siracusa describes his efforts as a “prototypical example of someone who does not understand what Wikipedia is, proving that he does not understand what Wikipedia is.”

All of this is a way of getting to Siracusa’s fascination—one might say morbid fascination—with Wikipedia’s policy of “Verifiability”. The first paragraph of the policy says:

Verifiability on Wikipedia is the ability to cite reliable sources that directly support the information in an article. All information in Wikipedia must be verifiable, but because other policies and guidelines also influence content, verifiability does not guarantee inclusion. The threshold for inclusion in Wikipedia is verifiability, not truth—whether readers can check that material in Wikipedia has already been published by a reliable source, not whether editors think unsourced material is true.

Or as Siracusa summarizes it: “Something can be as true as you want it to be, if it is not verifiable, it doesn’t go in.” Well said.

He also discusses the related policy of “No original research”. This includes a good explication of the different types of sources that may or may not be used on Wikipedia: primary sources (original documents and first-hand accounts), secondary sources (news articles interpreting primary sources) and tertiary sources (encyclopedias and academic articles summarizing the former). This is advanced stuff, and for a longtime Wikipedian, it’s no small thrill to hear a smart outsider explain why secondary sources are preferred, and work through the fundamental policies of Wikipedia. Siracusa correctly observes: “Wikipedia is not a place where you write down stuff that you know. … Wikipedia writes about other people writing about things.”

Except here’s the thing: Siracusa understands Wikipedia’s core content policies. He just doesn’t like them.

In his particular example, a former standalone article called FTFF (here’s what it used to look like) didn’t survive the process not because it wasn’t true, but (he says) because it contained material that wasn’t verifiable, and constituted original research. This is partly true, but it owes more to a guideline that got only passing mention on the show (and, frankly, in the deletion debate): “Notability”, and specifically the “General notability guideline”. It’s closely tied in with WP:VERIFY and WP:ORIGINAL, and basically says that a topic must have sufficient coverage in secondary sources to be given its own standalone page. FTFF was not, and the result of the debate was to merge the topic to Finder_(software)#Criticism.

Anyway, this pedantry about WP:NOTE and WP:GNG doesn’t affect Siracusa’s main point: If something is true but unverifiable, he would like to see it included in Wikipedia anyway. Nor does it affect his corollary argument, that Wikipedia’s complex rules discourage many would-be participants.

He’s undoubtedly right about the second point: many people try to get involved with Wikipedia who have no idea what it’s really about, and they tend to have a really bad experience. Wikipedia struggles to explain itself to outsiders, and it probably always will.

As to the former, the problem is that he fails to grapple with the implications of the Wikipedia he describes, and this is disappointing. By privileging “truth” above “verifiability”, one gets the impression he’s describing a Rashomon-like Wikipedia where all possible viewpoints are explored, and somehow eventually Wikipedia just makes the right call. This assumes a lot, not least that contentious topics wouldn’t simply devolve into edit wars of unchecked aggression. In a world where Wikipedia aims for truth but eschews verifiability, there are no footholds upon which to steady an argument. There is no way to know what should be considered credible or otherwise.

At times it actually sounds like he’s advocating something that already exists: reliance on “Consensus” for determining how Wikipedia will address the topics it covers. Wikipedia policies and guidelines don’t cover everything, and this is where consensus steps in, however imperfectly. If you’ve ever wondered why there is sometimes an observable discrepancy in the depth or quality of coverage between topics, consensus is the big reason why, and moreso the self-selection that shapes consensus. The current, real-world Wikipedia refers to outside authorities as well as consensus among editors; Siracusa’s Bizarro World Wikipedia would jettison the former and rely solely on the latter.

Meanwhile, Siracusa ascribes Wikipedia’s Byzantine rule structure to Wikipedians’ desire for approval from educators and academics, which he thinks is holding back Wikipedia from what it could become. He repeatedly says “Wikipedia should be something different” and refers to “what’s different about online” but he never gets prescriptive and never actually says why the old methods are outmoded. He does say his Wikipedia would seek to “arrive at truth using every tool necessary” and would, for example, allow original research… but what then is the mechanism for (dare I say) verifying it?

At one point, Siracusa compares the popular, widely-viewed Ars Technica forums to a hypothetical low-circulation print magazine, and complains that the widely-read former site is an invalid source while the unpopular latter publication is acceptable. It’s true that Wikipedia does not necessarily take a populist approach to evaluating sources, but he’s far off the mark in his attempt to explain this: “They’re not cool with the old librarians, because they’re not paper.”

I hope that he was just being lazy and doesn’t actually think that Wikipedia editors prefer paper (if anything they actually prefer online sources, which are easier to check) but he completely misses a key dynamic that ties back to verifiability: the paper magazine with poor circulation at least will have editors who are presumed to care about fact-checking and accuracy. A web forum, however popular it may be, may have moderators, but that’s not the same thing as having an editor. A discussion group is not an editorial operation, period. The forum is a primary source, and so should only be used to support reliable sources.

There are, however, reliable web sources. One of them is the editorial side of Ars Technica; no less an authority than John Siracusa has been cited in approximately 150 different Wikipedia articles about the Macintosh and other technology subjects.

♦     ♦     ♦

I’m sorry to say this, but in the show’s last fifteen minutes, Siracusa pretty much descends into total incoherence. Here’s his summary statement, close to verbatim:

[There are] many flaws in verifiability and reliability of sources. It’s built on a foundation of sand. Notability, what’s a reliable source, those things become so key to making Wikipedia crappy or good, and those sands are constantly always shifting, you know? And so if Wikipedia was centered on truth and that was its final goal, yeah, it would have to include citations and verifiability and stuff like that, but there would never be any argument when the two are in conflict. You know, if you could prove that a series of events happened here, then you could say, well, it’s verifiable, it appeared in a reliable source, but it’s not the truth. And so therefore we should expunge that. Because the final goal of Wikipedia is truth. But the final goal of Wikipedia is not truth, it’s verifiability.

There would “never be any argument” about what is the truth? In the parlance of Wikipedia: [citation needed].

Look, this is an epistemological issue, one much larger than just Wikipedia. The reason Wikipedia’s goal is verifiability, not truth, is because verifiability is an achievable goal. In fact, verifiability is a necessary step toward establishing truth, as Siracusa at this point seems to acknowledge in his imagined alternate, truth-seeking Wikipedia.

It’s not that Wikipedia is actively hostile to the truth: it’s just agnostic as to what it might be. Wikipedia articles are like road signs; truth itself may be unknowable, and we may never arrive at our destination, but Wikipedia can point in the right direction. Wikipedia’s policies and guidelines are designed to make sure that its content does that, although it’s fair to acknowledge that it’s not guaranteed. But what is? And what is truth?

Anyway, there’s a user essay on Wikipedia called “Verifiability, not truth” that says this better than I am going to. Here’s the key point:

That we have rules for the inclusion of material does not mean Wikipedians have no respect for truth and accuracy, just as a court’s reliance on rules of evidence does not mean the court does not respect truth. Wikipedia values accuracy, but it requires verifiability. Unlike some encyclopedias, Wikipedia does not try to impose “the truth” on its readers, and does not ask that they trust something just because they read it in Wikipedia. We empower our readers. We don’t ask for their blind trust.

If you want to upset the old system and do something new, you actually do need to think through what should replace it. Siracusa never does.

If he thinks Wikipedia’s adherence to “old world” rules is driving away contributors, he should consider what the free-for-all alternative would look like. It isn’t a Wikipedia I would spend any time with, it’s not one that Google would be eager to rank so highly, and it wouldn’t be the most important reference site on the Internet.

Rick Santorum’s Wikipedia Problem and its Discontents

Tagged as , , , , , , , , , , , , , , , , ,
on August 10, 2011 at 9:16 am

When former U.S. Senator Rick Santorum started gearing up to launch his presidential campaign earlier this year, there was one question he could not avoid. It had to do with the matter of alt-weekly editor and advice columnist Dan Savage, who has for years positioned himself as Santorum’s most prominent critic. Many politicians have fierce opponents, but few did what Savage did in 2003, and that was hold a contest to give an alternate meaning to the word “santorum”. I hope you’ll forgive me for declining to quote the winning definition, but you can find it here, and suffice to say that it has stuck. So much so, in fact, that eight years later Savage’s term has come to dominate the web search results for Rick Santorum’s name.

In news stories this year it was mostly described—by ABC News, Roll Call, Slate, and Huffington Post, among others—as Santorum’s “Google problem”. Indeed, one of the top three results for Santorum’s name is Dan Savage’s website promoting the campaign. But Google and Wikipedia are often joined at the hip, and one of the top results has been a Wikipedia article, not about Rick Santorum per se, but in fact about the campaign against him… or about the word itself… it hasn’t always been clear. And by mid-summer 2011, the article—then called Santorum (neologism)—had grown to several thousand words, and had itself become the focus of controversy among Wikipedians.

This blog post traces the history of the article’s evolution in some detail—not exhaustive, but getting there—because it’s an interesting window into how Wikipedia deals with controversial topics. Wikipedians can’t always agree, and in fact the article in question still remains a matter of dispute. But after 200,000 words and numerous debates in various forums around Wikipedia, the community has arrived at something approaching a satisfactory conclusion. Below, I aim to show how things got out of control, and how the Wikipedia community worked it out.

·     ·     ·

August 2006—To start from the beginning, let’s start from the beginning. The first version of this article was created five years ago this week, simply as Santorum.

(I should take a moment here to point out that—spoiler alert—because the article today is called Campaign for “santorum” neologism that is what appears at the top of all historical versions of the article; generally speaking, for each version I’ll link here, I will boldface article’s name at the time upon each reference.)

At this point the article was just a few paragraphs, outlining the circumstances that led to Savage’s coinage and a few examples of the term’s usage in the U.S. media. Prior to becoming its own article, most of the relevant material had been contained in a sub-section of the article about Savage’s sex advice column: Savage Love#Santorum.

It didn’t take very long at all before editors questioned the article’s suitability for a standalone article—what Wikipedia calls “notability”. In fact, the same day the article was first created, it was nominated for deletion. The reason for the nomination is one that would be echoed many times over the next half-decade:

The neologism referred to, created by Savage Love, does not have any evidence of real currency as a neologism. It should be treated as a political act by Savage Love, and described under that article.

The nomination failed and the article remained, as it certainly had received some media attention, but it was decided a renaming was in order. The suggestion was made that it be called Santorum (neologism), or possibly Santorum (sexual slang). Recent followers of this controversy might assume that the former was selected, because that was the name of the article for a long while. However, it was the latter, with a large reason being that Wikipedia has an explicit policy against creating articles about neologisms.

But that hardly settled the matter; the next issue concerned which Wikipedia page readers should find when they search for the word “santorum”, which now was considered to have—and here you could say that Savage had already won—two legitimate meanings. So the question was taken to a “straw poll”. For now, the article was still called Santorum, but what would the average Internet user be looking for when they looked up that term? How should the ambiguity be handled—in Wikipedia terminology, “disambiguated”? And what exactly should they call the article about the coinage?

Related to the word “Santorum”, the options included, and I quote:

  • Santorum should be an article about Savage’s attempt to define the word “santorum”
  • Santorum should be a disambiguation page, with its “traditional” content
  • Santorum should be a disambiguation page, with some other content (explain)
  • Santorum should be a redirect to Rick Santorum, and Rick Santorum should have a dablink…
  • Santorum should be a redirect to Rick Santorum, with no reference to the Savage neologism in the Rick Santorum article

Related to the article about Savage’s coinage, the options included, and I quote:

  • The article on the Savage neologism should be titled Santorum (neologism)
  • The article on the Savage neologism should be titled Santorum (sexual slang)
  • The Savage neologism needs no article; sufficiently covered at Savage Love#Santorum

And the result was… inconclusive. Nevertheless, a proposal was made, and subsequently accepted, to keep Rick Santorum as it always was, to call the Savage Love-inspired article Santorum (neologism), and to make Santorum a disambiguation page with links to relevant pages, among other details. The best summary of the considerations involved was stated by User:Dpbsmith, a veteran and still-active editor, who wrote:

Frankly I’ll support anything meeting these criterion:
A user who types in “santorum” as the Go word intending to find information about the Senator can find it very easily.
A user who types in “santorum” as the Go word intending to find information about the neologism can find it easily.
A user who types in “santorum” as the Go word is not presented immediately with the details of the neologism, but must click on a link, and the link must have some kind of label that communicates that fact that they are about to read about a political attack on the the [sic] Senator.
There should be no implication that Wikipedia endorses the neologism as somehow being “the real meaning” of the word.

Oh, did I mention there was also then a page called Santorum controversy, which is now called Santorum controversy regarding homosexuality, that also came up in the discussion? Well, now I have. Just wanted to be clear about that.

·     ·     ·

Late 2006-Early 2007—Although the matter seemed to have been handled appropriately, that didn’t stop editors from raising objections—even the very same objections—in the months following. In fact, someone had changed the article’s title back to Santorum (sexual slang) by the time the article came up for a second deletion debate in December 2006. The nominator focused on the fact that the media hits for the article were trivial—sure, The Daily Show and The Economist had used it, but neither had focused on it as a topic—while several less well-known sources appeared to be joining Savage’s campaign to popularize the term. Meanwhile, the nominator’s first argument was that the primary information was already covered in the Santorum controversy article (now you see why I mentioned it). Following a week’s worth of debate involving approximately two dozen Wikipedians and several thousand words…

The result was hopeless, hopeless lack of consensus.

(Emphasis in the original.) Lack of consensus to delete an article always means that it stays, and so it did. Some editors had suggested moving the article’s content to Wiktionary, Wikipedia’s dictionary sister project, where in fact the term had registered its own entry (without controversy) several months ahead of Wikipedia.

Later in December, one of the editors involved in the previous debate suggested moving the article from Santorum (sexual slang) to the oddly-titled Santorum (sexual slang activism), though the article stayed put. In January, a suggestion was made to merge the article back into the Savage Love entry, but that didn’t happen either.

·     ·     ·

Late 2007—Debate continued. In September, someone renamed it to Santorum (fluid)—ugh—and it was returned to Santorum (neologism), as it was then called. By this point, the article had grown substantially, was attracting the efforts of serious Wikipedians, and was… well, it was actually getting pretty good. In September 2007, the article was nominated for “Good article” (GA) status, and it looked like this. Later that day, the reviewing editor failed the article for including unsourced and “poorly sourced” material—The Onion in particular was singled out, although it was really an interview with Savage in the sister publication, AV Club—and for being a “BLP liability”.

That is to say, the article skirted the line of Wikipedia’s Biographies of living persons (BLP) policy, which aims to keep out scurrilous and weakly-sourced material about living persons that could be damaging to a living person’s reputation. As you might imagine, that had long been an issue; one couldn’t write about this topic without it being an issue. One could argue that Savage’s campaign was all about damaging Santorum’s reputation—I presume Dan Savage would agree to that—and yet it was nonetheless notable. Many editors then, and to this day, wished it would simply go away. And yet some wanted to make it as “good” as possible.

·     ·     ·

2008-2010—We can skip ahead, because after October 2007, fewer than 160 edits occurred in the three years intervening, and it was not changed substantially in that time. Santorum had lost his re-election bid in late 2006, re-entered private life in January 2007, and ceased to make headlines. In December 2007, the article looked like this. In January 2011, it looked like this. It was the same old back-and-forth, and not much happened.

·     ·     ·

Early 2011—As Santorum started making moves to run for president, activity picked up. In mid-February, Roll Call was first to write about Santorum’s “Google problem”, and this was dutifully added. The article continued to draw attention (including from vandals) through the end of February, until it was put under temporary “semi-protection”. When Stephen Colbert mentioned the controversy on his show, a not-so-brief summary was added, then removed, with the point made that “not everything Colbert says needs to be repeated in Wikipedia”. (Imagine that!) March and April were months of relative calm before the proverbial storm: nearly 1,000 direct edits, from May to this writing, lay just ahead.

·     ·     ·

May 2011—In early May, a very active and respected editor-administrator, User:Cirt, began a series of more than 300 edits to the article, starting with a long-overdue link to Wiktionary. By this point, the article contained some 1,600 words, excluding links and references. Cirt announced his intention to add “some research in additional secondary sources”, and four days later he had expanded the article to some 4,300 words. On the discussion page, one editor objected:

Expanding an article about a vile attack on a living person – it’s twice the size now and refs have gone from 33 to 95 – has got to be against the spirit of least of our BLP policy. My proposal, and my intention, stated right now, is to return this article to the content it had on May 9th.

This kicked off the first sustained debate in years—one that has arguably not yet come to a close. A proposal was made to “stub” the article, meaning to reduce the article’s length to a mere stub of an entry; the argument went, because the arguably unfair subject obviously met Wikipedia’s previously-determined standards for inclusion, a possible solution was to reduce it to the shortest possible version. This proposal quickly failed, with Cirt himself citing an earlier comment by veteran Wikipedian (and current Wikimedia Foundation fellow) Steven Walling:

The BLP policy is not a blank check for deleting anything negative related to a living individual. Criticism, commentary, and even base mockery of a public figure like a Senator is protected free speech in the United States. While it would be ridiculous for anyone to try and make Wikipedia a platform for creating the kind of meme Savage did, it is perfectly prudent for Wikipedia to neutrally report on the overwhelming amount of coverage given to the topic.

Remember that part about using Wikipedia as a platform—it will come up later. Meanwhile, Cirt continued to add significant information about media usage and analysis of the term and events surrounding Savage’s campaign, all backed up with acceptable references. In particular, he focused on adding uses of “santorum”, in slang dictionaries and even erotica, to support the article’s focus as legitimately about the neologism, and not Savage’s campaign per se.

For those who did not wish for Wikipedia to contribute to the so-called problem of making Savage’s campaign seem more important than it arguably was, it must have been more frustrating still to observe that the article was quite well-written and scrupulously followed Wikipedia’s style and sourcing guidelines. Cirt was nothing if not sophisticated. Many had the impression that the article itself was now an attack on Santorum, although that conclusion was only in the eye of the beholder. Cirt knew what he was doing and, for lack of a better phrase, Cirt knew exactly what he was doing. One editor objected:

I realize you will defend this bloated attack piece with all your skills (that is actually what I find most disturbing) but you have to realize or at least have noticed that many experienced editors disagree with your massive expansion of it and at some point it will require wider input and a community RFC.

By the end of May, the article had grown to more than five times the length of the article Santorum controversy regarding homosexuality and more than two-thirds the length of the primary Rick Santorum biographical article. Discrepancies of this sort have been well observed, most significantly on the Internet forum Something Awful, but no Wikipedia policy exists to require proportionality among articles.

At its greatest length, on May 31, the article surpassed 5,500 words, including headers but excluding photo captions, links and references—a total of over 77,000 bytes of data.

·     ·     ·

June 2011-Present— Were I to adequately summarize the debates and discussions that occurred beginning in late May and continuing sustainedly—with most debate occurring in June—this blog post could be three times its already considerable length. Instead I will attempt to summarize, although “considerable length” is unavoidable still.

From early June, Cirt pretty much stopped editing the article. To a significant extent, he’d become part of the issue, not just regarding this article but others as well, as can be seen on the discussion page for Cirt’s user account.

Among the many solutions offered around this time, one focused not on the article content itself, but rather its visibility on search engine results pages (SERPs). The editor offered, even if just for the sake of argument:

While I don’t really like the precedent, there’s nothing to say that every article needs to be indexed by search engines. … The majority of the concerns here seem to be focused on how people are coming across this article (via Google bombing, etc.), not necessarily that the article exists. … Both sides have legitimate points in their favor, so a compromise might be best here.

Other editors agreed it would set a bad precedent, and the suggestion did not go any further.

By now the topic had come to involve some of Wikipedia’s most influential editors, and a lengthy debate opened on Jimmy Wales’ discussion page. Wales’ take was as follows:

My only thought about the whole thing is that WP:COATRACK applies in spades. There is zero reason for this page to exist. It is arguable whether this nonsense even belongs in his biography at all, but at a bare minimum, a merger to his main article seems appropriate.

The “Coatrack” argument—one of many analogies Wikipedians have created over the years to illustrate key concepts—is not a policy or a guideline, but an informal essay, yet one with much currency. It states:

A coatrack article is a Wikipedia article that ostensibly discusses the nominal subject, but in reality is a cover for a tangentially related biased subject. The nominal subject is used as an empty coat-rack, which ends up being mostly obscured by the “coats”. The existence of a “hook” in a given article is not a good reason to “hang” irrelevant and biased material there.

In retrospect, it’s a little surprising that the “Coatrack” issue hadn’t been raised in any significant way before—and Wales is neither considered infallible nor is he always that involved in day-to-day Wikipedia issues—but this may yet have been a turning point. The next day, the highly respected User:SlimVirgin opened an RfC (Request for Comment) called “Proposal to rename, redirect, and merge content”. This led to the article being renamed, for a time, Santorum Google problem. Later, it was pointed out that “Google is not the only search engine in the world”, and so the search (as it were) continued.

The argument that the “neologism” had not evolved organically, but was the result of an organized campaign by Savage and his allies, had begun to exert some influence. For one thing, it was now quite clear that the majority of sources focused on the political campaign to bring relevance to the term, as opposed to the term’s relevance itself. In this way, one might say that Savage’s campaign had become a little too successful. Yes, the term was notable, but the controversy itself had become even more so.

Prior to the renaming mentioned above, editors in an adjacent thread had discussed several alternative names for the article. These included:

  • Santorum neologism controversy
  • Dan Savage santorum neologism controversy
  • Dan Savage santorum neologism campaign
  • Santorum neologism campaign
  • Spreading santorum (the name of Savage’s website)

Here one can start to see where the article’s current title would eventually emerge. Meanwhile, the article faced two more AfD (Articles for deletion) nominations, the first under its old name and the second under its current one. These were the fourth and fifth nominations overall, and surely the most futile.

As part of the ongoing RfC discussion in June, it had been strongly suggested that the article needed to be condensed, especially as Cirt’s expansion had contributed so significantly to the controversy. Besides the article expansion, in mid-May Cirt had created a new “footer” template, Template:Sexual slang, which further linked Rick Santorum’s name to dozens of NSFW topics. That template still exists, but on June 11 the link to Santorum (neologism) was removed. Again, it’s hard to say if this was another turning point, but a discussion about this template on Wales’ discussion page supports the notion that a consensus was coming into view: the article in its present form had itself become part of the campaign—that Wikipedia was being used as a platform for the campaign in the manner Walling had suggested.

A day later, a request for arbitration (RfAr)—a petition to the Arbitration Committee, Wikipedia’s equivalent of the Supreme Court—was opened against Cirt on the basis that his concerted efforts on the subject constituted “political activism”. On June 18 the request was rejected, but not before several dozen editors had contributed more than 28,000 words of opinion. One committee member wrote:

Decline for now, I’m inclined to think that this is more of a content dispute, and the community is able to cope with it.

On June 17, the community finally hit on a name that stuck: Campaign for “santorum” neologism. Initially, this was only intended as an interim move while further discussion took place. Among the names considered at this time, not all were serious, but most were:

  • Dan Savage santorum campaign
  • Dan Savage campaign
  • Dan Savage’s verbal attack on Rick Santorum
  • Santorum (sexual slang)
  • Santorum neologism campaign
  • Santorum neologism campaign
  • Santorum neologism controversy
  • Rick Santorum and homosexuality
  • Rick Santorum homosexuality controversy
  • Savage Santorum campaign
  • Dan Savage santorum neologism controversy
  • Dan Savage santorum neologism campaign
  • Spreading Santorum
  • Rick Santorum’s Google problem
  • Rick Santorum’s “Google problem”
  • Santorum Google problem
  • Rick Santorum Google problem
  • ‘Spreading santorum’ campaign
  • Campaign for “santorum” neologism
  • Dan Savage campaign for “santorum” neologism
  • Savage–Santorum affair (a reply: “Oh Please God No.”)
  • Savage–Santorum controversy
  • santorum (neologism)
  • The problem Rick Santorum is facing because every search engine in the world’s top search results says santorum is an anal sex by-product
  • Santorum (googlebomb)
  • SEO Campaign for “santorum” neologism
  • Santorum (cyberattack)
  • Santorum (cyberbullying)
  • Santorm (SEO attack)
  • Dan Savage’s “spreading santorum” campaign against Rick Santorum’s anti-gay stance
  • Santorum Google ranking problem
  • Dan Savage Google-bomb Attack on Rick Santorum
  • Campaign to attack Santorum’s name
  • Campaign to create ‘santorum’ neologism
  • Campaign to associate Santorum to neologism

In the end, inertia and the current title’s inherent virtues won out. Of the eventual “winner”—Campaign for “santorum” neologism—a veteran Wikipedian commented:

This one is growing on me – neutral, correct, to-the-point, and succinctly informative to readers both familiar and unfamiliar with the subject as to what the article will be about.

All that was left was to whittle the article down from its extreme length to a shape that covered the topic adequately, balancing relevance with discretion. While many edits were to follow, the key edit was made on June 21, when SlimVirgin replaced a 4,800-word version of the article (minus links and references) with a 1,400-word version. This is substantially the version of the article that remains in place today.

·     ·     ·

Comparing the late May version of the article, at its longest point, to the trimmed-down and refocused current version, here’s what we find:

  • The earlier version focused on the term in and of itself, with the opening sentence including a definition and describing its use. The current version focuses on the events, explaining the aim of Savage’s campaign—though the definition remains.
  • Excluding the lead section, references and external links, there are only three sections in the current version, compared with seven in the earlier (not including “See also” and “Further reading”, which were also removed).
  • The content of the “Background” section was almost entirely removed, leaving just the key facts about Rick Santorum’s statements in the 2003 Associated Press interview.
  • The section about the website “Spreading Santorum” was removed, details added into the “Campaign by Dan Savage” section.
  • Almost all of the “Recognition and usage” section was removed.
  • “Media analysis” and “Political impact” were combined into one, shorter, summarized section, focusing on the reception of the campaign in the media and its political impact.
  • Santorum’s response to the controversy was kept in the current article, however condensed.

Up to the present day, in the Talk page discussions alone (including the RfC discussion), more than 200,000 words have been written about the article. That is probably well short of the true number.

Perhaps surprisingly, the impact on Rick Santorum’s Wikipedia article was not that great—the article had long summarized the events in a short final paragraph concluding a heading relating to his statements about homosexuality—83 words at this count.

Meanwhile, Santorum’s “Google” problem continues. Conduct a logged-out search today, and here are the top three results:

And let’s not imagine the argument is completely over on Campaign for “santorum” neologism. Visit today, and one will find at the very top:

Images courtesy Wikipedia and Wikimedia Commons, licensed under Creative Commons. Additional research and analysis provided by Rhiannon Ruff.

Charted Territory: When Good Infographics Go Bad

Tagged as , , , , , , , , , , , , , , , ,
on August 12, 2010 at 8:32 pm

I will be blunt: the new infographic from David McCandless (Information is Beautiful), called “Articles of War: Wikipedia’s lamest edit wars“, is so lazy as to be misleading, glib as to be condescending, and generally unhelpful that I’m inclined to say that it sets back the public understanding of how Wikipedia works all by itself.

Up front: I respect McCandless and like what he does, which includes some interesting and thoughtful work, especially his print of Left vs. Right (U.S. and Rest of the World editions) that is better than most professional political analysts could produce. Separately, I am collaborating with friends on a Wikipedia visualization project of our own, so call me an interested observer, but note also that I’ve been thinking about this kind of thing lately.

I have reproduced only the top section of “Articles of War” below, for the purposes of commentary (click through to see the full thing on McCandless’ site):

Articles of War (excerpt)

The first thing to know about “Articles of War” is that it was based on an essay to be found in the recesses of Wikipedia called “Lamest edit wars” that is specifically kept in the site’s intra-wiki space because, as it states at the top: “This page contains material that is kept because it is considered humorous.” McCandless & Co. do give credit where it is due, but that Wikipedia page surely does not and never did intend to be definitive — it’s just a series of cheekily-written paragraphs about various arguments occurring over time, so there is nothing like meaningful numbers to be gleaned from it.

Instead, McCandless and his researchers decided to generate data to visualize these edit wars by counting the total number of edits over each article’s lifetime, counting not just the edits specifically related to that particular dispute (a difficult and time-consuming thing to research, it goes without saying) but every single edit, ever, thereby giving a grossly distorted view of each article’s history. I’ll give them the fact that if one looks to the legend in the top lefthand corner, it indicates that the number listed (and I presume the size of each box) relates to the “Total no. of edits” but even if readers do notice that, it is at best confusing.

Likewise, the articles’ relative position on the chart accords to their creation, not when the described dispute took place. If you think 2,000+ edits were expended on a photograph in the Cow-tipping article in the middle of 2001, that’s too bad, but you were reasonably misled. Nor would would you know that the article did not include a photograph until several years later.

What you are left with is a decent visualization of how frequently edited some randomly selected articles — some popular, some timely, some but not all controversial — happen to be. Why not simply show that? Focusing on this alone we can see that the following articles have attracted tens of thousands of edits over the years:

  • The Beatles
  • Jesus
  • Wikipedia
  • Christianity
  • Ann Coulter
  • Star Wars
  • Wii

That’s not linkbait enough for you? Then please do the research.

Meanwhile, the infographic is also a little too snarky for its own good, especially toward its chosen subject. Color-coding is used to categorize certain types of edit wars; one is labeled “American Cultural Superiority” and exists mainly to identify debates between U.S. and British spellings. Which I find a little… superior itself, but hey, I suppose it’s a misdemeanor violation. Worse is that edit wars involving Wikipedia and site co-founder Jimmy Wales are coded as “Religion.” Too cute. Or maybe just an oversight?

Another oversight concerns an on-wiki debate about whether the most famous Palin was, at the time of its occurrence, Monty Python’s Michael or Alaska’s former governor Sarah. (Since then, I believe the one with decades of contributions to comedy has been definitively usurped by the mavericky one’s more recent, er, contributions.) According to “Articles of War” this happened in 2003. But if you think about it, this makes no sense at all — of course this happened in 2008, when John McCain chose Sarah Palin as his running mate. And the Lamest edit wars essay itself mentions that this happened in 2008. Pure oversight to be sure, but I have to wonder what other mistakes the research team made.

To their partial credit, they have opened their Google Spreadsheets for public inspection, so it’s clear they at least intended to impart real information. And there you can see that they are indeed using the total number of edits over time and that their “Palin” error was made early on. That seems to put the responsibility on the researchers, rather than McCandless himself, but of course it’s a total package.

I hold McCandless to a standard that I don’t the jokers at Cracked* or Something Awful because their job is to make you laugh, while McCandless’ job, according to his website’s own tagline, is to take “issues, ideas, knowledge, data” — and make it easier to understand by visualizing it. There are certainly issues and ideas to be found in “Articles of War” — but knowledge and data, not so much. And though I am getting a little more rant-y than usual about this, I do aim to be constructive, so I would very much like to see this infographic re-done with some extra research. This blog post may serve as a guide if they so choose. I hope they do.

P.S. The Gizmodo thread — where I found it — on this is hilarious, with many people re-fighting the same disputes that once arose on Wikipedia. However, only one that I saw came anywhere near noticing the fact that the methodology was suspect.

P.P.S. Am I being nitpicky to add that “Articles of War” appears to convey that Wikipedia’s articles about The Beatles and Jesus were created prior to 2001? That is to say before Wikipedia itself began? I don’t actually think so.

*Actually, about Cracked — a.k.a. Digg’s favorite website — as I have seen a prominent Wikipedian point out elsewhere, it often does a pretty good job using information from Wikipedia responsibly. Among their articles about Wikipedia, the title of “5 Terrifying Bastardizations of the Wikipedia Model” alone gives away that it’s implicitly pro-Wikipedia, as does “5 Celebrity Wikipedia Entries they Clearly Wrote Themselves“. Even “8 Most Needlessly Detailed Wikipedia Entries” knows what’s good about Wikipedia, even when it isn’t. Cracked writers clearly know their way down through a history page — like say, Corey Feldman’s — but it doesn’t appear that McCandless and his researchers looked as closely.

David Petraeus’ Big Month

Tagged as , , , ,
on June 28, 2010 at 8:34 am

In the views of many, Wikipedia tends toward frivolity. After all, the concept of Wikigroaning assumes that articles on pop culture subjects will be given less attention than articles on weighty subjects. While Wikipedia does include plenty of material that Britannica could and would never address, I’ve pointed out before that this isn’t always the case.

Here’s another reason to retain your faith in humanity, and this time not just Wikipedia’s contributors but also its visitors: this month’s traffic to the Wikipedia article about Gen. David Petraeus. He was in the news twice this month, and for very different reasons. First, on June 15, Petraeus fainted while testifying before the Senate Armed Services Committee. It’s just the kind of TMZ DC-ready story that gets attention, including video, which always helps. Indeed, the story caused traffic on his Wikipedia article to spike.

But as the chart below indicates, that was only about a tenth of the traffic to his page once President Obama nominated him to replace Gen. Stanley McChrystal as the U.S. commander in Afghanistan, following the latter general’s unsolicitous remarks about the Obama administration in Rolling Stone magazine. Perhaps this does not reveal too much, as this is undoubtedly the bigger news story, but it is also a much more complicated one, and at least indicates that no matter how many articles about Pokemon characters Wikipedia may hold, people can still find what’s important.

As for the fact that the top day for traffic to McChrystal’s Wikipedia article this month nearly doubled the traffic on Petraeus’ top day, well, I’ll let you judge that for yourself.

Snapshot of traffic to Wikipedia article about David Petraeus, June 2010.

Snapshot of traffic to Wikipedia article about David Petraeus, June 2010.

Traffic statistics courtesy User:Henrik.

“Treme” vs. “Treme (TV series)”

Tagged as , , , , , , , , , , ,
on April 18, 2010 at 10:04 am

In more than one post on this blog I’ve written skeptically about the concept of “Wikigroaning” — the notion that important subjects sometimes have shorter articles than arguably less-important subjects that appeal to geek sensibilities. In the case of Raphael (archangel, artist or ninja turtle) and lightsaber vs. modern warfare, the complaint did not quite hold up. But I don’t mean to indicate the charge is never without basis.

treme_neighborhood_wptreme_tv_series_wp

With the second episode of HBO’s latest dramatic series, “Treme,” set to air this evening, I decided to compare two related Wikipedia articles — one about the New Orleans neighborhood, and the new TV drama from David Simon. What did I find?

In the first place, Treme is currently 10 Kb long while Treme (TV series) is closer to 17 Kb. On the face of it, the article about the series is substantially longer at present. And this is the case even though the former article has existed since April 2004 whereas the latter was created in March 2009.

It’s fair to say that both articles are in decent shape. The article about the neighborhood has a quality infobox featuring geographic and demographic information, and a concise History section is informative, if perhaps too concise. I compare this to the article about my neighborhood of Adams Morgan in Washington, DC, which has much more information (though fewer references to support them) and no comparable infobox of data. Each article could stand to learn something from the other.

But there is no use arguing that Treme (TV series) is not the better article. It is simply more carefully and completely written, with a more sophisticated article structure utilizing subsections for more in-depth coverage of certain aspects of the show. Plus, it has already spawned a secondary page, List of Treme episodes.

Is there a silver lining here? I think there may be. If the show becomes popular — at least popular enough to inspire a following similar to Simon’s earlier work — then it may well inspire someone or a few someones to become more interested in the neighborhood itself. To be sure, the series itself has already caused a spike of interest in the subject. And all it takes is one person to make it a personal project. If “Treme (TV series)” can do that, “Treme” will be the better for it.

Images via Wikipedia. Neighborhood photograph licensed under Creative Commons by Wikipedia contributor Infrogmation.

The Archangel, the Renaissance Master and the Ninja Turtle

Tagged as , , , , , , , , , , , ,
on November 22, 2009 at 3:37 pm

raphael-angel     raphael-painter     raphael-tmnt

Back in March I considered the subject of “wikigroaning”—a joke / criticism about Wikipedia popularized on the Something Awful Internet forum in 2007. The idea is this: Sometimes, Wikipedia articles on weighty subjects are shorter and less well-developed than articles about similar, less-weighty subjects.

What I found was that this critique no longer applied to a comparison of “Lightsaber combat” vs. “Modern warfare“; the former entry no longer strictly exists, as the page now redirects to the larger topic of “Lightsaber” while the latter is essentially a hub for accessing articles on various sub-topics (assymetric warfare, biological warfare, etc.).

Today, let’s look at another one suggested by Something Awful members: Raphael (archangel) vs. Raphael (ninja turtle). How do the two compare?

Superficially, the joke is on Wikipedia: the main text of the article about the comic book character is approximately 3,000 words long, whereas the one about the Judeo-Christian figure is about 1,350. But here’s the thing—the TMNT-related article is basically devoid of any citations, and was clearly written by fans of the various comic books, TV shows and movies in which he appears. One might assume that the details should be relatively accurate, as it doesn’t seem to be a contentious subject, but who is to say? One citation is provided for the entire article, and indeed the article has been tagged as needing citations since December 2007:

wiki-raphael-warning

That’s almost two years in which fans have been stopping by to work on the article, but no one has yet bothered to clean up problems identified by a non-fan editor, nor have they bothered to provide citations to verify any of it. From this we can infer that most editors on this particular article are focused on this particular topic and are not involved with Wikipedia otherwise.

Meanwhile there is another problem with this article. While much of it summarizes discrete events that occur in the TMNT series, other sections read as commentary on / interpretations of the character. For example:

He has an extremely loyal side and is the first to react when another of his brothers is in trouble. This happens on numerous occasions, like when he stops a blow from hitting Donatello using only his sais or kicks the Shredder away from Leonardo when the latter is about to attack.

So one could certainly verify the existence of a particular scene by citing directly from the comics. Yet the interpretation of Raphael’s actions is left to the reader, and adding this information directly to Wikipedia is a clear-cut case of original researchexpressly forbidden by Wikipedia guidelines.

What is one to do if there is no published commentary on this aspect of the character’s personality? Is it then to be left out of Wikipedia entirely? In theory, yes. In practice, no. I could remove this section immediately and much more of the article if it so pleased me. But you know what? I won’t do it. The article isn’t hurting anyone, so in that way its relative frivolity helps. Moreover, it’s entirely possible that many or most of these interpretations could be found in published reviews, and without having done this I’m disinclined to delete someone’s sincere work, however inexpert. As a known issue, Wikipedia has an informal term for this type of material: fancruft. Fancruft is often deleted, but this much is so far not offensive enough to merit outright deletion.

tmnt-coverAnd how about the archangel? For an article about a Biblical figure I am surprised that it is not better. Only seven citations have been provided, and sections including “Raphael in Islam” and “Raphael in Paradise Lost” have none whatsoever. The quality of the writing is likewise uneven. Clearly, different sections within the article are substantially the work of different editors, and I would probably base my trust in each section according to the quality of the prose. Unsurprisingly, the better-written sections are also the ones with more sources.

But let’s now finally address the obvious: Something Awful seems to have made a mistake, because the Raphael the turtle is not named for Raphael the archangel. He is named for the Renaissance artist, just like his ninja turtle brothers Leonardo, Michaelangelo and Donatello.

Before we come to a final conclusion, let us consider the article about the real person, which is simply titled Raphael. And guess what? It’s the best of the bunch, and it’s not even close. The article is more than 6,000 words, well-written, well-sourced (84 in-line citations, nearly all from serious biographies) and well-illustrated (easy to do when the subject’s work is all public domain). There is not even a mention of the TMNT character, although it has been suggested before and appropriately rejected.

Did Something Awful purposefully avoid making the comparison? Hard to say. In 2007 the article about the Renaissance master was much shorter and completely unsourced, though carefully-written. At that time, the article about the ninja turtle was certainly longer but also less sophisticated.

According to the original Something Awful post, the criteria was simply an assesment of which article is “longer.” But this is too simplistic—it should be obvious that not all words are equal. Just as Something Awful seeks to highlight the mistake of determine a subject’s importance by the space allotted on Wikipedia, it’s also a mistake to assume that the quality of an article is directly correlated with the number of words contained within.

Both are important to keep in mind when reading Wikipedia. How many readers approach the site with these considerations in mind? That’s what I’d like to know.

Images via Wikipedia.

Wikipedia On Dead Tree Redux

Tagged as , , ,
on June 20, 2009 at 3:31 pm

More than a week ago I posted a photo that’s been making the rounds lately — and even wound up as the basis for a joke on Conan O’Brien this past week — about a student artist who had created a physical book of Wikipedia’s Featured articles, one taking up approximately 5,000 pages. I noted at the time that the explanatory text

Reproducing Wikipedia in a dysfunctional physical form helps to question its use as an internet resource.

wasn’t terribly satisfying to me, and I asked at the time

Would printing all of Google’s search results also question its use as an Internet resource? Would printing an image of a sundial question its use as a physical timekeeping device?

and I resolved to find out more if I could. In fact I did hear back from the book’s creator, Rob Matthews, not long after. When posed with the question above, he responded at first:

I’m comparing the Internet Wikipedia to a traditional encyclopedia, by putting it in the same format, therefore suggesting that Wikipedia is dysfunctional compared to a normal encyclopedia. This is suggested by how I’ve conveyed Wikipedia physically.

I still wasn’t satisfied with this, but after a bit of back and forth, Matthews confirmed that his intention was to point out, compared to a traditional paper-based encyclopedia, it’s less reliable because of its radical openness, or hard to find what’s important among the incomplete and unbalanced articles that exist on the site. Those are my words, but he agreed with this much.

I actually do not agree with this view. Not that I don’t agree there is some truth to the point, because there is, but because I do not actually see how anyone is impeded from finding what they want because of Wikipedia. Moreover, “what’s important” is always in flux, and Wikipedia is a reflection of that.

wikipedia-in-print-rob-matthewsIt’s also nothing new. Those who lament the fact that Wkipedia gives disproportionate coverage to trivial matters — a criticism voiced by none other than Stephen Colbert, who sarcastically riffed on the subject, “any site that’s got a longer entry on ‘truthiness’ than on Lutherans has its priorities straight” — should also recognize that these imbalances are often corrected.

I’ve never been one to take my social commentary from visual art such as painting or sculpture, in significant part because it is rare that an image or an object can convey a subtle point while also succeeding as art. For such a purpose — in this case offering commentary on a subject which is overwhelmingly composed of words — I think nonverbal art is inferior to something like the novel, the essay or even the sitcom.

Even if I thought Matthews had a strong argument about Wikipedia to make, I think this fails as standalone commentary. But if Matthews does actually sell copies of this book, consider me interested (price dependent). Mr. Matthews doesn’t have answers for his questions, but his artwork would make for an excellent conversation piece.

Wikigroaning: Less Random than a Blaster

Tagged as , , , ,
on March 9, 2009 at 3:24 pm

On July 31, 2006, Stephen Colbert said of Wikipedia:

Any site that has a longer entry on truthiness than on Lutherans has its priorities straight.

This probably wasn’t the first time someone has noticed the tendency of Wikipedia to feature more information about arguably trivial subjects than arguably significant ones, but it certainly was not the last. Less than a year later, a contributor to Something Awful created (or popularized) a game called “Wikigroaning”:

something_awful_logoThe premise is quite simple. First, find a useful Wikipedia article that normal people might read. For example, the article called “Knight.” Then, find a somehow similar article that is longer, but at the same time, useless to a very large fraction of the population. In this case, we’ll go with “Jedi Knight.” Open both of the links and compare the lengths of the two articles. Compare not only that, but how well concepts are explored, and the greater professionalism with which the longer article was likely created. Are you looking yet? Get a good, long look. Yeah. Yeeaaah, we know, but that is just the tip of the iceberg.

The article included a list of amusingly juxtaposed concepts, such as Modern warfare vs. Video Game Crash of 1983 and while the concept is funny, Wikipedia recognizes this as a systemic bias they must deal with.

This seems to me like an opportunity a) to ask whether they have and in doing so b) launch an occasionally recurring feature, wherein we compare the Wikipedia of today (beginning in early 2009) to the Wikipedia of 2007. So let’s see how those specific articles from 2007 compare then, and now.

First, let’s benchmark the articles at June 1, 2007, just a few days prior to the article’s publication and about the time author Johnny “DocEvil” Titanium was doing his research. Naturally, the piece implies that the “Lightsaber combat” article was much longer than the one about “Modern warfare.” Unfortunately (sort of) the former article no longer exists: if you click the link you are now redirected to the article Lightsaber, and as I am not an administrator, I cannot see the old pages. No matter. If we substitute Lightsaber on June 1, 2007, that article was 9,500+ words long. Modern warfare on June 1, 2007 was just shy of 2,000.

Now, here are the two side-by-side as of today:

wp_modern_warfare_vs_lightsaber

As you can see, the Modern warfare article is now somewhat longer than the Lightsaber article. Of course, length is not everything. For one thing, the Lightsaber article is now well-sourced (in 2007 it had just one in-line citation) whereas Modern warfare in fact has none. But there are mitigating circumstances here, as well. One thing that “Wikigroaning” doesn’t take into account is the amount of material on other pages, and here Modern warfare is nearly a list, serving primarily as a jumping-off point to other articles describing different aspects of modern warfare in greater detail. Some of these are well-sourced, whereas others are not. Another consideration is edit frequency: Lightsaber has been edited many, many more times than has Modern warfare, which speaks partially to the number of “experts” in the former and partially to the stability of the latter.

A more apt comparison might be to the AK-47 article, which I think is a better article still, and much better than the one about the Blaster.

This being the first post in a series I have yet to fully develop, I may develop a rating system and return to this post at another date to include it. Additionally, what I write is guaranteed valid for March 9, 2009 only and may warrant revisiting at another time. But let’s see where this takes us in the meantime.

Oh, and if you really want to know all about Lightsaber combat, Wookiepedia has an article of that name which runs more than 3,300 words — but no in-line citations.