William Beutler on Wikipedia

Posts Tagged ‘SEO’

Rick Santorum’s Wikipedia Problem and its Discontents

Posted on August 10, 2011 at 9:16 am

When former U.S. Senator Rick Santorum started gearing up to launch his presidential campaign earlier this year, there was one question he could not avoid. It had to do with the matter of alt-weekly editor and advice columnist Dan Savage, who has for years positioned himself as Santorum’s most prominent critic. Many politicians have fierce opponents, but few did what Savage did in 2003, and that was hold a contest to give an alternate meaning to the word “santorum”. I hope you’ll forgive me for declining to quote the winning definition, but you can find it here, and suffice to say that it has stuck. So much so, in fact, that eight years later Savage’s term has come to dominate the web search results for Rick Santorum’s name.

In news stories this year it was mostly described—by ABC News, Roll Call, Slate, and Huffington Post, among others—as Santorum’s “Google problem”. Indeed, one of the top three results for Santorum’s name is Dan Savage’s website promoting the campaign. But Google and Wikipedia are often joined at the hip, and one of the top results has been a Wikipedia article, not about Rick Santorum per se, but in fact about the campaign against him… or about the word itself… it hasn’t always been clear. And by mid-summer 2011, the article—then called Santorum (neologism)—had grown to several thousand words, and had itself become the focus of controversy among Wikipedians.

This blog post traces the history of the article’s evolution in some detail—not exhaustive, but getting there—because it’s an interesting window into how Wikipedia deals with controversial topics. Wikipedians can’t always agree, and in fact the article in question still remains a matter of dispute. But after 200,000 words and numerous debates in various forums around Wikipedia, the community has arrived at something approaching a satisfactory conclusion. Below, I aim to show how things got out of control, and how the Wikipedia community worked it out.

·     ·     ·

August 2006—To start from the beginning, let’s start from the beginning. The first version of this article was created five years ago this week, simply as Santorum.

(I should take a moment here to point out that—spoiler alert—because the article today is called Campaign for “santorum” neologism, that is what appears at the top of all historical versions of the article; generally speaking, for each version I link here, I will boldface the article’s name at the time upon each reference.)

At this point the article was just a few paragraphs, outlining the circumstances that led to Savage’s coinage and a few examples of the term’s usage in the U.S. media. Prior to becoming its own article, most of the relevant material had been contained in a sub-section of the article about Savage’s sex advice column: Savage Love#Santorum.

It didn’t take very long at all before editors questioned the topic’s suitability for a standalone article—what Wikipedia calls “notability”. In fact, the same day the article was first created, it was nominated for deletion. The reason for the nomination is one that would be echoed many times over the next half-decade:

The neologism referred to, created by Savage Love, does not have any evidence of real currency as a neologism. It should be treated as a political act by Savage Love, and described under that article.

The nomination failed and the article remained, as it certainly had received some media attention, but it was decided a renaming was in order. The suggestion was made that it be called Santorum (neologism), or possibly Santorum (sexual slang). Recent followers of this controversy might assume that the former was selected, because that was the name of the article for a long while. However, it was the latter, with a large reason being that Wikipedia has an explicit policy against creating articles about neologisms.

But that hardly settled the matter; the next issue concerned which Wikipedia page readers should find when they search for the word “santorum”, which now was considered to have—and here you could say that Savage had already won—two legitimate meanings. So the question was taken to a “straw poll”. For now, the article was still called Santorum, but what would the average Internet user be looking for when they looked up that term? How should the ambiguity be handled—in Wikipedia terminology, “disambiguated”? And what exactly should they call the article about the coinage?

Related to the word “Santorum”, the options included, and I quote:

  • Santorum should be an article about Savage’s attempt to define the word “santorum”
  • Santorum should be a disambiguation page, with its “traditional” content
  • Santorum should be a disambiguation page, with some other content (explain)
  • Santorum should be a redirect to Rick Santorum, and Rick Santorum should have a dablink…
  • Santorum should be a redirect to Rick Santorum, with no reference to the Savage neologism in the Rick Santorum article

Related to the article about Savage’s coinage, the options included, and I quote:

  • The article on the Savage neologism should be titled Santorum (neologism)
  • The article on the Savage neologism should be titled Santorum (sexual slang)
  • The Savage neologism needs no article; sufficiently covered at Savage Love#Santorum

And the result was… inconclusive. Nevertheless, a proposal was made, and subsequently accepted, to keep Rick Santorum as it always was, to call the Savage Love-inspired article Santorum (neologism), and to make Santorum a disambiguation page with links to relevant pages, among other details. The best summary of the considerations involved was stated by User:Dpbsmith, a veteran and still-active editor, who wrote:

Frankly I’ll support anything meeting these criterion:
A user who types in “santorum” as the Go word intending to find information about the Senator can find it very easily.
A user who types in “santorum” as the Go word intending to find information about the neologism can find it easily.
A user who types in “santorum” as the Go word is not presented immediately with the details of the neologism, but must click on a link, and the link must have some kind of label that communicates that fact that they are about to read about a political attack on the the [sic] Senator.
There should be no implication that Wikipedia endorses the neologism as somehow being “the real meaning” of the word.

Oh, did I mention there was also then a page called Santorum controversy, which is now called Santorum controversy regarding homosexuality, that also came up in the discussion? Well, now I have. Just wanted to be clear about that.

·     ·     ·

Late 2006-Early 2007—Although the matter seemed to have been handled appropriately, that didn’t stop editors from raising objections—even the very same objections—in the months following. In fact, someone had changed the article’s title back to Santorum (sexual slang) by the time the article came up for a second deletion debate in December 2006. The nominator focused on the fact that the media hits for the article were trivial—sure, The Daily Show and The Economist had used it, but neither had focused on it as a topic—while several less well-known sources appeared to be joining Savage’s campaign to popularize the term. Meanwhile, the nominator’s first argument was that the primary information was already covered in the Santorum controversy article (now you see why I mentioned it). Following a week’s worth of debate involving approximately two dozen Wikipedians and several thousand words…

The result was hopeless, hopeless lack of consensus.

(Emphasis in the original.) Lack of consensus to delete an article always means that it stays, and so it did. Some editors had suggested moving the article’s content to Wiktionary, Wikipedia’s dictionary sister project, where in fact the term had registered its own entry (without controversy) several months ahead of Wikipedia.

Later in December, one of the editors involved in the previous debate suggested moving the article from Santorum (sexual slang) to the oddly-titled Santorum (sexual slang activism), though the article stayed put. In January, a suggestion was made to merge the article back into the Savage Love entry, but that didn’t happen either.

·     ·     ·

Late 2007—Debate continued. In September, someone renamed it to Santorum (fluid)—ugh—and it was returned to Santorum (neologism), as it was then called. By this point, the article had grown substantially, was attracting the efforts of serious Wikipedians, and was… well, it was actually getting pretty good. In September 2007, the article was nominated for “Good article” (GA) status, and it looked like this. Later that day, the reviewing editor failed the article for including unsourced and “poorly sourced” material—The Onion in particular was singled out, although it was really an interview with Savage in the sister publication, AV Club—and for being a “BLP liability”.

That is to say, the article skirted the line of Wikipedia’s Biographies of living persons (BLP) policy, which aims to keep out scurrilous and weakly-sourced material that could damage a living person’s reputation. As you might imagine, that had long been an issue; one couldn’t write about this topic without it being an issue. One could argue that Savage’s campaign was all about damaging Santorum’s reputation—I presume Dan Savage would agree to that—and yet it was nonetheless notable. Many editors then, and to this day, wished it would simply go away. And yet some wanted to make it as “good” as possible.

·     ·     ·

2008-2010—We can skip ahead, because after October 2007, fewer than 160 edits occurred in the three years intervening, and it was not changed substantially in that time. Santorum had lost his re-election bid in late 2006, re-entered private life in January 2007, and ceased to make headlines. In December 2007, the article looked like this. In January 2011, it looked like this. It was the same old back-and-forth, and not much happened.

·     ·     ·

Early 2011—As Santorum started making moves to run for president, activity picked up. In mid-February, Roll Call was first to write about Santorum’s “Google problem”, and this was dutifully added. The article continued to draw attention (including from vandals) through the end of February, until it was put under temporary “semi-protection”. When Stephen Colbert mentioned the controversy on his show, a not-so-brief summary was added, then removed, with the point made that “not everything Colbert says needs to be repeated in Wikipedia”. (Imagine that!) March and April were months of relative calm before the proverbial storm: nearly 1,000 direct edits, from May to this writing, lay just ahead.

·     ·     ·

May 2011—In early May, a very active and respected editor-administrator, User:Cirt, began a series of more than 300 edits to the article, starting with a long-overdue link to Wiktionary. By this point, the article contained some 1,600 words, excluding links and references. Cirt announced his intention to add “some research in additional secondary sources”, and four days later he had expanded the article to some 4,300 words. On the discussion page, one editor objected:

Expanding an article about a vile attack on a living person – it’s twice the size now and refs have gone from 33 to 95 – has got to be against the spirit of least of our BLP policy. My proposal, and my intention, stated right now, is to return this article to the content it had on May 9th.

This kicked off the first sustained debate in years—one that has arguably not yet come to a close. A proposal was made to “stub” the article, meaning to reduce the article’s length to a mere stub of an entry; the argument went that, because the arguably unfair subject obviously met Wikipedia’s previously-determined standards for inclusion, a possible solution was to reduce it to the shortest possible version. This proposal quickly failed, with Cirt himself citing an earlier comment by veteran Wikipedian (and current Wikimedia Foundation fellow) Steven Walling:

The BLP policy is not a blank check for deleting anything negative related to a living individual. Criticism, commentary, and even base mockery of a public figure like a Senator is protected free speech in the United States. While it would be ridiculous for anyone to try and make Wikipedia a platform for creating the kind of meme Savage did, it is perfectly prudent for Wikipedia to neutrally report on the overwhelming amount of coverage given to the topic.

Remember that part about using Wikipedia as a platform—it will come up later. Meanwhile, Cirt continued to add significant information about media usage and analysis of the term and events surrounding Savage’s campaign, all backed up with acceptable references. In particular, he focused on adding uses of “santorum”, in slang dictionaries and even erotica, to support the article’s focus as legitimately about the neologism, and not Savage’s campaign per se.

For those who did not wish for Wikipedia to contribute to the so-called problem of making Savage’s campaign seem more important than it arguably was, it must have been more frustrating still to observe that the article was quite well-written and scrupulously followed Wikipedia’s style and sourcing guidelines. Cirt was nothing if not sophisticated. Many had the impression that the article itself was now an attack on Santorum, although that conclusion was only in the eye of the beholder. Cirt knew what he was doing and, for lack of a better phrase, Cirt knew exactly what he was doing. One editor objected:

I realize you will defend this bloated attack piece with all your skills (that is actually what I find most disturbing) but you have to realize or at least have noticed that many experienced editors disagree with your massive expansion of it and at some point it will require wider input and a community RFC.

By the end of May, the article had grown to more than five times the length of the article Santorum controversy regarding homosexuality and more than two-thirds the length of the primary Rick Santorum biographical article. Discrepancies of this sort have been well observed, most significantly on the Internet forum Something Awful, but no Wikipedia policy exists to require proportionality among articles.

At its greatest length, on May 31, the article surpassed 5,500 words, including headers but excluding photo captions, links and references—a total of over 77,000 bytes of data.

·     ·     ·

June 2011-Present—Were I to adequately summarize the debates and discussions that began in late May and have continued steadily since—with most debate occurring in June—this blog post could be three times its already considerable length. Instead I will attempt to summarize, although “considerable length” is unavoidable still.

From early June, Cirt pretty much stopped editing the article. To a significant extent, he’d become part of the issue, not just regarding this article but others as well, as can be seen on the discussion page for Cirt’s user account.

Among the many solutions offered around this time, one focused not on the article content itself, but rather its visibility on search engine results pages (SERPs). The editor offered, even if just for the sake of argument:

While I don’t really like the precedent, there’s nothing to say that every article needs to be indexed by search engines. … The majority of the concerns here seem to be focused on how people are coming across this article (via Google bombing, etc.), not necessarily that the article exists. … Both sides have legitimate points in their favor, so a compromise might be best here.

Other editors agreed it would set a bad precedent, and the suggestion did not go any further.

By now the topic had come to involve some of Wikipedia’s most influential editors, and a lengthy debate opened on Jimmy Wales’ discussion page. Wales’ take was as follows:

My only thought about the whole thing is that WP:COATRACK applies in spades. There is zero reason for this page to exist. It is arguable whether this nonsense even belongs in his biography at all, but at a bare minimum, a merger to his main article seems appropriate.

The “Coatrack” argument—one of many analogies Wikipedians have created over the years to illustrate key concepts—is not a policy or a guideline, but an informal essay, yet one with much currency. It states:

A coatrack article is a Wikipedia article that ostensibly discusses the nominal subject, but in reality is a cover for a tangentially related biased subject. The nominal subject is used as an empty coat-rack, which ends up being mostly obscured by the “coats”. The existence of a “hook” in a given article is not a good reason to “hang” irrelevant and biased material there.

In retrospect, it’s a little surprising that the “Coatrack” issue hadn’t been raised in any significant way before—and Wales is neither considered infallible nor is he always that involved in day-to-day Wikipedia issues—but this may yet have been a turning point. The next day, the highly respected User:SlimVirgin opened an RfC (Request for Comment) called “Proposal to rename, redirect, and merge content”. This led to the article being renamed, for a time, Santorum Google problem. Later, it was pointed out that “Google is not the only search engine in the world”, and so the search (as it were) continued.

The argument that the “neologism” had not evolved organically, but was the result of an organized campaign by Savage and his allies, had begun to exert some influence. For one thing, it was now quite clear that the majority of sources focused on the political campaign to bring relevance to the term, as opposed to the term’s relevance itself. In this way, one might say that Savage’s campaign had become a little too successful. Yes, the term was notable, but the controversy itself had become even more so.

Prior to the renaming mentioned above, editors in an adjacent thread had discussed several alternative names for the article. These included:

  • Santorum neologism controversy
  • Dan Savage santorum neologism controversy
  • Dan Savage santorum neologism campaign
  • Santorum neologism campaign
  • Spreading santorum (the name of Savage’s website)

Here one can start to see where the article’s current title would eventually emerge. Meanwhile, the article faced two more AfD (Articles for deletion) nominations, the first under its old name and the second under its current one. These were the fourth and fifth nominations overall, and surely the most futile.

As part of the ongoing RfC discussion in June, it had been strongly suggested that the article needed to be condensed, especially as Cirt’s expansion had contributed so significantly to the controversy. Besides the article expansion, in mid-May Cirt had created a new “footer” template, Template:Sexual slang, which further linked Rick Santorum’s name to dozens of NSFW topics. That template still exists, but on June 11 the link to Santorum (neologism) was removed. Again, it’s hard to say if this was another turning point, but a discussion about this template on Wales’ discussion page supports the notion that a consensus was coming into view: the article in its present form had itself become part of the campaign—that Wikipedia was being used as a platform for the campaign in the manner Walling had suggested.

A day later, a request for arbitration (RfAr)—a petition to the Arbitration Committee, Wikipedia’s equivalent of the Supreme Court—was opened against Cirt on the basis that his concerted efforts on the subject constituted “political activism”. On June 18 the request was rejected, but not before several dozen editors had contributed more than 28,000 words of opinion. One committee member wrote:

Decline for now, I’m inclined to think that this is more of a content dispute, and the community is able to cope with it.

On June 17, the community finally hit on a name that stuck: Campaign for “santorum” neologism. Initially, this was only intended as an interim move while further discussion took place. Among the names considered at this time, not all were serious, but most were:

  • Dan Savage santorum campaign
  • Dan Savage campaign
  • Dan Savage’s verbal attack on Rick Santorum
  • Santorum (sexual slang)
  • Santorum neologism campaign
  • Santorum neologism controversy
  • Rick Santorum and homosexuality
  • Rick Santorum homosexuality controversy
  • Savage Santorum campaign
  • Dan Savage santorum neologism controversy
  • Dan Savage santorum neologism campaign
  • Spreading Santorum
  • Rick Santorum’s Google problem
  • Rick Santorum’s “Google problem”
  • Santorum Google problem
  • Rick Santorum Google problem
  • ‘Spreading santorum’ campaign
  • Campaign for “santorum” neologism
  • Dan Savage campaign for “santorum” neologism
  • Savage–Santorum affair (a reply: “Oh Please God No.”)
  • Savage–Santorum controversy
  • santorum (neologism)
  • The problem Rick Santorum is facing because every search engine in the world’s top search results says santorum is an anal sex by-product
  • Santorum (googlebomb)
  • SEO Campaign for “santorum” neologism
  • Santorum (cyberattack)
  • Santorum (cyberbullying)
  • Santorum (SEO attack)
  • Dan Savage’s “spreading santorum” campaign against Rick Santorum’s anti-gay stance
  • Santorum Google ranking problem
  • Dan Savage Google-bomb Attack on Rick Santorum
  • Campaign to attack Santorum’s name
  • Campaign to create ‘santorum’ neologism
  • Campaign to associate Santorum to neologism

In the end, inertia and the current title’s inherent virtues won out. Of the eventual “winner”—Campaign for “santorum” neologism—a veteran Wikipedian commented:

This one is growing on me – neutral, correct, to-the-point, and succinctly informative to readers both familiar and unfamiliar with the subject as to what the article will be about.

All that was left was to whittle the article down from its extreme length to a shape that covered the topic adequately, balancing relevance with discretion. While many edits were to follow, the key edit was made on June 21, when SlimVirgin replaced a 4,800-word version of the article (minus links and references) with a 1,400-word version. This is substantially the version of the article that remains in place today.

·     ·     ·

Comparing the late May version of the article, at its longest point, to the trimmed-down and refocused current version, here’s what we find:

  • The earlier version focused on the term in and of itself, with the opening sentence including a definition and describing its use. The current version focuses on the events, explaining the aim of Savage’s campaign—though the definition remains.
  • Excluding the lead section, references and external links, there are only three sections in the current version, compared with seven in the earlier (not including “See also” and “Further reading”, which were also removed).
  • The content of the “Background” section was almost entirely removed, leaving just the key facts about Rick Santorum’s statements in the 2003 Associated Press interview.
  • The section about the website “Spreading Santorum” was removed, with its details folded into the “Campaign by Dan Savage” section.
  • Almost all of the “Recognition and usage” section was removed.
  • “Media analysis” and “Political impact” were combined into one, shorter, summarized section, focusing on the reception of the campaign in the media and its political impact.
  • Santorum’s response to the controversy was kept in the current article, though condensed.

Up to the present day, in the Talk page discussions alone (including the RfC discussion), more than 200,000 words have been written about the article. That is probably well short of the true number.

Perhaps surprisingly, the impact on Rick Santorum’s Wikipedia article was not that great—the article had long summarized the events in a short final paragraph concluding a heading relating to his statements about homosexuality—83 words at this count.

Meanwhile, Santorum’s “Google” problem continues. Conduct a logged-out search today, and here are the top three results:

And let’s not imagine the argument is completely over on Campaign for “santorum” neologism. Visit today, and one will find at the very top:

Images courtesy Wikipedia and Wikimedia Commons, licensed under Creative Commons. Additional research and analysis provided by Rhiannon Ruff.

Google’s Gift to Wikipedia Probably Not Evil

Posted on March 3, 2010 at 11:29 pm

This is a few days old now, but if you haven’t already heard, Google gave Wikipedia $2 million to help with its never-sated appetite for bandwidth and “increasing … multimedia needs.” Here are two of the Internet’s most important websites getting together, and I’d have thought it would’ve been worth more than a small roundup on Techmeme.

Reported the Wall Street Journal on Feb. 18:

Google Inc., the Internet’s most profitable company, is giving $2 million to support Wikipedia, a volunteer-driven reference tool that has emerged as one of the Web’s most-read sites.

Good.

Wikimedia Foundation, owner of Wikipedia, said Wednesday that Google has donated $2 million to further develop the popular encyclopedia and other projects.

Awesome. Right.

Jimmy Wales, Wikipedia’s founder, broke the news on Twitter on Tuesday, followed by a formal announcement from the nonprofit organization.

Twitter, well played.

Google co-founder Sergey Brin, in a statement, called Wikipedia “one of the greatest triumphs of the Internet…this vast repository of community-generated content is an invaluable resource to anyone who is online.”

You bet. Of course. But why now?

To some this raises the question of what Wikipedia might do for Google; after all, a sizable donation could be said to create the possibility of a Conflict of Interest. Previous donations, such as that from a conspicuous Silicon Valley VC and partner of Elevation Partners (not Bono), have raised eyebrows. And everyone knows about Jimmy Wales’ occasional willingness to cut special someones (and Google is) a break — at least until the community gets involved.

But this question is probably backward. Wikipedia already helps Google, and by helping Wikipedia, Google helps itself.

Google depends on Wikipedia to provide topical, authoritative results at the top of its search results pages (SERPs, in SEO-speak) on more subjects than any other website. One occasionally-discussed, conspiracy-tinged theory has Google purposefully privileging Wikipedia precisely because it “cleans up” their search results. That’s possible.

But that isn’t needed to explain Wikipedia’s prominence on Google. Wikipedia guarantees, for a range of topics functionally as vast as the searches people regularly perform on Google, an end result that is usually informative, free (as in beer, but liberty too) and not-for-profit, “not evil” and reliably neutral in a Switzerland kind of way. From what we know about Google’s recommendations for webmasters, no website is as well organized around the Google algorithm as Wikipedia, whether we’re talking about software, community or purpose. It’s basically Google’s perfect website.

Yeah, I would give Wikipedia $2 million, too. And even though it’s positively swimming in cash, I’d probably give it some more.

Ken Auletta on Wikipedia

Posted on October 30, 2009 at 11:46 am

On Sunday, November 1, New Yorker media writer Ken Auletta will appear on C-SPAN’s “Q&A” with host and network founder Brian Lamb. In the three-minute excerpt below, Auletta talks about Google’s algorithm, search engine optimization, and Wikipedia:

Auletta’s expertise stretches far beyond the media mogul interviews he writes for his magazine’s editors — in 2001 he wrote a book on Microsoft and its enemies — but wait for the part where Lamb stumps Auletta on Google search results.