William Beutler on Wikipedia

Archive for 2012

The Top 10 Wikipedia Stories of 2012 (Part 2)

Posted on December 31, 2012 at 9:02 am

For the past two years The Wikipedian has compiled a list of the top 10 news stories about Wikipedia (2010, 2011), focusing on topics that made mainstream news coverage and those which affected Wikipedia and the larger Wikimedia community more than any other. Part 1 ran on Friday; here’s the dramatic conclusion:

♦     ♦     ♦

5. The Gibraltarpedia controversy — Like the tenth item in our list, file this one under prominent members of the UK Wikimedia chapter behaving badly. In September, board member Roger Bamkin resigned following complaints that he had used Wikipedia resources for personal gain—at just about the worst possible time.

Bamkin was the creator of an actually pretty interesting project, Gibraltarpedia, an effort to integrate the semi-autonomous territory of Gibraltar with Wikipedia as closely as possible, writing every possible Wikipedia article about the territory, and posting QR codes around the peninsula connecting visitors to those articles. It was closely modeled on a similar project, with which Bamkin was also involved, called Monmouthpedia, which had won acclaim for doing the same for the Welsh town of Monmouth.

Problem is, the government of Gibraltar was a client of Bamkin’s, and Bamkin arranged for many of these improved articles to appear on the front page of Wikipedia (through a feature of Wikipedia called “Did you know”). Too many of them, enough that restrictions were imposed on his ability to nominate new ones. At a time when the community was already debating the propriety of consultant relationships involving Wikipedia (more about this below) Bamkin’s oversight offended many within the community, and was even the subject of external news coverage (now of course the subject of a “Controversy” section on Gibraltarpedia’s own Wikipedia page).

(Note: A previous version of this section erroneously implied that Bamkin was not involved with Monmouthpedia, and that he was then board chair as opposed to trustee. Likewise, it suggested that disclosure was the primary concern regarding DYK; however, the controversy focused on issues of volume and process. These errors have been corrected.)

4. Wikipedia’s gender imbalance — This one is down one spot from last year, but the undeniable fact that Wikipedia is overwhelmingly male (like 6-1 overwhelmingly) seems to have replaced Wikipedia’s falling editor retention as the primary focus of concerns about the long-term viability of Wikipedia’s mission. The topic was given center stage during the opening plenary at the annual Wikimedia conference, Wikimania DC, and has been the subject of continuing news coverage and even the focus of interesting-if-hard-to-decipher infographics. As with Wikipedia’s difficulty keeping and attracting new editors, the Wikimedia Foundation is working on addressing this, and no one knows precisely how much it matters or what to do about it. For further reading: over the last several weeks, my colleague Rhiannon Ruff has been writing an ongoing series about Wikipedia and women (here and here).

3. Wikipedia’s relationship with PR — I’m reluctant to put this one so high up, because one could say that I have a conflict of interest with “conflict of interest” as a topic (more here). But considering how much space this took up at the Wikipedia Signpost and on Jimmy Wales’ Talk page over the past 12 months, it would be a mistake to move it back.

This one is a continuation from last year’s #8, when a British PR firm called Bell Pottinger got caught making a wide range of anonymous edits to its clients’ articles. The discussion continued into early 2012, including a smart blog post by Edelman’s Phil Gomes that focused the discussion on how Wikipedia and PR might get along, a public relations organization in the UK developing a set of guidelines for the first time, and a similar organization in the US releasing a survey purporting to demonstrate problems with Wikipedia articles about companies, though it wasn’t quite that.

For the first time since 2009, the topics of “paid editing” and “paid advocacy” drew significant focus. New projects sprang up, including WikiProject Cooperation (to help facilitate outside requests) and WikiProject Paid Advocacy Watch (to keep tabs on said activity). Jimmy Wales spelled out his views in as much detail as he had before, and the Wikipedia Signpost ran a series of interviews over several months (called “Does Wikipedia Pay?”), covering the differing views and roles editors play around the topic. But after all that, no new policies or guidelines were passed, and discussion has quieted a bit for now.

2. Britannica admits defeat — In the year of our lord 2012, Encyclopædia Britannica announced that it would stop publishing a print edition and go online-only. Which means that Britannica essentially has ceased to exist. The 244-year-old encyclopedia, the world’s most famous until about 2005 or so, has no real web presence to speak of: its website (which is littered with annoying ads) only makes previews of articles available, and plans to allow reader input have never gone anywhere. Wikipedia actually had nothing to do with Britannica’s decline, as I pointed out earlier this month (Microsoft’s late Encarta started that), but the media narrative is already set: Britannica loses, Wikipedia wins. Britannica’s future is uncertain and the end is always near, while Wikipedia’s time horizon is very, very long.

Wikipedia SOPA blackout announcement

1. Wikipedia’s non-neutral protest on U.S. Internet law — Without question, the most significant and widely-covered Wikipedia-related topic in the past year was the 24-hour voluntary blackout of Wikipedia and its sister sites on Wednesday, January 18. Together with a few other websites, notably Reddit, Wikipedia shut itself down temporarily to protest a pair of bills under consideration in the U.S. House and Senate, called the Stop Online Piracy Act (SOPA) and PROTECT IP Act (PIPA), supported by southern California (the music and movie industry) and opposed by northern California (i.e., Silicon Valley).

The topic basically hit everyone’s hot buttons, and very different ones at that: the content companies who believe that online piracy is harming their business, and the Internet companies who feared that if the bills became law it would lead to censorship. You can imagine which side Wikipedia took.

But here’s the problem: Wikipedia is not one entity; it’s kind of two (the Foundation and volunteer community), and it’s kind of thousands (everyone who considers themselves a Wikipedian). While there seemed to be a majority in favor of the protest, the decision was arrived at very quickly, and many felt that even though they agreed with the message, it was not Wikipedia’s place to insert itself into a matter of public controversy. And one of Wikipedia’s core content policies is that it treats its subject matter with a “neutral point of view”—so how could anyone trust Wikipedia would be neutral about SOPA or PIPA?

But the decision had been made, and the Foundation (which controls the servers) had made the call, and even if you didn’t like it, it was only for 24 hours. And it certainly seemed to be effective: the blackout received the abovementioned crazy news attention, and both bills failed to win wide support in Congress (at least, for now). And it was a moment where Wikipedia both recognized its own power and, perhaps, was a little frightened of itself. For that alone, it was the biggest Wikipedia story of 2012.

The Top 10 Wikipedia Stories of 2012 (Part 1)

Posted on December 28, 2012 at 12:18 pm

In these waning days of 2012, let’s take this opportunity—for a third year in a row—to look back and come up with a list of the most important Wikipedia news and events in the last 12 months. Like our first installment in 2010 and our follow-up in 2011, the list will be arbitrary but hopefully also entertaining. There is no methodology to be found here, just my own opinion based on watching Wikipedia, its sister projects and parent organization, and also thumbing through the Wikipedia Signpost, Google News and other news sites this past week. So what are we waiting for?

Wait, wait, one more thing: this post ended up being much longer than I expected, and so I’ve decided to split this in two. Today we publish the first five items in the list, 10-6. On Monday 12/31 we’ll publish the final five. Enjoy!

♦     ♦     ♦

10. Wikipedia bans a prominent contributor — Let’s start with something that did not make the news outside of the Wikipedia / Wikimedia community at all, but which took up a great deal of oxygen within it. It’s the story of a prominent editor and administrator who goes by the handle Fæ. In April of this year, he was elected to lead a new organization within the community based on his leadership of the UK chapter. The move was not without controversy: Fæ’s actions both on Wikipedia and the sister site Wikimedia Commons (best known as a vast image repository) and interactions with editors became the subject of intense scrutiny, and even an ArbCom case (the Arbitration Committee is sort of like Wikipedia’s Supreme Court). Fæ ended up resigning his adminship—he basically jumped to avoid being pushed—and the end result had him banned from editing Wikipedia, which he still is. Not that he’s gone away—he’s still a contributor to Commons, and a very active one.

This might sound like a lot of insider nonsense, and I’m not about to dissuade you from this viewpoint. (Sayre’s law applies in spades.) But the key issue involved is about governance: is the Wikimedia community’s organizational structure and personnel capable of the kind of leadership necessary to maintain and build on this important project? The Fæ incident (along with other incidents in this list) suggests the answer may be no.

9. Confusing software development — Not all of Wikipedia’s contributors are focused on editing articles. Some are also developers, working on the open source software to keep Wikimedia sites running and, perhaps, improving. Some (but not all) are paid staff and contractors, and the hybrid part-volunteer, part-professional organizational structure can make it difficult to get projects off the ground.

One longtime project that has yet to see wide implementation is a “visual editor” for Wikipedia articles, to make editing much easier for users. Everyone knows that the editing interface for Wikipedia articles feels like software programming, and almost surely turns away some potential contributors (though it’s not the main reason people don’t contribute, as a 2011 Wikimedia survey showed). But the visual editor is a bigger technical challenge than one might think (as recently explained by The Next Web), and the outcome of a current trial run (also not the first) is anyone’s guess.

Another project, announced with a great deal of hype but which no one really seems to understand, is Wikidata. It calls itself a “common data repository” which by itself sounds fairly reasonable, but no one really knows how it will work in practice, even those now developing it. Wikidata could be a terrifically innovative invention and the very future of Wikimedia… but first we need to find out what it does.

Other projects have been released, but have received thoughtful criticism for adding little value while diverting resources from more worthy projects. For example, a feature briefly existed asking you to choose whether a smiley face or frowny face best represented your Wikipedia experience. Uh, OK? Some projects have been better-received: the Wikipedia iPhone app, for example, is a definite improvement over the mobile site. But there are some odd decisions here, as well: does Wikipedia really need an app for the failed BlackBerry PlayBook?

8. Sum of human knowledge gets more human knowledge — If you’ve ever seen a [citation needed] tag on Wikipedia—and I know you have—then you know that, well, citations are needed. And while citations do actually kind of grow on trees (if by “trees” we mean “the Internet”) there is a lot of information out there which isn’t readily searchable on Google, and sometimes that information costs money. This year, some of those paid services cracked the door open just a bit.

The interesting story to the HighBeam Research partnership is that there really isn’t one. First of all, HighBeam is a news database which charges for reader access to its vast collection of articles. But in March, a volunteer Wikipedia editor who goes by the name Ocaasi reached out to HighBeam and asked if they would be willing to grant free access to Wikipedia editors. They said yes—and supplied one-year, renewable accounts to editors with at least one year’s experience and 1,000 edits. For Wikipedia, it meant greater access to information. For HighBeam, it meant a 600% increase in links to the site in the first few months of the project. Seems like a fair trade.

More recently, the Wikimedia Foundation announced an agreement with the academic paper storehouse JSTOR, making one-year accounts available to 100 of the most-active Wikipedia editors. With almost 240 editors petitioning for access, if you haven’t spoken up yet, chances are you’re a bit too late.

7. The first person to 1 million edits — OK, how about a fun one? In April, a Wikipedia editor named Justin Knapp, who uses the handle Koavf, became the first person to make 1 million edits to Wikipedia. To the surprise of everyone, perhaps none more than Knapp himself, this made him an overnight international celebrity of the Warhol variety. Jimmy Wales even declared April 20 “Justin Knapp Day” on Wikipedia.

It’s worth pointing out that most editors with many, many edits to their name typically are involved in janitorial-style editing activities, such as fighting vandals or re-organizing categories. And many very active editors spend a lot of time squabbling with others on the so-called “drama boards” such as Administrators’ noticeboard/Incidents. Not Knapp: his edits over time have overwhelmingly focused on creating new articles, plus researching and improving content in existing ones. In short: Wikipedia doesn’t need more editors—it needs more Justin Knapps.

Also, this is one I actually played a small role in, as verified by Knapp’s own timeline of events. I’d happened to see someone note the fact on Jimmy Wales’ Talk page that day, which I tweeted, and which was then picked up by Gawker’s Adrian Chen, and the rest is history. Actually, then Knapp kept right on editing Wikipedia. As of this writing, he’s closing in on 1.25 million edits.

6. Philip Roth’s Complaint — Wikipedia has been extraordinarily sensitive to complaints by living people who are the subject of articles ever since a 2005 incident where a veteran newspaper editor found his article maliciously vandalized to implicate him in the murder of the brothers Kennedy.

In what was arguably the biggest row since then, in September 2012 the celebrated, prickly author of Portnoy’s Complaint, American Pastoral and numerous other novels took to the pages of The New Yorker to issue “An Open Letter to Wikipedia” complaining that the site had the inspiration for his 2000 novel The Human Stain all wrong. And this wasn’t his first resort: Roth’s first attempt had been to authorize his biographer to change the article directly, which was rebuffed. His consternation here: not inexplicable.

But Roth’s complaint was not really with Wikipedia. Several book reviewers had speculated (apparently incorrectly) about the real-life basis for the novel’s central figure, and it was these speculations which had been introduced to Wikipedia. Roth’s publicity campaign brought the issue to much wider attention, which got his personal explanation of the novel’s inspiration into Wikipedia. However, in a twist on the Streisand effect, the controversy is now the subject of a longish and somewhat peevish section written by editors perhaps irked by Roth’s campaign. So he got what he wanted, plus more that he didn’t. Shall we call it the Roth effect?

♦     ♦     ♦

Look here on Monday for the thrilling conclusion to The Top 10 Wikipedia Stories of 2012!

Wikipedia is Not Finished, But Its Needs are Changing

Posted on December 18, 2012 at 9:14 am

Earlier this fall, a very interesting and not-too-academicky paper on Wikipedia’s article about the War of 1812 (by historian and Wikipedian Richard Jensen) somehow begat an Atlantic web story with the wishy-washy subheading “Wikipedia is Nearing Completion, in a Sense” which begat this less subtle, more alarming headline in the UK Independent: “Is Wikipedia Complete?”

Wikipedia doomsaying is a popular pastime among technology writers (one can’t exclusively rely on Apple doomsaying, after all) and this isn’t even the first go-around for this particular variant. But this one is more annoying than the usual complaint that Wikipedia is losing editors, because proclaiming Wikipedia complete is more likely to suggest that one shouldn’t consider getting involved. Why bother? Wikipedia’s finished.

Of course, it’s not. The Atlantic’s Rebecca J. Rosen acknowledges this briefly, quoting Jensen as follows:

Wikipedia is now a mature reference work with a stable organizational structure and a well-established reputation. The problem is that it is not mature in a scholarly sense.

Just so. Yes, Wikipedia already has more than 4 million articles in the English language. The problem is that a great many of them just aren’t very good. An article may exist, but it might not contain much information. It may contain some decent information, but some of it may be wrong. It may have been correct at one time, but has since become outdated. Or an article may have lots of information, but it may not be well-organized. Just because an article exists does not mean the job is done. What it really means is the job of cultivating that specific slice of human knowledge—whether about the War of 1812 or the 18½ minute gap—has only just begun.

The problem Wikipedia faces is that it has many, many more readers than editors (only 6% of readers have ever tried, according to a 2011 survey) even if the line between them is supposedly no thicker than choosing to click the “Edit” button at the top of a page.

For almost any topic you can think of, it can seem like there is already an article. What’s more, the topics which are most well-known, especially those related to current events, tend to be extremely well-developed and already saturated with editors. An edit on a page like President of the United States is likely not to last long before someone else comes along and changes it. The uncomfortable truth is that the veteran editor is probably right, insofar as Wikipedia’s standards are concerned. But that doesn’t make it any less discouraging to new editors.

♦     ♦     ♦

So, where can new Wikipedians gain confidence and knowledge of Wikipedia’s editing style, and make edits that really make a difference? The answer lies with Wikipedia’s vast collection of underdeveloped articles—those far outside of the daily news cycle, focused on topics dating to the pre-Wikipedia age, and which could be much better, but have lacked for sustained interest from foregoing editors.

As someone who reads Wikipedia daily, I come across these all the time. I also decided to ask some colleagues about what kind of article categories might be particularly neglected. Here are just a few topics that we see (and please note that we are all native English speakers from the U.S. and UK in our late 20s and early 30s, so YMMV) where new editors can dive in and start adding information and sources:

  1. 1990s rock albums: A surprisingly large number of rock albums from the ’90s have just a stub article—one that has very little information other than a basic description of the album. Follow the link, start by clicking on titles that you’re familiar with, and it won’t take long to find one that needs some help. The wider Internet has no shortage of reviews from music publications, which should be just what you need to add new details.
  2. 1990s comedy films: There’s a theme here, and one that speaks to the demographics of Wikipedia: the missing age group of 29- to 40-year-olds has left the encyclopedia with a gap in its collective knowledge: the 1990s! Once again, you can follow the link, pick any film and help improve it. Just remember: you can’t use IMDb (not a reliable source!) but you probably can use articles IMDb links to.
  3. Historical novels: If you’re not into reminiscing about the 1990s, perhaps you’d like to look back a bit further in time. In which case, the historical novel stubs listed here might be right up your alley—or galley, since there are a few of C.S. Forester’s nautical-themed Hornblower novels listed here…
  4. Fairy tales: Still on a literary note, a surprising number of articles on well-known fairy tales are lacking references or still in stub form. See if any of your childhood favorites need some work.
  5. Cartoonists: Biographies are a good topic area for any beginner on Wikipedia and there is no shortage of sub-topics to choose from that need development. There’s a whole list of cartoonists here whose articles are currently just stubs; why not dive in and see if there’s one you’re familiar with?

If you’re thinking about starting to edit Wikipedia and the thought of trying to improve a whole article seems overwhelming, here are a few ideas for small fixes that you can make in any article of your choosing:

  1. Read through an article and fix any typos or formatting errors.
  2. Remove any obvious vandalism or pure nonsense you come across.
  3. Look at information in infoboxes (the sidebars that appear at the top right of articles) and check that it is correct and up-to-date.
  4. Rewrite sentences that don’t make sense or are obtusely worded.
  5. Fact-check: choose a claim from an article with no citation, then find a book or another quality source to verify the statement.

I fully acknowledge that all of the above is easier said than done. Even though Wikipedia is the encyclopedia anyone can edit, that doesn’t mean everyone does. But it is possible for anyone to learn, given the right inspiration. With this post—and who knows, maybe more like it to come?—I’d like to help others find it.

Thanks to Rhiannon Ruff, Morgan Wehling and Pete Hunt for help with this post.

Wikipedia Didn’t Kill Britannica—It Saved the Encyclopedia

Posted on December 11, 2012 at 11:40 am

Mary Meeker is a venture capitalist associated with the famous Silicon Valley VC firm Kleiner Perkins who is—as Wikipedia describes her—“primarily associated with the Internet”. Indeed, her annual “Internet Trends” report is highly anticipated in the Valley. Her 2012 report is no different, and it includes a couple of slides focused on Wikipedia vs. Britannica (see also: “Regarding the Uncertain Future of Encyclopædia Britannica”, March 14, 2012). Here’s the important one:

My first reaction, as I tweeted last week, was to be fairly unimpressed:

But looking at it again, it’s quite obvious that for all the discussion of Wikipedia “killing” Britannica, this is not the case at all. First of all, as Wired’s Tim Carmody correctly observed earlier this year, Britannica’s sales began to falter with the introduction of Microsoft Encarta in 1993. If Meeker’s numbers are accurate, then the debut of Wikipedia in 2001 had no impact whatsoever on Britannica’s declining fortunes. Nor does Britannica’s downward slope appear to have accelerated with the rapid adoption of the Internet from the late 1990s onward.

The y-axis of Meeker’s chart, if anything, downplays Wikipedia’s ubiquity compared to Britannica’s sales. Because it uses logarithmic scales charting different numbers, I think it’s kind of a terrible chart, truth be told, but it’s still readily apparent that Wikipedia is vastly more accessible to readers than Britannica ever was. Anecdotal evidence obviously supports this: I’ll bet anything you look at Wikipedia more now than you ever did Britannica, and there are millions who never had access to Britannica before, but can read Wikipedia now.

One thing I would have liked to see here is Britannica.com’s online traffic; writing as one who was in college during the late 1990s and used Britannica.com when it was a free resource, I’d imagine its true relevance nosedived when the site erected a paywall sometime around the year 2000, not that this would necessarily influence print sales.

The bottom line is clear: Britannica’s failure and Wikipedia’s triumph have nothing to do with one another, apart from the inexorable migration of information from analog to digital, and from physical to cloud-based storage. And here is the vastly more interesting trend question: what will eventually replace that?

For the full Meeker report, click here.

Linux distributions vs. wedding dresses: the gender gap impact

Posted on November 19, 2012 at 3:10 pm

Editor’s note: The author of this post is Rhiannon Ruff (User:Grisette); the post is part of a series on female editors of Wikipedia. Her most recent post—the first in the series—was “All The Women Who Edit Wiki, Throw Your Hands Up At Me” on November 8, 2012.

Continuing this series on women and Wikipedia, this week I’d like to give a quick overview of the gender gap and its impact. Let’s start with what we already know: female Wikipedia editors are in the minority of those making edits to the site’s articles and Talk pages on a regular basis. Earlier this year, a research project by Santiago Ortiz found that on average there are 12.9 male editors to each female editor editing a given article. This is an issue that Wikipedians are very familiar with. For many, the real concern is not just that women aren’t participating, but that their relative absence may have led to gaps in Wikipedia’s collective knowledge.

In early 2011, Noam Cohen wrote an oft-cited article for the New York Times which made the point that Wikipedia’s coverage of topics more likely to be of interest to women tended to be much less well developed than for corresponding topics of interest to men. Indeed, anecdotal evidence exists for a gendered take on notability: in some cases, articles on female-oriented topics have been nominated for deletion, not considered “notable” by (mostly) male editors. In particular, Torie Bosch wrote on Slate.com about the deletion debate around the Wikipedia article Wedding dress of Kate Middleton, which survived after editors including Jimbo Wales fought for it to remain. Bosch also described how several new articles on female historical figures created during a Smithsonian archives “edit-a-thon” were later nominated for deletion—one more than once.

(As an aside: I personally find it off-putting how this gender gap topic is often addressed. For instance, Cohen’s article specifically mentions the poor state of the articles on the TV series Sex and the City and fashion designer Jimmy Choo as indicators of missing female editors. Examples like these are more than a little patronizing and hard to take seriously. I’m not the only one who feels this way.)

The gender gap doesn’t just affect what articles get created (and don’t get deleted): the quality of certain articles may be affected by the dearth of female editors, too. In January 2011, Wikipedia’s newsletter, The Signpost, included a piece in which Wikipedia article quality was compared between the most famous male and female scientists from Science magazine’s Science Hall of Fame. The author of the Signpost article found that the top ten male scientists’ articles are mostly rated a “B” on Wikipedia’s article quality grading scheme, and include one Good Article and one Featured Article, while the top ten female scientists’ articles are all rated Stub or Start class (with the exception of Marie Curie). Worth noting: the author explained the conclusion isn’t a clear-cut case of gender imbalance, since the female scientists were generally less well-known than the men, which could have an impact on both the number of editors interested in the articles and the availability of material to improve them.

An interesting question in light of all the above: what exactly are women editing on Wikipedia? If we look at one of Wikipedia’s most well-known female editors, SlimVirgin, who’s had a key role in 10 Featured Articles—no mean feat—we can get an idea of what a prolific female editor works on. Her Featured Articles span a range of topics, from the biographical article for Palestinian political leader Abu Nidal to the article on the Brown Dog Affair, an Edwardian-era political controversy about vivisection. No obvious gender bias here. Nor is there any big difference between male and female editors in terms of types of edit according to a 2011 study titled Gender Differences in Wikipedia Editing. The study’s authors found there was no evidence that men and women tend to make different sized edits or that one gender prefers fixing text to adding new text. In short, it seems the gender gap issue isn’t as simple as “get female editors, solve knowledge gaps”; it may have a lot to do with the types of article or information that people drawn to Wikipedia editing are most interested in. (Yes, I’m saying that Wikipedia editors are likely to be more interested in Linux than dresses, sorry Jimmy Wales!)

While writing this post I was intrigued to see if picking 10 editors at random from the Female Wikipedians category and looking at their most recent edits would provide any insight. Disappointingly, seven out of the ten hadn’t edited in over two years, and of the remaining three only one had made an edit in article space in the last year. This result is certainly indicative of Wikipedia’s broader problem of editor retention, but it also speaks to the particular issues Wikipedia has had retaining female editors. Which leads nicely to the topic of my next post… the issues involved in recruitment and retention of female editors. Look for that here soon, meanwhile (for U.S. readers) have a wonderful Thanksgiving!

All The Women Who Edit Wiki, Throw Your Hands Up At Me

Posted on November 8, 2012 at 2:16 pm

Editor’s note: The author of this post is Rhiannon Ruff (User:Grisette) who last wrote “Public Lives: Jim Hawkins and Wikipedia’s Privacy Dilemma” for The Wikipedian in April 2012.

It’s no secret that the majority of those editing Wikipedia on a regular basis are men. It’s one of the best-known facts about the Wikipedia community and a situation that doesn’t appear to be changing over time. In fact, from 2010 to 2011, the proportion of women editors actually dropped, from 13% to just 9%, according to an independent survey by Wikipedian Sarah Stierch. And it does seem, at least from the media coverage, that this contributes to some bias in content. This issue is not taken lightly by the Wikimedia Foundation, which has set a goal of “doubling the percentage of female editors to 25 percent” by 2015, as part of its Strategic Plan.

Over the next few weeks, I’ll be writing here about content bias and what women are actually editing on Wikipedia, and the issues involved in encouraging more women into such a male-dominated space. First, though, let’s round up recent efforts to get more women involved with Wikipedia.

  1. The Wikipedia gender gap mailing list: Founded back in January 2011, the list lets subscribers offer up ideas, share experiences, discuss issues and help to develop events and programs. Among recent updates, the list shared news of the latest Wikipedia Editor Survey and the launch of the new WikiProject Women scientists. 295 people are subscribed to the list.
  2. WikiWomen Camp: The inaugural camp was held in Argentina in May 2012. While not focusing on the gender gap, the conference was for female Wikipedia editors to network and discuss projects. A total of twenty women from around the world attended.
  3. WikiWomen’s History Month: March 2012 was the first WikiWomen’s History Month, where editors were encouraged to improve articles related to women in history. During the month 119 new women’s history articles were created and 58 existing articles were expanded.
  4. Workshop for Women in Wikipedia: This project to create in-person workshops encouraging women to edit Wikipedia was started in 2011 and is ongoing. So far, workshops sharing technical tips and discussing women’s participation have been held as part of the WikiConferences in Mumbai (2011) and Washington, D.C. (2012), as well as individual workshops held in D.C., Pune and Mumbai.
  5. The WikiWomens Collaborative: Launched at the end of September 2012, the Collaborative is a Wikimedia community project with its own Facebook page and Twitter account, designed to create a collaborative (hence the name) and supportive working space for women. Participants share ideas for projects, knowledge about Wikipedia and particularly support efforts to improve content related to women. Projects promoted by the Collaborative include Ada Lovelace Day, when participants were encouraged to improve articles related to women in math and science, including via an edit-a-thon organized by Wikimedia UK and hosted by The Royal Society in London. So far, the Collaborative has over 500 Twitter followers and 414 Likes on Facebook.

With all this activity, it’ll be interesting to see whether the results of the 2012 Wikipedia Editor Survey show any positive shift in the numbers of female editors. Look for those results early next year. Meanwhile, stay tuned here for my next post discussing gendered patterns of editing and Wikipedia’s knowledge gaps.

What I Did This Summer

on September 7, 2012 at 4:13 pm

It’s been a few weeks since I last posted on The Wikipedian—at the time I had just finished covering Wikimania right here in Washington, DC, and I had made at least one promise to write a wrap-up post. Alas, that never happened: between work and travel and other obligations, I’m afraid “August 2012” will forever remain a blank spot in my archives. Well, it wouldn’t be the first time. But there is a good reason, and one related—just a bit—to Wikipedia.

Over the last two years, and more intensively during the past two months, I have been working on a very large, personal project, and on Monday it was finally ready for release. It’s called The Infinite Atlas Project. As I’ve described it elsewhere, the goal is to identify, place, and describe every cartographic point I could find in David Foster Wallace’s iconic 1996 novel Infinite Jest—whether real, fictional, real but fictionalized, defunct or otherwise.

The project is tripartite, and the first part launched in mid-July: Infinite Boston, a photo tour hosted by Tumblr, which I’m writing daily through the end of this month. Launched just this week are two more ambitious efforts: a 24″x36″ poster called Infinite Map, plotting 250 key locations from the novel’s futuristic North America (and available for purchase, just FYI); and one not constrained by the dimensions of paper: Infinite Atlas, an interactive world map powered by Google Maps including all 600+ global locations that I was able to find with the help of my researchers (i.e. friends who had also read the novel). You can read much more about this on the Infinite Boston announcement post or on the Infinite Atlas “About” page, but here are screen shots of each:

Infinite Map     

Meanwhile, there are some aspects to the project that I think will be of interest to Wikipedians. For example, on the Infinite Atlas website, every entry that has a relevant Wikipedia article links back to it—whether to the exact location, such as the Cambridge Rindge and Latin School—or to the closest approximation, like the Neponset exit ramp, I-93 South. Among the development projects related to the online atlas, this was one of the last, but I think one of the most helpful. Yes, it’s interesting to the reader to be reminded that a key character stays at McLean Hospital in Belmont, Massachusetts, but it’s even more useful to confirm that McLean Hospital is a real place with more than 200 years of history. And both sites will tell you that DFW himself was a notable former patient.

Additionally, and importantly, the site is published under a Creative Commons license. For a research and art project based on a copyrighted fictional work—quoting judiciously and keeping fair use in mind, I stress—I figured it was important to disclaim any interest in preventing people from using it how they see fit—so long as they attribute and share-alike, of course. And another big reason for doing so: readers are invited to submit their own photos, so long as they are willing to approve their usage under the less-restrictive CC-BY license. If you live in one of the many locations around the world (though mostly in the U.S. and Canada) featured in the book, and now in the atlas, consider yourself invited to participate.

Live though these projects are, they are not finished and might not ever be. Which is part of the fun. And in that way like Wikipedia itself. Now maybe I’ll finally get around to fixing up the Infinite Jest Wikipedia entry and taking it to FA…

My Wikitinerary: Day 3 at Wikimania DC

on July 14, 2012 at 6:22 am

We have arrived at the last day (of official events) at Wikimania, which begins shortly with an opening plenary by the Wikimedia Foundation’s executive director, Sue Gardner. As expected, my Wikimania attendance yesterday was limited on account of other obligations; today I’ll be around for most of the events. Here are a few of the panels and presentations I’m interested in today:

♦     ♦     ♦

10:30 – 11:50

Title: Getting elected thanks to Wikipedia. Social network influence on politics.
Speaker: Damian Finol
Category: Wikis and the Public Sector
Description: Wikipedia and politicians is a contentious topic—one I wrote about for Campaigns & Elections in April 2010. This seems to be a bit different: it will be focused on Venezuelan politics, but the question of whether having a good Wikipedia page helps win elections is one I’d like to hear others answer.

Title: Iterate your cross-pollinated strategic synergy, just not on my Wikipedia!
Speaker: Tom Morris
Category: WikiCulture and Community
Description: Like any small community focused on a unique project, Wikipedia and its Wikimedia sister projects have developed a kind of jargon all their own. This talk will focus on the language used on WMF and how it can be simplified for clarity, especially to encourage participation of new editors and non-native English speakers.

Title: Wikimedia on social media
Speaker: Jeromy-Yu Chan, Tango Chan, Slobodan Jakoski, Kiril Simeonovski, Guillaume Paumier, Naveen Francis, Christophe Henner
Category: WikiCulture and Community
Description: As I tweeted the other day, English-speaking Wikipedians are often disdainful of Facebook, for reasons that would take some time to unpack. Twitter too was disfavored, in favor of the similar service Identi.ca—the latter is open source, a plus for many—although I think Twitter has gained a share of acceptance by now. Indeed, the proceedings of Wikimania have been heavily tweeted, just like any conference. So: “The goal of this panel is to share experience on the use of social media throughout the Wikimedia movement, and to share best practices to collectively improve our use of these communication channels.” What are best practices now?

12:10 – 13:30

Title: What does THAT mean? Engineering jargon and procedures explained
Speaker: Sumana Harihareswara and possibly Rob Lanphier or additional members of the engineering staff of the Wikimedia Foundation
Category: Technology and Infrastructure
Description: Speaking of jargon, this is supposed to be a non-techie explanation of the technical aspects of Wikimedia. As a non-techie, I could stand for someone to explain how Wikipedia uses squids to me again.

Title: The bad assumptions of the copyright discussion; Blacking out Wikipedia
Speaker: James Alexander; panel
Category: Wikis and the Public Sector
Description: January’s Wikipedia blackout in protest of proposed U.S. legislation tightening copyright and intellectual property enforcement on the web (SOPA and PIPA) was very controversial, and remains so. Jimmy Wales, in his opening plenary, addressed the issue, suggesting blackouts would be considered only for similar issues. The first talk is shorter and appears to be on the issue of copyright. The panel is longer and will discuss the decision to black out, how the blackout worked, how the blackout page was designed and the media’s response.

14:30 – 15:50

Title: 11 years of Wikipedia, or the Wikimedia history crash course you can edit
Speaker: Guillaume Paumier
Category: WikiCulture and Community
Description: Exactly what it sounds like, a history lesson on the last 11 years of Wikimedia/pedia history. This is a 70 minute talk. Having read Andrew Lih’s “The Wikipedia Revolution” and Andrew Dalby’s “The World and Wikipedia” there is probably not much here I won’t know about already, but I still find it interesting.

Title: The end of notability
Speaker: David Goodman
Category: WikiCulture and Community
Description: Notability, on Wikipedia, refers to a widely-discussed guideline which recommends whether a given subject deserves a standalone Wikipedia article or not. It is very contentious, it is the inspiration for the ideological split between inclusionists and deletionists, and was a key focus of John Siracusa in the “Hypercritical” podcast episode I wrote about earlier this year. This talk will focus on the topic of notability guidelines and how we can’t always find two reliable sources providing substantial coverage for some topics that probably should have articles. Goodman seems to be suggesting that we have articles on topics people want information about regardless of standard notability, but with a twist: should there be a “Wikipedia Two” to satisfy the many non-notable college athletes and politicians whose fans and supporters would like to create articles about them? Plus, Goodman (DGG on Wikipedia) is a bit of a character, so that should be interesting, too.

♦     ♦     ♦

OK, I’ve got to race down to the GWU campus now if I’m to catch Gardner’s talk. Look for me on Twitter as @thewikipedian, and I’ll write more here soon!

My Wikitinerary: Day 2 at Wikimania DC

on July 13, 2012 at 2:19 am

Wikimania Day 1 is on the books, and it was a busy one. Mary Gardiner’s keynote delivered on the mostly-male Wikimedia community’s promise that they care about female participation (and as many noted, the female presence at Wikimania is very strong) while Jimmy Wales fulfilled his role as the conference touchstone and added a dose of levity, or two.

Although, did anyone else notice he was credited as “Founder” of Wikipedia and not “Co-founder”? Well, I did.

My coverage of the first day of the conference was doled out in 140-characters-or-fewer bursts on Twitter as @thewikipedian, and so it will be on subsequent days.

As to the first subsequent day ahead: as much as I’d like to give my full day over to Wikimania, regular readers will know that I live here, and Friday I’m still basically on the clock. So I may not get to all the sessions I would like. But here is what I’m hoping to attend:

♦     ♦     ♦

9:00 – 10:20

Time will tell if I make it to the first of the breakout sessions. If I do, it will probably be:

Title: Ask the Operators
Speaker: Leslie Carr, Ben Hartshorne, Jeff Green, Ryan Lane, Rob Halsell
Category: Technology and Infrastructure
Description: Just what it sounds like, a chance to ask the people who keep Wikipedia up and running about how it works, their jobs, and apparently… unicorns? I doubt this session will actually be dominated by bronies, but if it is, then I concede I have been sufficiently warned.

I may also attend:

Title: Giving readers a voice: Lessons from article feedback v5
Speaker: Fabrice Florin
Category: WikiCulture and Community
Description: I missed his presentation on new tools yesterday, and I’m intrigued by this as well. Good feedback is hard to come by, as a Wikipedia editor, and I’m curious to find out how those most involved think the current feedback tool is working. When I wrote about it last year, I was skeptically optimistic.

10:50 – 12:10

If you’re keeping score at home, it seems that I am most interested in the “WikiCulture and Community” sessions, and why shouldn’t I be? The Wikipedian tries to be about making Wikipedia’s goings-on understandable to the non-editor, so this track is a natural fit.

Title: Wikipedia in the Twitter age
Speaker: Panel moderated by Andrew Lih
Category: WikiCulture and Community
Description: How does Wikipedia handle the fast pace of information in the Twitter age? Can Twitter be a reliable source? (I think the correct answer is: generally, no.) The role Twitter played with Wikipedia in the 2011 Egyptian revolution and other breaking news events will be discussed here. And I’m always a fan of Andrew Lih’s take on Wikipedia.

13:10 – 14:30

One of the panels I wanted to see yesterday was rescheduled last-minute for this time period, and I very well may still try to check that out. But I’m also fascinated by this one:

Title: Eternal December: How awful arguments are killing the Wiki, and why not to make them
Speaker: Oliver Keyes
Category: WikiCulture and Community
Description: For good or ill, Wikipedia is a place that many people go to argue about all kinds of things—some very important, and others not so much. This talk will cover the resistance and curmudgeonliness of “Power Editors” and how they prevent the implementation of new developments on Wikipedia and discourage newbies from contributing.

There are other good panels in this time slot, so room-hopping again is a thing I would like to try, although on day one I found it a challenge. If I manage, I like:

Title: Hey, its trending! Let’s update that Wikipedia article!
Speaker: Arkaitz Zubiaga, Taylor Cassidy, Heng Ji
Category: Research, Analysis and Education
Description: This one is a discussion of a possible system that suggests revisions for Wikipedia based on Twitter activity; much Wikipedia editing activity is driven by the news, and Twitter often breaks news before the media has had a chance to write a full story. The panelists will outline goals, details of the system and progress of this research project.

Title: Bots and Wikipedia: It’s OK to be lazy!
Speaker: Gaëtan Landry
Category: Technology and Infrastructure
Description: Although I lack the technical skills to write a real software program myself, I love me some bots. I.e. automated programs that wander around Wikipedia making changes based on an algorithm—fixing common misspellings, reverting obvious vandalism, and the like. The submission says it won’t be highly technical, which is probably good for yours truly.

15:10 – 16:30

I said above that Friday will have to be a working day for me, and it’s very possible that I’ll cut out in the afternoon to wrap some things up for the week. But if I’m still around, I think I may visit:

Title: Refighting the War of 1812 on Wikipedia
Speaker: Richard Jensen
Category: WikiCulture and Community
Description: From the description: “This year is the bicentennial of the War of 1812, and my presentation will examine how Canadian and American editors have handled the war in the main article. Sometimes they re-fought the war, as they balanced scholarship/RS and patriotism in a quest to tell the world what really happened.” I can go in for that.

♦     ♦     ♦

One last shameless plug: if you’re not following me as @thewikipedian on Twitter, then you’re missing out on a lot of interesting tweets, including some from very smart people, and some things of my own that I hope other people find smart.

I’ll see you there in a few hours!

My Wikitinerary: Day 1 at Wikimania DC

on July 12, 2012 at 5:15 am

In a few hours, the first day of general activities at Wikimania—the official annual conference of the Wikimedia Foundation—begins right here in Washington, DC. It is a global conference; in fact, this is the first time Wikimania is being held in the United States since 2006, when it was hosted on the Harvard University campus in Cambridge, Massachusetts.

This year, it just happens to be outside my front door. What’s more, it is being held on the campus of the George Washington University, precisely where I launched this very blog at a (much smaller) conference in March 2009.

So: it’s a big day ahead—big weekend, but I have to focus for now. A review of the official schedule reveals an almost overwhelming number of events. After reading through the various panels and presentations, I think I have a pretty good idea of my day ahead, which I’d like to share here now:

♦     ♦     ♦

09:00 – 11:10

The only place to be, indeed the only official event at this time, is the opening ceremony, keynote and plenary. Most of the wider media attention that Wikimania generates will probably be focused on Wikipedia co-founder and unofficial mascot Jimmy Wales on “The State of the Wiki”, but I’ll be interested to see the opening keynote by Mary Gardiner, an Australian computer programmer who is also a leader in “increasing participation of women in open technology and culture”. Wikipedia editors have long skewed heavily toward men, but in recent years more attention has been focused on how to change that. I am a skeptic—Wikipedia is hardly alone in this fact, particularly among technology tactics—but I am also interested in hearing what she says, on this very high-profile stage for such a topic.

11:40 – 13:00

Here the breakout sessions begin, and it is truly a poverty of riches from a Wikipedian perspective; there is too much to possibly take all in. What follows is an estimation of the panels I am likely to check out:

Title: “This is my voice”: the motivations of highly active Wikipedians
Speaker: Maryana Pinchuk, Steven Walling
Category: WikiCulture and Community
Description: One of the most common questions I am asked about Wikipedia, and also one of the hardest to answer with anything but anecdotal evidence, is why Wikipedians do what they do. Pinchuk and Walling have interviewed some of the most active Wikipedia editors to study the motives behind why they participate. Intriguingly, their submission includes the following teaser: “Note: after this talk, we will be making a special piece of conference swag available to any interested Wikipedians which will let them show off their own motivation for editing.”

Title: Engaging editors on Wikipedia: A roadmap of new features
Speaker: Fabrice Florin
Category: Technology and Infrastructure
Description: This talk will discuss new features on Wikipedia that make it easier to edit and the impact this will have on attracting new editors and retaining current ones. This follows a 20-minute talk, so if I leave right after Pinchuk and Walling’s talk and sneak in quietly I can probably catch most of this. At least, I presume this will work. You can never really tell how a conference will work until you arrive.

14:00 – 15:20

Title: A talk page is a broken message wall: Building a more efficient communication
Speaker: Danny Horn, Tomasz Odrobny
Category: WikiCulture and Community
Description: These days, I spend more time on Wikipedia’s discussion pages than I do editing the encyclopedia itself, so I am extremely familiar with how these pages work—and how they don’t. This presentation will demonstrate a new talk feature that will make it easier to track conversations you are interested in without receiving watchlist notifications about topics you don’t actually care about. Interesting! Although Wikipedia has put much more public attention on a forthcoming WYSIWYG editor, I think this could actually be a bigger deal. If it works, of course.

The above talk is followed by another one that I find fascinating for exploring the insider-outsider dynamic around Wikipedia, featuring the presenters from the first breakout session:

Title: Welcome to Wikipedia, now please go away? improving how we communicate with new editors
Speaker: Steven Walling, Maryana Pinchuk
Category: WikiCulture and Community
Description: On Wikipedia, veteran editors run across the same kind of activity by new editors so often that they have developed a deep reserve of templated messages—some friendly, many unfriendly. According to the session’s topic page, “On English Wikipedia and many other projects, automated warnings and welcomes currently make up about 80% of first messages to new editors.” Wow. I had not thought about it before, but it makes complete sense. I’ll be curious to see where the state-of-the-art thinking is on this topic.

15:40 – 17:00

For the final breakout session, there is one long sustained discussion of Wikidata that I am awfully tempted to spend my time at, but there is another talk that I find interesting within this period:

Title: How Wikidata fits into the global web of data; Wikidata implementation and integration; Wikidata as a platform
Speaker: Denny Vrandečić; Daniel Kinzler; Jeroen De Dauw
Category: Technology and Infrastructure
Description: What is Wikidata? Indeed, what is it precisely? It is only the most ambitious new Wikimedia Foundation project to launch in recent years. As the first panel description says: “Wikidata’s goal is to move the rich structured data currently encoded in Wikipedia templates into a central repository, which will be available for re-use on all Wikimedia projects, but also to 3rd party services. We will introduce what Wikidata aims to do and how: centralizing language links, centralizing data for the infoboxes, and all of that in the first new Wikimedia project since 2006.” Yeah, that’s not too ambitious. The first talk appears to be more of an overview and the two following it seem to be more technical.
Location: Grand Ballroom
Length: Each talk is 25 minutes

Title: Wikimedia relations with government, lobbying and public relations
Speaker: James Forrester, Philippe Beaudette
Category: Wikis and the Public Sector
Description: If Wikidata gets too technical for me, I’ll be heading over to this panel. In my professional life public relations is one of my primary activities, often involving Wikipedia—as I have written about before—and so I will be very interested to see where this discussion goes. If there is any presentation where I am likely to participate, this may be it, depending on where the discussion goes. Why not come find out?

♦     ♦     ♦

And that is the end of the official activities for the day. More events stretch into the evening, but I won’t be at them. Tonight, Roger Waters brings The Wall to the Verizon Center, which I will be seeing with a friend from high school and college in town just for this event. Actually, we’ll be seeing this if StubHub and FedEx combine to deliver these tickets during the day today, which they have so far been rather slow about.

I know… this has nothing to do with Wikipedia. But it’s highly relevant to my day ahead at Wikimania. Fingers crossed everything works out! Meantime, I will be tweeting the day’s activities from my @thewikipedian Twitter account, so please follow! And if all goes well I will post tomorrow’s wikitinerary here soon.

Two Wikipedia Co-Founders, Two Very Different Causes

on June 29, 2012 at 3:58 pm

The Wikipedian has been occupied with other projects, and fairly quiet as of late. The good news is that, with the Wikimania global conference just around the corner, I’ll be writing more here in the near future. And I really do mean just around the corner: Wikimania 2012 will be held in the city I call home, Washington, DC.

Meanwhile, here’s something I’ve noticed that I don’t think other Wikipedia commentators have remarked upon: the divergent activism of its two co-founders, its still closely involved spiritual leader and unofficial mascot Jimmy Wales, and estranged, erstwhile rival Larry Sanger. Although both men might be broadly described as libertarian—as legend has it, they first met on an Internet discussion forum for Objectivists—their causes today are all but diametrically opposed.

In the last week, Wales has publicly opposed U.S. Department of Justice plans to extradite a British student, Richard O’Dwyer, for (allegedly) knowingly enabling copyright violations by users of a website he once operated (since shuttered). Although based in the UK, O’Dwyer’s domain was registered in the U.S.—hence the federal government’s interest. Wales’ point, made in a Guardian op-ed:

One of the important moral principles that has made everything we relish about the Internet possible, from Wikipedia to YouTube, is that Internet service providers need to have a safe harbour from what their users do.

A fair point? Sure. Self-serving? Most certainly! Wikipedia is always making someone mad because anonymous individuals use the site to spread malicious, sometimes defamatory, occasionally offensive material, true or false. In fact, someones like… none other than Larry Sanger.

In recent months, Larry Sanger has taken up a more conservative cause, focused on some of Wikipedia’s more controversial content. Sanger is critical of Wikipedia for allowing the inclusion of sexually explicit photos on articles about sexually explicit topics, and more so of Wikipedia’s sister site Wikimedia Commons, for allowing users to upload even more graphic photos, many of which serve no purpose except to titillate the uploader, and disgust most others. Here’s an exhaustive report by Internet buzz beacon BuzzFeed, on one such example (highly NSFW, even with blurring).

Wales remains squarely within the camp of Internet libertarians, lending support to those who do things we may not like, but whom we may defend on principles of freedom. It is also consistent with his previous activism against U.S.-based SOPA and PIPA legislation, which I wrote about in January.

From a Wikipedia perspective, the key difference is this: in this case, Wales is seeking to use only his celebrity (which is considerable, in Internet terms) to draw attention to his cause, rather than enlisting the power of Wikipedia’s community as a force multiplier. The matter has been the subject of much discussion on Wales’ Talk page (basically a water cooler for Wikipedians) this week, led by the following comment:

As someone who strenuously opposed the political advocacy pursued by the Wikimedia Foundation early this year … I commend your decision to take action on the O’Dwyer case as Wikipedia founder and respected opinion leader as opposed to (additionally) trying to light a fire under the editing community.

Sanger has far less celebrity to wield (even in Internet circles). Earlier in June, Sanger was interviewed by TechCrunch to discuss these topics, and as he said in a tweet aimed partially at yours truly:

Wikipedia, choose two: (1) call yourself kid-friendly; (2) host lots of porn; (3) be filter-free.

Not a bad point there, either.

I don’t mean to wade into this controversy myself. I find myself largely in agreement with both men on some broad points, contradictory as that may seem, although I think the long-run implications of both issues are more difficult to assess.

As for reservations about Wales’ petition: are we to be ISP freedom absolutists? Is there no “fire in a crowded theater” moment? As for reservations about Sanger’s cause: how are we to determine what serves a genuine informational purpose, and how do we balance this against Wikipedia’s longstanding and admirable policy that it is “not censored”?

I don’t know the answer, but if you think you do, I welcome your response in the comments.

30 for 30 (Divided by Ten)

on May 25, 2012 at 4:46 pm

In Slovenian, the traditional name for the month of May translates to “the month when plants grow”, or so Wikipedia tells me. It’s apt then—if not altogether insufferable as a metaphor—that this month three “seedlings” I recently planted have all blossomed.

Horse ebooks Wikipedia article

First up is something I am almost embarrassingly prideful about: that I will go down in history as the person who created the Horse ebooks Wikipedia article. (The what, you ask? Read the article!)

Considering the meteoric rise to Internet fame of the Horse ebooks Twitter account—without a doubt, the most followed and most beloved Twitter spam account of all time—it’s rather surprising that when I first looked in late April, no such article existed. So, I wrote it. The article debuted on May 5, and graced Wikipedia’s front page with its presence—in the “Did you know” section—on May 12.

Read the Wikipedia article, follow the Twitter account, and then buy the T-shirt (note: I have no deal with the sellers, except that I did buy the shirt). And then take sides in the debate over whether the magic is gone since its automation became a subject of disagreement.

Best of Wikipedia Sandbox Tumblr screenshot

Another fun project that has taken off this month is The Best of Wikipedia:Sandbox, a Tumblr account.

There are many Tumblrs like it, but this one is mine. In fact, there are other Tumblrs about Wikipedia, including Best of Wikipedia, [Citation Needed], and—if you know Tumblr, you know this is coming—Fuck Yeah Wikipedia!

Ostensibly Wikipedia:Sandbox is a place to test edits and check formatting, but that’s not all it gets used for. What started out as a joke among my colleagues—sharing screen caps of the ridiculous things we’d seen in the Sandbox—has transformed into a Tumblr to share these largely unknown and unappreciated comic gems with the world. The Sandbox is an unlikely repository for strange world views, facepalm-worthy test edits, and—since this is the Internet—cat pictures.

I’ve saved the biggest announcement for last… this month I launched what is essentially my second Wikipedia-related website: Beutler Wiki Relations. Yes, it’s a business website.

Although I rarely write about my consultancy here, close readers of The Wikipedian are likely aware that one of my professional focuses (foci?) is helping brands, companies and individuals work constructively with the Wikipedia community to improve articles. I’ve never sought to draw attention to this—and indeed, when I appeared on C-SPAN this January, the subject only came up briefly. But I feel like it’s worth posting a simple website explaining myself to skeptical Wikipedians and, sure, potential clients alike. Closer readers may recall the phrase “wiki relations” from my post about the Bell Pottinger mess, and how it could have been avoided.

Although “conflict of interest” and “paid advocacy” on Wikipedia remain contentious topics, I think it’s more important than ever to make them seem less mysterious. It won’t stop the Bell Pottingers, but it may stop people from hiring them to mess with Wikipedia.

And yes, I realize I have a conflict of interest in saying that. Can’t avoid it; might as well own it. Or as Horse ebooks says: “Discover the usefulness of wax.”

Disambiguate This!

on April 17, 2012 at 1:05 pm

If the Wikipedia article titled “Wikipedia in culture” is to be believed, the free, online encyclopedia’s primary contribution to popular culture is as a humorous reference, particularly in U.S. cable television programming.

Topic-wise, sometimes the joke relates to Wikipedia’s uneasy relationship to education, including T-shirts featuring leaping graduates thanking Wikipedia. More often than not, Wikipedia’s uneven reliability is the joke, such as The Onion’s classic 2006 article: “Wikipedia Celebrates 750 Years Of American Independence”.

If it has had any noticeable linguistic impact (aside from debate over the meaning of “Santorum”) it is probably in the phrase “Citation needed”. But the word that I wish Wikipedia could popularize is:

Disambiguation

It’s a perfectly cromulent word, and can be found in the dictionary (or at least on Dictionary.com), apparently dating to the 1960s, and unsurprisingly means:

to remove the ambiguity from; make unambiguous

And yet it’s not a word that I can recall having seen prior to Wikipedia, even though I have a degree in English and very nearly earned one in journalism. In a world of ambiguity, what more could we want than disambiguation to help us understand what’s real, and what matters? Well, maybe therein lies the problem: there are no easy disambiguations in the real world. But are they so easy, even on Wikipedia?

If you don’t know what disambiguation is, it’s pretty simple. Wikipedia has articles about many people named John Smith, most real and even some fictional. So many, I’m not even going to bother counting. Because no John Smith is considered vastly more famous than the others, none of them gets this URL:

Nope, that’s the disambiguation page, where one can find, among many others:

And, for fans of The A-Team, there is also:

In many cases, a word will have one primary meaning, and then multiple secondary uses. This is when the parenthetical expression “(disambiguation)” comes in. One example:

Typically, articles requiring some form of disambiguation require a “disambig” note at the top of the page (called a “hatnote”). Frequently, the phrasing is “Not to be confused with…” and here is one example, which I enjoy more than most:

McGraw-Hill disambiguation

As you may expect, there is a lengthy guideline detailing how disambiguation pages are to be governed. But on a website where not everyone knows the rules, nor does everyone agree about the relative importance of similarly-named subjects, there can be some glitches. This is especially true when one is being implored by unknown advisers “not to be confused by” a deceptively unrelated topic.

One errant disambiguation comes to mind immediately, because I’m the one who undid it.

First, Bob Dole should be well known to any American over the age of 25, if not for being the Republican presidential nominee in 1996, then perhaps for that one Pepsi ad with Britney Spears. Meanwhile, Robert Dold is a U.S. congressman from Illinois, whom I had never heard of until very recently, although I live in DC and have worked in and around U.S. politics for a decade. (Dold has only been in Washington since 2010, so there’s that.)

Then what explains the admonition not to confuse this:

With this:

Yeah, I didn’t get it either. So I removed the unnecessary disambiguation from Dole’s page, and I seriously doubt anyone has been wondering “What about Bob (Dold)?”

There are other interesting imbalances, though these are often more justified. As I recently tweeted:

Joe Plummer vs. Joe the Plumber on Wikipedia

Indeed, compare this:

With this:

But I’m sure that’s right. Joe the Plumber is far better known, following his stint as the semi-official mascot of John McCain’s 2008 presidential campaign, than is Joe Plummer, who is probably a swell guy and earns bonus points from me for being from Portland. And with Mr. the Plumber now the Republican nominee to challenge Rep. Marcy Kaptur this fall, it’s looking even dimmer. Sorry, Joe (the Plummer).

But in the world of interesting disambiguations, undoubtedly this one is my favorite:

At least it doesn’t tell you not to be confused.

The Agony and Ecstasy of Wikidata

on April 12, 2012 at 8:31 am

Although Wikipedia is by far the best-known of the Wikimedia collaborative projects, it is just one of many. Just this last week, Wikimedia Deutschland announced its latest contribution: Wikidata (also @Wikidata, and see this interview in the Wikipedia Signpost). Still under development, its temporary homepage announces:

Wikidata aims to create a free knowledge base about the world that can be read and edited by humans and machines alike. It will provide data in all the languages of the Wikimedia projects, and allow for the central access to data in a similar vein as Wikimedia Commons does for multimedia files. Wikidata is proposed as a new Wikimedia hosted and maintained project.

One of a few Wikidata logos under consideration.

Upon its announcement, I tweeted my initial impression, that it sounded like Wikipedia’s answer to Wolfram Alpha, the commercial “answer engine” created by Stephen Wolfram in 2009. It seems to partly be that but also more, and its apparent ambition—not to mention the speculation surrounding it—is causing a stir.

Already touted by TechCrunch as “Wikipedia’s next big thing” (incorrectly identifying Wikipedia as its primary driver, I pedantically note), Wikidata will create a central database for the countless numbers, statistics and figures currently found in Wikipedia’s articles. The centralized collection of data will allow for quick updates and uniformity of statistical information across Wikipedia.

Currently, when new information replaces old, as is the case when census surveys, election results and quarterly reports are published, Wikipedians must manually update the old data in all the articles in which it appears, across every language. Wikidata would create the possibility for a quick, computer-led update to replace all out-of-date information. Additionally, it is expected that Wikidata will allow visitors to search and access information in a less labor-intensive way. As TechCrunch suggests:

Wikidata will also enable users to ask different types of questions, like which of the world’s ten largest cities have a female mayor?, for example. Queries like this are today answered by user-created Wikipedia Lists – that is, manually created structured answers. Wikidata, on the hand, will be able to create these lists automatically.

Though this project—which is funded by the Allen Institute for Artificial Intelligence, the Gordon and Betty Moore Foundation, and Google—is expected to take about a year to develop, the blogosphere is already buzzing.

It’s probably fair to say that the overall response has been very positive. In a long post summarizing Wikidata’s aims, Yahoo! Labs researcher Nicolas Torzec identifies himself as one who excitedly awaits the changes Wikidata promises:

By providing and integrating Wikipedia with one common source of structured data that anyone can edit and use, Wikidata should enable higher consistency and quality within Wikipedia articles, increase the availability of information in and across Wikipedias, and decrease the maintenance effort for the editors working on Wikipedia. At the same time, it will also enable new types of Wikipedia pages and applications, including dynamically-generated timelines, maps, and charts; automatically-generated lists and aggregates; semantic search; light question & answering; etc. And because all these data will be available as Open Data in a machine-readable form, they will also benefit thrid-party [sic] knowledge-based projects at large Web companies such as Google, Bing, Facebook and Yahoo!, as well as at smaller Web startups…

Asked for comment by CNet, Andrew Lih, author of The Wikipedia Revolution, called it a “logical progression” for Wikipedia, even as he worries that Wikidata will drive away Wikipedians who are less tech-savvy, as it complicates the way in which information is recorded.

Also cautious is SEO blogger Pat Marcello, who warns that human error is still a very real possibility. She writes:

Wikidata is going to be just like Wikipedia in that it will be UGC (user-generated content) in many instances. So, how reliable will it be? I mean, when I write something — anything from a blog post to a book, I want the data I use in that work to be 100% accurate. I fear that just as with Wikipedia, the information you get may not be 100%, and with the volume of data they plan to include, there’s no way to vette [sic] all of the information.

Fair enough, but of course the upside is that corrections can be easily made. If one already uses Wikipedia, this tradeoff is very familiar.

The most critical voice so far is Mark Graham, an English geographer (and a fellow participant in the January 2010 WikiWars conference) who published “The Problem with Wikidata” on The Atlantic’s website this week:

This is a highly significant and hugely important change to the ways that Wikipedia works. Until now, the Wikipedia community has never attempted any sort of consistency across all languages. …

It is important that different communities are able to create and reproduce different truths and worldviews. And while certain truths are universal (Tokyo is described as a capital city in every language version that includes an article about Japan), others are more messy and unclear (e.g. should the population of Israel include occupied and contested territories?).

The reason that Wikidata marks such a significant moment in Wikipedia’s history is the fact that it eliminates some of the scope for culturally contingent representations of places, processes, people, and events. However, even more concerning is that fact that this sort of congealed and structured knowledge is unlikely to reflect the opinions and beliefs of traditionally marginalized groups.

The comments on the article are interesting, with some voices sharing Graham’s concerns, while others argue his concerns are overstated:

While there are exceptions, most of the information (and bias) in Wikipedia articles is contained within the prose and will be unaffected by Wikidata. … It’s quite possible that Wikidata will initially provide a lopsided database with a heavy emphasis on the developed world. But Wikipedia’s increasing focus on globalization and the tremendous potential of the open editing model make it one of the best candidates for mitigating that factor within the Semantic Web.

Wikimedia and Wikipedia’s slant toward the North, the West, and English speakers is well covered in Wikipedia’s own list of its systemic biases, and Wikidata can’t help but face the same challenges. Meanwhile, another commenter argued:

The sky is falling! Or not, take your pick. Other commenters have made more informed posts than this, but does Wikidata’s existence force Wikipedia to use it? Probably not. … But if Wikidata has a graph of the Israel boundary–even multiple graphs–I suppose that the various Wikipedia authors could use one, or several, or none and make their own…which might get edited by someone else.

Under the canny (partial) title of “Who Will Be Mostly Right … ?” on the blog Data Liberate, Richard Wallis writes:

I share some of [Graham’s] concerns, but also draw comfort from some of the things Denny said in Berlin – “WikiData will not define the truth, it will collect the references to the data…. WikiData created articles on a topic will point to the relevant Wikipedia articles in all languages.” They obviously intend to capture facts described in different languages, the question is will they also preserve the local differences in assertion. In a world where we still can not totally agree on the height of our tallest mountain, we must be able to take account of and report differences of opinion.

Evidence that those behind Wikidata have anticipated a response similar to Graham’s can be found on the blog Too Big to Know, where technologist David Weinberger shared a snippet of an IRC chat he had with a Wikimedian:

[11:29] hi. I’m very interested in wikidata and am trying to write a brief blog post, and have a n00b question.
[11:29] go ahead!
[11:30] When there’s disagreement about a fact, will there be a discussion page where the differences can be worked through in public?
[11:30] two-fold answer
[11:30] 1. there will be a discussion page, yes
[11:31] 2. every fact can always have references accompanying it. so it is not about “does berlin really have 3.5 mio people” but about “does source X say that berlin has 3.5 mio people”
[11:31] wikidata is not about truth
[11:31] but about referenceable facts

The compiled phrase “Wikidata is not about truth, but about referenceable facts” is an intentional echo of Wikipedia’s oft-debated but longstanding allegiance to “verifiability, not truth”. Unsurprisingly, this familiar debate is playing itself out around Wikidata already.

Thanks for research assistance to Morgan Wehling.

Public Lives: Jim Hawkins and Wikipedia’s Privacy Dilemma

on April 6, 2012 at 9:15 am

Editor’s note: The author of this blog post is Rhiannon Ruff (User:Grisette), a friend and colleague, in what I hope is a continuing series. The Wikipedian published a previous guest blog post in December 2011.

Introduction to Jim Hawkins Wikipedia article.

As an occasional Wikipedian, I like to check out Jimmy Wales’ user Talk page every now and again; while user Talk pages are generally where editors leave messages for each other, notes of support, or even warnings, Jimbo Wales’ page is a hotbed of intrigue, gossip and debate. It’s Wikipedia’s water cooler. And it’s the perfect place to go if you’re looking to find an example of the confusion that can result from the occasional collision of hot-headed editors, complex guidelines and individuals who are themselves the subjects of articles. Just today I came across a discussion that mentioned Jim Hawkins, a radio presenter in the UK who has been struggling to deal with Wikipedia editors, and Jimmy himself, over privacy issues raised by his biographical article.

Contrary to what many people believe, the Wikipedia community and Wikimedia Foundation are very keen to protect individuals’ privacy. There’s a common misunderstanding that if you edit Wikipedia, anyone can find out who you are—an idea proliferated by media coverage of incidents where editors’ IP addresses were traced and companies outed for editing their own articles (or, worse, those of competitors). But there’s actually a simple solution: creating an account on the site hides your IP address when you edit. And as long as you only edit while logged into that account, there’s no way for anyone to find out who or where you are through your IP. There are also very strong rules against “outing” the real life identities of editors by posting their personal information on the site.

But what if you’re the subject of a Wikipedia article? Getting back to Jim Hawkins, here’s the real dilemma that people in the public eye are faced with: anyone can create an article about them, but how do they go about preventing their personal details from being included in it? Hawkins certainly wasn’t happy about the creation of an article about him, and he was even less impressed that it included details such as the county where he lives and his exact birthdate. He’s been trying to get the article deleted for five years now. Over time, his frustration in dealing with the Wikipedia community has led to increasing antagonism on both sides.

After a recent “edit war” where his birthdate was repeatedly added and removed, the date was removed once and for all after an official request was made on behalf of Hawkins. The edit was made in line with a privacy policy that allows subjects of biographical articles to request the removal of their date of birth from the site. But the county remained, and Hawkins continued to rail against the system on the article’s Talk page:

Why should the people who’ve been stalking, bullying and harassing me – and have been doing so again today! – have any say in what happens to the article?
Hooray for policies. Does common human decency come into this anywhere? Or am I going to get the same response I’ve had for five years, the borderline-fundamentalist ‘that’s not how Wikipedia works’?

In a lively discussion on Jimmy Wales’ User Talk page beginning on April 1, editors were divided over two issues:

  1. Should an individual who is on the cusp of notability (i.e. just about eligible for a Wikipedia article, according to guidelines) be allowed to choose whether or not they have an article?
  2. If personal information about a subject has been published in public sources, does it contravene Wikipedia’s privacy rules to include it in the article?

There’s no simple answer to either of these. The first one in particular is really rather tricky. It’s true that if an article about someone hasn’t been created, there’s nothing that says that it has to exist. If an article has been created, though, it isn’t clear whether there should be the option to delete if the subject isn’t very strongly notable. Wikipedians seem to fall into roughly two camps on the issue: those with sympathy towards article subjects and those who are concerned with ensuring that information is available on Wikipedia, if sources exist to support it.

The main question that Hawkins raised was why there had to be an article about him, if he felt that it was unnecessary, inaccurate and infringed upon his privacy. At one point in the discussion he asks:

Can I point out that the whole damn thing is an invasion of privacy?

And an experienced editor replies, summarising the crux of the issue here:

An invasion of privacy is, by definition, the release of private information. This information, however, is not private, but is stated by the subject in the very show he hosts.

So, the issue is: if information exists in the public sphere, why should it not be included in a Wikipedia article? The details are already out there, some editors argue, so adding them to a Wikipedia article can’t be infringing on the subject’s privacy as the information wasn’t private to begin with. The bright line that exists on Wikipedia is its governing principle of verifiability: information included in articles must always be verifiable, that is, it must be supported by reliable sources. So, if personal information about a subject isn’t supported by a reliable source—even if it’s true—it can’t be included. Unfortunately, as Hawkins has discovered, if the information does appear in a reliable source (in this case, in a local magazine and on the BBC website), whether it is included or not comes down largely to editors’ discretion.

In short, the lesson Jim Hawkins has learned the hard way is: if you don’t want something included in your Wikipedia article, make sure it isn’t published in the first place.

Death of a Wikipedian

on March 23, 2012 at 3:10 pm

Public memorials are a phenomenon found in every society and subset: from war memorials to police memorials and semi-permanent ghost bikes to impromptu, impermanent flower displays, mourning and remembrance are universal. Wikipedia is no exception.

Since early 2006, Wikipedia has maintained a public memorial page called Deceased Wikipedians. While public in the sense that it is accessible by anyone, it is perhaps useful to think of it as semi-public in that it’s not part of the actual encyclopedia. You won’t pass by it on your way to work, or to reading about (let’s say) the Syrian uprising. To date, 39 late Wikipedians have been added to the English version of this page. 14 other language editions have their own versions, including the German, French and even Esperanto editions.

The first added to the English-language Wikipedian memorial was Caroline Thompson, an Australian physics enthusiast who worked on articles about quantum mechanics. Afterward, other names were filled in. The earliest currently listed is a French editor using the handle Treanna, who died in late summer 2005. Considering Wikipedia began in early 2001, surely some others passed before him, but we may never know who they were.

On a website where anonymity is granted to anyone who desires it, determining that an absent editor is deceased and not just one who has drifted away is a matter of luck, and sometimes detective work. The inclusion of an editor named Xulin depended on the synthesis of available information on external websites. He contributed primarily to the French-language Wikipedia, and a candlelight vigil of sorts remains in his userspace there.

Criteria for inclusion aren’t crystal clear, but the top of the page does give this advice:

People in this list are remembered as part of the Wikipedia community: they have made at least several hundred edits or are otherwise known for substantial contributions to Wikipedia.

The names included do not appear to have been controversial to this point, although one stands out as different from the others: John Patrick Bedell, known less for his contributions as JPatrickBedell and more for his disturbing role in the 2010 Pentagon shooting (which I wrote about at the time: “John Patrick Bedell: Pentagon Shooter, Wikipedian”).

Two other deceased editors are the subjects of Wikipedia articles based on contributions to their fields outside of Wikipedia: Tron Øgrim, a Norwegian journalist and activist, and Steven Rubenstein, an American anthropologist.

The most recent addition is a young man named Ben Yates, better known around the site as Tlogmer, who passed away earlier this month. An active contributor from October 2003 to October 2008, he was known for several remarkable contributions to the community. These included the original design for the logo of Wikipedia’s annual gathering, Wikimania, still in use to this day. He was also a co-author of the book How Wikipedia Works: And How You Can Be a Part of It, published in 2008 (free web version here). On a humorous note, he was the originator of the Wikipedia article “Metrosexual”. He also created some hilarious (to a Wikipedian) bumper stickers, which seem to be still available.

Of particular interest to me, he was also at one point the author of a blog about Wikipedia, simply called Wikipedia Blog. Yates’ self-selected favorite posts were three: “The Future of Open Source”, about Wikipedia and Linux; “Wikipedia helps show the economic value of social interaction”, about just what it sounds like; and “Wikipedia and COMMUNISM!”, ruminating on Wikipedia’s comparison to various “isms”. In the last one, he wrote:

Wikipedia will never fade away … its memories will not die with its members. As an open source project, it can always be forked, tweaked, sifted through various filters, read and written anew.

Very well said, and correct he was. So it goes.

Regarding the Uncertain Future of Encyclopædia Britannica

on March 14, 2012 at 5:01 pm

Yesterday, Encyclopædia Britannica made the startling announcement that it would discontinue its print edition after 244 years. Once the current edition has sold out, the sets will become collector’s items. Which is essentially what they are now, if it’s not too uncharitable to point out. Britannica is not finished as an operation, however: it will continue to publish on the web. It’s a startling announcement, sure, but it makes more sense than if it went on as if nothing had changed. Britannica’s editors acknowledged as much in a post on their blog:

A momentous event? In some ways, yes; the set is, after all, nearly a quarter of a millennium old. But in a larger sense this is just another historical data point in the evolution of human knowledge.

But Britannica’s grip on the evolution of human knowledge isn’t what it used to be—you can see where I’m going, right? As a well-known quote from Jimbo Wales goes:

Imagine a world in which every single person on the planet is given free access to the sum of all human knowledge. That’s what we’re doing.

Since its launch in 2001, and especially since a (much-debated) 2005 Nature article comparing the two, Wikipedia has been a thorn in Britannica’s side. And its influence has long since surpassed that of its much older rival. A Quantcast comparison suggests that Wikipedia’s traffic is 30 times that of Britannica. And as I tweeted last night, news organizations have been quick to note the competition.

Under the title “Death By Wikipedia: Encyclopedia Britannica Stops Printing”, ReadWriteWeb observes:

The usefulness of such reference materials has been on the decline for years, especially since the advent of Wikipedia. Whatever flaws its open, crowd-sourced editorial model may invite, Wikipedia is generally regarded as a comprehensive and mostly-accurate source of information, which can be accessed for free.

And in a Venture Beat article titled “Encyclopaedia Britannica wiped out by Wikipedia, selling final print edition” we find:

The extremely thorough Wikipedia article on Encyclopaedia Britannica … serves as the perfect example of why Wikipedia is coming out on top.

It’s true—Wikipedia’s article about Encyclopædia Britannica is very thorough. Britannica’s article about Wikipedia is not bad, but it is far more limited than Wikipedia’s article about itself, and Britannica has those annoying pop-up advertisements that do nothing for readers.

Yet Britannica president Jorge Cauz tells The Washington Post:

This has nothing to do with Wikipedia or Google. … This has to do with the fact that now Britannica sells its digital products to a large number of people.

This is a little bit like Microsoft saying Windows 8 has nothing to do with the iPad, merely the shift in consumer purchasing habits toward the tablet and mobile markets. That’s not to say the statement is necessarily untrue, just that it’s incomplete. I don’t know a great deal about Britannica’s current business model, but it’s safe to say that non-print revenues have become far more important, as Britannica’s print sales have fallen. Whether they will succeed is another question; PC World doesn’t think so, pointing out the closure of—speak of the devil—Microsoft’s online encyclopedia Encarta in 2009 (which I wrote about at the time):

Microsoft shuttered its digital multimedia encyclopedia, Encarta, in 2009, and the last trace of it, the online dictionary, closed last year. Encarta, though a digital product, was also made obsolete by Wikipedia’s free availability, constantly updated content and thousands of editors, contributors and volunteers from around the world.

At The Atlantic, expert on evolution and Bloggingheads impresario Robert Wright offers this (small) consolation:

Maybe, long after even the electronic edition of Britannica is gone, the idea of Britannica can remain for us what it once was for me–a kind of Platonic ideal that we aspire to evolve toward even if we can never reach it, something that has a kind of reality even if we can never touch it.

As someone who devoured Britannica in my school library when growing up, not to mention someone who relied on Britannica as a college student in the late 1990s (before Britannica added a pay wall)—much the same way as students today (notoriously) rely on Wikipedia—I’m sorry to see it go. But we no longer live in a world where a 30,000-page, 32-volume encyclopedia can be printed on an annual basis for profit. In fact, even Britannica sees itself as a collector’s item now; as Cauz tells the News Observer:

This is going to be as rare as the first edition, because the last print run of our last copyright was one of the smallest print runs.

I’d love to own one myself, but at $1,395.00 for the “Final Print Edition”, I’m afraid I’ll have to pass. And perhaps Cauz is wrong; maybe the death of Britannica will be more like the Death of Superman.

Verifiability and Truth: What John Siracusa Doesn’t Get About Wikipedia

on February 2, 2012 at 6:50 pm

One of my favorite podcasts is Hypercritical, co-hosted by and principally featuring the thoughtful criticisms of John Siracusa, a sometime columnist for Ars Technica and Internet-famous Apple pundit. The show’s tagline calls it: “A weekly talk show ruminating on exactly what is wrong in the world of Apple and related technologies and businesses. Nothing is so perfect that it can’t be complained about.” Last week’s edition—“Marked for Deletion”—was about something far from perfect, but of great interest to this blog: Wikipedia.

If you want to listen for yourself, jump to about 1:11:55 (yes, more than an hour into the show) where Siracusa and co-host Dan Benjamin turn the discussion to Wikipedia. And a warning: this is going to be long. Consider it homage.

♦     ♦     ♦

Promisingly, Siracusa begins by asking his co-host to answer, if he can, “what Wikipedia is”. The answer is pretty good for an outsider: it’s a place for sharing information and collaboratively building a resource for (hopefully) accurate information on almost any topic. In general, this will do. But it’s not quite right, as Siracusa explains by recounting his personal experience of trying, in vain, to defend an article from deletion. With five years to reflect on it, Siracusa describes his efforts as a “prototypical example of someone who does not understand what Wikipedia is, proving that he does not understand what Wikipedia is.”

All of this is a way of getting to Siracusa’s fascination—one might say morbid fascination—with Wikipedia’s policy of “Verifiability”. The first paragraph of the policy says:

Verifiability on Wikipedia is the ability to cite reliable sources that directly support the information in an article. All information in Wikipedia must be verifiable, but because other policies and guidelines also influence content, verifiability does not guarantee inclusion. The threshold for inclusion in Wikipedia is verifiability, not truth—whether readers can check that material in Wikipedia has already been published by a reliable source, not whether editors think unsourced material is true.

Or as Siracusa summarizes it: “Something can be as true as you want it to be, if it is not verifiable, it doesn’t go in.” Well said.

He also discusses the related policy of “No original research”. This includes a good explication of the different types of sources that may or may not be used on Wikipedia: primary sources (original documents and first-hand accounts), secondary sources (news articles interpreting primary sources) and tertiary sources (encyclopedias and academic articles summarizing the former). This is advanced stuff, and for a longtime Wikipedian, it’s no small thrill to hear a smart outsider explain why secondary sources are preferred, and work through the fundamental policies of Wikipedia. Siracusa correctly observes: “Wikipedia is not a place where you write down stuff that you know. … Wikipedia writes about other people writing about things.”

Except here’s the thing: Siracusa understands Wikipedia’s core content policies. He just doesn’t like them.

In his particular example, a former standalone article called FTFF (here’s what it used to look like) didn’t survive the process not because it wasn’t true, but (he says) because it contained material that wasn’t verifiable, and constituted original research. This is partly true, but it owes more to a guideline that got only passing mention on the show (and, frankly, in the deletion debate): “Notability”, and specifically the “General notability guideline”. It’s closely tied in with WP:VERIFY and WP:ORIGINAL, and basically says that a topic must have sufficient coverage in secondary sources to be given its own standalone page. FTFF did not, and the result of the debate was to merge the topic into Finder_(software)#Criticism.

Anyway, this pedantry about WP:NOTE and WP:GNG doesn’t affect Siracusa’s main point: If something is true but unverifiable, he would like to see it included in Wikipedia anyway. Nor does it affect his corollary argument, that Wikipedia’s complex rules discourage many would-be participants.

He’s undoubtedly right about the second point: many people try to get involved with Wikipedia who have no idea what it’s really about, and they tend to have a really bad experience. Wikipedia struggles to explain itself to outsiders, and it probably always will.

As to the former, the problem is that he fails to grapple with the implications of the Wikipedia he describes, and this is disappointing. By privileging “truth” above “verifiability”, one gets the impression he’s describing a Rashomon-like Wikipedia where all possible viewpoints are explored, and somehow eventually Wikipedia just makes the right call. This assumes a lot, not least that contentious topics wouldn’t simply devolve into edit wars of unchecked aggression. In a world where Wikipedia aims for truth but eschews verifiability, there are no footholds upon which to steady an argument. There is no way to know what should be considered credible or otherwise.

At times it actually sounds like he’s advocating something that already exists: reliance on “Consensus” for determining how Wikipedia will address the topics it covers. Wikipedia policies and guidelines don’t cover everything, and this is where consensus steps in, however imperfectly. If you’ve ever wondered why there is sometimes an observable discrepancy in the depth or quality of coverage between topics, consensus is the big reason why, and more so the self-selection that shapes consensus. The current, real-world Wikipedia refers to outside authorities as well as consensus among editors; Siracusa’s Bizarro World Wikipedia would jettison the former and rely solely on the latter.

Meanwhile, Siracusa ascribes Wikipedia’s Byzantine rule structure to Wikipedians’ desire for approval from educators and academics, which he thinks is holding back Wikipedia from what it could become. He repeatedly says “Wikipedia should be something different” and refers to “what’s different about online” but he never gets prescriptive and never actually says why the old methods are outmoded. He does say his Wikipedia would seek to “arrive at truth using every tool necessary” and would, for example, allow original research… but what then is the mechanism for (dare I say) verifying it?

At one point, Siracusa compares the popular, widely-viewed Ars Technica forums to a hypothetical low-circulation print magazine, and complains that the widely-read former site is an invalid source while the unpopular latter publication is acceptable. It’s true that Wikipedia does not necessarily take a populist approach to evaluating sources, but he’s far off the mark in his attempt to explain this: “They’re not cool with the old librarians, because they’re not paper.”

I hope that he was just being lazy and doesn’t actually think that Wikipedia editors prefer paper (if anything they actually prefer online sources, which are easier to check) but he completely misses a key dynamic that ties back to verifiability: the paper magazine with poor circulation at least will have editors who are presumed to care about fact-checking and accuracy. A web forum, however popular it may be, may have moderators, but that’s not the same thing as having an editor. A discussion group is not an editorial operation, period. The forum is a primary source, and so should only be used to support reliable sources.

There are, however, reliable web sources. One of them is the editorial side of Ars Technica; no less an authority than John Siracusa has been cited in approximately 150 different Wikipedia articles about the Macintosh and other technology subjects.

♦     ♦     ♦

I’m sorry to say this, but in the show’s last fifteen minutes, Siracusa pretty much descends into total incoherence. Here’s his summary statement, close to verbatim:

[There are] many flaws in verifiability and reliability of sources. It’s built on a foundation of sand. Notability, what’s a reliable source, those things become so key to making Wikipedia crappy or good, and those sands are constantly always shifting, you know? And so if Wikipedia was centered on truth and that was its final goal, yeah, it would have to include citations and verifiability and stuff like that, but there would never be any argument when the two are in conflict. You know, if you could prove that a series of events happened here, then you could say, well, it’s verifiable, it appeared in a reliable source, but it’s not the truth. And so therefore we should expunge that. Because the final goal of Wikipedia is truth. But the final goal of Wikipedia is not truth, it’s verifiability.

There would “never be any argument” about what is the truth? In the parlance of Wikipedia: [citation needed].

Look, this is an epistemological issue, one much larger than just Wikipedia. The reason Wikipedia’s goal is verifiability, not truth, is that verifiability is an achievable goal. In fact, verifiability is a necessary step toward establishing truth, as Siracusa at this point seems to acknowledge in his imagined alternate, truth-seeking Wikipedia.

It’s not that Wikipedia is actively hostile to the truth: it’s just agnostic as to what it might be. Wikipedia articles are like road signs; truth itself may be unknowable, and we may never arrive at our destination, but Wikipedia can point in the right direction. Wikipedia’s policies and guidelines are designed to make sure that its content does that, although it’s fair to acknowledge that it’s not guaranteed. But what is? And what is truth?

Anyway, there’s a user essay on Wikipedia called “Verifiability, not truth” that says this better than I am going to. Here’s the key point:

That we have rules for the inclusion of material does not mean Wikipedians have no respect for truth and accuracy, just as a court’s reliance on rules of evidence does not mean the court does not respect truth. Wikipedia values accuracy, but it requires verifiability. Unlike some encyclopedias, Wikipedia does not try to impose “the truth” on its readers, and does not ask that they trust something just because they read it in Wikipedia. We empower our readers. We don’t ask for their blind trust.

If you want to upset the old system and do something new, you actually do need to think through what should replace it. Siracusa never does.

If he thinks Wikipedia’s adherence to “old world” rules is driving away contributors, he should consider what the free-for-all alternative would look like. It isn’t a Wikipedia I would spend any time with, it’s not one that Google would be eager to rank so highly, and it wouldn’t be the most important reference site on the Internet.

Wikipedia Gets on its SOPA Box

on January 17, 2012 at 9:46 am

Wikipedia SOPA blackout announcement
The Wikimedia Foundation announced on Monday that the English-language Wikipedia will go offline for 24 hours, starting at midnight tonight on the East Coast, in protest of the Stop Online Piracy Act (SOPA) and a related bill, the PROTECT IP Act (PIPA). The move follows a similar protest by the Italian-language Wikipedia last year, protesting proposed anti-privacy laws in Italy.

Over the past week, volunteer Wikipedia editors debated the proposition and ultimately decided to go forward. The decision was accepted by the Foundation, which will implement it late tonight. An official public explanation includes the following:

Over the course of the past 72 hours, over 1800 Wikipedians have joined together to discuss proposed actions that the community might wish to take against SOPA and PIPA. This is by far the largest level of participation in a community discussion ever seen on Wikipedia, which illustrates the level of concern that Wikipedians feel about this proposed legislation. The overwhelming majority of participants support community action to encourage greater public action in response to these two bills. Of the proposals considered by Wikipedians, those that would result in a “blackout” of the English Wikipedia, in concert with similar blackouts on other websites opposed to SOPA and PIPA, received the strongest support.

The decision is not one that all are happy about. After all, Wikipedia’s core content guidelines emphasize a Neutral point of view in its approach to encyclopedia topics, so isn’t this a questionable decision?

Just this morning, a participant on a Wikipedia-related discussion group wrote:

Now that we have taken the necessary first step to regard the English Wikipedia and other Wikimedia projects as high-profile platforms for political statements, we ought to consider what other critical humanitarian problems we could use our considerable visibility and reputation to address. We could draw attention to the crises in Sudan or Nigeria, drone attacks against civilians in Afghanistan, the permanent occupation of the Palestinian territories, the Iranian effort to develop nuclear capabilities, police misconduct in virtually any country, the treatment of women and women’s rights in Saudi Arabia and elsewhere, and the list could go on and on.

Well, considering that it was a matter of debate, it surely is questionable and does not reflect the views of all Wikipedians. But I think it’s also fair to say that it reflects the majority of participants.

Wikipedia has its philosophical roots in the free software movement, which is the very antithesis of what SOPA and PIPA are about, so this particular viewpoint should surprise no one. Meanwhile, Wikipedia is well aware that it has its own systemic biases and has organized a project to answer them. In this case, however, Wikipedia’s bias shows through and most participants find this to be a good thing.

I’ll have to put myself more in the skeptic’s camp—not because I support SOPA, which I’m pretty sure I don’t—but because I would prefer that Wikipedia not become a platform for political activism. That said, I don’t think it will lead to similar efforts in the near future and, considering it’s already received significant news coverage, I think there is no question it will be effective in raising awareness about the issue.

For Wikipedians who are uncomfortable with the effort, there’s not much else to do. The band they’re in is playing a different tune, and we’ll see you on the dark side of the Wikipedia blackout.