William Beutler on Wikipedia

Archive for the ‘Wikipedia Statistics’ Category

Why can’t we have a better Wikipedia dialogue?

Posted on January 17, 2013 at 10:38 am

Earlier this week, Wikimedia executive director Sue Gardner explained how Wikipedia works (and sometimes doesn’t) in a Los Angeles Times op-ed:

Our weakest articles are those on obscure topics, where subtle bias and small mistakes can sometimes persist for months or even years. But Wikipedians are fierce guardians of quality, and they tend to challenge and remove bias and inaccuracy as soon as they see it.

The article on Barack Obama is a great example of this. Because it’s widely read and frequently edited, over the years it’s become comprehensive, objective and beautifully well sourced.

Using the Barack Obama article is cherry-picking, but the underlying point is true: articles are generally only as good as the contributors they attract. Yesterday the Times’ Letters section published a response from a (wait for it) high school teacher, arguing against taking Wikipedia seriously:

Why use Wikipedia when library databases such as Proquest and Opposing Viewpoints, which contain PDF files of peer-reviewed, scholarly articles, are available? When given a choice between an article written by an unknown Internet user and one written by an expert, shouldn’t the choice be obvious?

Wikipedia is the lazy researcher’s source of information. It’s useful for a quick answer to a trivia question or resolving a bet, but it should not be used for serious research.

I thought we had stopped arguing about Wikipedia’s content as a source of information a while back, with the standard reply “look to the sources used as references,” but apparently that hasn’t made its way around the school district yet.

The problem is that they’re both right as far as they go, and we don’t really know how far that is.

Maybe what we need to figure out is: what’s the proportion of well-developed, well-cited articles to mediocre-to-worse articles covering important subjects, and how do we determine what that means and how to measure it? What this debate needs is some empirical data.

Wikipedia Didn’t Kill Britannica—It Saved the Encyclopedia

Posted on December 11, 2012 at 11:40 am

Mary Meeker is a venture capitalist associated with the famous Silicon Valley VC firm Kleiner Perkins who is—as Wikipedia describes her—“primarily associated with the Internet”. Indeed, her annual “Internet Trends” report is highly anticipated in the Valley. Her 2012 report is no different, and it includes a couple of slides focused on Wikipedia vs. Britannica (see also: “Regarding the Uncertain Future of Encyclopædia Britannica”, March 14, 2012). Here’s the important one:

My first reaction, as I tweeted last week, was to be fairly unimpressed:

But looking at it again, it’s quite obvious that for all the discussion of Wikipedia “killing” Britannica, this is not the case at all. First of all, as Wired’s Tim Carmody correctly observed earlier this year, Britannica’s sales began to falter with the introduction of Microsoft Encarta in 1993. If Meeker’s numbers are accurate, then the debut of Wikipedia in 2001 had no impact whatsoever on Britannica’s declining fortunes. Nor does Britannica’s downward slope appear to have accelerated with the rapid adoption of the Internet from the late 1990s onward.

The y-axis of Meeker’s chart, if anything, downplays Wikipedia’s ubiquity compared to Britannica’s sales. Truth be told, I think it’s kind of a terrible chart, since it uses logarithmic scales to plot two different kinds of numbers, but it’s still readily apparent that Wikipedia is vastly more accessible to readers than Britannica ever was. Anecdotal evidence obviously supports this: I’ll bet anything you look at Wikipedia more now than you ever did Britannica, and there are millions who never had access to Britannica before, but can read Wikipedia now.

One thing I would have liked to see here is Britannica.com’s online traffic; writing as one who was in college during the late 1990s and used Britannica.com when it was a free resource, I’d imagine its true relevance nosedived when the site erected a paywall sometime around the year 2000, not that this would necessarily influence print sales.

The bottom line is clear: Britannica’s failure and Wikipedia’s triumph have nothing to do with one another, apart from the inexorable migration of information from analog to digital, and from physical to cloud-based storage. And here is the vastly more interesting trend question: what will eventually replace that?

For the full Meeker report, click here.

This Wikipedia Article Is Not Yet Rated

Posted on September 7, 2011 at 1:18 pm

Even if you’re a very casual Wikipedia reader (which I assume is not the case, or you wouldn’t be here right now) you might have noticed a few new features* at Wikipedia in recent weeks and months. The most noticeable is the Article Feedback Tool, pictured below.

And it takes a single click to see the ratings on a given article. In the following example, a number of readers have already expressed their opinion of the (very short and currently unreferenced) article about the new Clap Your Hands Say Yeah album, which isn’t supposed to be released until later this month (thanks, Spotify / BitTorrent!).

It’s not entirely clear what the long-range prospects for the tool may be. Unlike flagged revisions, it isn’t slated for a vote and approval or removal; indeed, it’s now listed on every Wikipedia article that you visit, and it will continue to be for the indefinite future.

But that doesn’t mean it will necessarily remain static. An invitation to “please take a moment to rate this page” has already been changed. More questions are surely in store, especially as some very good questions have been raised, such as who’s to say what it means to be “highly knowledgable” in a given subject area?

Certain aspects of its implementation, though, are quite clever. For example, any rating assigned to an article that itself may change often cannot be considered good for long, right? This has been anticipated: ratings expire after 30 edits have been made on a given page, and if you’ve rated a page before, you can re-rate it then.
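The expiry rule is simple enough to sketch in a few lines. To be clear, this is only an illustration and not the Article Feedback Tool's actual code: the `Rating` class, its field names and the `live_average` helper are all hypothetical, and the only detail taken from the description above is the 30-edit threshold.

```python
from dataclasses import dataclass

# The 30-edit threshold comes from the post; every name below is hypothetical.
EXPIRY_EDITS = 30


@dataclass
class Rating:
    stars: int           # score from 1 to 5 given by a reader
    revision_count: int  # how many edits the page had when the rating was cast


def is_expired(rating: Rating, current_revision_count: int) -> bool:
    """A rating is stale once 30 or more edits have been made since it was cast."""
    return current_revision_count - rating.revision_count >= EXPIRY_EDITS


def live_average(ratings, current_revision_count):
    """Average only the ratings that have not expired; None if none remain."""
    live = [r.stars for r in ratings if not is_expired(r, current_revision_count)]
    return sum(live) / len(live) if live else None
```

Under this sketch, a reader whose rating has expired would simply be allowed to submit a new one, which matches the re-rating behavior described above.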

Some Wikipedians have also asked for a statistical tool charting the data over time, which would be very cool to see. Like most Wikipedia projects, all information captured is available through its API, so anyone could build one if they wanted. A good example of this kind of ad hoc service is User:Henrik’s Wikipedia article traffic statistics tool.

Meanwhile, it also opens a new Pandora’s box for Wikipedia (as if it didn’t already have plenty). Perhaps the biggest concern ahead is that the ratings can be gamed; as Liam “Wittylama” Wyatt (known particularly for his work with the British Museum) has pointed out, the top-rated article (4.9 out of 5 stars) is something called the VAD 43 MRC Klang Chapter. About which, well, have a look for yourself.

I think the concept of article ratings is an idea whose time is coming, if that time is not yet now. These ratings have a long way to go before they should be considered a barometer of anything. It’s a good start, but still just that.

*The other is one asking how you feel about editing Wikipedia, complete with a choice of smiley and frowny faces, but I haven’t seen it lately.

Is Wikipedia “Slowly Dying”?

Posted on August 5, 2011 at 11:27 am

Here’s a provocative blog post from Gawker’s Adrian Chen yesterday: “Is Wikipedia Slowly Dying?”. It’s based on a provocative comment by none other than Wikipedia’s Jimmy Wales at Wikimania, the annual conference for Wikipedia and its sister wiki sites. Of course, that’s not quite what Wales said, but the Associated Press story Chen’s post is based on is not so far off:

“We are not replenishing our ranks,” said Wales. “It is not a crisis, but I consider it to be important.”

Administrators of the Internet’s fifth most visited website are working to simplify the way users can contribute and edit material. “A lot of it is convoluted,” Wales said. “A lot of editorial guidelines … are impenetrable to new users.”

It’s also not a new concern. In March the Wikimedia Foundation published its latest study of editor participation, showing a decline compared with a couple of years ago, although Wikipedia certainly still has more contributors than it did a couple of years before that. In my post on the subject, “Trendy Thinking: Contemplating Wikipedia Contributorship”, I included a Wikimedia-generated chart that shows what Wales is talking about:

From 2001 through 2006, participation grew exponentially, slowed at its peak in 2007, and has decreased at a steady rate in the years since. A number of theories have been floated to explain the decline. Via the AP, Wales offers a very common one: with almost 3.7 million articles in the English-language edition, the project of building Wikipedia has mostly already been done. But he also offers one that I hadn’t really considered before:

Wales said the typical profile of a contributor is “a 26-year-old geeky male” who moves on to other ventures, gets married and leaves the website.

There is some evidence for this in the survey results. Turn to page five of an earlier survey report (PDF) and you’ll see that more than 75% of editors (technically, survey respondents who called themselves editors) are younger than 30, and of the remaining quarter, half again are in their thirties. It may be that only 12.5% of Wikipedia editors are older than 40.
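For anyone who wants to check that arithmetic, here is a quick sketch. The function name and its parameters are my own, invented for illustration; the only inputs are the two figures quoted above, and since the survey says "more than 75%", the result is best read as an upper bound on the share of editors past their thirties.

```python
def share_forty_plus(under_30: float = 0.75, thirties_share_of_rest: float = 0.5) -> float:
    """Estimate the share of editors who are past their thirties.

    under_30 is the share younger than 30 (the survey says "more than 75%",
    so 0.75 is a lower bound); thirties_share_of_rest is the fraction of
    everyone else who is in their thirties (half, per the survey).
    """
    thirty_plus = 1.0 - under_30
    in_thirties = thirty_plus * thirties_share_of_rest
    return thirty_plus - in_thirties


print(f"{share_forty_plus():.1%}")  # 12.5%
```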

This situation points toward a perhaps unlikely but so far untapped editor group: retired persons. In fact, I had expected to find a higher percentage of older editors, something like a reverse bell curve, showing greater participation by the young and old, with those in the middle, busy with careers and young children, contributing less frequently. In my personal experience on the site, some dedicated editors (some of the best, in my estimation) are middle-aged or older. Yet the survey plausibly explains why they are statistically less common:

The last group is characterised by the fact that its members started to use / contribute to Wikipedia at a comparably old age. However, since the age range of this group is very broad, it covers persons that grew up with the Internet as well as persons that had to learn to use new media past their school and university time.

Someone who was 39 when Wikipedia was created is now 49 or 50, and actuarial realities will continue to produce a general population that is ever-more Internet-savvy, and therefore ever-more inclined to edit Wikipedia. That is to say, those who were once young editors may return as old editors.

Back at Gawker, the comment section offers another complaint to which Wales only alludes. The pseudonymous SoCalMalaise writes:

I used to write and edit Wikipedia a lot. Some long articles are almost entirely written by me. It was a way to fine tune both my research and writing skills and enjoy the novelty of writing something that thousands (millions?) of people read. But soon I found that your work is frequently stifled by so-called “administrators” who are usually high school or college students with sub-par research and writing skills. These trolls have created a Kafka-esque labyrinth of self-contradictory “policies” and “guidelines” that they used to remove sentences, paragraphs, sections or even entire articles that skilled writers have volunteered to put down. They cherry-pick various parts of their rules as an excuse to act out their God complexes and strike out content. … And I’m not talking about a few bad apples. These people are everywhere! The whole writing-for-Wikipedia thing became very frustrating and just not worth my time.

It’s difficult to generalize from any one person’s experience, and who knows what common-but-non-obvious mistakes SoCalMalaise might have made, but the sentiment is certainly not unheard-of.

Thing is, for every complaint about overzealous editors and sticklers for arcane rules, there’s a complaint about uninformed editors who show little respect for common-sense rules. I have to admit, I’m more sympathetic to the latter complaint: it is sticklers for policies and guidelines who enforce a minimum level of quality required for new additions, and therefore maintain a semblance of article quality. Myself, I spent a lot of time learning how Wikipedia works. It took several years before I was able to contribute at a high level, creating new entries or significantly improving existing ones. I am polite when I find someone is doing it wrong, although I know also that some are not.

Meanwhile, the organized core of the community has spent a lot of time, especially recently, trying to figure out how to retain those who give Wikipedia a try. There is the WikiLove campaign, which has received some media attention, but I’ll have to explain my skepticism another time. I’ve also heard that new account registrants are sometimes asked to identify areas of interest, which sounds like an interesting idea, but as far as I can tell it hasn’t been widely deployed.

Ultimately, whether Wikipedia’s declining user base represents a problem is not a question that exists in a vacuum. The question is really whether Wikipedia has enough editors to keep getting better or, at the very least, maintain its current level of quality. There are multiple answers here. As I’ve pointed out before, the Wikipedia community’s rapid response to breaking news is impressive: if you want a good primer on the United States debt ceiling crisis, Wikipedia has a very strong and evolving summary. But Wikipedia sometimes fares poorly with articles on many pre-Internet topics, especially in the social sciences: if you want to know about Money market funds, I’m not sure I can recommend Wikipedia.

It’s worth taking stock of the fact that Wikipedia’s decline among editors is a bit more than gradual, but does not now appear to be accelerating. The next two years will be telling, but I suspect that Wikipedia’s contributor base will find its floor, and my guess, though it is only that, is that we’re probably somewhere near it. Wikipedia is no longer the new hotness, and let’s face it, it’s an encyclopedia. To most it is far less thrilling and far more challenging than YouTube or Facebook, and we shouldn’t expect that Wikipedia’s participation will look anything like theirs. It’s no less popular as a destination for readers, and it would take a very significant drop in article quality for that to change. (Like, say, if Wikipedia’s vandal patrol disappeared tomorrow… if anyone, send your WikiLove to them.)

I think the current situation also raises a question that many Wikipedians are loath to consider: the professionalization of some aspects of Wikipedia. This doesn’t necessarily mean hiring editors, but it could mean working out partnerships to share in the responsibility of maintenance and development of software and perhaps even some content. It’s an article of faith that much of Wikipedia’s early growth and unique characteristics derive from its volunteer force, but as any business professor can tell you, the skill set that launches a viable company is not the same skill set that brings that company to maturity. There is precedent for this; Wikipedia needs the Wikimedia Foundation, which does have a paid staff, although they avoid organized involvement in matters of content, except as individuals. Ultimately, Wikipedia must remain in the hands of its volunteer editors; to change that would be too fundamental a shift. But as Wikipedia grows more complex, it’s not hard to think they could use greater support.

Wikipedia’s Endless Pool Party (Not Quite What it Sounds Like)

Posted on February 16, 2011 at 11:26 am

There’s no longer a question of whether the English-language Wikipedia will hit the four million article mark: only when. While new topics may become increasingly difficult to come by, five, six million or more articles is not out of the question. And when Wikipedians are not busy working on making that happen, sometimes they like to place guesses on when those things will happen. If you visit Wikipedia’s vast backstage, you can find several current and past betting pools on these milestones and others through the years.

One of the first was the Half-million pool, in June 2004, in which several dozen editors took part. When Wikipedia passed 500,000 articles on March 17, 2005, the winner (an active Wikipedian to this day) had guessed March 18, narrowly beating another who had guessed March 15. Since then, more recent pools have focused on landmarks including the Million pool (passed March 1, 2006) and the 300-million edits pool (a matter of dispute, but certainly in 2009). Though there are just over 3.5 million articles today, if you’d like to guess when Wikipedia’s four-millionth article will be created… I’m afraid you’re out of luck. No further guesses were taken after February 2010.

Among the pools still taking guesses are one of two versions of the Five-million pool, the Ten-million pool and the Twenty-million pool. In the last of these, one unlucky soul guessed 2007 and several picks would have this achievement within the next decade, but more have placed their bets in the 2015-2025 range, and more still in the 2026-2100 range. A few have placed their bets on “Never”; time will tell… or not.

There are some more outlandish pools as well, including something like a dead pool: the Last topic pool. What will be the last article created on Wikipedia? There are some swell guesses; among my favorites are: “2100 Wikimedia server room fire” and “Why the zombies won”.

Want in on the fun? You can test your powers of prediction at Wikipedia:Pools. And if you do win, what exactly do you win? Is there any money involved here? Alas, no. Each page makes sure to note: “The person who comes closest to the actual date is the winner (of eternal fame).”
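The "closest date wins" rule is easy to express in code. What follows is a minimal sketch, not anything the pool pages actually run: the `pool_winner` function and the editor names are hypothetical, and the dates are the Half-million pool figures mentioned above.

```python
from datetime import date


def pool_winner(guesses: dict, actual: date) -> str:
    """Return the entrant whose guessed date is closest to the actual date.

    guesses maps an entrant's name to the date they guessed. If two entrants
    are equally close, the one listed first wins; that tie-break is an
    assumption, since the pool rules quoted above do not say.
    """
    return min(guesses, key=lambda name: abs((guesses[name] - actual).days))


# Hypothetical names, with the two guesses from the Half-million pool.
guesses = {
    "Editor A": date(2005, 3, 18),
    "Editor B": date(2005, 3, 15),
}
print(pool_winner(guesses, date(2005, 3, 17)))  # Editor A
```

A guess of March 18 is one day off and a guess of March 15 is two days off, which is why the former "narrowly" won.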

Photograph by Finlay McWalter, via Wikipedia.

Wikipedia’s Most Wanted

Posted on February 13, 2011 at 9:06 am

With more than 3.5 million articles on the English-language Wikipedia, it’s almost difficult to believe there could be much left to write. Although Wikipedia’s “hockey stick” growth has begun to slow down somewhat, the truth is that it still is growing very quickly: the English Wikipedia passed 3 million articles in August 2009, and may well hit 4 million this year.

Indeed gaps do remain, and finding them can be a challenge. Now Magnus Manske, one of Wikipedia’s longest-active contributors (and the programmer of many different cool things) has created (actually, re-created after a long absence) a new DIY service called “Most wanted articles”.

Manske’s tool searches Wikipedia for “redlinks”. You’ve probably seen these around Wikipedia, and they are what they sound like: anchor text which is colored red because there is no article behind it. By contrast, links on Wikipedia that are colored blue will actually take you somewhere. Redlinks are sometimes considered unsightly, and they can be, if overused. Used selectively, they can highlight new subjects possibly deserving of new Wikipedia articles. Until that time comes—theoretically speaking—one can determine which are the “most wanted” by counting redlinks.
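The counting idea can be sketched as follows. I should stress that this is not how Manske's tool is implemented, which I haven't examined; the `most_wanted` function and the shape of its inputs (a list of source-page and target-title pairs, plus a set of titles that already exist) are assumptions made purely for illustration.

```python
from collections import Counter


def most_wanted(links, existing_titles, top_n=10):
    """Rank missing articles by how many links point at them.

    links is an iterable of (source_page, target_title) pairs, and
    existing_titles is the set of titles that already have an article.
    Any link whose target is not in that set counts as a redlink.
    """
    counts = Counter(
        target for _source, target in links if target not in existing_titles
    )
    return counts.most_common(top_n)


# Tiny made-up example: two pages link to a missing article, one to an existing one.
links = [
    ("Page one", "Missing topic"),
    ("Page two", "Missing topic"),
    ("Page two", "Existing topic"),
    ("Page three", "Another missing topic"),
]
print(most_wanted(links, {"Existing topic"}))
# [('Missing topic', 2), ('Another missing topic', 1)]
```

Because every (page, target) pair is counted, a redlink that is repeated on many pages is counted once for each page it appears on.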

What follows is a list of the most-wanted Wikipedia articles, as of February 7, 2011:

  1. British films of 2011 (1842)
  2. British films of 2012 (1841)
  3. List of Argentine films of 2011 (1712)
  4. Bazinaprine (1204)
  5. Tetrindole (1203)
  6. Sercloremine (1203)
  7. Befol (1203)
  8. Esuprone (1134)
  9. Siddapur, Belgaum (1117)
  10. Milacemide (1059)

More than 1,000 redlinks for each of these topics? How did this happen? The answer is templates, especially “navboxes” which sit at the bottom of various articles, helping to group topics together. Each of the above-listed non-articles is redlinked from one of the following templates: Cinema of the UK, Cinema of Argentina, Dopaminergics and Belgaum district.

It might be more interesting to find out which articles were the most-wanted according to organically-created redlinks in article text, but that’s a bit more challenging; such a list may or may not be forthcoming. That said, the more of these articles created or otherwise dealt with, the closer we’ll get to the ones further down the list.

And in fact, as I post this on February 13, 2011, the list has changed as some of these articles have been created—almost certainly based on discussion among Wikipedia editors about this list. The next time you find yourself looking for information about Bazinaprine, a monoamine oxidase inhibitor believed to be useful for the treatment of depression, then you have Manske (and of course the editor who took up the cause) to thank.

The Wikipedian Mystique: Do Women Participate Enough in Wikipedia?

Posted on February 7, 2011 at 4:57 pm

Could it really be that just 13% of Wikipedia editors are women? That statistic comes from a survey of Wikipedia users (whether contributing or just reading) sponsored by the Wikimedia Foundation, first previewed in fall 2009 and eventually published in full in March 2010. Last week, Wikimedia executive director Sue Gardner announced plans to try raising this number to 25% by 2015. Thanks to coverage by Noam Cohen in The New York Times, the topic has dominated Interweb discussion of Wikipedia since then.

This participatory imbalance is not a new phenomenon, and hardly unique to Wikipedia. Cohen points to op-ed pages, and the same is considered to be true in their virtual equivalent, the political blogosphere. While there are some very prominent female contributors to all of the above, most surveys tend to show that men nevertheless lead these sectors.

On the other hand, as a female colleague pointed out to me, if you were to look at online forums about health care, animals, or the environment, the gender balance is likely to flip. The same is true with regard to professions; some are predominantly male or female, and many fall somewhere in between. Some combination of biological programming and social reinforcement produces a society with masculine and feminine traits. However, just because many stereotypes have a basis in reality does not mean they should be taken for granted or used as an excuse. Just because something is natural doesn’t make it right.

Among the many words expended on the topic, probably the best is by veteran Wikipedia contributor Kat Walsh; the entirety of it is worth reading, but here is the conclusion:

The big problem is that the current Wikipedia community is what came about by letting things develop naturally–trying to influence it in another direction is no longer the easiest path, and requires conscious effort to change. How do you become more inclusive without breaking the qualities that make the project happen to begin with? (Any easy, obvious answer to this question is probably wrong.) That Wikipedia works at all is an improbable thing; that it works, for the most part, well, nearly miraculous. Wikipedia’s culture doesn’t have to be hostile or unfriendly to a group for it to be underrepresented–it merely has to be not one of the most attractive options.

It so happens that “unfriendliness” has been identified as one possible reason. And it’s not that Wikipedia doesn’t have policies designed to address this issue: Wikipedia:Civility and Wikipedia:No personal attacks are core, non-negotiable site policies, augmented by further guidelines such as Wikipedia:Please do not bite the newcomers. The message is simple: Be polite to other editors, or you can be blocked. However, any experienced editor also knows that enforcement is uneven. Wikipedia is a very big place, where many editors are used to working in isolation. If someone comes along and starts behaving abusively, it can often feel like there is nowhere to turn. Even if you do know where to go for help, one actually must petition for a resolution, and this can be an unpleasant process. It’s also probably worth pointing out that this is already an issue on the presumably male-dominated website, so it is far from just women who feel this way.

Another issue worth considering is that no one actually knows for sure how many women are on the site. Anonymity on Wikipedia is guaranteed; hence the survey. But it’s trickier than that still, as I found out personally.

An early draft of the script for The State of Wikipedia video included the same detail from the survey Cohen cites. To make sure I had the details right, I sought the input of Erik Zachte, a data analyst for the Wikimedia Foundation and curator of information at the great Infodisiac website.

What he pointed out is that the survey had a significant problem with self-selection bias; more than a quarter of survey respondents came from Russia, for example. Among survey respondents, it is true somewhat less than 13% were female contributors. Slice it another way, and among contributors to the website, slightly more than 16% were female. Meanwhile, just 25% of survey-takers identified themselves as female. Therefore, the information concerning women on Wikipedia is considerably less likely to be accurate than the information concerning men, but it still seems probable the percentage of female contributors is somewhere south of the 25% Gardner would like it to be.

The question then is what exactly she plans to do about it, and that discussion is underway now. If you want to be part of it, the Wikimedia Foundation has set up a mailing list to address the topic that is open to the public, and the Wikipedians you will find there are likely to be among the most thoughtful and welcoming. I certainly have my doubts that much will come of it, or that we’ll be able to reliably measure it. Wikipedia is a challenge to most people, from all walks of life, and any effort to artificially boost participation from any one group over another is likely to meet with failure. If any solutions do arise, my guess is that they will not necessarily be gender-specific.

As a final note, I find some irony in the fact that one reason put forth to explain why women don’t participate in Wikipedia is that they may not feel confident in their contributions, because on this particular topic, I don’t feel confident in my observations. Just for the record, on one hand I find that I am writing something because it’s a big topic and I don’t want to let it pass me by entirely; on the other hand, I think there is far more to be said about the subject than even a lengthy blog post can address. So I publish this now, unsure whether I’ve actually said anything worthwhile. Or maybe I’m overthinking it.

David Petraeus’ Big Month

Posted on June 28, 2010 at 8:34 am

In the view of many, Wikipedia tends toward frivolity. After all, the concept of Wikigroaning assumes that articles on weighty subjects will be given less attention than articles on pop culture subjects. While Wikipedia does include plenty of material that Britannica could and would never address, I’ve pointed out before that this isn’t always the case.

Here’s another reason to retain your faith in humanity, and this time not just Wikipedia’s contributors but also its visitors: this month’s traffic to the Wikipedia article about Gen. David Petraeus. He was in the news twice this month, and for very different reasons. First, on June 15, Petraeus fainted while testifying before the Senate Armed Services Committee. It’s just the kind of TMZ DC-ready story that gets attention, including video, which always helps. Indeed, the story caused traffic on his Wikipedia article to spike.

But as the chart below indicates, that was only about a tenth of the traffic to his page once President Obama nominated him to replace Gen. Stanley McChrystal as the U.S. commander in Afghanistan, following the latter general’s unsolicitous remarks about the Obama administration in Rolling Stone magazine. Perhaps this does not reveal too much, as this is undoubtedly the bigger news story, but it is also a much more complicated one, and at least indicates that no matter how many articles about Pokemon characters Wikipedia may hold, people can still find what’s important.

As for the fact that the top day for traffic to McChrystal’s Wikipedia article this month nearly doubled the traffic on Petraeus’ top day, well, I’ll let you judge that for yourself.

Snapshot of traffic to Wikipedia article about David Petraeus, June 2010.

Traffic statistics courtesy User:Henrik.