William Beutler on Wikipedia

Archive for August 2009

How Did the New York Times Overestimate Wikipedia’s Popularity? [Corrected]

on August 30, 2009 at 11:59 am

Update: Man, did I blow this one? Yeah, I think I did. David Gerard points out in the comments that updated global comScore figures (not easy to come by, but donated to Wikimedia and available here) indeed show the Foundation’s websites at #4 globally, with Wikipedia presumably the biggest traffic driver by a long shot. So, hey, that’s great news, and it should be more widely known. In the U.S., however, Wikipedia is still somewhere around #9 overall.

Which brings me to the mistake that got me here: I had cited the comScore and Quantcast numbers below as global figures when in fact they were U.S. figures. That’s my mistake, and essentially the same mistake I had accused Cohen of making. So, there you have it. I will now retreat to the assertion that the New York Times should adopt Wikipedia’s practice of inline citations. Then maybe I wouldn’t make mistakes like this one.

New York Times tech correspondent Noam Cohen, reporting on the final day of the Wikimania conference in Buenos Aires for the NYT’s Bits blog, begins his most recent dispatch thus:

Considering that Wikipedia has reached Top Five world status among Web sites – with more than 330 million users – its annual Wikimania conference, which ended Friday night in Buenos Aires, featured a lot of hand-wringing about all the problems the project faces.

What catches my attention is the assertion that Wikipedia has attained “Top Five” status worldwide. Cohen doesn’t provide a source (no small irony there), which makes his decision to capitalize the phrase “Top Five” all the more curious. According to what metric? There are several to choose from. And according to whose calculations? Several competing firms collect, analyze and publish such rankings, but none of them is necessarily authoritative.

The best-known but least-respected is Amazon-owned Alexa, which currently puts Wikipedia at #6 globally, according to a combination of users and pageviews counted by Alexa’s (somewhat murky) sources. That’s close, but it’s not in the top five.

Compete.com, a web metrics company that makes some of its rankings public, ranks Wikipedia no higher than #9 globally by unique visitors. Somewhat surprisingly, Wikipedia doesn’t appear in its rankings for other metrics, such as visits and page views.

A similar company is Comscore (I mean, comScore), which releases such information through press releases. Its last report, in July, put Wikimedia Foundation Sites at #10 for unique visitors, actually down one place from a few months earlier.

Another service is Quantcast, one of the newer entrants and also one of the most praised. Quantcast currently puts Wikipedia at #8. I like that figure, since it matches what I’ve seen in months past and have quoted numerous times, but perhaps we can split the difference and say that, right now, Wikipedia is #9 overall. Nothing to be ashamed of there.

But then where does Cohen’s “Top Five” claim come from? I tried Googling for the answer, and I think I might have it.

According to an August 8, 2009 entry on the blog of a web design firm that may be called PJ Designs and Concepts, Wikipedia lands in the “top five Social Media websites in terms of Inbound Links, Google Page Rank, Alexa Rank, and U.S. traffic data from Compete and Quantcast.” In fact, Wikipedia ranks second, behind only MySpace and ahead of YouTube, Facebook and Photobucket. I find this claim somewhat suspicious. For one thing, Facebook routinely ranks in the top three in the Alexa, Compete and Quantcast rankings (follow the links above). It also has the same PageRank as MySpace, 9/10, which Wikipedia shares. That the post is authored by “admin” does not especially inspire confidence, either. And of course, these are just “social media” sites, not all “Web sites.”

Granted, it’s possible that new scholarship was announced at Wikimania, but I think that would have been worth a headline of its own. As much as I’d like to see Wikipedia at #5 (let alone #2), I think we’d know if this were the case. If there is an explanation for Cohen’s assertion other than the one I propose above, I can’t find it. But I’ll let you know if I find out.

Flagged Revisions Come to the English Wikipedia

on August 26, 2009 at 6:39 am

Earlier this week, New York Times web reporter Noam Cohen, who does some of the best Wikipedia reporting this side of The Register, broke the news of a decision by Wikipedia’s parent organization to institute tighter controls on some articles. Wrote Cohen:

Officials at the Wikimedia Foundation, the nonprofit in San Francisco that governs Wikipedia, say that within weeks, the English-language Wikipedia will begin imposing a layer of editorial review on articles about living people.

The new feature, called “flagged revisions,” will require that an experienced volunteer editor for Wikipedia sign off on any change made by the public before it can go live. Until the change is approved — or in Wikispeak, flagged — it will sit invisibly on Wikipedia’s servers, and visitors will be directed to the earlier version.

The change is part of a growing realization on the part of Wikipedia’s leaders that as the site grows more influential, they must transform its embrace-the-chaos culture into something more mature and dependable.

It’s worth pointing out early on, as Cohen’s story unfortunately did not, that these changes will apply only to biographies of living persons. On Wikipedia, that is a proper noun: Biographies of Living Persons (BLP) is one of the site’s most strenuously enforced policies. Earlier this year, Wikipedia veteran Newyorkbrad explained it in a series of posts on the Volokh Conspiracy, which The Wikipedian previously discussed.

Blogosphere reaction has been far more widespread than for any other Wikipedia story I can recall from the past year. I think this is because everybody who uses Wikipedia has some opinion about the website’s curious balance between openness and reliability, and now that balance has shifted. I’d say reaction divides roughly into four quadrants: those who mourn Wikipedia’s openness vs. those who will continue to question its reliability, crossed with those who are optimistic about the change vs. those who are not. Here is a walk-through:

Among those who feel that Wikipedia’s openness is key to the site’s success, count Judd Antin at TechnoTaste, who is studying Wikipedia as part of his PhD work:

As part of my dissertation research I’ve been interviewing less experienced Wikipedians about their perceptions of the site. One constant theme has been the perception of a class system in Wikipedia. Casual editors worry that their edits aren’t good enough, and that they’ll be rebuked by Wikipedia’s upper-classes. They perceive a mystical group of higher-order contributors who make Wikipedia work. … This latest move is troubling in that it seems to represent a lack of faith in crowdsourcing and the wisdom of crowds, in the model that made Wikipedia what it is today. This change will also remove another of the important social-psychological incentives that draw new people into the Wikipedia fold: the instant gratification that comes from seeing your work reflected on a Wikipedia page.

That openness is not always a good thing. Kate McMillan at Small Dead Animals is the subject of a Wikipedia article and is not exactly pleased about it. Nor is she especially optimistic that things will change:

My own Wiki page was instigated by an internet “stalker”, in fact, the same individual who once authored a blogspot site using my stolen identity. Requests to Wikipedia to delete the page went unheeded, and it’s remained a reliable source of misinformation, false attribution of quotes, and drive-by smears ever since. … It wasn’t until I threatened a Wiki editor personally with legal action for restoring defamatory material to the page, that they began to take tighter control of the content.

Another skeptic is Ann Bartow at Madisonian.net:

I have doubts about how effective this is going to be in improving the reliability of the content of Wikipedia entries, but it is a great PR move by Jimmy Wales, that’s for sure.

From the perspective of a frustrated editor, here is Andy Merrett at The Blog Herald:

As someone not in the Wikipedia “elite”, I’ve long since given up trying to edit entries on the site, having already wasted not insignificant time adding information only to have it reversed. I foresee that Wikipedia will increasingly become a place where only a minority of privileged and “trusted” editors have the keys to the kingdom.

To others, that is a plus. Among the critics of Wikipedia’s reliability is Lisa Gold at Research Maven, who welcomes the change but remains a skeptic:

I’m glad there is finally some acknowledgment among the powers that be at Wikipedia that accuracy is important. But that’s not enough. If accuracy is important, you have to make it a priority and do things on many different levels to try to achieve it. You have to apply your policies to the entire site, not just some articles. You have to bring in people with knowledge, experience, and qualifications to do real editing and fact-checking. (With all of the unemployed editors, fact-checkers, and journalists out there, why not hire a few and let them work their magic.) This new policy is not really about making Wikipedia more accurate, it’s just about trying to stop the embarrassing vandalism stories that hit the news with disturbing regularity.

A similar sentiment was expressed by Dr. Jim West, who appears to have some experience arguing with an intellectual opponent about Wikipedia content. His reaction to the change:

In a word, duh. Now if you’ll do the same for every entry then perhaps your resource might be worth visiting some day. Until then, I think I’ll continue to abstain. I’m not really interested in reading an article on the Dead Sea Scrolls that Raphael Golb has edited using one of his 200 fake names.

I understand the concerns of both, but I think they go too far. Striking a balance and offering a more optimistic view is Ben Parr at Mashable:

[W]e can’t help but feel a bit sad that this change had to happen. Wikipedia was egalitarian in the spread and use of information, and it treated everyone as equal contributors of knowledge. While that may not necessarily be true in the real world, it still was the driving force behind the creation of 3 million articles, more than any other encyclopedia could ever hope to boast.

The move was necessary, but it does mark a new chapter in the Wikipedia information age and the end of an old one.

And here’s another philosophical take from Joe Windish at The Moderate Voice:

There is little doubt the debate will be passionate, but that’s exactly as it should be. Eight years into the incredible success of Wikipedia, long one of the 10 most popular sites on the Web, many of us still don’t understand it. … The thousands of volunteer Wikipedian editors take their responsibility seriously. Flagged revisions may or may not work. What’s best about it is that the Wikipedia editorial community will watch and wonder about and debate it. And if it should not succeed, they will try and try again.

My own take on the situation? I don’t know yet. As Andrew Lih explains in his book, The Wikipedia Revolution, the German-language edition has had this feature for several years, and it seems to work there. On the other hand, the English Wikipedia is much larger, and some articles could well go unchecked and un-updated for long stretches. Will the site grow stagnant? Will the vast majority of people who read but do not edit even notice? These are just a few of the operative questions.

WikiProject Flagged Revisions, which will try to keep articles current, was established only on August 19 and so far has just four listed participants. It’s also worth noting that once the details are hammered out (they are not yet), the plan will be implemented on a two-month trial basis. And after that? Well, I’m very interested to find out myself.

Three Million Served

on August 18, 2009 at 6:32 am

This week marks a milestone for the English-language Wikipedia that is both major and somewhat arbitrary: the creation of its 3 millionth article. If you visit the front page of Wikipedia now, you will see this message:

[Image: Wikipedia front-page announcement of the 3 millionth article]

That article, about Norwegian actress Beate Eriksen, is currently locked down to prevent vandals from messing it up, something that happens to nearly every article that gets widespread attention. Usually, of course, the attention comes because the subject is in the news, not because of the article itself.

As the chart below indicates (taken from here), Wikipedia passed 2 million articles in the third quarter of 2007. Will it take another 2 years for Wikipedia to reach 4 million?

[Chart: English Wikipedia article growth over time]

Actually, it may take a bit longer: Wikipedia’s article growth has been slowing down. This has been a topic of discussion on the Wikipedia Weekly podcast since at least a year ago, and the slowdown is inevitable. Given Wikipedia’s success and its strict rules on what qualifies for an article, there will come a point where most articles have already been created. We may have reached that point.

Or, as I think more likely, we have created most of the articles that can be assembled from web sources and in-print books. That’s why I think the next phase of Wikipedia’s growth will have to depend on archival material about historical subjects, exactly the kind of article Wikipedia handles least well. This wouldn’t stop Wikipedia’s growth from slowing, but it would keep that growth meaningful.

Update: From the comments, here are two thoughts from Wikipedians who are very smart and much more experienced than yours truly. First, David Gerard:

Actually, I think we’ve barely scratched the surface of books, in-print or not. What’s been done so far isn’t even the low-hanging fruit, it’s the fruit that’s actually sitting on the ground waiting to be picked up.

The growth curve so far looks like a logistic curve with a linear increase on top.

One interesting thing is that the growth curves for the other large Wikipedias look similar. And the smaller Wikipedias are typically in early linear growth or the exponential upcurve of the logistic curve.

And from Sage Ross, a Wikipedia Weekly contributor:

“we have created most of the articles that can be assembled from web sources and in-print books”

That’s not nearly the case, especially if you count digitized scholarly journals as available sources too. Wikipedia could easily have another 3 million articles (probably more like 30 million) based on published sources. It’s just that the deeper you go into specialized areas where the untapped sources are rich, the fewer people there are who are interested in and/or capable of writing about those areas.

Bill Clinton’s Excellent Adventure

on August 5, 2009 at 11:27 am

Update: Hmm, so it looks like I may have gotten out ahead of the details on this one. See the comments, where fellow Wikipedian Graham87 points out that the current Wikipedia database does not in fact include edits from the early months of Wikipedia. As he notes, here is an earlier version of the Bill Clinton article. And what does that mean for this particular series? Well… at least I will have to select articles from approximately 2002 on.

The 42nd president is enjoying a pretty good week, having returned this morning from North Korea with American journalists Laura Ling and Euna Lee, freed after his successful negotiations with Kim Jong-Il. This seems as good a moment as any for the second installment in a series on the first versions of major Wikipedia articles.

Bill Clinton left office just five days after Wikipedia was founded in January 2001. One might think this would make him a strong candidate for one of the first articles created, but as it happens, no such article was created until November 17 of that year. And even then, no second editor contributed for nearly a month, until, coincidentally, the same day a Wikipedia article was created for his successor.

The first version of the Bill Clinton article was fairly substantial: 979 words, excluding the table of contents. That is less than a tenth of the roughly 9,900 words in the Bill Clinton article today (to say nothing of the many peripheral articles, such as Electoral history of Bill Clinton), but it’s still pretty good.

Here is the first paragraph (of a much longer intro) today:

William Jefferson “Bill” Clinton (born William Jefferson Blythe III, August 19, 1946)[1] served as the 42nd President of the United States from 1993 to 2001. He was the third-youngest president; only Theodore Roosevelt and John F. Kennedy were younger when entering office. He became president at the end of the Cold War, and as he was born in the period after World War II, he is known as the first Baby Boomer president.[2] His wife, Hillary Rodham Clinton, is currently the United States Secretary of State. She was previously a United States Senator from New York, and also candidate for the Democratic presidential nomination in 2008. Both are graduates of Yale Law School.

Here is the first paragraph (of a much longer intro) then:

William Jefferson Clinton (Democrat) was the 42nd President of the United States, from 1993-2001. He was born August 19, 1946 in Hope, Arkansas. He was named after his father, William Jefferson Blythe II, who had been killed in a car accident just three months before his son was born.

In the original version, the Lewinsky scandal is handled in two short paragraphs in the intro section; today, Lewinsky and the subsequent impeachment trial each get a short section that links out to a very comprehensive article of its own.

While Wikipedia today strives to be non-partisan and to avoid self-references, these concepts were less developed early on, as the way the original version closed shows. The last sentence of the article proper read:

There’s a great deal more to be said about him — let’s try to keep it non-partisan and encyclopedic.

And a deprecated link to the Talk page, at the time included in the text of the article itself, said:

/Talk (go ahead and be partisan there)

Not to worry — eight years later, they still are.