William Beutler on Wikipedia

Posts Tagged ‘TechCrunch’

The Agony and Ecstasy of Wikidata

Posted on April 12, 2012 at 8:31 am

Although Wikipedia is by far the best-known of the Wikimedia collaborative projects, it is just one of many. Just this last week, Wikimedia Deutschland announced its latest contribution: Wikidata (also @Wikidata, and see this interview in the Wikipedia Signpost). Still under development, its temporary homepage announces:

Wikidata aims to create a free knowledge base about the world that can be read and edited by humans and machines alike. It will provide data in all the languages of the Wikimedia projects, and allow for the central access to data in a similar vein as Wikimedia Commons does for multimedia files. Wikidata is proposed as a new Wikimedia hosted and maintained project.

One of a few Wikidata logos under consideration.

Upon its announcement, I tweeted my initial impression, that it sounded like Wikipedia’s answer to Wolfram Alpha, the commercial “answer engine” created by Stephen Wolfram in 2009. It seems to partly be that but also more, and its apparent ambition—not to mention the speculation surrounding it—is causing a stir.

Already touted by TechCrunch as “Wikipedia’s next big thing” (incorrectly identifying Wikipedia as its primary driver, I pedantically note), Wikidata will create a central database for the countless numbers, statistics and figures currently found in Wikipedia’s articles. The centralized collection of data will allow for quick updates and uniformity of statistical information across Wikipedia.

Currently, when new information replaces old, as is the case when census surveys, election results and quarterly reports are published, Wikipedians must manually update the old data in every article in which it appears, across every language. Wikidata would create the possibility of a quick, computer-led update to replace all out-of-date information. Additionally, it is expected that Wikidata will allow visitors to search and access information in a less labor-intensive way. As TechCrunch suggests:

Wikidata will also enable users to ask different types of questions, like which of the world’s ten largest cities have a female mayor?, for example. Queries like this are today answered by user-created Wikipedia Lists – that is, manually created structured answers. Wikidata, on the hand, will be able to create these lists automatically.
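To make the idea concrete, here is a minimal sketch of the two behaviors described above: one central record feeding every language edition, and a list generated by a query rather than by hand. Everything in it is an assumption for illustration: the `City` fields, the `CENTRAL_STORE` dictionary, the placeholder cities and their figures are all made up, and none of it reflects how Wikidata is actually being built.

```python
from dataclasses import dataclass


@dataclass
class City:
    name: str
    population: int
    mayor_gender: str  # "female" or "male"


# Hypothetical central store: one record per city, shared by every language edition.
# The cities and figures are placeholders, not real data.
CENTRAL_STORE = {
    "city_a": City("City A", 9_000_000, "female"),
    "city_b": City("City B", 7_500_000, "male"),
    "city_c": City("City C", 12_000_000, "female"),
    "city_d": City("City D", 3_000_000, "female"),
}

# Each language edition keeps its own wording but reads the same central figure.
TEMPLATES = {
    "en": "{name} has {pop:,} inhabitants.",
    "de": "{name} hat {pop:,} Einwohner.",
}


def render_population(city_id: str, language: str) -> str:
    """Build one sentence for one language edition from the central record."""
    city = CENTRAL_STORE[city_id]
    return TEMPLATES[language].format(name=city.name, pop=city.population)


def largest_cities_with_female_mayor(top_n: int) -> list[str]:
    """Take the top_n cities by population, then keep those with a female mayor."""
    ranked = sorted(CENTRAL_STORE.values(), key=lambda c: c.population, reverse=True)
    return [c.name for c in ranked[:top_n] if c.mayor_gender == "female"]


# A single update to the central record changes what every language edition shows.
CENTRAL_STORE["city_a"].population = 9_100_000
print(render_population("city_a", "en"))
print(render_population("city_a", "de"))
print(largest_cities_with_female_mayor(3))
```

The point of the sketch is only the shape of the tradeoff: the figure is edited once, and the "list" is the output of a function rather than a page someone has to maintain.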

Though this project (which is funded by the Allen Institute for Artificial Intelligence, the Gordon and Betty Moore Foundation, and Google) is expected to take about a year to develop, the blogosphere is already buzzing.

It’s probably fair to say that the overall response has been very positive. In a long post summarizing Wikidata’s aims, Yahoo! Labs researcher Nicolas Torzec identifies himself as one who excitedly awaits the changes Wikidata promises:

By providing and integrating Wikipedia with one common source of structured data that anyone can edit and use, Wikidata should enable higher consistency and quality within Wikipedia articles, increase the availability of information in and across Wikipedias, and decrease the maintenance effort for the editors working on Wikipedia. At the same time, it will also enable new types of Wikipedia pages and applications, including dynamically-generated timelines, maps, and charts; automatically-generated lists and aggregates; semantic search; light question & answering; etc. And because all these data will be available as Open Data in a machine-readable form, they will also benefit thrid-party [sic] knowledge-based projects at large Web companies such as Google, Bing, Facebook and Yahoo!, as well as at smaller Web startups…

Asked for comment by CNet, Andrew Lih, author of The Wikipedia Revolution, called it a “logical progression” for Wikipedia, even as he worries that Wikidata will drive away Wikipedians who are less tech-savvy, as it complicates the way in which information is recorded.

Also cautious is SEO blogger Pat Marcello, who warns that human error is still a very real possibility. She writes:

Wikidata is going to be just like Wikipedia in that it will be UGC (user-generated content) in many instances. So, how reliable will it be? I mean, when I write something — anything from a blog post to a book, I want the data I use in that work to be 100% accurate. I fear that just as with Wikipedia, the information you get may not be 100%, and with the volume of data they plan to include, there’s no way to vette [sic] all of the information.

Fair enough, but of course the upside is that corrections can be easily made. If one already uses Wikipedia, this tradeoff is very familiar.

The most critical voice so far is Mark Graham, an English geographer (and a fellow participant in the January 2010 WikiWars conference) who published “The Problem with Wikidata” on The Atlantic’s website this week:

This is a highly significant and hugely important change to the ways that Wikipedia works. Until now, the Wikipedia community has never attempted any sort of consistency across all languages. …

It is important that different communities are able to create and reproduce different truths and worldviews. And while certain truths are universal (Tokyo is described as a capital city in every language version that includes an article about Japan), others are more messy and unclear (e.g. should the population of Israel include occupied and contested territories?).

The reason that Wikidata marks such a significant moment in Wikipedia’s history is the fact that it eliminates some of the scope for culturally contingent representations of places, processes, people, and events. However, even more concerning is that fact that this sort of congealed and structured knowledge is unlikely to reflect the opinions and beliefs of traditionally marginalized groups.

The comments on the article are interesting, with some voices sharing Graham’s concerns, while others argue his concerns are overstated:

While there are exceptions, most of the information (and bias) in Wikipedia articles is contained within the prose and will be unaffected by Wikidata. … It’s quite possible that Wikidata will initially provide a lopsided database with a heavy emphasis on the developed world. But Wikipedia’s increasing focus on globalization and the tremendous potential of the open editing model make it one of the best candidates for mitigating that factor within the Semantic Web.

Wikimedia and Wikipedia’s slant toward the North, the West, and English speakers is well covered in Wikipedia’s own list of its systemic biases, and Wikidata can’t help but face the same challenges. Meanwhile, another commenter argued:

The sky is falling! Or not, take your pick. Other commenters have made more informed posts than this, but does Wikidata’s existence force Wikipedia to use it? Probably not. … But if Wikidata has a graph of the Israel boundary–even multiple graphs–I suppose that the various Wikipedia authors could use one, or several, or none and make their own…which might get edited by someone else.

Under the canny (partial) title of “Who Will Be Mostly Right … ?” on the blog Data Liberate, Richard Wallis writes:

I share some of [Graham's] concerns, but also draw comfort from some of the things Denny said in Berlin – “WikiData will not define the truth, it will collect the references to the data…. WikiData created articles on a topic will point to the relevant Wikipedia articles in all languages.” They obviously intend to capture facts described in different languages, the question is will they also preserve the local differences in assertion. In a world where we still can not totally agree on the height of our tallest mountain, we must be able to take account of and report differences of opinion.

Evidence that those behind Wikidata have anticipated a response similar to Graham’s can be found on the blog Too Big to Know, where technologist David Weinberger shared a snippet of an IRC chat he had with a Wikimedian:

[11:29] hi. I’m very interested in wikidata and am trying to write a brief blog post, and have a n00b question.
[11:29] go ahead!
[11:30] When there’s disagreement about a fact, will there be a discussion page where the differences can be worked through in public?
[11:30] two-fold answer
[11:30] 1. there will be a discussion page, yes
[11:31] 2. every fact can always have references accompanying it. so it is not about “does berlin really have 3.5 mio people” but about “does source X say that berlin has 3.5 mio people”
[11:31] wikidata is not about truth
[11:31] but about referenceable facts

The compiled phrase “Wikidata is not about truth, but about referenceable facts” is an intentional echo of Wikipedia’s oft-debated but longstanding allegiance to “verifiability, not truth”. Unsurprisingly, this familiar debate is playing itself out around Wikidata already.
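The distinction drawn in that chat can be sketched in a few lines. This is a rough illustration under my own assumptions, not Wikidata's actual data model: the `Claim` class, its field names, and the second source ("Source Y", with its figure) are hypothetical. Only "Source X" and the 3.5 million figure echo the example in the chat.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Claim:
    subject: str
    prop: str
    value: str
    source: str  # every claim carries the reference that makes it


# Two sources may disagree; both claims are kept side by side.
# "Source Y" and its figure are hypothetical, added only to show a disagreement.
claims = [
    Claim("Berlin", "population", "3.5 million", "Source X"),
    Claim("Berlin", "population", "3.4 million", "Source Y"),
]


def does_source_say(claims: list[Claim], source: str, subject: str, prop: str, value: str) -> bool:
    """Answer "does source X say that ...?" rather than "is it true that ...?"."""
    return Claim(subject, prop, value, source) in claims


def reported_values(claims: list[Claim], subject: str, prop: str) -> dict[str, str]:
    """Collect what each source reports for one property, without picking a winner."""
    return {c.source: c.value for c in claims if c.subject == subject and c.prop == prop}


print(does_source_say(claims, "Source X", "Berlin", "population", "3.5 million"))
print(reported_values(claims, "Berlin", "population"))
```

Nothing in the sketch decides which figure is correct, which is the design choice the Wikimedian describes: the store records who says what, and leaves the choice of figure to whoever uses it.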

Thanks for research assistance to Morgan Wehling.

Is Quora the Next Wikipedia? Part II: Follow the Leader

Posted on March 1, 2011 at 3:40 pm

The first part of this series is available here: Is Quora the Next Wikipedia? Part I.

I’m not persuaded that Quora is necessarily an attempt to displace Wikipedia, but I do believe it was designed to emulate aspects of “the encyclopedia anyone can edit” that make sense for Quora while trying a different approach. Mike Arrington has said that Quora is about creating a “better” Wikipedia, but it isn’t clear just yet that its approach is actually better. In some ways, I’ll bet it’s worse.

But before we compare the pluses and minuses of each model, let’s first consider the ways in which Quora consciously follows Wikipedia’s lead.

First and foremost, Arrington and company aren’t making the comparison to Wikipedia without a strong hint from the site itself. Indeed, calling Quora a “Q&A website” is a bit like saying Bill Simmons is a “sportswriter”; no one will say you’re wrong, but that misses the bigger picture. And Quora doesn’t hide its ambitions; the very first paragraph of its About page declares:

“Quora is a continually improving collection of questions and answers created, edited, and organized by everyone who uses it. The most important thing is to have each question page become the best possible resource for someone who wants to know about the question.”

Except for the “question” part, that sounds a heck of a lot like Wikipedia. A few paragraphs later:

“People use Quora to document the world around them. Over time, the database of knowledge should grow and grow until almost everything that anyone wants to know is available in the system.”

Other Quora policies clarify that, yes, you may ask easy questions and, yes, you may ask questions you already know the answer to. How else could the system grow to encompass virtually everything under the sun?

Based on the above and nothing more, I’d say one could describe Quora as a “reverse Wikipedia”: rather than presenting a set of facts on a general topic answering many hypothetical questions, as Wikipedia does, Quora wants to organize the same information around very non-hypothetical questions.

Read a little further into Quora’s list of policies and the hints go from “strong” to “explicit”. Asked about spelling and capitalization, Quora punts:

“When possible, use Wikipedia as a guide. … For things that Wikipedia doesn’t provide a model for, try to use the same pattern that Wikipedia uses for similar things.”

The same goes for naming topics:

“When there is controversy over a topic’s name, we generally prefer Wikipedia’s conventions.”

Asked about limits on acceptable user behavior, Quora policy states:

“Users are also not allowed to post content or adopt a tone that would be interpreted by a reasonable observer as [list of horribles]. This policy is based on Wikipedia’s policy on harassment.”

A related guideline points to Wikipedia’s policy on personal attacks. One can call it copy-catting, but I’d say it shows respect for the thought and effort Wikipedia’s contributors have put into the challenges of categorization and cultivation of community.

And there is more still. While Quora remains in the early stages of development, its creators have already declared some future plans. One is something no other Q&A site has attempted: introducing a preferred format for citing sources. It’s currently quite primitive, and I have not seen it much in use, but their intentions are clear. Quora policies allow that citations are optional, but promise their use will be rewarded:

“A good reason [to use the format] is that when/if Quora adds real footnote support, footnotes following these guidelines will be automatically converted.”

So far, Quora has proven to be extraordinarily well thought out. Of course they’ve had considerable help, but to their credit they’ve certainly nodded in the direction of their inspiration.

Now that we’ve established that Quora is indeed a lot like Wikipedia, we still need to analyze how the two platforms differ. Then we can discuss the advantages and disadvantages of each. And that’s my next post.

If you are so inclined, you may follow me on Quora.

Is Quora the Next Wikipedia? Part I

Posted on February 28, 2011 at 10:13 am

In the past few months, I’ve become increasingly interested in the hit startup website Quora. If you’re not familiar with it, the simplest explanation is that it’s a Q&A website that gets right what earlier incarnations got wrong.* A longer explanation would include a discussion of why it is much more ambitious.

To expand on the point: Answers.com is a wasteland of unanswered questions and no visible community, while Quora has real enthusiasts. ChaCha has more reliable respondents, but they are paid generalists who may not know much about a given topic. Yahoo! Answers seems to have a genuine community, albeit one full of know-nothings. Quora, on the other hand, has attracted the participation of experts (at least in tech) who volunteer their time to create new content on topics of their own interest.

Does this sound like any other websites you know?

Quora’s strengths as a social media platform and Q&A site are evident: it looks sharp and stylish, seems to be well thought out, and has followed the Facebook-Twitter model of starting with a core group of likeminded users before gradually expanding its user base. While it is very far from being a household word, it is often enough compared to those two social media juggernauts, and in fact has early Facebook employees on board. But more and more it is being compared to Wikipedia, which answers the question (so to speak) about why I’ve become so fascinated by it.

To wit: A recent post by TechCrunch editor Mike Arrington declared that Quora was about building “a better Wikipedia”. John Keehler at Random Culture recently called it “Wikipedia, Evolved”. In response to these, Teluq-UQAM professor Seb Paquet published an essay at The Quora Review titled “Why Quora is Not Wikipedia”.

But if Quora’s goal is to “beat” Wikipedia—and I have not heard its founders claim this as a goal—it is very far from doing so now. For virtually every topic Wikipedia addresses, the site is usually found at or near the top of relevant search engine results. Its ubiquity is so great that some have speculated Google purposefully elevates Wikipedia in search results (the more likely reason is that wiki software does many things Google bots look for, and many people link to it). Quora, on the other hand, is nowhere to be found in most searches.

Wikipedia contains 3.5 million separate articles (in its English edition alone), each of which may cover several related topics in detail. And with a few million more “redirects” also catching the eye of Google’s crawlers, the number of opportunities for Wikipedia to land a prominent position on a search results page may be in the neighborhood of ten million. The number of questions on Quora is, at present, not public information.

Any way you slice the numbers, Wikipedia is one of the top ten websites in the United States and the entire world. According to Alexa, Quora is at best the 1,269th website in the United States, and is so far limited to the English language. Wikipedia has been around for more than ten years; Quora, less than two. Whatever Quora might achieve in the future, it has not yet. Wikipedia certainly has.

Quora and Wikipedia are unique in many ways, but to focus on where they are different is to gloss over what they have in common. Meanwhile, Arrington’s flat statement that Quora is “better” greatly oversimplifies the matter. Instead, I’d like to examine what they do have in common, and how they may compete with or complement each other.

In my next post, that’s just what I’ll do.

P.S. If you’d like, you can follow me on Quora.

* On Twitter, Matt Bucher reminds me of Ask MetaFilter, which is different in several ways from the sites discussed above. He is right to identify it as a quality site; the MetaFilter community has been well-cultivated in its decade-plus existence, and is a fine and frequently thoughtful resource for its community. However, I think that’s all it ever plans to be: one section of a larger online community.

April Fools! …or Not?

Posted on April 1, 2009 at 8:35 am

Today is April Fools’ Day, and among those getting in on the act are the Wikipedians who update the “In the news” section of the English Wikipedia’s front page:

[Image: the “In the news” section of the English Wikipedia’s front page on April Fools’ Day]

Ireland’s PM, naked? Diamonds in the sky? Hartford and New Orleans collide? Actually… yes, yes and yes. Where most April Fools’ jokes are invented from whole cloth (TechCrunch has a guide to many of the Internet’s more prominent hoaxes today), all of these stories are 100% true. They’ve just been couched in dubious language.

Click through the image today, or try here after April 1, to see the real stories for yourself.