William Beutler on Wikipedia

Posts Tagged ‘Wikidata’

The Top 10 Wikipedia Stories of 2015

Posted on December 22, 2015 at 3:28 pm

Each year since 2010, The Wikipedian has looked back at the year on Wikipedia and taken a stab at determining which trends, milestones, and controversies most influenced the direction of Wikipedia in the twelve months preceding.

This is no easy task, considering the millions of articles, edits, and editors within the scope of Wikipedia and its sister projects, not to mention the off-wiki and even offline circumstances affecting them. The most important events may be overlooked, acknowledged major events can be misunderstood, and the significance of each can differ greatly depending on one’s viewpoint. No matter: The Wikipedian will make its best effort.

This time around I’m pairing our retrospective with a post on the blog of my firm, Beutler Ink, called “Ten Predictions for Wikipedia in 2016”. I recommend reading this one first: as we learn from the Bard, what’s past is prologue.

♦     ♦     ♦

10. Wikidata Rising

When Wikidata, the collaborative structured database project, first launched in 2012, it was difficult to summarize with any confidence. The Wikipedian covered it by carefully outlining its stated goals and quoting the speculative news and blog coverage. At the end of 2015, it’s not much easier to describe to a layperson, and many of its goals remain just that, but Wikidata’s growth is undeniable and the passion it inspires in the Wikipedia community is unmistakable. At this year’s Wikimania conference, Wikidata’s presence was felt like never before.

One big reason: Wikidata is unexplored territory in a way that Wikipedia no longer is. The encyclopedia project feels mature at 5 million articles (more about that below), but the database at only 15 million items has a long road ahead of it. For editors who joined the larger Wikimedia movement for the joy of discovery, Wikidata is where it’s at. The project still has some very real challenges, some of which unsurprisingly mirror those of Wikipedia, but it’s possible now to imagine that Wikidata, not Wikipedia, may prove to be the real “sum of all human knowledge”.

9. Exodus from New Montgomery Street

Has Wikipedia’s parent organization, the Wikimedia Foundation (WMF), seen a year with more comings and goings from its headquarters on San Francisco’s New Montgomery Street than 2015? It seems unlikely. The organization has seen admired veterans and high-level executives depart under different circumstances, and some touted recruits from Silicon Valley firms arrived to fanfare, only to exit quickly, and without comment. The only reason this exodus of talent isn’t higher on this list is that it’s one of 2015’s least-reported stories.

Approximately 18 months since Lila Tretikov became executive director, the WMF has experienced almost 100% turnover. For some longtime staff, it was probably time to move on anyway. And any incoming leader can be expected to make new hires and rearrange reports to their liking. But the very short tenures of some key hires, and mysterious circumstances surrounding some departures, can’t help but raise questions about whether Tretikov is in command of her personnel—and perhaps even if she’s the leader Wikipedia needs.

8. Community Tensions Felt in Trustee Elections

The Wikimedia Board of Trustees is the “ultimate corporate authority” of the Wikimedia Foundation, and its number includes three members elected from the volunteer community. The most recent election, held in May, was also the first since a major fight between the foundation and community over software implementation (Media Viewer) and platform control (Superprotect) in 2014. Against this backdrop, disagreements over Wikipedia’s next big software initiative, Flow, became increasingly pronounced, and a few months later the project was shelved.

Perhaps it’s unfair to assume a direct cause-and-effect, but the result seemed to be a “throw the bums out” election. Ousted were Phoebe Ayers, Samuel Klein, and María Sefidari (in fairness, none were “bums”, nor particularly responsible for the problem). In are three respected veterans with the good fortune of non-incumbency: James Heilman, Dariusz Jemielniak, and Denny Vrandečić.

Oddly, the two women ousted received the first and third most votes in favor, but Wikimedia accounts for “oppose” votes, and they had too many of those. Today, just two Board members are women, the lowest representation in Wikipedia’s history.
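To see how a candidate can collect the most “support” votes and still lose, here is a minimal sketch. It assumes candidates are ranked by support divided by support plus oppose, with neutral votes ignored; that formula is an assumption about how the tally works, and the candidates and numbers below are invented for illustration, not the real 2015 results.

```python
from dataclasses import dataclass


@dataclass
class Tally:
    name: str
    support: int
    oppose: int

    @property
    def support_ratio(self) -> float:
        # Assumed ranking rule: support / (support + oppose); neutral votes ignored.
        return self.support / (self.support + self.oppose)


# Hypothetical numbers, chosen only to illustrate the effect.
tallies = [
    Tally("Candidate A", support=2000, oppose=1500),
    Tally("Candidate B", support=1900, oppose=600),
    Tally("Candidate C", support=1850, oppose=1400),
    Tally("Candidate D", support=1800, oppose=500),
    Tally("Candidate E", support=1700, oppose=450),
]

# Ranking by raw support versus ranking by the assumed ratio.
by_support = [t.name for t in sorted(tallies, key=lambda t: t.support, reverse=True)]
by_ratio = [t.name for t in sorted(tallies, key=lambda t: t.support_ratio, reverse=True)]

# Three seats are filled from the top of the ratio ranking.
winners = by_ratio[:3]
print("Most support votes:", by_support[:3])
print("Seated under the ratio rule:", winners)
```

In this made-up tally, the candidates with the first and third most support votes both miss out, because their oppose counts drag their ratios below everyone else’s.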

7. “Wikipedia Hates Women”—or Maybe Just Lightbreather

Wikipedia’s alarmingly low female participation rate is decidedly not a new problem. The issue first came to attention in the late 2000s, as editor surveys confirmed suspicions that Wikipedia was a total brodown. Today, the gender gap remains a frequent topic of debate, including a much-discussed Cracked.com article whence this entry takes part of its name.

The other half of the title comes from what’s called the “Lightbreather” case, focusing on a female editor with this username, and her interactions with, among others, a (male) editor named Eric Corbett. A disinterested appraisal of the case would find plenty of fault with both, although there is not one person in the world who possesses the powers of concentration necessary to follow all of the rabbit holes leading from this single case. Notwithstanding the particulars, it became the subject of a provocative, error-ridden, five-times corrected but nevertheless widely read article in The Atlantic, held up as one example of Wikipedia’s “hostility” to women.

The myriad possible explanations for this problem only open doors to more complicated issues. How much of the gender balance can be attributed to Wikipedia’s rules? Its community? Where is the line between heated disagreements and harassment? How much can be explained by how the web influences behavior? How much is this reflective of the tech industry’s gender gap? Will understanding this question help to explain why other marginalized identities, from Latinos to Africans, contribute to Wikipedia in small numbers? The answers to these questions seem within the reach of comprehension, but beyond the grasp of consensus.

6. A Clockwork Orangemoody

Another perennial topic on Wikipedia is conflict of interest (COI), usually playing out as someone inside Wikipedia or outside writing a self-serving autobiography, a low-rent marketing firm getting in trouble for editing clients’ pages, or sometimes more favorably, a group of PR firms coming together to try to make a good impression. This year, however, brought us something we never quite imagined: a massive extortion plot inverting the typical model of paid editing. Instead of helping paying customers create Wikipedia entries, the perpetrators could simply threaten non-paying “customers” with unflattering articles.

Orangemoody, as it was named for its “ringleader” account, was called the largest of its kind, but that merely counted the number of involved user accounts (nearly 400). The truth is, there has never been anything quite like it. Previous cases revolved around unscrupulous firms like Wiki-PR and WikiExperts who at least professed to be offering their clients a service. Orangemoody was a shakedown involving pages held for ransom, impersonation of Wikipedia administrators, and no real-world entity to absorb the blame. Orangemoody is so threatening because it suggests that Wikipedia’s open-editing model opens the door not just to unethical, if conceivable, shenanigans, but also to transgressions that are much more horrifying.

5. The Luck of Grant Shapps

Next to Orangemoody, there’s something almost comforting about the familiar narrative of alleged self-interested editing of Wikipedia by Tory MP Grant Shapps and the plot twist that brought his accuser to (relative) ignominy and ruin.

Amid the UK parliamentary elections this spring, a report emerged in the left-leaning Guardian, prompted by an allegation by a Wikimedia UK administrator, that Shapps had used a pseudonymous account to massage his own Wikipedia profile while giving a drubbing to others. It seemed plausible: Shapps had admitted to editing his own biography years ago, and using assumed names in other circumstances, and his side career as an Internet executive aided the narrative.

But the tables soon turned: the right-leaning Telegraph revealed that there was no smoking gun connecting Shapps to the suspicious edits, that the Wikipedia administrator, Richard Symonds, was in fact a Lib Dem activist who had communicated with the Guardian prior to taking action, and Wikipedians soon became concerned that Symonds may have abused his administrative privileges in blocking the suspicious account.

In the end, Symonds lost his adminship, and Shapps exited a succession of positions within the Conservative Party and government. All that’s missing is Keyser Söze shrugging off his limp and lighting a cigarette.

4. Wikipedia’s Big Picture Trends in Flux

After a long period of sustained narratives about Wikipedia’s traffic and editing trends, this year things got a little interesting. Following unabated growth in global traffic to Wikipedia, given a boost in recent years by the proliferation of web-enabled mobile devices, overall traffic actually fell for the first time. Meanwhile, after almost a decade of resignation to Wikipedia’s ever-dwindling editor base (a decline perhaps also attributable to the adoption of mobile devices), the numbers ticked upward.

An August report from an SEO analysis firm showed that Wikipedia’s search referrals from Google fell by up to 20% since the beginning of the year. Most speculation focused on Google’s ever-advancing practice of answering search queries on the results page, obviating the need to click through to non-Google websites. This has bedeviled companies like Yelp, which compete with Google to serve up reviews while also depending upon it for traffic. For Wikipedia, the situation is more complicated, and perhaps less of an issue. After all, a significant portion of Google’s answers are powered by Wikimedia projects. In fact, beginning in late 2014, Google wound down its own open knowledge database, Freebase, in favor of Wikidata. And Google still recommends more Wikimedia sites than it recommends Google sites.

Also in August, the first hard data emerged to show that the long, slow decline of active (and “very active”) Wikipedia editors had been arrested, and is now trending the other way, if ever so slightly. As close Wikipedia observers know too well, Wikipedia attained its zenith participation rate in 2007, arguably the high point for the project’s activity and excitement overall, after which the lowering tide revealed consternation and even alarm, with nobody knowing where it would end. Well, maybe here? The number of very active editors (those with at least 100 edits monthly), who are Wikipedia’s most valuable contributors, stabilized in 2014 and actually grew in 2015. The decline of administrators, coupled with the difficulty in admitting new ones in recent years, however, remains an issue.

In both cases, more data is surely needed before we can say what it really means.

3. English Wikipedia Hits 5 Million Articles

Admittedly, most of these top stories are unhappy ones, and the one just above is arguably mixed, but this one is unambiguously celebratory: on November 1, Wikipedia’s English language edition (by far its most popular, and synonymous with “Wikipedia” for most readers) notched its 5 millionth article.

Wikipedia has been the largest encyclopedia by any reasonable measure for a long while, so nothing has really changed. And it took seven years for Wikipedia to double in size, so if growth trends continue holding steady for now, we might not have a similar milestone to celebrate until sometime in the next decade. Meanwhile, sheer heft is easier to measure than other important characteristics, like accuracy or completeness, so this benchmark will remain Wikipedia’s equivalent of McDonald’s “Billions Served” for the foreseeable future. It may be an arbitrary measurement, but it’s a damned impressive one.

Number 5,000,000 itself: Persoonia terminalis, a rare shrub native to eastern Australia. Oh, and if you haven’t seen the RfC debating which temporary logo Wikipedia should display on the joyous day, I very much recommend taking a look at the near misses. Perhaps it will instill some faith in Wikipedia’s community processes if you agree the best logo won (and you should).

2. It’s About Ethics in Gamergate Opposition

In late 2014 and into the start of this year, the loosely-affiliated right-wing counterpart to the left-ish Anonymous expanded its focus from video game journalists to include the Wikipedia entries where said journalists’ critical takes had accumulated. Organizing on Reddit and other forums, the ‘gaters created numerous throwaway Wikipedia accounts to first try swinging Wikipedia’s coverage of their movement and a few of their top targets around to their liking and, when that failed, they took on Wikipedia editors directly.

Wikipedians fought back hard (too hard, in some cases), and when Wikipedia’s Arbitration Committee got around to handing out punishments, the only ones with anything to lose were the Wikipedia editors who cared. It also fed into the above-discussed ongoing trouble over Wikipedia’s treatment of gender issues, and was by far the year’s biggest blow-up along such lines, far greater than the argument over how to handle Caitlyn Jenner’s gender transition, which still lay ahead.

It’s hard to say if Gamergate is a 100-year-flood (although on the Internet, the time frame may be more like 100 months) or a sign of things to come. Wikipedia has faced trolls before, but few have been as dedicated or as destructive as the ones beneath the Gamergate bridge. The best defense is a strong base of committed Wikipedians, and perhaps this year shows us they’ll probably still be around to carry the sand bags and shore up the levees.

1. China, Russia, and Completing the HTTPS Transition

One aspect of Wikipedia’s global prominence that the foundation and movement alike have struggled to fully grasp is the role it can, should, and does play on the international stage. This year, the Wikimedia Foundation joined forces with the ACLU to sue the National Security Agency over its mass surveillance practices, only for the case to be thrown out by a federal court. As important as that fight may be, it is but one jurisdiction of many where Wikipedia has become a proxy for privacy and free speech battles, not to mention authoritarian power grabs.

In 2015, Wikipedia’s multi-year plan to convert all traffic moving through Wikimedia servers to the HTTPS encryption protocol was finally completed. HTTPS was first enabled for WMF sites in 2011, then became the default for logged in users in 2013, and this year was finally made the default for all traffic, including readers without a Wikipedia account. This is a good thing for Internet users who wish to access Wikipedia without their governments knowing about it. But it’s complicated when governments decide to shut off access altogether.

Indeed, the full implementation of HTTPS prevents governments like China from blocking access to specific entries, such as Tiananmen Square protests of 1989, and instead they have to choose between allowing all traffic, or blocking the site entirely. China opted for the latter. To be sure, Wikipedia wasn’t the biggest collaborative online encyclopedia in the PRC (it wasn’t even the second), and China’s Communist Party seems to be perfectly content promoting its homegrown versions of Google, Facebook and Twitter. In December, Wikipedia’s famous co-founder, Jimmy Wales, traveled to China to participate in an Internet conference, where his comments about the limitations of the state’s ability to control the Internet were intentionally lost in translation, as the Wall Street Journal reports.
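For readers wondering why HTTPS forces this all-or-nothing choice, here is a rough sketch. It models a network observer who can read the full address of a plain HTTP request but, for HTTPS, sees only the hostname, because the page path travels inside the encrypted connection. The host name, page paths, and function names are hypothetical, and real censorship systems are far more elaborate than this toy filter.

```python
from urllib.parse import urlsplit


def visible_to_observer(url: str) -> dict:
    """Return the parts of a request that an on-path observer can read."""
    parts = urlsplit(url)
    if parts.scheme == "https":
        # The hostname still leaks (for example via DNS lookups),
        # but the path is encrypted along with the rest of the request.
        return {"host": parts.hostname, "path": None}
    return {"host": parts.hostname, "path": parts.path}


def censor_allows(url: str, blocked_hosts: set, blocked_paths: set) -> bool:
    """A toy filter that can only act on what the observer can see."""
    seen = visible_to_observer(url)
    if seen["host"] in blocked_hosts:
        return False
    if seen["path"] is not None and seen["path"] in blocked_paths:
        return False
    return True


# Hypothetical site and pages, for illustration only.
HOST = "encyclopedia.example"
SENSITIVE = "/wiki/Sensitive_topic"
HARMLESS = "/wiki/Harmless_topic"

# Over HTTP, a single page can be blocked while the rest stays reachable.
http_sensitive = censor_allows(f"http://{HOST}{SENSITIVE}", set(), {SENSITIVE})
http_harmless = censor_allows(f"http://{HOST}{HARMLESS}", set(), {SENSITIVE})

# Over HTTPS, the path rule never fires, so the page gets through...
https_sensitive = censor_allows(f"https://{HOST}{SENSITIVE}", set(), {SENSITIVE})

# ...unless the whole host is blocked, which takes every page down with it.
https_harmless_host_blocked = censor_allows(f"https://{HOST}{HARMLESS}", {HOST}, {SENSITIVE})
```

The last two lines are the dilemma in miniature: once the path is hidden, the only lever left is the host itself.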

A similar issue is ongoing in Russia, where the government’s media authority, Roskomnadzor, has weighed blocking access to the Russian-language Wikipedia based on its entries about illegal drugs, and at one point temporarily blocked reader access. It may also be attempting to co-opt Russian-language editors, presenting further challenges to the independence of the Wikimedia project among Russian-language contributors.

It’s unclear what Russia will decide to do, but it seems safe to assume that China will hold the line for the foreseeable future. In both countries, and under still more repressive regimes—like Kazakhstan and Azerbaijan—independent websites and even independent political parties and religious movements are allowed to operate only at these governments’ discretion. Why should Wikipedia be any different?

♦     ♦     ♦

And this seems like a perfectly good place to leave it. More often than not, Wikipedia’s issues reflect issues that animate and plague society and the Internet writ large. Open knowledge and digital discourse create incredible opportunities for research and innovation, but also bestow tremendous power on the platforms and communities that effectively control the gates. The problems on Wikipedia aren’t that different from those on Reddit or Twitter; they just feel more significant given the site’s mandate and perceived authority. To understand Wikipedia’s successes and failures, we have to look to ourselves for the answer.

If you liked this post, don’t forget to check out its companion piece at The Ink Tank: “Ten Predictions for Wikipedia in 2016”.

All images via Wikimedia Commons except Gamergate logo, source unknown.

The Agony and Ecstasy of Wikidata

Posted on April 12, 2012 at 8:31 am

Although Wikipedia is by far the best-known of the Wikimedia collaborative projects, it is just one of many. Just this last week, Wikimedia Deutschland announced its latest contribution: Wikidata (also @Wikidata, and see this interview in the Wikipedia Signpost). Still under development, its temporary homepage announces:

Wikidata aims to create a free knowledge base about the world that can be read and edited by humans and machines alike. It will provide data in all the languages of the Wikimedia projects, and allow for the central access to data in a similar vein as Wikimedia Commons does for multimedia files. Wikidata is proposed as a new Wikimedia hosted and maintained project.

One of a few Wikidata logos under consideration.

Upon its announcement, I tweeted my initial impression, that it sounded like Wikipedia’s answer to Wolfram Alpha, the commercial “answer engine” created by Stephen Wolfram in 2009. It seems to partly be that but also more, and its apparent ambition—not to mention the speculation surrounding it—is causing a stir.

Already touted by TechCrunch as “Wikipedia’s next big thing” (incorrectly identifying Wikipedia as its primary driver, I pedantically note), Wikidata will create a central database for the countless numbers, statistics and figures currently found in Wikipedia’s articles. The centralized collection of data will allow for quick updates and uniformity of statistical information across Wikipedia.

Currently, when new information replaces old, as happens when census surveys, election results, and quarterly reports are published, Wikipedians must manually update the old data in all the articles in which it appears, across every language. Wikidata would create the possibility of a quick, computer-led update to replace all out-of-date information. Additionally, it is expected that Wikidata will allow visitors to search and access information in a less labor-intensive way. As TechCrunch suggests:

Wikidata will also enable users to ask different types of questions, like which of the world’s ten largest cities have a female mayor?, for example. Queries like this are today answered by user-created Wikipedia Lists – that is, manually created structured answers. Wikidata, on the hand, will be able to create these lists automatically.
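To make the idea concrete, here is a minimal sketch of how a question like that could be answered from structured records rather than a hand-maintained list. It is not Wikidata’s actual data model or query interface, which was still under development at the time; the class names, the function, and the placeholder cities are all hypothetical.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Person:
    name: str
    gender: str


@dataclass(frozen=True)
class City:
    name: str
    population: int
    mayor: Optional[Person]  # None when no mayor is recorded


# Placeholder records, invented purely for illustration.
CITIES = [
    City("City A", 900, Person("Mayor A", "female")),
    City("City B", 800, Person("Mayor B", "male")),
    City("City C", 700, None),
    City("City D", 600, Person("Mayor D", "female")),
    City("City E", 100, Person("Mayor E", "female")),
]


def largest_cities_with_female_mayor(cities: List[City], top_n: int = 10) -> List[str]:
    """Take the top_n cities by population, then keep those whose mayor is female."""
    largest = sorted(cities, key=lambda c: c.population, reverse=True)[:top_n]
    return [
        c.name
        for c in largest
        if c.mayor is not None and c.mayor.gender == "female"
    ]


# City E has a female mayor too, but falls outside the top four by population.
answer = largest_cities_with_female_mayor(CITIES, top_n=4)
print(answer)
```

The point is that the list is computed on demand: change one mayor or one population figure in the shared records, and every answer that depends on it changes with it.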

Though this project, which is funded by the Allen Institute for Artificial Intelligence, the Gordon and Betty Moore Foundation, and Google, is expected to take about a year to develop, the blogosphere is already buzzing.

It’s probably fair to say that the overall response has been very positive. In a long post summarizing Wikidata’s aims, Yahoo! Labs researcher Nicolas Torzec identifies himself as one who excitedly awaits the changes Wikidata promises:

By providing and integrating Wikipedia with one common source of structured data that anyone can edit and use, Wikidata should enable higher consistency and quality within Wikipedia articles, increase the availability of information in and across Wikipedias, and decrease the maintenance effort for the editors working on Wikipedia. At the same time, it will also enable new types of Wikipedia pages and applications, including dynamically-generated timelines, maps, and charts; automatically-generated lists and aggregates; semantic search; light question & answering; etc. And because all these data will be available as Open Data in a machine-readable form, they will also benefit thrid-party [sic] knowledge-based projects at large Web companies such as Google, Bing, Facebook and Yahoo!, as well as at smaller Web startups…

Asked for comment by CNet, Andrew Lih, author of The Wikipedia Revolution, called it a “logical progression” for Wikipedia, even as he worries that Wikidata will drive away Wikipedians who are less tech-savvy, as it complicates the way in which information is recorded.

Also cautious is SEO blogger Pat Marcello, who warns that human error is still a very real possibility. She writes:

Wikidata is going to be just like Wikipedia in that it will be UGC (user-generated content) in many instances. So, how reliable will it be? I mean, when I write something — anything from a blog post to a book, I want the data I use in that work to be 100% accurate. I fear that just as with Wikipedia, the information you get may not be 100%, and with the volume of data they plan to include, there’s no way to vette [sic] all of the information.

Fair enough, but of course the upside is that corrections can be easily made. If one already uses Wikipedia, this tradeoff is very familiar.

The most critical voice so far is Mark Graham, an English geographer (and a fellow participant in the January 2010 WikiWars conference) who published “The Problem with Wikidata” on The Atlantic’s website this week:

This is a highly significant and hugely important change to the ways that Wikipedia works. Until now, the Wikipedia community has never attempted any sort of consistency across all languages. …

It is important that different communities are able to create and reproduce different truths and worldviews. And while certain truths are universal (Tokyo is described as a capital city in every language version that includes an article about Japan), others are more messy and unclear (e.g. should the population of Israel include occupied and contested territories?).

The reason that Wikidata marks such a significant moment in Wikipedia’s history is the fact that it eliminates some of the scope for culturally contingent representations of places, processes, people, and events. However, even more concerning is that fact that this sort of congealed and structured knowledge is unlikely to reflect the opinions and beliefs of traditionally marginalized groups.

The comments on the article are interesting, with some voices sharing Graham’s concerns, while others argue his concerns are overstated:

While there are exceptions, most of the information (and bias) in Wikipedia articles is contained within the prose and will be unaffected by Wikidata. … It’s quite possible that Wikidata will initially provide a lopsided database with a heavy emphasis on the developed world. But Wikipedia’s increasing focus on globalization and the tremendous potential of the open editing model make it one of the best candidates for mitigating that factor within the Semantic Web.

Wikimedia and Wikipedia’s slant toward the North, the West, and English speakers is well covered in Wikipedia’s own list of its systemic biases, and Wikidata can’t help but face the same challenges. Meanwhile, another commenter argued:

The sky is falling! Or not, take your pick. Other commenters have made more informed posts than this, but does Wikidata’s existence force Wikipedia to use it? Probably not. … But if Wikidata has a graph of the Israel boundary–even multiple graphs–I suppose that the various Wikipedia authors could use one, or several, or none and make their own…which might get edited by someone else.

Under the canny (partial) title of “Who Will Be Mostly Right … ?” on the blog Data Liberate, Richard Wallis writes:

I share some of [Graham’s] concerns, but also draw comfort from some of the things Denny said in Berlin – “WikiData will not define the truth, it will collect the references to the data…. WikiData created articles on a topic will point to the relevant Wikipedia articles in all languages.” They obviously intend to capture facts described in different languages, the question is will they also preserve the local differences in assertion. In a world where we still can not totally agree on the height of our tallest mountain, we must be able to take account of and report differences of opinion.

Evidence that those behind Wikidata have anticipated a response similar to Graham’s can be found on the blog Too Big to Know, where technologist David Weinberger shared a snippet of an IRC chat he had with a Wikimedian:

[11:29] hi. I’m very interested in wikidata and am trying to write a brief blog post, and have a n00b question.
[11:29] go ahead!
[11:30] When there’s disagreement about a fact, will there be a discussion page where the differences can be worked through in public?
[11:30] two-fold answer
[11:30] 1. there will be a discussion page, yes
[11:31] 2. every fact can always have references accompanying it. so it is not about “does berlin really have 3.5 mio people” but about “does source X say that berlin has 3.5 mio people”
[11:31] wikidata is not about truth
[11:31] but about referenceable facts

The compiled phrase “Wikidata is not about truth, but about referenceable facts” is an intentional echo of Wikipedia’s oft-debated but longstanding allegiance to “verifiability, not truth”. Unsurprisingly, this familiar debate is playing itself out around Wikidata already.
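One way to picture “referenceable facts” is a store in which every value is attached to the source that asserts it, so that conflicting figures sit side by side instead of overwriting one another. The sketch below is only an illustration of that idea, not Wikidata’s real schema; the class and method names are hypothetical, “Source X” and the 3.5 million figure echo the chat above, and “Source Y” with its figure is invented.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class Statement:
    subject: str
    prop: str
    value: int
    source: str  # every statement must say who asserts it


class FactStore:
    """Keeps every sourced statement; never picks a single 'true' value."""

    def __init__(self) -> None:
        self._statements: List[Statement] = []

    def add(self, statement: Statement) -> None:
        self._statements.append(statement)

    def what_sources_say(self, subject: str, prop: str) -> Dict[str, int]:
        """Answer 'which source says what?' rather than 'what is true?'."""
        return {
            s.source: s.value
            for s in self._statements
            if s.subject == subject and s.prop == prop
        }


store = FactStore()
store.add(Statement("Berlin", "population", 3_500_000, "Source X"))
store.add(Statement("Berlin", "population", 3_400_000, "Source Y"))  # invented figure

claims = store.what_sources_say("Berlin", "population")
print(claims)
```

Asking the store about Berlin’s population returns both claims with their sources, and leaves the argument over which one to use to a discussion page.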

Thanks for research assistance to Morgan Wehling.