William Beutler on Wikipedia

Posts Tagged ‘Wikidata’

The Top 10 Wikipedia Stories of 2017

on January 3, 2018 at 6:16 pm

Every year since 2010, The Wikipedian has delivered a roundup of the most interesting events, trends, situations, occasions, and general goings-on that marked the foregoing year on Wikipedia and in the broader Wikimedia community. Last year’s edition remarked upon the head-spinning series of events that made 2016 the “worst year ever”—or so we thought at the time—and now, looking ahead to 2018, we have a stronger sense that the most realistic expectation is more of the same.

Where does Wikipedia fit into that? Following the U.S. presidential election, it became briefly fashionable to see Wikipedia as a bulwark against “fake news”, but in a year where the new American president suffered vanishingly few consequences for his constant issuance of falsehoods, 2017 very much felt like a year when truth was under constant attack. These ten stories depict a Wikipedia editorial community and readership not necessarily in the midst of a crisis, but of life during informational wartime. Let’s go:

10. In the Wikimedia Year 2030…

Last year’s list was dominated by a metastasizing organizational breakdown culminating in a change of leadership at the Wikimedia Foundation (WMF). Among many complaints about the non-profit’s former executive director, two of the most important were vision and communication, which is to say their lack. Katherine Maher, WMF’s current chief, seems determined not to let the same be said of her. In August 2017, a little over a year into her tenure, she announced an initiative called “Wikimedia 2030”, starting with a high-minded re-articulation of the Wikimedia movement’s mission statement and a series of commitments to (paraphrasing from the document itself) advancing the world through knowledge. It’s obviously operating on a very long time frame, and a lot depends on its implementation, which is yet to come. But the document received overwhelming support from community members in October, which is at least a positive sign in this otherwise fractured age.

9. The Daily Mail and Governance

Wikipedia’s quality is highly dependent on the sources it allows to verify its information. In February, Wikipedia’s community decided it was fed up with the website of UK tabloid The Daily Mail for its mendacious unreliability, and so “voted” to “ban” its use. This apparent decision was widely reported, including by this blog. And yet, that’s not quite what happened. Rather than an official blacklisting, the Daily Mail was simply added to a list of potentially unreliable sources, and it’s possible to find instances of the website being used as a reference since, perhaps by contributors entirely unaware there was a controversy in the first place. This is how Wikipedia works: it has very few rules that cannot be overcome by editorial clout, determined obstinacy, continued evasion, or blithe disregard. On the whole, Wikipedia works pretty well but breaks down at the edges, and that is where the Daily Mail remains.

8. “Monkey Selfie” Reckoning

First, a mea culpa: as far as I can tell, The Wikipedian has never written a word about the Monkey selfie copyright dispute, as Wikipedia’s own article on the subject calls it.

Wikipedia played only a small role in the legal case, which primarily involved nature photographer David Slater being sued by the People for the Ethical Treatment of Animals on behalf of a Celebes crested macaque who had no idea any of this was taking place. The legal matter isn’t quite settled, but as of September it seems close: Slater keeps the copyright, with concessions. Yet Wikipedia played a much larger role in the sense that there may never have been a case at all, or it would have remained quite obscure, had the WMF not refused to abide by Slater’s request to delete the photo from Wikimedia Commons. By virtue of its high profile, Wikipedia magnifies everything.

What’s more, the enthusiasm of its community also obscures: I remember the photo being everywhere at Wikimania 2014 in London and, being charmed like everyone else, I played along and used it in a slide presentation without looking into it further. I’m more regretful of this than my own non-coverage, and consider it still unresolved whether WMF is on the side of virtue in this matter. (Why am I using the photo here, then? For the same reason Wikipedia uses copyrighted logos: for identification.)

It seems indisputable to me that the copyright should belong with the human who went to considerable lengths at personal cost to facilitate its creation, regardless of which bipedal mammal clicked the button, and if the law is unclear on this, then the law should be clarified. If you haven’t listened to This American Life’s episode about the case from November, it’s worth your time—and Wikipedia doesn’t come across terribly well.

7. Burger King’s Way

Remember this? In April, Burger King announced a television ad for the U.S. and UK markets featuring dialogue intended to activate Google Home and read out Wikipedia’s entry for the Whopper. Almost immediately, The Verge noticed that Burger King’s ad team had surreptitiously edited the Whopper entry from Wikipedia’s typical dispassionate summary “…signature hamburger product sold by the international fast-food restaurant chain…” to unambiguous marketing-speak “…flame-grilled patty made with 100 percent beef with no preservatives or fillers…” Then, predictably, unidentified randos joined in and hijacked the entry to disparage the mass-market burger, producing head-scratching headlines like this one from BBC: “Burger King advert sabotaged on Wikipedia”.

Burger King was probably unaware of Wikipedia’s policy “Wikipedia is not a soapbox or means of promotion” and almost certainly ignorant of the guideline “Do not disrupt Wikipedia to illustrate a point”, but that should hardly matter: Burger King knew what it was doing, and figured the ensuing coverage was worth the cost. They were probably right. But I can’t not play the schoolmarm, and tsk-tsk: it’s one thing for a high-school student to vandalize Wikipedia for fun, but quite another for a multinational corporation.

6. Wikipedia Vandalism is Fun for All

Last year’s version of this column decried the phenomenon of lazy sports-bloggers leaning on blink-and-you-missed-it vandalism of sports-related Wikipedia articles for amusement and clicks, and this continued unabated throughout 2017. Most of these stories came from minor sports websites and local news teams, but just as Wikipedia’s prominence owes to its high Google search ranking, so too are these time-wasters afforded visibility by Google News. But this year, we got something else: ostensibly serious news publications marveling over a pattern of self-aware edits coming from U.S. congressional computers.

Since 2014, the automated Twitter account @CongressEdits has tracked and reposted every edit made from House and Senate offices; in October, BuzzFeed and CNN both noticed that someone on the Hill was editing articles from Carly Rae Jepsen to Chuck E. Cheese, on subjects as ubiquitous as Star Wars and as obscure as indie band The Mountain Goats. In December, a college student and former congressional aide claimed credit in The Daily Beast, which led to other former interns and anonymous persons crying out for recognition as well. Whether for the lulz, or as part of “the resistance”, these edits at least proved that curiosity about Wikipedia’s willful vulnerability to nonsense appeals to journalists and readers who should probably be focused on something else.

5. Signpost of the Times

A year ago, this list bemoaned the decline of Wikipedia criticism, largely based on the departure of critical thinkers (or at least decent writers) from forums such as Wikipediocracy. This year, I find myself concerned with Wikipedia’s own community news source, The Signpost. A bi-weekly online “newspaper”, The Signpost has been around since 2005, written and edited by volunteers much as Wikipedia itself is. In early 2016 a new editor-in-chief took the reins, led with an ambitious and hopeful editor’s note, produced three issues by the end of February, and then simply stopped.

The editor, a longtime community veteran and onetime WMF staffer, in fact ceased editing Wikipedia almost entirely. I thought about investigating it at the time, but figured I already knew the basics: burnout is a natural occurrence and all but inevitable, although it’s less typical for a project leader to step away without so much as a “gone fishin'” sign. By June, a skeleton crew of former contributors had banded together to put out an edition on at least a once-per-month basis, with a new permanent editor named as of September. Here’s hoping they can return the Signpost to its former schedule and retain its high quality.

In the meantime, I’ll say again what I’ve said many times before: The Signpost is hard work and is a crucial service for the core Wikipedia community; its health is in some ways a measure of the health of the community itself. Its editorship should be a stipended position, funded by but free from oversight of the Wikimedia Foundation. Wikipedia does not depend upon volunteer developers, nor should it depend on volunteer reporters.

4. Everipedia Stalking

What’s Everipedia? Oh, it’s just the latest upstart challenging Wikipedia, this time an actual startup: a rival wiki-based online encyclopedia launched in 2014 by a couple of UCLA students, which later attracted investment from excommunicated Rap Genius co-founder Mahbod Moghadam, and in December also the involvement of expatriate Wikipedia co-founder Larry Sanger.

Everipedia is certainly audacious, calling itself the world’s biggest encyclopedia (for having copied all of Wikipedia’s entries and then added more that Wikipedia wouldn’t accept), and it projects a certain braggadocio not typically found in online knowledge repositories (at one time, its founders liked to call it “Thug Wikipedia”). It’s also not the first attempt at a do-over for Sanger, who left Wikipedia early on citing philosophical differences; his decidedly more staid Citizendium effort is itself now more than 10 years old but, with only a handful of active editors, is all but a dead project.

The most interesting thing about Everipedia, though, is its pivot to using blockchain technology and announced development of a cryptocurrency with which to pay contributors. I’m curious, to be sure, but even more sure of my skepticism. No question, Wikipedia is built on a relatively ancient software framework, and there is a case to be made that blockchain’s public ledger could represent an advancement in recording all “transactions”. But this is what Harvard’s Clayton Christensen would call a “sustaining innovation”, not a “disruptive innovation”: there’s no reason Wikipedia couldn’t adopt a blockchain ledger should the idea prove meritorious, whereas there’s very little chance that Everipedia can replace the day-to-day deliberations of an editorial community more than 15 years old. Culture is impossible to replicate, and extremely difficult to develop. I can’t promise an assortment of brogrammers and Wikipedia’s kooky uncle won’t pull it off, but I have my doubts.

3. Hey, Big Spenders

Wikipedia’s fundraising prowess, ever-growing expenses, and nevertheless-expanding bank account are a matter of interest year in and year out. From about $56,000 in the bank at the end of the 2004 fiscal year to more than $90 million by 2016, Wikipedia’s financial situation is still growing in a way that’s entirely divorced from the number of volunteers actively participating. In February, a 12-year veteran editor published an alarming (or alarmist) op-ed at the then-functioning Signpost with the unfortunate headline “Wikipedia Has Cancer”.

The controversial connotation (which I realize I’ve also made in #10) was very much intended: Wikipedia’s financial position has far exceeded what is necessary for the running of this non-profit, volunteer-driven project. What happens if (and presumably when) revenues slow—will the Wikimedia Foundation adjust spending downward, or start taking on debt? Pointing to recent failures in WMF software development initiatives as a reason to worry about Wikipedia’s leadership, the op-ed called for a spending freeze and greater transparency in financial matters. With some fiscal discipline, and Wikipedia’s newly-established endowment, Wikipedia could live comfortably off its prior fundraising indefinitely. Although the rhetoric was probably excessive, it struck a nerve, attracting an overwhelming number of comments in a discussion that continued for months. Soon after, an article in Quartz called the resulting frenzy “nuts”, and published a chart comparing Wikipedia favorably to similar institutions, including the New York Public Library and even the British Museum.

2. Slow Wiki Movement

Given the lack of high-impact news events surrounding Wikipedia, here is a new one: nothing really happened this year. That’s probably good news, but it doesn’t make for an exciting story. And for an avowed non-story, it’s relatively high-positioned as well. But as I contemplated the mood around Wikipedia over the past twelve months, I found it rather fitting.

Two items that just missed the cut: the WMF’s 2015 lawsuit against the NSA, dismissed by one court, was reinstated by another, and this could well be a standalone entry next year. And Wikipedia’s open database, Wikidata, continued to develop and grow, but all of this happened behind the scenes, without any single inflection point (though attendees of the first-ever Wikidatacon are free to disagree with me).

Meanwhile, Wikipedia’s edit wars and paid editing scuffles continued, but few made actual news. Trolls, especially of the GamerGate variety, continued to be a nuisance, but (for now) are not an existential threat. Wikipedia’s gender imbalance barely registered a blip, Wikipedia’s editorship numbers again ticked upward, and Wikimania Montreal went off without a hitch. Other topics this year-end report card series has discussed before were also ho-hum: no major sock puppet networks detected, no major article-creation milestones (we’re just over halfway to 6 million), the detente between Wikipedia and education continues, and the Visual Editor continues to work even as most veterans ignore it. Yes, Turkey blocked Wikipedia, but following China and Russia having done so in previous years, it hardly made a dent.

This is what maturity looks like: Wikipedia is Wikipedia, and seems likely to continue doing what it does for a long time to come. So, does it feel like we’re celebrating?

1. WikiTribune’s Rocky Start

In keeping with the somnolence of the previous item, this year’s top story isn’t even about Wikipedia: it’s about WikiTribune, the other new initiative from Wikipedia’s other co-founder, Jimmy Wales. Announced to great fanfare and no little skepticism in April, Wales’ long-dreamed-of wiki-based online news site finally launched at the end of October. Early reviews were not enthusiastic, and it has been little remarked-upon since. As of this writing, it has continued publishing a few stories a day, none with any apparent impact. WikiTribune offers little more than what other news operations are doing, and less of it.

In May, this blog offered advice about how it might stand out in a crowded online world: by focusing on developing news teams at the local level, and trial-run innovations that might be ported back to Wikipedia. But WikiTribune seems determined to cover international news with no discernible viewpoint or special access, and has no connection to Wikipedia besides its name and famous founder. Why would anyone visit WikiTribune for news over any other publication? I have no idea. Alas, WikiTribune looks like just another much-heralded effort to reinvent news by doing the exact same thing that other news publications were already struggling to keep doing in seemingly impossible circumstances. Whether WikiTribune survives to see the end of 2018, or makes this list a year from now, I have no idea either.

Photo credits, in order: Avery Jensen; Alex Muller / Wikidea; David Slater; Restaurant Brands International; Public domain; Kjoonlee; Larry Sanger; Sameboat; Jan Dittrich; WikiTribune.

What You Missed at Wikimania 2017

on August 18, 2017 at 4:39 pm

N.B. At the end of this post I’ve embedded a Spotify playlist for the delightful 2006 album “Trompe-l’oeil” by the Francophone Montreal indie rock band Malajube. It’s what I was listening to as I arrived at Montréal–Pierre Elliott Trudeau International Airport last week, and I think it would make a nice soundtrack for reading this post.

♦     ♦     ♦

Wikimania 2017, the thirteenth annual global meeting of Wikipedia editors and the larger Wikimedia movement, was held in Montreal last weekend. For the fifth time overall, and the first time in two years, I was there. I’ve covered previously attended Wikimanias, sometimes glancingly, and sometimes day-by-day, and this time I’ll do something a little different as well.

One nice thing about a conference for a project focused on the internet: many of the presentations can be found on the internet! Some but not all were recorded and streamed; some but not all have slides available to revisit. The second half of this post is a roundup of presentations I attended, or wished I attended, with media available so you can follow up at your own pace.

But first, a note on a major theme of the conference, implicitly if not explicitly called “Wikimedia 2030”. A draft of a “strategic direction” document circulated by stapled printout from the conference start, and was later addressed specifically in a presentation by Wikimedia Foundation executive director Katherine Maher and board chair Christophe Henner. It’s available to read here, and I recommend it as a straightforward and clearly-described (if detail-deficient) summary of how Wikimedians understand their project, and where its most dedicated members want to take it.

As one would expect, the memo acknowledges the many types of contributors and contributions, brought together by a belief in the power of freely shared knowledge, and a commitment to helping organize it. It also focuses on developing infrastructure, building relationships, and strengthening networks. One thing it doesn’t talk much about is Wikipedia, which might be surprising to some. After all, Wikipedia is arguably more important to the movement than the iPhone is to Apple: Wikipedia receives 97.5% of all WMF site traffic, while the iPhone accounts for “only” 70% of Apple’s revenues.

I don’t wish to belabor the Apple analogy much, because there are too many divergences to be useful in a global analysis, but both were revolutionary within their markets, upset competitors, created a whole new participatory ecosystem in their wake, and each grew exponentially until they didn’t. Now the stewards of each are looking beyond the cash cow for new areas of growth. For Apple, it’s cloud-based Services revenue. For the WMF, it’s not quite as easily summarized. But the answer is also partly about building in the cloud, at least figuratively. Although both Wikipedia and the iPhone will remain the most publicly visible manifestations of each organization for the foreseeable future, the leadership of each is focused on what other services they enable, and how they can even make the core product more valuable.

I see two main themes in the memo, about how the Wikimedia movement can better develop that broad ecosystem beyond Wikimedia’s existing base, and how it can improve its underlying systems within movement technology and governance. The former is too big a subject to grapple with here, and I’ll share just a single thought about the latter.

One thing the document concerns itself with at least as much as with Wikipedia is “data structures”, a nod to Wikidata, which has been the new hotness for a while, but whose centrality to the larger project is becoming clearer all the time. Take just one easily overlooked line, about how most Wikimedia content is “long-text, unstructured articles”. You know, those lo-fi Wikipedia entries that remain so enduringly popular. They lack structure now, but they might not always. Imagine a future where Wikidata provides information not just to infoboxes (although that is a tricky subject) but also to boring old Wikipedia itself. Forget “red links”: every plain text noun in the whole project may be connected to its “Q number”. Using AI and machine learning, entire concepts can be quickly linked in a way that once required many lifetimes.

At present, Wikipedia is the closest thing we have to the “sum of all human knowledge” but in the future, it may only be the default user interface. Now more than ever, the real action is happening behind the scenes.

♦     ♦     ♦

Birth of Bias: implicit bias’ permanence on Wikipedia

Wikipedia is a project by and for human beings, and necessarily carries the implicit biases of those human beings, whether they’re mindful of the fact or not. This presentation, offered by San Francisco State visiting scholar Jackie Koerner, focused on how to recognize this and think about what to do about it. Slides are accessible by clicking on the image below, and notes from the presentation are here.

Koerner Implicit Bias Wikimania 2017

♦     ♦     ♦

Readership metrics: Trends and stories from our global traffic data

How much do people around the world look at Wikipedia? How much do they look at it on desktop vs. mobile device? How have things changed over time? All of this and more is found in this presentation from Tilman Bayer, accessible by clicking through the image below.

Readership metrics. Trends and stories from our global traffic data (Wikimania 2017 presentation)

♦     ♦     ♦

The Internet Archive and Wikimedia – Common Knowledge Goals

The Internet Archive is not a Wikimedia project, but it is a fellow nonprofit with a similar outlook, complementary mission and, over time, increasing synergy between the two institutions. Every serious Wikimedian should know about the Internet Archive. I didn’t attend the presentation by Wendy Hanamura and Mark Graham, but there’s a lot to be gleaned from the slides embedded below, and session notes here.

♦     ♦     ♦

State of Video in the Wikimedia Movement

You don’t watch a lot of video on Wikipedia, do you? It’s not for lack of interest on the part of Wikipedians. It’s for lack of media availability under appropriate licenses, technology and infrastructure to deliver it, and even community agreement about what kinds of videos would help Wikipedia’s mission. It’s an issue Andrew Lih has focused on for several years, and his slides are highly readable on the subject.

♦     ♦     ♦

The Keilana Effect: Visualizing the closing coverage gaps with ORES

As covered in this blog’s roundup of 2016’s biggest Wikipedia stories, one of Wikipedia’s more recent mini-celebrities is a twentysomething medical student named Emily Temple-Wood, who goes by the nom-de-wiki Keilana. Her response to each experienced instance of gender-based harassment on the internet was to create a new biographical article about another woman scientist on Wikipedia. But it’s not just an inspiring story greenlit by countless news editors in the last couple of years: WikiProject Women Scientists, founded by Temple-Wood and Rosie Stephenson-Goodknight, dramatically transformed the number and quality of articles within this subject area, taking them from a slight lag relative to the average article to far outpacing it. Aaron Halfaker, a research scientist at the Wikimedia Foundation, crunched the numbers using the new-ish machine learning article quality evaluation tool ORES. Halfaker presented his findings, with Temple-Wood onstage to add context, on Wikimania’s final day. More than just a victory lap, the session posed a question: can it be done again? Only Wikipedia’s contributors can answer that.

The slides can be accessed by clicking through the image below, notes taken live can be found here, and for the academically inclined, you can also read Halfaker’s research paper: Interpolating Quality Dynamics in Wikipedia and Demonstrating the Keilana Effect.

Keilana Effect (Wikimania 2017)

That was fun! Let’s do this again next year.

Update: Looking for more slides and notes? There’s an “All Session Notes” page on the Wikimania site for your edification.

♦     ♦     ♦

The Top 10 Wikipedia Stories of 2015

on December 22, 2015 at 3:28 pm

Each year since 2010, The Wikipedian has looked back at the year on Wikipedia and taken a stab at determining which trends, milestones, and controversies most influenced the direction of Wikipedia in the twelve months preceding.

This is no easy task, considering the millions of articles, edits, and editors within the scope of Wikipedia and its sister projects, not to mention the off-wiki and even offline circumstances affecting them. The most important events may be overlooked, acknowledged major events can be misunderstood, and the significance of each can differ greatly depending on one’s viewpoint. No matter, The Wikipedian will make its best effort regardless.

This time around I’m pairing our retrospective with a post on the blog of my firm, Beutler Ink, called “Ten Predictions for Wikipedia in 2016”. I recommend reading this one first: as we learn from the Bard, what’s past is prologue.

♦     ♦     ♦

10. Wikidata Rising

When Wikidata, the collaborative structured database project, first launched in 2012, it was difficult to summarize with any confidence. The Wikipedian covered it by carefully outlining its stated goals and quoting the speculative news and blog coverage. At the end of 2015, it’s not much easier to describe to a layperson, and many of its goals remain just that, but Wikidata’s growth is undeniable and the passion it inspires in the Wikipedia community is unmistakable. At this year’s Wikimania conference, Wikidata’s presence was felt like never before.

One big reason: Wikidata is unexplored territory in a way that Wikipedia no longer is. The encyclopedia project feels mature at 5 million articles (more about that below), but the database at only 15 million items has a long road ahead of it. For editors who joined the larger Wikimedia movement for the joy of discovery, Wikidata is where it’s at. The project still has some very real challenges, some of which unsurprisingly mirror those of Wikipedia, but it’s possible now to imagine that Wikidata, not Wikipedia, may prove to be the real “sum of all human knowledge”.

9. Exodus from New Montgomery Street

Has Wikipedia’s parent organization, the Wikimedia Foundation (WMF), seen a year with more comings and goings from its headquarters on San Francisco’s New Montgomery Street than 2015? It seems unlikely. The organization has seen admired veterans and high-level executives depart under different circumstances, and some touted recruits from Silicon Valley firms arrived to fanfare, only to exit quickly, and without comment. The only reason this exodus of talent isn’t higher on this list is because it’s one of 2015’s least-reported stories.

Approximately 18 months since Lila Tretikov became executive director, the WMF has experienced almost 100% turnover. For some longtime staff, it was probably time to move on anyway. And any incoming leader can be expected to make new hires and rearrange reports to their liking. But the very short tenures of some key hires, and mysterious circumstances surrounding some departures, can’t help but raise questions about whether Tretikov is in command of her personnel—and perhaps even if she’s the leader Wikipedia needs.

8. Community Tensions Felt in Trustee Elections

The Wikimedia Board of Trustees is the “ultimate corporate authority” of the Wikimedia Foundation, and its number includes three members elected from the volunteer community. The most recent election, held in May, was also the first since a major fight between the foundation and community over software implementation (Media Viewer) and platform control (Superprotect) in 2014. Against this backdrop, disagreements over Wikipedia’s next big software initiative, Flow, became increasingly pronounced, and a few months later the project was shelved.

Perhaps it’s unfair to assume a direct cause-and-effect, but the result seemed to be a “throw the bums out” election. Ousted were Phoebe Ayers, Samuel Klein, and María Sefidari (in fairness, none were “bums”, nor particularly responsible for the problem). In are three respected veterans with the good fortune of non-incumbency: James Heilman, Dariusz Jemielniak, and Denny Vrandečić.

Oddly, the two women ousted received the first and third most votes in favor, but Wikimedia accounts for “oppose” votes, and they had too many of those. Today, just two Board members are women, the lowest representation in Wikipedia’s history.

7. “Wikipedia Hates Women”—or Maybe Just Lightbreather

Wikipedia’s alarmingly low female participation rate is decidedly not a new problem. The issue first came to attention in the late 2000s, as editor surveys confirmed suspicions that Wikipedia was a total brodown. Today, the gender gap remains a frequent topic of debate, including a much-discussed Cracked.com article whence this entry takes part of its name.

The other half of the title comes from what’s called the “Lightbreather” case, focusing on a female editor with this username, and her interactions with, among others, a (male) editor named Eric Corbett. A disinterested appraisal of the case would find plenty of fault with both, although there is not one person in the world who possesses the powers of concentration necessary to follow all of the rabbit holes leading from this single case. Notwithstanding the particulars, it became the subject of a provocative, error-ridden, five-times corrected but nevertheless widely read article in The Atlantic, held up as one example of Wikipedia’s “hostility” to women.

The myriad possible explanations for this problem only open doors to more complicated issues. How much of the gender imbalance can be attributed to Wikipedia’s rules? Its community? Where is the line between heated disagreements and harassment? How much can be explained by how the web influences behavior? How much is this reflective of the tech industry’s gender gap? Will understanding this question help to explain why other marginalized identities, from Latinos to Africans, contribute to Wikipedia in small numbers? The answers to these questions seem within the reach of comprehension, but beyond the grasp of consensus.

6. A Clockwork Orangemoody

Another perennial topic on Wikipedia is conflict of interest (COI), usually playing out as someone inside Wikipedia or outside writing a self-serving autobiography, a low-rent marketing firm getting in trouble for editing clients’ pages, or sometimes more favorably, a group of PR firms coming together to try to make a good impression. This year, however, brought us something we never quite imagined: a massive extortion plot inverting the typical model of paid editing. Instead of helping paying customers create Wikipedia entries, the scheme could simply threaten non-paying “customers” with unflattering articles.

Orangemoody, as it was named for its “ringleader” account, was called the largest of its kind, but that merely counted the number of involved user accounts (nearly 400). The truth is, there has never been anything quite like it. Previous cases revolved around unscrupulous firms like Wiki-PR and WikiExperts who at least professed to be offering their clients a service. Orangemoody was a shakedown involving pages held for ransom, impersonation of Wikipedia administrators, and no real-world entity to absorb the blame. Orangemoody is so threatening because it suggests that Wikipedia’s open-editing model opens the door not just to unethical, if conceivable, shenanigans, but also to transgressions that are much more horrifying.

5. The Luck of Grant Shapps

Next to Orangemoody, there’s something almost comforting about the familiar narrative of alleged self-interested editing of Wikipedia by Tory MP Grant Shapps and the plot twist that brought his accuser to (relative) ignominy and ruin.

Amid the UK parliamentary elections this spring, a report emerged in the left-leaning Guardian, prompted by an allegation by a Wikimedia UK administrator, that Shapps had used a pseudonymous account to massage his own Wikipedia profile while giving a drubbing to others. It seemed plausible: Shapps had admitted to editing his own biography years ago, and using assumed names in other circumstances, and his side career as an Internet executive aided the narrative.

But the tables soon turned: the right-leaning Telegraph revealed that there was no smoking gun connecting Shapps to the suspicious edits, and that the Wikipedia administrator, Richard Symonds, was in fact a Lib Dem activist who had communicated with the Guardian prior to taking action. Wikipedians soon became concerned that Symonds may have abused his administrative privileges in blocking the suspicious account.

In the end, Symonds lost his adminship, and Shapps exited a succession of positions within the Conservative Party and government. All that’s missing is Keyser Söze shrugging off his limp and lighting a cigarette.

4. Wikipedia’s Big Picture Trends in Flux

After a long period of sustained narratives about Wikipedia’s traffic and editing trends, this year things got a little interesting. Following unabated growth in global traffic to Wikipedia, given a boost in recent years by the proliferation of web-enabled mobile devices, overall traffic actually fell for the first time. Meanwhile, after almost a decade of resignation to Wikipedia’s ever-dwindling editor base—a decline perhaps also attributable to the adoption of mobile devices—the numbers ticked upward.

An August report from an SEO analysis firm showed that Wikipedia’s search referrals from Google fell by up to 20% since the beginning of the year. Most speculation focused on Google’s ever-advancing practice of answering search queries on the results page, obviating the need to click through to non-Google websites. This has bedeviled companies like Yelp, which compete with Google to serve up reviews while also depending upon it for traffic. For Wikipedia, the situation is more complicated, and perhaps less of an issue. After all, a significant portion of Google’s answers are powered by Wikimedia projects. In fact, beginning in late 2014, Google wound down its own open knowledge database, Freebase, in favor of Wikidata. And Google still recommends more Wikimedia sites than it recommends Google sites.

Also in August, the first hard data emerged to show that the long, slow decline of active (and “very active”) Wikipedia editors had been arrested—and is now trending the other way, if ever so slightly. As close Wikipedia observers know too well, Wikipedia attained its zenith participation rate in 2007, arguably the high point for the project’s activity and excitement overall, after which the lowering tide revealed consternation and even alarm, with nobody knowing where it would end. Well, maybe here? The number of very active editors (those with at least 100 edits monthly, and Wikipedia’s most valuable contributors) stabilized in 2014 and actually grew in 2015. The decline of administrators, coupled with the difficulty in admitting new ones in recent years, however, remains an issue.

In both cases, more data is surely needed before we can say what it really means.

3. English Wikipedia Hits 5 Million Articles

Admittedly, most of these top stories are unhappy ones, and the one just above is arguably mixed, but this one is unambiguously celebratory: on November 1, Wikipedia’s English language edition—by far its most popular, and synonymous with “Wikipedia” for most readers—notched its 5 millionth article.

Wikipedia has been the largest encyclopedia by any reasonable measure for a long while, so nothing has really changed. And it took seven years for Wikipedia to double in size, so if growth trends continue holding steady for now, we might not have a similar milestone to celebrate until sometime in the next decade. Meanwhile, sheer heft is easier to measure than other important characteristics, like accuracy or completeness, so this benchmark will remain Wikipedia’s equivalent of McDonald’s “Billions Served” for the foreseeable future. It may be an arbitrary measurement, but it’s a damned impressive one.

Number 5,000,000 itself: Persoonia terminalis, a rare shrub native to eastern Australia. Oh, and if you haven’t seen the RfC debating which temporary logo Wikipedia should display on the joyous day, I very much recommend taking a look at the near misses. Perhaps it will instill some faith in Wikipedia’s community processes if you agree the best logo won (and you should).

2. It’s About Ethics in Gamergate Opposition

In late 2014 and into the start of this year, the loosely-affiliated right-wing counterpart to the left-ish Anonymous expanded its focus from video game journalists to include the Wikipedia entries where said journalists’ critical takes had accumulated. Organizing on Reddit and other forums, the ‘gaters created numerous throwaway Wikipedia accounts to first try swinging Wikipedia’s coverage of their movement and a few of their top targets around to their liking; when that failed, they took on Wikipedia editors directly.

Wikipedians fought back hard—too hard, in some cases—and when Wikipedia’s Arbitration Committee got around to handing out punishments, the only ones with anything to lose were the Wikipedia editors who cared. It also fed into the above-discussed ongoing trouble over Wikipedia’s treatment of gender issues, and was by far the year’s biggest blow-up along such lines, far greater than the argument over how to handle Caitlyn Jenner’s gender transition, which still lay ahead.

It’s hard to say if Gamergate is a 100-year-flood (although on the Internet, the time frame may be more like 100 months) or a sign of things to come. Wikipedia has faced trolls before, but few have been as dedicated or as destructive as the ones beneath the Gamergate bridge. The best defense is a strong base of committed Wikipedians, and perhaps this year shows us they’ll probably still be around to carry the sand bags and shore up the levees.

1. China, Russia, and Completing the HTTPS Transition

One aspect of Wikipedia’s global prominence that the foundation and movement alike have struggled to fully grasp is the role it can, should, and does play on the international stage. This year, the Wikimedia Foundation joined forces with the ACLU to sue the National Security Agency over its mass surveillance practices, only for the case to be thrown out by a federal court. As important as that fight may be, it is but one jurisdiction of many where Wikipedia has become a proxy for privacy and free speech battles, not to mention authoritarian power grabs.

In 2015, Wikipedia’s multi-year plan to convert all traffic moving through Wikimedia servers to the HTTPS encryption protocol was finally completed. HTTPS was first enabled for WMF sites in 2011, then became the default for logged-in users in 2013, and this year was finally made the default for all traffic, including readers without a Wikipedia account. This is a good thing for Internet users who wish to access Wikipedia without their governments knowing about it. But it’s complicated when governments decide to shut off access altogether.

Indeed, the full implementation of HTTPS prevents governments like China from blocking access to specific entries—such as Tiananmen Square protests of 1989—and instead they have to choose between allowing all traffic, or blocking the site entirely. China opted for the latter. To be sure, Wikipedia wasn’t the biggest collaborative online encyclopedia in the PRC—it wasn’t even the second—and China’s Communist Party seems to be perfectly content promoting its homegrown versions of Google, Facebook and Twitter. In December, Wikipedia’s famous co-founder, Jimmy Wales, traveled to China to participate in an Internet conference, where his comments about the limitations of the state’s ability to control the Internet were intentionally lost in translation, as the Wall Street Journal reports.

A similar issue is ongoing in Russia, where the government’s media authority, Roskomnadzor, has weighed blocking access to the Russian-language Wikipedia over its entries about illegal drugs, and at one point temporarily blocked reader access. It may also be attempting to co-opt Russian-language editors, presenting further challenges to the independence of the Wikimedia project among Russian-language contributors.

It’s unclear what Russia will decide to do, but it seems safe to assume that China will hold the line for the foreseeable future. In both countries, and under still more repressive regimes—like Kazakhstan and Azerbaijan—independent websites and even independent political parties and religious movements are allowed to operate only at these governments’ discretion. Why should Wikipedia be any different?

♦     ♦     ♦

And this seems like a perfectly good place to leave it. More often than not, Wikipedia’s issues reflect issues that animate and plague society and the Internet writ large. Open knowledge and digital discourse create incredible opportunities for research and innovation, but also bestow tremendous power to the platforms and communities that effectively control the gates. The problems on Wikipedia aren’t that different from those on Reddit or Twitter; they just feel more significant given the site’s mandate and perceived authority. To understand Wikipedia’s successes and failures, we have to look to ourselves for the answer.

If you liked this post, don’t forget to check out its companion piece at The Ink Tank: “Ten Predictions for Wikipedia in 2016”.

All images via Wikimedia Commons except Gamergate logo, source unknown.

The Agony and Ecstasy of Wikidata

on April 12, 2012 at 8:31 am

Although Wikipedia is by far the best-known of the Wikimedia collaborative projects, it is just one of many. Just this last week, Wikimedia Deutschland announced its latest contribution: Wikidata (also @Wikidata, and see this interview in the Wikipedia Signpost). Still under development, its temporary homepage announces:

Wikidata aims to create a free knowledge base about the world that can be read and edited by humans and machines alike. It will provide data in all the languages of the Wikimedia projects, and allow for the central access to data in a similar vein as Wikimedia Commons does for multimedia files. Wikidata is proposed as a new Wikimedia hosted and maintained project.

One of a few Wikidata logos under consideration.

Upon its announcement, I tweeted my initial impression, that it sounded like Wikipedia’s answer to Wolfram Alpha, the commercial “answer engine” created by Stephen Wolfram in 2009. It seems to partly be that but also more, and its apparent ambition—not to mention the speculation surrounding it—is causing a stir.

Already touted by TechCrunch as “Wikipedia’s next big thing” (incorrectly identifying Wikipedia as its primary driver, I pedantically note), Wikidata will create a central database for the countless numbers, statistics and figures currently found in Wikipedia’s articles. The centralized collection of data will allow for quick updates and uniformity of statistical information across Wikipedia.

Currently, when new information replaces old, as happens when census surveys, election results, and quarterly reports are published, Wikipedians must manually update the old data in every article in which it appears, across every language. Wikidata would make it possible for a quick, computer-led update to replace all out-of-date information. Additionally, it is expected that Wikidata will allow visitors to search and access information in a less labor-intensive way. As TechCrunch suggests:

Wikidata will also enable users to ask different types of questions, like which of the world’s ten largest cities have a female mayor?, for example. Queries like this are today answered by user-created Wikipedia Lists – that is, manually created structured answers. Wikidata, on the hand, will be able to create these lists automatically.
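
To make the idea concrete, here is a minimal Python sketch of how a central store of structured records could answer a question like that one without anyone compiling a list by hand. Everything in it is an assumption made for illustration: the `City` record, its field names, the function `largest_cities_with_female_mayor`, and the invented cities and figures are placeholders, not real statistics and not Wikidata’s actual data model or query interface.

```python
from dataclasses import dataclass


@dataclass
class City:
    """One structured record in a hypothetical central data store."""
    name: str
    population: int
    mayor_gender: str  # hypothetical field: "female" or "male"


# Placeholder records with invented names and numbers, not real data.
CITIES = [
    City("Alphaville", 9_000_000, "female"),
    City("Betatown", 7_500_000, "male"),
    City("Gammaburg", 6_000_000, "female"),
    City("Deltaport", 1_200_000, "female"),
]


def largest_cities_with_female_mayor(cities, top_n):
    """Among the top_n most populous cities, return those with a female mayor."""
    largest = sorted(cities, key=lambda c: c.population, reverse=True)[:top_n]
    return [c.name for c in largest if c.mayor_gender == "female"]


print(largest_cities_with_female_mayor(CITIES, top_n=3))
```

Because the list is computed from the records rather than written by hand, correcting a single record (a new mayor, a new census figure) would change every answer derived from it.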

This project—which is funded by the Allen Institute for Artificial Intelligence, the Gordon and Betty Moore Foundation, and Google—is expected to take about a year to develop, but the blogosphere is already buzzing.

It’s probably fair to say that the overall response has been very positive. In a long post summarizing Wikidata’s aims, Yahoo! Labs researcher Nicolas Torzec identifies himself as one who excitedly awaits the changes Wikidata promises:

By providing and integrating Wikipedia with one common source of structured data that anyone can edit and use, Wikidata should enable higher consistency and quality within Wikipedia articles, increase the availability of information in and across Wikipedias, and decrease the maintenance effort for the editors working on Wikipedia. At the same time, it will also enable new types of Wikipedia pages and applications, including dynamically-generated timelines, maps, and charts; automatically-generated lists and aggregates; semantic search; light question & answering; etc. And because all these data will be available as Open Data in a machine-readable form, they will also benefit thrid-party [sic] knowledge-based projects at large Web companies such as Google, Bing, Facebook and Yahoo!, as well as at smaller Web startups…

Asked for comment by CNet, Andrew Lih, author of The Wikipedia Revolution, called it a “logical progression” for Wikipedia, even as he worries that Wikidata will drive away Wikipedians who are less tech-savvy, as it complicates the way in which information is recorded.

Also cautious is SEO blogger Pat Marcello, who warns that human error is still a very real possibility. She writes:

Wikidata is going to be just like Wikipedia in that it will be UGC (user-generated content) in many instances. So, how reliable will it be? I mean, when I write something — anything from a blog post to a book, I want the data I use in that work to be 100% accurate. I fear that just as with Wikipedia, the information you get may not be 100%, and with the volume of data they plan to include, there’s no way to vette [sic] all of the information.

Fair enough, but of course the upside is that corrections can be easily made. If one already uses Wikipedia, this tradeoff is very familiar.

The most critical voice so far is Mark Graham, an English geographer (and a fellow participant in the January 2010 WikiWars conference) who published “The Problem with Wikidata” on The Atlantic’s website this week:

This is a highly significant and hugely important change to the ways that Wikipedia works. Until now, the Wikipedia community has never attempted any sort of consistency across all languages. …

It is important that different communities are able to create and reproduce different truths and worldviews. And while certain truths are universal (Tokyo is described as a capital city in every language version that includes an article about Japan), others are more messy and unclear (e.g. should the population of Israel include occupied and contested territories?).

The reason that Wikidata marks such a significant moment in Wikipedia’s history is the fact that it eliminates some of the scope for culturally contingent representations of places, processes, people, and events. However, even more concerning is that fact that this sort of congealed and structured knowledge is unlikely to reflect the opinions and beliefs of traditionally marginalized groups.

The comments on the article are interesting, with some voices sharing Graham’s concerns, while others argue his concerns are overstated:

While there are exceptions, most of the information (and bias) in Wikipedia articles is contained within the prose and will be unaffected by Wikidata. … It’s quite possible that Wikidata will initially provide a lopsided database with a heavy emphasis on the developed world. But Wikipedia’s increasing focus on globalization and the tremendous potential of the open editing model make it one of the best candidates for mitigating that factor within the Semantic Web.

Wikimedia and Wikipedia’s slant toward the North, the West, and English speakers are well-covered in Wikipedia’s own list of its systemic biases, and Wikidata can’t help but face the same challenges. Meanwhile, another commenter argued:

The sky is falling! Or not, take your pick. Other commenters have made more informed posts than this, but does Wikidata’s existence force Wikipedia to use it? Probably not. … But if Wikidata has a graph of the Israel boundary–even multiple graphs–I suppose that the various Wikipedia authors could use one, or several, or none and make their own…which might get edited by someone else.

Under the canny (partial) title of “Who Will Be Mostly Right … ?” on the blog Data Liberate, Richard Wallis writes:

I share some of [Graham’s] concerns, but also draw comfort from some of the things Denny said in Berlin – “WikiData will not define the truth, it will collect the references to the data…. WikiData created articles on a topic will point to the relevant Wikipedia articles in all languages.” They obviously intend to capture facts described in different languages, the question is will they also preserve the local differences in assertion. In a world where we still can not totally agree on the height of our tallest mountain, we must be able to take account of and report differences of opinion.

Evidence that those behind Wikidata have anticipated a response similar to Graham’s can be found on the blog Too Big to Know, where technologist David Weinberger shared a snippet of an IRC chat he had with a Wikimedian:

[11:29] hi. I’m very interested in wikidata and am trying to write a brief blog post, and have a n00b question.
[11:29] go ahead!
[11:30] When there’s disagreement about a fact, will there be a discussion page where the differences can be worked through in public?
[11:30] two-fold answer
[11:30] 1. there will be a discussion page, yes
[11:31] 2. every fact can always have references accompanying it. so it is not about “does berlin really have 3.5 mio people” but about “does source X say that berlin has 3.5 mio people”
[11:31] wikidata is not about truth
[11:31] but about referenceable facts

The compiled phrase “Wikidata is not about truth, but about referenceable facts” is an intentional echo of Wikipedia’s oft-debated but longstanding allegiance to “verifiability, not truth”. Unsurprisingly, this familiar debate is playing itself out around Wikidata already.
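
The distinction drawn in the chat can be sketched as a data structure. In the minimal Python example below, the `Claim` and `ClaimStore` classes, their field names, and the sources “Source X” and “Source Y” are all hypothetical, and the second population value is a made-up placeholder; only the 3.5 million figure comes from the chat above. It illustrates recording “source X says Y” rather than “Y is true”, and is not Wikidata’s actual schema.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Claim:
    """A statement attributed to a source, not asserted as truth."""
    subject: str
    prop: str
    value: int
    source: str


class ClaimStore:
    """Hypothetical store that keeps every sourced claim side by side."""

    def __init__(self):
        self._claims = {}

    def add(self, claim):
        # Conflicting claims are kept together, never overwritten.
        self._claims.setdefault((claim.subject, claim.prop), []).append(claim)

    def what_sources_say(self, subject, prop):
        """Map each source to the value it reports for this property."""
        return {c.source: c.value for c in self._claims.get((subject, prop), [])}


store = ClaimStore()
store.add(Claim("Berlin", "population", 3_500_000, "Source X"))
store.add(Claim("Berlin", "population", 3_400_000, "Source Y"))  # placeholder value

print(store.what_sources_say("Berlin", "population"))
```

A store shaped this way never has to decide which figure is correct; it only has to report who says what, which is the point the Wikimedian makes in the chat.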

Thanks for research assistance to Morgan Wehling.