William Beutler on Wikipedia

The Top 10 Wikipedia Stories of 2020

Posted on December 31, 2020 at 1:46 pm

It’s no overstatement to say that 2020 was a year where everything changed. Since March, ubiquitous semi-ironic references to the “Before Times” have served to euphemize the unfathomable. To date, COVID-19 has killed nearly two million people worldwide, reshaped the global economy, galvanized worldwide protests, and impacted politics, business and culture for years to come—including in ways we can’t yet see. 2020 gets all the hate now, but can we be so certain that the coming year will be meaningfully different?

2020 was also a time of change for Wikipedia, though these shifts occurred almost entirely below the surface: unless you’re an active participant in the Wikimedia movement, much of this list will come as news to you. This was a year where ambitious new projects were announced, small-scale tweaks took on larger significance, the relationship between human editors and the software supporting them became more fraught, differences in vision between the community and professional corners of Wikipedia emerged or were reinforced, and the future of the movement simultaneously became both clearer and more contentious.

Every year since 2010, The Wikipedian has offered its summary of the top ten Wikipedia stories—events, themes, and trends—of the previous year. In this installment we’ll do the same again, but with a little something extra. On Wednesday, December 30, I joined a recording of the Wikipedia Weekly YouTube livestream to discuss the big issues of the year that was. This list is informed by the “top ten” discussed on this show, although it is not identical. I hope you’ll read through my list, and then watch or listen to the discussion, which complements the topics covered below.

♦     ♦     ♦

10. Wikipedia approaches its 20th anniversary

Countless retrospective pieces will surely be published in the coming weeks to commemorate the 20th anniversary of Wikipedia, which I am certain you do not need to look up to know was founded on January 15, 2001. That milestone has loomed large over the past year, lending additional significance to milestones and benchmarks recently passed.

Wikipedia’s 6 millionth article, maybe?

In January, Wikipedia hit 6 million articles in the English language, its largest and most widely-read edition. No one knows precisely which article was the true number 6,000,000, but the nod was given to Rosie Stephenson-Goodknight, co-founder of the Women in Red project, for her article about a Canadian schoolteacher and temperance movement leader. 

In February, Wired published a story calling Wikipedia “the last best place on the internet”, using the site as a counterpoint to the neverending dumpster fire of today’s World Wide Web—the last refuge of the promise of the “open web” which has long since given way to the mundanity of knowledge workers never being offline, every day facing another onslaught of disinformation and unpleasantry. By the end of the year, BuzzFeed offered a different way of saying pretty much the same thing: “The Top 40 Most Read Wikipedia Pages Of 2020 Perfectly Capture The Hellscape That Was 2020”.

Meanwhile, Wikipedia’s impressive stature was affirmed yet again when Twitter announced it was considering using Wikipedia as a benchmark for which user accounts would be bestowed with the simultaneously coveted and scorned “blue checkmark”. It was likewise affirmed in a more serious way when the World Health Organization announced it would be licensing its information for use on Wikipedia.

All in all, not a bad way to mark two decades, right? Well, you should see what else happened.

9. Should Wikipedia fear a Section 230 repeal?

If the phrase “Section 230” doesn’t mean much to you, then you probably don’t spend much time following the United States Congress… or on Twitter. Section 230 is the portion of the 1996 Communications Decency Act that protects providers of internet platforms, such as Google, Facebook, Twitter and, of course, Wikipedia, from being sued for content posted by users. Section 230 specifically allows these websites to moderate content, or not, as they see fit. The internet as we know it today could not exist without it.

But in the last few years, 230 has come under increasing scrutiny, especially for websites alleged to permit sex trafficking (Craigslist), or terroristic threats (8chan), or disinformation (too many to count, but Facebook especially). What’s more, right-wing politicians and conspiracy theorists in the U.S. have viewed it as shielding the tech giants which they believe (or at least claim to believe) are censoring them. Meanwhile, “the internet as we know it today” is no longer seen as the frontier of possibility it was as recently as 2015. In the last week of December 2020, Senate Majority Leader Mitch McConnell tied a vote on the latest covid stimulus package to 230 repeal, a poison pill designed to derail modifications sought by Democrats (and of course Republicans’ own outgoing president). 

Although I hesitate to make any predictions about the world we live in now, full repeal seems exceedingly unlikely. But maybe I’m only saying that because the internet after 230 is impossible to imagine—it would spell headaches at best and doom at worst for the entire Web 2.0 ecosystem (including Wikipedia) and the tech giants who rely upon it. So while it’s probably not going to happen, it’s still worth worrying about.

8. Creating Theresa Greenfield’s Wikipedia article

November already feels like it was years ago, but barely two months ago a news story involving Wikipedia captured the attention of American political media for about 24 hours: why Theresa Greenfield, the Democratic nominee opposing Iowa senator Joni Ernst, did not have a Wikipedia article. It goes without saying that Wikipedia is a widely-read source of information for voters, so it seemed notable that Iowans (and the reporters covering one of the country’s most hotly contested races) couldn’t even look her up on Wikipedia.

The reason owes to a perfect storm of three applicable circumstances: 1) Greenfield was not a well-known figure prior to capturing the Senate nomination, 2) Wikipedia doesn’t have a rule granting “Notability” to major party nominees, but 3) it does have a rule against creating articles about individuals known for just one event—in this case, the Senate race. This surprised me, because for years I had been under the impression that there was a rule automatically guaranteeing an entry for major party nominees, the same way there is for professional athletes.

As tends to happen in such cases, debate ensued and Greenfield was eventually granted a Wikipedia entry. Given how much news the race had generated, the article quickly grew to a level of detail that made the earlier obstinacy seem ridiculous. And then on November 3, she lost.

7. Scots Wikipedia and the trouble with small Wikipedias

Perhaps the actual biggest story involving Wikipedia this year, at least in terms of headlines generated, was the “fun” and “lighthearted” discovery that the Scots Wikipedia was basically a complete sham. For those whose only experience with Scots is thumbing through an Irvine Welsh novel sometime after seeing Trainspotting in the mid-1990s, Scots is either a language of its own or a heavy dialect of English spoken by the Scottish peoples. This blog last mentioned it in 2014 when Scotland voted on a referendum to leave the United Kingdom (lolsob emoji goes here) and it is one of the smaller language editions of Wikipedia.

If it’s not Scottish, it’s crap!

Well… in August a Reddit user realized that roughly a third of its 60,000-odd articles had been written by a single user, who turned out to be an American teenager with scant knowledge of proper Scots grammar or terminology. In other words, by a kid using a bad Scottish accent. The story was too good to pass up for almost any outlet that considers itself remotely “online”, and they all had a good laugh.

A month after the Scots Wikipedia controversy, it emerged that a significant majority of the articles on the Wikipedia edition written in Malagasy—the national language of Madagascar—had been written by a bot translating articles from other editions. And most of them rather badly. And the Malagasy Wikipedia is far from the only Wikipedia edition to be mostly written by bots—a Vice report in February pointed out that the Cebuano edition was largely written without human editors, albeit apparently with more success.

But bots are not the only challenge. In a different example, the Portuguese Wikipedia—containing more than one million entries with just shy of 1400 active editors—decided to ban IP accounts from making edits, because the vast majority of vandalism on the site came from these unregistered editors. According to the Wikipedia Signpost, vandalism went down, and new account creation increased. This is unlikely to be adopted on the largest editions, but it’s worth watching to see if other small language communities decide to follow suit.

6. Anticipation and apprehensions about Abstract Wikipedia

Wikipedia is as human-created a project as exists in the world, but its future increasingly looks to be dominated by computers, programs, and algorithms. Look no further than the newly announced project called Abstract Wikipedia, and its sister project WikiFunctions, which plans to do much the same as the bots on small Wikipedias, but at a much larger scale and with greater ingenuity. 

First announced in a Signpost editorial in April, and approved unanimously by the WMF board just three months later, Abstract Wikipedia aims to create Wikipedia articles independent of any one language, combining structured data and “functions” related to information within them, to make it feasible for machine translation to effectively translate articles from one language to another. It sounds so ambitious as to be reckless, but its pedigree couldn’t be better—creator Denny Vrandečić is a former WMF board member, former Googler, and the creator of another pie-in-the-sky project that has become wildly successful: Wikidata.

Father of Wikidata, and now Abstract Wikipedia

As Vrandečić pointed out, of all topics that exist across Wikipedia, only a third of them have articles in English. Further: “only about half of articles in the German Wikipedia have a counterpart on the English Wikipedia … There are huge amounts of knowledge out there that are not accessible to readers who can read only one or two languages.”

If Abstract Wikipedia succeeds, it points toward a future where Wikipedia is controlled less by those who can merely write articles, and more by those who can write code. Exciting as the project may be, anxieties exist, too. Will Abstract Wikipedia dictate the content of articles, or merely inform them? Local control matters a lot to Wikipedians and, as we’ll see in the next few sections, WMF bigfooting is of increasing concern to some community members.

But it’s also easy to see why it appeals to many Wikimedians: much like Wikidata and very much unlike Wikipedia, it’s greenfield, unencumbered by the old habits of the arguably hidebound, conservative editorial base that both keeps Wikipedia running and prevents it from growing beyond its original vision. The building of Abstract Wikipedia is set to begin in 2022, and it’s expected to start integrating with Wikipedia itself in 2023.

5. WMF Board makes some suspicious moves

In the spring, as the far-reaching implications of the coronavirus pandemic became clearer, the Wikimedia Board of Trustees announced that it would postpone its tri-annual board elections, and the three trustees whose terms were set to expire would stay on for another year. At the time, it was seen as a regrettable if understandable concession to the dire circumstances, even for an organization that can operate exclusively online in many other ways.

But then in October, the Board unveiled a considerable overhaul to the committee’s bylaws, with eyebrow-raising changes to the terms of, well, board elections. Certain board seats were no longer described as “community-selected” but “community-sourced”, and the words “majority” and “voting” were removed. A number of community members raised concerns that it could spell the end of community-elected board members, thereby increasing the stratification between the “professional” and “community” parts of Wikipedia. WMF general counsel Amanda Keton conceded that the community had “found a bug” in the proposal, and promised it would be addressed in a revision that is still yet to come.

Compounding matters, the timeline set for the change was considered too short, while Board members expressed different opinions about how far along in the process the proposals really were. Furthermore, apt questions were raised about the wisdom of sweeping changes when the board had three members who, in normal times, wouldn’t even be there. Perhaps it was merely an oversight, but it certainly exacerbated tensions that already existed.

4. Wikimedia debates Jimmy Wales’ permanent board seat

But that wasn’t the only discordant note involving Board governance this year. Shortly after the new bylaws were proposed, prominent Wikimedian Liam Wyatt suggested another change: discontinuing Wikipedia co-founder Jimmy Wales’ permanent “Community Founder Trustee Position”—in short, eliminating his board seat after nearly 20 years. As Wyatt put it, “Now that the WMF is a mature organisation, I do not believe it is appropriate any longer for a single individual to have an infinitely-renewable and non-transferrable position on the board.”

Jimmy Wales, man of the people—really!

Wales himself replied in short order, expressing a not intractable opposition to the idea at some point, but arguing that it should not happen now because of the self-same ongoing tensions. As Wales put it, it is actually he who represents the community among the professional set. And in fact, Wales’ positions on the board have been largely pro-community, including expressed opposition to curtailing community voter supervision of the board.

And while it seemed a “modest proposal” in its initial offering, the idea was soon hotly debated, with community members taking it very seriously and arguing the pros and cons. Mike Godwin, former WMF general counsel, even took to the Wikipedia Weekly Facebook group to argue for Wales as the connective tissue back to Wikipedia’s original purpose, concluding: “in my view, he shouldn’t be kicked out of the traditional position before he’s ready to go.”

The debate never really focused on Wales’ leadership, but rather the wisdom of having such a position in the first place, and it doesn’t seem likely to be taken much further for now. In a year where many statues around the world fell, it seems like the Wikimedia community decided it should at least consider whether to topple one of its own.

3. Covering COVID-19 and the George Floyd protests

It feels sort of wrong to put COVID-19 and the George Floyd protests into just one list item, but they are very much of a piece, and together they highlight what Wikipedia’s community is better at than any other editorial body: documenting far-reaching global happenings. The old saying about journalism being the “first draft of history” made sense when it was first expressed, but now that role clearly belongs to Wikipedia.

This blog covered both efforts when they first arose, in the early part and middle of the year, respectively, with posts more thoroughly researched than imaginatively titled: “How Wikipedia is Covering the Coronavirus Pandemic” and “How Wikipedia Has Responded to the George Floyd Protests”. Both subjects gave rise to dozens, if not hundreds, of new articles apiece, and several were among the most-read Wikipedia pages all year long. Quartz recently assembled a calendar depicting the most-read articles for each day of the year, and the month of June is dominated by relevant topics, including Killing of George Floyd, Juneteenth, and Edward Colston.

George Floyd protest in Brooklyn

The George Floyd protests also created opportunities for organizing around social justice issues, which have been close to the hearts of many Wikimedia affiliate groups for a long time. A virtual Juneteenth edit-a-thon was well-attended, WikiProject Black Lives Matter took shape, and the AfroCrowd initiative built a following.

To this day, the main page of the English Wikipedia retains an information box in its top right corner directing readers to critical information about the pandemic.

Activism on Wikipedia is a tricky thing: as the Neutral point of view policy spells out clearly, articles should not advocate for a particular perspective on the topics covered. But which articles Wikipedians choose to edit shows a lot about what they think is most important.

2. Effects of the global pandemic on the Wikimedia movement

How much could Wikipedia be affected by a global pandemic, anyway? Everything it does is about putting information on the internet, while the lockdowns and restrictions most affected those who couldn’t simply move online, such as restaurants and the travel industry.

In the first place, its professional class realized how much it actually depends on travel. Although all the editing necessarily happens online, in every other year dozens of regional and global meetings take place. The Wikimedia Summit, formerly known as the Wikimedia Conference and scheduled for April, was the first to be canceled. It didn’t take long for the main annual event, Wikimania, to be “postponed” from its August date in Bangkok, Thailand as well. Rumor has it that Wikimania 2021 will not happen either.

Some events, with more time to prepare, moved online: Wikiconference North America went ahead with a scaled-down virtual program in mid-December. And Wikipedia’s community has long made use of online tools from the esoteric like IRC and Etherpad to the commonplace like Zoom and Google Hangouts. A new wikiproject even sprang up to catalog the various online-only events, and to offer advice to those wanting to host their own. But virtual conferences are a split proposition: the lack of obligation to appear in-person made it easier for some to participate remotely, while removing a lot of the reason to show up in the first place for others.

I’ll add one more possible effect of the pandemic, and I suggest this very delicately: COVID-19 might have actually been a good thing for Wikipedia. As The Signpost noted this summer, editing activity on Wikipedia surged to levels not previously seen in a decade. As they explained: “Recent years seem to have stabilised at a million edits every six to six and a half days, so the lockdown period with its editing levels of a million edits every five days is a significant increase.” 

Some people learned to make sourdough. Others, presumably, learned to edit Wikipedia.

1. The Wikipedia Foundation?

Chances are, you have never heard of the biggest controversy to envelop Wikipedia in 2020. The dispute, which began in January, boiled over in June, and remains as yet unresolved, centered on the obvious desire of the Wikimedia Foundation (WMF) to change its name to the “Wikipedia Foundation” despite the clear majority of active Wikimedians who oppose the idea. 

The case in favor of doing so is simple: everyone and their grandmother knows what Wikipedia is, but almost no one outside of the movement knows what Wikimedia means. Wikipedia’s ubiquity has overshadowed other important projects funded by the WMF. By rechristening the entire endeavor “Wikipedia” and doing away with the confusing split branding of “Wikimedia”, it would unify the whole project behind the one word everyone knows.

I still remember when the WMF logo was in color

But the arguments against were simple, too, and passionate: rather than drawing attention to other projects, it would obscure their independent status and achievements. Further, the proposed change was initiated without sufficient feedback or consideration for the branding of the movement’s many organized chapters and user groups. Procedurally, it was inexplicably separated from the rest of the long-gestating Wikimedia 2030 Movement Strategy that it clearly belonged to, and rushed to the proposal stage at a time when the conferences and meetings where this would normally be debated had been called off due to the pandemic. What’s more, the proposal drew the harshest rebuke from those very groups who work most closely with the WMF—a rare intra-wiki dispute not between Wikipedia’s professionals and volunteers, but within the professional class itself.

The sequence of events was damning, too: In June, the WMF opened up a survey asking the community to weigh in on what Wikipedia should call itself. The survey was heavily weighted toward the conclusion that “Wikipedia Foundation” was the way to go, even though a Request for Comment earlier in the year ran 9 to 1 against it. Yet the WMF decided that its “informed oppose” was less than 1%, based on an invented number of “~9,000” community members whom they claimed had a chance to fill out the survey, though far fewer actually submitted responses. Soon after, an open letter organized by the affiliate groups received nearly 1,000 signatures calling on the WMF to “pause renaming activities … due to process shortcomings”. 

And so it was shelved, but only until March 2021. Whether the WMF will go ahead and become the WPF (I guess) remains to be seen, but this blog for one finds it unlikely. Interestingly enough, it also shows the limits of even these change-oriented groups’ interest in changing how they think of themselves and the movement they’ve dedicated their lives and careers to. The WMF would do well to put this aside and accept this as just one of the many contradictions that Wikipedia has managed to succeed in spite of over nearly two decades. As the old joke among longtime editors goes: “Wikipedia doesn’t work in theory, only in practice.” That’s as true here as it is anywhere.

For threatening the goodwill of its closest allies, for creating a headache where none need exist, and for being an own goal of massive proportions, the controversy around the renaming of the Wikimedia Foundation is easily the #1 Wikipedia story of 2020. 

♦     ♦     ♦

And now, if you still can’t get enough Wikipedia year-in-review content, I present to you the Wikipedia Weekly episode featuring Richard Knipel, Vera de Kok, Netha Hussain, Jan Ainali, Andrew Lih, and yours truly. Enjoy, and see you in 2021!

Image credits, top to bottom: Public domain, Sodacan, Victor Grigas, Zachary McCune, Rhododendrites, Wikimedia Foundation

The Top Ten Wikipedia Stories of 2018

Posted on December 28, 2018 at 4:17 pm

Were you exhausted by 2018? If not, then The Wikipedian doesn’t know what year you just lived in. The continued crises in Western democracies, ongoing wars in the Middle East, embrace of authoritarianism around the world, and the inexorable, seemingly unstoppable transition to a world where data comes before people—all served up for consumption on your internet device of choice as quickly as you can pull to refresh—have changed what “normal” means. Where 2016 was once half-jokingly called the “worst year ever” only for 2017 to replicate the experience, by 2018 it’s become apparent that we may never end up reverting to the previous mean. Indeed, this is just how things are now. Mean.

But is Wikipedia different? Whether because it’s a decentralized, international effort or simply not one dependent upon advertising or unstable business models, the wide world of wiki has often this year felt disconnected from the madness it ostensibly documents. Yet, if we look closely, we can see where the real world has seeped in. In this blog post, for the ninth year in a row, The Wikipedian will present a summary of ten events, trends, phenomena, and people that marked the year in Wikimedia.

Shall we?

10. Is that all she wrote for WikiTribune?

It was a questionable decision on The Wikipedian’s part to make last year’s number one story the rocky start for WikiTribune, the collaborative internet news site from Wikipedia founder Jimmy Wales. It isn’t an official Wikimedia project, it has no financial relationship with the Wikimedia Foundation (WMF), and Wales’ involvement with Wikipedia is arguably at an all-time low. But he had announced the concept in a Wikimania speech five years ago, and it certainly got a lot of attention when it launched. Well, it also got some attention when it laid off its entire staff this fall, having burned through its funding without otherwise making a dent in the broader media ecosystem. This was entirely foreseeable, as the idea always involved a leap of faith (but so did Wikipedia!) and Wales’ post-Wikipedia projects have mostly failed to thrive. Will we see WikiTribune mentioned again next year? It’s already fallen nine positions, so I wouldn’t count on it—or even that it’s still around by then.

9. Testing new models of collaboration

It is no overstatement to say that Wikipedia has gone very far with its laissez-faire model of knowledge production: like Douglas Adams’ eponymous Hitchhiker’s Guide, the content is written by those who have happened across it, spotted something they could fix, and miraculously actually done so. Yet Wikipedia’s content gaps and systemic biases are well observed, and it should take nothing away from the prior accomplishment to believe that more concerted efforts may be necessary for Wikipedia to take another step forward. For several years now the Wiki Education Foundation has been trying out different models, and this year they may have had a breakthrough with their Wikipedia Fellows pilot program, inviting academics from associations in multiple disciplines to try improving Wikipedia. The project has had some early success, though the participants were few and the achievements relatively limited. Bringing more subject matter expertise to neglected areas of Wikipedia is still a daunting task that may not scale, but these experiments show promise and warrant further study.

8. Getting serious about systemic biases

Wikipedia and its associated nonprofits have been tackling similar problems in other ways: this year was the first occurrence of the Decolonizing the Internet conference, held concurrently with this year’s Wikimania in Cape Town, South Africa. Spearheaded by another independent group called Whose Knowledge?, the event brought together multiple strands of discussion and voices typically underrepresented on Wikipedia. Whereas Wikipedia has historically been the province of white males from North America and Western Europe, the conference’s participation was more than two-thirds non-male, from the Global South, and more than three quarters non-white. Actual outcome? Lots of discussion, a published report outlining agreement on issues to address (not always easy in sometimes fractured, identitarian spaces) and the creation of working groups to tackle specific issues. Whether this effort will have any measurable impact on a recognizable time frame is still an unknown, as the report acknowledges, but formalizing such efforts outside the WMF is nevertheless a major development.

7. “Free” Wikipedia goes offline

OK, one more in this vein: the Wikimedia Foundation’s efforts to bring Wikipedia (and yes, the other projects as well) to the far corners of the world without always-on wifi have unsurprisingly faced many challenges. Since 2012, the leading effort has been Wikipedia Zero, a program seeking telecom firms in developing regions to “zero-rate” Wikipedia, which means accessing it using their services would be exempt from the normal fee. It’s controversial in some quarters as it is often perceived to conflict in spirit, if not in law, with the principle of net neutrality. (Similar programs are also controversial in parts of the Global South: for example, in 2016 India rejected Facebook’s similar Free Basics program.) Although the WMF estimates it has reached more than 800 million people in more than 70 countries, the criticism never subsided and there was no corner to be turned, so in 2018 the program was shuttered.

So how will would-be Wikipedians in Ghana, Sri Lanka, Kosovo and elsewhere reach Wikipedia now? One would-be contender is the independent Internet-in-a-Box initiative, which seeks to put a copy of Wikipedia (and other digital libraries) on a low-cost computer (currently a Raspberry Pi) and distribute it the old-fashioned way. While it doesn’t come with any of the scary global data questions of Wikipedia Zero, because now we are again talking about atoms as well as bits, the old problems of distribution and scalability threaten to keep it a niche project. The tradeoffs are stark, and a sign of the times.

6. Attrition of administrators

It’s been a couple of years since we last worried openly about the decline in the total number of Wikipedia editors, largely because the erosion has been arrested. (These days Wikipedians worry about different charts going not down, but going up too much.) But topline figures only tell part of the story, and when it’s the power users who have the most impact on Wikipedia’s day-to-day governance, it’s troubling to note that Wikipedia contributors approved just ten new administrators—trusted editors who step in to lock pages and block accounts when needed—on eighteen nominations, the lowest number in either category in Wikipedia’s history. Yes, there’s even a down-and-to-the-right chart to describe it, and while it’s clear this trend has been developing for awhile—The Atlantic covered it in 2012 (!)—in 2018 all of the relevant figures approached, or breached, single digits for the first time (speaking of “Wikipedia zero”…). While Wikipedia still has more than 500 active administrators, there was a net loss for the year and no sign that will turn around. As attrition advances, will Wikipedia decide to lighten up, loosen requirements, or learn to live with fewer admins?

5. Save the links!

There are two widely held and mutually exclusive ways to think about the durability of content on the internet: nothing is forgotten, and everything is ephemeral. On Wikipedia, both are true: Wikipedia exists to record knowledge for posterity and every edit to every page is saved for all time, yet once something disappears from Wikipedia’s pages it rarely resurfaces—although it can! And this year, in one sense, it did.

The concept of “link rot” is central to this dilemma: because the internet is made up of links between files (and the World Wide Web specifically between web pages) if one file should disappear, the connection is broken, and so is information. The Internet Archive was established in the mid-1990s—practically the dawn of time, as the internet goes—to combat this problem by actually crawling the web, page by page, and storing all kinds of content long after its original publishers decide they no longer care to. This year a three-year effort in collaboration with Wikipedia delivered on rescuing millions of links to references once used in Wikipedia articles that later disappeared. It’s hard to overstate how important this is: Wikipedia is only as good as its sources, and finally its external sources are as stable as they ever have been—and perhaps can be.

4. I promise we’ll only mention him this once

The Wikimedia movement may be a global one, but considering its flagship Wikipedia edition is in English and its nonprofit foundation based in the United States, in 2018 hardly a week could go by without some intersection between the metastasizing national shitstorm that is the U.S. federal government and the leading source of putatively non-partisan, non-sectarian, non-biased information the world has agreed upon, Wikipedia. Most of the time, this involved harmful edits that require, ahem, administrators to combat effectively. The examples ran from early in the year, when Google amplified an instance of vandalism calling Republicans “Nazis”, to efforts to whitewash articles related to the Mueller investigation, to seemingly constant attacks on the Donald Trump Wikipedia page (often juvenile in nature, which alas is entirely fitting), and finally to multiple issues revolving around the Brett Kavanaugh Supreme Court confirmation hearings. The eyebrow-raising edits to the Devil’s Triangle page were almost quaint; more troubling was the “doxing” of elected officials on Wikipedia, which was then transmitted by CongressEdits (a Twitter account reporting Wikipedia edits from congressional IP addresses), which was in turn shut down by Twitter for being an unwitting conduit. The account, much celebrated since its 2014 launch, has not returned. Like much else these days, it makes for a tidy symbol of the nice things we can no longer have.

3. Building our own Hal 9000

The Wikipedian is not a very successful computer person and therefore pretty anxious about getting this one wrong, so let’s try to keep this really high-level and see if I don’t royally screw this up: besides Wikipedia, there are related projects like Wikidata (an open source knowledge database) and Wikimedia Commons (a repository of media files, especially images) that provide content for Wikipedia articles and serve as resources for researchers. Both have come a long way in recent years, and they are growing together. This year, structured data came to Wikimedia Commons, meaning the metadata about the files will now be better organized and machine-readable, and therefore more searchable, editable, and useful in ways we haven’t yet defined. Also lexemes came to Wikidata, which you’ll just have to trust me is important, too. Meanwhile, the WMF’s ORES project, which uses machine learning to evaluate the quality of entire articles and individual edits, got more useful—but it’s still most useful to decently successful computer people who know how to do things like install JavaScript files, and so it’s not quite ready for prime time. Maybe in 2019 some of this will become more comprehensible.

2. Donna Strickland and Jess Wade

Speaking of very successful computer people, in October the Canadian physicist Donna Strickland was awarded a Nobel Prize for her work in chirped pulse amplification. At the time, Wikipedia had no biographical article for her, and very quickly, this became an international incident in itself. Wikipedia’s oversight was covered by The Washington Post, The Guardian, The Independent, Business Insider, Vox, Nature, The National Interest, The Daily Beast, and many more. In fact, it turned out an article about Strickland had been proposed in the months prior, only to be declined by a reviewing editor.

The Wikimedia Foundation, which absorbs every column inch of bad press that Wikipedia gets, was put on its heels, eventually publishing multiple explanatory blog posts about the matter, first by a mere staffer, and later by its executive director, Katherine Maher. What happened is perfectly understandable to anyone familiar with Wikipedia: there was not enough published information about her from independent sources prior to the Nobel committee’s announcement to satisfy Wikipedia’s stringent requirements. This is not unusual, as academics nearly always toil in obscurity. But of course, it’s almost certainly related to institutional sexism, and while the processes in this instance were followed correctly, the outcome was nevertheless regrettable after the fact. Understandable, yes, but defensible? Perhaps not. And so the line out of the WMF is that yes, Wikipedia has to do better, but so must we all.

Meanwhile, there is another female physicist whose Wikipedia article was successfully created in early 2018: Jess Wade, who happens to be a Wikipedia editor herself. (Hmmm.) And not just any editor, but one who is the creator of hundreds of articles about other female scientists and who has received considerable media attention because of the fact. (It’s not even the first time this has been a story: cf. Emily Temple-Wood, an American medical student and prolific Wikipedian recognized in 2016’s list). Wade’s star began to rise this summer, and while it owed nothing to the Strickland issue—her first big round of U.S. coverage arrived more than two months earlier—it does feel like it may not be remembered that way.

1. YouTube’s bewildering fact-checking announcement

Wikipedia’s relationship to the global tech giants like Google and Facebook it is sometimes compared to is uncomfortable for many reasons: all enjoy audiences and impact of truly staggering scale (not to mention Bay Area headquarters) but Wikipedia’s mission and governance are completely the opposite of its supposed peers. If Wikipedia were a for-profit corporation, it would undoubtedly be a “unicorn”, except it’s a nonprofit, and if it ever tried to monetize the value of its reach, its community would rebel and the project might collapse entirely. (Which could still happen to some unicorns, actually.)

All of which is backdrop for probably the most jaw-dropping, perplexing, and as-yet-unsettled Wikipedia-related news of the year: an announcement from YouTube CEO Susan Wojcicki, speaking on stage at SXSW in March, that they would combat “fake news” by including links to Wikipedia articles on certain user-generated videos that ventured into conspiracy theory territory. How would this be done? What videos would be flagged? What articles would be linked? Among those asking: the Wikimedia Foundation, which quickly put out a statement saying that Wojcicki had not shared this information with them. And yet, some publications went so far as to call it a “partnership” even though no such relationship existed. But it’s not hard to imagine why they leapt to this conclusion. Following the announcement, you could be forgiven for thinking they just dropped the whole thing. In fact, YouTube did start including Wikipedia-sourced advisories with some videos, at least in some instances. It’s not clear how it has worked in practice because neither YouTube nor Wikipedia ever mentioned it again. Has the internet already forgotten?

Clearly, this was an unforced error on YouTube’s part. But was it also one by the Wikimedia Foundation as well? After all, it was little more than two years ago that the WMF published a blog post declaring Wikipedia a bulwark against the “post-fact world”. While the real shame lies with YouTube and its tendency, however unintended, to radicalize its audience by algorithmic recommendation, it’s another reminder that there remains a significant gap between what Wikipedia says it is, what people believe Wikipedia is, and what Wikipedia really is.

Will that gap narrow in the coming year? We’ll see, but I doubt this trend will fall all the way to number 10 in next year’s list. See you in 2019!

Image credits, in order: WikiTribune via Nieman Lab, Tinaral, Doc James, Hazmat2, RandomUserGuy1738, Gaia Octavia Agrippa, Sikander, Andrew Lih.

The Top 10 Wikipedia Stories of 2017

Posted on January 3, 2018 at 6:16 pm

Every year since 2010, The Wikipedian has delivered a roundup of the most interesting events, trends, situations, occasions, and general goings-on that marked the foregoing year on Wikipedia and in the broader Wikimedia community. Last year’s edition remarked upon the head-spinning series of events that made 2016 the “worst year ever”—or so we thought at the time—and now, looking ahead to 2018, we have a stronger sense that the most realistic expectation is more of the same.

Where does Wikipedia fit into that? Following the U.S. presidential election, it became briefly fashionable to see Wikipedia as a bulwark against “fake news”, but in a year where the new American president suffered vanishingly few consequences for his constant issuance of falsehoods, 2017 very much felt like a year when truth was under constant attack. These ten stories depict a Wikipedia editorial community and readership not necessarily in the midst of a crisis, but of life during informational wartime. Let’s go:

10. In the Wikimedia Year 2030…

Last year’s list was dominated by a metastasizing organizational breakdown culminating in a change of leadership at the Wikimedia Foundation (WMF). Among many complaints about the non-profit’s former executive director, two of the most important were vision and communication, which is to say their lack. Katherine Maher, WMF’s current chief, seems determined not to let the same be said of her. In August 2017, a little over a year into her tenure, she announced an initiative called “Wikimedia 2030”, starting with a high-minded re-articulation of the Wikimedia movement’s mission statement and a series of commitments to (paraphrasing from the document itself) advancing the world through knowledge. It’s obviously operating on a very long time frame, and a lot depends on its implementation, which is yet to come. But the document received overwhelming support by community members in October, which is at least a positive sign in this otherwise fractured age.

9. The Daily Mail and Governance

Wikipedia’s quality is highly dependent on the sources it allows to verify its information. In February Wikipedia’s community decided it was fed up with the website of UK tabloid The Daily Mail for its mendacious unreliability, and so “voted” to “ban” its use. This apparent decision was widely reported, including by this blog. And yet, that’s not quite what happened. Rather than an official blacklisting, the Daily Mail was simply added to a list of potentially unreliable sources, and it’s possible to find instances of the website being used as a reference since, perhaps by contributors entirely unaware there was a controversy in the first place. This is how Wikipedia works: it has very few rules that cannot be overcome by editorial clout, determined obstinacy, continued evasion, or blithe disregard. On the whole, Wikipedia works pretty well, but breaks down at the edges: and that is still where the Daily Mail remains.

8. “Monkey Selfie” Reckoning

First, a mea culpa: as far as I can tell, The Wikipedian has never written a word about the Monkey selfie copyright dispute, as Wikipedia’s own article on the subject calls it.

Wikipedia played only a small role in the legal case, which primarily involved nature photographer David Slater being sued by the People for the Ethical Treatment of Animals on behalf of a Celebes crested macaque who had no idea any of this was taking place. The legal matter isn’t quite settled, but as of September it seems close: Slater keeps the copyright, with concessions. Yet Wikipedia played a much larger role in the sense that there may never have been a case at all, or it would have remained quite obscure, had the WMF not refused to abide by Slater’s request to delete the photo from Wikimedia Commons. By virtue of its high profile, Wikipedia magnifies everything.

What’s more, the enthusiasm of its community also obscures: I remember the photo being everywhere at Wikimania 2014 in London and, being charmed like everyone else, I played along and used it in a slide presentation without looking into it further. I’m more regretful of this than my own non-coverage, and consider it still unresolved whether WMF is on the side of virtue in this matter. (Why am I using the photo here, then? For the same reason Wikipedia uses copyrighted logos: for identification.)

It seems indisputable to me that the copyright should belong with the human who went to considerable lengths at personal cost to facilitate its creation, regardless of which bipedal mammal clicked the button, and if the law is unclear on this, then the law should be clarified. If you haven’t listened to This American Life’s episode about the case from November, it’s worth your time—and Wikipedia doesn’t come across terribly well.

7. Burger King’s Way

Remember this? In April, Burger King announced a television ad for the U.S. and UK markets featuring dialogue intended to activate Google Home and read out Wikipedia’s entry for the Whopper. Almost immediately, The Verge noticed that Burger King’s ad team had surreptitiously edited the Whopper entry from Wikipedia’s typical dispassionate summary “…signature hamburger product sold by the international fast-food restaurant chain…” to unambiguous marketing-speak “…flame-grilled patty made with 100 percent beef with no preservatives or fillers…” Then, predictably, unidentified randos joined in and hijacked the entry to disparage the mass-market burger, producing head-scratching headlines like this one from BBC: “Burger King advert sabotaged on Wikipedia”.

Although Burger King was probably unaware of Wikipedia’s policy “Wikipedia is not a soapbox or means of promotion” and practically guaranteed to be ignorant of the guideline “Do not disrupt Wikipedia to illustrate a point”, that should hardly matter; Burger King knew what it was doing, and figured the ensuing coverage was worth the cost. They were probably right. But I can’t not play the schoolmarm, and tsk-tsk: it’s one thing for a high-school student to vandalize Wikipedia for fun, but quite another for a multinational corporation.

6. Wikipedia Vandalism is Fun for All

Last year’s version of this column decried the phenomenon of lazy sports-bloggers leaning on blink-and-you-missed-it vandalism of sports-related Wikipedia articles for amusement and clicks, and this continued unabated throughout 2017. Most of these stories came from minor sports websites and local news teams, but just as Wikipedia’s prominence owes to its high Google search ranking, so too are these time-wasters afforded visibility by Google News. But this year, we got something else: ostensibly serious news publications marveling over a pattern of self-aware edits coming from U.S. congressional computers.

Since 2014, the automated Twitter account @CongressEdits has tracked and reposted every edit made from House and Senate offices; in October, BuzzFeed and CNN both noticed that someone on the Hill was editing articles from Carly Rae Jepsen to Chuck E. Cheese, and on subjects as ubiquitous as Star Wars to as obscure as indie band The Mountain Goats. In December, a college student and former congressional aide claimed credit in The Daily Beast, which led to other former interns and anonymous persons crying out for recognition as well. Whether for the lulz, or as part of “the resistance”, these edits at least proved that curiosity about Wikipedia’s willful vulnerability to nonsense appeals to journalists and readers who should probably be focused on something else.

5. Signpost of the Times

A year ago, this list bemoaned the decline of Wikipedia criticism, largely based on the departure of critical thinkers (or at least decent writers) from forums such as Wikipediocracy. This year, I find myself concerned with Wikipedia’s own community news source, The Signpost. A bi-weekly online “newspaper”, The Signpost has been around since 2005, written and edited by volunteers much as Wikipedia itself is. In early 2016 a new editor-in-chief took the reins, led with an ambitious and hopeful editor’s note, produced three issues by the end of February, and then simply stopped.

The editor, a longtime community veteran and onetime WMF staffer, in fact ceased editing Wikipedia almost entirely. I thought about investigating it at the time, but figured I already knew the basics: burnout is a natural occurrence and all but inevitable, although it’s less typical for a project leader to step away without so much as a “gone fishin'” sign. By June, a skeleton crew of former contributors had banded together to put out an edition on at least a once-per-month basis, with a new permanent editor named as of September. Here’s hoping they can return the Signpost to its former schedule and retain its high quality.

In the meantime, I’ll say again what I’ve said many times before: The Signpost is hard work and is a crucial service for the core Wikipedia community; its health is in some ways a measure of the health of the community itself. Its editorship should be a stipended position, funded by but free from oversight of the Wikimedia Foundation. Wikipedia does not depend upon volunteer developers, nor should it depend on volunteer reporters.

4. Everipedia Stalking

What’s Everipedia? Oh, it’s just the latest upstart challenging Wikipedia, this time an actual startup: a rival wiki-based online encyclopedia launched in 2014 by a couple of UCLA students, which later attracted investment from excommunicated Rap Genius co-founder Mahbod Moghadam, and in December also the involvement of expatriate Wikipedia co-founder Larry Sanger.

Everipedia is certainly audacious, calling itself the world’s biggest encyclopedia (for having exported all of Wikipedia’s entries and then adding more Wikipedia wouldn’t accept) and it projects a certain braggadocio not typically found in online knowledge repositories (at one time, its founders liked to call it “Thug Wikipedia”). It’s also not Sanger’s first attempt at a do-over, having left Wikipedia citing philosophical differences early on; his decidedly more staid Citizendium effort is itself now more than 10 years old, but with only a handful of active editors, is all but a dead project.

The most interesting thing about Everipedia, though, is its pivot to using blockchain technology and announced development of a cryptocurrency with which to pay contributors. I’m curious to be sure, but even more sure of my skepticism. No question, Wikipedia is built on a relatively ancient software framework, and there is a case to be made that blockchain’s public ledger could represent an advancement in recording all “transactions”. But this is what Harvard’s Clayton Christensen would call a “sustaining innovation”, not a “disruptive innovation”—there’s no reason Wikipedia couldn’t adopt a blockchain ledger should the idea prove meritorious; meanwhile, there’s very little chance that Everipedia can replace the day-to-day deliberations of an editorial community more than 15 years old. Culture is impossible to replicate, and extremely difficult to develop. I can’t promise an assortment of brogrammers and Wikipedia’s kooky uncle won’t pull it off, but I have my doubts.

3. Hey, Big Spenders

Wikipedia’s fundraising prowess, ever-growing expenses, and nevertheless-expanding bank account are a matter of interest year in and year out. From about $56,000 in the bank at the end of the 2004 fiscal year to more than $90 million by 2016, Wikipedia’s financial situation is still growing in a way that’s entirely divorced from the number of volunteers actively participating. In February, a 12-year veteran editor published an alarming (or alarmist) op-ed at the then-functioning Signpost with the unfortunate headline “Wikipedia Has Cancer”.

The controversial connotation (which I realize I’ve also made in #10) was very much intended: Wikipedia’s financial position has far exceeded what is necessary for the running of this non-profit, volunteer-driven project. What happens if (and presumably when) revenues slow—will the Wikimedia Foundation adjust spending downward, or start taking on debt? Pointing to recent failures in WMF software development initiatives as a reason to worry about Wikipedia’s leadership, the op-ed called for a spending freeze and greater transparency in financial matters. With some fiscal discipline, and Wikipedia’s newly-established endowment, Wikipedia could live comfortably off its prior fundraising indefinitely. Although the rhetoric was probably excessive, it struck a nerve, attracting an overwhelming number of comments in a discussion that continued for months. Soon after, an article in Quartz called the resulting frenzy “nuts”, and published a chart comparing Wikipedia favorably to similar institutions, including the New York Public Library and even the British Museum.

2. Slow Wiki Movement

Given the lack of high-impact news events surrounding Wikipedia, here is a new one: nothing really happened this year. That’s probably good news, but it doesn’t make for an exciting story. And for an avowed non-story, it’s relatively high-positioned as well. But as I contemplated the mood around Wikipedia over the past twelve months, I found it rather fitting.

Two items that just missed the cut: the WMF’s 2015 lawsuit against the NSA, dismissed by one court, was reinstated by another, and this could well be a standalone entry next year. And Wikipedia’s open database, Wikidata, continued to develop and grow, but all of this happened behind the scenes, without any single inflection point (though attendees of the first-ever Wikidatacon are free to disagree with me).

Meanwhile, Wikipedia’s edit wars and paid editing scuffles continued, but few made actual news. Trolls, especially of the GamerGate variety, continued to be a nuisance, but (for now) are not an existential threat. Wikipedia’s gender imbalance barely registered a blip, Wikipedia’s editorship numbers again ticked upward, and Wikimania Montreal went off without a hitch. Other topics this year-end report card series has discussed before were also ho-hum: no major sock puppet networks detected, no major article-creation milestones (we’re just over halfway to 6 million), the detente between Wikipedia and education continues, and the Visual Editor continues to work even as most veterans ignore it. Yes, Turkey blocked Wikipedia, but following China and Russia having done so in previous years, it hardly made a dent.

This is what maturity looks like: Wikipedia is Wikipedia, and seems likely to continue doing what it does for a long time to come. So, does it feel like we’re celebrating?

1. WikiTribune’s Rocky Start

In keeping with the somnolence of the previous item, this year’s top story isn’t even about Wikipedia: it’s about WikiTribune, the other new initiative from Wikipedia’s other co-founder, Jimmy Wales. Announced to great fanfare and no little skepticism in April, Wales’ long-dreamed wiki-based online news site finally launched at the end of October. Early reviews were not enthusiastic, and it has been little remarked-upon since. As of this writing, it has continued publishing a few stories a day, none with any apparent impact. WikiTribune offers little more than what other news operations are doing, and less of it.

In May, this blog offered advice about how it might stand out in a crowded online world: by focusing on developing news teams at the local level, and trial-run innovations that might be ported back to Wikipedia. But WikiTribune seems determined to cover international news with no discernible viewpoint or special access, and has no connection to Wikipedia besides its name and famous founder. Why would anyone visit WikiTribune for news over any other publication? I have no idea. Alas, WikiTribune looks like just another much-heralded effort to reinvent news by doing the exact same thing that other news publications were already struggling to keep doing in seemingly impossible circumstances. Whether WikiTribune survives to see the end of 2018, or makes this list a year from now, I have no idea either.

Photo credits, in order: Avery Jensen; Alex Muller / Wikidea; David Slater; Restaurant Brands International; Public domain; Kjoonlee; Larry Sanger; Sameboat; Jan Dittrich; WikiTribune.

What You Missed at Wikimania 2017

Posted on August 18, 2017 at 4:39 pm

N.B. At the end of this post I’ve embedded a Spotify playlist for the delightful 2006 album “Trompe-l’oeil” by the Francophone Montreal indie rock band Malajube. It’s what I was listening to as I arrived at Montréal–Pierre Elliott Trudeau International Airport last week, and I think it would make a nice soundtrack for reading this post.

♦     ♦     ♦

Wikimania 2017, the thirteenth annual global meeting of Wikipedia editors and the larger Wikimedia movement, was held in Montreal last weekend. For the fifth time overall, and the first time in two years, I was there. I’ve covered previously attended Wikimanias, sometimes glancingly, and sometimes day-by-day, and this time I’ll do something a little different as well.

One nice thing about a conference for a project focused on the internet: many of the presentations can be found on the internet! Some but not all were recorded and streamed; some but not all have slides available to revisit. The second half of this post is a roundup of presentations I attended, or wished I attended, with media available so you can follow up at your own pace.

But first, a note on a major theme of the conference, implicitly if not specifically called “Wikimedia 2030”: a draft of a “strategic direction” document circulated by stapled printout from the conference start, and was later addressed specifically in a presentation by Wikimedia Foundation executive director Katherine Maher and board chair Christophe Henner. It’s available to read here, and I recommend it as a straightforward and clearly-described (if detail-deficient) summary of how Wikimedians understand their project, and where its most dedicated members want to take it.

As one would expect, the memo acknowledges the many types of contributors and contributions, brought together by a belief in the power of freely shared knowledge, and a commitment to helping organize it. It also focuses on developing infrastructure, building relationships, and strengthening networks. One thing it doesn’t talk much about is Wikipedia, which might be surprising to some. After all, Wikipedia is arguably more important to the movement than the iPhone is to Apple: Wikipedia receives 97.5% of all WMF site traffic, while the iPhone accounts for “only” 70% of Apple’s revenues.

I don’t wish to belabor the Apple analogy much, because there are too many divergences to be useful in a global analysis, but both were revolutionary within their markets, upset competitors, created a whole new participatory ecosystem in their wake, and each grew exponentially until they didn’t. Now the stewards of each are looking beyond the cash cow for new areas of growth. For Apple, it’s cloud-based Services revenue. For the WMF, it’s not quite as easily summarized. But the answer is also partly about building in the cloud, at least figuratively. Although both Wikipedia and the iPhone will remain the most publicly visible manifestations of each organization for the foreseeable future, the leadership of each is focused on what other services they enable, and how they can even make the core product more valuable.

I see two main themes in the memo, about how the Wikimedia movement can better develop that broad ecosystem beyond Wikimedia’s existing base, and how it can improve its underlying systems within movement technology and governance. The former is too big a subject to grapple with here, and I’ll share just a single thought about the latter.

One thing the document concerns itself with at least as much as with Wikipedia is “data structures”—and this nods to Wikidata, which has been the new hotness for a while, but whose centrality to the larger project is becoming clearer all the time. Take just one easily overlooked line, about how most Wikimedia content is “long-text, unstructured articles”. You know, those lo-fi Wikipedia entries that remain so enduringly popular. They lack structure now, but they might not always. Imagine a future where Wikidata provides information not just to infoboxes (although that is a tricky subject) but also to boring old Wikipedia itself. Forget “red links”: every plain text noun in the whole project may be connected to its “Q number”. Using AI and machine learning, entire concepts can be quickly linked in a way that once required many lifetimes.

At present, Wikipedia is the closest thing we have to the “sum of all human knowledge” but in the future, it may only be the default user interface. Now more than ever, the real action is happening behind the scenes.

♦     ♦     ♦

Birth of Bias: implicit bias’ permanence on Wikipedia

Wikipedia is a project by and for human beings, and necessarily carries the implicit biases of those human beings, whether they’re mindful of the fact or not. This presentation, offered by San Francisco State visiting scholar Jackie Koerner, focused on how to recognize this and think about what to do about it. Slides are accessible by clicking on the image below, and notes from the presentation are here.

Koerner Implicit Bias Wikimania 2017

♦     ♦     ♦

Readership metrics: Trends and stories from our global traffic data

How much do people around the world look at Wikipedia? How much do they look at it on desktop vs. mobile device? How have things changed over time? All of this and more is found in this presentation from Tilman Bayer, accessible by clicking through the image below.

Readership metrics. Trends and stories from our global traffic data (Wikimania 2017 presentation)

♦     ♦     ♦

The Internet Archive and Wikimedia – Common Knowledge Goals

The Internet Archive is not a Wikimedia project, but it is a fellow nonprofit with a similar outlook, complementary mission and, over time, increasing synergy between the two institutions. Every serious Wikimedian should know about the Internet Archive. I didn’t attend the presentation by Wendy Hanamura and Mark Graham, but there’s a lot to be gleaned from the slides embedded below, and session notes here.

♦     ♦     ♦

State of Video in the Wikimedia Movement

You don’t watch a lot of video on Wikipedia, do you? It’s not for lack of interest on the part of Wikipedians. It’s for lack of media availability under appropriate licenses, technology and infrastructure to deliver it, and even community agreement about what kinds of videos would help Wikipedia’s mission. It’s an issue Andrew Lih has focused on for several years, and his slides are highly readable on the subject.

♦     ♦     ♦

The Keilana Effect: Visualizing the closing coverage gaps with ORES

As covered in this blog’s roundup of 2016’s biggest Wikipedia stories, one of Wikipedia’s more recent mini-celebrities is a twentysomething medical student named Emily Temple-Wood, who goes by the nom-de-wiki Keilana. Her response to each experienced instance of gender-based harassment on the internet was to create a new biographical article about another woman scientist on Wikipedia. But it’s not just an inspiring story greenlit by countless news editors in the last couple years: WikiProject Women Scientists, founded by Temple-Wood and Rosie Stephenson-Goodknight, dramatically transformed the number and quality of articles within this subject area, taking them from a slight lag relative to the average article to dramatically outpacing it. Aaron Halfaker, a research scientist at the Wikimedia Foundation, crunched the numbers using the new-ish machine learning article quality evaluation tool ORES. Halfaker presented his findings, with Temple-Wood onstage to add context, on Wikimania’s final day. More than just a victory lap, the question they asked: can it be done again? Only Wikipedia’s contributors can answer that question.

The slides can be accessed by clicking through the image below, notes taken live can be found here, and for the academically inclined, you can also read Halfaker’s research paper: Interpolating Quality Dynamics in Wikipedia and Demonstrating the Keilana Effect.

Keilana Effect (Wikimania 2017)

That was fun! Let’s do this again next year.

Update: Looking for more slides and notes? There’s an “All Session Notes” page on the Wikimania site for your edification.

♦     ♦     ♦

The Top 10 Wikipedia Stories of 2015

Posted on December 22, 2015 at 3:28 pm

Each year since 2010, The Wikipedian has looked back at the year on Wikipedia and taken a stab at determining which trends, milestones, and controversies most influenced the direction of Wikipedia in the twelve months preceding.

This is no easy task, considering the millions of articles, edits, and editors within the scope of Wikipedia and its sister projects, not to mention the off-wiki and even offline circumstances affecting them. The most important events may be overlooked, acknowledged major events can be misunderstood, and the significance of each can differ greatly depending on one’s viewpoint. No matter, The Wikipedian will make its best effort regardless.

This time around I’m pairing our retrospective with a post on the blog of my firm, Beutler Ink, called “Ten Predictions for Wikipedia in 2016”. I recommend reading this one first: as we learn from the Bard, what’s past is prologue.

♦     ♦     ♦

10. Wikidata Rising

When Wikidata, the collaborative structured database project, first launched in 2012, it was difficult to summarize with any confidence. The Wikipedian covered it by carefully outlining its stated goals and quoting the speculative news and blog coverage. At the end of 2015, it’s not much easier to describe to a layperson, and many of its goals remain just that, but Wikidata’s growth is undeniable and the passion it inspires in the Wikipedia community is unmistakable. At this year’s Wikimania conference, Wikidata’s presence was felt like never before.

One big reason: Wikidata is unexplored territory in a way that Wikipedia no longer is. The encyclopedia project feels mature at 5 million articles (more about that below), but the database at only 15 million items has a long road ahead of it. For editors who joined the larger Wikimedia movement for the joy of discovery, Wikidata is where it’s at. The project still has some very real challenges, some of which unsurprisingly mirror those of Wikipedia, but it’s possible now to imagine that Wikidata, not Wikipedia, may prove to be the real “sum of all human knowledge”.

9. Exodus from New Montgomery Street

Has Wikipedia’s parent organization, the Wikimedia Foundation (WMF), seen a year with more comings and goings from its headquarters on San Francisco’s New Montgomery Street than 2015? It seems unlikely. The organization has seen admired veterans and high-level executives depart under different circumstances, and some touted recruits from Silicon Valley firms arrived to fanfare, only to exit quickly, and without comment. The only reason this exodus of talent isn’t higher on this list is that it’s one of 2015’s least-reported stories.

In the roughly 18 months since Lila Tretikov became executive director, the WMF has experienced almost 100% turnover. For some longtime staff, it was probably time to move on anyway. And any incoming leader can be expected to make new hires and rearrange reports to their liking. But the very short tenures of some key hires, and mysterious circumstances surrounding some departures, can’t help but raise questions about whether Tretikov is in command of her personnel—and perhaps even whether she’s the leader Wikipedia needs.

8. Community Tensions Felt in Trustee Elections

The Wikimedia Board of Trustees is the “ultimate corporate authority” of the Wikimedia Foundation, and its number includes three members elected from the volunteer community. The most recent election, held in May, was also the first since a major fight between the foundation and community over software implementation (Media Viewer) and platform control (Superprotect) in 2014. Against this backdrop, disagreements over Wikipedia’s next big software initiative, Flow, became increasingly pronounced, and a few months later the project was shelved.

Perhaps it’s unfair to assume a direct cause-and-effect, but the result seemed to be a “throw the bums out” election. Ousted were Phoebe Ayers, Samuel Klein, and María Sefidari (in fairness, none were “bums”, nor particularly responsible for the problem). In are three respected veterans with the good fortune of non-incumbency: James Heilman, Dariusz Jemielniak, and Denny Vrandečić.

Oddly, the two women ousted received the first and third most votes in favor, but Wikimedia accounts for “oppose” votes, and they had too many of those. Today, just two Board members are women, the lowest representation in Wikipedia’s history.

7. “Wikipedia Hates Women”—or Maybe Just Lightbreather

Wikipedia’s alarmingly low female participation rate is decidedly not a new problem. The issue first came to attention in the late 2000s, as editor surveys confirmed suspicions that Wikipedia was a total brodown. Today, the gender gap remains a frequent topic of debate, including a much-discussed Cracked.com article whence this entry takes part of its name.

The other half of the title comes from what’s called the “Lightbreather” case, focusing on a female editor with this username, and her interactions with, among others, a (male) editor named Eric Corbett. A disinterested appraisal of the case would find plenty of fault with both, although there is not one person in the world who possesses the powers of concentration necessary to follow all of the rabbit holes leading from this single case. Notwithstanding the particulars, it became the subject of a provocative, error-ridden, five-times corrected but nevertheless widely read article in The Atlantic, held up as one example of Wikipedia’s “hostility” to women.

The myriad possible explanations for this problem only open doors to more complicated issues. How much of the gender balance can be attributed to Wikipedia’s rules? Its community? Where is the line between heated disagreements and harassment? How much can be explained by how the web influences behavior? How much is this reflective of the tech industry’s gender gap? Will understanding this question help to explain why other marginalized identities, from Latinos to Africans, contribute to Wikipedia in small numbers? The answers to these questions seem within the reach of comprehension, but beyond the grasp of consensus.

6. A Clockwork Orangemoody

Another perennial topic on Wikipedia is conflict of interest (COI), usually playing out as someone inside Wikipedia or outside writing a self-serving autobiography, a low-rent marketing firm getting in trouble for editing clients’ pages, or sometimes more favorably, a group of PR firms coming together to try to make a good impression. This year, however, brought us something we never quite imagined: a massive extortion plot inverting the typical model of paid editing. Rather than helping paying customers create Wikipedia entries, its operators could simply threaten non-paying “customers” with unflattering articles.

Orangemoody, as it was named for its “ringleader” account, was called the largest case of its kind, but that merely counted the number of involved user accounts (nearly 400). The truth is, there has never been anything quite like it. Previous cases revolved around unscrupulous firms like Wiki-PR and WikiExperts who at least professed to be offering their clients a service. Orangemoody was a shakedown involving pages held for ransom, impersonation of Wikipedia administrators, and no real-world entity to absorb the blame. Orangemoody is so threatening because it suggests that Wikipedia’s open-editing model opens the door not just to unethical, if conceivable, shenanigans, but also to transgressions that are much more horrifying.

5. The Luck of Grant Shapps

Next to Orangemoody, there’s something almost comforting about the familiar narrative of alleged self-interested editing of Wikipedia by Tory MP Grant Shapps and the plot twist that brought his accuser to (relative) ignominy and ruin.

Amid the UK parliamentary elections this spring, a report emerged in the left-leaning Guardian, prompted by an allegation by a Wikimedia UK administrator, that Shapps had used a pseudonymous account to massage his own Wikipedia profile while giving a drubbing to others. It seemed plausible: Shapps had admitted to editing his own biography years ago, and using assumed names in other circumstances, and his side career as an Internet executive aided the narrative.

But the tables soon turned: the right-leaning Telegraph revealed that there was no smoking gun connecting Shapps to the suspicious edits, that the Wikipedia administrator, Richard Symonds, was in fact a Lib Dem activist who had communicated with the Guardian prior to taking action, and Wikipedians soon became concerned that Symonds may have abused his administrative privileges in blocking the suspicious account.

In the end, Symonds lost his adminship, and Shapps exited a succession of positions within the Conservative Party and government. All that’s missing is Keyser Söze shrugging off his limp and lighting a cigarette.

4. Wikipedia’s Big Picture Trends in Flux

After a long period of sustained narratives about Wikipedia’s traffic and editing trends, this year things got a little interesting. Following unabated growth in global traffic to Wikipedia, given a boost in recent years by the proliferation of web-enabled mobile devices, overall traffic actually fell for the first time. Meanwhile, after almost a decade of resignation to Wikipedia’s ever-dwindling editor base—a decline perhaps also attributable to the adoption of mobile devices—the numbers ticked upward.

An August report from an SEO analysis firm showed that Wikipedia’s search referrals from Google fell by up to 20% since the beginning of the year. Most speculation focused on Google’s ever-advancing practice of answering search queries on the results page, obviating the need to click through to non-Google websites. This has bedeviled companies like Yelp, which compete with Google to serve up reviews while also depending upon it for traffic. For Wikipedia, the situation is more complicated, and perhaps less of an issue. After all, a significant portion of Google’s answers are powered by Wikimedia projects. In fact, beginning in late 2014, Google wound down its own open knowledge database, Freebase, in favor of Wikidata. And Google still recommends more Wikimedia sites than it recommends Google sites.

Also in August, the first hard data emerged to show that the long, slow decline of active (and “very active”) Wikipedia editors had been arrested—and is now trending the other way, if ever so slightly. As close Wikipedia observers know too well, Wikipedia attained its zenith participation rate in 2007, arguably the high point for the project’s activity and excitement overall, after which the lowering tide revealed consternation and even alarm, with nobody knowing where it would end. Well, maybe here? The number of very active editors (those with at least 100 edits monthly), Wikipedia’s most valuable contributors, stabilized in 2014 and actually grew in 2015. The decline of administrators, coupled with the difficulty in admitting new ones in recent years, however, remains an issue.

In both cases, more data is surely needed before we can say what it really means.

3. English Wikipedia Hits 5 Million Articles

Admittedly, most of these top stories are unhappy ones, and the one just above is arguably mixed, but this one is unambiguously celebratory: on November 1, Wikipedia’s English language edition—by far its most popular, and synonymous with “Wikipedia” for most readers—notched its 5 millionth article.

Wikipedia has been the largest encyclopedia by any reasonable measure for a long while, so nothing has really changed. And it took seven years for Wikipedia to double in size, so if growth trends continue holding steady for now, we might not have a similar milestone to celebrate until sometime in the next decade. Meanwhile, sheer heft is easier to measure than other important characteristics, like accuracy or completeness, so this benchmark will remain Wikipedia’s equivalent of McDonald’s “Billions Served” for the foreseeable future. It may be an arbitrary measurement, but it’s a damned impressive one.

Number 5,000,000 itself: Persoonia terminalis, a rare shrub native to eastern Australia. Oh, and if you haven’t seen the RfC debating which temporary logo Wikipedia should display on the joyous day, I very much recommend taking a look at the near misses. Perhaps it will instill some faith in Wikipedia’s community processes if you agree the best logo won (and you should).

2. It’s About Ethics in Gamergate Opposition

In late 2014 and into the start of this year, the loosely-affiliated right-wing counterpart to the left-ish Anonymous expanded its focus from video game journalists to include the Wikipedia entries where said journalists’ critical takes had accumulated. Organizing on Reddit and other forums, the ‘gaters created numerous throwaway Wikipedia accounts to first try swinging Wikipedia’s coverage of their movement and a few of their top targets around to their liking and, when that failed, they took on Wikipedia editors directly.

Wikipedians fought back hard—too hard, in some cases—and when Wikipedia’s Arbitration Committee got around to handing out punishments, the only ones with anything to lose were the Wikipedia editors who cared. It also fed into the above-discussed ongoing trouble over Wikipedia’s treatment of gender issues, and was by far the year’s biggest blow-up along such lines, far greater than the argument over how to handle Caitlyn Jenner’s gender transition, which still lay ahead.

It’s hard to say if Gamergate is a 100-year flood (although on the Internet, the time frame may be more like 100 months) or a sign of things to come. Wikipedia has faced trolls before, but few have been as dedicated or as destructive as the ones beneath the Gamergate bridge. The best defense is a strong base of committed Wikipedians, and perhaps this year shows us they’ll probably still be around to carry the sandbags and shore up the levees.

1. China, Russia, and Completing the HTTPS Transition

One aspect of Wikipedia’s global prominence that the foundation and movement alike have struggled to fully grasp is the role it can, should, and does play on the international stage. This year, the Wikimedia Foundation joined forces with the ACLU to sue the National Security Agency over its mass surveillance practices, only for the case to be thrown out by a federal court. As important as that fight may be, it is but one jurisdiction of many where Wikipedia has become a proxy for privacy and free speech battles, not to mention authoritarian power grabs.

In 2015, Wikipedia’s multi-year plan to convert all traffic moving through Wikimedia servers to the HTTPS encryption protocol was finally completed. HTTPS was first enabled for WMF sites in 2011, then became the default for logged-in users in 2013, and this year was finally made the default for all traffic, including readers without a Wikipedia account. This is a good thing for Internet users who wish to access Wikipedia without their governments knowing about it. But it’s complicated when governments decide to shut off access altogether.

Indeed, the full implementation of HTTPS prevents governments like China from blocking access to specific entries—such as Tiananmen Square protests of 1989—and instead they have to choose between allowing all traffic, or blocking the site entirely. China opted for the latter. To be sure, Wikipedia wasn’t the biggest collaborative online encyclopedia in the PRC—it wasn’t even the second—and China’s Communist Party seems to be perfectly content promoting its homegrown versions of Google, Facebook and Twitter. In December, Wikipedia’s famous co-founder, Jimmy Wales, traveled to China to participate in an Internet conference, where his comments about the limitations of the state’s ability to control the Internet were intentionally lost in translation, as the Wall Street Journal reports.
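For readers who want to see why HTTPS turns censorship into an all-or-nothing choice, here is a minimal Python sketch. It assumes a deliberately simplified model: over plain HTTP an on-path observer can read the full URL, while over HTTPS it can typically read only the hostname (through DNS lookups and the TLS handshake). The function names `visible_to_censor` and `can_block_single_article` are my own illustrations, not part of any real filtering system.

```python
from urllib.parse import urlsplit


def visible_to_censor(url: str) -> dict:
    """Return the parts of a URL an on-path observer can typically read.

    Simplified model (an assumption of this sketch): plain HTTP exposes
    host and path, while HTTPS exposes only the host.
    """
    parts = urlsplit(url)
    if parts.scheme == "http":
        return {"host": parts.hostname, "path": parts.path}
    # Under HTTPS the path travels inside the encrypted channel.
    return {"host": parts.hostname, "path": None}


def can_block_single_article(url: str) -> bool:
    """A per-article block needs the path; without it, only the whole host can be blocked."""
    return visible_to_censor(url)["path"] is not None


article = "en.wikipedia.org/wiki/Tiananmen_Square_protests_of_1989"
print(can_block_single_article("http://" + article))   # path is readable
print(can_block_single_article("https://" + article))  # only the host is readable
```

Under this model, once every reader arrives over HTTPS, the only lever left to a censor is the hostname, which is the choice described above.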

A similar issue is ongoing in Russia, where the government’s media authority, Roskomnadzor, has weighed blocking access to the Russian-language Wikipedia over its entries about illegal drugs, and briefly did block reader access. It may also be attempting to co-opt Russian-language editors, presenting further challenges to the independence of the Wikimedia project among Russian-language contributors.

It’s unclear what Russia will decide to do, but it seems safe to assume that China will hold the line for the foreseeable future. In both countries, and under still more repressive regimes—like Kazakhstan and Azerbaijan—independent websites and even independent political parties and religious movements are allowed to operate only at these governments’ discretion. Why should Wikipedia be any different?

♦     ♦     ♦

And this seems like a perfectly good place to leave it. More often than not, Wikipedia’s issues reflect issues that animate and plague society and the Internet writ large. Open knowledge and digital discourse create incredible opportunities for research and innovation, but also bestow tremendous power on the platforms and communities that effectively control the gates. The problems on Wikipedia aren’t that different from those on Reddit or Twitter; they just feel more significant given the site’s mandate and perceived authority. To understand Wikipedia’s successes and failures, we have to look to ourselves for the answer.

If you liked this post, don’t forget to check out its companion piece at The Ink Tank: “Ten Predictions for Wikipedia in 2016”.

All images via Wikimedia Commons except Gamergate logo, source unknown.

The Agony and Ecstasy of Wikidata

Posted on April 12, 2012 at 8:31 am

Although Wikipedia is by far the best-known of the Wikimedia collaborative projects, it is just one of many. Just this last week, Wikimedia Deutschland announced its latest contribution: Wikidata (also @Wikidata, and see this interview in the Wikipedia Signpost). Still under development, its temporary homepage announces:

Wikidata aims to create a free knowledge base about the world that can be read and edited by humans and machines alike. It will provide data in all the languages of the Wikimedia projects, and allow for the central access to data in a similar vein as Wikimedia Commons does for multimedia files. Wikidata is proposed as a new Wikimedia hosted and maintained project.

One of a few Wikidata logos under consideration.

Upon its announcement, I tweeted my initial impression, that it sounded like Wikipedia’s answer to Wolfram Alpha, the commercial “answer engine” created by Stephen Wolfram in 2009. It seems to partly be that but also more, and its apparent ambition—not to mention the speculation surrounding it—is causing a stir.

Already touted by TechCrunch as “Wikipedia’s next big thing” (incorrectly identifying Wikipedia as its primary driver, I pedantically note), Wikidata will create a central database for the countless numbers, statistics and figures currently found in Wikipedia’s articles. The centralized collection of data will allow for quick updates and uniformity of statistical information across Wikipedia.

Currently, when new information replaces old, as happens when census surveys, election results and quarterly reports are published, Wikipedians must manually update the old data in all the articles in which it appears, across every language. Wikidata would create the possibility of a quick, computer-led update to replace all out-of-date information. Additionally, it is expected that Wikidata will allow visitors to search and access information in a less labor-intensive way. As TechCrunch suggests:

Wikidata will also enable users to ask different types of questions, like which of the world’s ten largest cities have a female mayor?, for example. Queries like this are today answered by user-created Wikipedia Lists – that is, manually created structured answers. Wikidata, on the hand, will be able to create these lists automatically.
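To make the TechCrunch example concrete, here is a minimal Python sketch of how such a list could be generated from structured records instead of being compiled by hand. Everything in it is an assumption for illustration: the `City` record, the function name, and the placeholder cities are made up, and Wikidata’s actual data model and query interface were still under development at the time of writing.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class City:
    name: str
    population: int
    mayor_gender: str  # "female", "male", or "unknown"


def largest_cities_with_female_mayor(cities, top_n=10):
    """Among the top_n most populous cities, return the names of those with a female mayor."""
    largest = sorted(cities, key=lambda c: c.population, reverse=True)[:top_n]
    return [c.name for c in largest if c.mayor_gender == "female"]


# Made-up placeholder records, not real statistics.
CITIES = [
    City("City A", 9_000_000, "male"),
    City("City B", 7_500_000, "female"),
    City("City C", 4_000_000, "female"),
    City("City D", 1_200_000, "female"),
]

print(largest_cities_with_female_mayor(CITIES, top_n=3))  # ['City B', 'City C']
```

The point of the sketch is that the list is recomputed from the data on demand: correct one record and every list built on it changes with it, with no manual list-keeping.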

This project, which is funded by the Allen Institute for Artificial Intelligence, the Gordon and Betty Moore Foundation, and Google, is expected to take about a year to develop, but the blogosphere is already buzzing.

It’s probably fair to say that the overall response has been very positive. In a long post summarizing Wikidata’s aims, Yahoo! Labs researcher Nicolas Torzec identifies himself as one who excitedly awaits the changes Wikidata promises:

By providing and integrating Wikipedia with one common source of structured data that anyone can edit and use, Wikidata should enable higher consistency and quality within Wikipedia articles, increase the availability of information in and across Wikipedias, and decrease the maintenance effort for the editors working on Wikipedia. At the same time, it will also enable new types of Wikipedia pages and applications, including dynamically-generated timelines, maps, and charts; automatically-generated lists and aggregates; semantic search; light question & answering; etc. And because all these data will be available as Open Data in a machine-readable form, they will also benefit thrid-party [sic] knowledge-based projects at large Web companies such as Google, Bing, Facebook and Yahoo!, as well as at smaller Web startups…

Asked for comment by CNet, Andrew Lih, author of The Wikipedia Revolution, called it a “logical progression” for Wikipedia, even as he worries that Wikidata will drive away Wikipedians who are less tech-savvy, as it complicates the way in which information is recorded.

Also cautious is SEO blogger Pat Marcello, who warns that human error is still a very real possibility. She writes:

Wikidata is going to be just like Wikipedia in that it will be UGC (user-generated content) in many instances. So, how reliable will it be? I mean, when I write something — anything from a blog post to a book, I want the data I use in that work to be 100% accurate. I fear that just as with Wikipedia, the information you get may not be 100%, and with the volume of data they plan to include, there’s no way to vette [sic] all of the information.

Fair enough, but of course the upside is that corrections can be easily made. If one already uses Wikipedia, this tradeoff is very familiar.

The most critical voice so far is Mark Graham, an English geographer (and a fellow participant in the January 2010 WikiWars conference) who published “The Problem with Wikidata” on The Atlantic’s website this week:

This is a highly significant and hugely important change to the ways that Wikipedia works. Until now, the Wikipedia community has never attempted any sort of consistency across all languages. …

It is important that different communities are able to create and reproduce different truths and worldviews. And while certain truths are universal (Tokyo is described as a capital city in every language version that includes an article about Japan), others are more messy and unclear (e.g. should the population of Israel include occupied and contested territories?).

The reason that Wikidata marks such a significant moment in Wikipedia’s history is the fact that it eliminates some of the scope for culturally contingent representations of places, processes, people, and events. However, even more concerning is that fact that this sort of congealed and structured knowledge is unlikely to reflect the opinions and beliefs of traditionally marginalized groups.

The comments on the article are interesting, with some voices sharing Graham’s concerns, while others argue his concerns are overstated:

While there are exceptions, most of the information (and bias) in Wikipedia articles is contained within the prose and will be unaffected by Wikidata. … It’s quite possible that Wikidata will initially provide a lopsided database with a heavy emphasis on the developed world. But Wikipedia’s increasing focus on globalization and the tremendous potential of the open editing model make it one of the best candidates for mitigating that factor within the Semantic Web.

Wikimedia’s and Wikipedia’s slant toward the North, the West, and English speakers is well covered in Wikipedia’s own list of its systemic biases, and Wikidata can’t help but face the same challenges. Meanwhile, another commenter argued:

The sky is falling! Or not, take your pick. Other commenters have made more informed posts than this, but does Wikidata’s existence force Wikipedia to use it? Probably not. … But if Wikidata has a graph of the Israel boundary–even multiple graphs–I suppose that the various Wikipedia authors could use one, or several, or none and make their own…which might get edited by someone else.

Under the canny (partial) title of “Who Will Be Mostly Right … ?” on the blog Data Liberate, Richard Wallis writes:

I share some of [Graham’s] concerns, but also draw comfort from some of the things Denny said in Berlin – “WikiData will not define the truth, it will collect the references to the data…. WikiData created articles on a topic will point to the relevant Wikipedia articles in all languages.” They obviously intend to capture facts described in different languages, the question is will they also preserve the local differences in assertion. In a world where we still can not totally agree on the height of our tallest mountain, we must be able to take account of and report differences of opinion.

Evidence that those behind Wikidata have anticipated a response similar to Graham’s can be found on the blog Too Big to Know, where technologist David Weinberger shared a snippet of an IRC chat he had with a Wikimedian:

[11:29] hi. I’m very interested in wikidata and am trying to write a brief blog post, and have a n00b question.
[11:29] go ahead!
[11:30] When there’s disagreement about a fact, will there be a discussion page where the differences can be worked through in public?
[11:30] two-fold answer
[11:30] 1. there will be a discussion page, yes
[11:31] 2. every fact can always have references accompanying it. so it is not about “does berlin really have 3.5 mio people” but about “does source X say that berlin has 3.5 mio people”
[11:31] wikidata is not about truth
[11:31] but about referenceable facts

The compiled phrase “Wikidata is not about truth, but about referenceable facts” is an intentional echo of Wikipedia’s oft-debated but longstanding allegiance to “verifiability, not truth”. Unsurprisingly, this familiar debate is playing itself out around Wikidata already.
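The “referenceable facts” idea from the chat above can be sketched as a small data structure. This is only an illustration under my own assumptions: the `Claim` and `Item` classes, the method names, and the second source with its figure are hypothetical, and Wikidata’s eventual design may look quite different.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Claim:
    value: str
    source: str  # every claim carries the reference that states it


@dataclass
class Item:
    label: str
    claims: dict = field(default_factory=dict)  # property name -> list of Claim

    def add_claim(self, prop, value, source):
        """Record that `source` states `value` for `prop`; disagreeing claims coexist."""
        self.claims.setdefault(prop, []).append(Claim(value, source))

    def sources_saying(self, prop, value):
        """Answer 'which sources say X?' rather than 'is X true?'."""
        return [c.source for c in self.claims.get(prop, []) if c.value == value]


berlin = Item("Berlin")
berlin.add_claim("population", "3.5 million", "Source X")
berlin.add_claim("population", "3.4 million", "Source Y")  # hypothetical second source

print(berlin.sources_saying("population", "3.5 million"))  # ['Source X']
```

Storing a list of sourced claims per property, rather than a single value, is what would let differing figures sit side by side instead of forcing one “truth”.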

Thanks for research assistance to Morgan Wehling.