William Beutler on Wikipedia

Archive for the ‘Visualization’ Category

Wiki: The Story of a Word

on June 3, 2013 at 3:13 pm

Only very occasionally do I write about myself or business pursuits, and I won’t waste too much of your time here, but those of you who follow me in a non-The Wikipedian capacity may be aware that the small company I began a few years ago has become just a bit less small in recent months, and that we’ve rebranded as Beutler Ink. (You’ll notice that link goes to our Facebook page; our new website is still about a week away.)

Along with our new name come expanded offerings in creative services and visual communication. Today we’ve launched a project for our own fun and your edification, which has a decidedly Wikimedia-friendly angle. It’s a vertical infographic about the evolution of the meaning and usage of the word “wiki” called Wiki: A Word’s Journey. Click on the title or the preview graphic below to see the full thing, along with a blog post expanding on the topic at the Beutler Ink Tumblr:


Trendy Thinking: Contemplating Wikipedia Contributorship

on March 17, 2011 at 8:49 am

Last week, the Wikimedia Foundation published some early results in an ongoing study of trends in editor participation, both in a detailed analysis by the survey’s leaders and a general summary by Wikimedia executive director Sue Gardner. I’d actually started writing a summary of my own before I read Gardner’s letter… only to find that Gardner had already made the exact same “Eternal September” comparison as I had planned. (Which makes sense, since I first learned of the term from Wikipedia.) Anyhow, both are worth reading if you are so inclined, but here’s a key excerpt from Gardner’s summary:

Between 2005 and 2007, newbies started having real trouble successfully joining the Wikimedia community. Before 2005 in the English Wikipedia, nearly 40% of new editors would still be active a year after their first edit. After 2007, only about 12-15% of new editors were still active a year after their first edit. Post-2007, lots of people were still trying to become Wikipedia editors. What had changed, though, is that they were increasingly failing to integrate into the Wikipedia community, and failing increasingly quickly. The Wikimedia community had become too hard to penetrate.

In the first half of Wikipedia’s first ten years, it experienced exponential growth in the absolute number of editors, from barely 100 active participants in 2001 to about 44,000 in 2006. The community continued to grow in 2007, cresting at nearly 52,000 active editors. Interestingly, though, 2007 brought fewer new editors: the peak owed to a one-year spike in retention. Thereafter, the number of total editors (and new editors) has dropped each year, with about 33,000 active contributors in 2010. Granted, that’s a pretty big drop. While it hasn’t bottomed out, it does seem to be stabilizing.
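For a rough sense of scale, the declines implied by these figures can be worked out directly. This is a quick sketch using only the numbers quoted from the study; the 2007 peak is described as "nearly 52,000," so the 52,000 used here is an approximation.

```python
# Active-editor counts for the English Wikipedia, as quoted from the
# Wikimedia study (the 2007 peak of "nearly 52,000" is rounded up here).
active_editors = {2006: 44_000, 2007: 52_000, 2010: 33_000}

def pct_change(old: int, new: int) -> float:
    """Percentage change from old to new, rounded to one decimal place."""
    return round((new - old) / old * 100, 1)

drop_from_peak = pct_change(active_editors[2007], active_editors[2010])
drop_from_2006 = pct_change(active_editors[2006], active_editors[2010])

print(f"2007 peak to 2010: {drop_from_peak}%")  # 2007 peak to 2010: -36.5%
print(f"2006 to 2010: {drop_from_2006}%")       # 2006 to 2010: -25.0%
```

In other words, roughly a third of the peak-year editing population is gone, and a quarter of the 2006 level.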

At the moment, Wikipedia has somewhat fewer editors than it had in 2006 and more than double the editors it had in 2005. But it only has slightly more new editors than it did that year: about 13,000 in 2010 compared to 12,200 new editors in 2005. For a better understanding of these trends, see this chart prepared for the survey:

Gardner continues:

Our new study shows that our communities are aging, probably as a direct result of these trends. I don’t mean that the average age of editors is increasing: I’m talking about tenure. Newbies are making up a smaller percentage of editors overall than ever before, and the absolute number of newbies is dropping as well. That’s a problem for everyone, because it means that experienced editors are needing to shoulder an ever-increasing workload, and bureaucrat and administrator positions are growing ever-harder to fill.

My initial reaction is to say this is not necessarily a problem. Yes, over time the proportion of new editors is shrinking, but this is the flip side of editor retention. The community “growing older” as a proportion of all editors does not necessarily mean the number of editors is getting smaller, but rather that longtime editors are sticking around. Except that the community actually is getting smaller.

How many Wikipedians does the community really need to sustain itself? This is another open question. Some editors may point to the rapid development of impressive new articles such as Fukushima I nuclear accidents, while interesting but less timely articles (let’s pick on the Assassination Records Review Board) languish.

If you joined Wikipedia in 2004, there is about a 40% chance you were still editing Wikipedia after one year. All things considered, that’s a pretty solid number, but that’s about as good as it got: by mid-2005 those retention rates started plummeting. If you joined in early 2007, there was about a 15% chance you were still editing after one year. Interestingly, the drop in retention more or less coincides with the explosion in new contributors: new editorship grew most between early 2005 and early 2007; the drop in retention begins about the same time and continued falling into the middle of 2007.

This makes some sense: those were the years with the greatest number of new editors, so it is not surprising that a larger number would wash out. On the other hand, even as trends have stabilized, only about 10% of editors who joined in 2009 are still editing today. That’s a pretty remarkable drop-off in retention, and so the classes of 2004 and 2009 today have about the same number of editors currently active.

Why the drop-off? Hard to say, but as the study’s authors put it: “[W]e do know something drastically changed during this time period, which corresponds to the period of massive influx of New Wikipedians.” This almost sounds like the influx of new editors drove the old ones out, although there’s no way to know that. So this raises an interesting question: were all those new editors necessarily good for the community?

For a snapshot of editor participation trends based on which year one joined, see this chart:

Wikipedia is an incredible resource and, like natural resources, it needs to be both developed and preserved. That means more editors are needed, and this study is just one step in a long process of figuring out how best to do that. Fortunately, there is time.

Wikipedia’s Most Wanted

on February 13, 2011 at 9:06 am

With more than 3.5 million articles on the English-language Wikipedia, it’s almost difficult to believe there could be much left to write. Although Wikipedia’s “hockey stick” growth has begun to slow down somewhat, the truth is that it still is growing very quickly—the English Wikipedia passed 3 million articles last August, and may well hit 4 million this year.

Indeed gaps do remain, and finding them can be a challenge. Now Magnus Manske, one of Wikipedia’s longest-active contributors (and the programmer of many different cool things) has created (actually, re-created after a long absence) a new DIY service called “Most wanted articles”.

Manske’s tool searches Wikipedia for “redlinks”. You’ve probably seen these around Wikipedia, and they are what they sound like: anchor text which is colored red because there is no article behind it. By contrast, links on Wikipedia that are colored blue will actually take you somewhere. Redlinks are sometimes considered unsightly, and they can be, if overused. Used selectively, they can highlight new subjects possibly deserving of new Wikipedia articles. Until that time comes—theoretically speaking—one can determine which are the “most wanted” by counting redlinks.
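The counting idea is simple enough to sketch. What follows is not Manske’s code; it is a minimal illustration that assumes the link data is already available as (linking page, target title) pairs, along with the set of titles that have articles, and it counts each linking page once per missing target. The function name and the toy data are invented for illustration.

```python
from collections import Counter

def most_wanted(links, existing_titles, top_n=10):
    """Rank missing titles by how many distinct pages link to them (hypothetical helper).

    links: iterable of (source_page, target_title) pairs.
    existing_titles: set of titles that already have articles.
    """
    # A link is a "redlink" when its target has no article. Deduplicating the
    # pairs means a page that links to the same missing title twice counts once.
    counts = Counter(
        target for _source, target in set(links) if target not in existing_titles
    )
    # Sort by count (descending), then alphabetically so ties come out stable.
    ranked = sorted(counts.items(), key=lambda item: (-item[1], item[0]))
    return ranked[:top_n]

# Invented toy data, for illustration only.
existing = {"Article A", "Article B"}
links = [
    ("Article A", "Bazinaprine"),
    ("Article A", "Bazinaprine"),  # same page linking twice: counted once
    ("Article B", "Bazinaprine"),
    ("Article B", "Tetrindole"),
    ("Article B", "Article A"),    # blue link: ignored
]
print(most_wanted(links, existing))  # [('Bazinaprine', 2), ('Tetrindole', 1)]
```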

What follows is a list of the most-wanted Wikipedia articles, as of February 7, 2011:

  1. British films of 2011 (1842)
  2. British films of 2012 (1841)
  3. List of Argentine films of 2011 (1712)
  4. Bazinaprine (1204)
  5. Tetrindole (1203)
  6. Sercloremine (1203)
  7. Befol (1203)
  8. Esuprone (1134)
  9. Siddapur, Belgaum (1117)
  10. Milacemide (1059)

More than 1,000 redlinks for each of these topics? How did this happen? The answer is templates, especially “navboxes,” which sit at the bottom of various articles, helping to group topics together. For each of the above-listed non-articles, the redlinks come from one of the following templates: Cinema of the UK, Cinema of Argentina, Dopaminergics and Belgaum district.

It might be more interesting to find out which articles were the most-wanted according to organically-created redlinks in article text, but that’s a bit more challenging; such a list may or may not be forthcoming. That said, the more of these articles created or otherwise dealt with, the closer we’ll get to those ones, further down the list.
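To see why navboxes swamp the count, and what a tally of only “organic” redlinks might look like, here is a hypothetical toy model. It assumes each article’s links can be cleanly split into links typed directly into its text and links contributed by templates it transcludes; the article names and link lists are made up for illustration.

```python
from collections import Counter

# Hypothetical toy model: the links each navbox template contains...
TEMPLATES = {
    "Dopaminergics": ["Bazinaprine", "Tetrindole", "Article A"],
}
# ...and, per article, the links typed into its text plus the templates it uses.
ARTICLES = {
    "Article A": {"text_links": ["Article B"], "templates": ["Dopaminergics"]},
    "Article B": {"text_links": ["Bazinaprine"], "templates": ["Dopaminergics"]},
    "Article C": {"text_links": ["Article A"], "templates": ["Dopaminergics"]},
}

def redlink_counts(include_templates):
    """Count, for each missing title, how many articles link to it."""
    counts = Counter()
    for page in ARTICLES.values():
        targets = set(page["text_links"])
        if include_templates:
            for name in page["templates"]:
                targets.update(TEMPLATES[name])
        # A set means each article counts at most once per missing title.
        counts.update(t for t in targets if t not in ARTICLES)
    return counts

print(sorted(redlink_counts(include_templates=True).items()))
# [('Bazinaprine', 3), ('Tetrindole', 3)]
print(sorted(redlink_counts(include_templates=False).items()))
# [('Bazinaprine', 1)]
```

One navbox carrying a redlink, placed on three articles, yields three counted redlinks; leaving template links out leaves only the one link someone actually wrote into an article. On the real Wikipedia, making that split reliably is presumably where the extra difficulty lies.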

And in fact, as I post this on February 13, 2011, the list has changed as some of these articles have been created—almost certainly based on discussion among Wikipedia editors about this list. The next time you find yourself looking for information about Bazinaprine, a monoamine oxidase inhibitor believed to be useful for the treatment of depression, then you have Manske (and of course the editor who took up the cause) to thank.

The State of The State of Wikipedia

on January 25, 2011 at 4:35 pm

If you follow Wikipedia closely, chances are good that you have seen the following video:

The State of Wikipedia from JESS3 on Vimeo.

Last week, it was featured on both TechCrunch and Mashable and, on YouTube alone, it’s climbing toward 100,000 views as of this writing. And you might have missed the following infographic that went along with it, although I hope you didn’t:


Meanwhile, if you happened to see Jay Walsh’s post on the Wikimedia blog last week—or you watched carefully through to the very end—you may have noticed that among those involved was yours truly.

The story of this video’s development began early in 2010 with the launch of the “State of” video series by my friends at the DC-based creative agency JESS3. The first in the series was “The State of the Internet”; more recently, they produced “The State of Cloud Computing” in association with Salesforce.com.

Seeking new topics, JESS3 invited me to develop a story concept for the video you see above. I talked with some influential wiki-thinkers, some of whose names appear in “Special Thanks” at the video’s end, and wrote a script for the eventual narrator. Not unlike Dan Aykroyd’s first draft of “The Blues Brothers”—and like it in only this regard—it was much longer than what you see above. Left out were asides on the cause (and effects) of the Spanish Fork, the German-language Wikipedia’s different way of doing things, the development of chapters, the invention of bots, the most-visited Wikipedia articles, the most-visited-in-a-single-day Wikipedia article, and more.

In the end, it was a good thing they asked me to scale it back, especially once Jimmy Wales agreed to provide the voice as narrator. And the shorter version perhaps better accomplishes the goal of giving viewers a bit of an answer to the questions of where Wikipedia came from, and why it works the way it does. At the very least, I hope it sparks a deeper curiosity among viewers and, perhaps, sufficient interest to get involved themselves.

Who knows if it will have that effect, but it was a great experience to be part of. The effort put into this by the JESS3 team—on art direction, animation and sound—was tremendous, and took it far beyond any concept I had of what it could become. And maybe we’ll do it again in ten years.

Charted Territory: When Good Infographics Go Bad

on August 12, 2010 at 8:32 pm

I will be blunt: the new infographic from David McCandless (Information is Beautiful), called “Articles of War: Wikipedia’s lamest edit wars”, is so lazy as to be misleading, so glib as to be condescending, and so generally unhelpful that I’m inclined to say it sets back the public understanding of how Wikipedia works all by itself.

Up front: I respect McCandless and like what he does, which includes some interesting and thoughtful work, especially his print of Left vs. Right (U.S. and Rest of the World editions) that is better than most professional political analysts could produce. Separately, I am collaborating with friends on a Wikipedia visualization project of our own, so call me an interested observer, but note also that I’ve been thinking about this kind of thing lately.

I have reproduced only the top section of “Articles of War” below, for the purposes of commentary (click through to see the full thing on McCandless’ site):

Articles of War (excerpt)

The first thing to know about “Articles of War” is that it was based on an essay to be found in the recesses of Wikipedia called “Lamest edit wars” that is specifically kept in the site’s intra-wiki space because, as it states at the top: “This page contains material that is kept because it is considered humorous.” McCandless & Co. do give credit where it is due, but that Wikipedia page surely does not and never did intend to be definitive — it’s just a series of cheekily-written paragraphs about various arguments occurring over time, so there is nothing like meaningful numbers to be gleaned from it.

Instead, McCandless and his researchers decided to generate data to visualize these edit wars by counting the total number of edits over each article’s lifetime: not just the edits specifically related to that particular dispute (a difficult and time-consuming thing to research, it goes without saying) but every single edit, ever, thereby giving a grossly distorted view of each article’s history. I’ll grant them that the legend in the top left-hand corner indicates that the number listed (and, I presume, the size of each box) relates to the “Total no. of edits,” but even if readers do notice that, it is at best confusing.
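The gap between the two methodologies is easy to see in miniature. In this hypothetical sketch, an article’s revision history is reduced to a list of dates, and the dispute window is an assumption the researcher would have to establish by actually reading the article’s history and talk page (the time-consuming part). The helper name and the history are invented.

```python
from datetime import date

def edit_counts(revision_dates, window_start, window_end):
    """Return (total edits ever, edits inside the dispute window).

    revision_dates: one date per revision of the article.
    window_start, window_end: inclusive bounds of the dispute, which a
    researcher would have to establish by hand (hypothetical here).
    """
    total = len(revision_dates)
    in_window = sum(1 for d in revision_dates if window_start <= d <= window_end)
    return total, in_window

# Invented history: an article created in 2004 whose dispute lasted a few days.
history = [
    date(2004, 1, 5), date(2005, 6, 1), date(2008, 9, 1),
    date(2008, 9, 2), date(2008, 9, 3), date(2010, 3, 1),
]
print(edit_counts(history, date(2008, 8, 29), date(2008, 9, 30)))  # (6, 3)
# The earliest revision dates the article's creation, not the dispute.
print(min(history))  # 2004-01-05
```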

Likewise, the articles’ relative positions on the chart correspond to when each article was created, not when the described dispute took place. If you think 2,000+ edits were expended on a photograph in the Cow-tipping article in the middle of 2001, that’s too bad, but you were reasonably misled. Nor would you know that the article did not include a photograph until several years later.

What you are left with is a decent visualization of how frequently edited some randomly selected articles — some popular, some timely, some but not all controversial — happen to be. Why not simply show that? Focusing on this alone we can see that the following articles have attracted tens of thousands of edits over the years:

  • The Beatles
  • Jesus
  • Wikipedia
  • Christianity
  • Ann Coulter
  • Star Wars
  • Wii

That’s not linkbait enough for you? Then please do the research.

Meanwhile, the infographic is also a little too snarky for its own good, especially toward its chosen subject. Color-coding is used to categorize certain types of edit wars; one is labeled “American Cultural Superiority” and exists mainly to identify debates between U.S. and British spellings. Which I find a little… superior itself, but hey, I suppose it’s a misdemeanor violation. Worse is that edit wars involving Wikipedia and site co-founder Jimmy Wales are coded as “Religion.” Too cute. Or maybe just an oversight?

Another oversight concerns an on-wiki debate about whether the most famous Palin was, at the time of its occurrence, Monty Python’s Michael or Alaska’s former governor Sarah. (Since then, I believe the one with decades of contributions to comedy has been definitively usurped by the mavericky one’s more recent, er, contributions.) According to “Articles of War” this happened in 2003. But if you think about it, this makes no sense at all — of course this happened in 2008, when John McCain chose Sarah Palin as his running mate. And the Lamest edit wars essay itself mentions that this happened in 2008. Pure oversight to be sure, but I have to wonder what other mistakes the research team made.

To their partial credit, they have opened their Google Spreadsheets for public inspection, so it’s clear they at least intended to impart real information. And there you can see that they are indeed using the total number of edits over time and that their “Palin” error was made early on. That seems to put the responsibility on the researchers, rather than McCandless himself, but of course it’s a total package.

I hold McCandless to a standard that I don’t hold the jokers at Cracked* or Something Awful to, because their job is to make you laugh, while McCandless’ job, according to his website’s own tagline, is to take “issues, ideas, knowledge, data” — and make them easier to understand by visualizing them. There are certainly issues and ideas to be found in “Articles of War” — but knowledge and data, not so much. And though I am getting a little more rant-y than usual about this, I do aim to be constructive, so I would very much like to see this infographic redone with some extra research. This blog post may serve as a guide if they so choose. I hope they do.

P.S. The Gizmodo thread on this (which is where I found it) is hilarious, with many people re-fighting the same disputes that once arose on Wikipedia. However, only one commenter that I saw came anywhere near noticing that the methodology was suspect.

P.P.S. Am I being nitpicky to add that “Articles of War” appears to convey that Wikipedia’s articles about The Beatles and Jesus were created prior to 2001? That is to say before Wikipedia itself began? I don’t actually think so.

*Actually, about Cracked — a.k.a. Digg’s favorite website — as I have seen a prominent Wikipedian point out elsewhere, it often does a pretty good job using information from Wikipedia responsibly. Among their articles about Wikipedia, the title of “5 Terrifying Bastardizations of the Wikipedia Model” alone gives away that it’s implicitly pro-Wikipedia, as does “5 Celebrity Wikipedia Entries they Clearly Wrote Themselves”. Even “8 Most Needlessly Detailed Wikipedia Entries” knows what’s good about Wikipedia, even when it isn’t. Cracked writers clearly know their way down through a history page — like say, Corey Feldman’s — but it doesn’t appear that McCandless and his researchers looked as closely.

Change Your Wikitude

on October 4, 2009 at 10:00 am

Because I work in social media, every so often I’ll get the question: So, what’s the big new thing? For a couple of years now the answer has been “Twitter,” but the micro-blogging service finally “arrived” in early 2009, so I’ve needed a new answer. Lately, I’ve settled on “augmented reality.” As Wikipedia describes it:

Augmented reality (AR) is a term for a live direct or indirect view of a physical real-world environment whose elements are merged with-, or augmented by virtual computer-generated imagery – creating a mixed reality.

I.e. Terminator-vision, more or less. Now that the iPhone, Android-enabled devices and many more smartphones on the way have cameras and GPS (and, in the iPhone 3GS, a compass), it becomes possible to determine where someone is and what they are looking at, and to serve up information to them on the spot. And it’s a no-brainer to imagine that one of the first information resources likely to be used is Wikipedia—especially considering how many articles about real-world objects contain geographic coordinates for their subjects (for this you can thank the people at WikiProject Geographical coordinates).
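As a rough sketch of the geometry involved (not how Wikitude or any particular app actually works), an app could compute each geotagged article’s distance and compass bearing from the phone’s GPS position, then keep only those within a chosen radius and inside the camera’s field of view. The haversine formula is a standard way to get great-circle distance; the function names, the 1,000-metre radius and the 60-degree field of view are hypothetical choices, and the points of interest are invented.

```python
import math

EARTH_RADIUS_M = 6_371_000  # mean Earth radius in metres

def distance_and_bearing(lat1, lon1, lat2, lon2):
    """Haversine distance (m) and initial compass bearing (deg) from point 1 to point 2."""
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dlat = p2 - p1
    dlon = math.radians(lon2 - lon1)
    a = math.sin(dlat / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dlon / 2) ** 2
    dist = 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))
    y = math.sin(dlon) * math.cos(p2)
    x = math.cos(p1) * math.sin(p2) - math.sin(p1) * math.cos(p2) * math.cos(dlon)
    bearing = (math.degrees(math.atan2(y, x)) + 360) % 360
    return dist, bearing

def visible_pois(user_lat, user_lon, heading, pois, radius_m=1000, fov_deg=60):
    """Names of POIs within radius_m and inside the camera's (hypothetical) field of view."""
    result = []
    for name, lat, lon in pois:
        dist, bearing = distance_and_bearing(user_lat, user_lon, lat, lon)
        # Smallest angle between the compass heading and the POI's bearing,
        # handling wraparound (e.g. heading 350 vs. bearing 0 is 10 degrees).
        offset = abs((bearing - heading + 180) % 360 - 180)
        if dist <= radius_m and offset <= fov_deg / 2:
            result.append(name)
    return result

# Invented points of interest: (name, latitude, longitude).
pois = [("North", 0.005, 0.0), ("South", -0.005, 0.0), ("Far north", 0.05, 0.0)]
print(visible_pois(0.0, 0.0, 0.0, pois))    # ['North']
print(visible_pois(0.0, 0.0, 180.0, pois))  # ['South']
```

Widening `radius_m` brings "Far north" (about 5.6 km away) into range, which is the kind of scan-distance setting such an app might expose.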

Just this week a program called Wikitude, available to Android users for several months, hit the iTunes store. Wikitude actually pulls information from elsewhere too, but as the name implies, Wikipedia is a key resource. Ben Parr at Mashable explains:

The app, which only works on the iPhone 3GS model (since it has a compass), utilizes three layers of information and superimposes them on your iPhone: information from Wikipedia, local reviews from London-based Qype, and finally crowdsourced information from its Wikitude.me website. With it, you can tag any location with personal notes that others can see. You can’t tell me that isn’t awesome.

He is right. I can’t. And Marshall Kirkpatrick at ReadWriteWeb writes:

It’s because Wikitude is so open to user generated content that I find it the most exciting of all the Augmented Reality apps. Unfortunately, none of these apps that I’ve tested on Android are performing fabulously yet – the GPS is just too imprecise and the data too sparse. These are early days though, and even today it’s a lot of fun to look at the world around you through Wiki articles.

As he indicates, Wikitude is not the only player in the game. Another one available for iPhone is Cyclopedia, which I didn’t focus on just because I didn’t want to pay for it (but here is Gizmodo’s review). Wikitude, on the other hand, is available for the low, low price of free. (And as Tom Peterson would say, “free is a very good price.”) I took it for a quick test run at the corner of 18th and Columbia in Washington, DC. Here’s what I saw looking south along 18th Street:


And looking west in the direction of Columbia Heights:


Not displayed here is the ability to adjust the distance it will scan, a list-view of POIs (Points of Interest) and settings, which include the ability to turn on and off the different sources of information as well as different types of information. If you just want information from Wikipedia, it’s just a few taps away. If you want information about shopping and sights but not traffic or towns, you can adjust this as well.

I’m not likely to use this a great deal here in Washington, DC where I’d at least like to think I know what everything is. But when I’m traveling, such as when I visit San Francisco for the first time later this month, I can see myself not only making use of the program but using it enough to move it temporarily onto my first page of apps.

Have you used Wikitude or a similar application? Anything you like or dislike about them? Please share in the comments.

Wikipedia On Dead Tree Redux

on June 20, 2009 at 3:31 pm

More than a week ago I posted a photo that’s been making the rounds lately — and even wound up as the basis for a joke on Conan O’Brien this past week — about a student artist who had created a physical book of Wikipedia’s Featured articles, one taking up approximately 5,000 pages. I noted at the time that the explanatory text

Reproducing Wikipedia in a dysfunctional physical form helps to question its use as an internet resource.

wasn’t terribly satisfying to me, and I asked at the time

Would printing all of Google’s search results also question its use as an Internet resource? Would printing an image of a sundial question its use as a physical timekeeping device?

and I resolved to find out more if I could. In fact I did hear back from the book’s creator, Rob Matthews, not long after. When presented with the question above, he responded at first:

I’m comparing the Internet Wikipedia to a traditional encyclopedia, by putting it in the same format, therefore suggesting that Wikipedia is dysfunctional compared to a normal encyclopedia. This is suggested by how I’ve conveyed Wikipedia physically.

I still wasn’t satisfied with this, but after a bit of back and forth, Matthews confirmed that his intention was to point out that, compared to a traditional paper-based encyclopedia, Wikipedia is less reliable because of its radical openness, or that it is hard to find what’s important among the incomplete and unbalanced articles that exist on the site. Those are my words, but he agreed with this much.

I actually do not agree with this view. It’s not that I deny there is some truth to the point, because there is, but I do not see how anyone is actually impeded from finding what they want because of Wikipedia. Moreover, “what’s important” is always in flux, and Wikipedia is a reflection of that.

It’s also nothing new. Those who lament the fact that Wikipedia gives disproportionate coverage to trivial matters — a criticism voiced by none other than Stephen Colbert, who sarcastically riffed on the subject, “any site that’s got a longer entry on ‘truthiness’ than on Lutherans has its priorities straight” — should also recognize that these imbalances are often corrected.

I’ve never been one to take my social commentary from visual art such as painting or sculpture, in significant part because it is rare that an image or an object can convey a subtle point while also succeeding as art. For such a purpose — in this case offering commentary on a subject which is overwhelmingly composed of words — I think nonverbal art is inferior to something like the novel, the essay or even the sitcom.

Even if I thought Matthews had a strong argument about Wikipedia to make, I think this fails as standalone commentary. But if Matthews does actually sell copies of this book, consider me interested (price dependent). Mr. Matthews doesn’t have answers for his questions, but his artwork would make for an excellent conversation piece.

Wikipedia On Dead Tree

on June 10, 2009 at 6:35 pm

OK, now this is something else — artist Rob Matthews printed all of Wikipedia’s Featured articles as a 5,000-page book. It’s a great image:


Which raises the question — what would a book containing every article from Wikipedia look like?

Meanwhile, Matthews doesn’t offer much explanation for the art or what it is supposed to mean, although he does offer this much:

Reproducing Wikipedia in a dysfunctional physical form helps to question its use as an internet resource.

Hmm… it does? Would printing all of Google’s search results also question its use as an Internet resource? Would printing an image of a sundial question its use as a physical timekeeping device? I love the book as an art piece, but I’m not entirely sold on this point. (No matter what, though, it’s still more constructive than the other Wikipedia art.)

I will drop Mr. Matthews an e-mail and ask both questions — and I’ll update if I find anything out.

Mr. Wales’ Neighborhood

on April 12, 2009 at 10:13 am

For the fourth year in a row, a company named Information Architects has released what it calls a “Web Trend Map” — based loosely on the Tokyo subway map — that is nothing if not prime link bait, and The Wikipedian is unashamed to chomp down. Here is a crop from the much larger original showing Wikipedia’s “neighborhood”:


For the record, the four websites situated closest to Wikipedia are HowStuffWorks, the non-Bill O’Reilly, Twitter and Huffington Post. To which I can only say: sure, okay.

Wikipedia is on the “Knowledge Line,” which explains its proximity to O’Reilly and HowStuffWorks, while its connection to Twitter and Huffington Post is based on their relative popularity on each “Line.” The size of the name and the height of the station both correspond to Wikipedia’s influence, as estimated by the map’s creators. Wikipedia is in fact listed fifth overall, behind only Google, Yahoo, MSN and Apple. It’s a little arbitrary, but these things always are.

As for the tiny figures saying the names of “Trendsetters,” well, I wonder how either Jimmy Wales or Larry Sanger feels about the latter’s inclusion at this late date. But that’s a subject for another post.