Saturday, October 20, 2012

Wikipedia addendum.

Of course, most of the articles in the Esperanto Wikipedia are crap. However, a substantial number have real human-written content. If you want a taste of how much of a given Wikipedia is computer-generated crap, just go to a random page a few times. You'll see that most articles, in general, are crap. Volapuk is remarkable in that there is little else besides the crap generated by the tools. Project idea: build something that discriminates between these classes of articles, bot-generated stubs versus articles with substantial human content.

Why are there so many articles in the Volapuk Wikipedia?

Volapuk was the first popular constructed language. It was popular for maybe a decade before Esperanto took off and stole its base. At its height, there were hundreds of clubs around the world, hundreds of books, and hundreds of thousands of enthusiasts. It's an odd language with a lot of funny sounds; it's probably as complicated as German. In Danish, the name has come to mean "nonsense". There are still a few enthusiasts, but estimates put their numbers in the dozens (not counting hobbyists who care about all constructed languages but have no particular interest in Volapuk). By way of contrast, Esperanto has perhaps 2,000 native speakers, and its annual international congress draws 2,000-4,000 attendees from 70-ish countries. There has been a continuously active community of tens of thousands for the 125 (exactly!) years of its existence. The commonly quoted number of speakers, 2,000,000, is a bit inflated, sure. Euskara (Basque) has 700,000 speakers, and almost all of them genuinely speak the language and are native speakers, since nobody has any interest in trying to learn it; it's worse than Magyar. I would be surprised if there were 100,000 people who spoke Esperanto well enough that somebody with a comparable level of Basque would be counted as a Basque speaker. But still, there's a community, it includes native speakers, and there are people who do a substantial proportion of their daily communication in it, quite unlike Volapuk.

Why, then, does the Volapuk Wikipedia have 120,000 entries? Esperanto, a "legitimate" language, has only about 170,000. A couple dozen enthusiasts plus a handful of con-lang nerds who are interested in all con-langs can't, on their own, produce that much quality work. And Esperanto has plenty of people of that sort, too.

Answer: the Volapuk Wikipedia is mostly robot-generated crap. There are few substantive articles; it's mostly stubs. Many of its articles are machine-translated and very easy to generate (e.g., formulaic articles about each town in Italy). There seems to be a machine-generated stub for every person who was ever tangentially involved with the Volapuk movement. I'm not saying there isn't some value in this, but it certainly solves the mystery. For details, see the talk page of the English Wikipedia's Volapük article, http://en.wikipedia.org/wiki/Talk%3AVolap%C3%BCk, and also the user talk page of Smeira on the Volapuk Wikipedia, http://vo.wikipedia.org/wiki/Gebanibespik:Smeira

If you're interested in hearing more about Volapuk, read "Puk, Memory". Even if you're not interested, it's a good article. If you're interested in learning a constructed language, you should learn Esperanto rather than any other. Nobody should try to learn a language without a use for it (and without using it), and the only constructed language with a community of speakers is Esperanto. If you're interested in learning another language and you're in the US, the language you are most likely to be able to learn is whatever dialect of Spanish is most common in your area, because there will be people to talk to. But if there's an Esperanto club in town, sure, you could try that. You will definitely be able to get to an "I can express myself incompetently and get the gist of what people say" level much more easily and much faster. I just wouldn't recommend it if you don't have people to talk to.

Other uses of "volapuk" besides the Danish:

  • In Russian, it's a slang term ("волапюк") for writing Cyrillic text with ASCII Latin characters.
  • A few other languages besides Danish use it as a word for "nonsense" - I don't have a full list. It's used in sentences where we might say, "It's all Greek to me."
  • There's a band called Volapuk.
  • That's about it.