Four Short Rants on International Data

On Naming Conventions

I'm going to name some countries, and there's nothing you can do to stop me:

- "Macedonia"
- "The Former Yugoslav Republic of Macedonia"
- "Macedonia, the Former Yugoslav Republic of"
- "Republic of Macedonia"
- "Република Македонија"
- "Республика Македония"
- "Republik Mazedonien"
- "Makedonian tasavalta"
- "Republika Makedonija"

Can you tell what all these country names have in common? Chances are, yes, you probably can. You know who can't tell? Computers. Each one of these strings looks different to whatever naive automated process is going through your country data. When you use an idiosyncratic naming convention, you force a slow, expensive, fleshy human (like me) to manually interpret your data.

We have this great little thing called the ISO 3166 standard. It was invented in the 1970s alongside floppy discs and the VCR, and it eliminates these ambiguities. Please use this in your international dataset. I'm looking at you, United Nations Development Programme.
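Here is a minimal sketch of why a shared code helps. If every variant name maps to one ISO 3166-1 alpha-2 code, a naive process can join datasets on the code instead of on the raw string. The alias list below is hypothetical and copied only from the names above. The sole ISO fact it relies on is that this country's alpha-2 code is "MK". A real pipeline would need a much larger, maintained alias list.

```python
import unicodedata
from typing import Optional

# Hypothetical alias list: the variant names quoted above, all of which
# refer to the country whose ISO 3166-1 alpha-2 code is "MK".
VARIANTS = [
    "Macedonia",
    "The Former Yugoslav Republic of Macedonia",
    "Macedonia, the Former Yugoslav Republic of",
    "Republic of Macedonia",
    "Република Македонија",
    "Республика Македония",
    "Republik Mazedonien",
    "Makedonian tasavalta",
    "Republika Makedonija",
]


def normalise_name(name: str) -> str:
    """Unicode NFC normalisation, trimmed and case-folded, for use as a lookup key."""
    return unicodedata.normalize("NFC", name).strip().casefold()


# Lookup table: normalised variant name -> ISO 3166-1 alpha-2 code.
ALIASES = {normalise_name(v): "MK" for v in VARIANTS}


def to_iso_alpha2(name: str) -> Optional[str]:
    """Return the ISO 3166-1 alpha-2 code for a known variant, or None if unknown."""
    return ALIASES.get(normalise_name(name))
```

Once names are reduced to codes, joining two datasets is a key lookup. Nobody has to squint at strings.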

On Availability

There is no well-established indicator for a country's attitudes towards LGBT issues, but enterprising people build their own anyway. The Wikipedia page on LGBT rights by country and the Spartacus International Gay Guide both have something approaching a composite index on these subjects. Both of these heavily rely on the ILGA annual report on state-sponsored homophobia as a source. The ILGA website itself has a (broken) interactive map on international LGBT legislation, presumably based on its own data.

The ILGA data is only publicly available in the form of prosaic exposition in its annual report, which is a PDF document. When those Wikipedian contributors, Spartacus guide editors, and ILGA map-makers produced their various data-based offerings, they almost certainly had to manually go through that document, country by country, territory by territory. This is a staggering waste of time, effort and resources.

Looking at its website, the ILGA clearly doesn't have the internal resources to offer up all-singing, all-dancing techno-stats wizardry, but if it published its raw data, it wouldn't have to. Other people could, and would, do that work for it.
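To illustrate the point about raw data, suppose the findings were published as a simple CSV with one row per country, keyed by ISO code. The country-by-country trawl through a PDF then shrinks to a few lines of code. The layout, column names and status labels below are hypothetical, invented for illustration. They are not the ILGA's actual categories, and the rows are placeholders rather than data.

```python
import csv
import io
from collections import Counter

# Hypothetical layout: one row per country per year, keyed by ISO 3166-1
# alpha-2 code. "XA", "XB", "XC" are ISO user-assigned codes used here as
# placeholders; the status labels are invented, not ILGA categories.
SAMPLE_CSV = """iso2,year,status
XA,2015,status_1
XB,2015,status_2
XC,2015,status_1
"""


def load_rows(text):
    """Parse CSV text into a list of dicts, one per row."""
    return list(csv.DictReader(io.StringIO(text)))


def count_by_status(rows):
    """Tally how many rows carry each status label."""
    return Counter(row["status"] for row in rows)
```

Every derived product built on that file, whether a map, a Wikipedia table or a composite index, starts from the same rows. It does not start from a fresh reading of the report.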

On Access

Transparency International is an NGO dedicated to combatting corruption. Every year they compile a composite index called the Corruption Perceptions Index, which ranks countries by the perceived level of corruption in their public sectors. This is useful data for a lot of organisations, and is referenced extensively in work with an international scope. The full index data is available on the Transparency International website.

In an Excel file. In a zip file.

I will say three very positive things about Transparency International:

1) From a scholastic perspective, the data they provide is impeccable. That Excel file is a really good Excel file, with lots of salient metadata, and it's bundled with their full methodology for compiling the index. Mad props. A++.

2) They practise what they preach regarding openness. Their operational budget for 2015 is €21,559,000, and finding this out took me a matter of seconds on their website.

3) They have clearly invested heavily in their website and presumably understand the importance of this.

I would accept an Excel file from the ILGA because it's a much less well-funded organisation. (Its current operational budget is more like €1,400,000, which I found in a Googled job advert.) But Transparency International clearly has all the pieces in place for offering this up through a lightweight REST API, or at least as more easily interrogable data. As it stands, someone else has had to do it for them.
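To show how little a "lightweight REST API" needs to be, here is a minimal sketch using Python's standard WSGI interface. It has one endpoint that returns a single country's record as JSON, keyed by ISO code. The URL scheme (`/cpi/<ISO code>`), the field names and the placeholder record are all hypothetical. This is not Transparency International's data, and it is not an API they offer.

```python
import json

# Placeholder record, NOT real index data. Keyed by ISO 3166-1 alpha-2 code;
# "XA" is an ISO user-assigned code used here as a stand-in.
RECORDS = {
    "XA": {"iso2": "XA", "year": 2014, "score": None},
}


def app(environ, start_response):
    """Minimal WSGI app: GET /cpi/<ISO code> returns that country's record as JSON."""
    path = environ.get("PATH_INFO", "")
    prefix = "/cpi/"
    code = path[len(prefix):].upper() if path.startswith(prefix) else None
    record = RECORDS.get(code) if code else None
    if record is None:
        status, body = "404 Not Found", {"error": "unknown country code"}
    else:
        status, body = "200 OK", record
    payload = json.dumps(body).encode("utf-8")
    start_response(status, [("Content-Type", "application/json"),
                            ("Content-Length", str(len(payload)))])
    return [payload]
```

To try it locally, `wsgiref.simple_server.make_server("", 8000, app).serve_forever()` from the standard library serves it at http://localhost:8000/cpi/XA. The design is a single read-only resource per country, which is exactly what people keep re-extracting from the spreadsheet.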

On Maps

I like a good map as much as the next nerd. In fact, "what's the most interesting map you've seen recently" is currently on my list of conversation-starters. Maps are an informative visualisation tool. Maps are good at telling stories. Maps are pretty.

But there are some things that aren't maps, and I think the international dataset "scene" has forgotten this.

It seems like every time I try and find some international dataset, someone wants to show me an interactive map, and I can't help but wonder if they've maybe forgotten that data has other uses, like helping people to make decisions or conduct scientific inquiry.

This becomes more annoying when they'll gladly give you a whole smorgasbord of maps, but won't let you get within ten feet of the actual data. "Why would you possibly want all those numbers?" they say. "We've made the maps for you already! Maps! Yay maps!"

Assisted Reading: World Edition

XKCD creator Randall Munroe has a bee in his bonnet about numbers and context. A couple of years ago, he blogged about the Dictionary of Numbers: a Chrome plugin which inserts comparative context for any values it detects in your browser. For example, if the value “1,200 people” appears on a webpage you're reading, the plugin will helpfully follow it with “[≈ population of Niue, nation]” or something similar.
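The mechanism behind a tool like this can be sketched in a few lines. A regular expression finds number-plus-noun phrases, and each value gets the reference quantity closest to it on a logarithmic scale. This is a guess at the general approach, not how the Dictionary of Numbers was actually implemented. The reference table is supplied by the caller, and the one used in the tests is a hypothetical placeholder rather than real population data.

```python
import math
import re

# Matches numbers like "1,200" or "1200" followed by the word "people".
PATTERN = re.compile(r"\b(\d{1,3}(?:,\d{3})+|\d+) people\b")


def closest_reference(value, references):
    """Pick the reference whose quantity is nearest to value on a log scale."""
    return min(references,
               key=lambda name: abs(math.log(references[name]) - math.log(value)))


def annotate(text, references):
    """Append '[≈ <reference>]' after each 'N people' phrase found in text."""
    def add_context(match):
        value = int(match.group(1).replace(",", ""))
        if value <= 0:
            return match.group(0)  # log scale is undefined for zero
        return f"{match.group(0)} [≈ {closest_reference(value, references)}]"
    return PATTERN.sub(add_context, text)
```

The log scale is a deliberate choice. "Compared to what?" is usually a question about orders of magnitude, so 1,200 should sit closer to 1,000 than to 100,000.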

In a more recent FiveThirtyEight interview, he came out with this gem:

A good rule of thumb might be, “If I added a zero to this number, would the sentence containing it mean something different to me?” If the answer is “no,” maybe the number has no business being in the sentence in the first place.

His broader point, which you've probably surmised already, is that numbers aren't meaningful out of context. If you're presented with a digit sporting a bunch of trailing zeroes, your brain will fall into the trap of scope neglect and just parse it as “a really big number”. In order for these values to be informative, they need to connect with something you can comprehend. Whenever your brain goes “that's a really big number”, you need to balance the account by asking the question “compared to what?”

This is a very worthwhile bee for one's bonnet to have, and tools like the Dictionary of Numbers are useful for making sense out of such things. But I also have other, bigger bees, and I wonder if there might be analogous ways to deal with them.

***

Here is a fake fact I am tired of hearing: “Bhutan uses Gross National Happiness to measure its wealth”. First of all, no it doesn't. If you can tell me what Bhutan's Gross National Happiness was for 2013 I will give you a million pounds, or a commensurate amount of happiness.

Leaving aside the matter of it not being true, and of “happiness” being a nebulous concept we can't coherently measure, what annoys me is how people lap up this idea without stopping to think “well, what's Bhutan actually like?” A bit of research would reveal that it has a pretty sketchy human rights record, a 59% literacy rate, and ranks between Indonesia and Guatemala on the Multidimensional Poverty Index. With context like that, it's hard to think "hey, that Bhutan place has got the right idea".

There is no Chrome plugin that would detect the Bhutanese Gross National Happiness fact and flag it up as spurious bullshit. For that matter, there's no Chrome plugin for immediately finding context on countries of the world¹, but this is at least feasible. I'm looking for a non-trivial Chrome extension project, and I might very well try putting it together myself.

This makes me wonder what other subjects people might want immediate context on, which could be provided in a similar way. They Work For You holds public voting records for all UK MPs, but there are 650 of them, and checking them out one by one whenever they appear in a news article is beyond most people's effort budget. A browser extension that let you view a digest of their affiliations and voting record at a glance would aid in following political news. Or consider the converse: doing the same for electoral wards.

We'd certainly be better informed if we stopped every two minutes to look up important details on Wikipedia, but my point is that we don't have to. More and more of what we read comes to us through a web browser, a dynamic tool that's capable of doing work on our behalf. Why don't we let it?


  1. That I can find with five minutes of Googling, though the CIA World Factbook is available as an Android app.

Make me leader, and I will give you stuff

ALIEN: Explain to me this "election".

HUMAN: Well, we humans are social primates. Social animals commonly exist in small groups with a dominance hierarchy. They'll have some procedure for establishing social status, and this can be used as a proxy for the outcome of physical conflict. This way, not every wolf in the pack has to fight every other wolf for food or mate selection or whatever. They'll only come into occasional conflict with other wolves that are marginally stronger or weaker than themselves.

ALIEN: With you so far...

HUMAN: Some social animals, particularly higher primates, are sophisticated enough to form coalitions. The largest chimp in a troop might be able to fend off any individual rival, but can't fend off the two largest rivals working together. Chimpanzees, bonobos and gorillas all exhibit coalition-forming behaviour. The largest chimp, to secure his position, can concede some food or mate selection options to the second-largest chimp. Alternatively, the second-largest chimp can make the same offer to the third-largest chimp. This is more sophisticated than it might seem to modern humans. "Make me leader and I will give you stuff" is a notion that's available to chimps.

ALIEN: Okay...

HUMAN: So humans, as well as being primates, are also capable of advanced long-term planning and shaping our surroundings. We can overcome environmental limitations like food scarcity and harsh climate to coexist in bigger and bigger groups and carry out social projects on a longer and longer time-scale. For a troop of chimpanzees, the most sophisticated social project might be "let's go and live in that clearing over there". For humans, the most sophisticated social project might be "let's go to the moon". Just as the chimps might fight about the clearing, the humans will fight about the moon.

ALIEN: So the biggest human that wants to go to the moon forms a coalition with other humans who want to go to the moon, and promises food and mates to other humans who are indifferent to going to the moon, creating a group with greater social status than the non-moon-goers, so the non-moon-people will defer to the moon-people rather than entering into conflict?

HUMAN: Well, large social projects such as going to the moon or not murdering each other are quite fragile. In smaller groups humans still behave a lot like chimpanzees, but in larger groups we need special social projects to let everyone work together. How these social projects are run is something people would fight about a lot. If the biggest human with the most supporters could just turn up and change this social project to their whim, this would be bad. So we have more elaborate procedures for deciding who has control over those social projects.

ALIEN: Okay. I was worried for a minute that humans organised their society by having people with a lot of social status say "make me leader and I will give you stuff", those people forming the biggest, strongest group, and having everyone else defer to them. It's good to know you've got a better system than that.

HUMAN: Well, uh...

ALIEN: Oh, sweet blue buggery...

HUMAN: Look, we might have six thousand years of civilisation, but we're still basically primates. "Make me leader and I will give you stuff" is still the rough template we go by, but we've learned a few things about how this should work. It's quite important that a clear procedure exists for selecting the next leader, for example. If leadership is contested, you get power struggles and instability. It's also important that the leadership has some sort of "legitimacy", in the sense of people accepting that the leadership is there through a collectively-sanctioned process. Also the leadership needs to persist long enough to accomplish things, but not so long that bad leadership can't be removed without blood running through the streets. It's also generally the case that the more people the would-be leader has to promise to give stuff to, the less stuff that leader can keep for themselves.

ALIEN: So the election...?

HUMAN: ...is a well-defined procedure where all humans in a large political community (who meet some eligibility criteria) register their preference for which coalition of high-status humans is given short-term control over a selection of large and important social projects.

ALIEN: And how is it working out for you?

HUMAN: I'll admit it could be going a little better, but given it started out as chimps ganging up to kill each other, it's an absolute miracle.

The Other "Literally"

When I say "I'm sick to death of your constant bickering", I am not actually claiming that your constant bickering has left me sickly to the point of mortal death, and you know it. This is a common figure of speech. Moreover, if you were to try and criticise this statement by pointing out that it was untrue, owing to my rude health and stubborn habit of not being dead, everyone would think you were being an arse.

In saying "I'm sick to death of your constant bickering", I'm saying something I don't literally mean. I don't expect to be held accountable for it (at least not the "to death" bit), and everyone is complicit in this. This is fine, and happens for understandable reasons. I'm also not really so hungry that I could eat a horse, and whether Colin could actually organise a piss-up in a brewery is beside the point when assessing his competence. We have an elaborate zoo of literary and rhetorical devices in which we do not say what we literally mean and yet people still understand us. Fine. Fine and peachy.

But there is another type of wordy embuggerance I see people using, where they say something they don't mean and don't expect to be held accountable for it, but they very much should be. I don't have a good name for it, and I don't know how to deal with it. If I had the former but not the latter, I would call it out every time I saw it until my lungs gave way. If I had the latter but not the former, I'd be entirely satisfied with burying it in an unmarked grave.

*****

If you were listening to Romeo waxing poetic about Juliet, and he said "Juliet is the sun", you'd know he wasn't literally saying Juliet is an astronomical ball of plasma approximately 1.4 million kilometres across. You'd surmise that he was employing a figure of speech, because the literal interpretation is absurd.

If he were to say "Juliet is the worst person in the history of the world", you'd probably think that he was using hyperbole. You'd be on slightly shakier ground here, because there is a literal interpretation of his statement that makes sense, but to the best of your knowledge Juliet hasn't committed any genocidal atrocities or instigated any transglobal conflicts.

If he were to say "Juliet thinks she's Ginger Rogers, but she isn't", you would not think that Juliet is experiencing an identity crisis, which Romeo is dutifully reporting. Romeo is probably implying that Juliet has an over-inflated opinion of her own dancing. This is a more impressive inference on your part: if Juliet did think she was Ginger Rogers, and Romeo wanted to express this fact, he'd probably use exactly the same words. In spite of this obstacle, we understand the sentiment he is expressing.

So far, so satisfactorily non-literal. Here is where we cross the threshold.

If Romeo were to say "Juliet expects to get what she wants because she's wealthy and pretty", I wouldn't be sure what to do with this one. On the one hand, it's a factual claim. You could plausibly (and reasonably) interpret it as Romeo asserting that Juliet expects to get what she wants, and that this is caused by some combination of her being wealthy and pretty. As a parseable declarative statement, it's vulnerable to counter-assertions: if Juliet doesn't always expect to get what she wants, if she isn't wealthy or pretty, or if the causal claim between the two doesn't hold, it is false.

On the other hand, you could imagine that Romeo is simply expressing the sentiment "Juliet is spoiled and conceited", or perhaps just "boo Juliet!" If you confronted him on the factual merit of the statement, you could imagine him defensively making further disparaging remarks about Juliet, swigging from his ninth bottle of Blue WKD and shaking his fist at the sky.

This is the wordy embuggerance I'm talking about. On one level it has a plausible literal truth condition with implications if that truth condition is met, and there is no obvious indication that it's not meant to be taken literally. On another level, it is expressing a sentiment the speaker feels passionately about, and you can't take that away from them.

*****

Many years ago, on the lofty philosophical forum known as LiveJournal, I was in a discussion about Stephen Fry. He'd apparently said something disparaging about women, and my interlocutor claimed this was because he was a gay man.

I questioned this. If he wasn't gay, would that have made him less likely to say it? Are other gay men at greater risk of saying these things than heterosexual men? Is his gay-mannitude really the dominant causal factor in making this specific remark about women?

Her response was something along the lines of "well, if you're going to take it to ridiculous extremes..."

Aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaah

I don't think it's taking things to ridiculous extremes to hold people to account for the meaning of what they say and the implications thereof. But there is a problem here. If you refer to that meaning as "the literal meaning", it opens the door for arguments about implied meaning and subjective interpretation and hermeneutic witchcraft that are completely beside the point of not saying things which have readily available meanings you don't care about.

So what do you call that interpretation if not "the literal interpretation"? The actual meaning? The for-realsies that-which-people-who-care-about-what-words-mean meaning? The woman I was talking to on LiveJournal had a legitimate sentiment about the uniquely-privileged position of gay men (the veracity of which is outside the scope of this post), yet I can't help but feel that sentiment is subordinate to what the words actually mean, and in the absence of it being a common figure of speech, or other indicators that it's not to be taken literally, that's the standard you should hold it to. That doesn't feel like the curmudgeonly insistence of a square-headed logic-warrior. It feels like a sensibly-prescribed method for using words.

My verbal response to that LiveJournal comment was something about how words mean things, how they do important work, and how they aren't just billboards for political causes. My psychological response was to have a part of my mind start screaming, and to this day it hasn't stopped.

*****

Once you start looking for truth conditions, it's hard to stop. The process is quite simple: just ask yourself "what would it mean for this statement to be true?"

Here is an opinion piece from The Independent about how masculine stereotypes are horrible and damaging and generally boo-worthy. I'm not a massive fan of opinion pieces, since reading them will probably not make me smarter, happier, or better informed. That said, I'm pretty on-board with this one. A friend of mine shared it on Facebook, along with the following singled-out quote:

...young men need to understand as early in their lives as possible that men have a long history of getting their way for no good reason.

There's a charitable sentiment to extract from this about male privilege, and a slightly less charitable sentiment along the lines of "Go Team Feminism". I don't object to either of these sentiments, but I do object to the statement itself. It seems indefensible from a variety of angles. There are many adjacent statements on much sturdier ground, but the author didn't choose any of those. There are also many less-objectionable sentences in the piece, yet the friend who shared it on Facebook didn't choose any of those either.

Most importantly, if you try and reason with this statement, weird answers will fall out of your reasoning process. This doesn't seem to stop people, though they may as well be reasoning with a flag or a Beach Boys song for all the good it will do. Try it yourself: what would it mean for this statement to be true? What are the necessary conditions for this to be the case? What are the necessary consequences of this? Are these conditions and consequences met and borne out by other observable facts about the world, or do they generate conflict? If you assume its truth, can you only generate reasonable implications from this, or does it produce implications that are inconsistent with other things we hold to be true?

If this feels like it's being taken to a ridiculous extreme, I have to ask: why give someone the opportunity to take it there? Reductio ad absurdum isn't something Alice does to Bob that reflects badly on Alice; it's something Bob allows Alice to do that reflects badly on Bob. If you can express a sentiment using literally true statements (as is quite achievable in the above case), why don't you? If you can't express a sentiment in this way, maybe there's a reason for that.

*****

There's a simple take-away from this post: don't make words that don't mean what you want them to mean. To paraphrase H.P. Lovecraft, do not call up that which you can not put down.

I would like to think that there are two sorts of people in the world: those who value precise, meaningful discourse evaluated by a common standard, and those who don't. That way we could slowly convert the latter into the former. Unfortunately, I don't think the world is that tidy. Plenty of smart people will throw words around without stopping to consider what those words mean. I can't see this being good, and I'm not sure how to convince them to do otherwise.

It feels like it's a straightforward question: "what would it mean for this statement to be true?" Having an answer for this question in response to the statements you make feels like a reasonable minimal standard, but I can appreciate it's quite hard to live up to. Still, as mental habits go, it's a powerful one to cultivate.

Say it to yourself a few more times: what would it mean for this statement to be true? For the next five claims someone makes, ask it to yourself and see what happens.

Imaginary Expertise

I'm currently reading The Sense of Style, partly because everyone else seems to be reading it, and partly because it's so good I have trouble putting it down. The book is a contemporary style guide, combining linguistics, cognitive science and writing convention in a compendium of advice on writing well. Linguist, writer and cognitive scientist Steven Pinker is obviously well-placed to write such a book, and I'm glad he has.

Aside from being an interesting and useful guide on forming pleasant English, this book has also given me something else: it's allowed me to capture an example of my own bad behaviour out in the wild. You see, I think I know about grammar, but I don't. Not really. Beyond secondary school I have no formal education in the subject, but I've read enough around it to feel like I know what I'm talking about. This is useful, because I have an in-situ example of what this behaviour feels like from the inside.

It's not that I don't know much about the subject. Far from it. Compared to Average McStreetPerson I know an awful lot about grammar and linguistics, but that's a lousy scale to hold oneself against. Someone who reads a lot of popular science knows much more about physics than Average McStreetPerson, but that doesn't mean they know a lot about physics compared to the field of academic inquiry known as physics.

There are a few features of Pinker's book that alerted me to my imaginary expertise in grammar, most notably his talk about contemporary theories of grammar overthrowing incumbent ideas I thought of as canonical. In the abstract I was dimly aware that people must be doing modern research into the subject, but I couldn't have told you what any of it was.

I can't remember where, but I distinctly remember reading about some research on how humans have trouble distinguishing between familiarity and knowledge. This is presumably responsible for the experience of looking over what you've just been revising, and feeling like you know it without actually having retained anything. I wonder if imaginary expertise is like this. I'm familiar enough with, say, verb forms to follow a discussion about them, but I don't have command of them. My familiarity is reinforced every time I follow such a discussion, but my command faculty is rarely challenged, as it would be if, say, I took an exam.

If this is the case, it gives a useful rule of thumb, which in retrospect seems obvious: take tests on subjects and see where you fail. The less obvious aspect of this rule of thumb is presumably to avoid questioning the need to take a test, or making excuses as to why tests aren't necessary.

The Other Groupthink

[Experimenting with bashing out a few hundred words on something I'm thinking about. I start writing too many things before abandoning them for not being immaculate. Perfection is the enemy of the tolerably mediocre.]

I have spent a considerable amount of time, effort and resources learning about subjects such as social choice theory, international development and government regulatory mechanisms. These are diverse areas of inquiry which, for historical and administrative reasons, are placed under a common category called "economics", but which otherwise don't resemble one another very much. So when someone comes along and makes broad sweeping statements about "economics" or "economists", as if these words point to coherent homogeneous groups of people and activity, I want to scream in their face until the seas boil and the skies fall down.

There may very well be all sorts of legitimate criticism to be levied at some subset of economists, but if you try and reason non-trivially at the group level about all economists, the chances of you not being somehow wrong are as good as zero. I've made this case in a couple of places on the subject of economics (cf. "screaming", "seas boiling", etc.), but the general pattern keeps on coming back to haunt me for groups in general:

Although [group] sounds like it's referring to a coherent class of people or activity, [group] doesn't actually capture a homogeneous category in general, and doesn't define the features you're arguing about in particular. As a consequence, thinking and talking about [group] as if this wasn't the case is going to have undesirable results.

*****

In a few places recently, I've heard complaints about superfluous use of the word "white". This is probably best exemplified by the term "white nerdy males". The word "white" in this construct isn't really doing any work in whittling down the referent. Someone making a statement about "white nerdy males" isn't trying to exclude ethnic-minority nerdy males from their statement, but trying to stick as many "privileged-group" labels onto the referent group as they can before making claims about it. There may very well be a group of people to whom their statement or reasoning applies, but the full literal membership of "white nerdy males" probably isn't it.

Scott Alexander made a very interesting post on the subject of unhelpful political coalitions accidentally falling out of perceived threats to a wider group identity. Then he shot himself in the foot a bit by reasoning about feminism as a coherent homogeneous category of activity. More recently, I think he's been getting better at this.

*****

For the past couple of weeks I've been trying to notice whenever anyone, myself included, makes claims about a defined group that probably isn't the intended group to whom the claim applies. This has been quite humbling. I've received specific training in figuring out to what extent individual behaviour generalises and generalised behaviour is applicable to individuals, and yet I clearly don't apply this as a matter of course.

A lot of very silly disputes go away when you realise the object of dispute doesn't really exist in any meaningful sense, and if I can get into the habit of questioning whether any given group I find myself reasoning about is fit for purpose, I hope to make a lot of problems go away, or at least flatten out into more manageable ones.

The Arse-Cheeks of Statistics, and Titillating one's Smart-Parts

[This post is brought to you by me being grumpy about Epidemiology: A Very Short Introduction.]

One of the reasons I went off popular non-fiction a couple of years ago was that it got repetitive. There are only so many times you can read an introductory explanation of Prisoner's Dilemma without wanting to chew up the pages and spray them like confetti out of the window. It makes me wonder how many pages I've read, redundantly reintroducing me to something I know already. How many books do they all add up to?

I can forgive something like Prisoner's Dilemma because game theory is slightly esoteric, but I seem to have a lot less patience when it happens with statistics. I have lost count of the number of places where I've been "introduced" to Pearson's r, or regression to the mean, or sampling biases. My dear host, we've already met, and I know them well. Very well. In fact, I can tell you what kind of underwear they prefer and which songs they end up singing when they're drunk at 3 o'clock on a Saturday morning.

It's not just me. A lot of people are very friendly with introductory statistical concepts. It is not a shy discipline. Anyone who's completed a quantitative research methods course (the majority of science graduates?) has seen at least one arse-cheek of statistics. This is a fairly standard and cross-domain body of knowledge, so let's not be coy.

I assume authors don't want to spend their time torturing metaphors for linear regression either. Nate Silver must prefer to state facts about his material in the obvious language for doing so. I can't help but feel that there's the opportunity for a much better class of popular science writing, if we could just nudge the general public's statistical literacy up a couple of notches.
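To make the "obvious language" point concrete: for a reader who has already met Pearson's r, it needs no metaphor. It is the covariance of two variables divided by the product of their standard deviations. The function name below is illustrative. This is a plain textbook implementation, not taken from any particular book.

```python
import math


def pearson_r(xs, ys):
    """Pearson's correlation coefficient: cov(x, y) / (sd(x) * sd(y)).

    Raises ZeroDivisionError if either sequence is constant.
    """
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length sequences of at least two values")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    dx = [x - mean_x for x in xs]
    dy = [y - mean_y for y in ys]
    cov = sum(a * b for a, b in zip(dx, dy))
    # The 1/n (or 1/(n-1)) factors cancel between numerator and denominator.
    return cov / math.sqrt(sum(a * a for a in dx) * sum(b * b for b in dy))
```

Python 3.10 and later also ship `statistics.correlation`, which computes the same quantity.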

This might have seemed a bit elitist as recently as ten years ago, but there are so many resources for gaining statistical literacy in 2014. If you're in a position to read Freakonomics, you're probably in a position to take one of the dozens of MOOCs on introductory stats, or to pick up the quite fine Cartoon Guide to Statistics. If everyone learned this once, we could factor out hundreds of pages of insipid statistics in all our future reading.

In my ideal world, authors should be able to stick a logo on the front of their book. Maybe a nice red bell-curve. That logo will say:

To read this book you should know, more or less, what a probability distribution is. You should understand measures of dispersion and central tendency. You should understand logs and e. You should know what 'population' and 'sample' are referring to in context. 'Significance testing', 'statistical power' and 'regression' should not be alien terms to you. It would be nice if you had some idea what 'Bayesian' means. We trust that if you don't know a particular piece of terminology, you can look it up on Wikipedia. This book is not just here to titillate your smart-parts. It's here to teach you something.

I think a lot of people read pop-sci books to feel smart, and to signal their smartness to other people. If you make those little red logos an intellectual status symbol, you're going to seriously put the boot up the level of public discourse.

Stuff for the Ironing Pile

"This is an interesting avenue of thought. I should iron this out, and then write about it on that blog I allegedly maintain."

Who has time for ironing?

Can you computationally discriminate obscurantist nonsense from legitimately hard subject matter on a linguistic basis?

Kicked off by this post on Hegel. Are there linguistic (lexical, syntactic, whatever) differences between texts that are deliberately written to sound impenetrably profound, texts that are explaining something badly, and texts that are working with legitimately complex subject matter? Could you write a document classifier to discriminate between the different cases?

Related #1: is "bad communication" stylistically convergent (characterised by common amateurish mistakes), or stylistically divergent (the result of any number of deleterious mistakes, cf. the Anna Karenina Principle)?

Related #2: It's my observation that people emulate the written style of their influences. Is it possible to track this?

Why do people want to pathologically classify things?

I occasionally find myself in "multidisciplinary" study groups. Just recently I've been doing a "for fun" MOOC on political economy, and the discussion forums are particularly bad for a behaviour which I've noticed before but not thought about too much: obsessive classification. Is this an art or a science? Is this a positivist or an interpretivist approach? Is this a left-wing or right-wing idea? Is it holistic or reductionist? This seems to be a universally pointless class of discussion. Why is it so pervasive?

Infinite-Oregano Concerns

In my last post, I talked about how shifted burden-of-proof and Courtier's Replies allowed one person in a dialogue to generate arbitrary amounts of work for the other. I described this as looking like a "rules-lawyery, infinite-oregano concern". After making that post, I realised that a vanishingly small number of readers will understand that reference. I'd like to try and remedy that with this post, and perhaps elaborate on the idea a little further. Once I'm done, I may confusingly forward-reference this post from that one to confound future web-archiving software.

*****

In 2008, Wired Online put up a short commentary piece on what cookbook reviews would be like if they were subject to the same criticism as RPG books. I used to be a role-player in a previous life, and this was so on-the-nose it got linked like wildfire1 by many of my old role-play buddies. By far, my favourite excerpt is the following:

Posted: 12:48 a.m. by Goku1440 I found an awesome loophole! On page 242 it says "Add oregano to taste!" It doesn't say how much oregano, or what sort of taste! You can add as much oregano as you want! I'm going to make my friends eat infinite oregano and they'll have to do it because the recipe says so!

This is an example of rules-lawyering: being more concerned with what the rules allow you to get away with than playing the game as intended. Goku1440 has interpreted the vagueness of the recipe as a vulnerability that is open to abuse. Feeding your dinner guests infinite oregano is absurd, and hence the scenario is funny2. In the context of a recipe in a cookbook, and in real life in general, you would never be realistically concerned that someone would force you to eat infinite oregano.

This should hopefully convey what I mean by "infinite-oregano concern": a concern that a rule, policy, or convention (or lack thereof) might be open to abuse, even though such an abusive outcome is unrealistic. I described shifted burden-of-proof as looking like one because in an actual discussion you wouldn't blithely and meticulously evaluate every unsubstantiated claim your interlocutor made; you'd go and find someone else to talk to. You wouldn't just sit there and eat the infinite oregano they were trying to feed you.

*****

When I found myself using the phrase "infinite-oregano concern", one very clear example of such a concern came to mind. Several months ago I put forward an argument against allowing people to select their own arbitrary set of personal pronouns3. I won't go into it in detail, but the gist is as follows: although personal pronouns may refer to a subject, it's generally not the subject who has to use them. It might seem reasonable that we refer to people how they wish to be referred to, but it's possible for those wishes to constitute an unreasonable or unworkable expectation on the part of others.

Consider the person who changes their name to "(+)--(*)-(@@@)-(*)--(+)"4. If this is the name you wish other people to use when referring to you, it seems unlikely that many people would find this reasonable, and even fewer would find it reasonable if you expand this out to a whole set of declined parts of speech.

My discussion partner, though amenable to this argument, proposed that "reasonableness" was a sketchy basis for issuing policy about identity politics, and my concern only became a problem when taken to extremes, which are rare and identifiable on a case-by-case basis.

In other words, it's an infinite-oregano concern.

*****

At first blush, it seems that raising infinite-oregano concerns is pedantic and unhelpful. No-one can force you to use their exotic pronouns that look like they were lifted from the Linux command line. Why express concern for an event that will never come to pass?

I think this is the wrong question to ask. The burden-of-proof convention doesn't safeguard us against our interlocutor forcing us to verify all their claims, but against a breakdown of dialogue. I'm not worried that someone issuing a bunch of Courtier's Replies will oblige me to read a stack of antiquated theological essays; I'm worried that someone issuing a bunch of Courtier's Replies will mean I'll have to stop talking to that person.

In the case of personal pronouns, names, and forms of address, the infinite-oregano concern is slightly more fiddly. "(+)--(*)-(@@@)-(*)--(+)" is clearly unfit for purpose as a name, but the deed poll restrictions prohibit a wide selection of options that probably are fit for that purpose. No doubt some people would like these restrictions to be more stringent and others would like them to be less so, but the existing restrictions provide a common standard of expectations for what people might accept or exhibit as a name. This isn't a trivial concern. You can only obtain a bank account if you have a name the bank's software can validate, and many legal procedures and ceremonies require the utterance of one's name5.

In the absence of any similar clear set of restrictions for whatever pronouns people might like, it's hard to coordinate on a set of standard expectations, and without those expectations, it's easy to imagine how the occasional untenable request might pose a setback to the broader discussion. I won't go so far as to claim some imposed restriction would fix this, but I will tentatively speculate that it might help.

*****

When I first started writing this, I became mildly concerned that I was just reinventing the slippery slope argument6, but I'm now reasonably convinced I've carved out more of a negative image of the slippery slope. A slippery slope argument would run "if we allow people to use any oregano at all, they'll end up forcing us to eat infinite oregano", whereas the infinite-oregano concern would go "we have to safeguard against the risk of being forced to eat infinite oregano in order for people to feel comfortable about having any oregano at all".

The concepts are parallel enough that I don't expect "infinite-oregano concern" to catch on, but I think it's a useful pattern to consider. The next time you find someone protesting against an outcome you know to be implausible, ask yourself: could there be an underlying fruitful negotiation that would fall apart unless that outcome were ruled out?


  1. I'm not sure wildfire is actually linked all that much 

  2. Jokes become way better when you explain them, right? 

  3. I don't think too many people are actually for this specific proposal, but it was part of a broader discussion about pronoun usability 

  4. I should have checked this at the time, but there is an existing set of restrictions on what you can change your name to by deed poll, and pronounceability is the second requirement on the list. 

  5. Fun fact: if you're deaf-mute in the UK, you require an interpreter to make your wedding vows. 

  6. For a good take on the legitimacy of slippery slopes, I recommend Scott's LW post on the subject 

The Underlying Problem of The Courtier's Reply

You're probably already familiar with the Hans Christian Andersen story of The Emperor's New Clothes, in which the eponymous emperor is fooled into purchasing a set of clothes so fine that only the most intelligent and sophisticated people can see them. In fact, the clothes don't exist, but the emperor, all his courtiers, and much of the population all pretend to be able to see them for fear of appearing foolish and unsophisticated.

I have often wondered why the emperor didn't raise the objection that he didn't want unsophisticated people looking at his balls.

This tale also lends a name to an argumentative tactic known as the Courtier's Reply, echoing what one of the emperor's courtiers might say when quizzed on the emperor's junk-baring status: "you're not sophisticated enough to see the clothes". The prototypical example is the theist who turns to the atheist and says "if you'd read all of Augustine and Anselm and Thomas Aquinas, you'd see that, actually, our arguments for the existence of God are very well-substantiated, and until you read and understand this material, you're not in a position to criticise it."

The Courtier's Reply is regarded in some quarters (particularly atheist areas of the internet) as an out-and-out fallacy. I have some sympathy for this position. As a rhetorical device, "you don't know enough about this topic, and if you did you would agree with me" is frustrating, unproductive and just plain rude. That said, I don't think we can legitimately call it fallacious in its own right. Moreover, its existence points to a genuine underlying problem for which I don't have a good answer.

*****

We should probably first look at the problems with issuing Courtier's Replies from a practical standpoint.

So, burden of proof is a messy and awkward concept that lots of people get wrong, but at its heart, in the context of argumentative discourse, it boils down to the notion that if you make an assertion, you have to provide support for it. There are a few practical reasons behind this, one of the most salient being that without it, Interlocutor-B can force Interlocutor-A to carry out an arbitrary amount of work before conceding the point. Unless Interlocutor-B holds responsibility for substantiating their own assertions, they can keep manufacturing claims for Interlocutor-A to evaluate for relevance, at Interlocutor-A's expense.

Issuing a Courtier's Reply has a very similar problem. Interlocutor-B can generate an arbitrary amount of work for Interlocutor-A for as long as they can name areas Interlocutor-A is not familiar with. This might sound like a rules-lawyery, infinite-oregano concern, but I suspect most people who have tried to carry out productive discussion on the internet will concede that other people have minimal regard for your time or effort.

It's worth mentioning that, much like the relevance problem which burden-of-proof tries to circumvent, this isn't an infraction of logic, but of courtesy. You can't use either to substantiate factual claims. Provided you're not doing this, you're not making a fallacious argument. You are, however, being kind of a dick, and if you persistently generate work for your interlocutor, they would be quite justified in not engaging with you.

*****

There is another very important reason why a Courtier's Reply is not intrinsically fallacious: sometimes it's quite correct. If you believe the moon is made of green cheese, there are some books you can read, and once you've read and understood them, they will quite probably change your mind.

The green-cheese-moon example, though exotic, is a surprisingly apposite one. Someone who has come to the belief of the moon being made of green cheese has a very wrong conceptual model of a lot of astronomical phenomena, and rather than figure out how this bad model is put together, the least amount of collective work probably involves pointing them in the direction of appropriate educational materials. There are a lot of similarly exotic memeplexes out there which people buy into through ignorance. I have no idea what diseased ideas lead people to believe in the Redemption Movement, or the Phantom Time Hypothesis, but the materials necessary to disabuse them of these notions are fairly identifiable, and it doesn't feel like it's worth your time or mine to hold their hand while they work through it1.

As you may have established from other posts on this blog, I have a modest economics background. This is an area of public discourse where ignorance reigns like a mighty tyrant god-emperor. A significant number of pressing public issues are economics issues, and it should come as no surprise that the discipline has input into them. Nonetheless, certain subsets of the public imagination have cast economics as the bastard offspring of Mr. Spock and Margaret Thatcher, and many people, from a position of complete ignorance, have decided the subject is end-to-end nonsense unworthy of their time. When these people make pronouncements about Robin Hood Taxes and the inevitable collapse of global capitalism, the Courtier's Reply looks deliciously tempting.

*****

We now come to what I see as the underlying problem of the Courtier's Reply: how do I know I'm not the ignorant one?

It's not like I've never thought along the same lines as my economically-illiterate nemeses. I'm pretty sure that if I were presented with a post-structuralist Marxist critical post-colonial analysis of Russia's current actions in Ukraine, I'd probably assume it was of extremely limited value, and this is largely based on my own preconceptions. But how do I know these preconceptions are serving me well? How do I know I'm not labouring under some exotic combination of falsehoods that would be torn apart if I just read the original Jacques Derrida?

The "PoMo cluster" radiates very strong repulsion forces to those of a technical bent. I shall ostensively define the PoMo cluster with the following examples: postmodernism; post-structuralism; post-*; critical anything; continental philosophy. The outputs of these areas all look like the same sort of loopy word-salad to the median STEM-background observer.

At time of writing, I'm fairly sure I could provide an explanation for what is meant by "postmodernism" that would satisfy a plurality of proponents, and I'm deferring final judgement on the broader cluster until I've explored it further, but I still mistrust the cluster. Part (though by no means all) of the reason for this mistrust is that various bits of it feel like machines for manufacturing irrelevant claims and Courtier's Replies.

That said, I imagine this is what economics might look like to my economically-illiterate nemeses.

*****

As I said at the beginning, I don't really have a satisfactory answer for this. It falls into the general problem of "where is the good stuff that I should be reading?" The very term "Courtier's Reply" implies that the replier is defending a naked emperor. Naked emperors certainly do exist (figuratively); there are ideologies and disciplines and sets of belief which must be wrong, and yet their proponents will, in good faith, say they are right, and provide you with a litany of corroborative material beyond your resources to study. There are also ideologies and disciplines and sets of belief that look exotic from our perspective, but turn out to be extremely useful. In some cases they can demonstrate that usefulness immediately, but in others, there's little to distinguish them from a naked emperor.

How do you distinguish a naked emperor from an emperor who merely looks like his balls are showing?


  1. Less charitably, we might suppose that people who have manoeuvred themselves into seemingly-untenable positions have done so for motives other than pure rational inquiry, in which case we're probably wasting everyone's time even further.