science – long trails

It’s not about avoiding offence…

Women in STEM, diversity in the workplace, problematic sexual or racial imagery at tech conferences: Not a week goes by without a discussion along these lines. There’s an aspect to the conversations for which I’d like to have a handy reference, because it often gets lost in the heat of the situation. This post is intended to fill the role.

Here’s a sample situation. Let’s say I’m looking over someone’s presentation slides and I’m seeing a male pronoun where both sexes apply (“the researcher then saves his data on his thumb drive…”), or an image of a female body used merely to create drama. I would point it out and suggest reformulating or rethinking. And often enough, since my interlocutor is more likely to be clueless than a raging misogynist (after all, they’re asking me for advice!), the reaction is embarrassment: “I didn’t know this is offensive,” or even “I didn’t mean to offend you, sorry!”

There are two things I want to say at this juncture. The first is that I’m unlikely to be actually offended. Certainly not about a thoughtless pronoun, and believe me, I’ve seen erotic images before. These days it takes a lot to make me uncomfortable. Sure, it does happen, a few times a year, when some idiocy feels like a punch to the stomach. It would be more frequent if I hung out more in certain corners of the tech world (keyword “gamergate”). But my personal threshold is irrelevant here, and in any event, don’t presume you know someone else’s feelings.

Second, your goal shouldn’t be to avoid offence under all circumstances: it should be to consider what signals you’re sending, and what these signals say about you and the community you’re addressing. Do they say “my peers may be male or female, and my pronoun choice reflects that” and “stereotyping and objectification do not reflect an acceptable way of relating to each other in this community”? Or do they say “people from underrepresented groups will occasionally have to put up with being the butt of jokes or be forgotten in our planning, because we historically didn’t have to think of such trivial matters”?

I would even go so far as to say giving offence is sometimes inevitable. The racist reader of Houstonia Magazine who called in to complain about an ad because he “just can’t go for racial mixing” quite likely feels genuinely offended at the sight of a picture of a mixed-race family. Similarly, the homophobe may feel sincere discomfort at the sight of two men kissing. And I remember discussions during my youth when it was considered quite reasonable for a man to feel uncomfortable about reporting to a female boss, and an unfortunate fact of life that women who want careers would have the extra task of dealing with such obstacles. In all three cases my attitude, and surely not just mine, is to put the onus firmly back onto the racist reader, homophobic neighbour or sexist employee to a) put up with it and b) use it as an opportunity to examine their prejudices and biases.

I’m not making the moral relativist’s argument here: quite the contrary. Feeling offended at sexist jokes is not equivalent to being offended about women having access to roles of authority. The hurt feelings of the racist don’t have the same weight as the hurt feelings of a non-white person who has to prove their competence multiple times all over. As for our professional (or recreational) communities, we cannot resolve an ethical problem (equality of opportunity) without making a commitment to a set of values about diversity and inclusiveness, even if it means the traditionalists have to adapt.

The problem I’m interested in here is offending someone or making someone uncomfortable merely for not being part of the majority group, in a situation where they could reasonably expect to be free of discomfort and treated with professional courtesy. When I say “ugh, this is really offensive”, this is usually what I mean.

You might think I’m stating something that everyone implicitly understands. But I still think it’s important to be clear and precise about the distinction, for a number of reasons.

1. To counter a dismissive “she just takes offence easily”. Sure, some people take offence more easily than others. People vary. Some even take offence based on a misunderstanding. It happens. But it’s irrelevant. A point stands whether the person highlighting it speaks with perfect calm and detachment or with visible pain and anger.

2. Because otherwise the problem may be relegated to an inter-personal matter even though it is about systems and communities. It’s not about avoiding stepping on someone’s toes, but about who is made to feel welcome and who is being excluded or pushed to the margins.

3. Because the focus on offence seeks simple formulaic solutions to ethical problems. We can’t make our communities inclusive by box-ticking. Removing some symbols of discrimination (such as sexualized images) doesn’t automatically make peers consider each other’s contributions fairly.

4. Because offence and discomfort cut many ways. Already we’re seeing attempts to borrow the language of diversity and inclusion to remove challenging literature from school curricula or material about sex and sexuality from youth sections of libraries, or to justify restrictive dress codes. There is no contradiction between rejecting eroticised images on presentation slides and wanting libraries to offer factual, complete information about the anatomy of human bodies and the biological, social and psychological aspects of sex.

To finish, lest it seem I’m slamming the use of “offensive” without further qualification: Even though there’s no right not to be offended, offence and discomfort are still symptoms of a problem. It’s not hypocritical to complain about it. Simply, when examining one’s own values and biases, or when writing, say, a code of conduct for a community, it’s a good idea to figure out exactly what kind of inclusiveness and freedom from offence we want to achieve.

Friday link dispatch 01

On one of my blogs, there used to be automatically generated link posts via Delicious.com. The method was never very reliable, and I abandoned it as it was never updated from its rather basic functionality. In particular, every single link I saved on Delicious.com was re-posted (instead of, say, just the links marked with a “post-me” tag). But I miss the link roundups. So let’s bring them back.

How to choose appropriate terminology when writing a historical novel. Which of the following words would you expect were not being used at all in the early 19th century, or had a markedly different sense than in today’s English: manipulate, blink, looped, conversationalist, knowledgeable, traipsing? The writer Mary Robinette Kowal, author of (among other works) Glamour in Glass, which is set in 1815, presents her anachronism-busting method. It involves extracting a word list from Jane Austen’s oeuvre and looking up each non-Austen word in the OED. (Via Language Hat.)
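The word-list step lends itself to a short script. What follows is a minimal Python sketch, not Kowal’s actual tool: the function names `vocabulary` and `non_period_words` and the naive tokenizer are assumptions made for illustration, and a single sentence (the opening line of Pride and Prejudice) stands in for the full Austen texts. It only flags candidates; looking them up in the OED remains a manual step.

```python
import re


def vocabulary(text: str) -> set[str]:
    """Return the set of lowercase word forms found in the text."""
    # Naive tokenizer (an assumption): runs of ASCII letters, optionally
    # with one internal apostrophe, as in "don't".
    return set(re.findall(r"[a-z]+(?:'[a-z]+)?", text.lower()))


def non_period_words(manuscript: str, period_corpus: str) -> list[str]:
    """List the manuscript words that never occur in the period corpus."""
    return sorted(vocabulary(manuscript) - vocabulary(period_corpus))


# Stand-in corpus: the first sentence of Pride and Prejudice.
austen_sample = (
    "It is a truth universally acknowledged, that a single man in "
    "possession of a good fortune, must be in want of a wife."
)
# Hypothetical draft sentence to check.
draft = "A single man of good fortune is a knowledgeable conversationalist."

candidates = non_period_words(draft, austen_sample)
print(candidates)  # ['conversationalist', 'knowledgeable']
```

Because the sketch compares surface forms only, inflected forms such as “blinked” and “blink” count as different words; a real word list would probably need some normalisation first.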

Earliest known uses of some (many) of the words of mathematics and earliest known uses of some mathematical symbols:

FRACTAL. According to Franceschetti (p. 357):

In the winter of 1975, while he was preparing the manuscript of his first book, Mandelbrot thought about a name for his shapes. Looking into his son’s Latin dictionary, he came across the adjective fractus, from the verb frangere, meaning “to break.” He decided to name his shapes “fractals.”

Fractal appears in 1975 in Les Objets fractals: Forme, hasard, et dimension by Benoit Mandelbrot (1924- ). The title was translated as Fractals: Form, Chance, and Dimension (1977).

These pages, which must have been around for some time, are the work of Jeff Miller. Full of historical, lexical and typographical information and rich in references.

Tai, Chen-To: A historical study of vector analysis. I’m reviewing some of the maths I knew 15 years ago (gracious, am I rusty!) and came across this 1995 paper (available as a PDF file), which is even geekier (and certainly more specialized) than the pages in the previous link. It presumes familiarity with the subject of vector analysis as taught to math, physics or engineering students in their first years and covers historical texts mostly from mathematics and electromagnetism with respect to the notation of the derivatives (gradient, divergence, curl), with or without the Nabla operator ∇ (also called del). The author is opinionated and also has a second text, A Survey of the Improper Uses of ∇ in Vector Analysis.

Personal names around the world. A short but useful page from the World Wide Web Consortium.

People who create web forms, databases, or ontologies are often unaware how different people’s names can be in other countries. They build their forms or databases in a way that assumes too much on the part of foreign users. This article will first introduce you to some of the different styles used for personal names, and then some of the possible implications for handling those on the Web.

(Hat tip: Pat Hall on Facebook.)

Google’s h mystery

A few days ago, my friend Melinda Shore, who knows I’m interested in internationalization, sent me a screenshot from the search bar of her Safari browser. It is a drop-down list of search suggestions provided by Google just after typing the letter h:

The top suggestion is a mess:

1. What does it mean?
2. Why is it a legitimate search suggestion for the letter h? (If it is.)

Regarding 1., the search suggestion in Firefox is nearly identical, but I cannot reproduce the effect in Google’s own browser Chrome or on the search page directly. In the Safari example, we’re dealing with an odd mix of regular character strings (6.626068, 10, sup, -34), numeric HTML (or XML) entities (&#215;) and raw Unicode-escaped characters that you might find in Python, C or Java source code (\u003C, \u003E). Let’s decode the second and third types of component:

\u003C and \u003E simply represent the Unicode code points U+003C and U+003E: the less-than and the greater-than signs < and >.
&#215; is U+00D7 MULTIPLICATION SIGN: ×

Putting it together, we get the already much more user-friendly form

6.626068 × 10<sup>-34

or, completing and resolving the HTML: 6.626068 × 10^-34.
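The two decoding steps can be sketched in a few lines of Python. The raw string below is a reconstruction from the components listed above, not a copy of the full suggestion, and it assumes the numeric entity was written in its decimal form, &#215;. The name `decode_suggestion` is chosen purely for illustration.

```python
import html
import re


def decode_suggestion(raw: str) -> str:
    """Resolve \\uXXXX escapes first, then HTML character references."""
    # Step 1: turn each literal backslash-u escape into its code point.
    unescaped = re.sub(
        r"\\u([0-9a-fA-F]{4})",
        lambda match: chr(int(match.group(1), 16)),
        raw,
    )
    # Step 2: resolve character references such as &#215;.
    # The <sup> tag itself is markup, not an entity, so it is left alone.
    return html.unescape(unescaped)


# Reconstructed fragment (an assumption, see above). The raw string
# literal keeps the backslashes as ordinary characters.
raw = r"6.626068 &#215; 10\u003Csup\u003E-34"

decoded = decode_suggestion(raw)
print(decoded)  # 6.626068 × 10<sup>-34
```

Displaying the value properly would take a third step, rendering or stripping the `<sup>` markup, which a search bar drop-down would have to do itself.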

Once I realized this, my physics training kicked in and the answer to 2. became clearer – Planck’s constant, abbreviated as h, has the value of 6.62606889 × 10^-34 J s (or m² kg / s). This is not the result of injecting broken text into the search engine results, but a feature of Google’s calculator. Typing “G” into the browser’s search bar also yields a similar semi-numeric character salad, while the results for “c”, “e” or “pi” are much more legible.

Still, the entire story raises questions about intent and execution. This is not really an internationalization issue because the form of those physical and mathematical constants is largely invariant by convention. Yet, the tools of internationalization — HTML entities, Unicode code point escapes — have leaked into scientific character display, too. Internationalization is a user interface (usability, user experience) issue [1].

On the execution side, Google got it wrong on several counts, and Apple and Mozilla share some of the blame. Browser search bar drop-down lists don’t allow for superscripts and aren’t sophisticated enough to strip markup, so they display ugly raw HTML. Choosing a numeric entity instead of the character × probably led to its display breaking. And < and > are even in ASCII, so they should display fine, but probably security concerns and their status as reserved HTML characters led to the odd choice of escaping method. All in all, at least one decoding step was not carried out.

More fundamentally, should Google suggest “6.626068 × 10^-34 m² kg / s” when you type a lowercase h? There was a time in my life when I used Planck’s constant daily, and I do use Google’s handy calculator via my browser’s search bar for quick arithmetic and unit conversions. But I think just spitting out the value with no label is going a little too far, and will for more than 99% of users be entirely unexpected: too different from the genuinely useful (for Americans) “hotmail”, “hulu” and “home depot”. Especially considering that for most letters of the alphabet, you could possibly find a scientific constant, function or theorem that starts with it.

Though maybe it is a ploy to spread more science among the people.

[1] It is also a design issue. The two aren’t mutually exclusive.

Google search result for "h" - very different from the broken suggestion

EDIT (2010-08-02): Commenters inside Facebook’s walled garden have remarked that if you actually take up the suggestion and search for it, you get to Planck’s constant. Currently this is partially right, but these things are constantly shifting: In my tests this morning, whether you use the Safari/Firefox search field or Google’s search page directly, you get a mix of results, the first of which are people wondering about the odd string on SEO forums. A little further down you do get collections of scientific constants, but you have to attentively read the result. Right now, this post is (after less than 12h) number 7 on the results page. None of the pages looks like what you get if you do a Google search for “h” (and hit return) — which is nice and helpful.

Another commenter remarks that for her, the suggestion is now prefaced with “Planck’s constant”, which is a vast improvement.

Of boys, girls, goats and probabilistic intuition

Oh, well done, CodingHorror! Jeff Atwood posted a statistics puzzle and its solution, and his readers’ reaction was just about identical to what happened in the probability & statistics course I attended 15+ years ago, when a mathematically very similar problem was set on our weekly exercise sheet. Jeff’s two posts have been commented upon more than 1500 times in about 3 days: people are vociferously defending their reasoning and accusing “the other side” of idiocy, lack of maths skills, and boneheadedness, and some careful and constructive chaps have written simulations… a nice mayhem.

Back in my statistics class, getting us students all riled up and passionate about solving an exercise was of course the intended effect.

Jeff’s puzzle was formulated this way: “Let’s say, hypothetically speaking, you met someone who told you they had two children, and one of them is a girl. What are the odds that person has a boy and a girl?”

(My old statistics exercise was given in a form that’s harder to grasp intuitively: You’re on a game show, three closed doors in front of you. You know that behind one of them is a car and behind the others are goats. If the door you’ll be opening has a car behind it, you get to keep the car; if it’s a goat, you lose. You pick a door, but don’t open it yet. Then, the game show host opens a different door, one you haven’t picked, and reveals that there is a goat behind it. You now have a choice between sticking to your original selection and switching to the other still closed door. What should you do? We didn’t have Wikipedia then, or Google.)
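For readers who would rather count than argue, the goat problem can be enumerated exactly. The sketch below rests on the usual assumptions, which the exercise text leaves implicit: the car is equally likely to be behind any door, the host knows where it is and always opens a goat door you did not pick, and he chooses at random when two such doors are available. The function name `win_probabilities` is made up for this example.

```python
from fractions import Fraction

DOORS = (0, 1, 2)


def win_probabilities() -> tuple[Fraction, Fraction]:
    """Return (P(win if you stick), P(win if you switch))."""
    stick = Fraction(0)
    switch = Fraction(0)
    for car in DOORS:
        for pick in DOORS:
            # The host may only open an unpicked door hiding a goat.
            host_options = [d for d in DOORS if d not in (pick, car)]
            for opened in host_options:
                # Each (car, pick) pair has probability 1/9, shared
                # equally among the host's possible moves.
                weight = Fraction(1, 9) / len(host_options)
                # The one door that is neither picked nor opened.
                other = next(d for d in DOORS if d not in (pick, opened))
                if pick == car:
                    stick += weight
                if other == car:
                    switch += weight
    return stick, switch


stick, switch = win_probabilities()
print(stick, switch)  # 1/3 2/3
```

Under these assumptions, switching wins in two out of three games, because your first pick is wrong two times out of three and the host’s move then leaves the car behind the remaining door.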

Intuition of probabilities is a tricky thing. There are studies that show, for example, that people’s intuitive judgments are much more often correct if they are asked to reason about frequencies (how often something will happen, out of a number of repeated tries) than in terms of probabilities (a number between 0 and 1) — even though both are mathematically equivalent.

It’s the same for Jeff’s problem. My reaction to reading it was “well 50% of course… one second… there’s something fishy about it… YOU’RE DOING IT WRONG… “. Turns out, how easy it is to see the correct answer depends on what sort of interpretation you put on the original formulation. So let’s take this apart and develop the puzzle into a longer narrative:

“You’re a teacher, and overhear a father enrolling his two children, Sam and Robin. You didn’t catch all of the conversation, so you don’t know the sex of the kids. However, the father is picking up some leaflets for after-school activities and you hear him exclaim ‘You have a girl geek computer camp? Great, I’ve got someone at home who will love this!’ What are the odds that this father has a boy and a girl?” The solution is straightforward: One of the two, Sam or Robin, is a (computer-loving) girl, and for the other we don’t know — they may be male, or dislike computers, or the wrong age for the group, or not interested for any other reason. The possibilities are: Sam is a girl, and Robin is also a girl; Sam is a girl and Robin is a boy; Sam is a boy and Robin is a girl. As the probability of a random child being male or female is roughly 50%, these three combinations are equally probable. Two of the three combinations are mixed sex. Therefore, the odds of the father having a boy and a girl are 67%, approximately. This is the intended correct interpretation and solution of the puzzle.
A lot of readers, however, form a different mental representation of the problem, which translates into a scenario that’s subtly different from the above: What you hear the father exclaim is “You have a girl geek computer camp? Great, my Sam will love this!” In this case, you know which of the children is female, so the possible combinations are reduced to: Sam female – Robin male, and Sam female – Robin female. 50% odds of the family having a boy and a girl. However, nothing in the original formulation indicates that you know which kid is female, only that (at least) one of them is. This is why this solution is incorrect.
There is yet another, “common sense, normal English” interpretation: If I meet someone and they tell me, outright, they have two children and one of them is a girl, I can pretty much assume that the other one isn’t. This, of course, would make this not a maths puzzle, but a trick question, with 100% odds that the family has two kids of different sex. Correct or incorrect? You decide. (I’m leaning towards “correct”.)
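The first two interpretations can be checked by listing the equally likely families and counting. This is a minimal Python sketch, assuming each child is independently a girl or a boy with probability 1/2; the helper name `prob_mixed` and the convention that the first letter stands for Sam and the second for Robin are choices made for this illustration.

```python
from fractions import Fraction
from itertools import product

# All equally likely families as (Sam, Robin): GG, GB, BG, BB.
FAMILIES = list(product("GB", repeat=2))


def prob_mixed(condition) -> Fraction:
    """P(one boy and one girl), given that the condition holds."""
    kept = [family for family in FAMILIES if condition(family)]
    mixed = [family for family in kept if set(family) == {"G", "B"}]
    return Fraction(len(mixed), len(kept))


# First reading: at least one of the two children is a girl.
at_least_one_girl = prob_mixed(lambda family: "G" in family)

# Second reading: we know that Sam in particular is a girl.
sam_is_girl = prob_mixed(lambda family: family[0] == "G")

print(at_least_one_girl, sam_is_girl)  # 2/3 1/2
```

The only difference between the two results is which families the condition rules out: “at least one girl” removes only the boy-boy family, while “Sam is a girl” also removes the family where Sam is the boy.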

Our frequently wrong intuition about problems like this, even among people with mathematical training and jobs that require formal reasoning, is an interesting feature of cognitive psychology. The Wikipedia article about the “Monty Hall problem” (the one with the goats) contains and links to fascinating material. Jeff’s commenters also reveal the pitfall in their mapping of the formal problem to a mental picture:

So are you actually saying that once you’ve had one girl child the odds of having a boy increase????

This example is rubbish. The question is simply “What is the probability that my other child is a boy”. The answer is 50%. The sex of the first child has no bearing on the sex of the second.

If the person has two children, there are three possible combinations of gender: BB BG GG. If we rule out BB that leaves us with two options — BG and GG — which results in 50% chance of the other kid being a boy.

Solution is flat out wrong, it assumes a dependency that does not exist. If I flip a coin, its a 50% chance of being heads, and a 50% chance of being tails. This is independent of, and completely regardless of the outcome of a previous coin toss. This is exactly the same. Having one child that is a girl has absolutely no bearing on the sex of the other child, its still the same completely random 50% chance for each sex.

I’ve just written some code to test this out, and in the procces (and result) I’ve come to the 2/3 chance (25% no girls, 50% boy+girl, 25% boy boy) conclusion. […] I’m still quite confused by the implications. It feels like I’m being told that past events influence future events.

I’d say it’s 50%. If you have two children, you have BB, BG, GB, or GG. One of them is a girl then it’s not BB. When you mention that one of the children is a girl it’s 25% chance you are talking about a girl in BG, 25% a girl in GB and 50% (25%+25%) a girl in GG. (Four girls and you’re talking about one of them with equal probability.) Then chance to have a girl and a boy is 50% because it’s either BG or GB, not GG.

Fundamentally, analysing a statistical problem in terms of equally likely, atomic combinations is not something that comes naturally to us.

Nudibranch and emperor

While I was looking through a stack of postcards, this image from London’s Natural History Museum jumped out, so I’ll be using it today to write to a friend. The picture was taken by Malcolm Hey, is entitled “Reclining emperor shrimp” and won a Wildlife Photographer of the Year award in 2005.

The textures, the colour, and generally the calm and sense of whimsy that emanates from it are what make this piece of photography so attractive. But what triggered my posting this right now is the explanatory paragraph on the back of the card, which starts as follows:

Twirling and whirling in a crimson leotard and white tutu, the Spanish dancer (a large nudibranch, or seaslug) emerges to feed at night. Sometimes it has a passive partner, an emperor shrimp, tucked in the frilly folds of its gills. The tiny shrimp (about a centimetre – 0.4 inches long) turns red to blend in with its host’s costume.

Nudibranch is a great word.

(So is seaslug.)