long trails – Page 2 – chris waigl on language, science, technology, society, and her life exploring the spaces in-between

Jacques Chirac’s magical stickbread

Arnold Zwicky’s lovely post this morning about baguette and how it’s surprisingly not a diminutive of bague threw me into reminiscing about my time in Paris — 12 years of my life. Instead of continuing to hijack the comment space over there, this is something to pursue on this blog, even though we seem to be averaging a post every year and a half.

In particular, there’s the half-remembered anecdote about Jacques Chirac, the former French president, which I’ve now chased up across the ‘nets.

Baguette (the bread), of course, is a part of life in France with a high level of cultural significance, but the word can refer to all sorts of things, starting from the basic meaning “small stick”: chopsticks are baguettes, and so are drumsticks (the kind you use for operating drums with [1]); there are conductors’ batons; and there are magic wands. And this is where the anecdote picks up: In 1995, Jacques Chirac was elected president in an election that tipped from the political left to the right. In his first public speech, he said that he didn’t have a magic wand to solve France’s economic problems (unemployment was very high and I remember research labs getting close to being broke). The phrase was seen as the new president’s central statement and widely reported, including in the international press. Unfortunately, in some countries the meaning of the word baguette was so strongly linked to the bakery product (250 g of delicious crust with a little bit of relatively heavy white dough inside) that reporters didn’t think when translating baguette magique (“magic wand”) into their own language. A Belgian paper reported on the election and speech as follows:

The Belgian paper De Morgen of September 1995 reporting that the newly elected president of France “has no ‘magical stickbread’” (heeft geen “magisch stokbrood”) in a mis-translation of *baguette magique* [2]

Those crazy French with their over-emphasis on food, ascribing magical qualities to something as mundane as bread.

As far as I can tell from trying to find a correct account of this story online, the phrase magisch stokbrood has since become a little bit of a jocular cliché in Dutch and/or Flemish (I do not know if the spelling differs in the two languages), following the rise of the Harry Potter book series.

(As a final note, I chose between magic and magical in English on intuition. It’s clearly magic wand in the idiomatic expression, but I think magical stickbread sounds better than magic stickbread. Opinions?)

[1] As for the chicken parts, in France chicken legs aren’t usually separated into what in English are called “thighs” and “drumsticks”, and the entire thing — about a meat portion’s worth for a smallish chicken — is referred to as cuisse (“thigh”).
[2] I was very happy to find this image on a Belgian blog in Flemish, from which I took the liberty of stealing it.

Friday link dispatch 03

Today’s links still follow the endangered language theme with special emphasis on Alaska Native languages.

The first one is fun. Frozen Whitefish is a rock band from Bethel (a town and Yup’ik village of 6500 off the road system in south-west Alaska close to the coast) that was featured in the Discovery Channel series Flying Wild Alaska. They sing in Central Yup’ik, so if you’re interested in learning the language, you may want to listen. And the link goes to their MySpace page, where you can listen to a number of quite well produced tracks. Here is a video, in somewhat lower sound quality, but still, charming (via the Alaska Daily News Rural Blog):

Frozen Whitefish performing Maani Alaskami live at the 2011 Alaska State Fair

The second one is serious and comes out of a gallery & workshop entitled “Living Our Cultures, Sharing Our Heritage: The First Peoples of Alaska” of the Smithsonian Arctic Studies Center in Anchorage: Sharing the Dena’ina Language (via Talking Alaska):

Sharing the Dena'ina language - a language instruction video

The third one is a news report about how to preserve an endangered language: Living Languages reports on compulsory Ijaw in Bayelsa schools in Nigeria. Bayelsa is a state of Nigeria. Now not all of the 10 Ijoid languages may be endangered and I have no way of gauging the effectiveness and coverage of the Bayelsa school system. Still, the approach of making a declining local language compulsory is the winning formula if the basic conditions are met. I remember that when I was a teenager in the 80s, there was much sadness and nostalgia about the imminent death of Irish and Welsh, two Celtic languages and thereby preeminent vehicles of European culture. Well, no one talks like this any more. It makes me very happy to hear teenagers speak Irish among themselves in the streetcars of Dublin, thereby escaping the danger of being overheard by old ladies like myself (the middle-aged being the generation with the lowest rate of competency in the language). As for Wales, I hear that the demand for Welsh instruction for adults is up significantly.

Friday link dispatch 02

Today we have two Inuit (Canadian) videos to complement the recent Alaska Native language/culture resources post.

Two school girls practicing Inuit throat singing (YouTube). There are many videos on the various video services that demonstrate this art form, which can be referred to by a variety of terms and is carried out typically by two women standing close to each other, face to face. I particularly liked this video because the young women are doing it casually between school classes:

Janet Aglukkaq and Kathy Keknek throat singing between their classes at Qiqirtaq Ilihakvik High School in Gjoa Haven, Nunavut.

Anirniq – (Breath), Winner Best Short Film at the Vancouver International Mountain Film Festival 2010 (Vimeo). A magical tale in Inuktitut with English subtitles about death, hunting, nature, and the belief that when we die, our soul goes into the living beings around us:

Anirniq - Breath (Brüdder Productions, Canada, 2010)

Friday link dispatch 01

On one of my blogs, there used to be automatically generated link posts via Delicious.com. The method was never very reliable, and I abandoned it as it was never updated from its rather basic functionality. In particular, every single link I saved on Delicious.com was re-posted (instead of, say, just the links marked with a “post-me” tag). But I miss the link roundups. So let’s bring them back.

How to choose appropriate terminology when writing a historical novel. Which of the following words would you expect were not being used at all in the early 19th century, or had a markedly different sense than in today’s English: manipulate, blink, looped, conversationalist, knowledgeable, traipsing? The writer Mary Robinette Kowal, author of (among other works) Glamour in Glass, which is set in 1815, presents her anachronism-busting method. It involves extracting a word list from Jane Austen’s oeuvre and looking up each non-Austen word in the OED. (Via Language Hat.)
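The first half of that method (finding the words that need an OED lookup) boils down to a set difference. Here is a minimal sketch, assuming plain-text inputs; the function names and the two tiny sample strings are made up for illustration and are not taken from the author’s own tooling:

```python
from __future__ import annotations

import re


def words(text: str) -> set[str]:
    """Return the lower-cased word forms found in a text."""
    return set(re.findall(r"[a-z]+(?:'[a-z]+)?", text.lower()))


def words_to_check(draft: str, period_corpus: str) -> list[str]:
    """Words used in the draft that never occur in the period corpus."""
    return sorted(words(draft) - words(period_corpus))


# Tiny stand-ins (hypothetical): a real run would load the complete novels
# and the whole manuscript from files.
period_corpus = (
    "It is a truth universally acknowledged, that a single man in "
    "possession of a good fortune, must be in want of a wife."
)
draft = (
    "It is a truth that a knowledgeable conversationalist must be "
    "in want of a good fortune."
)

flagged = words_to_check(draft, period_corpus)
print(flagged)  # candidates for a dictionary lookup
```

A word missing from the corpus is only a candidate: the dictionary lookup still decides whether it is an anachronism, and words that are present but have shifted in sense are not caught this way.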

Earliest known uses of some (many) of the words of mathematics and earliest known uses of some mathematical symbols:

FRACTAL. According to Franceschetti (p. 357):

In the winter of 1975, while he was preparing the manuscript of his first book, Mandelbrot thought about a name for his shapes. Looking into his son’s Latin dictionary, he came across the adjective fractus, from the verb frangere, meaning “to break.” He decided to name his shapes “fractals.”

Fractal appears in 1975 in Les Objets fractals: Forme, hasard, et dimension by Benoit Mandelbrot (1924- ). The title was translated as Fractals: Form, Chance, and Dimension (1977).

These pages, which must have been around for some time, are the work of Jeff Miller. Full of historical, lexical and typographical information and rich in references.

Tai, Chen-To: A historical study of vector analysis. I’m reviewing some of the maths I knew 15 years ago (gracious, am I rusty!) and came across this 1995 paper (available as a PDF file), which is even geekier (and certainly more specialized) than the pages in the previous link. It presumes familiarity with the subject of vector analysis as taught to math, physics or engineering students in their first years and covers historical texts mostly from mathematics and electromagnetism with respect to the notation of the derivatives (gradient, divergence, curl), with or without the Nabla operator ∇ (also called del). The author is opinionated and also has a second text, A Survey of the Improper Uses of ∇ in Vector Analysis.

Personal names around the world. A short but useful page from the World Wide Web Consortium.

People who create web forms, databases, or ontologies are often unaware how different people’s names can be in other countries. They build their forms or databases in a way that assumes too much on the part of foreign users. This article will first introduce you to some of the different styles used for personal names, and then some of the possible implications for handling those on the Web.

(Hat tip: Pat Hall on Facebook.)

Alaska Native languages

So I live in Alaska now: circumstances change, and life remains endlessly fascinating. ¹

For a new European expat in North America, Alaska is one of the more unusual places to land in. Compared to Texas, the second largest US state, it’s 2.5 times the size, but has less than 3% of the population (about 700,000, half of them living in the Anchorage area). It has a variety of climates, most of them extreme, and endless environmental, geophysical and atmospheric phenomena rarely found elsewhere, from volcanoes, via the swampy tundra to the aurora borealis. Even many Americans seem to be unaware, or astonishingly dismissive, of the ways day-to-day life in Alaska is unlike any other place in the US.

One of many language-related features is that Alaska is the US state with the largest percentage (15%), if not absolute number, of inhabitants of Native American heritage. As far as language families are concerned, most Alaskan Native languages belong either to the Eskimo-Aleut (such as Iñupiaq, Central Yup’ik, Alutiiq etc.) or the Na-Dené (also Athabaskan-Eyak-Tlingit) family. Many of them, especially in the second group, are endangered (or worse).

Even though identical or related groups are involved, terminology both for people and languages is not uniform across the Alaskan/Canadian border. “Eskimo”, for example, is regarded as derogatory in Canada (and Greenland), and you’d most likely find references to Inuit peoples and (though this is a less universal term) Inuktitut for their languages, which may well be written in Inuktitut syllabics. In Alaska, while it seems appropriate to use the term somewhat self-consciously as an outsider, “Eskimo” is often found in self-descriptions and seen as useful as it is a general term covering distinct but related groups of people: “Iñupiat Eskimo”, “Yup’ik Eskimo”, though the second part’s optional: “I’m Iñupiaq and I count” was proudly written on some T-shirts for last year’s census. Oh, and as for pronunciation, I haven’t figured it out entirely, but “Iñupiaq/Iñupiat” has three syllables and is stressed on the first.

My employer, the University of Alaska Fairbanks, pays attention to how it serves the educational needs of Native students and rural communities (overlapping but not identical categories), and also has a number of research interests, in particular through its Alaska Native Language Center.

The ANLC web site is worth digging around in. My favourite is the Native Peoples and Languages of Alaska map, first published by Michael Krauss in 1974 and recently (2011) updated. It can be ordered, and there is an interactive (zoomable) online version on the Alaskool web site (Alaska Native culture resources for kindergarten through high school teaching).

Now for learning an Alaska Native language, UAF of course offers classes (I’m tempted), but barring that, there are a number of sites that have “word of the day/week” features. Some, though currently inactive, may still be worth discovering (Athabascan word of the week, Iñupiaq Word of the Day, the Inupiatun language circle on Facebook). My favourite is the Alutiiq word of the week from the Alutiiq Museum on Kodiak Island, which I really want to go visit in person. There’s also an online shop with artwork as well as more Alutiiq language resources.

Last, blogs. Talking Alaska is a group blog on “topics related to Alaska Native languages, including language documentation, language revitalization, language activism, and language endangerment”. A recent interesting post, for example, approached the issue of whether to replace the (non-indigenous) term “Athabascan” with “Dene” (also: Dené), and why.

Via Talking Alaska I found Writing Raven, a Tlingit/Dena’ina Athabascan, and her blog Alaska Real. She has a three (1) part (2) series (3) on why it matters to keep Native languages alive and addresses a series of misinformed arguments against language revitalization. An excerpt:

For the most part, what happened to the Native languages of the Americas wasn’t a natural evolution. What happened was traumatic, invasive and left no room for real adaptation. […]
I had a great Tlingit teacher who talked to us about a common Tlingit expression I heard growing up. When someone says “Gunalcheesh” (thank you) – the response is often “Ho ho!” (you’re welcome.) I really did hear this often.
What a surprise to learn it didn’t mean what I think it meant over 20 years later! “Gunalcheesh ho ho” actually is one phrase, and is used to emphasize the thank you – like “Thank you VERY much.” There is no phrase commonly said, traditionally, to respond to thank you, as there is in English. But the “young kids” as she said (she meant my parents generation!) were changing this, and this new kind of word was emerging.
To a language, she said, this is a great thing. It shows the language is alive, and adapting. The “young kids” were choosing to change this on their own, because it suited the younger culture more, and it brought two languages together.

I love the story, and think she’s entirely right.

Notes:

¹ Two countries and a blog or three ago there were France and Diacritiques, the bilingual language blog: rough around the edges, but well-liked and well-linked by a small number of interesting people. Then, in 2006, came a big jump to the UK, where a job in commercial software replaced freelancing and occasional teaching. It was a good step in many ways, but not for my blogging, and this place never took off. Now, as of six months ago (February 2011), another big jump: after 15 years I left European capitals behind and joined my partner to live outside Fairbanks, latitude 64.8, to go back to working in a scientific environment. This footnote is for the benefit of any old reader from 5 years ago who might be interested. There are no promises or big announcements: I dislike blogging-about-one’s-blogging, so the note ends here.

Welcome Unicode 6.0 and your crazy stable of symbols

Yesterday, a new major version of the Unicode Standard was published: Unicode 6.0, a year after version 5.2 and more than four after the last major upgrade to 5.0.

There is of course a slew of new stuff in it, and I’m sure I’ll spend a good while digesting at least some of it. The most visible effects of a Unicode Standard revision are the new characters: 2,088 of them, bringing the total to 109,449. Most of them are added to the Supplementary Planes, outside the Basic Multilingual Plane, and therefore require surrogate pairs in UTF-16. (In other words, they are encoded in UTF-16 using two 16-bit code units, unlike the BMP characters, which use only one.)
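To make the surrogate-pair remark concrete, here is a small sketch that computes the two UTF-16 code units for a supplementary-plane character by hand and cross-checks them against Python’s own encoder. The helper names are mine; the arithmetic is the standard UTF-16 rule (subtract 0x10000, then split the remaining 20 bits into two halves of 10 bits each).

```python
from __future__ import annotations


def surrogate_pair(code_point: int) -> tuple[int, int]:
    """Split a supplementary-plane code point into high and low surrogates."""
    if not 0x10000 <= code_point <= 0x10FFFF:
        raise ValueError("only code points outside the BMP need a surrogate pair")
    offset = code_point - 0x10000           # 20 significant bits remain
    high = 0xD800 + (offset >> 10)          # top 10 bits
    low = 0xDC00 + (offset & 0x3FF)         # bottom 10 bits
    return high, low


def utf16_code_units(ch: str) -> list[int]:
    """The 16-bit code units Python's UTF-16 encoder produces for a string."""
    data = ch.encode("utf-16-be")
    return [int.from_bytes(data[i:i + 2], "big") for i in range(0, len(data), 2)]


penguin = "\U0001F427"                      # U+1F427 PENGUIN, new in Unicode 6.0
print([hex(u) for u in surrogate_pair(ord(penguin))])
print([hex(u) for u in utf16_code_units(penguin)])
print([hex(u) for u in utf16_code_units("\u00d7")])  # a BMP character: one unit
```

Both computations give the same two code units for the penguin, while the multiplication sign, which lives in the BMP, needs only one.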

But enough initialisms: what’s in it? New scripts for languages of course: Brāhmī, an abugida (an alphasyllabary based on consonants with secondary but required vowel notation), which is a historical script from India and of interest to archaeologists and historical linguists; the Mandaic alphabet for a variant of Aramaic that has a classical (liturgical Mandaean) and a vernacular form used by small communities in Iran and Iraq; and Batak, another abugida used to write an Austronesian language spoken by millions of people in northern Sumatra. In addition, of course, numerous updates, additions and improvements to existing scripts.

More striking is the number of new symbols and pictograms, including entirely new blocks. Emoticons (including Western ones, emoji, and also for example U+1F648 SEE-NO-EVIL MONKEY, U+1F649 HEAR-NO-EVIL MONKEY and U+1F64A SPEAK-NO-EVIL MONKEY). Useful transport and map symbols. Alchemical symbols certain to be welcomed by historians and fortune-tellers. Playing cards. And the catch-all block “Miscellaneous Symbols And Pictographs”, which brings us hundreds of animals, vegetables, fruit, tools, office symbols, communication symbols etc. pp. down to stuff like U+1F4A9, useful if you need to represent a pile of dog poop in a comic-book style.

Some have joked the date was a-propos: in the US, it was National Coming-Out Day, so fittingly we now have U+1F46C and U+1F46D: TWO MEN HOLDING HANDS and TWO WOMEN HOLDING HANDS.
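If you’d like to check the official names of these code points yourself, Python’s standard `unicodedata` module will look them up. This is only a sketch: it assumes an interpreter whose bundled Unicode database is at version 6.0 or later, otherwise the lookup falls back to the placeholder text.

```python
import unicodedata

# Code points mentioned in this post (all new in Unicode 6.0).
code_points = [0x1F648, 0x1F649, 0x1F64A, 0x1F4A9, 0x1F46C, 0x1F46D]

print("Unicode database version:", unicodedata.unidata_version)
for cp in code_points:
    # The second argument is returned if the database doesn't know the character.
    name = unicodedata.name(chr(cp), "<not in this Unicode database>")
    print(f"U+{cp:04X} {name}")
```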

But to get those new characters on paper or screen, we need fonts. Unfortunately, fonts are often many versions behind, and usually only implement specific ranges or blocks of the standard, depending on the font’s purpose. I opened Google to search for what’s out there already, and thanks to Hacker News found an impressively up-to-date font called Symbola by George Douros, third from the top on this collection of fonts for Ancient scripts.

It downloaded and installed fine on my Mac (running OS X Snow Leopard), but the OS X Character Viewer application is clearly not updated (I ran System Update just to make sure I wasn’t missing anything): As the highlighted areas show, neither the character names nor the Unicode blocks are known to OS X just yet.

OS X Character Viewer with new Unicode 6.0 pictogram characters

This doesn’t prevent us from using the font, but in the end, whether the characters are displayed depends on the application. My browser, unfortunately, seems to be stuck on Unicode 5.2. But still, here I bring you U+1F427 PENGUIN, in the hope it will automagically appear on this page as soon as the application stack has caught up:

🐧

Update: I can see it in Firefox! But not in Chrome or Safari, so it may be a WebKit problem.

Can you read 19th century txt spk?

@guardianstyle on Twitter points to an article by Mark Brown announcing what sounds like a wonderful exhibition the British Library is preparing: Evolving English: One Language, Many Voices (Nov 12, 2010 – Apr 3, 2011). There’s even a second piece, by Alison Flood.

British Library exhibits are reputed to be large, well-made and almost over-abundant (I’ve only been to one, Taking Liberties, and it was of this style.) If the Guardian is to be believed, usage controversies get a large place in Evolving English, which makes it particularly relevant to this blog. The press has its favourite topics, and one is text-message style abbreviated writing — which is only the latest manifestation of a type of language play that is of course much older. Here is a 19th century example, from the first article:

There will be examples of the linguistic games people played, and a poem from Gleanings From the Harvest-Fields of Literature, published in 1867. In it, 130 years before the arrival of mobile phone texting, Charles C Bombaugh uses phrases such as “I wrote 2 U B 4”. Another verse reads: “He says he loves U 2 X S,/ U R virtuous and Y’s,/ In X L N C U X L/ All others in his i’s.”

I think that modern txt spk would spell XLNs. Also, note the apostrophes in “wise” and “eyes”.

Small dangers of social media integration

Sometimes automatic social media integration on news sites can be a little… callous. The fact that there may have been a relatively well-known former US senator on the plane only underlines the macabre element.

One News site offers a plethora of social media re-posting options. Highlighting mine.

Twitter’s American Airlines i18n mystery

In the Twitter client Tweetdeck, and on my Twitter page itself, I run a search for “i18n”: the frequently used abbreviated form of “internationalization” (or “internationalisation” in BrE). A while ago, I noticed that this search feed contained some odd posts that seemed to have nothing to do with the topic, but originated from accounts belonging to the company American Airlines or from other Twitter users retweeting such posts. Here is a sample screenshot from Tweetdeck:

Odd tweet out: American Airlines in the i18n Twitter search

While all the other tweets have to do with multi-lingual software in some sense, the highlighted one doesn’t. It comes from the user PointsAdvisor and re-posts an American Airlines special offer posted by the corporate account AAirwaves.

I noticed such posts about two weeks ago, both in Tweetdeck and on the Twitter web page, but was stumped until I clicked by chance (or rather, by mistake) on the shortened link inside one of these posts. The AAirwaves account uses the URL (better: URI) shortener Bit.ly, and Tweetdeck will show the original long URI for me to click on (it’s a configurable option, which helps prevent accessing malicious pages). Here is the Tweetdeck link preview for the link in the highlighted tweet above, using data from Bit.ly:

And highlighted, there’s the solution of the riddle: American Airlines uses in the URIs of www.aa.com a path segment (some text enclosed by / characters after the host name) that reads “i18n” — and the Twitter search picks up on this component.
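For readers who don’t stare at URIs all day, here is a small sketch of what “path segment” means in practice, using Python’s standard `urllib.parse`. The example URI is hypothetical: only the host name and the “i18n” segment come from what I saw, the rest of the path is made up.

```python
from __future__ import annotations

from urllib.parse import urlsplit


def path_segments(uri: str) -> list[str]:
    """Return the non-empty pieces of the URI path, split on '/'."""
    return [segment for segment in urlsplit(uri).path.split("/") if segment]


# Hypothetical URI: everything after "/i18n/" is invented for illustration.
example = "http://www.aa.com/i18n/some/special-offer.jsp"

segments = path_segments(example)
print(segments)
print("i18n" in segments)  # a search indexing whole segments would match here
```

A search engine that indexes each segment of the resolved URI as a word would therefore return this link for the query “i18n”, which is consistent with what showed up in my feed.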

Now on the one hand, this is quite bad URI design on the part of American Airlines, but what’s more interesting is that Twitter’s search engine resolves shortened links and includes the target URIs in the search. I didn’t expect this, as the shortened URIs are posted to Twitter as-is. It could be that the search inclusion is a by-product of resolving and storing the full links for security reasons: to protect against malicious code obscured by a link shortener.

In any event, the effect may be ephemeral. For the last two days, there haven’t been any new AA tweets in my “i18n” search feed on Tweetdeck (which uses the Twitter API). And on the Twitter page, they seem to have disappeared even from the history. I imagine that the Twitter people have to maintain a number of manually created rules to keep search feeds free of accidental spill-over.

This still does not even begin to address the problem of genuinely ambiguous search terms. Wikipedia lists over 20 senses for “FAI” for example, from Fairbanks International Airport to the French term for “ISP” via the Football Association of Ireland, but a Twitter search for the term is overrun by the extremely common Italian verb phrase “fai”. One thing we can expect is for Twitter and similar services to come up with prioritisation and disambiguation options, which, I’d expect, will introduce problems of their own.

Google’s h mystery

A few days ago, my friend Melinda Shore, who knows I’m interested in internationalization, sent me a screenshot from the search bar of her Safari browser. It is a drop-down list of search suggestions provided by Google just after typing the letter h:

The top suggestion is a mess:

1. What does it mean?
2. Why is it a legitimate search suggestion for the letter h? (If it is.)

Regarding 1., the search suggestion in Firefox is nearly identical, but I cannot reproduce the effect in Google’s own browser Chrome or on the search page directly. In the Safari example, we’re dealing with an odd mix of regular character strings (6.626068, 10, sup, -34), numeric HTML (or XML) entities (&#215;) and raw Unicode-escaped characters that you might find in Python, C or Java source code (\u003C, \u003E). Let’s decode the second and third type of components:

\u003C and \u003E simply represent the Unicode code points U+003C and U+003E: the less-than and the greater-than signs < and >.
&#215; is U+00D7 MULTIPLICATION SIGN: ×

Putting it together, we get the already much more user-friendly form

6.626068 × 10<sup>-34

or, completing and resolving the HTML: 6.626068 × 10^-34.
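The two decoding steps can be reproduced in a few lines of Python. This is a sketch under assumptions: the raw string is my reconstruction from the components listed above, not a copy of the original suggestion, and the decimal form of the entity (`&#215;`) in particular is assumed. The function names are made up.

```python
import html
import re


def decode_suggestion(raw: str) -> str:
    """Resolve literal \\uXXXX escapes first, then HTML entities."""
    unescaped = re.sub(
        r"\\u([0-9A-Fa-f]{4})",
        lambda match: chr(int(match.group(1), 16)),
        raw,
    )
    return html.unescape(unescaped)


def sup_to_caret(markup: str) -> str:
    """Rewrite <sup>...</sup> (closed or left open) as caret notation."""
    return re.sub(r"<sup>(.*?)(?:</sup>|$)", r"^\1", markup)


# Assumed reconstruction of the beginning of the suggestion (see above).
raw = r"6.626068 &#215; 10\u003Csup\u003E-34"

decoded = decode_suggestion(raw)
print(decoded)                # 6.626068 × 10<sup>-34
print(sup_to_caret(decoded))  # 6.626068 × 10^-34
```

The order of the two steps doesn’t matter for this particular string, since neither kind of escape hides the other.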

Once I realized this, my physics training kicked in and the answer to 2. became clearer – Planck’s constant, abbreviated as h, has the value of 6.62606889 × 10^-34 J s (or m² kg / s). This is not the result of injecting broken text into the search engine results, but a feature of Google’s calculator. Typing “G” into the browser’s search bar also yields a similar semi-numeric character salad, while the results for “c”, “e” or “pi” are much more legible.

Still, the entire story raises questions about intent and execution. This is not really an internationalization issue because the form of those physical and mathematical constants is largely invariant by convention. Yet, the tools of internationalization — HTML entities, Unicode code point escapes — have leaked into scientific character display, too. Internationalization is a user interface (usability, user experience) issue [1].

On the execution side, Google got it wrong on several counts, and Apple and Mozilla share some of the blame. Browser search bar drop-down lists don’t allow for superscripts and aren’t sophisticated enough to strip markup, so they display ugly raw HTML. Choosing a numeric entity instead of the character × probably led to its display breaking. And < and > are even in ASCII, so they should display fine, but probably security concerns and their status as reserved HTML characters led to the odd choice of escaping method. All in all, at least one decoding step was not carried out.

More fundamentally, should Google suggest “6.626068 × 10^-34 m² kg / s” when you type a lowercase h? There was a time in my life when I used Planck’s constant daily, and I do use Google’s handy calculator via my browser’s search bar for quick arithmetic and unit conversions. But I think just spitting out the value with no label is going a little too far, and will for more than 99% of users be entirely unexpected: too different from the genuinely useful (for Americans) “hotmail”, “hulu” and “home depot”. Especially considering that for most letters of the alphabet, you could possibly find a scientific constant, function or theorem that starts with it.

Though maybe it is a ploy to spread more science among the people.

[1] It is also a design issue. The two aren’t mutually exclusive.

Google search result for "h" - very different from the broken suggestion

EDIT (2010-08-02): Commenters inside Facebook’s walled garden have remarked that if you actually take up the suggestion and search for it, you get to Planck’s constant. Currently this is partially right, but these things are constantly shifting: In my tests this morning, whether you use the Safari/Firefox search field or Google’s search page directly, you get a mix of results, the first of which are people wondering about the odd string on SEO forums. A little further down you do get collections of scientific constants, but you have to attentively read the result. Right now, this post is (after less than 12h) number 7 on the results page. None of the pages looks like what you get if you do a Google search for “h” (and hit return) — which is nice and helpful.

Another commenter remarks that for her, the suggestion is now prefaced with “Planck’s constant”, which is a vast improvement.