Can you read 19th century txt spk?

@guardianstyle on Twitter points to an article by Mark Brown announcing what sounds like a wonderful exhibition the British Library is preparing: Evolving English: One Language, Many Voices (Nov 12, 2010 – Apr 3, 2011).  There’s even a second piece, by Alison Flood.

British Library exhibits are reputed to be large, well-made and almost over-abundant (I’ve only been to one, Taking Liberties, and it was of this style.) If the Guardian is to be believed, usage controversies get a large place in Evolving English, which makes it particularly relevant to this blog. The press has its favourite topics, and one is text-message style abbreviated writing — which is only the latest manifestation of a type of language play that is of course much older. Here is a 19th century example, from the first article:

There will be examples of the linguistic games people played, and a poem from Gleanings From the Harvest-Fields of Literature, published in 1867. In it, 130 years before the arrival of mobile phone texting, Charles C Bombaugh uses phrases such as “I wrote 2 U B 4”. Another verse reads: “He says he loves U 2 X S,/ U R virtuous and Y’s,/ In X L N C U X L/ All others in his i’s.”

I think that modern txt spk would spell XLNs. Also, note the apostrophes in “wise” and “eyes”.

Small dangers of social media integration

Sometimes automatic social media integration on news sites can be a little… callous. The fact that there may have been a relatively well-known former US senator on the plane only underlines the macabre element.

One News site offers a plethora of social media re-posting options. Highlighting mine.

Twitter’s American Airlines i18n mystery

In the Twitter client Tweetdeck, and on my Twitter page itself, I run a search for “i18n”: the frequently used abbreviated form of “internationalization” (or “internationalisation” in BrE). A while ago, I noticed that this search feed contained some odd posts that seemed to have nothing to do with the topic, but originated from accounts belonging to the company American Airlines or from other Twitter users retweeting such posts. Here is a sample screenshot from Tweetdeck:

Odd tweet out: American Airlines in the i18n Twitter search

While all the other tweets have to do with multi-lingual software in some sense, the highlighted one doesn’t. It comes from the user PointsAdvisor and re-posts an American Airlines special offer posted by the corporate account AAirwaves.

I noticed such posts about two weeks ago, both in Tweetdeck and on the Twitter web page, but was stumped until I clicked by chance (or rather, by mistake) on the shortened link inside one of these posts. The AAirwaves account uses the URL (better: URI) shortener Bit.ly, and Tweetdeck will show the original long URI for me to click on (it’s a configurable option, which helps prevent accessing malicious pages). Here is the Tweetdeck link preview for the link in the highlighted tweet above, using data from Bit.ly:

Tweetdeck link preview for AA link

And highlighted, there’s the solution to the riddle: the URIs of www.aa.com contain a path segment (some text enclosed by / characters after the host name) that reads “i18n”, and the Twitter search picks up on this component.

Now on the one hand, this is quite bad URI design on the part of American Airlines; what’s more interesting is that Twitter’s search engine resolves shortened links and includes the target URIs in the search. I didn’t expect this, as the shortened URIs are posted to Twitter as-is. It could be that the search inclusion is a by-product of resolving and storing the full links for security reasons: to protect against malicious code obscured by a link shortener.
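
To make the effect concrete, here is a minimal Python sketch of a search rule that looks at the expanded target of a shortened link as well as at the tweet text. It only illustrates the behaviour described above and is not Twitter’s actual implementation; the function names, the sample tweet and everything in the example URI after the “i18n” segment are made up for the purpose.

```python
from urllib.parse import urlsplit


def path_segments(uri: str) -> list[str]:
    """Return the non-empty path segments of a URI."""
    return [segment for segment in urlsplit(uri).path.split("/") if segment]


def matches(term: str, tweet_text: str, expanded_uri: str) -> bool:
    """Hypothetical rule: the term matches a word of the text or a path segment."""
    term = term.lower()
    in_text = term in tweet_text.lower().split()
    in_path = term in (segment.lower() for segment in path_segments(expanded_uri))
    return in_text or in_path


# Hypothetical tweet and expanded link; only the "i18n" segment comes from the post.
tweet = "Special fares this week, see the shortened link"
target = "http://www.aa.com/i18n/hypothetical/offer.jsp"

print(path_segments(target))           # ['i18n', 'hypothetical', 'offer.jsp']
print(matches("i18n", tweet, target))  # True
```

The tweet text never mentions the search term; the match comes entirely from the path segment of the resolved link, which is the spill-over seen in the search feed.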

In any event, the effect may be ephemeral. For the last two days, there haven’t been any new AA tweets in my “i18n” search feed on Tweetdeck (which uses the Twitter API). And on the Twitter page, they seem to have disappeared even from the history. I imagine that the Twitter people have to maintain a number of manually created rules to keep search feeds free of accidental spill-over.

This still does not even begin to address the problem of genuinely ambiguous search terms. Wikipedia lists over 20 senses for “FAI” for example, from Fairbanks International Airport to the French term for “ISP” via the Football Association of Ireland, but a Twitter search for the term is overrun by the extremely common Italian verb phrase “fai”. One thing we can expect is for Twitter and similar services to come up with prioritisation and disambiguation options, which, I’d expect, will introduce problems of their own.

Google’s h mystery

A few days ago, my friend Melinda Shore, who knows I’m interested in internationalization, sent me a screenshot from the search bar of her Safari browser. It is a drop-down list of search suggestions provided by Google just after typing the letter h:

Safari search bar Google suggestions

The top suggestion is a mess:

  1. What does it mean?
  2. Why is it a legitimate search suggestion for the letter h? (If it is.)

Regarding 1., the search suggestion in Firefox is nearly identical, but I cannot reproduce the effect in Google’s own browser Chrome or on the search page directly. In the Safari example, we’re dealing with an odd mix of regular character strings (6.626068, 10, sup, -34), numeric HTML (or XML) entities (&#215;) and raw Unicode-escaped characters of the kind you might find in Python, C or Java source code (\u003C, \u003E). Let’s decode the second and third types of component:

  • \u003C and \u003E simply represent the Unicode code points U+003C and U+003E: the less-than and the greater-than signs < and >.
  • &#215; is U+00D7 MULTIPLICATION SIGN: ×

Putting it together, we get the already much more user-friendly form

6.626068 × 10<sup>-34

or, completing and resolving the HTML: 6.626068 × 10⁻³⁴.
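
The two decoding steps can be sketched in a few lines of Python. The raw string below is my reconstruction of the suggestion from the components listed above, so treat it as an assumption rather than the exact bytes Google sent; `decode_suggestion` is a name invented for this example.

```python
import html
import re


def decode_suggestion(raw: str) -> str:
    """Resolve \\uXXXX escapes first, then HTML character references."""
    unescaped = re.sub(
        r"\\u([0-9a-fA-F]{4})",
        lambda match: chr(int(match.group(1), 16)),
        raw,
    )
    return html.unescape(unescaped)


# Assumed raw form of the suggestion (reconstructed, not copied from Google).
raw = r"6.626068 &#215; 10\u003Csup\u003E-34"
print(decode_suggestion(raw))  # 6.626068 × 10<sup>-34
```

The output still contains the unclosed `<sup>` tag: decoding the escapes only recovers the HTML, it does not render it.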

Once I realized this, my physics training kicked in and the answer to 2. became clearer: Planck’s constant, abbreviated as h, has the value of 6.62606889 × 10⁻³⁴ J s (or m² kg / s). This is not the result of injecting broken text into the search engine results, but a feature of Google’s calculator. Typing “G” into the browser’s search bar also yields a similar semi-numeric character salad, while the results for “c”, “e” or “pi” are much more legible.

Still, the entire story raises questions about intent and execution. This is not really an internationalization issue because the form of those physical and mathematical constants is largely invariant by convention. Yet, the tools of internationalization — HTML entities, Unicode code point escapes — have leaked into scientific character display, too. Internationalization is a user interface (usability, user experience) issue [1].

On the execution side, Google got it wrong on several counts, and Apple and Mozilla share some of the blame. Browser search bar drop-down lists don’t allow for superscripts and aren’t sophisticated enough to strip markup, so they display ugly raw HTML. Choosing a numeric entity instead of the character × probably led to its display breaking. And < and > are even in ASCII, so they should display fine, but probably security concerns and their status as reserved HTML characters led to the odd choice of escaping method. All in all, at least one decoding step was not carried out.
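
A plain-text drop-down list cannot render `<sup>`, but it could fall back to Unicode superscript characters. The following Python sketch shows one way to do that; it is an illustration of the idea only, not what any of the browsers or Google actually do, and `flatten_sup` is a hypothetical helper that handles just digits and signs.

```python
import re

# Map digits and signs to their Unicode superscript counterparts.
SUPERSCRIPTS = str.maketrans("0123456789+-", "⁰¹²³⁴⁵⁶⁷⁸⁹⁺⁻")


def flatten_sup(markup: str) -> str:
    """Replace <sup>digits</sup> (closing tag optional) with superscript characters."""
    return re.sub(
        r"<sup>([0-9+-]+)(?:</sup>)?",
        lambda match: match.group(1).translate(SUPERSCRIPTS),
        markup,
    )


print(flatten_sup("6.626068 × 10<sup>-34"))  # 6.626068 × 10⁻³⁴
```

The closing tag is optional in the pattern because the decoded suggestion never closes its `<sup>` element.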

More fundamentally, should Google suggest “6.626068 × 10⁻³⁴ m² kg / s” when you type a lowercase h? There was a time in my life when I used Planck’s constant daily, and I do use Google’s handy calculator via my browser’s search bar for quick arithmetic and unit conversions. But I think just spitting out the value with no label is going a little too far, and will for more than 99% of users be entirely unexpected: too different from the genuinely useful (for Americans) “hotmail”, “hulu” and “home depot”. Especially considering that for most letters of the alphabet, you could possibly find a scientific constant, function or theorem that starts with it.

Though maybe it is a ploy to spread more science among the people.

[1] It is also a design issue. The two aren’t mutually exclusive.

Google search result for "h" - very different from the broken suggestion

EDIT (2010-08-02): Commenters inside Facebook’s walled garden have remarked that if you actually take up the suggestion and search for it, you get to Planck’s constant. Currently this is partially right, but these things are constantly shifting: In my tests this morning, whether you use the Safari/Firefox  search field or Google’s search page directly, you get a mix of results, the first of which are people wondering about the odd string on SEO forums. A little further down you do get collections of scientific constants, but you have to attentively read the result. Right now, this post is (after less than 12h) number 7 on the results page. None of the pages looks like what you get if you do a Google search for “h” (and hit return) — which is nice and helpful.

Another commenter remarks that for her, the suggestion is now prefaced with “Planck’s constant”, which is a vast improvement.

S****ing with all these hookers

Over on Facebook, a friend posted a link to the article “Lotto lout Michael Carroll going back to being a binman after blowing £9.7m win”, in which the Daily Mail, a paper known for its even-handed quality reporting, is nearly falling over itself in breathless excitement over the story of a man who spent a £9.7 million lottery win in only 8 years. My friend commented: “One thing I’ll say is thank God the Daily Mail starred out the word ‘sleeping’. Unless it was something else…”. Here is the passage in question (including the surrounding paragraphs for context):

Daily Mail article asterisking out "slumming" (probably)

I rather do think it’s “something else”, given that “sleeping” (in the sense of “having sex”) appears without asterisks in just the previous paragraph.

Whenever I happen to open one of what are called the Red Tops here in the UK, the numerous words that are being camouflaged by asterisks surprise me anew. Once, it took me a minute or more of staring at “b******s” to finally figure out the word was “bastards”.

As always when you make people work harder for understanding, it increases the salience of the object they have to put in all this effort for and thereby draws greater attention to it, as evidenced by my friend’s comment when posting the article. So this, rather than prudishness, may be the real reason the tabloid press is so fond of the asterisks of avoidance.

Votes on Facebook

The polling stations for the UK General Election 2010 have closed, and the exit poll predicts a hung (some call it “balanced”) parliament, a loss of seats for the Liberal Democrats, and a Conservative party only a few seats away from a majority. The first MP has been announced: Sunderland South, a safe Labour seat, but with a swing to the Conservative party that, if extrapolated to all of England, would probably translate into an outright Conservative majority. As-is, I’m listening to the usual speculations in the absence of hard data, about alliances of the Lib Dems with Labour, or maybe the Tories with the Northern Irish Democratic Unionists. It’ll be a long night.

Meanwhile, a different titbit. All day, my Facebook page has had this box right above my “Wall”:

The number on the right has been going up in real-time all day. It is the number of Facebook members that have hit the “I voted” button. This is, apparently, a Facebook feature that is switched on for users from a country during elections in that country, as I learnt after clicking on “What’s this?”.

The first interesting point about this is the figure. It’ll probably go up a little further during the evening, and I’ll be curious to see where it ends up. The number of registered voters is given in the press as 44 million. If the turnout ends up at about 75%, that means that 33 million people will actually vote. Out of those, nearly 2 million will not just be on Facebook, but engaged enough with this site (or product) to click on “I voted” on election day. That’s about 6%. Not at all negligible.
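
For anyone who wants to redo the estimate with the final numbers, the arithmetic fits in a few lines of Python. The inputs are the rough figures from the paragraph above (the click count is rounded up to 2 million, and the 75% turnout is a guess), and the function name is invented for this sketch.

```python
def voted_button_share(registered: int, turnout: float, clicks: int) -> tuple[float, float]:
    """Return (estimated number of voters, share of them who clicked "I voted")."""
    voters = registered * turnout
    return voters, clicks / voters


# Rough inputs: 44 million registered, 75% turnout (a guess), about 2 million clicks.
voters, share = voted_button_share(44_000_000, 0.75, 2_000_000)
print(f"{voters / 1e6:.0f} million voters, {share:.1%} clicked")
# 33 million voters, 6.1% clicked
```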

The second point that comes to mind is the surprise that I’m only discovering the feature today, even though it was visible to, and presumably being used by, friends of mine when there were elections in their countries. I may even have voted in at least one of those (the last German Bundestag election). It would be interesting if Facebook managed to publicise at least the results of elections internationally through such a tool.

As for me, I clicked “I voted”, even though in reality I only voted in the local election that’s taking place today in my borough as well: being an EU citizen, I am not allowed to vote for the UK parliament. Strangely, if I were a UK resident from a Commonwealth country, it would be much harder to live and work here, but I would be allowed to vote.

So modern snakes eat dinosaur eggs?

From the article “Prehistoric snake gobbled-up dinosaur babies” by Jeremy Hance, which was published on mongabay.com on March 2, 2010:

A fossilized snake has been discovered inside a titanosaur nest in India, leading researchers to conclude that the snake fed on newly-hatched dinosaur babies, rather than their eggs like modern snakes.

The thought process is quite clear, though probably even in its long form simplistic:

  1. This prehistoric snake ate freshly hatched baby dinosaurs.
  2. Modern snakes evolved from prehistoric snakes.
  3. Birds evolved from dinosaurs (though there’s some fuzziness around the edges of this statement).
  4. Birds (and most reptiles) lay eggs.
  5. Modern snakes eat eggs.

But here, it got telescoped into an over-shortened version in which the pronoun “their [eggs]” carries the entire weight of referring simultaneously to dinosaurs in the prehistoric case and (unnamed) birds in the modern case.

Some referential ambiguity.

In the Daily Telegraph, an article illustrated with this captioned image:

wild boar

Frankly, I’d have mistaken him for a wild boar too.

An illegal translation

Via a recent Failblog post, our attention is drawn to a very bizarre sign in Czech:

Bilingual Czech-English sign prohibiting... translating

What is most puzzling about this sign is that it is not an example of what we’ve seen in the past, a translation error: Zákaz tlumočení does indeed mean “translating [interpreting] prohibited”. Apparently, and without explanation, the sign’s injunction doesn’t apply to the sign itself  – how else would it have been possible to make the sign without the act of translation?

According to the comments in the Failblog thread, the most likely explanation is that at this particular spot, noisy tours for tourists are unwelcome. Except if the tour guide speaks Czech.

Prague photos & a monitor background

I finally edited and uploaded to Flickr pictures from a trip I took to Prague a year ago (August 2008).

A few of them looked like they’d be nice as screen backgrounds, so I experimented with making backgrounds in the correct monitor sizes. I have this one right now on my Mac – external screen and the Macbook Pro’s built-in LCD:

TV Tower in Žižkov with David Černý's sculptures

Click on the image (or the following link) for the 1920×1200 version, or choose one of the smaller sizes: 1680×1050, 1600×1200, 1440×900, 1024×768.

Let me know what you think of the quality, the sizes, and if you like it, in which case I can make more. I didn’t want the image sizes to be too large, but there is a quality trade-off involved.

Unlike the rest of this blog, the CC License on these is BY-NC-SA.