Google – long trails

A few days ago, my friend Melinda Shore, who knows I’m interested in internationalization, sent me a screenshot from the search bar of her Safari browser. It is a drop-down list of search suggestions provided by Google just after typing the letter h:

The top suggestion is a mess:

1. What does it mean?
2. Why is it a legitimate search suggestion for the letter h? (If it is.)

Regarding 1., the search suggestion in Firefox is nearly identical, but I cannot reproduce the effect in Google’s own browser Chrome or on the search page directly. In the Safari example, we’re dealing with an odd mix of regular character strings (6.626068, 10, sup, -34), numeric HTML (or XML) entities (here the one for ×, which reads &#215; in decimal notation) and raw Unicode escapes of the kind you might find in Python, C or Java source code (\u003C, \u003E). Let’s decode the second and third types of component:

\u003C and \u003E simply represent the Unicode code points U+003C and U+003E: the less-than and the greater-than signs < and >.
The numeric entity (&#215; in decimal notation) stands for U+00D7 MULTIPLICATION SIGN: ×

Putting it together, we get the already much more user-friendly form

6.626068 × 10<sup>-34

or, completing and resolving the HTML: 6.626068 × 10^-34.

Once I realized this, my physics training kicked in and the answer to 2. became clearer: Planck’s constant, abbreviated as h, has the value of 6.62606889 × 10^-34 J s (or m² kg / s). This is not the result of injecting broken text into the search engine results, but a feature of Google’s calculator. Typing “G” into the browser’s search bar also yields a similar semi-numeric character salad, while the results for “c”, “e” or “pi” are much more legible.

Still, the entire story raises questions about intent and execution. This is not really an internationalization issue because the form of those physical and mathematical constants is largely invariant by convention. Yet, the tools of internationalization — HTML entities, Unicode code point escapes — have leaked into scientific character display, too. Internationalization is a user interface (usability, user experience) issue [1].

On the execution side, Google got it wrong on several counts, and Apple and Mozilla share some of the blame. Browser search bar drop-down lists don’t allow for superscripts and aren’t sophisticated enough to strip markup, so they display ugly raw HTML. Choosing a numeric entity instead of the character × probably caused its display to break. And < and > are plain ASCII, so they should display fine, but security concerns and their status as reserved HTML characters probably led to the odd choice of escaping method. All in all, at least one decoding step was not carried out.

More fundamentally, should Google suggest “6.626068 × 10^-34 m² kg / s” when you type a lowercase h? There was a time in my life when I used Planck’s constant daily, and I do use Google’s handy calculator via my browser’s search bar for quick arithmetic and unit conversions. But I think just spitting out the value with no label is going a little too far, and will be entirely unexpected for more than 99% of users: too different from the genuinely useful (for Americans) “hotmail”, “hulu” and “home depot”. Especially considering that for most letters of the alphabet, you could probably find a scientific constant, function or theorem that starts with it.

Though maybe it is a ploy to spread more science among the people.

[1] It is also a design issue. The two aren’t mutually exclusive.

Google search result for "h" - very different from the broken suggestion

EDIT (2010-08-02): Commenters inside Facebook’s walled garden have remarked that if you actually take up the suggestion and search for it, you get to Planck’s constant. Currently this is partially right, but these things are constantly shifting: in my tests this morning, whether you use the Safari/Firefox search field or Google’s search page directly, you get a mix of results, the first of which are people wondering about the odd string on SEO forums. A little further down you do get collections of scientific constants, but you have to read the results attentively. Right now, this post is (after less than 12 hours) number 7 on the results page. None of the pages looks like what you get if you do a Google search for “h” (and hit return), which is nice and helpful.

Another commenter remarks that for her, the suggestion is now prefaced with “Planck’s constant”, which is a vast improvement.

So after a week of travelling (to MAAWG in Amsterdam, which was a thought-provoking experience) and several days of being busybusybusy all around, I finally managed to watch the Google I/O 2009 presentation about the upcoming Google Wave messaging and collaboration platform.

There are three thoughts that this video inspired:

When I first heard about Wave, despite the positive noises from people I trust, I was sceptical: it sounded like something I’d very much enjoy using myself, but as a successor of email for a great number of people? Email may be antiquated, as internet technologies go, yet it is the primary means of addressing messages to those connected to the net — a great number of whom aren’t collaborating on documents or even using IM very much. After watching the presentation, I think this judgment was premature.

To step back a little… Email right now comes in three forms: first, spam; second, what has come to be called “bacn” by some, i.e. automated but legitimate messages (from post-signup confirmations, via notifications of activities on social networks, to marketing messages and newsletters we opted to receive or that are addressed to us at work); and finally the prototypical email: conversations between real people. The first, we can discount for the moment: no one wants those. For the second, the added value from Google Wave is limited; at most, I might want to annotate such a message for my own use, or link it to my calendar or to-do list (“deadline for signing up to benefit X”, “interesting exhibition at museum Y”). The third is different. If, and from the demo it looks as if Google could pull this off, the user interface is seamless enough, I could indeed see regular people conversing in waves instead of cumbersome email threads. Even better, if, say, Facebook (replace with social platform of choice) messaging threads could be conducted through a Wave client, we’d probably have a winner.

Second thought: if we do think of Google Wave as a potential successor to email, the one central problem that the protocol should be solving is that of spam and abuse. From the limited time I’ve spent with the documents, it seems that the danger of spamming an existing wave is reduced, as each wave carries a globally unique wave id, and messages are transmitted encrypted. What about starting a new wave, though? How would one wave provider authenticate with other wave providers? If someone could point me to the relevant section of the protocol, that would be great. Then there’s the problem of compromised wave accounts, especially if desktop clients appear on the scene. Last, if Wave accounts with Google are free and tied to Google accounts, there’s a need to become more efficient at preventing automated account creation for abusive purposes: nearly all of the Eggcorn Forum’s spam problems came from accounts registered with a Gmail address, which managed to navigate the confirmed registration process just fine and were without doubt created by bots.

Third, two short segments in the video particularly piqued my interest. Automated translation, on the fly, between 40 languages? Google Translate has become much better over the last two years or so, and it would be great to run some large-scale quality checks on translation features. Oh, and that spellchecker, which is the first I’ve ever seen to take context into account. Maybe Google would be interested in throwing eggcorns into the spellcheck-heuristics mix? [My own spellchecker, untrained and brand-new, just complained about “aren” … in “aren’t”. Sigh.]

Tag: Google

Google’s h mystery

The (Google) wave of the (messaging) future?