Friday link dispatch 01

On one of my blogs, there used to be automatically generated link posts via Delicious.com. The method was never very reliable, and I abandoned it as it was never updated from its rather basic functionality. In particular, every single link I saved on Delicious.com was re-posted (instead of, say, just the links marked with a “post-me” tag). But I miss the link roundups. So let’s bring them back.

How to choose appropriate terminology when writing a historical novel. Which of the following words would you expect were not in use at all in the early 19th century, or had a markedly different sense than in today’s English: manipulate, blink, looped, conversationalist, knowledgeable, traipsing? The writer Mary Robinette Kowal, author of (among other works) Glamour in Glass, which is set in 1815, presents her anachronism-busting method. It involves extracting a word list from Jane Austen’s oeuvre and looking up each non-Austen word in the OED. (Via Language Hat.)
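
The word-list step of that method can be sketched in a few lines of Python. This is only an illustration, not Kowal’s actual tooling: the function names are made up, the one-sentence corpus stands in for the full Austen texts, and the OED lookup of each flagged word remains a manual step.

```python
import re

def vocabulary(text: str) -> set[str]:
    """Lower-cased word forms in a text, keeping internal apostrophes."""
    return set(re.findall(r"[a-z]+(?:['’][a-z]+)*", text.lower()))

def words_to_check(manuscript: str, period_corpus: str) -> list[str]:
    """Words used in the manuscript that never occur in the period corpus."""
    return sorted(vocabulary(manuscript) - vocabulary(period_corpus))

# Toy stand-in for the corpus (assumption: the real input would be
# the complete text of Austen's novels).
period_corpus = (
    "It is a truth universally acknowledged, that a single man in "
    "possession of a good fortune, must be in want of a wife."
)
manuscript = "A knowledgeable man must blink."

print(words_to_check(manuscript, period_corpus))
```

With a corpus this small, nearly every word gets flagged. The list only becomes useful once the corpus covers the whole oeuvre.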

Earliest known uses of some (many) of the words of mathematics and earliest known uses of some mathematical symbols:

FRACTAL. According to Franceschetti (p. 357):

In the winter of 1975, while he was preparing the manuscript of his first book, Mandelbrot thought about a name for his shapes. Looking into his son’s Latin dictionary, he came across the adjective fractus, from the verb frangere, meaning “to break.” He decided to name his shapes “fractals.”

Fractal appears in 1975 in Les Objets fractals: Forme, hasard, et dimension by Benoit Mandelbrot (1924- ). The title was translated as Fractals: Form, Chance, and Dimension (1977).

These pages, which must have been around for some time, are the work of Jeff Miller. Full of historical, lexical and typographical information and rich in references.

Tai, Chen-To: A historical study of vector analysis. I’m reviewing some of the maths I knew 15 years ago (gracious, am I rusty!) and came across this 1995 paper (available as a PDF file), which is even geekier (and certainly more specialized) than the pages in the previous link. It presumes familiarity with the subject of vector analysis as taught to math, physics or engineering students in their first years and covers historical texts mostly from mathematics and electromagnetism with respect to the notation of the derivatives (gradient, divergence, curl), with or without the Nabla operator ∇ (also called del). The author is opinionated and also has a second text, A Survey of the Improper Uses of ∇ in Vector Analysis.

Personal names around the world. A short but useful page from the World Wide Web Consortium.

People who create web forms, databases, or ontologies are often unaware how different people’s names can be in other countries. They build their forms or databases in a way that assumes too much on the part of foreign users. This article will first introduce you to some of the different styles used for personal names, and then some of the possible implications for handling those on the Web.

(Hat tip: Pat Hall on Facebook.)

Google’s h mystery

A few days ago, my friend Melinda Shore, who knows I’m interested in internationalization, sent me a screenshot from the search bar of her Safari browser. It is a drop-down list of search suggestions provided by Google just after typing the letter h:

Safari search bar Google suggestions

The top suggestion is a mess:

  1. What does it mean?
  2. Why is it a legitimate search suggestion for the letter h? (If it is.)

Regarding 1., the search suggestion in Firefox is nearly identical, but I cannot reproduce the effect in Google’s own browser Chrome or on the search page directly. In the Safari example, we’re dealing with an odd mix of regular character strings (6.626068, 10, sup, -34), numeric HTML (or XML) entities (&#215;) and raw Unicode-escaped characters that you might find in Python, C or Java source code (\u003C, \u003E). Let’s decode the second and third type of components:

  • \u003C and \u003E simply represent the Unicode code points U+003C and U+003E: the less-than and the greater-than signs < and >.
  • &#215; is U+00D7 MULTIPLICATION SIGN: ×

Putting it together, we get the already much more user-friendly form

6.626068 × 10<sup>-34

or, completing and resolving the HTML: 6.626068 × 10⁻³⁴.
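
As a rough sketch of these decoding steps, the following Python snippet resolves the \uXXXX escapes and the numeric entity, then renders the superscript markup with Unicode superscript characters. The input string is reconstructed from the components listed above rather than copied verbatim from the screenshot, and the function names are made up for illustration.

```python
import html
import re

# Maps digits and signs to their Unicode superscript forms.
SUPERSCRIPTS = str.maketrans("0123456789+-", "⁰¹²³⁴⁵⁶⁷⁸⁹⁺⁻")

def decode_suggestion(raw: str) -> str:
    """Resolve \\uXXXX escapes first, then HTML character references."""
    unescaped = re.sub(
        r"\\u([0-9A-Fa-f]{4})",
        lambda m: chr(int(m.group(1), 16)),
        raw,
    )
    return html.unescape(unescaped)

def render_superscripts(markup: str) -> str:
    """Replace <sup>...</sup> (closing tag optional) with superscript characters."""
    return re.sub(
        r"<sup>([0-9+-]+)(?:</sup>)?",
        lambda m: m.group(1).translate(SUPERSCRIPTS),
        markup,
    )

# Assumed raw suggestion, assembled from the pieces described in the text.
raw = r"6.626068 &#215; 10\u003Csup\u003E-34"

decoded = decode_suggestion(raw)
print(decoded)                       # 6.626068 × 10<sup>-34
print(render_superscripts(decoded))  # 6.626068 × 10⁻³⁴
```

The closing tag is treated as optional only because the string discussed here is cut off before it. A real HTML parser would be the safer choice for anything beyond this one case.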

Once I realized this, my physics training kicked in and the answer to 2. became clearer – Planck’s constant, abbreviated as h, has the value of 6.62606889 × 10⁻³⁴ J s (or m² kg / s). This is not the result of injecting broken text into the search engine results, but a feature of Google’s calculator. Typing “G” into the browser’s search bar also yields a similar semi-numeric character salad, while the results for “c”, “e” or “pi” are much more legible.

Still, the entire story raises questions about intent and execution. This is not really an internationalization issue because the form of those physical and mathematical constants is largely invariant by convention. Yet, the tools of internationalization — HTML entities, Unicode code point escapes — have leaked into scientific character display, too. Internationalization is a user interface (usability, user experience) issue [1].

On the execution side, Google got it wrong on several counts, and Apple and Mozilla share some of the blame. Browser search bar drop-down lists don’t allow for superscripts and aren’t sophisticated enough to strip markup, so they display ugly raw HTML. Choosing a numeric entity instead of the character × probably led to its display breaking. And < and > are even in ASCII, so they should display fine, but probably security concerns and their status as reserved HTML characters led to the odd choice of escaping method. All in all, at least one decoding step was not carried out.

More fundamentally, should Google suggest “6.626068 × 10⁻³⁴ m² kg / s” when you type a lowercase h? There was a time in my life when I used Planck’s constant daily, and I do use Google’s handy calculator via my browser’s search bar for quick arithmetic and unit conversions. But I think just spitting out the value with no label is going a little too far, and will for more than 99% of users be entirely unexpected: too different from the genuinely useful (for Americans) “hotmail”, “hulu” and “home depot”. Especially considering that for most letters of the alphabet, you could possibly find a scientific constant, function or theorem that starts with it.

Though maybe it is a ploy to spread more science among the people.

[1] It is also a design issue. The two aren’t mutually exclusive.

Google search result for "h" - very different from the broken suggestion

EDIT (2010-08-02): Commenters inside Facebook’s walled garden have remarked that if you actually take up the suggestion and search for it, you get to Planck’s constant. Currently this is partially right, but these things are constantly shifting: in my tests this morning, whether you use the Safari/Firefox search field or Google’s search page directly, you get a mix of results, the first of which are people wondering about the odd string on SEO forums. A little further down you do get collections of scientific constants, but you have to read the results attentively. Right now, this post is (after less than 12h) number 7 on the results page. None of the pages looks like what you get if you do a Google search for “h” (and hit return), which is nice and helpful.

Another commenter remarks that for her, the suggestion is now prefaced with “Planck’s constant”, which is a vast improvement.