Yesterday, a new major version of the Unicode Standard was published: Unicode 6.0, a year after version 5.2 and more than four years after the last major upgrade, 5.0.
There is of course a slew of new stuff in it, and I’m sure I’ll spend a good while digesting at least some of it. The most visible effect of a Unicode Standard revision is the set of new characters: 2,088 of them, bringing the total to 109,449. Most of them are added to the Supplementary Planes, outside the Basic Multilingual Plane, and therefore require surrogate pairs in UTF-16. (In other words, they are encoded in UTF-16 using two 16-bit code units, unlike the BMP characters, which use only one.)
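To make the surrogate mechanism concrete, here is a minimal Python sketch of how a supplementary-plane code point is split into two 16-bit code units. The function name `to_surrogate_pair` is my own for illustration; the arithmetic is the standard UTF-16 rule (subtract 0x10000, then split the remaining 20 bits into two halves of 10 bits each).

```python
def to_surrogate_pair(code_point):
    """Split a supplementary-plane code point into UTF-16 high and low surrogates."""
    if not 0x10000 <= code_point <= 0x10FFFF:
        raise ValueError("code point is not in a supplementary plane")
    offset = code_point - 0x10000        # leaves a 20-bit value
    high = 0xD800 + (offset >> 10)       # top 10 bits go into the high surrogate
    low = 0xDC00 + (offset & 0x3FF)      # bottom 10 bits go into the low surrogate
    return high, low


high, low = to_surrogate_pair(0x1F427)   # U+1F427 PENGUIN
print(f"U+1F427 -> {high:04X} {low:04X}")  # prints "U+1F427 -> D83D DC27"

# Cross-check against Python's own UTF-16 encoder (big-endian, no BOM).
assert chr(0x1F427).encode("utf-16-be") == high.to_bytes(2, "big") + low.to_bytes(2, "big")
```

A BMP character such as U+00E9 would be rejected by this function, since it fits in a single code unit and needs no surrogates.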
But enough initialisms: what’s in it? New scripts, of course: Brāhmī, an abugida (an alphasyllabary based on consonants with secondary but required vowel notation), which is a historical script from India and of interest to archaeologists and historical linguists; the Mandaic alphabet, for a variety of Aramaic that has a classical form (the liturgical language of the Mandaean religion) and a vernacular form used by small communities in Iran and Iraq; and Batak, another abugida, used to write an Austronesian language spoken by millions of people in northern Sumatra. In addition, of course, there are numerous updates, additions and improvements to existing scripts.
More striking is the number of new symbols and pictograms, including entirely new blocks. Emoticons (Western-style faces as well as emoji, and also, for example, U+1F648 SEE-NO-EVIL MONKEY, U+1F649 HEAR-NO-EVIL MONKEY and U+1F64A SPEAK-NO-EVIL MONKEY). Useful transport and map symbols. Alchemical symbols certain to be welcomed by historians and fortune-tellers. Playing cards. And the catch-all block “Miscellaneous Symbols And Pictographs”, which brings us hundreds of animals, vegetables, fruit, tools, office symbols, communication symbols and so on, down to stuff like U+1F4A9, useful if you need to represent a pile of dog poop in a comic-book style.
Some have joked that the date was apropos: in the US, it was National Coming Out Day, so fittingly we now have U+1F46C and U+1F46D: TWO MEN HOLDING HANDS and TWO WOMEN HOLDING HANDS.
But to get those new characters on paper or screen, we need fonts. Unfortunately, fonts are often many versions behind, and usually only implement specific ranges or blocks of the standard, depending on the font’s purpose. I opened Google to search for what’s out there already, and thanks to Hacker News found an impressively up-to-date font called Symbola by George Douros, third from the top on this collection of fonts for Ancient scripts.
It downloaded and installed fine on my Mac (running OS X Snow Leopard), but the OS X Character Viewer application has clearly not been updated (I ran Software Update just to make sure I wasn’t missing anything): as the highlighted areas show, neither the character names nor the Unicode blocks are known to OS X just yet.
This doesn’t prevent us from using the font, but in the end, whether the characters are displayed depends on the application. My browser, unfortunately, seems to be stuck on Unicode 5.2. Still, here I bring you U+1F427 PENGUIN, in the hope it will automagically appear on this page as soon as the application stack has caught up:
🐧
Update: I can see it in Firefox! But not in Chrome or Safari, so it may be a WebKit problem.
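If you want to check which Unicode version a given layer of your own stack knows about, most language runtimes ship their own copy of the character database. As a small sketch, assuming a Python 3 interpreter is at hand, the standard `unicodedata` module reports its database version and can look up character names; the helper `describe` and the fallback string are my own. Note that this only tells you what the runtime knows, not whether an installed font has a glyph for the character.

```python
import unicodedata


def describe(code_point):
    """Return 'U+XXXX NAME', or a placeholder if the name is unknown here."""
    ch = chr(code_point)
    name = unicodedata.name(ch, "<unknown to this Unicode database>")
    return f"U+{code_point:04X} {name}"


# Version of the Unicode Character Database bundled with this interpreter.
print("Unicode database version:", unicodedata.unidata_version)

# A few of the 6.0 additions mentioned in this post.
for cp in (0x1F427, 0x1F648, 0x1F4A9):
    print(describe(cp))
```

On an interpreter whose database predates 6.0, the three lines would show the placeholder instead of the character names.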
This work is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License.
Many thanks for this interesting and useful information. But I can’t help wondering who asked for the inclusion of symbol 1F4A9, and why.
If you look at the Miscellaneous Symbols And Pictographs chart you’ll see that this one belongs to the subsection “Comic style symbols”. I think it’s simply a comic book trope, along with splashing sweat, speech bubble, collision, bomb and flexed biceps, which all got their own code points in 6.0.