Schrödinger's 😻 and outside-the-box naming [LWN.net]
URL:http://lwn.net/Articles/545741/
By Nathan Willis
April 3, 2013
What's in a string? That depends on who you ask, apparently, as Fedora recently learned when it unexpectedly ran into a problem with the release name for the upcoming Fedora 19, "Schrödinger's Cat," and all of the unusual characters it contains. Typographic oddities might seem like a trivial reason to upend the distribution's release process, but a validation tool in the bug-reporting system objected to the name, so Fedora developers found themselves asking whether it was more practical to stop and fix all of the affected utilities or to change the release name itself.
The problem, of course, is that unlike previous Fedora release names, "Schrödinger's Cat" contains characters outside the basic Latin alphabet: an o with umlaut and an apostrophe. But the issue encountered in the wild is narrower than that. The "apostrophe" in question is frequently typed as the similar-looking but different single-quote character, and quotes can wreak havoc when the release name is processed by a shell script. On March 16, Adam Williamson reported a bug in the Fedora bug-reporting tool: when he tried to report a bug against Fedora 19, the server side threw an error while validating the release name, complaining of "illegal characters."
The root of the bug was quickly traced to libreport, which contains an is_text_file() function. That function decides whether a given file is text by checking whether more than 2% of its bytes are greater than 0x80. Two percent is a rather arbitrary limit, and in this case the file triggering the error was /etc/os-release, which consisted of a single line:
Fedora release 19 (Schrödinger's Cat)
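The arithmetic behind the failure can be checked with a short Python sketch. It assumes the file holds exactly the line above plus a trailing newline, and it counts bytes with the high bit set (0x80 and above; for this line, "greater than 0x80" gives the same count). The function names are hypothetical, and libreport's actual C implementation may differ in detail.

```python
def high_byte_ratio(data: bytes) -> float:
    """Fraction of bytes with the high bit set (0x80 and above)."""
    if not data:
        return 0.0
    high = sum(1 for b in data if b >= 0x80)
    return high / len(data)


def looks_like_text(data: bytes, threshold: float) -> bool:
    """Hypothetical is_text_file()-style test: text if the ratio stays within threshold."""
    return high_byte_ratio(data) <= threshold


# Assumed file contents: the single line shown above, with a trailing newline.
line = "Fedora release 19 (Schrödinger's Cat)\n".encode("utf-8")

# "ö" is encoded in UTF-8 as two bytes, 0xC3 0xB6; everything else is ASCII.
ratio = high_byte_ratio(line)
print(f"{len(line)} bytes, {ratio:.1%} above 0x7F")  # 39 bytes, 5.1% above 0x7F
print(looks_like_text(line, 0.02))  # False: rejected at a 2% limit
print(looks_like_text(line, 0.10))  # True: accepted at a 10% limit
```

Two high bytes out of 39 is about 5%: above 2%, below 10%.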
Dave Malcolm pointed out that the /etc/os-release manual page says non-alphanumeric characters should be escaped "with backslashes, following shell style," and Denys Vlasenko patched is_text_file() to raise its threshold for bytes above 0x80 from 2% to 10%. But that fix was a simple workaround; as others in the bug comments pointed out, the function should really test whether the contents of the file are valid UTF-8 text, which counting bytes above 0x80 does not do.
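The stricter check suggested in the bug comments amounts to asking whether the bytes decode as UTF-8 at all. This minimal Python sketch uses a hypothetical function name and adds one assumption of its own (a NUL byte marks a file as binary). It shows how the two approaches can disagree: a Latin-1 encoding of the same line contains only one high byte, so it would pass a 10% byte-count test, yet it is not valid UTF-8.

```python
def is_utf8_text(data: bytes) -> bool:
    """Hypothetical stricter check: valid UTF-8 and no NUL bytes."""
    if b"\x00" in data:
        return False  # assumption: NUL bytes indicate binary data
    try:
        data.decode("utf-8")  # strict decoding raises on malformed sequences
    except UnicodeDecodeError:
        return False
    return True


text = "Fedora release 19 (Schrödinger's Cat)\n"
utf8 = text.encode("utf-8")
latin1 = text.encode("latin-1")  # "ö" becomes the single byte 0xF6

print(is_utf8_text(utf8))    # True
print(is_utf8_text(latin1))  # False: 0xF6 cannot start a valid UTF-8 sequence
# Only 1 of the 38 Latin-1 bytes is >= 0x80 (about 2.6%), so a 10%
# byte-count threshold would still have accepted it as "text".
print(sum(b >= 0x80 for b in latin1), len(latin1))  # 1 38
```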
Vlasenko did commit a more substantive patch a few days later, but libreport was not the only utility to stumble over the new release name. Another bug opened by Williamson reported that grub2 also broke on /etc/os-release because of the unescaped single-quote character.
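The way an unescaped single quote breaks shell-style processing can be reproduced with Python's shlex module, which tokenizes strings using POSIX shell quoting rules. The menuentry line below is a hypothetical stand-in for a generated configuration line, not grub2's literal output; shlex.quote() shows one way to escape the value before embedding it.

```python
import shlex

name = "Fedora 19 (Schrödinger's Cat)"

# Hypothetical generated line that wraps the name in single quotes unescaped.
unsafe = f"menuentry '{name}'"
try:
    shlex.split(unsafe)
    unsafe_ok = True
except ValueError as err:
    unsafe_ok = False
    print("parse error:", err)  # parse error: No closing quotation

# shlex.quote() closes the quoted string, emits the apostrophe inside double
# quotes, and reopens it, so the value survives tokenizing intact.
safe = f"menuentry {shlex.quote(name)}"
print(safe)               # menuentry 'Fedora 19 (Schrödinger'"'"'s Cat)'
print(shlex.split(safe))  # ['menuentry', "Fedora 19 (Schrödinger's Cat)"]
```

The apostrophe ends the quoted string early, and the final quote then opens a string that never closes.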
Schrödinger, Schmodinger
On the Fedora development list, Sérgio Basto proposed one change that would solve both problems (and, hopefully, any others stemming from the unusual release name): formally change the release name from "Schrödinger's Cat" to "Schrodingers Cat" or some similar variation that stuck to pure ASCII characters. After all, as Chris Murphy commented, there are likely to be many more utilities that cannot handle the release name, and the project will continue to encounter them as the development cycle progresses.
To others, though, simply changing the release name amounts to "papering over" the real issue, which is ensuring that the build and QA tools can handle arbitrary UTF-8 text. Surely it is better to spend a little time now to fix the issues than to avoid them, the thinking went. Williamson, however, disagreed, calling it "a question of priorities" in light of Fedora's human resources and release schedule. Later, he elaborated that fixing UTF-8 support in the problematic tools in separate branches would be acceptable, as long as it did not slow down the release:
If we have to compromise on just papering it over for Alpha, I mean, _fine_. But seriously: sometimes papering it over is just the right thing to do.
Similarly, Chris Adams pointed out that the deadline for adding new features to Fedora 19 had already passed; adding UTF-8 support to a variety of tools may be important, but there is no doubt that it amounts to a feature. But G. Wolfe Woodbury contended that the real issue was proper internationalization, and that "not defensively programming for such cases is short-sighted."
Solutions and open questions
Jaroslav Reznik opened a Fedora Engineering Steering Committee (FESCo) ticket on the subject, offering two alternatives: fixing UTF-8 and character-handling issues as they arise, or changing the release name to something similar but less problematic (perhaps "Cat of Schroedinger" or the proper German "Schroedinger Katze").
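ASCII-only variants like the ones floated on the list can be derived mechanically. This small Python sketch (function names hypothetical) contrasts plain accent stripping via Unicode NFKD decomposition with German-style transliteration of ö as "oe." Neither step removes the ASCII apostrophe, so that still has to be dropped separately to reach a name like "Schrodingers Cat."

```python
import unicodedata


def strip_accents(s: str) -> str:
    # NFKD splits "ö" into "o" plus U+0308 COMBINING DIAERESIS;
    # encoding to ASCII with errors="ignore" then drops the combining mark.
    decomposed = unicodedata.normalize("NFKD", s)
    return decomposed.encode("ascii", "ignore").decode("ascii")


# German convention: umlauts become the base vowel followed by "e".
GERMAN = str.maketrans({"ä": "ae", "ö": "oe", "ü": "ue",
                        "Ä": "Ae", "Ö": "Oe", "Ü": "Ue", "ß": "ss"})


def transliterate_german(s: str) -> str:
    return s.translate(GERMAN)


name = "Schrödinger's Cat"
print(strip_accents(name))                   # Schrodinger's Cat
print(transliterate_german(name))            # Schroedinger's Cat
print(strip_accents(name).replace("'", ""))  # Schrodingers Cat
```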
The discussion on the mailing list continued, including mention of the very real risk that, after Fedora 18's lengthy delays, holding up Fedora 19's release to fix a character string would be a terrible public-relations blunder. But Peter Jones found a compromise and posted a patch changing Schrödinger's Cat to Schrödinger’s Cat in the affected files. The two strings may not look very different (in fact, depending on one's font, they may look identical), but the second replaces the "typewriter apostrophe" at Unicode code point U+0027 with the "punctuation apostrophe" at U+2019. The typewriter apostrophe is interpreted as a shell quote character, but the punctuation apostrophe is not. Rarely do the differences in Unicode's byzantine slate of similar code points solve more problems than they create (just look at curly versus straight quotes in HTML), but in this case the change allowed /etc/os-release to work once again. FESCo voted to approve the apostrophe change and to fix any other UTF-8 support issues encountered during the development cycle.
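The distinction Jones relied on can be checked directly: a POSIX-style tokenizer treats only U+0027 as a quote character. This sketch uses Python's unicodedata and shlex modules, with a hypothetical menuentry line as the test input. It also shows that U+2019 (formally named RIGHT SINGLE QUOTATION MARK) takes three bytes in UTF-8, so the patched name is still non-ASCII and still depends on UTF-8-clean tools.

```python
import shlex
import unicodedata

typewriter = "\u0027"   # the character most keyboards produce
punctuation = "\u2019"  # the typographic apostrophe

for ch in (typewriter, punctuation):
    print(f"U+{ord(ch):04X} {unicodedata.name(ch)} {ch.encode('utf-8').hex()}")
# U+0027 APOSTROPHE 27
# U+2019 RIGHT SINGLE QUOTATION MARK e28099

curly = "Fedora 19 (Schrödinger\u2019s Cat)"
# U+2019 is ordinary text to the tokenizer, so the single quotes balance.
print(shlex.split(f"menuentry '{curly}'"))  # ['menuentry', 'Fedora 19 (Schrödinger’s Cat)']
```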
Of course, the apostrophe compromise still leaves room for other UTF-8 support issues to surface, while sidestepping the quote-character issue. That bodes well for Fedora 19's release date not getting pushed back by a last-minute "umlaut bug," but it also means less rigorous testing of the release build tools. FESCo subsequently ruled that future release names shall not include "shell metacharacters." That is a practical trade-off, but as several list members pointed out, by changing the problematic string Fedora may leave an unknown number of character-handling bugs undetected, and those bugs could still bite other projects that use the Fedora tools. In the long run, the tools will still need fixing.
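FESCo's rule implies some kind of check on proposed names. The article does not say which characters count as shell metacharacters, so the set in this Python sketch is an assumption (quotes, backslash, and common expansion, globbing, and control-operator characters, with spaces allowed because release names contain them). The function name is hypothetical.

```python
# Assumed metacharacter set: illustrative only, not an official FESCo list.
SHELL_METACHARACTERS = set("'\"`$\\|&;<>()*?[]{}#~!")


def shell_metacharacters_in(name: str) -> list[str]:
    """Return the assumed shell metacharacters found in name, in order of appearance."""
    return [ch for ch in name if ch in SHELL_METACHARACTERS]


for candidate in ["Schrödinger's Cat", "Schrödinger\u2019s Cat", "DROP table *;"]:
    found = shell_metacharacters_in(candidate)
    print(f"{candidate!r}: {found or 'ok'}")
```

Under this assumed set, the U+2019 spelling passes while the original spelling is flagged for its U+0027 apostrophe. A check like this says nothing about non-ASCII characters, so it does not replace fixing the tools' UTF-8 handling.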
In fact, some participants in the mailing list discussion proposed adding non-alphanumeric characters to future release names just to see what happens. Paul Flo Williams predicted that someone would propose "Motörhead's Moshpit" as the Fedora 20 release name because of its non-ASCII characters, while Richard M. Jones suggested ☃ (the Unicode "snowman" character U+2603, also written as the HTML numeric character reference &#9731; or &#x2603;). Peter Robinson proposed that the project go right for the goal and choose "DROP table *;".
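The two HTML forms of the snowman encode the same code point, since 0x2603 in hexadecimal is 9731 in decimal. Python's standard html module decodes both numeric character references to the same character, which, like the curly apostrophe, takes three bytes in UTF-8.

```python
import html

decimal_ref = "&#9731;"
hex_ref = "&#x2603;"

snowman = html.unescape(decimal_ref)
print(snowman, hex(ord(snowman)), ord(snowman))  # ☃ 0x2603 9731
print(html.unescape(hex_ref) == snowman)         # True
print(snowman.encode("utf-8"))                   # b'\xe2\x98\x83'
```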
On the other side of the debate, some developers were less than amused. Fedora has had its share of project members who object to release names altogether; Jóhann B. Guðmundsson said:
They also get the benefit of fixing what breaks in the process.
Anti-release-name comments did not elicit much further debate, so it seems likely that release names will continue to cling on for at least one more release cycle. But it is true that "Schrödinger's Cat" caused some problems due to the unpredictable effect it has on development and release tools. On the whole, though, the problems it revealed are problems worth solving—there is no telling what characters downstream spins and Fedora derivatives might put into a string.
The distribution will be better for catching and correcting assumptions about character encodings and non-alphanumeric strings. Robinson noted that Fedora 19's release name was chosen roughly six months earlier, during the Fedora 18 Alpha period, yet it took that long for anyone to encounter a bug related to it, which shows how deeply buried the problem was. A release name might be a lowly string, chosen primarily for amusement value, but the episode should remind all distributions how subtle such bugs can be, and Fedora clearly stands to benefit now that the cat is out of the bag.
[Special hat tip to Don Marti for proposing "Schrödinger's 😻" as an alternative name. "If you're going to do Unicode, do Unicode."]