Weird utf-8 bug in QuickLook: it’s the EA

A while ago, I noticed a weird bug in the way QuickLook on Leopard displays accented French characters, even though I was careful to save the file as UTF-8 from TextMate:

[Image: wrong characters showing up in QuickLook]

TM’s on left, QL on the right. Unix deities seemed to confirm that TM was not to blame:

$ file test.txt 
test.txt: UTF-8 Unicode text
$ cat test.txt 
é ç ë

On the other hand, opening test.txt with TextEdit gave the same result as QuickLook — messed up characters. If I fixed the characters in TextEdit and saved, the display of this particular file was always correct from then on (even if I modified it with TM afterwards). Weird.
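
The garbling itself is consistent with the UTF-8 bytes being read as a legacy single-byte encoding. Here is a minimal sketch of that idea, assuming the fallback is MacRoman (an assumption the comments below also raise, not something verified against QuickLook itself). `hex_bytes` and `as_macroman` are made-up helper names, and `MACINTOSH` is the name GNU iconv uses for MacRoman, which other iconv builds may not accept.

```shell
# Print the raw bytes of stdin as lowercase hex, e.g. "c3a9".
hex_bytes() {
    od -An -tx1 | tr -d ' \n'
    echo
}

# Re-read UTF-8 bytes as if they were MacRoman (the assumed fallback).
# "MACINTOSH" is GNU iconv's name for MacRoman; this is an assumption about
# the local iconv build, not something QuickLook itself runs.
as_macroman() {
    iconv -f MACINTOSH -t UTF-8
}

# "é" written with octal escapes so the example does not depend on how this
# script is saved: U+00E9 is the two bytes C3 A9 in UTF-8.
printf '\303\251' | hex_bytes   # prints c3a9
```

One character, two bytes. MacRoman maps byte C3 to √ and byte A9 to ©, so `printf '\303\251' | as_macroman` should give `√©` where `é` was expected.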

Apart from a grand total of one (1) related and unanswered thread at Apple Discussions, a Google search for “Quicklook utf8” or “Quicklook unicode” turned up nothing – so at first it seemed like there were only two people on the entire planet affected by this bug (well, three). By being a little more creative with my keywords, however, this post by Nico Weber on a vi-related thread turned up:

Indeed, if you check the two files with `ls -l`, you’ll see that the
file written by TextEdit has an “@” (that means it has an extended
attribute). Now, if you do `ls -l@` or `xattr -l filename`, you’ll see
that the TextEdit file has the com.apple.TextEncoding attribute set:

Macintosh-2:b nico$ xattr -l texteditfile.txt
com.apple.TextEncoding: UTF-8;134217984

The file written by MacVim does not have this attribute. If you add it
(`xattr -w com.apple.TextEncoding 'UTF-8;134217984' macvimfile.txt`),
the file shows up correctly in Quicklook (and in TextEdit too; it
didn’t do that before)

Applying the `xattr` command on test.txt did the trick: non-ASCII UTF-8 characters now show up fine in QL.

[Image: characters showing up fine in QL after xattr]
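
The same fix can be wrapped in a small shell function. This is only a sketch: `tag_utf8` and `show_encoding_tag` are hypothetical helper names, and the attribute value is copied as-is from Nico's post rather than from any Apple documentation.

```shell
# Value TextEdit writes for UTF-8 files, as quoted in the post above.
UTF8_TAG='UTF-8;134217984'

# Tag each file given as an argument with com.apple.TextEncoding.
# Stops at the first file that xattr refuses.
tag_utf8() {
    for f in "$@"; do
        xattr -w com.apple.TextEncoding "$UTF8_TAG" "$f" || return 1
    done
}

# Print the value currently stored on one file (xattr -p prints a single
# named attribute and fails if the file does not have it).
show_encoding_tag() {
    xattr -p com.apple.TextEncoding "$1"
}
```

`tag_utf8 test.txt notes.txt` would then tag both files, and `show_encoding_tag test.txt` should print the stored value back.
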
So what gives? It seems pretty harsh to me to demand extended attributes just to get the encoding right, especially when all third-party programs handle the matter perfectly fine without them, thank you very much. I don’t feel like applying xattr to all my text files either. I also don’t understand why the issue is not more widespread, i.e. why nobody talks about it. Maybe this bug only appears with a specific combination of tools and locales? Should TextMate set the relevant attributes? I’m puzzled.


32 Responses to Weird utf-8 bug in QuickLook: it’s the EA

  1. Moitah says:

    I found your post on google because I had the same problem.

    Maybe nobody talks about it because looking at text files content with QuickLook is useless.

  2. Oskar says:

    I have experienced exactly the same problem as you describe in your post. I can also confirm that using the “xattr -w” command fixes the problem temporarily.

    I would really like to see that Apple fixes this annoying bug because I use QuickLook to preview text files quite often.

  3. rashers says:

    i’m seeing the same thing using MS Word as an editor on OS 10.4. well at least i think it is, can anyone confirm?

  4. GK says:

    Same here, I am just as surprised as you are. QuickLook should use the BOM to identify the encoding (at least for Unicode files), instead of this unused xattr field.

    The other thing: why is MacRoman the default? I personally don't have _any_ MacRoman-encoded files; they are all UTF-8 or ISO-8859-2 (I am Hungarian) or ISO-8859-1. MacRoman is obsolete, I think.

  5. vigo says:

    well, this utf8 issue is huge! ever try to sort via terminal? i’m Turkish and my env is UTF8 by default. when i cat a file and pipe to sort shit happens… also, after installing snow leopard, nano screwed up when i press Turkish letter… thanx to macports ( for nano )…

  6. Ozkan says:

    You’re not alone. I also got same problem. Seems like there’s no solution for this. Maybe a macro for TextMate to add xattr after file has been saved.. Don’t know, just an idea.

  7. Michael Rose says:

    I have this issue too.

    I found [the Apple thread](http://discussions.apple.com/thread.jspa?threadID=1479441), now closed, via Google and I quickly made my way here.

    For me UTF-8 doesn’t show in QuickLook but it is fine in TextMate (where the file originated). If I save as UTF-16 all is fine in both. Saving ‘down’ to UTF-8 produces the error in QuickLook again.

    Is there still not a permanent fix for this?

  8. Adam says:

    I have the same problem. And the solution to edit each file individually afterwards isn’t really viable. This is pretty annoying as I always use TM to edit text files, and I very often use quick look to view them

  9. Ludovic Kuty says:

    Same here with OS X 10.6.2. Pretty annoying, as I quicklook text files quite often. I’m just asking myself how they could introduce such a bug.
    Hope somebody will post a solution on this blog or Apple makes a patch for it.

  10. FredB says:

    If you want to add this EA to files saved (via kb shortcut) in TextMate it’s really easy.
    Just make a new command like this:

    
    Save: Current File
    Command: xattr -w com.apple.TextEncoding 'UTF-8;134217984' "$TM_FILEPATH"
    Input: None
    Output: Show as Tool Tip
    Activation: Key Equivalent: CMD+S
    Scope: Empty for all files or what you want
    
  11. stan says:

    I am having the same problem. I am also wondering why TextEdit does not detect text file encoding properly. Auto detection of the encoding fails with TextEdit on 10.6.3 too.
    TextWrangler does a perfect job here, without issues.

    For me, I wanted to use TextEdit because it is faster and I just don’t need all the features that TextWrangler brings.

  12. Sam says:

    In my experience (10.6.3) TextMate is lying about saving in UTF8 (you can set default in settings).
    When you open resulting file in TextEdit – it’s garbled trash. Then when you Cmnd+A and Delete, and paste clean (e.g. Cyrillic) text you copied from same file from TextMate – it’ll tell you:
    “This document can no longer be saved using its original Western (Mac OS Roman) encoding.”

    I could only make it work with UTF16 (if saving directly from TextMate).
    Though it works w/UTF8 if you force : “Save As” with UTF8 and then “Replace” while you edit in TextEdit.

  13. Sam says:

    Forgot to tell you extent of the problem:
    Not only it’s a trash for TextEdit and QuickLook.
    SPOTLIGHT doesn’t see it as well.
    And that’s a f*&%ng big deal !!!
    So it seems it’s not Apple’s fault here.

  14. Sam says:

    well I figured you could apply [xattr -w com.apple.TextEncoding 'UTF-8;134217984' ] to all your files at once – just grab them all and throw at Terminal, and boom ;))

    So maybe my stupid TextMate will live another day 8))
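
Applying the attribute blindly would also label files that are not UTF-8 at all. Here is a hedged sketch of a batch version that checks each file first. `is_utf8` and `tag_tree_utf8` are made-up names, the `*.txt` filter is just an example, and the check assumes an `iconv` that exits with an error on invalid input (GNU iconv does).

```shell
# Succeeds only if the file decodes cleanly as UTF-8.
is_utf8() {
    iconv -f UTF-8 -t UTF-8 "$1" > /dev/null 2>&1
}

# Tag every .txt file under a directory, skipping anything that is not valid
# UTF-8 so that files in other encodings are not mislabelled.
# File names containing newlines are not handled by this simple loop.
tag_tree_utf8() {
    find "$1" -type f -name '*.txt' | while IFS= read -r f; do
        if is_utf8 "$f"; then
            xattr -w com.apple.TextEncoding 'UTF-8;134217984' "$f"
        else
            echo "skipped (not UTF-8): $f" >&2
        fi
    done
}
```

Something like `tag_tree_utf8 ~/Documents/notes` (the path is only an example) would tag the files that pass the check and list the rest on stderr.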

  16. If you’re having this issue, please report the bug to Apple!

    Here’s the text I’ve used as bug description, feel free to copy-paste: http://paste.pocoo.org/raw/2uNvvKWWEpa1i4m66SYj/

  17. Sylow says:

    Hi,
    I have the same problem. One possible fix is to change the content of the file ~/.CFUserTextEncoding to
    0x08000100:0
    This sets the default encoding to UTF8 (first number 0x08000100) and the language to English (second number 0). The default content is 0:0 (MacRoman and English).
    Then you should restart the session and hopefully Quicklook will correctly read the utf8 files.

    More details here:

    http://superuser.com/questions/82123/mac-whats-cfusertextencoding-for/82194#82194

    I hope this can help.

    • Sylow says:

      In fact, for .txt files it is easier since QuickLook uses the preferences of TextEdit: thus it’s enough to tell TextEdit that the default encoding is UTF8 for opening and saving files (instead of “automatic”).

      As to the previous solution, it seems that it causes troubles with other applications such as Firefox and Thunderbird (they won’t start…). Too bad.

      • FWIW, I’m not having any issues with the solution you suggested (echo "0x08000100:0" > ~/.CFUserTextEncoding). Firefox still starts up just fine.
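
The hex number suggested for ~/.CFUserTextEncoding and the decimal number in the xattr value are the same value written two ways. A rough sketch, with caveats: `set_cf_text_encoding` is a hypothetical helper, it simply mirrors the `echo` command quoted in this thread, and given the mixed reports here it keeps a backup and takes the path as a parameter so it can be tried on a scratch file first.

```shell
# 134217984 (from the xattr value) printed as hex.
printf '0x%08X\n' 134217984   # prints 0x08000100

# Write "encoding:language" to a CFUserTextEncoding-style file, saving the
# previous content as <file>.bak. Defaults to the real file if no path is
# given. The trailing newline matches what the echo command would write.
set_cf_text_encoding() {
    target=${1:-"$HOME/.CFUserTextEncoding"}
    if [ -f "$target" ]; then
        cp "$target" "$target.bak" || return 1
    fi
    printf '%s\n' '0x08000100:0' > "$target"
}
```

As noted above, the session has to be restarted before the change is picked up, and copying the `.bak` file back undoes it if other applications misbehave.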

  18. iyepes says:

    I have Lion and I have just found the same issue. I’m still looking for a way to solve it, since I edit the same files on both Mac and Windows.

  19. iyepes says:

    I read on this Dropbox forum http://forums.dropbox.com/topic.php?id=20339 about turning off smart quotes to solve it, but I’m so newbie to OS X that I don’t know how to do it. Any help is appreciated

    • iyepes says:

      Please excuse me if I ask silly questions, I’m a total newbie at this. That echo "0x08000100:0" > ~/.CFUserTextEncoding must be written in a command line or where? How do I get the command line?

      Thanks

    • iyepes says:

      I did it editing the file with Vim, since the command line answered “0×08000100:0″: command not found. However after reboot I received a message of file not being UTF8 and it didn’t open.

      I returned to 0:86 which is the original content of the file, they open but still accents aren’t recognized.

      Any other suggestion? these files come from a windows machine.

    • iyepes says:

      Hi, Same behavior, after changing .CFUserTextEncoding it continues saying files are not utf8 and they don’t open.

  20. iyepes says:

    It finally worked, I did several things.

    1. Updated .CFUserTextEncoding to 0x08000100:0 and rebooted
    2. installed dropbox Latest Forum Build – 1.2.34 http://forums.dropbox.com/topic.php?id=44179&replies=60 reconnecting account (old folder was renamed).
    3. Changed the TextEdit preference for opening plain text files to Western (Windows Latin 1)

    Thanks
