Weird utf-8 bug in QuickLook: it’s the EA

A while ago, I noticed a weird bug in the way QuickLook on Leopard displays French accented characters, even though I was careful to save the file as UTF-8 from TextMate:

wrong characters showing up in quicklook

TextMate (TM) is on the left, QuickLook (QL) on the right. Unix deities seemed to confirm that TM was not to blame:

$ file test.txt 
test.txt: UTF-8 Unicode text
$ cat test.txt 
é ç ë

On the other hand, opening test.txt with TextEdit gave the same result as QuickLook — messed up characters. If I fixed the characters in TextEdit and saved, the display of this particular file was always correct from then on (even if I modified it with TM afterwards). Weird.

Apart from a grand total of one (1) related and unanswered thread at Apple Discussions, a Google search for “Quicklook utf8” or “Quicklook unicode” turned up nothing, so at first it seemed like there were only two people on the entire planet affected by this bug (well, three). When I got a little more creative with my keywords, however, this post by Nico Weber on a vi-related thread turned up:

Indeed, if you check the two files with `ls -l`, you’ll see that the
file written by TextEdit has an “@” (that means it has an extended
attribute). Now, if you do `ls -l@` or `xattr -l filename`, you’ll see
that the TextEdit file has the com.apple.TextEncoding attribute set:

Macintosh-2:b nico$ xattr -l texteditfile.txt
com.apple.TextEncoding: UTF-8;134217984

The file written by MacVim does not have this attribute. If you add it
(`xattr -w com.apple.TextEncoding 'UTF-8;134217984' macvimfile.txt`),
the file shows up correctly in Quicklook (and in TextEdit too; it
didn’t do that before)

Applying the `xattr` command on test.txt did the trick: non-ASCII UTF-8 characters now show up fine in QL.

Characters showing up fine in QL after xattr
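
For anyone who would rather not type the command once per file, here is a minimal Python sketch of the same fix. It only wraps the `xattr -w` call quoted above; the function names are made up for illustration, and it assumes the `xattr` tool is on the PATH (so macOS only). As far as I can tell, the number after the semicolon is the CFStringEncoding constant for UTF-8 (0x08000100), but treat that reading as an assumption.

```python
import subprocess

# Value quoted in the post: charset name, then what appears to be the
# CFStringEncoding number for UTF-8 (134217984 == 0x08000100).
UTF8_TEXT_ENCODING = "UTF-8;134217984"


def is_utf8(path):
    """Return True if the file's bytes decode cleanly as UTF-8."""
    with open(path, "rb") as handle:
        data = handle.read()
    try:
        data.decode("utf-8")
    except UnicodeDecodeError:
        return False
    return True


def xattr_command(path):
    """Build the argument list for the xattr call quoted in the post."""
    return ["xattr", "-w", "com.apple.TextEncoding", UTF8_TEXT_ENCODING, path]


def tag_as_utf8(path):
    """Tag one file, but only if it really is valid UTF-8 (macOS only)."""
    if not is_utf8(path):
        return False
    subprocess.run(xattr_command(path), check=True)
    return True
```

Calling `tag_as_utf8("test.txt")` should have the same effect as the manual command. The UTF-8 check is there so that a Latin-1 file does not get mislabelled, which would presumably make things worse.
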
So what gives? It seems pretty harsh to me to demand extended attributes just to get the encoding right, especially when all third-party programs handle the matter perfectly fine without them, thank you very much. I don’t feel like applying xattr to all my text files either. I also don’t understand why the issue is not more widespread, i.e. why nobody talks about it. Maybe this bug only appears with a specific combination of tools and locales? Should TextMate set the relevant attributes? I’m puzzled.

Posted in Technology | 31 Comments

Finally: Firefox 3 brings inline PDFs on mac

It took a while. It won’t install on Minefield builds, though (via MacOSXHints).

Posted in Technology | 1 Comment

Bye Matlab, hello Python, thanks Sage

For the past two months or so, I’ve been slowly migrating my scientific workflow (that’s a fancy way of saying “my chaotic data hacking”) from Matlab ((R) (TM) (C)) to Python. The results are overwhelmingly positive, so I’d like to rant about it a bit. First, some background.

My work typically involves the analysis of tons of remote sensing observations contained in files of various formats (netCDF if I’m very lucky, HDF if I’m lucky, some weird non-standard binary thing if I’m not); all these files span terabytes and terabytes of hard drive space stored in racks in a big temperature-controlled room somewhere high in the sky. I ssh to a central server on which all these drives are mounted, and I usually run my code there, in whatever language is most convenient for analyzing the data.

Why Matlab

After a few years of this, Matlab emerged as the best solution for several reasons:

  • interactive sessions let you play with the data and make the analysis algorithms “evolve” (the analysis procedure is often not cast in stone and writes itself as I go along and understand the data better);
  • the syntax is well-suited to working with numerical arrays (i.e. vectorized code, something also present in f90 but where it sometimes gives buggy results);
  • powerful input/output facilities: reading netCDF and HDF is as easy as ncload file.nc or hdfread('file.hdf', 'some_variable'), without all the administrative overhead of compiled languages (memory management, static typing, etc). This is an important point: in Fortran it often takes me as long to get the I/O right as to write the actual algorithm;
  • powerful plotting capabilities give you immediate visual feedback following your choices.

For me, this means I usually get results quicker with an interpreted language like Matlab’s, even taking into account the higher speed of compiled code like Fortran. A nice side-effect is that working suddenly becomes a lot more enjoyable when I don’t have to spend so much time remembering all the Fortran idiosyncrasies, the differences between compilers (will this code work with ifort/gfortran/g95/pghpf/etc?) and which libraries to link to, fixing messy mixes of f77 and f90 syntax to give a predictable output, etc. Once I run the actual program, I know it would have been faster using a compiled language, but it would have taken me longer to get right and the coding would have been a lot less fun.

Why !Matlab

Now, the problems. Matlab is not free as in speech, meaning you often can’t see the code. Matlab is not free as in beer, meaning our institution owns a limited number of licenses, meaning that during student rush hours you often can’t even launch Matlab at all. The initial goal of Matlab was the analysis of matrices (hence the “mat”), not general arrays, which makes the code look weird in places and explains the FUCKING SEMICOLON you have to append to every instruction to prevent millions of numbers no human will ever be able to read from flashing in front of your eyes. And you cannot launch standalone Matlab scripts without fishy syntax like “matlab -nodisplay < script.m”. Because of all this (mostly the non-free part), I have been looking for a replacement for some time now. I’ve tried R, Scilab, Octave, and tons of other stuff, but every time I found the language and the plotting capabilities to be worse when I was hoping for something at least similar (I guess I was also somewhat reluctant to learn another closed-system language).

But somewhere I always hid a secret wish… to use Python. I love its syntax, focus on simplicity and readability, but it lacked by default any capability for serious number crunching, so I had been patiently waiting from the sidelines for the maturation of Python packages for scientific work. Well, the stars are now aligning.

A year ago or so I took another look at Python’s scientific stack. I liked what I saw: everything good in Matlab (see above), without the annoyances, and free. But trying to get things running I got lost in the mess of version numbers and a never-ending chain of interdependent packages, which is even more fun when you have no root access and the machine you’re using comes with Python 1.5.2 and that’s it, no chocolate for you. Basically, you have to recompile everything by hand, and make sure you don’t forget that crucial compilation flag somewhere! Unfortunately I had other things to do (like actual work), so I reluctantly let go of the idea and stuck with Matlab.

Then came SAGE

Fast-forward to 2 or 3 months ago, when I stumbled upon SAGE. SAGE (apart from being an RSS aggregator for Firefox and a satellite instrument) is basically a wrapper around Python with tons of scientific packages added, all nicely pre-compiled into tasty binaries just for you by very nice people (which involves tons of work; it is not as simple as it sounds). These goodies come in gzipped tarballs that you dump into your $HOME. You can then launch the sage program, which handles regular Python just fine and includes all the modules I was longing for: NumPy (easy, efficient handling of huge numerical arrays with slicing and dicing), SciPy (input/output and scientific functions), Matplotlib (lots of plotting tools with lickable, anti-aliased output and a syntax almost identical to Matlab’s)! Even IPython is there, meaning you get a comfortable interactive experience with tab completion on files, objects, dictionaries and tons of other niceties! Since SAGE lets you install additional packages with a single command, it’s a piece of cake to add wxPython to get direct-to-screen plotting within your interactive session. Apotheose! Great success! Matlab without Matlab. AND it’s Python, meaning you’re using an actual, REAL language with object-oriented programming, introspection, dictionaries, etc. And since Python fits your brain, the first code you come up with is most likely the right one. As a bonus, SAGE is available for Linux (32/64), Mac OS X and even Windows (I think) so your code will work everywhere! Bliss.
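
To give a taste of the “slicing and dicing”, here is a small NumPy sketch of the kind of thing I mean. The grid values and the -999 fill value are invented for illustration, and it assumes NumPy is importable; it is not taken from my actual analysis code.

```python
import numpy as np

# Made-up 3x4 "brightness temperature" grid; -999 marks missing pixels.
FILL_VALUE = -999.0
grid = np.array([
    [280.0, 281.5, FILL_VALUE, 279.0],
    [282.0, FILL_VALUE, 283.5, 280.5],
    [281.0, 282.5, 284.0, FILL_VALUE],
])

# Hide the fill values so they are ignored by the statistics below.
valid = np.ma.masked_equal(grid, FILL_VALUE)

# Vectorized arithmetic: convert the whole array from kelvin to Celsius.
celsius = valid - 273.15

# Slicing: the first two rows and the last two columns.
corner = celsius[:2, 2:]

# Column means computed over valid pixels only.
column_means = valid.mean(axis=0)
```

No loops and no semicolons: the masked array keeps track of the missing pixels, so the means are taken over real data only.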

The best part was when I realized, a few hours later, that I actually didn’t need to use the SAGE program itself… Inside the SAGE directory lies a local/ folder containing all the binaries, libraries and Python packages it uses. It even contains its own Python 2.5! Set the PATH, LD_LIBRARY_PATH and PYTHONPATH environment variables right and suddenly you have a perfectly consistent installation of everything that’s needed to do scientific work in Python! Other users on the same machine just need to set the same variables, and they can play too! Apotheose²! So in addition to its primary goal of providing a replacement for Mathematica/Maple/etc, SAGE, as a side-effect, provides the whole Python scientific shebang compiled and wrapped up in a nice package, for your pleasure.
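
Here is a rough sketch of what “setting the variables right” could look like, written in Python so it can be handed to a subprocess. The subdirectory names (local/bin, local/lib, local/lib/python2.5/site-packages) are my assumption about the layout, and `sage_environment` is a made-up helper name; check your own SAGE tree before relying on it.

```python
import os


def sage_environment(sage_root, base_env=None):
    """Return a copy of the environment that puts SAGE's local/ tree first."""
    env = dict(os.environ if base_env is None else base_env)
    local = os.path.join(sage_root, "local")

    # Assumed layout: local/bin, local/lib and a Python 2.5 site-packages.
    additions = {
        "PATH": os.path.join(local, "bin"),
        "LD_LIBRARY_PATH": os.path.join(local, "lib"),
        "PYTHONPATH": os.path.join(local, "lib", "python2.5", "site-packages"),
    }
    for name, directory in additions.items():
        current = env.get(name)
        # Prepend so SAGE's copies win over the system ones.
        env[name] = directory if not current else directory + os.pathsep + current
    return env
```

The returned dictionary can be passed as `env=` to `subprocess.call`, or its three values can be copied into a shell profile by hand. Prepending rather than appending matters: otherwise the ancient system Python would still be found first.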

Since Python is pretty smart, new Python modules will then install themselves in the right place with python setup.py install. So go ahead: install Basemap, netcdf4-python, PyHDF, scipy-cluster, PyNGL, whatever you need.

(Sidenote: Other “integrated” Python distributions with a similar focus on scientific analysis are starting to pop up, like Python(x,y) or the Enthought Python Distribution. Travis Oliphant, one of the major architects of the recent NumPy restructuring, is now president of Enthought. Enthought also hosts the SciPy website; you can’t get more central than that. Interesting stuff should happen there soon. They are a little too Windows-centric, though.)

Success

Migrating my last work project from Matlab to Python has been a success: all the figures in my last paper were generated in Python, they look almost exactly the same as the ones generated in Matlab (just as good or better; fonts are noticeably nicer thanks to anti-aliasing), and the code is just as compact and feels better. It seems like the only thing you could miss from Matlab is its numerous toolboxes, a gap which is slowly being filled within SciPy (I don’t actively use them so I don’t care). Adieu point-virgule!

Of course, now that I’ve been bitten by the Python bug, I’m starting to follow the NumPy, SciPy and Matplotlib mailing lists. Some great things are afoot, like the imminent NumPy 1.1 (previously 1.0.5, including shiny masked arrays, histograms and I/O), the release of Travis Oliphant’s Guide to NumPy in August 2008, lots of integration and standardization efforts between the various components, etc.

I guess the best thing is that it made me excited again about the idea of hacking stuff…

Posted in Computer, Python, Science, Technology | 75 Comments

Firefox bug 406730 has been fixed

As the title says. Now we can have correct background window appearance on Mac OS X, which was the number one reason to prefer Safari over Firefox 3 according to DF.

I can’t wait for the next nightly build and the associated Grapple Yummy, which will have many other changes (dixit Arronax). Sad, I know.

Posted in Technology

Scientific talks, “Lessig style”

Nice write-up on using the “Lessig style” of presentations for scientific talks. I especially liked this bit:

I reused slides frequently — even if just to flash the slide before them — in order to remind them of what they’ve seen and to draw connections to previous points. I did this because nobody remembers anything ever, so relying on people remembering a previous point — for which they were probably looking at their watch rather than paying attention to you! — is a sure way to lose people and make them hate you. I found that reusing images was a nice way to help people draw connections between what they knew from my introduction to current topics.

“nobody remembers anything ever”, indeed (myself included).

Posted in Science

High CPU usage after Office 2008 upgrade

After installing the update for Office Mac 2008 tonight, my CPU started going crazy. The fans kicked in on my PowerBook, which they never do unless I’m compiling wxWindows. Investigation showed that, as always when the CPU goes mad for no good reason, the mds and mdworker processes (which are part of Spotlight) were hard at work. I was afraid I was getting bitten by a runaway syslogd process (a bug I had already encountered on another machine), for which there is no clear fix at this time.

In this thread I found the great fs_usage command, which showed that Spotlight was really, really busy indexing all kinds of stuff inside the various packages that compose Office 2008. Other reports confirm this. Since the process had already taken more than 45 minutes, and there was no clear indication that it would end anytime soon, I excluded the following locations from Spotlight indexing in System Preferences:

/Applications/Microsoft Office 2008
/Library/Automator
/Library/Application Support/Microsoft

Of course, these locations are never gonna be indexed by Spotlight in the future, but since there’s almost no chance that I’ll ever have to search for stuff there, it should be fine, right? Right?

I guess that if I had a faster machine I would just have waited a little longer for Spotlight to do its thing… So the shorter advice if this happens again could be “wait longer”.

Posted in Technology | 2 Comments

Papers → Bibdesk

Another nice feature of Papers (and BibDesk) I just found out: when you export the bibliography in BibTeX format and open it in BibDesk, the PDF files you attached in Papers (and the keywords) show up in BibDesk.

Papers in Bibdesk

Nice.

Posted in Technology | 3 Comments

Word 2008 = bibliography mess

After my last post about the joys of using Papers together with Word 2008 to create nice bibliographies, I tried to actually use the thing during the writing of an actual paper. I’m sorry to say I have to agree with the answers to the previous post: the bibliography tool in Word 2008 is a sad joke.

The Papers part, I have no problem with. Right now it’s the best tool to manage a lot of scientific papers, and the built-in search and organizing tools are definitely useful. But as soon as you export a bibliography to Word 2008, all hell breaks loose.

First, something I don’t understand: when you open the Bibliography pane in Word, it’s empty. You first have to go into the “main source” and copy the references you want to use into the current document, and only THEN do they appear in the pane. This is an unnecessary step: once in the document, you’ll have to select the references you want to use anyway. So I ended up copying the entire reference list, all ~700 references (I don’t want to have to *choose* which references I need beforehand; what’s the point of an integrated reference manager otherwise?).

The reason why you have to do that extra step becomes clear once you try to add a reference into the document by double-clicking on its entry in the Bibliography pane: it’s extra super-slow. My PowerBook beachballed for what seemed like an eternity before the reference actually showed up. Moreover, the Bibliography pane is very small, so you don’t see a lot of details about items: if you have ten papers by the same first author, you’re out of luck telling them apart. You have to scroll within the list, vertically to find the first author and horizontally to check the publication years. There’s no built-in search. You can’t organise the references in folders; it’s just a flat list of 700 items. Switching to the Bibliography pane beachballs. All in all, finding the right reference is pretty hard, when it shouldn’t be.

I must admit that at this point I was already feeling discouraged: I couldn’t picture myself going through this ordeal for all ~30 references I wanted to add. But when I tried to create the actual bibliography inside Word, the “Insert/Document items/Bibliography” menu item was grey and inactive. I tried several things to get Word to pick up the fact that I had actual references in there, but nothing worked.

To sum up, a pretty disappointing experiment.

That’s when I realized that all the other tools I had to manage references (Endnote, Zotero and the others) don’t work with Word 2008. Sucks. So I ended up going through all the references manually, using the “copy as reference” tool of Papers to create the bibliography.

2008, indeed. I had an easier time using LaTeX back in 1998.

Check out the comments for other Word 2008 niceties.

Update: Thinking about it, now that I can’t envision using Word 2008’s bibliographic tools, I don’t see any reason to use it at all. No reason to wait for Microsoft to get their shit together. The copy-n-paste of references, I can do in Pages, and I will do it faster and with better style.

Bleh.

Posted in Technology | 19 Comments

Papers + Word 2008 = Bibliography heaven

Papers is a piece of Mac software aimed at simplifying the management, searching and reading of scientific literature. It’s more or less iTunes for your science papers: you can browse through them by authors or journals, include all the relevant information (e.g. an author’s contact address or email, a journal homepage URL) and “match” a paper PDF with its bibliographic information using a built-in search facility that connects to bibliographic databases such as Web of Science or Google Scholar. It includes a nice full-screen PDF viewer. Papers has already been reviewed at The Apple Blog or Ars Technica, and a slideshow explaining its development can be found here.

Unfortunately, until now you couldn’t actually create a bibliography for your own writing efforts using Papers: you still had to rely on additional software such as Sente (my favorite), Bookends, BibDesk (LaTeX-oriented but versatile) or even Endnote. This meant you had to manage two separate bibliographic databases, which requires duplication of effort and information, which leads to errors, confusion, chaos, depression and, eventually, rejected articles.

But everyone should rejoice, as the new beta version (1.7) of Papers, apart from a very nice general speedup (obvious on my PowerBook G4), now includes the option to export a selection of papers into Word 2008 as elements of the main bibliography source:

papers_export_bib

Once this is done, the papers appear in Word 2008’s citation manager:

word08_citation_manager

As the manager itself says, double-clicking an element will insert a citation at the typing point into the Word document. Creating the final bibliography is just one click away:

word08_insert_bibliography

End result: a very nicely-formatted bibliography.

word08_bibliography

The quoting style can be changed over the entire document and the bibliography itself on the fly by simply selecting another style in Word’s citation manager.

From my (admittedly limited) testing, Papers 1.7 and Word 2008 appear to be a very strong end-to-end combo for the management and use of scientific bibliography during writing. Of course, this means using non-free, proprietary and (worst of all) Microsoft software, so this solution might not work for everyone, but it’s worth a try.

(for those wondering if they should upgrade Word, it is worth it if only for the fact that Command-left and Command-right now go to the beginning and end of a line, as they should.)

Posted in Apple, Computer, screenshot, Software, Technology | 28 Comments

Google Talk in iChat behind a firewall

Note: this is an updated and (somewhat) tightened version of this hint at macosxhints, kept for my personal reference. If it helps someone, great.

At work my computer is blocked from the outside world by a proxy. Here are the steps required to make iChat (Leopard) able to connect to Google Talk anyway.

1. Download the latest version of proxytunnel. Version 1.8.0 compiles out of the box on Leopard, as long as you have the developer tools installed. Copy the proxytunnel binary somewhere nice, like /usr/local/bin.

2. In a terminal window, run

sudo /usr/local/bin/proxytunnel -a 5223 -p cache.bofbof.fr:8080 -d talk.google.com:5223&

(of course, replace cache.bofbof.fr with the actual hostname or IP address of your proxy server)

3. In iChat Leopard, configure your Google Talk account as follows: server = localhost, port = 5223, Use SSL = yes, everything else unchecked.

After doing this, iChat should connect to your Google Talk account, and away you chat. If you are on a laptop, you’ll have to revert the iChat server setting to the regular talk.google.com when you leave the proxy jail, and change it back every time you come back to work (I guess proxytunnel can be left running without problems).
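
If you end up toggling this often, the moving parts can be written down in a few lines of Python. This is only a sketch of the steps above: the function names are made up, the `-a`, `-p` and `-d` flags are the ones from the command in step 2, and I am assuming the Google Talk port stays at 5223 when you are outside the proxy.

```python
import subprocess


def proxytunnel_command(proxy, destination="talk.google.com:5223", local_port=5223,
                        binary="/usr/local/bin/proxytunnel"):
    """Build the proxytunnel 1.8.0 argument list used in step 2."""
    return [binary, "-a", str(local_port), "-p", proxy, "-d", destination]


def ichat_server(behind_proxy, local_port=5223):
    """Return the (server, port) pair to enter in iChat's account settings."""
    if behind_proxy:
        # Talk to the local end of the tunnel.
        return ("localhost", local_port)
    # Assumption: outside the proxy, the regular server uses the same port.
    return ("talk.google.com", 5223)


def start_tunnel(proxy):
    """Launch proxytunnel in the background, like the trailing & in step 2."""
    return subprocess.Popen(proxytunnel_command(proxy))
```

`start_tunnel("cache.bofbof.fr:8080")` would start the tunnel without sudo; prepend it to the argument list if your setup needs it, as in the original command.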

The original hint suggests a setup which takes care of doing all the heavy lifting through a customized /etc/hosts and AppleScripts. They work pretty well as-is, although 1) the proxytunnel syntax will have to be updated to look like what’s above when using 1.8.0 and 2) there’s no need to call “nslookupd -flushcache” on Leopard AFAICS.

After all these efforts, 2 nice bonuses to reach chat nirvana: 1) Chax will give you a nice unified contact list and Growl notifications in iChat, 2) How to chat with MSN, Yahoo and other contacts through Google Talk.

Posted in Apple, Computer, Mac, Technology | 13 Comments