Finally: Firefox 3 brings inline PDFs on mac

It took a while. It refuses to install on Minefield builds, though (via MacOSXHints).

Posted in Technology | 1 Comment

Bye Matlab, hello Python, thanks Sage

For the past two months or so, I’ve been slowly migrating my scientific workflow (that’s a fancy way of saying “my chaotic data hacking”) from Matlab ((R) (TM) (C)) to Python. The results are overwhelmingly positive, so I’d like to rant about it a bit. First, some background.

My work typically involves the analysis of tons of remote sensing observations contained in files of various formats (netCDF if I’m very lucky, HDF if I’m lucky, some weird non-standard binary thing if I’m not); all these files span terabytes and terabytes of hard drive space stored in racks in a big temperature-controlled room somewhere high in the sky. I ssh to a central server on which all these drives are mounted; I then usually run code there, in whatever language is most convenient for analyzing the data.

Why Matlab

After a few years of this, Matlab emerged as the best solution for several reasons:

  • interactive sessions let you play with the data and make the analysis algorithms “evolve” (the analysis procedure is often not cast in stone and writes itself as I go along and understand the data better);
  • the syntax is well-suited to working with numerical arrays (i.e. vectorized code, something also present in f90, where it sometimes gives buggy results);
  • powerful input/output facilities: reading netCDF and HDF is as easy as ncload or hdfread('file.hdf', 'some_variable'), without all the administrative overhead of compiled languages (memory management, static typing, etc). This is an important point: in Fortran it often takes me as long to get the I/O right as to write the actual algorithm;
  • powerful plotting capabilities give you immediate visual feedback following your choices.

For me, this means I usually get results quicker with an interpreted language like Matlab’s, even taking into account the higher speed of compiled code like Fortran. A nice side-effect is that working suddenly becomes a lot more enjoyable when I don’t have to spend so much time remembering all the Fortran idiosyncrasies, the differences between compilers (will this code work with ifort/gfortran/g95/pghpf/etc?) and which libraries to link to, fixing messy mixes of f77 and f90 syntax to give a predictable output, etc. Once I run the actual program, I know it would have been faster using a compiled language, but it would have taken me longer to get right and the coding would have been a lot less fun.

Why !Matlab

Now, the problems. Matlab is not free as in speech, meaning you often can’t see the code. Matlab is not free as in beer, meaning our institution owns a limited number of licenses, meaning that during student rush hours you often can’t even launch Matlab at all. The initial goal of Matlab was the analysis of matrices (hence the “mat”), not general arrays, which makes the code look weird in places and explains the FUCKING SEMICOLON you have to append to every instruction to prevent millions of numbers no human will ever be able to read from flashing in front of your eyes. And you cannot launch standalone Matlab scripts without fishy syntax like “matlab -nodisplay < script.m”. Because of all this (mostly the non-free part), I have been looking for a replacement for some time now. I’ve tried R, Scilab, Octave, and tons of other stuff, but every time I found the language and the plotting capabilities to be worse when I was hoping for something at least similar (I guess I was also somewhat reluctant to learn another closed-system language).

But somewhere I always hid a secret wish… to use Python. I love its syntax, focus on simplicity and readability, but it lacked by default any capability for serious number crunching, so I had been patiently waiting from the sidelines for the maturation of Python packages for scientific work. Well, the stars are now aligning.

A year ago or so I took another look at Python’s scientific stack. I liked what I saw: everything good in Matlab (see above), without the annoyances, and free. But trying to get things running I got lost in the mess of version numbers and a never-ending chain of interdependent packages, which is even more fun when you have no root access and the machine you’re using comes with Python 1.5.2 and that’s it, no chocolate for you. Basically, you have to recompile everything by hand, and make sure you don’t forget that crucial compilation flag somewhere! Unfortunately I had other things to do (like actual work), so I reluctantly let go of the idea and stuck with Matlab.

Then came SAGE

Fast-forward to 2 or 3 months ago, when I stumbled upon SAGE. SAGE (apart from being an RSS aggregator for Firefox and a satellite instrument) is basically a wrapper around Python with tons of scientific packages added, all nicely pre-compiled into tasty binaries just for you by very nice people (which involves tons of work and is not as simple as it sounds). These goodies come in gzipped tarballs that you dump into your $HOME. You can then launch the sage program, which handles regular Python just fine and includes all the modules I was longing for: NumPy (easy, efficient handling of huge numerical arrays with slicing and dicing), SciPy (input/output and scientific functions), Matplotlib (lots of plotting tools with lickable, anti-aliased output and a syntax almost identical to Matlab’s)! Even IPython is there, meaning you get a comfortable interactive experience with tab completion on files, objects, dictionaries and tons of other niceties! Since SAGE lets you install additional packages with a single command, it’s a piece of cake to add wxPython to get direct-to-screen plotting within your interactive session. Apotheosis! Great success! Matlab without Matlab. AND it’s Python, meaning you’re using an actual, REAL language with object-oriented programming, introspection, dictionaries, etc. And since Python fits your brain, the first code you come up with is most likely the right one. As a bonus, SAGE is available for Linux (32/64), Mac OS X and even Windows (I think), so your code will work everywhere! Bliss.
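To give a flavour of the “slicing and dicing” mentioned above, here is a minimal sketch of vectorized NumPy code. The array of temperatures is made-up illustration data, not anything from my actual work, and the variable names are hypothetical.

```python
import numpy as np

# Hypothetical data: a small 2-D grid of brightness temperatures in kelvin.
temps_k = np.array([[270.0, 275.5, 280.0],
                    [265.0, 290.0, 300.5]])

# Vectorized arithmetic: one expression converts every element, no loop.
temps_c = temps_k - 273.15

# Slicing: the first row, then the last two columns of every row.
first_row = temps_c[0, :]
last_cols = temps_c[:, 1:]

# Boolean indexing: keep only the values above freezing.
above_freezing = temps_c[temps_c > 0.0]
```

Anyone coming from Matlab should find this familiar, with the caveat that Python indices start at 0 and a slice excludes its end index.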

The best part was when I realized, a few hours later, that I actually didn’t need to use the SAGE program itself… inside the SAGE directory lies a local/ folder containing all the binaries, libraries and Python packages it uses. It even contains its own Python 2.5! Set the PATH, LD_LIBRARY_PATH and PYTHONPATH environment variables right and suddenly you have a perfectly consistent installation of everything that’s needed to do scientific work in Python! Other users on the same machine just need to change the same variables, and they can play too! Apotheosis²! So in addition to its primary goal of providing a replacement for Mathematica/Maple/etc, SAGE, as a side-effect, provides the whole Python scientific shebang compiled and wrapped up in a nice package, for your pleasure.
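As a rough sketch of what “setting the variables right” means, the helper below builds the three values from a SAGE root directory. The sub-directories under local/ (bin, lib, lib/python2.5/site-packages) are my assumption of a typical layout and may differ in a given SAGE release; the function name and the example root path are hypothetical.

```python
import os

def sage_environment(sage_root, base_env=None):
    """Return a copy of the environment with SAGE's local/ tree prepended
    to PATH, LD_LIBRARY_PATH and PYTHONPATH.

    The sub-directory layout below is an assumption, not taken from SAGE docs.
    """
    env = dict(os.environ if base_env is None else base_env)
    local = os.path.join(sage_root, "local")
    additions = {
        "PATH": os.path.join(local, "bin"),
        "LD_LIBRARY_PATH": os.path.join(local, "lib"),
        "PYTHONPATH": os.path.join(local, "lib", "python2.5", "site-packages"),
    }
    for name, new_dir in additions.items():
        current = env.get(name, "")
        # Prepend so the SAGE copies win over any system-wide ones.
        env[name] = new_dir + os.pathsep + current if current else new_dir
    return env

if __name__ == "__main__":
    # Print shell-style export lines for a hypothetical install location.
    env = sage_environment(os.path.expanduser("~/sage"))
    for name in ("PATH", "LD_LIBRARY_PATH", "PYTHONPATH"):
        print("export %s=%s" % (name, env[name]))
```

The printed lines can be pasted into a shell startup file; other users on the machine would use the same values.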

Since Python is pretty smart, new Python modules will then install themselves in the right place with python setup.py install. So go ahead: install Basemap, netcdf4-python, PyHDF, scipy-cluster, PyNGL, whatever you need.

(Sidenote: Other “integrated” Python distributions with a similar focus on scientific analysis are starting to pop up, like Python(x,y) or the Enthought Python Distribution. Travis Oliphant, one of the major architects of the recent NumPy restructuring, is now president of Enthought. They also host the SciPy website; you can’t get more central than that. Interesting stuff should happen there soon. They are a little too Windows-centric, though.)


Migrating my last work project from Matlab to Python has been a success: all the figures in my last paper were generated in Python, they look almost exactly the same as the ones generated in Matlab (just as good or better; fonts are noticeably nicer thanks to anti-aliasing), and the code is just as compact and feels better. It seems like the only thing you could miss from Matlab is its numerous toolboxes, something which is slowly getting fixed within SciPy (I don’t actively use them so I don’t care). Farewell, semicolon!

Of course, now that I’ve been bitten by the Python bug, I’m starting to follow the NumPy, SciPy and Matplotlib mailing lists. Some great things are afoot, like the imminent NumPy 1.1 (previously 1.0.5, including shiny masked arrays, histograms and I/O), the release of Travis Oliphant’s Guide to NumPy in August 2008, lots of integration and standardization efforts between the various components, etc.
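Masked arrays are a natural fit for observations with gaps. The sketch below uses the numpy.ma module as it exists in current NumPy releases; the fill value of -999.0 and the numbers themselves are made up for illustration.

```python
import numpy as np

# Hypothetical retrievals where -999.0 marks a missing observation.
raw = np.array([1.0, 2.0, -999.0, 4.0, -999.0, 5.0])

# Mask every element equal to the fill value.
valid = np.ma.masked_values(raw, -999.0)

# Statistics skip the masked entries automatically.
mean_valid = valid.mean()

# Histogram of the unmasked values only, two bins over [0, 6].
counts, edges = np.histogram(valid.compressed(), bins=2, range=(0.0, 6.0))
```

Without the mask, the two fill values would drag the mean far below zero; with it, only the four real observations count.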

I guess the best thing is that it made me excited again about the idea of hacking stuff…

Posted in Computer, Python, Science, Technology | 75 Comments

Firefox bug 406730 has been fixed

As the title says. Now we can have correct background window appearance on Mac OS X, which was the number one reason to prefer Safari over Firefox 3 according to DF.

I can’t wait for the next nightly build and the associated Grapple Yummy, which will have many other changes (according to Arronax). Sad, I know.

Posted in Technology

Scientific talks, “Lessig style”

Nice write-up on using the “Lessig style” of presentations for scientific talks. I especially liked this bit:

I reused slides frequently — even if just to flash the slide before them — in order to remind them of what they’ve seen and to draw connections to previous points. I did this because nobody remembers anything ever, so relying on people remembering a previous point — for which they were probably looking at their watch rather than paying attention to you! — is a sure way to lose people and make them hate you. I found that reusing images was a nice way to help people draw connections between what they knew from my introduction to current topics.

“nobody remembers anything ever”, indeed (myself included).

Posted in Science

High CPU usage after Office 2008 upgrade

After installing the update for Office Mac 2008 tonight, my CPU started going crazy. The fans kicked in on my PowerBook, which they never do unless I’m compiling wxwindows. Investigation showed that, as always when the CPU goes mad for no good reason, the mds and mdworker processes (which are part of Spotlight) were hard at work. I was afraid I was getting bitten by a runaway syslogd process (a bug I had already encountered on another machine), for which there is no clear fix at this time.

In this thread I found the great fs_usage command, which showed that Spotlight was really, really busy indexing all kinds of stuff inside the various packages that compose Office 2008. Other reports confirm this. Since the process had already taken more than 45 minutes, and there was no clear indication that it would end any time soon, I excluded the following locations from Spotlight indexing in System Preferences:

/Applications/Microsoft Office 2008
/Library/Automator
/Library/Application Support/Microsoft

Of course, these locations are never gonna be indexed by Spotlight in the future, but since there’s almost no chance that I’ll ever have to search for stuff there, it should be fine, right? Right?

I guess that if I had a faster machine I would just have waited a little longer for Spotlight to do its thing… So the shorter advice if this happens again could be “wait longer”.


Posted in Technology | 2 Comments

Papers → Bibdesk

Another nice feature of Papers (and Bibdesk) I just found out: when you export the bibliography in BibTeX format and open it in Bibdesk, the PDF files you attached in Papers (and the keywords) show up in Bibdesk. Nice.

Posted in Technology | 3 Comments

Word 2008 = bibliography mess

After my last post about the joys of using Papers together with Word 2008 to create nice bibliographies, I tried to actually use the thing during the writing of an actual paper. I’m sorry to say I have to agree with the answers to the previous post: the bibliography tool in Word 2008 is a sad joke.

The Papers part, I have no problem with. Right now it’s the best tool to manage a lot of scientific papers, and the built-in search and organizing tools are definitely useful. But as soon as you export a bibliography to Word 2008, all hell breaks loose.

First, something I don’t understand: when you open the Bibliography pane in Word, it’s empty. You first have to go into the “main source” and copy the references you want to use into the current document, and only THEN do they appear in the pane. This is an unnecessary step: once in the document, you’ll have to select the references you want to use anyway. And I ended up copying the entire reference list, all of ~700 references (I don’t want to have to *choose* which references I need beforehand, what’s the point of an integrated reference manager otherwise?).

The reason why you have to do that extra step becomes clear once you try to add a reference into the document by double-clicking on its entry in the Bibliography pane: it’s extra super-slow. My PowerBook beachballed for what seemed like an eternity before the reference actually showed up. Moreover, the Bibliography pane is very small, so you don’t see a lot of details about items; if you have ten papers by the same first author, you’re out of luck telling them apart. You have to scroll within the list, vertically to find the first author and horizontally to check the publication years. There’s no built-in search. You can’t organise the references in folders, it’s just a flat list of 700 items. Switching to the Bibliography pane beachballs. All in all, finding the right reference is pretty hard, when it shouldn’t be.

I must admit that at this point I was already feeling discouraged: I couldn’t picture myself going through this ordeal for all ~30 references I wanted to add. But when I tried to create the actual bibliography inside Word, the “Insert/Document items/Bibliography” menu item was grey and inactive. I tried several things to get Word to pick up the fact that I had actual references in there, but nothing worked.

To sum up, a pretty disappointing experiment.

That’s when I realized that all the other tools I had to manage references (Endnote, Zotero and the others) don’t work with Word 2008. Sucks. So I ended up going through all the references manually, using the “copy as reference” tool of Papers to create the bibliography.

2008, indeed. I had an easier time using latex back in 1998.

Check out the comments for other Word 2008 niceties.

Update: Thinking about it, now that I can’t envision using Word 2008’s bibliographic tools, I don’t see any reason to use it at all. No reason to wait for Microsoft to get their shit together. The copy-n-paste of references, I can do it in Pages, and I will do it faster and with better style.

Bleh.

Posted in Technology | 19 Comments

Papers + Word 2008 = Bibliography heaven

Papers is Mac software aimed at simplifying the management, searching and reading of scientific literature. It’s more or less iTunes for your science papers: you can browse through them by authors or journals, include all the relevant information (e.g. an author’s contact address or email, a journal homepage URL) and “match” a paper PDF with its bibliographic information using a built-in search facility that connects to bibliographic databases such as Web of Science or Google Scholar. It includes a nice full-screen PDF viewer. Papers has already been reviewed at The Apple Blog or Ars Technica, and a slideshow explaining its development can be found here.

Unfortunately, until now you couldn’t actually create a bibliography for your own writing efforts using Papers — you still had to rely on additional software such as Sente (my favorite), Bookends, BibDesk (latex-oriented but versatile) or even Endnote. This meant you had to manage two separate bibliographic databases, which requires duplication of effort and information, which leads to errors, confusion, chaos, depression and, eventually, rejected articles.

But everyone should rejoice, as the new beta version (1.7) of Papers, apart from a very nice general speedup (obvious on my Powerbook G4), now includes the option to export a selection of papers into Word 2008 as elements of the main bibliography source:


Once this is done, the papers appear in Word 2008’s citation manager:


As the manager itself says, double-clicking an element will insert a citation at the typing point into the Word document. Creating the final bibliography is just one click away:


End result: a very nicely-formatted bibliography.

The quoting style can be changed on the fly, over the entire document and the bibliography itself, by simply selecting another style in Word’s citation manager.

From my (admittedly limited) testing, Papers 1.7 and Word 2008 appear to be a very strong end-to-end combo for the management and use of scientific bibliography during writing. Of course, this means using non-free, proprietary and (worst of all) Microsoft software, so this solution might not work for everyone, but it’s worth a try.

(for those wondering if they should upgrade Word, it is worth it if only for the fact that Command-left and Command-right now go to the beginning and end of a line, as they should.)

Posted in Apple, Computer, screenshot, Software, Technology | 28 Comments

Google Talk in iChat behind a firewall

Note: this is an updated and (somewhat) tightened version of this hint at macosxhints for my personal reference. If it helps someone, great.

At work my computer is blocked from the outside world by a proxy. Here are the steps required to make iChat (Leopard) able to connect to Google Talk anyway.

1. Download the latest version of proxytunnel. Version 1.8.0 compiles out of the box on Leopard, as long as you have the developer tools installed. Copy the proxytunnel binary somewhere nice, like /usr/local/bin.

2. In a terminal window, run

sudo /usr/local/bin/proxytunnel -a 5223 -p -d

(of course, give -p the actual URL or IP address of your proxy server)

3. In iChat Leopard, configure your Google Talk account as follows: server = localhost, port = 5223, Use SSL = yes, everything else unchecked.

After doing this, iChat should connect to your Google Talk account, and away you chat. If you are on a laptop, you’ll have to revert the iChat server setting to the regular one when you leave the proxy jail, and change it back every time you come back to work (I guess proxytunnel can be left running without problems).
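If iChat refuses to connect, a first thing worth checking is whether anything is actually listening on the local port chosen in step 2. Here is a minimal sketch using only Python’s standard socket module; the function name is hypothetical, and it only tests that the port accepts TCP connections, not that the tunnel really reaches Google Talk.

```python
import socket

def port_is_open(host="localhost", port=5223, timeout=2.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers connection refused, timeouts and name resolution failures.
        return False

if __name__ == "__main__":
    if port_is_open():
        print("Something is listening on localhost:5223")
    else:
        print("Nothing is listening on localhost:5223; is proxytunnel running?")
```

A False result here points at proxytunnel (not started, wrong port) rather than at the iChat account settings.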

The original hint suggests a setup which takes care of doing all the heavy lifting through a customized /etc/hosts and AppleScripts. They work pretty well as-is, although 1) the proxytunnel syntax will have to be updated to look like what’s above when using 1.8.0 and 2) there’s no need to call “nslookupd -flushcache” on Leopard AFAICS.

After all these efforts, 2 nice bonuses to reach chat nirvana: 1) Chax will give you a nice unified contact list and Growl notifications in iChat, 2) How to chat with MSN, Yahoo and other contacts through Google Talk.

Posted in Apple, Computer, Mac, Technology | 13 Comments

individual spying brought to you by Sears

Benjamin Googins, an anti-spyware researcher, finds out what really happens when you opt in to a Sears “community” program. Basically, your network setup is modified so that ALL YOUR WEB TRAFFIC is redirected through a proxy owned by a third-party Sears associate called “comScore”. This is much worse than any usual invasion of privacy. comScore (whoever they might be) are able to sit back, relax and look through all your clicking habits, web searches, shopping, tastes in movies, music and books, web-based emails, activities in online communities, political opinions, even banking/tax information! (and I’m not even talking about porn) This is unforgivable. Hiding these despicable acts behind a “community”, taking advantage of the user’s good will to voluntarily participate in human society, nicely adds insult to injury.

(technical digression: what happens when the proxy is down? Are you then cut off from your web access?)

Even better is the answer from a Sears person (who previously worked at comScore, which is funny in a sad sort of way). I am astonished that anyone would actually pretend that installing spyware that intercepts all incoming and outgoing traffic (including secure transactions!) can be defensible in any way. The honourable thing to do would be to immediately take down the proxy, kill this whole mess and grovel in apologies. To argue that the proxy crap is clearly explained on page 10 of the 54-page user agreement is not a reasonable answer. It’s so obviously and blatantly *wrong* that it boggles the mind. It’s somewhat satisfying, though: it’s not very often in this day and age that you get the opportunity to witness pure, exposed wrongness that doesn’t try to weasel its way through emotional visuals and marketing hypocrisy.

There should be public uproar.

Posted in Technology