On Perl

Today and yesterday I had my first significant exposure to Perl. Despite my initial thought that I would like Perl very little, I find it to be okay, and in fact I like it a good deal better than Python.

My encounter yesterday was with a Perl script intended to read text files of weather balloon data from the South Pole and do some simple calculations with the data to arrive at a representative temperature value. I was called over to help out the person working on the script, who was himself fairly new to Perl and was having some difficulties with the logic and with handling the formatting. I was happy that I was able to help him out a good deal despite having never really seen Perl before, and within an hour we had the script slicing and dicing data satisfactorily. However, as we went I was struck by the impression that Perl is so thoroughly geared towards manipulating strings that it is not really a very good choice for handling simply formatted numerical data.

One of the stumbling blocks we encountered was that the data had been printed into its files neatly in fixed-width columns. The first column contained time values, measured in some unit, presumably seconds, which began at 0 and ran into the thousands. This meant that the first several hundred lines, those for time values up to 999, had a bit of leading whitespace, while later lines, those with four-digit times, did not. My friend's parsing code was thrown off by this: when he split the line at runs of whitespace, sometimes there was an extra empty string at the start of the resulting array, and sometimes not. The extra element offset all of the actual data values, so that the time was read as the altitude, the altitude as the pressure, and so on. It wasn't too hard to fix by adding a regular expression to trim leading whitespace off of the line, and I'm sure that there are a number of other fairly simple ways.
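
The pitfall is easy to reproduce. The script itself was in Perl, but here is a minimal sketch of the same behavior in Python, using made-up sample lines (the values and the function names are my own placeholders, not the real data or code). Splitting on a whitespace pattern leaves an empty first field whenever the line begins with spaces, while trimming the line first does not:

```python
import re

# Hypothetical sample lines in fixed-width columns: time, altitude, pressure.
# The values are invented for illustration, not taken from the real data.
short_time = "  12   3050.0   680.1"   # leading spaces pad the 2-digit time
long_time = "1200  15200.0   120.4"    # a 4-digit time fills the column


def naive_fields(line):
    """Split at every run of whitespace, with no trimming beforehand."""
    return re.split(r"\s+", line)


def trimmed_fields(line):
    """Trim leading and trailing whitespace first, then split."""
    return re.split(r"\s+", line.strip())


print(naive_fields(short_time))    # begins with an empty string
print(naive_fields(long_time))     # no empty string, so columns line up
print(trimmed_fields(short_time))  # consistent regardless of padding
```

One of the other simple ways, for what it's worth: Perl's split treats a single-space string pattern (split ' ', $line) as a special case that skips leading whitespace, much as Python's no-argument line.split() does.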

A few minutes later, with the Perl script done, we wanted to plot the results. In astrophysics it seems that everyone uses ROOT[1], so we grudgingly rolled up our sleeves to write a C++ ROOT macro to load and plot the data. The thing was, aside from the annoyance of getting ROOT to plot things properly, it took far less time and far less code to write the C++ macro to read the formatted data (the output of the Perl script was in virtually the same format as the input). Since we knew with absolute certainty that all the files contained was columns of numbers, we had merely to use the >> operator repeatedly to load in the data. Whitespace, whether at the start of a line, between columns, or in the form of newlines, didn't trouble us at all. My friend gave voice to what I was thinking, saying that this seemed so much easier in C++ than in Perl; indeed, it was.

Today, however, I realized that a task I was working on was perfectly suited to Perl. I had a directory containing a few dozen subdirectories, each filled with skeleton HTML files. I'd written a program that had spat out all of the skeleton files months ago and I've been systematically writing the actual content for them. Unfortunately, in the meantime I had realized that there were some improvements I wanted to make in the basic HTML structure I was using. I had been making the changes by hand in each file when I got to it to insert the content, but I suddenly realized that sed could do the same thing a million times faster. After about a minute of work I had a sed command constructed that would automatically correct the flaws in one of my HTML files. The trick then was to run that command on each file in turn. I spent two hellish hours trying to do that. The problem was that many of my nice human-friendly subdirectory names contained spaces. Bash gets very picky about the handling of spaces in names, particularly in shell scripts. After two hours I was quite fed up that what had seemed like a simple task was proving virtually impossible. It then occurred to me that Perl, which borrows heavily from sed, would very happily apply regular expressions for me, and would let me handle filenames with spaces far more easily. Since I had never written a Perl script before in my life it took me about half an hour, but that time was spent learning how to do new things that work, not discovering useless methods that don't work, so it was far less onerous. In the end I had my dream script: it required merely to be pointed at the top of my directory structure and then went forth, fixing each HTML file in place. Perfection.
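
For the curious, the shape of that kind of script is roughly as follows. This is only a sketch in Python rather than the Perl I actually wrote, and the substitution in it is a placeholder I made up for the example, as are the function names and the directory name in the demo. The point it illustrates is that once paths are ordinary strings handed straight to the file functions, spaces in directory names stop mattering:

```python
import os
import re
import tempfile

# Placeholder substitution: the real HTML fix is not reproduced here,
# so this pattern and replacement are purely illustrative.
PATTERN = re.compile(r"<b>(.*?)</b>")
REPLACEMENT = r"<strong>\1</strong>"


def fix_html_tree(top):
    """Apply the substitution in place to every .html file under top."""
    changed = []
    for dirpath, _dirnames, filenames in os.walk(top):
        for name in filenames:
            if not name.lower().endswith(".html"):
                continue
            # The path is a plain string, so spaces need no quoting.
            path = os.path.join(dirpath, name)
            with open(path, encoding="utf-8") as f:
                text = f.read()
            fixed = PATTERN.sub(REPLACEMENT, text)
            if fixed != text:
                with open(path, "w", encoding="utf-8") as f:
                    f.write(fixed)
                changed.append(path)
    return changed


def demo():
    """Build a throwaway tree with a space in a directory name and fix it."""
    with tempfile.TemporaryDirectory() as top:
        sub = os.path.join(top, "My Friendly Name")
        os.mkdir(sub)
        page = os.path.join(sub, "index.html")
        with open(page, "w", encoding="utf-8") as f:
            f.write("<p><b>skeleton</b></p>\n")
        fix_html_tree(top)
        with open(page, encoding="utf-8") as f:
            return f.read()


print(demo())
```

The demo works in a temporary directory so that nothing real gets rewritten; pointing fix_html_tree at an actual directory would modify the files there in place, so a backup first would be wise.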

The conclusion I drew from this is that Perl is very much a special purpose language. I would not want to build a tool of any great complexity in Perl, but a simple tool based on manipulating strings is realized in Perl with great ease and convenience.


  1. I'm not very fond of ROOT. At all. Remind me to rant about it sometime. 
