Making More Work for Myself
I just realized a way to make the program I'm writing for work significantly niftier, and hopefully more useful as well. The program's job is to read records out of an index file and, for each record, open and read a separate file of space-delimited data, write a new record of a different type describing that data to a different file, and finally overwrite the space-delimited data file with a new version that has an extra column of data added. At work today I spoke with the other student, who's writing the driver script that runs the various processing programs, including mine, and he commented on his dissatisfaction with the way that we generate thousands of tiny files, process them, write thousands of new files and delete the first set, then repeat this process a couple of times over. It's slow, in no small part because it thrashes the filesystem like crazy.
What I realized just now is that I can cut out one 'thousands of tiny files, which then get deleted' step by subsuming the step after mine into my program. Since my program runs last of the real processing steps, the step after it is to pile all of the resulting tiny files into a tarball. So I might as well just write the tarball directly; it's not like the tar format is difficult [1], and I happen to already have a chunk of code for generating tar header blocks.
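A rough sketch of the "write the tarball directly" idea in Python, using the standard library's `tarfile` module rather than hand-built header blocks. This is not the program from work; the function name `write_tarball` and the member names are made up for illustration, and the data is assumed to fit in memory one member at a time.

```python
import io
import tarfile


def write_tarball(out, members):
    """Write (name, data) pairs straight into a tar archive on file object `out`.

    No intermediate per-member files are ever created on disk.
    """
    with tarfile.open(fileobj=out, mode="w") as tar:
        for name, data in members:
            info = tarfile.TarInfo(name=name)
            info.size = len(data)  # tar needs the size before the data
            tar.addfile(info, io.BytesIO(data))


# Hypothetical example data: two tiny space-delimited "files".
buf = io.BytesIO()
write_tarball(
    buf,
    [
        ("wave_0001.txt", b"0.0 1.0 2.0\n"),
        ("wave_0002.txt", b"0.5 1.5 2.5\n"),
    ],
)
```

In a real run `out` would presumably be a file opened with `open(path, "wb")` instead of an in-memory buffer.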
Unfortunately, before I can implement this fun idea, I have to first figure out why my code is overwriting several of the sampled waveforms with sets of consecutive integers and fix it. This will also probably keep me up later tonight than I should be, doing something that doesn't strictly need to be done, but it's so much more interesting than trying to solve electrostatics problems.
Update: 4-ish hours later, and it works. Knew I would keep myself up way too late. Along the way I determined three things: what had appeared to be a bug wasn't one, the data was just subtly screwed up to begin with; the tar format pads out each contained file's data to a multiple of 512 bytes (duh, that's what all that business about 512-byte blocks means, yet it somehow took me an hour to grasp this); and file permissions come out really weird if you don't set a UID and GID.
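To make the 512-byte business concrete, here is a minimal Python sketch of a hand-built tar member: one 512-byte ustar header block followed by the data padded with NULs to a multiple of 512. It is an illustration under assumptions, not the code described above: the helper names are invented, the UID/GID of 1000 and mode 0644 are placeholder values, and names are assumed to be ASCII and shorter than 100 bytes. The read-back at the end uses the standard `tarfile` module as a sanity check.

```python
import io
import tarfile

BLOCK = 512


def octal(value, width):
    """Zero-padded octal digits plus a trailing NUL, `width` bytes in total."""
    return ("%0*o" % (width - 1, value)).encode("ascii") + b"\0"


def tar_header(name, size, mode=0o644, uid=1000, gid=1000, mtime=0):
    """Build one 512-byte ustar header block for a regular file."""
    header = b"".join([
        name.encode("ascii").ljust(100, b"\0"),  # name (assumed < 100 bytes)
        octal(mode, 8),
        octal(uid, 8),    # placeholder owner; set these explicitly
        octal(gid, 8),
        octal(size, 12),
        octal(mtime, 12),
        b" " * 8,         # checksum field counts as spaces while summing
        b"0",             # typeflag: regular file
        b"\0" * 100,      # linkname
        b"ustar\0",       # magic
        b"00",            # version
        b"\0" * 32,       # uname
        b"\0" * 32,       # gname
        octal(0, 8),      # devmajor
        octal(0, 8),      # devminor
        b"\0" * 155,      # prefix
        b"\0" * 12,       # pad out to 512 bytes
    ])
    assert len(header) == BLOCK
    checksum = ("%06o" % sum(header)).encode("ascii") + b"\0 "
    return header[:148] + checksum + header[156:]


def padded(data):
    """Pad data with NULs up to the next multiple of 512 bytes."""
    return data + b"\0" * (-len(data) % BLOCK)


def tar_member(name, data, **kwargs):
    return tar_header(name, len(data), **kwargs) + padded(data)


# One member, then the two all-zero blocks that mark the end of an archive.
data = b"0.0 1.0 2.0\n"
archive = tar_member("wave_0001.txt", data) + b"\0" * (2 * BLOCK)

# Sanity check: the standard library should be able to read it back.
with tarfile.open(fileobj=io.BytesIO(archive)) as tar:
    member = tar.getmembers()[0]
    contents = tar.extractfile(member).read()
```

The 12 bytes of data occupy a full 512-byte block in the archive, which is why a tarball of thousands of tiny files is noticeably bigger than the sum of their sizes.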
-
[1] I've found this summary most informative.