On the Format of Pages Documents
This semester I have been writing down my vocabulary lists for my German class in Pages documents. I've found Pages' support for multiple languages far superior to that of MS Office, and I have been quite happy using it for this purpose.
Tonight, however, I am supposed to be writing a report. I was going to use a word that I've known for years in German, but suddenly recalled that we had recently learned a more sophisticated synonym in our vocabulary; I just couldn't quite remember it. Good thing I have all those vocabulary lists, right? Well, I now have eleven separate vocabulary lists, each in a separate Pages document. Rather than opening them all, I thought, "Aha! A task for grep!" Being aware that Pages documents are actually packages, I set grep to recursive and set it loose searching for the English definition I remembered writing down for the target word.
I was much puzzled when no results were returned. Perhaps I was misremembering how I had translated that word? I searched again for a word that I was certain was on one of the lists. Still nothing. I searched for a string that should have come up many times, "to", and finally got some results. I was informed that "Binary file ./Kurzbericht 1 final draft.pages/index.xml.gz matches". Hm, not a vocabulary list, but still a Pages document . . . ".gz"? What's all this then?
It seems that someone at Apple got fiendishly clever. Pages writes out its data very verbosely, in easy-to-parse XML, in the index.xml file of the document bundle. But all those XML tags make it a bit bloated, so how to keep its size down? GZip! It does the job quite handily: in the case of one typical short file I examined, the entire .pages bundle was only 32 KB, even though the decompressed index.xml was 196 KB. That's not a bad savings.
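For anyone who wants to check the savings on their own documents, here is a minimal Python sketch. It assumes the layout described above, with a gzip-compressed index.xml.gz sitting directly inside the .pages bundle; the function name index_sizes is my own invention for illustration and has nothing to do with Pages itself.

```python
import gzip
from pathlib import Path


def index_sizes(bundle):
    """Return (compressed, decompressed) byte counts for a bundle's index.xml.gz.

    Assumes the bundle is a directory containing index.xml.gz at its top level.
    """
    gz_path = Path(bundle) / "index.xml.gz"
    compressed = gz_path.stat().st_size
    # Read the whole decompressed stream just to measure its length.
    with gzip.open(gz_path, "rb") as f:
        decompressed = len(f.read())
    return compressed, decompressed
```

Calling index_sizes("Kurzbericht 1 final draft.pages") would return the two byte counts for that document. Note that it measures only index.xml.gz, not anything else the bundle may contain.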
So, this explains a few things: why Pages needs to show a progress bar (a fast progress bar, but it's still there) when loading and saving documents, but also why I can't search multiple Pages documents at once, since grep sees only the compressed bytes and not the text inside them. While this clever storage trick will be a big help to most everyone, it's still a bit of a hindrance to me in this case.
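One way around the hindrance is to decompress each index.xml.gz on the fly and search the result. The following is a rough Python sketch rather than a polished tool. It assumes every document keeps its text in index.xml.gz inside a directory ending in .pages, and that the XML is UTF-8; the function name search_pages is hypothetical. It also searches the raw XML, so a phrase that Pages happens to split across tags will be missed, and a search term may match tag names as well as document text.

```python
import gzip
from pathlib import Path


def search_pages(folder, needle):
    """Return the names of .pages bundles under folder whose XML contains needle."""
    hits = []
    for gz_path in sorted(Path(folder).rglob("index.xml.gz")):
        # Only look inside directories that are Pages bundles.
        if gz_path.parent.suffix != ".pages":
            continue
        # Assumption: the XML is UTF-8; undecodable bytes are replaced, not fatal.
        with gzip.open(gz_path, "rt", encoding="utf-8", errors="replace") as f:
            xml = f.read()
        # Case-insensitive substring match against the raw XML.
        if needle.lower() in xml.lower():
            hits.append(gz_path.parent.name)
    return hits
```

For example, search_pages(".", "sophisticated") would list every bundle below the current directory whose decompressed XML contains that word.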