A Little Optimization

On occasion I need to search for differences within large binary files. I have handy a program for doing just this, originally written by Dintiradan, but when he wrote it, he was in a hurry and used Java. The result was not all that speedy. I rewrote it, using C++, but I too was in a hurry, and transliterated the source code rather simplistically, so it was faster, but hardly meeting its full potential. Yesterday, I decided to improve upon that a bit.

The first step was to take a look with Shark and find out what the program was spending its time on. Since it was reading very large files and doing only a little computation, I had expected to find that it spent most of its time reading data. However, it turned out that, instead, writing output was taking most of the time. Here's the relevant group of functions:

void outputDiff (long int offset_start, 
  long int offset_end, const vector<unsigned char> &source, 
  const vector<unsigned char> target){
    if(offset_start>offset_end)
        return;
    cout << getString(offset_start) << "," 
      << getString(offset_end) << endl;
    printByteArray(source,"< ");
    cout << "---n";
    printByteArray(target,"> ");
    cout << endl;
}

void printByteArray(const vector<unsigned char> &array, 
  const string &prefix){
    for(int i=0; i<array.size(); i++){
        cout << prefix << getString(array[i]) << endl;
    }
}

string getString(unsigned int i){
    ostringstream buffer(ostringstream::out);
    bool nonZero=false;
    for (int n=2*sizeof(int) - 1; n>=0; n--) {
        char nextChar="0123456789abcdef"[((i >> n*4) & 0xF)];
        if(nonZero || nextChar!='0'){
            buffer << nextChar;
            nonZero=true;
        }
    }
    return(buffer.str());
}

A first glance it's easy to spot that I just made a mistake typing, causing the vector target to be passed by value into outputDiff(), which will get inefficient quickly as the vector gets larger. It tends to be small in most practical cases, but passing by reference is still worthwhile.

The next thing that caught my eye was getString(). Clearly, all it does is write an integer to a string in hexadecimal. However, not only is it not a superb way of doing so (I have the sneaking suspicion that I copy and pasted it from somewhere), but the standard library already provides a way to do exactly this1. Since the only way getString() was used was to have its output written to an ostream, and furthermore all of the numbers being written to that ostream were being written in hex, one might as well just set the stream's number base and let it take care of the work. This has the additional advantage of totally eliminating the ned to call getString(), or any other function, each time a number needed to be printed.

Based on discovering that I could eliminate getString() altogether, I next looked at printByteArray(). Its function is certainly necessary, but it didn't really seem all that worthwhile for it to be a separate function, seeing as it only has three lines and is only called from two places. I could have asked the compiler to inline it, but it seemed most straight forward just to do so myself. While I was at it I altered the loops to use iterators so that the vector's size need not be checked each time through the loop, and there's hopefully less calculation to locate the element in the vector. (The gain from doing so was measurable, but not terribly significant.)

The final change was based on noting that the line being printed for each element in each vector had three elements, the prefix, the number, and a newline, but that the first and last were unchanging could be combined, cutting the number of output operations by a third. So, the final version looked like this:

void outputDiff(long int offset_start, long int offset_end, 
  const vector<unsigned char>& source, 
  const vector<unsigned char>& target){
    if(offset_start>offset_end)
        return;
    cout.setf(ios::hex,ios::basefield);
    cout << offset_start << ',' << offset_end;
    vector<unsigned char>::const_iterator it=source.begin(), 
    vector<unsigned char>::const_iterator end=source.end();
    for(;it!=end;++it)
        cout << "n< " << (unsigned int)(*it);
    cout << "n---";
    it=target.begin(), end=target.end();
    for(;it!=end;++it)
        cout << "n> " << (unsigned int)(*it);
    cout << "nn";
    cout.setf(ios::dec,ios::basefield);
}

Here, then, are the running times of the two versions of the program to compare two 4 MB bitmap images, when compiled with a variety of optimization levels.

-O0, original: 14.9374 seconds

-Os, original: 12.8845 seconds

-O2, original: 12.1210 seconds

-O3, original: 12.0436 seconds

-O0, new: 2.09754 seconds

-Os, new: 1.11492 seconds

-O2, new: .850161 seconds

-O3, new: 0.819352 seconds

I found it interesting that for this program, -O3 actually was the most effective (if only by a small margin). My guess is that this happens because the program is already so small that making the executable code a little bulkier really doesn't hurt it. This is consistent with the fact that -Os doesn't perform all that superbly, as trying to make a small piece of code smaller just doesn't accomplish much, even if successful.

WHile the speedup was fantastic for moderately sized files like the bitmaps I tested with, it was naturally more modest when the input was the >500 MB files I actually wanted to work with. However, the running time was noticeably far shorter, and the program appears to now be limited by the rate at which it can pull data in from the hard-disk, as originally expected.


  1. getString() is made worse by the fact that it does extra work to avoid printing leading zeros, which happened to be an annoyance rather than a feature in the context where I was using it. In particular, it turns the number 0 into an empty string. 

No Comments

Comment on this post