Object-Oriented Script Interpreting
I promised (way back in April!) to write more about my script interpreter, so here goes.
What I wanted to talk about today is the basic structure of the interpreter. According to my understanding1 there are two major types of interpreters: byte code based and tree based2. Since I'm making up what I'm doing as I go along, the former seemed uninteresting and potentially hard to do fancier things with, so my approach looks more like the latter, although I'm not sure what I've created resembles what a professional would build.
My solution has been to observe that a program is made up of various types of expressions, such as arithmetic, loops, and conditionals. Therefore the basis of my design is the Expression which (currently) looks like this:
class Expression{
protected:
VariableTable* varTable;
public:
virtual ~Expression(){}
virtual value evaluate()=0;
virtual NewVariableType getType() const=0;
virtual bool hasSideEffects() const=0;
virtual bool operator==(const Expression &e) const=0;
virtual void setVariableAddress(const std::string &name,value* addr)=0;
virtual void setVariableAddresses(const variableMap &vars)=0;
virtual void setVariableTable(VariableTable* table);
virtual VariableTable* getVariableTable() const;
};
The value type is as described back in A Use For Unions and an old version3 of the VariableTable type was shown in Variables and Scoping. The real workhorse function here is evaluate() which orders the Expression to do whatever it does and return the result. There are then a set of concrete subclasses4 which do different tasks: a VariableExpression is bound to a variable and returns its value, an ArithExpression does an arithmetic operation on the values of two operands, a WhileExpression re-evaluates its contents as long as its condition is true, and so on. Note that in most of these cases there is a natural usage of recursion; the operands of an ArithExpression may themselves be expressions (and thus Expressions) as may be the body and condition of the WhileExpression. Via the magic of polymorphism all of these cases can be handled through the generic interface of Expression, and another union based type makes it possible for many of these places to contain either an Expression or a constant as needed.
Putting this together we can decomposed a simple script into the types which will make it up. Take this simple script, which I use for speed benchmarking5 :
//speed_test.txt
beginterrainscript;
variables;
int i, j;
body;
beginstate INIT_STATE;
break;
beginstate START_STATE;
j = 0;
i = 1;
while(i<10000){
j = (i + i) + (i % 2) - (j / i);
if(j == 3){
j = j + 1;
}
else{
j = j - 1;
}
i = i + 1;
}
break;
Pretty much only what's inside the START_STATE (being like the main() function of a C program) is interesting, most of the rest is cruft required to make the old interpreter understand the script with the exception of the variable declarations. Inside the START_STATE all of the code is grouped inside a single BlockExpression, whose purpose is to hold a set of subexpressions and execute them in order. Each of the assignment lines becomes an AssignExpression which takes sets the variable to which it is bound to the the given value. The while loop and if/else statements become a WhileExpression and CondExpressions respectively, and their conditions are ArithExpressions set to perform the appropriate comparative operations. All of the mathematical operations also become ArithExpressions. That's really about all there is to it, which means that running the START_STATE is as simple as telling the outermost BlockExpression to evaulate(), which in turn executes all of the rest as needed.
There are, of course, a number of more complicated Expression subtypes which aren't needed by this simplistic script, in particular the various subtypes of FunctionExpression which allow for calls both to functions exposed from C++ and written in the script. With this framework in place, most of my effort now goes into teaching the parser how to assemble Expression trees from source code, and making each type of Expression do its job as efficiently as possible.
-
I've never taken a class in this stuff, and haven't gotten my hands on a book, much though I've meant to get around to it. ↩
-
Particularly from what I've picked up from the discussion of Javascript interpreters, it appears that simple tree-based approaches like mine are common but losing popularity as they are simple to envision but their performance is limited. See the Webkit Blog for a good description. Bytecode implementations are somewhat favored, but pure versions seem to be being phased out in favor of generating actual native executable code with real Just-in-Time compilation. I can only assume that technologies like LLVM will make this more prevalent. Sidenote to the footnote: As I understand it much of LLVM's power comes from the fact that it generates bytecode or native code from the trees which contain detailed information about the source code. This is what allows it to bring the efficiency of optimizing compilers to interpreters. ↩
-
I've made a lot of changes to the
VariableTablebecause it had a lot of unnecessary functions, and for some reason the first time I wrote it I built it around a linked list. I'm still scratching my head over that one because I can't figure out why I thought it was a good idea. It wasn't all that bad, per se; since the data structure gets used for storage and only rarely for lookup there wasn't much of a performance impact, but it didn't provide anything useful either. I've got an std::map now, back to where I was when I wrote it the first time. ↩ -
The
DynamicExpressiontype, referred to in Variables and Scoping no longer exists; I realized that there were noExpressions which were not alsoDynamic Expressions, so all that was accomplished by having that type in the hierarchy was adding an extra vtable pointer and requiring a bunch of extradynamic_casts. So, I rolled all of the functionality ofDynamicExpressionintoExpression. ↩ -
This script is also completely valid Avernumscript, so I can run it with both the old and new interpreters to see how well they perform compared to each other. ↩