Script Interpreting: Variables and Scoping
I've finally had some time to work on my script interpreter again, and having done a bit, I thought I'd take a break and write some more about what I've already done. In an earlier post I described storing values used in a script. But constants alone do not a language make; one also wants variables.
Avernumscript does have variables, but the restrictions are maddening: each script gets 20 integer variables and 20 string variables, all of which must be global. There are no local variables, and once you've used your 20 variables of each type, that's it. Sure, limiting memory usage is important for a game targeted at a wide range of computers, but the 20 strings, which few scripters have much use for, take up 20 * 255 bytes = 5.1 kilobytes[^1], while the 20 integers, which scripts often want more of, use only 40 bytes altogether. On top of that, the lack of local variables is a pain.
I decided that I wanted to relax basically all of the restrictions of Avernumscript outlined above. After all, it's better to design a powerful system and put limits on it later if necessary.
The basic building block was the VariableTable class. A VariableTable has a list of variables and their names, and an optional 'super table'. The super table allows variable tables to exist in a hierarchy of scopes. Every expression object should be aware of the VariableTable for its scope, and when it needs a variable's address, it asks that table. If that table doesn't know, but it has a super table, it passes the request on up to the super table. This way, expressions only need to know about one VariableTable, but still get access to all variables at wider scopes. More local variables also naturally shadow less local ones of the same name.
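A minimal sketch of the super-table lookup just described, assuming (as in the first version described next) that each table maps names to storage with a std::map, and using plain int in place of the interpreter's value type. ScopeTable and its methods are hypothetical names, not the real VariableTable interface.

```cpp
#include <map>
#include <string>

// Hypothetical scope table (not the post's actual VariableTable): each table
// owns its own variables and may point at an enclosing "super" table.
class ScopeTable {
public:
    explicit ScopeTable(ScopeTable* super = nullptr) : superTable(super) {}

    // Create or overwrite a variable in this scope only.
    void define(const std::string& name, int initial) { vars[name] = initial; }

    // Return the address of the nearest variable called `name`: this scope
    // first, then each enclosing scope in turn; nullptr if none has it.
    int* lookup(const std::string& name) {
        auto it = vars.find(name);
        if (it != vars.end())
            return &it->second;
        return superTable ? superTable->lookup(name) : nullptr;
    }

private:
    ScopeTable* superTable;
    // std::map does not move existing elements on insertion, so addresses
    // handed out by lookup() survive later define() calls.
    std::map<std::string, int> vars;
};
```

Because each lookup walks outward until it finds a match, a local x hides a global x, while a name defined only in an outer scope is still reachable through the chain.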
In my first attempt this was exactly how the system was set up, and it did work. Each VariableTable internally stored an STL map which mapped variable name strings to memory addresses. The trouble was that it was slow when variables were accessed heavily, as they are in many kinds of code. The solution I came up with was to change how variable names are bound. The old system was unnecessarily dynamic: it looked up a variable's location every single time the variable was accessed, even though that location could not possibly have changed. Under the new system, the cost of binding is paid all at once, at the time of allocation.
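A toy comparison of the two binding strategies, assuming a map-based table like the first version and int in place of the interpreter's value type. CountingTable, LookupEachTime and BoundOnce are hypothetical names; the lookup counter only stands in for the per-access cost and is not a benchmark. The post's new system obtains the address by a different route (described in the next paragraph), but the payoff is the same: evaluation reads through a pointer instead of searching the map.

```cpp
#include <map>
#include <stdexcept>
#include <string>

// Hypothetical map-based table that counts how many name lookups it performs.
class CountingTable {
public:
    void define(const std::string& name, int v) { vars[name] = v; }

    int* lookup(const std::string& name) {
        ++lookups;
        auto it = vars.find(name);
        if (it == vars.end())
            throw std::runtime_error("unknown variable: " + name);
        return &it->second;
    }

    long lookups = 0;

private:
    std::map<std::string, int> vars;
};

// First approach: resolve the name on every evaluation.
struct LookupEachTime {
    CountingTable* table;
    std::string name;
    int evaluate() { return *table->lookup(name); }
};

// Resolve-once approach: bind the address a single time, then just read it.
struct BoundOnce {
    int* addr;
    BoundOnce(CountingTable& table, const std::string& name)
        : addr(table.lookup(name)) {}
    int evaluate() { return *addr; }
};
```

Evaluating a LookupEachTime 1000 times performs 1000 map lookups, while a BoundOnce performs exactly one, at construction. Since std::map assigns to an existing key in place, the bound pointer also sees later assignments to the variable.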
The idea is actually pretty simple. Now, not only do Expressions[^2] know about their associated VariableTable, but VariableTables also know about the Expression corresponding to their scope. When an InstExpression instantiates a variable, it tells its local VariableTable to create the variable with the desired name. The VariableTable does this, and then turns to the Expression it belongs to and basically says: "I just added a variable called x, so tell anyone who's interested that that variable now exists at such and such an address." An Expression receiving such a 'message' stores the address if it's of interest and passes the message on to any sub-expressions. Any Expression which depends on variables can thus collect their addresses ahead of time, and when it needs to evaluate itself, it can either simply use them or throw an exception if an address it needs hasn't been provided (which should never happen).
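The push-style binding described above can be sketched as follows. This is an illustration under assumptions, not the interpreter's code: Node stands in for Expression, Scope for VariableTable, VarRef for VariableExpression and Block for BlockExpression, all hypothetical names, with int in place of value. The "should never happen" case is reported with std::logic_error.

```cpp
#include <list>
#include <memory>
#include <stdexcept>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for an Expression. notify() carries the message
// "variable `name` now lives at `where`" down the expression tree.
class Node {
public:
    virtual ~Node() = default;
    virtual int evaluate() = 0;
    // Default behaviour: not interested, just pass the message to sub-expressions.
    virtual void notify(const std::string& name, int* where) {
        for (auto& child : children)
            child->notify(name, where);
    }
    std::vector<std::unique_ptr<Node>> children;
};

// Hypothetical scope table: creates the variable, then tells the owning
// Expression where it lives.
class Scope {
public:
    explicit Scope(Node* own) : owner(own) {}
    void instantiate(const std::string& name, int initial) {
        entries.push_back(Entry{name, initial});
        owner->notify(name, &entries.back().value);
    }

private:
    struct Entry { std::string name; int value; };
    std::list<Entry> entries;  // list elements never move, so addresses stay valid
    Node* owner;
};

// Reads one variable; its address arrives via notify(), never by lookup.
class VarRef : public Node {
public:
    explicit VarRef(std::string n) : name(std::move(n)) {}
    void notify(const std::string& n, int* where) override {
        if (n == name)
            addr = where;        // keep the address we care about
        Node::notify(n, where);  // and still pass the message on
    }
    int evaluate() override {
        if (!addr)  // should never happen if instantiation ran first
            throw std::logic_error("variable never bound: " + name);
        return *addr;
    }

private:
    std::string name;
    int* addr = nullptr;
};

// Evaluates its sub-expressions in order; the result is that of the last one.
class Block : public Node {
public:
    int evaluate() override {
        int result = 0;
        for (auto& child : children)
            result = child->evaluate();
        return result;
    }
};
```

Storing the variables in a std::list matters here: with a std::vector, growing the table could invalidate addresses that had already been handed out to expressions.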
This is the definition of VariableTable as matters stand:
```cpp
#include <string>

// value (the interpreter's value type) is defined elsewhere.
struct varEntry;
class DynamicExpression;

class VariableTable
{
protected:
    VariableTable* superTable;  // enclosing scope's table, if any
    DynamicExpression* owner;   // Expression whose scope this table represents
    varEntry* head;             // linked list of this scope's variables
    varEntry* tail;
    varEntry* findEntry(const std::string &name);
public:
    VariableTable(DynamicExpression* own, VariableTable* sup);
    ~VariableTable();
    VariableTable* duplicate();
    void setOwner(DynamicExpression* own);
    bool instantiateVariable(const std::string &name, int type);
    bool instantiateVariable(const std::string &name, value data);
    value* lookupVar(const std::string &name);
    void assignToVar(const std::string &name, const value &newVal);
    void clearVars();
    void setSuperTable(VariableTable* table);
    unsigned int getVariableType(const std::string &name);
    varEntry* variableList();
};
```
I used a linked list of custom varEntry structs because actual lookups in the list should never really occur now, and varEntrys are convenient for interfacing with the Expression classes. This is the class, derived from Expression, from which all Expressions that depend on variables, or that may contain Expressions which depend on variables, are derived:
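```cpp
// Expression, VariableTable, value and varEntry are defined elsewhere.
class DynamicExpression : public Expression {
protected:
    VariableTable* varTable;  // table for this Expression's scope
public:
    // Learn the address of a single variable.
    virtual void setVariableAddress(const std::string &name, value* addr) = 0;
    // Learn the addresses of every variable in a varEntry list.
    virtual void setVariableAddresses(varEntry* vars) = 0;
    virtual void setVariableTable(VariableTable* table) {
        varTable = table;
    }
};
```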
The setVariableAddress and setVariableAddresses methods are the mechanism by which Expressions learn the addresses of variables from VariableTables.
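To make the two methods concrete, here is a sketch of how a variable-dependent expression might implement them. It is standalone rather than derived from DynamicExpression so that it compiles on its own, and value, the varEntry layout and AddExpression are all hypothetical stand-ins, since the post does not show them. One plausible design, assumed here, is for setVariableAddresses to walk the varEntry list and forward each entry to setVariableAddress.

```cpp
#include <stdexcept>
#include <string>
#include <utility>

// Hypothetical stand-ins: the post does not show value or varEntry.
struct value { int i = 0; };
struct varEntry {
    std::string name;
    value data;
    varEntry* next = nullptr;  // singly linked list of a table's variables
};

// Hypothetical expression computing a + b; mirrors DynamicExpression's
// two address-setting methods without the rest of the hierarchy.
class AddExpression {
public:
    AddExpression(std::string a, std::string b)
        : lhs(std::move(a)), rhs(std::move(b)) {}

    // One variable announced: keep its address only if we use that name.
    void setVariableAddress(const std::string& name, value* addr) {
        if (name == lhs) lhsAddr = addr;
        if (name == rhs) rhsAddr = addr;
    }

    // Many variables at once: walk the list, reusing the single-variable path.
    void setVariableAddresses(varEntry* vars) {
        for (varEntry* e = vars; e != nullptr; e = e->next)
            setVariableAddress(e->name, &e->data);
    }

    int evaluate() const {
        if (!lhsAddr || !rhsAddr)
            throw std::logic_error("operand address was never provided");
        return lhsAddr->i + rhsAddr->i;
    }

private:
    std::string lhs, rhs;
    value* lhsAddr = nullptr;
    value* rhsAddr = nullptr;
};
```

Routing the batch method through the single-variable one keeps the "is this name interesting to me?" logic in one place, so an expression behaves the same whether it hears about variables one at a time or receives a whole table's list at once.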
In the end, this reworking of the variable system more than doubled execution speed, so I'm pretty pleased with the result. Hopefully a little later I'll have a chance to write up an outline of the different Expression types and how they work together to form the backbone of an actual script.
[^1]: That is, this memory is spoken for whether any strings are used or not, because it is automatically allocated for each script and can then have variables bound to it.
[^2]: I hope to get a full explanation of the Expression class hierarchy written up soon. For a quick overview, Expression is an abstract base class for objects which represent individual commands in a script. Derived classes such as ArithExpression exist to do particular tasks like arithmetic; the VariableExpression simply retrieves the value of a variable, while the InstExpression instantiates and initializes a variable. The BlockExpression is an important component, as it wraps a group of other Expressions which are executed in sequence.