21

Global Variables

If only there could be an invention that bottled up a memory, like scent. And it never faded, and it never got stale. And then, when one wanted it, the bottle could be uncorked, and it would be like living the moment all over again.

Daphne du Maurier, Rebecca

The previous chapter was a long exploration of one big, deep, fundamental computer science data structure. Heavy on theory and concept. There may have been some discussion of big-O notation and algorithms. This chapter has fewer intellectual pretensions. There are no large ideas to learn. Instead, it’s a handful of straightforward engineering tasks. Once we’ve completed them, our virtual machine will support variables.

Actually, it will only support global variables. Locals are coming in the next chapter. In jlox, we managed to cram them both into a single chapter, because we used the same implementation technique for all variables. We built a chain of environments, one for each scope, all the way up to the top. That was a simple, clean way to learn how to manage state.

But it’s also slow. Allocating a new hash table each time you enter a block or call a function is not the road to a fast VM. Given how much code is concerned with using variables, if variables go slow, everything goes slow. For clox, we’ll improve that by using a much more efficient strategy for local variables, but globals aren’t as easily optimized.

A quick refresher on Lox semantics: Global variables in Lox are “late bound” or resolved dynamically. This means you can compile a chunk of code that refers to a global variable before it’s defined. As long as the code doesn’t execute before the definition happens, everything is fine. In practice, that means you can refer to later variables inside the body of functions:

fun showVariable() {
  print global;
}

var global = "after";
showVariable();

Code like this might seem odd, but it’s handy for defining mutually recursive functions. It also plays nicer with the REPL. You can write a little function in one line, then define the variable it uses in the next.

Local variables work differently. Since a local variable’s declaration always occurs before it is used, the VM can resolve them at compile time, even in a simple single-pass compiler. That will let us use a smarter representation for locals. But that’s for the next chapter. Right now, let’s just worry about globals.

21 . 1 Statements

Variables come into being using variable declarations, which means now is also the time to add support for statements to our compiler. If you recall, Lox splits statements into two categories. “Declarations” are those statements that bind a new name to a value. The other kinds of statementscontrol flow, print, etc.are just called “statements”. We disallow declarations directly inside control flow statements, like this:

if (monday) var croissant = "yes"; // Error.

Allowing it would raise confusing questions around the scope of the variable. So, like other languages, we prohibit it syntactically by having a separate grammar rule for the subset of statements that are allowed inside a control flow body:

statement       exprStmt
               | forStmt
               | ifStmt
               | printStmt
               | returnStmt
               | whileStmt
               | block ;

Then we use a separate rule for the top level of a script and inside a block:

declaration     classDecl
               | funDecl
               | varDecl
               | statement ;

The declaration rule contains the statements that declare names, and also includes statement so that all statements types are allowed. Since block itself is in statement, you can put declarations inside a control flow construct by nesting them inside a block.

In this chapter, we’ll only cover a couple of statements and one declaration:

statement       exprStmt
               | printStmt ;

declaration     varDecl
               | statement ;

Up to now, our VM considered a “program” to be a single expression since that’s all we could parse and compile. In a real Lox program, a script is a sequence of declarations. We’re ready to support that now:

  advance();                 
compiler.c
in compile()
replace 2 lines

while (!match(TOKEN_EOF)) { declaration(); }
  endCompiler();             
compiler.c, in compile(), replace 2 lines

We keep compiling declarations until we hit the end of the source file, using:

compiler.c
add after expression()
static void declaration() {
  statement();             
}                          
compiler.c, add after expression()

We’ll get to variable declarations later in the chapter, so, for now, this just forwards to statement():

compiler.c
add after declaration()
static void statement() {  
  if (match(TOKEN_PRINT)) {
    printStatement();      
  }                        
}                          
compiler.c, add after declaration()

Block statements can contain declarations and control flow statements contain other statements. That means these two functions will eventually be recursive. We may as well write out the forward declarations now:

static void expression();                 
compiler.c
add after expression()
static void statement();                  
static void declaration();                
static ParseRule* getRule(TokenType type);
compiler.c, add after expression()

21 . 1 . 1 Print statements

We have two statement types to support. Let’s start with print statements, which begin, naturally enough, with a print token. We detect that using this helper function:

compiler.c
add after consume()
static bool match(TokenType type) {
  if (!check(type)) return false;  
  advance();                       
  return true;                     
}                                  
compiler.c, add after consume()

You may recognize it from jlox. If the current token has the given type, we consume the token and return true. Otherwise we leave the token alone and return false. This helper function is implemented in terms of this other helper:

compiler.c
add after consume()
static bool check(TokenType type) {  
  return parser.current.type == type;
}                                    
compiler.c, add after consume()

The check() function returns true if the current token has the given type. It seems a little silly to wrap this in a function, but we’ll use it more later and I think short verb-named functions like this make the parser easier to read.

If we did match the print token, then we compile the rest of the statement using:

compiler.c
add after expression()
static void printStatement() {                        
  expression();                                       
  consume(TOKEN_SEMICOLON, "Expect ';' after value.");
  emitByte(OP_PRINT);                                 
}                                                     
compiler.c, add after expression()

A print statement evaluates an expression and prints the result, so we first parse and compile that expression. The grammar expects a semicolon after that, so we consume it. Finally, we emit a new instruction to print the result:

  OP_NEGATE,
chunk.h
in enum OpCode
  OP_PRINT, 
  OP_RETURN,
chunk.h, in enum OpCode

At runtime, we execute this instruction like so:

        break;            
vm.c
in run()

case OP_PRINT: { printValue(pop()); printf("\n"); break; }
      case OP_RETURN: {   
vm.c, in run()

At this point, we have already executed the code for the expression and the result value remains on top of the stack. Now we simply pop it and print it.

Note that we don’t push anything else after that. This is a key difference between expressions and statements in the VM. Every bytecode instruction has a stack effect that describes how the instruction modifies the stack. For example, OP_ADD pops two values and pushes one, leaving the stack one element smaller than before.

You can also sum the stack effects of a series of instructions. When you add the stack effects of any series of instructions compiled from an expression, it will total one. Each expression leaves one result value on the stack.

The bytecode for an entire statement has a total stack effect of zero. Since a statement produces no values, it ultimately leaves the stack unchanged, though it of course uses the stack while it’s doing its thing. This is important because when we get to control flow and looping, a program might execute a long series of statements. If each statement grew or shrank the stack, it might eventually overflow or underflow.

While we’re in the interpreter loop, we should delete a bit of code:

      case OP_RETURN: {     
vm.c
in run()
replace 2 lines
        // Exit interpreter.
        return INTERPRET_OK;
vm.c, in run(), replace 2 lines

When the VM only compiled and evaluated a single expression, we had some temporary code in OP_RETURN to output the value. Now that we have statements and print, we don’t need that anymore. We’re one step closer to the complete implementation of clox.

As usual, a new instruction needs support in the disassembler:

      return simpleInstruction("OP_NEGATE", offset);
debug.c
in disassembleInstruction()
    case OP_PRINT:                                  
      return simpleInstruction("OP_PRINT", offset); 
    case OP_RETURN:                                 
debug.c, in disassembleInstruction()

That’s our print statement. If you want, give it a whirl:

print 1 + 2;
print 3 * 4;

Exciting!

21 . 1 . 2 Expression statements

Wait until you see the next statement. If we don’t see a print keyword, then we must be looking at an expression statement:

    printStatement();     
compiler.c
in statement()
  } else {                
    expressionStatement();
  }                       
compiler.c, in statement()

It’s parsed like so:

compiler.c
add after expression()
static void expressionStatement() {                        
  expression();                                            
  consume(TOKEN_SEMICOLON, "Expect ';' after expression.");
  emitByte(OP_POP);                                        
}                                                          
compiler.c, add after expression()

An “expression statement” is simply an expression followed by a semicolon. They’re how you write an expression in a context where a statement is expected. Usually, it’s so that you can call a function or evaluate an assignment for its side effect, like:

brunch = "quiche";
eat(brunch);

Semantically, an expression statement evaluates the expression and discards the result. The compiler directly encodes that behavior. It emits the code for the expression, and then an OP_POP instruction:

  OP_FALSE,
chunk.h
in enum OpCode
  OP_POP,  
  OP_EQUAL,
chunk.h, in enum OpCode

As the name implies, that instruction pops the top value off the stack and forgets it:

      case OP_FALSE: push(BOOL_VAL(false)); break;
vm.c
in run()
      case OP_POP: pop(); break;                  

case OP_EQUAL: {
vm.c, in run()

We can disassemble it too:

      return simpleInstruction("OP_FALSE", offset);
debug.c
in disassembleInstruction()
    case OP_POP:                                   
      return simpleInstruction("OP_POP", offset);  
    case OP_EQUAL:                                 
debug.c, in disassembleInstruction()

Expression statements aren’t very useful yet since we can’t create any expressions that have side effects, but they’ll be essential when we add functions later. The majority of statements in real-world code in languages like C are expression statements.

21 . 1 . 3 Error synchronization

While we’re getting this initial work done in the compiler, we can tie off a loose end we left several chapters back. Like jlox, clox uses panic mode error recovery to minimize the number of cascaded compile errors that it reports. The compiler exits panic mode when it reaches a synchronization point. For Lox, we chose statement boundaries as that point. Now that we have statements, we can implement synchronization:

  statement();                        
compiler.c
in declaration()

if (parser.panicMode) synchronize();
}                                     
compiler.c, in declaration()

If we hit a compile error while parsing the previous statement, we will enter panic mode. In that case, after the statement, we start synchronizing:

compiler.c
add after printStatement()
static void synchronize() {                             
  parser.panicMode = false;

  while (parser.current.type != TOKEN_EOF) {            
    if (parser.previous.type == TOKEN_SEMICOLON) return;

    switch (parser.current.type) {                      
      case TOKEN_CLASS:                                 
      case TOKEN_FUN:                                   
      case TOKEN_VAR:                                   
      case TOKEN_FOR:                                   
      case TOKEN_IF:                                    
      case TOKEN_WHILE:                                 
      case TOKEN_PRINT:                                 
      case TOKEN_RETURN:                                
        return;                                         

      default:                                          
        // Do nothing.                                  
        ;                                               
    }                                                   

    advance();                                          
  }                                                     
}                                                       
compiler.c, add after printStatement()

We skip tokens indiscriminately until we reach something that looks like a statement boundary. We recognize the boundary by looking for a preceding token that can end a statement, like a semicolon. Or we’ll look for a subsequent token that begins a statement, usually one of the control flow or declaration keywords.

21 . 2 Variable Declarations

Merely being able to print doesn’t win your language any prizes at the programming language fair, so let’s move on to something a little more ambitious and get variables going. There are three operations we need to support:

We can’t do either of the last two until we have some variables, so we start with declarations:

static void declaration() {           
compiler.c
in declaration()
replace 1 line
  if (match(TOKEN_VAR)) {             
    varDeclaration();                 
  } else {                            
    statement();                      
  }                                   

if (parser.panicMode) synchronize();
compiler.c, in declaration(), replace 1 line

The placeholder parsing function we sketched out for the declaration grammar rule has an actual production now. If we match a var token, we jump to:

compiler.c
add after expression()
static void varDeclaration() {                                       
  uint8_t global = parseVariable("Expect variable name.");

  if (match(TOKEN_EQUAL)) {                                          
    expression();                                                    
  } else {                                                           
    emitByte(OP_NIL);                                                
  }                                                                  
  consume(TOKEN_SEMICOLON, "Expect ';' after variable declaration.");

  defineVariable(global);                                            
}                                                                    
compiler.c, add after expression()

The keyword is followed by the variable name. That’s compiled by parseVariable(), which we’ll get to in a second. Then we look for an = followed by an initializer expression. If the user doesn’t initialize the variable, the compiler implicitly initializes it to nil by emiting an OP_NIL instruction. Either way, we expect the statement to be terminated with a semicolon.

There are two new functions here for working with variables and identifiers. The first one is:

compiler.c
add after parsePrecedence()
static uint8_t parseVariable(const char* errorMessage) {
  consume(TOKEN_IDENTIFIER, errorMessage);              
  return identifierConstant(&parser.previous);          
}                                                       
compiler.c, add after parsePrecedence()

It requires the next token to be an identifier token, which it consumes and sends to:

compiler.c
add after parsePrecedence()
static uint8_t identifierConstant(Token* name) {                      
  return makeConstant(OBJ_VAL(copyString(name->start, name->length)));
}                                                                     
compiler.c, add after parsePrecedence()

This function takes the given token and adds its lexeme to the chunk’s constant table as a string. It then returns the index of that constant in the constant table.

Global variables are looked up by name at runtime. That means the VMthe bytecode interpreter loopneeds access to the name. A whole string is too big to stuff into the bytecode stream as an operand. Instead, we store the string in the constant table and the instruction then refers to the name by its index in the table.

This function returns that index all the way to varDeclaration() which later hands it over to:

compiler.c
add after parseVariable()
static void defineVariable(uint8_t global) {
  emitBytes(OP_DEFINE_GLOBAL, global);      
}                                           
compiler.c, add after parseVariable()

This outputs the bytecode instruction that defines the new variable and stores its initial value. The index of the variable’s name in the constant table is the instruction’s operand. As usual in a stack-based VM, we emit this instruction last. At runtime, we execute the code for the variable’s initializer first. That leaves the value on the stack. Then this instruction takes that and stores it away for later.

Over in the runtime, we begin with this new instruction:

  OP_POP,          
chunk.h
in enum OpCode
  OP_DEFINE_GLOBAL,
  OP_EQUAL,        
chunk.h, in enum OpCode

Thanks to our handy-dandy hash table, the implementation isn’t too hard:

      case OP_POP: pop(); break;             
vm.c
in run()

case OP_DEFINE_GLOBAL: { ObjString* name = READ_STRING(); tableSet(&vm.globals, name, peek(0)); pop(); break; }

case OP_EQUAL: {
vm.c, in run()

We get the name of the variable from the constant table. Then we take the value from top of the stack and store it in a hash table with that name as the key.

This code doesn’t check to see if the key is already in the table. Lox is pretty lax with global variables and lets you redefine them without error. That’s useful in a REPL session, so the VM supports that by simply overwriting the value if the key happens to already be in the hash table.

There’s another little helper macro:

#define READ_CONSTANT() (vm.chunk->constants.values[READ_BYTE()])
vm.c
in run()
#define READ_STRING() AS_STRING(READ_CONSTANT())                 

#define BINARY_OP(valueType, op) \
vm.c, in run()

It reads a one-byte operand from the bytecode chunk. It treats that as an index into the chunk’s constant table and returns the string at that index. It doesn’t check that the value is a stringit just indiscriminately casts it. That’s safe because the compiler never emits an instruction that refers to a non-string constant.

Because we care about lexical hygiene, we also undefine this macro at the end of the interpret function:

#undef READ_CONSTANT
vm.c
in run()
#undef READ_STRING  
#undef BINARY_OP    
vm.c, in run()

I keep saying “the hash table”, but we don’t actually have one yet. We need a place to store these globals. Since we want them to persist as long as clox is running, we store them right in the VM:

  Value* stackTop;
vm.h
in struct VM
  Table globals;  
  Table strings;  
vm.h, in struct VM

As we did with the string table, we need to initialize the hash table to a valid state when the VM boots up:

  vm.objects = NULL;     
vm.c
in initVM()

initTable(&vm.globals);
  initTable(&vm.strings);
vm.c, in initVM()

And we should tear it down when we exit:

void freeVM() {          
vm.c
in freeVM()
  freeTable(&vm.globals);
  freeTable(&vm.strings);
vm.c, in freeVM()

As usual, we want to be able to disassemble the new instruction too:

      return simpleInstruction("OP_POP", offset);                   
debug.c
in disassembleInstruction()
    case OP_DEFINE_GLOBAL:                                          
      return constantInstruction("OP_DEFINE_GLOBAL", chunk, offset);
    case OP_EQUAL:                                                  
debug.c, in disassembleInstruction()

And with that, we can define global variables. Not that users can tell that they’ve done so, because they can’t actually use them. So let’s fix that next.

21 . 3 Reading Variables

As in every programming language ever, we access a variable’s value using its name. We hook up identifier tokens to the expression parser here:

  { NULL,     binary,  PREC_COMPARISON }, // TOKEN_LESS_EQUAL
compiler.c
replace 1 line
  { variable, NULL,    PREC_NONE },       // TOKEN_IDENTIFIER
  { string,   NULL,    PREC_NONE },       // TOKEN_STRING    
compiler.c, replace 1 line

That calls:

compiler.c
add after string()
static void variable() {         
  namedVariable(parser.previous);
}                                
compiler.c, add after string()

Like with declarations, there are a couple of tiny helper functions that seem pointless now but will become more useful in later chapters. I promise.

compiler.c
add after string()
static void namedVariable(Token name) {   
  uint8_t arg = identifierConstant(&name);

  emitBytes(OP_GET_GLOBAL, arg);          
}                                         
compiler.c, add after string()

This calls the same identifierConstant() function from before to take the given identifier token and add its lexeme to the chunk’s constant table as a string. All that remains is to emit an instruction that loads the global variable with that name. Here’s the instruction:

  OP_POP,          
chunk.h
in enum OpCode
  OP_GET_GLOBAL,   
  OP_DEFINE_GLOBAL,
chunk.h, in enum OpCode

Over in the bytecode interpreter, the implementation reflects OP_DEFINE_GLOBAL:

      case OP_POP: pop(); break;                                
vm.c
in run()

case OP_GET_GLOBAL: { ObjString* name = READ_STRING(); Value value; if (!tableGet(&vm.globals, name, &value)) { runtimeError("Undefined variable '%s'.", name->chars); return INTERPRET_RUNTIME_ERROR; } push(value); break; }

case OP_DEFINE_GLOBAL: {
vm.c, in run()

We pull the constant table index from the instruction’s operand and get the variable name. Then we use that as a key to look up the variable’s value in the globals hash table.

If the key isn’t present in the hash table, it means that global variable has never been defined. That’s a runtime error in Lox, so we report it and exit the interpreter loop if that happens. Otherwise, we take the value and push it onto the stack.

      return simpleInstruction("OP_POP", offset);                
debug.c
in disassembleInstruction()
    case OP_GET_GLOBAL:                                          
      return constantInstruction("OP_GET_GLOBAL", chunk, offset);
    case OP_DEFINE_GLOBAL:                                       
debug.c, in disassembleInstruction()

A little bit of disassembling, and we’re done. Our interpreter is up to running code like this:

var beverage = "cafe au lait";
var breakfast = "beignets with " + beverage;
print breakfast;

There’s only one operation left…

21 . 4 Assignment

Throughout this book, I’ve tried to keep you on a fairly safe and easy path. I don’t avoid hard problems, but I try to not make the solutions more complex than they need to be. Alas, other design choices in our bytecode compiler make assignment annoying to implement.

Our bytecode VM uses a single-pass compiler. It parses and generates bytecode on the fly without any intermediate AST. As soon as it recognizes a piece of syntax, it emits code for it. Assignment doesn’t naturally fit that. Consider:

menu.brunch(sunday).beverage = "mimosa";

In this code, the parser doesn’t realize menu.brunch(sunday).beverage is the target of an assignment and not a normal expression until it reaches =, many tokens after the first menu. By then, the compiler has already emitted bytecode for the whole thing.

The problem is not as dire as it might seem, though. Look at how the parser sees that example:

The 'menu.brunch(sunday).beverage =

Even though the .beverage part must be compiled as a setter, everything to the left of it is an expression, with the normal expression semantics. The menu.brunch(sunday) part can be compiled and executed as usual.

Fortunately for us, the only semantic differences on the left side of an assignment appear at the very right-most end of the tokens, immediately preceding the =. Even though the receiver of a setter may be an arbitrarily long expression, the setter itself is only a . followed by an identifier, which is right before the =. We don’t need much lookahead to realize .beverage should be compiled as a setter and not a getter.

Variables are even easier since they are just a single bare identifier before an =. The idea then is that right before compiling an expression that can also be used as an assignment target, we look for a subsequent = token. If we see one, we compile it as an assignment or setter instead of a variable access or getter.

We don’t have setters to worry about yet, so all we need to handle is variables:

  uint8_t arg = identifierConstant(&name); 
compiler.c
in namedVariable()
replace 2 lines

if (match(TOKEN_EQUAL)) { expression(); emitBytes(OP_SET_GLOBAL, (uint8_t)arg); } else { emitBytes(OP_GET_GLOBAL, (uint8_t)arg); }
}                                          
compiler.c, in namedVariable(), replace 2 lines

In the parse function for identifier expressions, we look for a following equals sign. If we find one, instead of emitting code for a variable access, we compile the assigned value and then emit an assignment instruction.

That’s the last instruction we need to add in this chapter:

  OP_DEFINE_GLOBAL,
chunk.h
in enum OpCode
  OP_SET_GLOBAL,   
  OP_EQUAL,        
chunk.h, in enum OpCode

As you’d expect, its runtime behavior is similar to defining a new variable:

      }                                                         
vm.c
in run()

case OP_SET_GLOBAL: { ObjString* name = READ_STRING(); if (tableSet(&vm.globals, name, peek(0))) { tableDelete(&vm.globals, name); runtimeError("Undefined variable '%s'.", name->chars); return INTERPRET_RUNTIME_ERROR; } break; }

case OP_EQUAL: {
vm.c, in run()

The main difference is what happens when the key doesn’t already exist in the globals hash table. If the variable hasn’t been defined yet, it’s a runtime error to try to assign to it. Lox doesn’t do implicit variable declaration.

The other difference is that setting a variable doesn’t pop the value off the stack. Remember, assignment is an expression, so it needs to leave that value there in case the assignment is nested inside some larger expression.

Add a dash of disassembly:

      return constantInstruction("OP_DEFINE_GLOBAL", chunk, offset);
debug.c
in disassembleInstruction()
    case OP_SET_GLOBAL:                                             
      return constantInstruction("OP_SET_GLOBAL", chunk, offset);   
    case OP_EQUAL:                                                  
debug.c, in disassembleInstruction()

So we’re done, right? Well… not quite. We’ve made a mistake! Take a gander at:

a * b = c + d;

According to Lox’s grammar, = has the lowest precedence, so this should be parsed roughly like:

The expected parse, like '(a * b) = (c + d)'.

Obviously, a * b isn’t a valid assignment target, so this should be a syntax error. But here’s what our parser does:

  1. First, parsePrecedence() parses a using the variable() prefix parser.
  2. After that, it enters the infix parsing loop.
  3. It reaches the * and calls binary().
  4. That recursively calls parsePrecedence() to parse the right-hand operand.
  5. That calls variable() again for parsing b.
  6. Inside that call to variable(), it looks for a trailing =. It sees one and thus parses the rest of the line as an assignment.

In other words, the parser sees the above code like:

The actual parse, like 'a * (b = c + d)'.

We’ve messed up the precedence handling because variable() doesn’t take into account the precedence of the surrounding expression that contains the variable. If the variable happens to be the right-hand side of an infix operator, or the operand of a unary operator, then that containing expression is too high precedence to permit the =.

To fix this, variable() should only look for and consume the = if it’s in the context of a low precedence expression. The code that knows the current precedence is, logically enough, parsePrecedence(). The variable() function doesn’t need to know the actual level. It just cares that the precedence is low enough to allow assignment, so we pass that fact in as a Boolean:

    error("Expect expression.");                                  
    return;                                                       
  }                                                               

compiler.c
in parsePrecedence()
replace 1 line
  bool canAssign = precedence <= PREC_ASSIGNMENT;                 
  prefixRule(canAssign);                                          

while (precedence <= getRule(parser.current.type)->precedence) {
compiler.c, in parsePrecedence(), replace 1 line

Since assignment is the lowest precedence expression, the only time we allow an assignment is when parsing an assignment expression or top-level expression like in an expression statement. That flag makes its way to:

compiler.c
function variable()
replace 3 lines
static void variable(bool canAssign) {      
  namedVariable(parser.previous, canAssign);
}                                           
compiler.c, function variable(), replace 3 lines

Which accepts the parameter:

compiler.c
function namedVariable()
replace 1 line
static void namedVariable(Token name, bool canAssign) {
  uint8_t arg = identifierConstant(&name);             
compiler.c, function namedVariable(), replace 1 line

And then uses it:

  uint8_t arg = identifierConstant(&name);

compiler.c
in namedVariable()
replace 1 line
  if (canAssign && match(TOKEN_EQUAL)) {  
    expression();                         
compiler.c, in namedVariable(), replace 1 line

That’s a lot of plumbing to get literally one bit of data to the right place in the compiler, but arrived it has. If the variable is nested inside some expression with higher precedence, canAssign will be false and this will ignore the = even if there is one there. Then this returns and eventually makes its way back to parsePrecedence().

Then what? What does the compiler do with our broken example from before? Right now, variable() won’t consume the = so that will be the current token. It returns back to parsePrecedence(). variable() is a prefix parser, so then it tries to enter the infix parsing loop. There is no parsing function associated with =, so it skips that loop.

Then parsePrecedence() silently returns back to the caller. That also isn’t right. If the = doesn’t get consumed as part of the expression, nothing else is going to consume it. It’s an error and we should report it:

  while (precedence <= getRule(parser.current.type)->precedence) {
    advance();                                                    
    ParseFn infixRule = getRule(parser.previous.type)->infix;     
    infixRule();                                                  
  }                                                               
compiler.c
in parsePrecedence()

if (canAssign && match(TOKEN_EQUAL)) { error("Invalid assignment target."); expression(); }
}                                                                 
compiler.c, in parsePrecedence()

With that, the previous bad program correctly gets an error at compile time. OK, now are we done? Still not quite. See, we’re passing an argument to one of the parse functions. But those functions are stored in a table of function pointers, so all of the parse functions need to have the same type. Even though most parse functions don’t support being used as an assignment targetsetters are the only other oneour friendly C compiler requires them all to accept the parameter.

So we’re going to finish off this chapter with some grunt work. First, let’s go ahead and pass the flag to the infix parse functions:

    ParseFn infixRule = getRule(parser.previous.type)->infix;
compiler.c
in parsePrecedence()
replace 1 line
    infixRule(canAssign);                                    
  }                                                          
compiler.c, in parsePrecedence(), replace 1 line

We’ll need that for setters eventually. Then we’ll fix the typedef for the function type:

} Precedence;                           

compiler.c
add after enum Precedence
replace 1 line
typedef void (*ParseFn)(bool canAssign);

typedef struct {
compiler.c, add after enum Precedence, replace 1 line

And some completely tedious code to accept this parameter in all of our existing parse functions:

compiler.c
function binary()
replace 1 line
static void binary(bool canAssign) {
  // Remember the operator.         
compiler.c, function binary(), replace 1 line

And:

compiler.c
function literal()
replace 1 line
static void literal(bool canAssign) {
  switch (parser.previous.type) {    
compiler.c, function literal(), replace 1 line

And:

compiler.c
function grouping()
replace 1 line
static void grouping(bool canAssign) {
  expression();                       
compiler.c, function grouping(), replace 1 line

And:

compiler.c
function number()
replace 1 line
static void number(bool canAssign) {                 
  double value = strtod(parser.previous.start, NULL);
compiler.c, function number(), replace 1 line

And:

compiler.c
function string()
replace 1 line
static void string(bool canAssign) {                        
  emitConstant(OBJ_VAL(copyString(parser.previous.start + 1,
compiler.c, function string(), replace 1 line

And, finally:

compiler.c
function unary()
replace 1 line
static void unary(bool canAssign) {             
  TokenType operatorType = parser.previous.type;
compiler.c, function unary(), replace 1 line

Phew! We’re back to a C program we can compile. Fire it up and now you can run:

var breakfast = "beignets";
var beverage = "cafe au lait";
breakfast = "beignets with " + beverage;

print breakfast;

It’s starting to look like real code for an actual language!

Challenges

  1. The compiler adds a global variable’s name to the constant table as a string every time an identifier is encountered. It creates a new constant each time, even if that variable name is already in a previous slot in the constant table. That’s wasteful in cases where the same variable is referenced multiple times by the same function. That in turn increases the odds of filling up the constant table and running out of slots, since we only allow 256 constants in a single chunk.

    Optimize this. How does your optimization affect the performance of the compiler compared to the runtime? Is this the right trade-off?

  2. Looking up a global variable by name in a hash table each time it is used is pretty slow, even with a good hash table. Can you come up with a more efficient way to store and access global variables without changing the semantics?

  3. When running in the REPL, a user might write a function that references an unknown global variable. Then, in the next line, they declare the variable. Lox should handle this gracefully by not reporting an “unknown variable” compile error when the function is first defined.

    But when a user runs a Lox script, the compiler has access to the full text of the entire program before any code is run. Consider this program:

    fun useVar() {
      print oops;
    }
    
    var ooops = "too many o's!";
    

    Here, we can tell statically that oops will not be defined because there is no declaration of that global anywhere in the program. Note that useVar() is never called either, so even though the variable isn’t defined, no runtime error will occur because it’s never used either.

    We could report mistakes like this as compile errors, at least when running from a script. Do you think we should? Justify your answer. What do other scripting languages you know do?