overwatering.org

Assignment is interesting. It’s one of the few places where the syntax tree does map well to the execution, but you can’t get to where you want to go through re-writing syntactic sugar.

Indu will be a language with syntactic sugar. Sugar is easy to read. It might be verbose, but tooling will help with that. Indu is also a simple, uniform language: everything is an expression, there are only functions and objects. In a simple language syntactic sugar can be dealt with by pretending during the parse that a recognised construct should be parsed as something else. It’s sort of like a half-way step from a context-free to a context-sensitive grammar.
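To make the idea of parse-time rewriting concrete, here is a minimal sketch in Python. It is not Indu's parser: the node classes and the choice of binary operators as the sugared form are assumptions made purely for illustration. It shows a sugared construct being replaced by a core form (a method call on an object) before anything downstream sees it.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Name:
    ident: str


@dataclass(frozen=True)
class Num:
    value: int


@dataclass(frozen=True)
class BinOp:
    """Sugared form (hypothetical): left <op> right."""
    op: str
    left: object
    right: object


@dataclass(frozen=True)
class Call:
    """Core form (hypothetical): receiver.method(args...)."""
    receiver: object
    method: str
    args: tuple


def desugar(node):
    """Recursively rewrite sugared nodes into core nodes."""
    if isinstance(node, BinOp):
        # Pretend the parser saw a method call instead of an operator.
        return Call(desugar(node.left), node.op, (desugar(node.right),))
    if isinstance(node, Call):
        return Call(
            desugar(node.receiver),
            node.method,
            tuple(desugar(arg) for arg in node.args),
        )
    return node  # Names and numbers are already core forms.


# x + 1 becomes a call to the "+" method on x.
core = desugar(BinOp("+", Name("x"), Num(1)))
```

Every node in the rewritten tree still stands for a value, which is why this kind of rewrite is purely local.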

Assignment can’t be handled like this.

In the expression set x to x + 1, the sub-expression x is recognised twice: once as part of the expression x + 1, and once as the target of the assignment. In the first case the x should be interpreted as a placeholder for the value stored under the name, and the semantics of the language should automatically perform the look-up, substituting the retrieved value into the expression.

In the second case, the x should be interpreted as a placeholder for a slot in which a value can be stored. The only way to distinguish these two cases is through the context in which they appear. More specifically, this second case only appears as the target of an assignment expression.

In the C programming language there is a special name for those things that can appear as the target of an assignment: l-values, so called because they are the things allowed to appear on the left-hand side of an assignment. Not the most helpful definition when you encounter a "not an l-value" error from your compiler.

In the case of the Indu VM, these l-value-like items are implemented as an ‘address’ construct. When generating byte-code for an assignment, the target is treated specially, and can only be one of a small set of forms. Each of these forms then results in an addr op-code of some sort that constructs a reference to the slot being named, and pushes that reference onto the stack.
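The scheme described above can be sketched as a toy compiler and stack VM in Python. This is an illustration under assumptions, not the Indu implementation: only the addr and assign op-code names come from the text, while push, load, add, the Address class, the operand order on the stack, and the rule that assign leaves the stored value on the stack are all invented for the sketch.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Name:
    ident: str


@dataclass(frozen=True)
class Num:
    value: int


@dataclass(frozen=True)
class Add:
    left: object
    right: object


@dataclass(frozen=True)
class Assign:
    target: object
    value: object


@dataclass
class Address:
    """Hypothetical 'address' construct: an environment plus a slot name."""
    env: dict
    name: str


def compile_value(node, code):
    """Value context: a name means 'fetch what is stored under it'."""
    if isinstance(node, Num):
        code.append(("push", node.value))
    elif isinstance(node, Name):
        code.append(("load", node.ident))
    elif isinstance(node, Add):
        compile_value(node.left, code)
        compile_value(node.right, code)
        code.append(("add",))
    elif isinstance(node, Assign):
        compile_target(node.target, code)  # the target is treated specially
        compile_value(node.value, code)
        code.append(("assign",))
    else:
        raise TypeError(f"cannot compile {node!r}")
    return code


def compile_target(node, code):
    """Target context: only a small set of forms is accepted (here, names)."""
    if isinstance(node, Name):
        code.append(("addr", node.ident))
    else:
        raise SyntaxError(f"not assignable: {node!r}")


def run(code, env):
    """Execute the op-codes against a dict used as the environment."""
    stack = []
    for op, *args in code:
        if op == "push":
            stack.append(args[0])
        elif op == "load":
            stack.append(env[args[0]])
        elif op == "addr":
            stack.append(Address(env, args[0]))
        elif op == "add":
            right = stack.pop()
            left = stack.pop()
            stack.append(left + right)
        elif op == "assign":
            value = stack.pop()
            slot = stack.pop()
            slot.env[slot.name] = value
            stack.append(value)  # assumption: assignment yields the stored value
        else:
            raise ValueError(f"unknown op-code {op!r}")
    return stack.pop()


# set x to x + 1
program = Assign(Name("x"), Add(Name("x"), Num(1)))
code = compile_value(program, [])
env = {"x": 41}
result = run(code, env)
```

The same Name node produces a load in value context and an addr in target context, so the compiler, not the VM, carries the knowledge of which context it is in.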

I’ve debated whether that extra step is necessary. If the compilation of the assignment expression needs to examine the target construct to generate the right addr op-code, then couldn’t that logic be moved into the assign op-code? That would be cleaner: the construct generated by the addr op-code isn’t used anywhere else. But it would also result in more complex op-codes, and I would rather make the compiler complex than the VM. Also, the addr may become useful in the future.
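For comparison, the alternative weighed here might look like the following Python sketch, in which the target travels as an operand of the assign op-code rather than as an address on the stack. Everything in it is hypothetical: the op-code names, the dict-based environment, and especially the field-assignment form, which is included only to show how each additional target form would add another op-code to the VM.

```python
def run_fused(code, env):
    """Toy stack VM with one assign op-code per target form (hypothetical)."""
    stack = []
    for op, *args in code:
        if op == "push":
            stack.append(args[0])
        elif op == "load":
            stack.append(env[args[0]])
        elif op == "add":
            right = stack.pop()
            left = stack.pop()
            stack.append(left + right)
        elif op == "assign_name":
            # Target form 1: a plain name, carried as an operand.
            value = stack.pop()
            env[args[0]] = value
            stack.append(value)
        elif op == "assign_field":
            # Target form 2: a field on an object (modelled as a dict).
            value = stack.pop()
            obj = stack.pop()
            obj[args[0]] = value
            stack.append(value)
        else:
            raise ValueError(f"unknown op-code {op!r}")
    return stack.pop()


# set x to x + 1, with no separate addr step
env = {"x": 1}
result = run_fused(
    [("load", "x"), ("push", 1), ("add",), ("assign_name", "x")],
    env,
)
```

The byte-code is one instruction shorter, but the VM now has to know about every kind of target, which is the complexity trade-off described above.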

I suspect that another way to formalise the difference between the contexts is to use a type system: the distinction feels like the sort of thing that types could capture. Indu tags values with types, but not slots; it’s a pretty dynamic language.
