Writing A Virtual Machine

For my side project, Pandita, I’ve started writing a virtual machine implementation of the core language, Indu. I’ve been doing this as a hobby in my spare time, roughly since the beginning of the year. I’ve made a bit of progress, and I’m now going to try to write about it as I go.

The first question is why a virtual machine? Indu has been an experiment in growing a language. Whenever I’ve implemented a language or runtime before, I was either solving a very constrained problem, or I had a small core to handle, or I was writing a language for developers. Typically, I could pretty quickly get to some simple definition of syntax and semantics, write a compiler and then write the interpreter.

Indu is supposed to be different. Indu is meant to be a complete language. Indu is meant to be for non-programmers. I wanted to experiment with syntax and semantics. When I started out I had a pretty clear idea of what the syntax should be modeled on. I was unclear on the semantics of that syntax. I also knew that as it expanded into a complete language the syntax and semantics would evolve.

The first implementation of the parser used parsatron, a parser combinator library. This approach made it easy to assemble fragments of parsing code together. But, the code became increasingly complicated, with lots of special cases and conflicts between different parts of the syntax. Once I had a firmer sense of the syntax, I switched to instaparse. I can’t say enough good things about that library. I’m particularly happy that the EBNF that defines the Indu grammar is now also the source code for the parser.

Beyond the parser, I made a lot of the sorts of mistakes that I’ve complained about DSL authors making over the years. The abstract syntax tree was not a pure representation of the parse. The evaluation happened directly off the abstract syntax tree, not off an intermediate representation. The symbol table was not well defined. Scoping rules were not clear.

This was compounded by two implementations of the language. An interpreter in Clojure to execute Indu programs on the server, and a compiler that produced JavaScript to run in the browser, with the help of a small runtime. All semantics had to be implemented twice, but in slightly different ways due to the differing execution models of compilers and interpreters.

But, after all this I had a sense of what the semantics for Indu should be. There are no classes, there are just objects. Attributes of objects are triggers for data flow through your programs. Functions are first class. Objects only compose from other objects.

And with these lessons I could see a much smaller language trying to get out. This smaller language would be the intermediate representation that I needed between the abstract syntax tree, the interpreter and the compiler. Once I started to read about implementing this, I realised pretty quickly that this would be easiest to write as a simple virtual machine. As well as forcing me to define and implement a byte-code, this would also provide the opportunity to define a memory and concurrency model. A defined virtual machine would also be easier to implement twice.

I’ve since found Iain Craig’s Virtual Machines to be very helpful in diving in. I’m hoping to augment that with other texts. Broader interest in technologies like smart contracts and WebAssembly mean there is other writing to draw on.

Next, I’ll try to write about some of the implementation decisions I’ve made so far. Specifically around byte-codes and the memory model. I want to get to a point where I’m able to write here as I’m trying to implement parts of the machine. For this series I’m enabling comments, in the hope that I can find others who think virtual machines are interesting to implement.

overwatering.org

blog

about

Writing A Virtual Machine

gga

#2017-09-20