Skip to main content

Compiler

This page contains a high level breakdown of the different steps needed to compile Mew code.

1. AST Parsing

The parsing step iterates through all source files, and builds a syntax tree for each of them.
The syntax tree represents the code as it was written, maintaining the trivia such as white space, comments etc.

Each node in the AST has a reference to both it's parent and children.

Apart from being the basis for HIR generation, the AST is also used to interact with the source code programatically, i.e. from the LSP server.

2. HIR generation

HIR, short for High-level Intermediate Representation, represents a bound tree, where all types are known.

The HIR references resolved symbols for the different parts of Mew (namespaces, types, functions, parameters, variables etc). For example, two code block that calls a function, will have the same symbol reference to that function.

  1. Build symbol table
    1. Namespaces
    2. Types
    3. Free functions
    4. Type members
  2. Binding
    1. Types
    2. Free functions
    3. Top level statements
info

HIR might contain errors, represented as error symbols.

3. MIR generation

MIR, short for Medium-level Intermediate Representation, is a lowered HIR, without constructs such as while/loop/if.

  • All higher level constructs such as loops and conditions been lowered into labels and branches.
  • Control flow analysis and some optimizations are done here as well.
info

MIR might contain errors, represented as error symbols.

4. LIR generation

LIR, short for Low-level Intermediate Representation, is a lowered MIR, resembling the final byte code that will be emitted.

warning

LIR MUST NOT contain any errors.

5. Emitting

Finally, the LIR is transpiled into Mew bytecode which can be interpreted by the Mew interpreter.

note

There are plans in the future to create a LLVM IR emitter, but until the language settles a bit, the undertaking is a bit too much.