Writing a MiniC-to-MSIL compiler in F# - Part 0 - Introduction

13 April 2014

You can find the code for this series in the Mini-C GitHub repository.

Introduction

Last year, I learnt F# by using it to write a compiler. (I also read Programming F# 3.0 by Chris Smith, a book I recommend.) Writing a compiler is, after all, the hello world program for F#. After finishing the code, I decided to write this series of blog posts about it - mainly for my own benefit, because it’s difficult to truly understand something until you’ve explained it to somebody else. (Actually I finished the code a year ago. What can I say? I procrastinated and did some other things in the meantime.)

This isn’t going to be a tutorial on F# - although if you don’t know F#, you should still be able to follow along. F#’s syntax is initially confusing (for a C# programmer like me), then elegant, and then you don’t want to go back to C#… but that’s another story. This isn’t even a tutorial on compilers. I’m going to explain how this particular compiler works - with the hope that it’s of interest to others.

It’s worth mentioning that there are a number of other “write a compiler in F#” articles out there. Here’s a few:

Lisp Compiler in F#
A Java to x86 compiler written in F#
Let’s Build A Compiler… in F#!
Writing a Compiler with F#
and, of course, the F# compiler itself is open source.

The language

I wanted to keep the compiler as simple as possible. To that end, I googled around and landed on a simple grammar based on a subset of the C programming language called Mini-C. This subset includes most of the basics of C, but is missing (among other things) pointers, structs and macros. Since this compiler is never actually going to be used by anybody, that doesn’t really matter - the subset is complicated enough to make for an interesting learning exercise.

Here’s a sample Mini-C program. As a subset of C, it unsurprisingly looks exactly like C:

int fib(int n) {
  if (n == 0) 
    return 0;
  if (n == 1)
    return 1;
  return fib(n - 1) + fib(n - 2);
}

int main(void) {
  return fib(10);
}

I should say up-front that I possess no expert knowledge about compilers. I vaguely remember attending an undergraduate compilers course, but not much of it stuck. I have read a few books on compilers since then, because I love languages and language implementations. But I’m definitely an enthusiast, rather than an expert. If you spot any mistakes, please let me know!

The compiler

Compilers have the nice property that they are easy to split into logical parts. That’s how my compiler is implemented, and it’s also how I’m going to structure this series of blog posts. I’ll update the links as I write each part:

Part 0 - Introduction - this post
Part 1 - Defining the abstract syntax tree (AST)
Part 2 - Lexing and parsing
Part 3 - Semantic analysis
Part 4 - Building the intermediate representation (IR)
Part 5 - Code generation
Part 6 - Conclusion

What we’ll end up with is a compiler that accepts [a subset of] C source code, parses into an AST, performs some semantic analysis on the AST, builds an intermediate representation, and finally writes MSIL out to a .NET executable.

As a little preview, here is a screenshot of a tool I wrote to accompany the compiler. When you edit the source code in the left panel, the AST and IL panels update in real-time.

Mini-C GUI

Stay tuned for Part 1!