A continuation of Stephen Diehl's Write You a Haskell
Like any large project, a compiler needs to move around lots of data. Some of this data is in the form of abstract syntax, which we constructed last chapter. But there is other data too. There is state, which needs to be kept inside of each pass as the compiler works, and there are warnings and errors that need to be handed around somehow.
The modules in this chapter can be done in almost any order, and some of them are optional. Here’s what we will do in this chapter:
Uniques, a way to “uniquely identify” different entities during compilation. The way we will do this is not the most efficient, but it’s fairly easy to work with. Some form of
Unique is required, though it doesn’t have to be the same as what is covered here.
UniqueMap, a mapping from uniques to some other data type. This gives us very fast lookups in maps of entities. If your
Unique type is just
Int (or a
Int), then use
Data.IntMap as your backing structure instead of what is covered here. Abstracting this is useful in case a later refactor changes the representation of
Bag, a set that can contain duplicates. These are also called
MultiSets, call yours whatever you like. This is a basic type that isn’t really worth writing a section about - you can use GHC’s implementation (be sure to include the copyright notice), make a dependency on
Data.MultiSet from the
multiset package, or just use lists.
CDoc, a wrapper on
Doc from the
prettyprinter package. This will let us easily construct well-formatted messages, as well as change those messages based on compilation flags. This module is optional, especially if you don’t plan on producing good error messages. Pretty printers for our data structures will use
CDoc, but you can just use
Doc instead. The
C is for
Context. (GHC calls it
SDoc, but no one in the IRC channel seems to know why)
Messages. This module contains all of the backing implementation of error and warning messages. If you don’t plan on producing good error messages or any warnings, you can pretty much skip this module.
Panic. This module contains functions for crashing the compiler in various ways, with some nice error messages. We define
panic is for invariant violations, and
sorry for unimplemented features. This module is technically optional, but it’s very small.
FastString, a wrapper on
Data.Text.Lazy that caches each
FastString as it is created, and returns the existing one instead of creating duplicates. Most importantly, this lets us attach
FastStrings themselves, since each one contains a unique string. This module is technically optional, but I think it would be more work to skip it.
IOEnv, a monad which forms the basis of the other monads we will use in the frontend. This module is optional - you could use
IO instead, or
State and some way of handling failure. Since we’re in IO anyway, I’ll demonstrate using IO to get statefulness (like
StateT) and the ability to fail (like
If you just want to build a smaller compiler that assumes its input is correct code, you should skip
Messages, and ignore them in future chapters. Replace throwing of errors with
panic and don’t produce warnings.
CDoc is also helpful for suppressing certain types of debugging output. If you skip it, most debugging will probably happen via dumps and ad-hoc traces.