Posted by Peter on May 4, 2006
Today’s post is meant as a warning to others: don’t make the same foolish implementation choices I did.
My original approach to the type system was the following: to create a type description for each type that was abstract, i.e. could contain self-references and unbound variables, so that type inference could be usefully done with it. And it will, when it is finished, serve this purpose admirably (at least I don’t see any major problems with it at the moment). The problem came when I assumed that this same system could be used to tag the objects generated at run time. Since it can code for any type, no matter how abstract, I figured that it would be simpler to simply label the type of run time objects in the same way. What I didn’t take into consideration was how an object at run-time is generated. For example consider a simple cons cell. In general we can describe the type of the cons function as follows
(-> (a b) ((pair-of a b))), meaning that the type of it’s result is dependant on the type of its parameters. However when we generate the result of the cons operation we cannot label as a type containing two free variables, because eventually we might need to provide type checks on the result (for type safety), and a type of unbound gives us no useful information. What is needed then (or so I thought), was to look at the type of the parameters and then construct a new type, at run-time, using this information. This is only sensible as long as you don’t examine it that closely. For example consider adding a new element to a list already containing 100 members. The type of this new element would then contain type information for the 100 existing descending elements embedded in it. Clearly this is an unreasonable amount of overhead.
The correct solution is relatively simple, make the abstract types and the type encoding at run-time different. The run-time information simply needs to provide an indication how to obtain information for the type of related objects at runtime, then when type checks need to be made (rarely of course due to type inference optimizations) we can get all the information we need without duplicating it unnecessarily. Of course there still needs to be some glue between the purely abstract type system and the practical type system, for example when dispatching by type at runtime, but that is simply a detail.
Hopefully I’ll have the type system working soon, so that I can move onto the parser of my little project.