A Type System for Scheme
Posted by Peter on March 18, 2006
Note: the type system proposed in this post is bad for a number of reasons, but I didn’t realize this until I had done some more research. Seriously, just go read parts 2, 3, and 4.
I know that Common Lisp already has a type system, but honestly it is rather inelegant. On the other hand it is at least somewhat superior to Scheme’s type system because, well, Scheme doesn’t have a formal type system. The advantages to having a type system are twofold. One advantage is that the compiler can perform some additional optimizations / reduce the amount of runtime checking. More importantly however is that some errors can be caught at compile time instead of run-time.
The first thing we consider is how to add type information to a function’s argument list. I would suggest a syntax like the following:
(lambda (arg1:type arg2:type &rest|. argR:type) …). Obviously the compiler should infer the return type of the function by inspecting it. Scheme already has a way of overloading functions for different lengths of argument lists, so it shouldn’t be too hard to extend this to overloading based on the type of the arguments as well. So far this sounds very well and good, functions being declared as
(lambda (a:number b:bool) …) aren’t that much harder to write than normal. The real difficulty lies in creating a system that can give functions a type as well. Before we cover functions however let us add two additions to the type system. One is the idea of generic types, which can be used as follows:
type-name<type+>. For example the following:
vector<number> is a vector whose elements are numbers. This would also be a valid type:
mytype<vector<bool> number>. The other addition we need to make is the
any type. As the name implies this implies that the variable under consideration could be of any type, and thus the usual run-time checks would need to be in place. The addition of the
any type seems questionable in my mind, later I will discuss if we are able to leave it out of the type system.
Using our syntax so far we can define the type of a function as
func<tuple<param-type param-type …> tuple<return-type return-type …>>. This declaration left open room for multiple return types since some functions may wish to return multiple values (see my earlier post). If you wish to restrict functions to only return one value than the type of a function could be written as:
func<tuple<param-type param-type …> return-type>. Now before you begin criticizing the system I am proposing, I know that this looks like an overly baroque construction. However with one quick addition to the system you would almost never have to type out that long type description for a function. That addition is the
define-type statement. With it you could write
(define-type numerical-op func<tuple<number number> tuple<number>>). Then if you wanted to create a function that took a function of this type as a parameter you would write:
(lambda (arg1:numerical-op arg2:number) …), which makes using this type system manageable again. The ability to define types in this way has another advantage, for example consider an addition to define that took a new, optional first parameter, the type of the variable (normally define would infer the type based on what was being assigned to the variable). Now when we are defining a function we may not even need to type in its parameter type, for example if a function is defined as
(define numerical-op addtwice (lambda (a b) (+ a b b))) the compiler should be able to infer the type of the parameters from the type supplied to the define.
Unfortunately this system still needs a bit of work. For example consider how you would define map:
(define map (lambda (operator:func<tuple<??> tuple<??>> lst:list<??>) …)). The problem is that we can’t put something definite in for the values of ?? without restricting map in a way that will make it unusable. What we need to fix this problem is the
free type. This type is used much like a generic, but instead of being given other types any identifier will do, so for example
free<blah> are valid. Free is not like the
any type. What it signifies is that arguments must share certain type properties, and that the actual type of
free will be inferable at some point. For example map would be defined as:
(define map (lambda (operator:func<tuple<free<list-type>> tuple<free<return-type>>> lst:list<free<list-type>>) …)). You can see that at compile time applications of map which don’t let the
free types resolve properly (for example the function being passed must take as an argument the same type of thing that is in the list) will be rejected. Likewise when the call to map is compiled the return type will also be able to be inferred (a list of type
free<return-type>. Finally let us assume that the following type had been defined:
(define-type example func<tuple<number free<A> free<B>> tuple<free<B>>>) then you could define a function as follows:
(define example efunc (lambda (param1:bool param2 param3) …). Here you can see how it would be possible for the compiler to eliminate the need for some type declarations be inferring the type of a
free from either the returned values or from another type given in the parameters list.
Finally let us return to the
any type. The only reason I mentioned it is because it seems like this is what
car, for example must return. We can’t simply force
car to return an element of the same type as the list’s generic, because sometimes we want to nest lists within other lists, even if they ultimately all contain the same type of value. If we are willing to sacrifice performance, and some code clarity, we can do away with
any by making the value
car returns really be a wrapper object, which we can test regarding which type it contains, and then return either the list it contains or the value with two different routines. I leave it as an open question which solution is better.
Obviously even I think this type system might be a bit baroque or inconvenient sometimes. One way to resolve this is simply to allow the
any type to be assumed wherever the type could not be otherwise , but then again allowing this means that the type system would be under-used, and thus wouldn’t provide the extra compile-time checks that were the whole point of introducing it. As usual feedback is appreciated.