A Type System For Scheme, Part 4
Posted by Peter on March 28, 2006
After researching type systems, and smacking myself for my foolish first attempt, I have come up with a second proposal for a type system that could be added to Scheme, in this case mostly transparently. This system is constructed that type inference will catch compile time type errors as well as result in performance improvements. The constructs described herein are to make the life of the programmer easier, and to allow for the creation of custom types / object systems.
The simple built in types are as follows:
Complex types also exist. Complex types take one or more types as arguments, which may be either simple types or complex types which also have their arguments specified, for example
pair-of could be used in the following way:
(pair-of number string) or as
(pair-of (list-of number) string). The types
tand are special in that they both take two or more arguments and that they have special implication for type inference. For example if in the body of a function a variable is used both as a number and as a symbol then it must be of type
(tand number symbol). However because the compiler can rule out certain conjunctions as possible this will be rejected as an error. Likewise if a function returns a number on one branch and a symbol on another its return type will be
(tor number symbol). This however is not a valid result to pass to a function that expects a value of type symbol as a parameter (although
(tand symbol number) would be).
The following complex types are built in:
More on type inference:
As you may have already gathered the program performs type inference wherever possible. To make type inference work perfectly a few enhancements are needed. One is that a value should be able to be assigned the special value
#<undefined> when it is first declared, either in a define or a let statement. This tells the compiler that its type should be computed from the first time it is
set! and not from the declaration. Also there needs to be a type conditional in the form:
(tcond variable (type1 statements) (type2 statements) … [(else statements)]). If the else clause is omitted then the compiler will infer that the variable is of type
(tor type1 type2 …), if there is an else clause it will infer that it is of type
any. Within each clause the variable is treated as if it is the tested type. For example
(tcond x (number (+ x 2)) (boolean (not x))) will cause the compiler to infer that x is
(tor number boolean), that the return type of this statement is
(tor number boolean). Despite these inferences the first clause will compile as if x is of type number and the second as if x is of type boolean. Just as in a standard cond only one clause will be invoked. This tcond also allows us to do away with built in type predicates. For example number? could be written as
(define (number? x) (tcond x (number #t) (else #f))). Finally, we can “force” the result of an evaluation with the special form
(force type expression). The return type of a force statement is the conjunction (
tand) of the given type and the return type of the expression.
Type of functions:
The type of a function is constructed with the
-> type, but typedscm does a much better job of explaining it than I could, so just go look at their site. Although I might be tempted to make a few improvements to that part of their system it is simply more efficient to let the experts describe the complex job of typing a function.
Types can be declared as follows:
(define-type type-name known-type)
A type declaration in this way is often used as an abbreviation for a complex type such as
(define-type vec-list (list-of (vector-of number)))
A second way to define types is:
(define-type (type-name related-types …) known-type)
These type declarations can also be self referential. For example if you wanted to define a generic binary tree type you would do so as follows
(define-type (binary-tree-of X) (pair-of (tor X (binary-tree-of X)) (tor X (binary-tree-of X)))).
Finally if you must you can “extend” an old type with the following:
(define-extended-type (type-name related-types …) extended-types …)
If tested, for example with a
tcond, or when the compiler is determining if the arguments to a function call are of a valid type, the extended type can be considered as one of the types it is extended from, however the types it is extended from will not be considered the derived type for the purposes of these computations.
Although I haven’t thought of a clean syntax for it yet obviously imported functions from compiled code (for example a C library) will need to have their type declared at the time they are imported for the purpose of type inference.
One final issue to address is how to create generic functions, and thus by extension generic classes. Obviously generics must be implemented as a macro (since types are not first class objects). One way to define a generic might be
(define-generic name (unbound-types …) body). For example a function might be written as
(define-generic my-function (A B) (lambda (x) (tcond x (A …) (B …))))). When you wanted to call such a function you would write
((my-function number boolean) params …), and if you wanted to get the function itself (for example to pass it as a value) you would write
(my-function number boolean). The system might work as follows underneath: The generic name itself is defined as a macro. When that macro is invoked it looks up the type parameters in a hash table, if that combination is already known it returns a reference to the needed function. If it is not known it compiles a new top-level function, substituting the types given as parameters for the unbound types, and then saves that result in its look-up hash.
9/10: This proposal does a lot of things, but there are some areas it still doesn’t address. For example I couldn’t think of a clean way of doing overloading by parameter type. Likewise I was unable to devise a way to make types first class objects and still be able to conduct type inference properly