On Scheme

Thoughts on Scheme and teaching Scheme

Type Foolishness

Posted by Peter on May 4, 2006

Today’s post is meant as a warning to others: don’t make the same foolish implementation choices I did.

My original approach to the type system was the following: to create a type description for each type that was abstract, i.e. could contain self-references and unbound variables, so that type inference could be usefully done with it. And it will, when it is finished, serve this purpose admirably (at least I don’t see any major problems with it at the moment). The problem came when I assumed that this same system could be used to tag the objects generated at run time. Since it can code for any type, no matter how abstract, I figured that it would be simpler to simply label the type of run time objects in the same way. What I didn’t take into consideration was how an object at run-time is generated. For example consider a simple cons cell. In general we can describe the type of the cons function as follows (-> (a b) ((pair-of a b))), meaning that the type of it’s result is dependant on the type of its parameters. However when we generate the result of the cons operation we cannot label as a type containing two free variables, because eventually we might need to provide type checks on the result (for type safety), and a type of unbound gives us no useful information. What is needed then (or so I thought), was to look at the type of the parameters and then construct a new type, at run-time, using this information. This is only sensible as long as you don’t examine it that closely. For example consider adding a new element to a list already containing 100 members. The type of this new element would then contain type information for the 100 existing descending elements embedded in it. Clearly this is an unreasonable amount of overhead.

The correct solution is relatively simple, make the abstract types and the type encoding at run-time different. The run-time information simply needs to provide an indication how to obtain information for the type of related objects at runtime, then when type checks need to be made (rarely of course due to type inference optimizations) we can get all the information we need without duplicating it unnecessarily. Of course there still needs to be some glue between the purely abstract type system and the practical type system, for example when dispatching by type at runtime, but that is simply a detail.

Hopefully I’ll have the type system working soon, so that I can move onto the parser of my little project.

5 Responses to “Type Foolishness”

  1. Greg Buchholz said

    What is needed then (or so I thought), was to look at the type of the parameters and then construct a new type, at run-time, using this information. This is only sensible as long as you don’t examine it that closely. For example consider adding a new element to a list already containing 100 members. The type of this new element would then contain type information for the 100 existing descending elements embedded in it. Clearly this is an unreasonable amount of overhead.

    Hmm. The most overhead you could have is a factor of two time the type information, right? But it still seems like there’s at least a couple of ways to handle the list situation without too much overhead. (For this discussion let’s assume that an object is a pair of a value and a type) First, you could use a union type, so that a list of a string and 99 integers might look like…

    '((("foo" . 'string)
    (1 . 'integer)
    (2 . 'integer)
    ...
    (99 . 'integer)) . '(list-of (or 'string 'integer)))

    …or you could float the type tags up a level, so instead of something like…

    '((10 . 'integer) ("foo" . 'string) (3.14 . 'real))

    …you could have…


    '((10 "foo" 3.14) . ('integer 'string 'real))

    P.S. Your blog could use a “preview” button.

  2. Peter said

    I see by your examples that you are falling into the same kind of reasoning trap that I was, i.e. confusing type inference with run-time type safety. For type inference types such as (list-of (or string integer)) or are what we want, and are powerful enough to eliminate most type checks. However lets say that we are branching based on the type of a variable, say as follows:

    (tcond x
    ((pair-of number any) (+ (car x) 5))
    ((pair-of string any) (+ (string->number (car x)) 5)))

    Using the type: (list-of (or string integer)) we can determine that it is possible for either of these clauses to trigger, and likewise we can use type inference within each clause to eliminate type checks. However at run-time we need access to all the details on order to determine which branch to follow, making types like (list-of (or string integer)) useless, we need types like (pair-of number (pair-of string (pair-of number nil))), but as you can see the type information, if gathered all in one place, is already quite extensive even for a three element list. The reason it would have to be duplicated at each element of course is that we might need to pass the car or the cdr of the list to another function, and we wouldn’t want to re-compute the type information in these operations, because it would be too expensive. Thus the best solution is to let each element contain only the information that it is a pair, and the information how to obtain the type of each of its releated elements if it is needed, thus reducing the duplication of this information to a minimum and making us pay the costs of computing the complete type only when we absolutely need it. Finally you may be thinking that types such as (number string number) could fix these problems, but consider the following function:

    (define (gen-list)
    (let ((val (random 3)))
    (cond
    ((= val 0) (cons 1 (gen-list)))
    ((= val 1) (cons “1” (gen-list)))
    (else ‘()))))

    Clearly here the type of the result can be determined at compile time at best as (list-of (or string integer)). To constuct a type like: (number string number) the cons operation would have to be examining its arguments and creating a new type on each operation, and since this information may only be needed if we branch on type the overhead of this solution is too expensive.

  3. Greg Buchholz said

    I guess I’m mostly puzzled by the “too much overhead” thing. Have you taken a look at lanugages like OCaml, Clean, and Haskell? They seem to do OK on the efficiency front when it comes to code like…

    data Foo = I Integer | S String

    bar (I num) = 5 + num
    bar (S str) = 5 + read str

    (and maybe also their notions of dynamic typing, and stongly typed heterogeneous lists)

  4. Peter said

    Right, these languages use type inference at compile time, but they probably don’t use that information for run-time checks (this is the solution I am proposing). Type inference is efficient at eliminating such checks, so run-time checks don’t have to be that painful, it’s just if you try to use the abstract types that you got from preforming type inference you end up wasting space and/or speed when you are forced to resort to run-time checks, and actually are less efficent. For example in the code:

    data Foo = I Integer | S String

    bar (I num) = 5 + num
    bar (S str) = 5 + read str

    you wouldn’t have any run-time checks, you could eliminate all the type checking at compile time, but this is not true in every case (because in Scheme we like to do weird things with lists, like the gen-list example I gave). Don’t think about how a type system would be implemented in a language that already has types, because if the language has types why build a type system? For example if I was adding type inference to an existing Scheme implementation I would use macros and simply let the normal type system do the checks at run-time. What really illuminates the issues is considering how you would implement types in a language that doesn’t have run-time type checks, for example C.

  5. Greg Buchholz said

    For example in the code:

    data Foo = I Integer | S String

    bar (I num) = 5 + num
    bar (S str) = 5 + read str

    you wouldn’t have any run-time checks, you could eliminate all the type checking at compile time,

    No, with a union type like above you still need run-time checks to dispatch on the “I” vs. “S” constructor. Static typing merely prevents you from using the “Foo” type where a Double was expected, etc.

    -- genlist a-like in Haskell...

    import System.Random

    data Foo = I Int | S String deriving Show

    main = print (map genlist (take 10 (randoms (mkStdGen 10) ::[Int])))

    genlist n = if even n then I 1 else S "1"

Leave a reply to Greg Buchholz Cancel reply