Phantom types and subtyping

Matthew Fluet, Riccardo Pucella
[doi] [Google Scholar] [DBLP] [Citeseer]
Read: 26 June 2021

Journal of Functional Programming 16(6)
Cambridge University Press
Pages 751-791
Note(s): phantom types, foreign function interface
Papers: reppy:att:1996, finne:icfp:1999, blume:babel:2001

When creating a safe interface to code in another language, you want to encode the type system of the other language. This paper describes a general encoding of subtyping hierarchies in the Hindley-Milner type system using phantom types and generalizes reppy:att:1996, finne:icfp:1999 and blume:babel:2001.

The essence of the encoding is as follows (see section 5.4 for a worked example)

  1. To define a dummy datatype corresponding to each sort within the type hierarchy. For example given “A > B, A > C, C > D, C > E”, you create1

    datatype A a
    datatype B a
    datatype C a
    datatype D a
    datatype E a

    The type parameter ‘a’ is used to encode subtyping

  2. Represent each concrete subtype by the path from the root to that subtype and ending in ‘()’.

    type A_conc = A ()
    type B_conc = A (B ())
    type C_conc = A (C ())
    type D_conc = A (C (D ()))
    type E_conc = A (C (E ()))
  3. Represent each abstract subtype by the path from the root to that subtype and ending in a type variable ‘a’

    type A_abs a = A a
    type B_abs a = A (B a)
    type C_abs a = A (C a)
    type D_abs a = A (C (D a))
    type E_abs a = A (C (E a))

    Note that a concrete subtype ‘c’ unifies with an abstract subtype ‘a’ only if ‘c’ is a subtype of ‘a’.

  4. Use these types as ‘phantom’ type parameters when defining the mapping to the other language. For example, if the representation in C is a 32-bit value, that has the above 5 subsorts, we might define

    newtype T a = T Word32
    mkA :: IO (T A_conc)
    mkB :: IO (T B_conc)
    mkC :: IO (T C_conc)
    mkD :: IO (T D_conc)
    mkE :: IO (T E_conc)
    addB :: T (C_abs a) -> T (C_abs a) -> T C_conc

    Which (if I got this example right) lets you add values that are any subtype of C and return something with the type C and prevents you from accidentally adding values of subtype A or B.

Type hierarchies that are not trees (i.e., that have multiple inheritance) can be encoded as an embedding into powerset lattices created using the product of two phantom types. And that leads to a more flexible but less readable alternative type encoding with types that look like “(T (T a), (b, c))”

Extending this approach to a form of bounded polymorphism is restricted by ML’s use of “prenex parametric polymorphism”. That is, all type quantifiers go on the outside of the type.

Phantom types

  1. Note that the paper uses ML syntax but I have switched it to Haskell syntax.