Parametric Datatype-Genericity
Jeremy Gibbons (University of Oxford)
Ross Paterson (City University, London)
Tuesday 17th July, 2007

Abstract

Datatype-generic programs are programs that are parametrized by a datatype or type functor. There are two main styles of datatype-generic programming: the Algebra of Programming approach, characterized by structured recursion operators parametrized by a shape functor, and the Generic Haskell approach, characterized by case analysis over the structure of a datatype. We show that the former enjoys a kind of parametricity, relating the behaviours of generic functions at different types; in contrast, the latter is more ad hoc, with no coherence required or provided between the various clauses of a definition.
1 Introduction
Consider the following familiar datatype of lists, with a fold operator and a length function:

  data List a = Nil | Cons a (List a)

  foldL :: b -> (a->b->b) -> List a -> b
  foldL e f Nil        = e
  foldL e f (Cons a x) = f a (foldL e f x)

  length :: List a -> Int
  length = foldL 0 (\ a n -> 1+n)

Here also is a datatype of binary trees, with its fold operator:

  data Tree a = Empty | Node a (Tree a) (Tree a)

  foldT :: b -> (a->b->b->b) -> Tree a -> b
  foldT e f Empty        = e
  foldT e f (Node a x y) = f a (foldT e f x) (foldT e f y)
One can compute the ‘left spine’ of a binary tree as a list, using foldT:

  lspine :: Tree a -> List a
  lspine = foldT Nil (\ a x y -> Cons a x)

The ‘left depth’ of a binary tree is defined to be the length of the left spine:

  ldepth :: Tree a -> Int
  ldepth = length . lspine

It should come as no surprise that the two steps encoded above can be fused into one, computing the left depth of the tree directly rather than via a list:

  ldepth :: Tree a -> Int
  ldepth = foldT 0 (\ a m n -> 1+m)

This result is a consequence of the well-known fusion law for foldT, which states that

  h . foldT e f = foldT e' f'

given that h e = e' and h (f a u v) = f' a (h u) (h v) for every a, u, v.

Importantly, there is nothing inherently specific to binary trees involved here. The ‘fold’ operator is datatype-generic, which is to say, parametrized by a datatype, such as Tree or List. Moreover, there is a datatype-generic fusion law for folds, of which this law for foldT is a datatype-specific instance.

But there is more to this particular application of fusion than meets the eye. It reveals some deep structure of the datatype-generic fold operator, relating folds on binary trees and on lists. Similar relationships hold between the folds on any two datatypes. Specifically, the main result of this paper is that:

  fold is a higher-order natural transformation.

That is, ‘fold’ is a rather special kind of datatype-generic operator, both enjoying and requiring coherence between its datatype-specific instances. This is in contrast to many other datatype-generic operators such as parsers, pretty-printers and marshallers, for which there need be no such coherence.

The situation is analogous to that between parametric polymorphism and ad hoc polymorphism. For example, the parametrically polymorphic function length :: forall a . List a -> Int (for which we have now written an explicit universal type quantification) is a family of monomorphic functions length_a :: List a -> Int, one for each type a. The parametricity gives rise to a naturality property, relating different monomorphic instances of the polymorphic function length:

  length_b (map f x) = length_a x

for any f :: a -> b. Informally, mapping over a list preserves its length. This naturality property follows from the type of the function length alone; one need not look at the definition of the function.

In contrast, consider the ad hoc polymorphic function count :: Eq a => a -> List a -> Int that counts the occurrences of a given element in a list (in which the notation ‘Eq a =>’ denotes a type class context, essentially a bounded quantification over the type class Eq of types a possessing an equality operation). This too can be considered as a family of monomorphic functions count_a :: a -> List a -> Int, one for each type a in Eq. But the polymorphism is ad hoc, and there is no corresponding coherence between the specific instances: the equation that would be the naturality property,

  count_b (f a) (map f x) = count_a a x

for f :: (Eq a, Eq b) => a -> b, does not hold for the intended definition of count, so certainly does not follow from its type. (In fact, using the standard compilation technique for ad hoc polymorphism by translation into dictionary-passing style [9], count will be implemented with an additional parameter giving the equality operation, effectively having the type

  count' :: (a->a->Bool) -> a -> List a -> Int

This turns the ad hoc polymorphism into parametric polymorphism; the naturality property for this type, when specialized to functions, turns out to be

  g a a' = h (f a) (f a')  ⇒  count'_b h (f a) (map f x) = count'_a g a x

for any f::a->b, g::a->a->Bool, h::b->b->Bool. This property is valid; among other consequences, when g and h are specialized to the equality functions on types a and b respectively, it states that mapping with an injective function preserves element counts.)

This paper is concerned with parametric datatype-generic operators such as fold, which obey a naturality property analogous to that of length above, and contrasts them with ad hoc datatype-generic operators such as parsers, pretty printers and marshallers, which do not.
Cardelli and Wegner [5] consider parametric polymorphism to be ‘true polymorphism, whereas ad hoc polymorphism is some kind of apparent polymorphism whose polymorphic character disappears at close range’. We refrain from making quite so strong a statement in the context of datatype-generic programming; however, we do consider parametric datatype-genericity to be the ‘gold standard’ of datatype-generic programming, to be preferred over ad hoc datatype-genericity when it is available.
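The claims of this section can be checked on small inputs. The following is a minimal sketch in plain Haskell, reusing the definitions of this section; the names mapL, ldepth2, ldepth1, sampleTree and sampleList are introduced here for illustration only, and the definition of count is one plausible reading of ‘the intended definition’.

```haskell
import Prelude hiding (length)

data List a = Nil | Cons a (List a) deriving (Eq, Show)
data Tree a = Empty | Node a (Tree a) (Tree a) deriving (Eq, Show)

foldL :: b -> (a -> b -> b) -> List a -> b
foldL e _ Nil        = e
foldL e f (Cons a x) = f a (foldL e f x)

foldT :: b -> (a -> b -> b -> b) -> Tree a -> b
foldT e _ Empty        = e
foldT e f (Node a x y) = f a (foldT e f x) (foldT e f y)

length :: List a -> Int
length = foldL 0 (\_ n -> 1 + n)

-- map on List, written as a fold (illustrative helper).
mapL :: (a -> b) -> List a -> List b
mapL f = foldL Nil (\a x -> Cons (f a) x)

lspine :: Tree a -> List a
lspine = foldT Nil (\a x _ -> Cons a x)

-- Left depth in two phases (via the spine) and in one phase (fused).
ldepth2, ldepth1 :: Tree a -> Int
ldepth2 = length . lspine
ldepth1 = foldT 0 (\_ m _ -> 1 + m)

-- Ad hoc polymorphic count: relies on the Eq instance of the element type.
count :: Eq a => a -> List a -> Int
count a = foldL 0 (\b n -> if a == b then 1 + n else n)

sampleTree :: Tree Int
sampleTree = Node 1 (Node 2 (Node 3 Empty Empty) Empty) (Node 4 Empty Empty)

sampleList :: List Int
sampleList = Cons 1 (Cons 2 (Cons 3 Nil))
```

On sampleTree both ldepth2 and ldepth1 give 3, and length (mapL show sampleList) equals length sampleList. With the non-injective function const 0, however, count 1 sampleList is 1 while count 0 (mapL (const 0) sampleList) is 3, so the would-be naturality equation for count fails.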
2 Datatype-generic programming
Computer science uses a small number of highly overloaded terms, corresponding to characteristics that recur in many areas of the field. The distinction between static
and dynamic aspects is one example, applying to typing, binding, IP addresses, web pages, loading of libraries, memory allocation, and program analyses, among many other things. The term generic is another example, although in this case particularly within the programming languages community rather than computer science as a whole. ‘Generic programming’ means different things to different people. We have been using it to refer to programs parametrized by a datatype such as ‘lists of’ or ‘trees of’; but to reduce confusion, we have coined a new term datatype-generic programming [7] for this specific usage. Examples of datatype-generic programs are the map and fold higher-order traversal patterns of the origami programming style [6], and data processing tools such as pretty printers, parsers, encoders, marshallers, comparators, and so on that form the main applications of Generic Haskell [12]. All of the operations named above can be defined once and for all, for all datatypes, rather than again and again for each specific datatype. However, the two families of operations differ in the style of definition that can be provided. Roughly speaking, the origami family of datatype-generic operations have definitions that are parametric in the datatype parameter, whereas those of the data processor family are in general ad hoc in the parameter. That is to say, in the former case the datatype parameter is passed around and applied but not analysed, whereas the latter case relies on a case analysis of the parameter. As a consequence, different specific instances of a datatype-generic operation from the origami family are related, whereas there is no such constraint on instances of a datatype-generic data processor — the origami operations are natural in the datatype parameter. That is not to say that the data processing operations are non-natural. 
But it does mean that the definition by case analysis does not provide naturality naturally: ensuring naturality requires very careful consideration of the interaction between the different cases of the definition. By analogy with Cardelli and Wegner’s observations about polymorphism, we claim that naturality is at least a useful healthiness condition on datatype-generic definitions.
2.1 Origami programming
The Origami [6] or Algebra of Programming [3] style is based around the idea that datatypes are fixpoints of shape functors. In this overview, for simplicity, we consider only sets and total functions, and unary polynomial functors; the full generality is recounted in the definitions in Section 3.

Definition 1 (Shape functor) A shape functor has two aspects: an operation on types and an operation on functions. The operation on types takes a type parameter α to some type constructed from the parameter α, the unit type 1, and constant types such as Int, using sum + and product ×. The corresponding operation on functions has to be coherent with the operation on types, in the sense that if f : α → β then F(f) : F(α) → F(β). Moreover, it has to respect identities and composition:

  F(id_α) = id_{F(α)}
  F(f · g) = F(f) · F(g)
□
Example 2 An example of a shape functor is the operation G whose action on types is given by G(α) = 1 + Int × α, and whose action on functions G(f) behaves as the identity on a unit (in the left of the sum), and applies f to the second component of a pair (in the right of the sum). □

A shape functor F determines the shape of a recursive datatype µF; the recursive knot is tied by taking a fixpoint, up to isomorphism. That is, µF is by definition isomorphic to F(µF). The isomorphism is witnessed by inverse operations in_F : F(µF) → µF and out_F : µF → F(µF). Fixpoints are generally not unique; the one that is chosen is in a sense the ‘least’, as captured below, and generally even that is unique only up to isomorphism.

Definition 3 (F-algebra) An F-algebra is a pair (α, f) with f of type F(α) → α. □

Definition 4 (Initial F-algebra) The pair (µF, in_F) is chosen to be an initial F-algebra: one for which, for any other F-algebra (α, f), there is a unique function h satisfying

  h · in_F = f · F(h)

It can be shown [30] that there will be a unique such initial F-algebra (up to isomorphism) for any polynomial shape functor F. Moreover, it is not hard to prove Lambek’s Lemma (Lemma 9 below), that the in_F so defined has an inverse out_F. □

That the equation in Definition 4 has a unique solution in h, given F and f, justifies the introduction of a name fold_F(f) for that solution. The existence and uniqueness of the solution are together expressed by the universal property below.

Definition 5 (Fold) For f : F(α) → α, define fold_F(f) by:

  h = fold_F(f)  ⇔  h · in_F = f · F(h)
□
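Definitions 1 to 5 have a well-known rendering in Haskell, sketched below under the simplifying assumption that Haskell types behave like sets and total functions. The names Fix, In, ListF, fromList, toList and sumAlg are illustrative choices, not notation from this paper; Fix f stands for µF, In for in_F, and the Functor instance for the action of the shape functor on functions.

```haskell
-- Least fixpoint of a shape functor f; the constructor In plays the role of in_F.
newtype Fix f = In (f (Fix f))

-- fold_F(alg): the unique h satisfying h . In = alg . fmap h (Definition 5).
fold :: Functor f => (f a -> a) -> Fix f -> a
fold alg (In x) = alg (fmap (fold alg) x)

-- Shape functor G(a) = 1 + Int × a of Example 2.
data ListF a = NilF | ConsF Int a

instance Functor ListF where
  fmap _ NilF        = NilF
  fmap f (ConsF n a) = ConsF n (f a)

-- Helper (not in the paper): embed a Haskell list into µG.
fromList :: [Int] -> Fix ListF
fromList = foldr (\n x -> In (ConsF n x)) (In NilF)

-- Helper (not in the paper): read a µG value back as a Haskell list.
toList :: Fix ListF -> [Int]
toList = fold alg
  where
    alg NilF         = []
    alg (ConsF n ns) = n : ns

-- A G-algebra on Int that sums the elements.
sumAlg :: ListF Int -> Int
sumAlg NilF        = 0
sumAlg (ConsF n s) = n + s
```

For instance, fold sumAlg (fromList [1,2,3]) evaluates to 6. The defining clause of fold is the equation h · in_F = f · F(h) read from left to right, with pattern matching on In standing in for out_F.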
Two simple consequences of the universal property characterizing fold are an evaluation rule, showing how a data structure is consumed, and a reflection rule, that folding with the constructors is the identity.

Lemma 6 (Fold evaluation)

  fold_F(f) · in_F = f · F(fold_F(f))
□

Proof Just let h = fold_F(f) in Definition 5. □

Lemma 7 (Fold reflection)

  fold_F(in_F) = id_{µF}
□

Proof Just let h = id_{µF} and f = in_F in Definition 5. □
A more interesting consequence of the universal property is a fusion law, for combining a fold with a following function.

Theorem 8 (Fold fusion)

  h · fold_F(f) = fold_F(g)  ⇐  h · f = g · F(h)
□

Proof Again, the universal property is the key ingredient.

    h · fold_F(f) = fold_F(g)
  ⇔   { universal property }
    h · fold_F(f) · in_F = g · F(h · fold_F(f))
  ⇔   { fold evaluation }
    h · f · F(fold_F(f)) = g · F(h · fold_F(f))
  ⇔   { functors }
    h · f · F(fold_F(f)) = g · F(h) · F(fold_F(f))
  ⇐   { Leibniz }
    h · f = g · F(h)

(Note that by omitting the last step in the proof, we get an equivalence: for the fusion on the left-hand side of the theorem statement to be valid, it is not only sufficient but also necessary for the property on the right to hold on the range of F(fold_F(f)).) □

Lemma 9 (Lambek’s Lemma) The universal property in Definition 5 induces an inverse in_F^{-1} = fold_F(F(in_F)) for the constructor in_F. So µF really is a fixpoint of F, up to isomorphism. □

Proof Suppose that in_F has a right inverse as a fold; then

    in_F · fold_F(f) = id_{µF}
  ⇔   { reflection (Lemma 7) }
    in_F · fold_F(f) = fold_F(in_F)
  ⇐   { fusion (Theorem 8) }
    in_F · f = in_F · F(in_F)
  ⇐   { Leibniz }
    f = F(in_F)

So that right inverse has to be fold_F(F(in_F)). Now to check that it is also a left inverse:

    fold_F(F(in_F)) · in_F
  =   { evaluation (Lemma 6) }
    F(in_F) · F(fold_F(F(in_F)))
  =   { functors respect composition }
    F(in_F · fold_F(F(in_F)))
  =   { right inverse }
    F(id_{µF})
  =   { functors respect identity }
    id_{F(µF)}
□
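Lambek’s Lemma can be observed directly in Haskell. The sketch below assumes the usual encoding of µF as a newtype Fix with constructor In (names chosen here, not the paper’s), and defines out as the fold named in the lemma rather than by pattern matching; headAndRest and the list helpers are illustrative only.

```haskell
newtype Fix f = In (f (Fix f))

fold :: Functor f => (f a -> a) -> Fix f -> a
fold alg (In x) = alg (fmap (fold alg) x)

-- Lambek's Lemma: out_F = fold_F(F(in_F)).
out :: Functor f => Fix f -> f (Fix f)
out = fold (fmap In)

-- Shape functor G(a) = 1 + Int × a.
data ListF a = NilF | ConsF Int a

instance Functor ListF where
  fmap _ NilF        = NilF
  fmap f (ConsF n a) = ConsF n (f a)

fromList :: [Int] -> Fix ListF
fromList = foldr (\n x -> In (ConsF n x)) (In NilF)

toList :: Fix ListF -> [Int]
toList = fold alg
  where
    alg NilF         = []
    alg (ConsF n ns) = n : ns

-- Peel off one layer using out, then flatten the remainder for inspection.
headAndRest :: Fix ListF -> Maybe (Int, [Int])
headAndRest t = case out t of
  NilF         -> Nothing
  ConsF n rest -> Just (n, toList rest)
```

Here headAndRest (fromList [4,5,6]) is Just (4,[5,6]), and In (out t) rebuilds t. Defining out this way traverses and rebuilds the whole structure, so in practice one would pattern match on In instead; the fold-based definition is shown only because it is the one the lemma constructs.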
Definition 10 A natural transformation φ : F →̇ G between shape functors F, G is a polymorphic function of type ∀α. F(α) → G(α) such that

  φ_β · F(f) = G(f) · φ_α

for any f : α → β. □
Corollary 11 If φ : F →̇ G, and f : G(α) → α for some particular α, then

  fold_G(f) · fold_F(in_G · φ) = fold_F(f · φ)
□

Proof By fusion (Theorem 8), it suffices to show the following condition:

  fold_G(f) · in_G · φ_{µG} = f · φ_α · F(fold_G(f))

This follows from the evaluation rule for fold and naturality of φ:

    fold_G(f) · in_G · φ_{µG}
  =   { fold evaluation }
    f · G(fold_G(f)) · φ_{µG}
  =   { naturality of φ }
    f · φ_α · F(fold_G(f))
□
Example 12 An integer-specific specialisation of the example from Section 1 is expressed in the origami programming style as follows. Let F(α) = 1 + Int × (α × α), so that µF is binary trees of integers, and G(α) = 1 + Int × α as above, so that µG is lists of integers. Choose φ : F →̇ G = id + id × fst, which discards right children. Then Corollary 11 states that one can compute the length of the left spine of a tree in two phases, via the spine (the left-hand side), or in one phase, directly on the tree itself (the right-hand side). □

Remark 13 For a polymorphic datatype Tree α, one would need to use a bifunctor such as F_α(β) = 1 + α × (β × β); the theory generalises smoothly. □
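Example 12 can be transcribed into Haskell as a sanity check of Corollary 11. This is a sketch only: Fix, In, TreeF, ListF, the smart constructors node and empty, and the sample tree are names introduced here, with TreeF and ListF standing for the shape functors F and G of the example.

```haskell
newtype Fix f = In (f (Fix f))

fold :: Functor f => (f a -> a) -> Fix f -> a
fold alg (In x) = alg (fmap (fold alg) x)

-- F(a) = 1 + Int × (a × a) and G(a) = 1 + Int × a.
data TreeF a = EmptyF | NodeF Int a a
data ListF a = NilF | ConsF Int a

instance Functor TreeF where
  fmap _ EmptyF        = EmptyF
  fmap f (NodeF n l r) = NodeF n (f l) (f r)

instance Functor ListF where
  fmap _ NilF        = NilF
  fmap f (ConsF n a) = ConsF n (f a)

-- phi = id + id × fst: keep the label and the left child, discard the right.
phi :: TreeF a -> ListF a
phi EmptyF        = NilF
phi (NodeF n l _) = ConsF n l

-- The G-algebra computing the length of a list.
lengthAlg :: ListF Int -> Int
lengthAlg NilF        = 0
lengthAlg (ConsF _ n) = 1 + n

-- Left-hand side of Corollary 11: build the spine, then take its length.
ldepthTwoPhase :: Fix TreeF -> Int
ldepthTwoPhase = fold lengthAlg . fold (In . phi)

-- Right-hand side: a single fold over the tree.
ldepthOnePhase :: Fix TreeF -> Int
ldepthOnePhase = fold (lengthAlg . phi)

-- Illustrative smart constructors and a sample tree.
node :: Int -> Fix TreeF -> Fix TreeF -> Fix TreeF
node n l r = In (NodeF n l r)

empty :: Fix TreeF
empty = In EmptyF

sample :: Fix TreeF
sample = node 1 (node 2 (node 3 empty empty) empty)
                (node 4 empty (node 5 empty empty))
```

Both ldepthTwoPhase sample and ldepthOnePhase sample evaluate to 3, the number of nodes on the path 1, 2, 3 down the left spine.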
2.2 Generic Haskell
A different approach to datatype-generic programming involves case analyses on the structure of datatypes: ‘a generic program is defined by induction on structure-representation types’ [14]. We take as representative of this approach the Generic Haskell extension [10, 13, 21] of Haskell [27], based on the notion of type-indexed functions with kind-indexed types [11], in which the family of type indexes is the polynomial types (sums and products of the unit type and some basic types such as Int); however, our remarks apply just as well to a number of related techniques, such as the ‘Scrap Your Boilerplate’ series [18, 19, 20].

Consider the datatype-generic function enc, encoding a data structure as a list of bits. In Generic Haskell, this is defined roughly as follows. (We have made some simplifications, such as omitting cases for labels and constructors, for brevity. We have also adapted the Generic Haskell syntax slightly, for consistency with the rest of this paper; in particular, we use the list constructors from Section 1, so that we can use ‘:’ to denote a type or kind judgement.)

  enc{|α : ∗|}            : (enc{|α|}) ⇒ α → List Bool
  enc{|Unit|} ()          = Nil
  enc{|Int|} n            = encInt n
  enc{|α :+: β|} (Inl x)  = Cons (False, enc{|α|} x)
  enc{|α :+: β|} (Inr y)  = Cons (True, enc{|β|} y)
  enc{|α :×: β|} (x, y)   = enc{|α|} x ++ enc{|β|} y
The first line gives a type declaration, as usual; it declares that enc specialized to a type α of kind ∗ has type α → List Bool. (This does not mean that enc can be applied only to types of kind ∗. Rather, the type of enc at a type index of another kind is derived automatically from this. For example, enc{|List|} has type (α → List Bool) → (List α → List Bool); hence the slogan that ‘type-indexed functions have kind-indexed types’ [11].) Moreover, the context ‘(enc{|α|}) ⇒’ indicates that enc depends on itself, that is, it is defined inductively. There are cases for each possible top-level structure of the type index; the base case for integers assumes a primitive function encInt : Int → List Bool. The Generic Haskell compiler uses these five cases to derive a specialization of enc for any polynomial datatype, such as for the types of integer lists and binary trees introduced in Section 1.

The important point is that the behaviour of enc is dispersed across five separate cases, and any desired coherence between different instances has to be very carefully engineered. Recall, for example, the spine relationship between binary trees and lists from Section 1. It turns out that the specializations of enc for binary trees and for their left spines are indeed related; but proving this fact requires detailed consideration of the interaction between the behaviours in different branches, and depends non-trivially on prefix-freeness of the encoding (a happy consequence of this particular definition).

To restate our point: ad hoc datatype-generic programs may in fact be parametric, and perhaps parametricity is a useful healthiness condition for datatype genericity; but with ad hoc techniques, this parametricity requires careful design, rather than arising automatically.
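The case-analysis style can be imitated in ordinary Haskell, without the Generic Haskell compiler, by passing an explicit code for the type index. The sketch below is such an imitation and not Generic Haskell itself: the GADT Rep, its constructors, and the use of Either, pairs and built-in lists are assumptions made here, and encInt is given a hypothetical fixed-width definition (8 bits, least significant first, meaningful only for 0 to 255) in place of the primitive the text assumes.

```haskell
{-# LANGUAGE GADTs #-}

-- Codes for polynomial types; a stand-in for Generic Haskell's type indices.
data Rep a where
  RUnit :: Rep ()
  RInt  :: Rep Int
  RSum  :: Rep a -> Rep b -> Rep (Either a b)
  RProd :: Rep a -> Rep b -> Rep (a, b)

-- Hypothetical primitive: 8 bits, least significant bit first (0..255 only).
encInt :: Int -> [Bool]
encInt n = [odd (n `div` 2 ^ i) | i <- [0 .. 7 :: Int]]

-- One clause per top-level structure of the type index, as in the text.
enc :: Rep a -> a -> [Bool]
enc RUnit         ()        = []
enc RInt          n         = encInt n
enc (RSum ra _)   (Left x)  = False : enc ra x
enc (RSum _ rb)   (Right y) = True : enc rb y
enc (RProd ra rb) (x, y)    = enc ra x ++ enc rb y
```

For example, at the index RSum RUnit (RProd RInt RUnit), one unfolding of the list shape, the value Right (5, ()) encodes as a True tag followed by the eight bits of 5. Nothing in the type of enc relates the clause for sums to the clause for products; any relationship between instances has to be established by inspecting all five clauses together.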
3 Technical definitions and notation
We recall the (entirely standard) definitions of categories, functors, natural transformations, opposite categories, contravariance, and functor categories. We also give the (obvious, but apparently not standardized) definitions of higher-order functors and higher-order natural transformations, which are simply the standard definitions specialized to the functor category. For a fuller account, see for example [22].

Definition 14 (Category) A category C consists of:
• a collection of objects |C|;
• for each pair of objects X, Y, a collection of arrows C(X, Y) from X to Y;
• for each object X ∈ |C|, an identity arrow id_X ∈ C(X, X);
• a composition operator (·_C) taking compatible arrows in C(Y, Z) and C(X, Y) to an arrow in C(X, Z), which is associative with the appropriate identity arrows as units. □

Remark 15 We often write ‘f : X → Y’ for ‘f ∈ C(X, Y)’, and omit the subscript C on the composition operator (·_C), when the category C is understood. □

Definition 16 (Functor) A functor F : C ⇝ D between categories C and D is an operation taking the objects and arrows of C to those of D, preserving the structure of C in D:
• F(X) ∈ |D| for each object X ∈ |C|;
• F(f) ∈ D(F(X), F(Y)) for each arrow f ∈ C(X, Y);
• F(id_X) = id_{F(X)};
• F(f ·_C g) = F(f) ·_D F(g).
An endofunctor on C is a functor from C to C. The identity functor Id acts as the identity on objects and arrows, and functor composition (◦) is functional composition of the actions on objects and on arrows. □

Remark 17 For clarity, we distinguish the symbol used in the ‘types’ of an arrow f : X → Y and of a functor F : C ⇝ D. We also distinguish the arrows in natural transformations φ : F →̇ G and higher-order natural transformations op : H →̈ K below. □
Definition 18 (Natural transformation) A natural transformation φ : F →̇ G between two functors F, G : C ⇝ D is a family of arrows of D indexed by objects of C, effectively transforming the one functor into the other:
• components φ_X ∈ D(F(X), G(X)) for each object X ∈ |C|, such that...
• φ_Y ·_D F(f) = G(f) ·_D φ_X for each f ∈ C(X, Y):

  F(X) ---φ_X---> G(X)
    |               |
   F(f)            G(f)
    v               v
  F(Y) ---φ_Y---> G(Y)

The family id_F of monomorphic identity arrows (id_F)_X = id_{F(X)} indexed by objects X is a natural transformation F →̇ F, and composition φ · ψ of natural transformations φ : G →̇ H and ψ : F →̇ G defined componentwise by (φ · ψ)_X = φ_X · ψ_X yields a natural transformation F →̇ H. □

Remark 19 In this paper, the natural transformations we will consider are mostly natural transformations between endofunctors. However, when we get to higher-order natural transformations (Definition 28) between higher-order functors (Definition 27), the latter will in general not be endofunctors. □

Definition 20 (Opposite category) The opposite or dual C^op of the category C reverses all the arrows and compositions:
• |C^op| = |C|;
• C^op(X, Y) = C(Y, X);
• f ·_{C^op} g = g ·_C f. □
Definition 21 (Contravariant functor) A contravariant functor F : C ⇝ D between categories C and D is defined to be an ordinary (covariant) functor C^op ⇝ D. □

Remark 22 Unpacking Definition 21 for a contravariant functor F : C ⇝ D yields:
• F(X) ∈ |D| for each object X ∈ |C^op|;
• F(f) ∈ D(F(Y), F(X)) for each arrow f ∈ C^op(Y, X);
• F(id_X) = id_{F(X)};
• F(g ·_{C^op} f) = F(g) ·_D F(f).
or equivalently:
• F(X) ∈ |D| for each object X ∈ |C|;
• F(f) ∈ D(F(Y), F(X)) for each arrow f ∈ C(X, Y) (note the reversal of X and Y);
• F(id_X) = id_{F(X)};
• F(f ·_C g) = F(g) ·_D F(f) (note again the reversal).
So an alternative view of a contravariant functor F : C ⇝ D is that it reverses arrows and compositions. □

Remark 23 Similarly, a contravariant functor F : C ⇝ D satisfies:
• F(X) ∈ |D^op| for each object X ∈ |C|;
• F(f) ∈ D^op(F(X), F(Y)) for each arrow f ∈ C(X, Y);
• F(id_X) = id_{F(X)};
• F(f ·_C g) = F(f) ·_{D^op} F(g).
That is, F is equivalently a covariant functor C ⇝ D^op. □
Remark 24 (Naturality, contravariantly) By definition, a natural transformation φ : F →̇ G for functors F, G : C ⇝ D makes the diagram in Definition 18 commute; that is, φ satisfies the characteristic equation φ_Y · F(f) = G(f) · φ_X for each f ∈ C(X, Y). Also by definition, contravariant functors F, G are covariant functors C^op ⇝ D. Then the characteristic equation of the natural transformation holds for each f ∈ C^op(X, Y) = C(Y, X). Conventionally, one swaps the roles of the objects X, Y, yielding the commuting diagram

  F(Y) ---φ_Y---> G(Y)
    |               |
   F(f)            G(f)
    v               v
  F(X) ---φ_X---> G(X)

for each f ∈ C(X, Y) when F, G are contravariant. □
Definition 25 (Functor category) The functor category D^C is just the category of covariant functors C ⇝ D and their natural transformations:
• objects |D^C| are the covariant functors F : C ⇝ D;
• arrows D^C(F, G) are the natural transformations φ : F →̇ G;
• identity arrows are the identity natural transformations;
• composition of arrows is componentwise composition of natural transformations. □

Remark 26 In this paper, the only functor categories we will use are endofunctor categories C^C for some category C. □

Definition 27 (Hofunctor) A higher-order functor (or hofunctor for short) H on category C to another category D is just the normal notion of functor, specialized to act on the endofunctor category C^C:
• H(F) : D for each functor F : C ⇝ C;
• H(φ) : D(H(F), H(G)) for each natural transformation φ : F →̇ G;
• identities are preserved, with H(id_F) = id_{H(F)};
• composition is preserved, with H(φ · ψ) = H(φ) · H(ψ). □
Definition 28 (Hont) A higher-order natural transformation (or hont for short) op : H →̈ K for hofunctors H, K on category C to category D is just the normal notion of natural transformation, specialized to act on the endofunctor category C^C:
• op_F : D(H(F), K(F)) for each functor F : C ⇝ C, such that...
• op_G · H(φ) = K(φ) · op_F for each φ : F →̇ G:

  H(F) ---op_F---> K(F)
    |                |
   H(φ)             K(φ)
    v                v
  H(G) ---op_G---> K(G)
□
For the remainder of the paper, we assume that C is a particular cartesian closed category, such as the category of sets and total functions, and we revert to using lowercase Greek letters for its objects. In particular, for any two objects α, β ∈ |C|, there is a suitable exponential object (α ⇒ β) ∈ |C| representing the space of ‘functions’ between the ‘types’ α and β.
4 Fold is a higher-order natural transformation
In this section, we justify the observation in the introduction about the coherence between different instances of the datatype-generic fold operator; that is, fold is a hont.
Definition 29 For φ : F →̇ G, define the operation µ(φ) : µF → µG by

  µ(φ) = fold_F(in_G · φ_{µG})
□
Remark 30 Definition 29 makes µ a functor from C^C to C. The reflection rule implies that µ(id_F) = id_{µF} : µF → µF, and preservation of compositions

  µ(φ · ψ) = µ(φ) · µ(ψ) : µF → µH

for ψ : F →̇ G and φ : G →̇ H follows from Corollary 11. □
Definition 31 Fix an object α ∈ |C|. Define the two contravariant hofunctors H, K from C^C to C^op as follows:
• H(F) = (F(α) ⇒ α);
• H(φ) = (· φ_α) : (G(α) ⇒ α) → (F(α) ⇒ α) for φ : F →̇ G;
• K(F) = (µF ⇒ α);
• K(φ) = (· µ(φ)) : (µG ⇒ α) → (µF ⇒ α) for φ : F →̇ G. □
Remark 32 Because H, K are contravariant, natural transformations between them make the diagram from Remark 24 commute. Specialized appropriately, as Remark 24 specializes Definition 18, the characteristic equation of a hont op between contravariant hofunctors H, K reads:

  op_F · H(φ) = K(φ) · op_G

for each φ : F →̇ G, or diagrammatically:

  H(G) ---op_G---> K(G)
    |                |
   H(φ)             K(φ)
    v                v
  H(F) ---op_F---> K(F)
□
Specializing op to fold yields exactly the law of folds from Corollary 11.

Theorem 33 With H and K as in Definition 31,

  fold : H →̈ K
□

Proof As noted above, expanding the definitions leads to the proof obligation

  fold_F(f · φ_α) = fold_G(f) · µ(φ)

for φ : F →̇ G and f : G(α) → α, which was the content of Corollary 11. □
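The hofunctor actions of Definition 31 and the square of Theorem 33 can be written down in Haskell with rank-2 types. This is a sketch under the usual Fix encoding of µF; the names Nat, mu, hH, kK, lhs, rhs and rightChild are introduced here for illustration. A natural transformation is modelled as a polymorphic function, whose naturality Haskell does not check but which holds for functions definable without case analysis on the type argument.

```haskell
{-# LANGUAGE RankNTypes #-}

newtype Fix f = In (f (Fix f))

fold :: Functor f => (f a -> a) -> Fix f -> a
fold alg (In x) = alg (fmap (fold alg) x)

-- A natural transformation between shape functors, as a polymorphic function.
type Nat f g = forall x. f x -> g x

-- mu(phi) = fold_F(in_G . phi)  (Definition 29).
mu :: Functor f => Nat f g -> Fix f -> Fix g
mu phi = fold (In . phi)

-- H(phi) = (. phi_a) acts on algebras; K(phi) = (. mu(phi)) on consumers.
hH :: Nat f g -> (g a -> a) -> (f a -> a)
hH phi alg = alg . phi

kK :: Functor f => Nat f g -> (Fix g -> a) -> (Fix f -> a)
kK phi h = h . mu phi

-- The two paths around the square: fold_F . H(phi) and K(phi) . fold_G.
lhs, rhs :: (Functor f, Functor g) => Nat f g -> (g a -> a) -> Fix f -> a
lhs phi alg = fold (hH phi alg)
rhs phi alg = kK phi (fold alg)

data TreeF a = EmptyF | NodeF Int a a
data ListF a = NilF | ConsF Int a

instance Functor TreeF where
  fmap _ EmptyF        = EmptyF
  fmap f (NodeF n l r) = NodeF n (f l) (f r)

instance Functor ListF where
  fmap _ NilF        = NilF
  fmap f (ConsF n a) = ConsF n (f a)

-- An illustrative natural transformation: keep the right child instead.
rightChild :: TreeF a -> ListF a
rightChild EmptyF        = NilF
rightChild (NodeF n _ r) = ConsF n r

sumAlg :: ListF Int -> Int
sumAlg NilF        = 0
sumAlg (ConsF n s) = n + s

sample :: Fix TreeF
sample = In (NodeF 1 (In (NodeF 2 (In EmptyF) (In EmptyF)))
                     (In (NodeF 4 (In EmptyF)
                                  (In (NodeF 5 (In EmptyF) (In EmptyF))))))
```

On this sample, lhs rightChild sumAlg and rhs rightChild sumAlg both give 10, the sum 1 + 4 + 5 along the right spine. A finite test of course illustrates the equation rather than proving it; the proof is Corollary 11.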
Remark 34 Fold fusion (Theorem 8) corresponds to the statement that fold is dinatural in α, but that is not our concern here. □
5 Other honts
We have shown that the datatype-generic fold operator enjoys a rather special property, being parametric in its shape argument. However, there is nothing inherently specific about fold in this regard; many other datatype-generic operators enjoy similar properties. For a start, the constructor in of initial algebras is datatype-generic, being parametrized by a functor; it too is natural in that functor.

Theorem 35 The constructor in of initial algebras is a hont. That is, in : H →̈ K, where covariant hofunctors H, K from C^C to C are given by:
• H(F) = F(µF);
• H(φ) = (φ_{µG} · F(µ(φ))) : F(µF) → G(µG) for φ : F →̇ G;
• K(F) = µF;
• K(φ) = µ(φ) : µF → µG for φ : F →̇ G.
where µ(φ) is as defined in Definition 29 above. That is,

  in_G · φ_{µG} · F(µ(φ)) = µ(φ) · in_F
□
Proof The proof obligation in_G · φ_{µG} · F(µ(φ)) = µ(φ) · in_F is trivially discharged, using the characterization fold_F(in_G · φ_{µG}) of µ(φ), and the evaluation rule for fold. □

The construction of folds as unique homomorphisms from initial algebras dualizes elegantly.

Definition 36 (Final F-coalgebra, unfold) An F-coalgebra is a pair (α, f) with f : α → F(α). The pair (νF, out_F) is chosen to be a final F-coalgebra: one for which, for any other F-coalgebra (α, f), there is a unique function h satisfying

  out_F · h = F(h) · f

(It can again be shown [30] that such a final F-coalgebra exists for any polynomial shape functor F.) We introduce the name unfold_F(f) for this unique solution. Existence and uniqueness are together expressed by the universal property:

  h = unfold_F(f)  ⇔  out_F · h = F(h) · f

As consequences, we get evaluation and reflection rules:

  out_F · unfold_F(f) = F(unfold_F(f)) · f
  unfold_F(out_F) = id_{νF}

and a fusion law for combining an unfold with a preceding function:

  unfold_F(f) · h = unfold_F(g)  ⇐  f · h = F(h) · g
□
Example 37 The bifunctor S acting on types as S_α(β) = α × β and on functions as S_α(f) = id × f induces a datatype Strm(α) = νS_α of streams. A step function f : β → S_α(β) induces a stream producer unfold_{S_α}(f) : β → Strm(α). So

  repeat = unfold_S(dup) where dup x = (x, x)

yields infinitely many copies of a given element, and

  zipAdd = unfold_{S_Int}(f) where f (x : xs, y : ys) = (x + y, (xs, ys))

adds two streams of Ints element-wise to make a third. □
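Thanks to lazy evaluation, the stream operations of Example 37 run as written in Haskell. The sketch below assumes a dedicated stream type with a single constructor SCons; repeatS is renamed to avoid the Prelude’s repeat, and takeS and nats are helpers added here only to observe finite prefixes.

```haskell
-- Streams as the final coalgebra of S_a(b) = a × b: no empty case.
data Strm a = SCons a (Strm a)

-- unfold for streams: the step function yields an element and a new seed.
unfoldS :: (b -> (a, b)) -> b -> Strm a
unfoldS f b = let (a, b') = f b in SCons a (unfoldS f b')

-- Infinitely many copies of a given element.
repeatS :: a -> Strm a
repeatS = unfoldS dup
  where dup x = (x, x)

-- Element-wise sum of two streams; the seed is the pair of streams.
zipAdd :: (Strm Int, Strm Int) -> Strm Int
zipAdd = unfoldS f
  where f (SCons x xs, SCons y ys) = (x + y, (xs, ys))

-- Helper (not in the paper): the first n elements of a stream.
takeS :: Int -> Strm a -> [a]
takeS n (SCons a s)
  | n <= 0    = []
  | otherwise = a : takeS (n - 1) s

-- Helper (not in the paper): the naturals 0, 1, 2, ...
nats :: Strm Int
nats = unfoldS (\n -> (n, n + 1)) 0
```

For example, takeS 3 (repeatS 7) is [7,7,7], and takeS 4 (zipAdd (nats, repeatS 10)) is [10,11,12,13].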
The datatype-generic unfold operator we have just introduced is natural in its functor parameter.

Theorem 38 The unfold operator is a hont unfold : H →̈ K where covariant hofunctors H, K from C^C to C are given by:
• H(F) = (α ⇒ F(α));
• H(φ) = (φ_α ·) : (α ⇒ F(α)) → (α ⇒ G(α)) for φ : F →̇ G;
• K(F) = (α ⇒ νF);
• K(φ) = (ν(φ) ·) : (α ⇒ νF) → (α ⇒ νG) for φ : F →̇ G.
where ν(φ) = unfold_G(φ_{νF} · out_F) : νF → νG for φ : F →̇ G. That is,

  unfold_G(φ_α · f) = ν(φ) · unfold_F(f)

for f : α → F(α). □

Proof By definition of ν(φ) and fusion, it suffices to show

  φ_{νF} · out_F · unfold_F(f) = G(unfold_F(f)) · φ_α · f

This follows from the naturality of φ and the evaluation rule for unfold:

    φ_{νF} · out_F · unfold_F(f)
  =   { unfold evaluation }
    φ_{νF} · F(unfold_F(f)) · f
  =   { naturality of φ }
    G(unfold_F(f)) · φ_α · f
□
Remark 39 Note that the higher-order naturality of unfold is simpler than that of fold, because it does not involve contravariance. Perhaps unfold should be better appreciated [8], even considered the ‘ordinary’ case, and fold its dual? □
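Theorem 38 can be exercised on a small Haskell model, using the Pascal’s Triangle step function that Example 40 describes. The sketch makes several assumptions of its own: streams are modelled as infinite Haskell lists, Integer replaces Int to avoid overflow, and the names Tri, unfoldT, unfoldS, nuCorner, viaTriangle and direct are introduced here.

```haskell
import Data.List (unfoldr)

-- Streams modelled as infinite lists (an assumption of this sketch).
type Strm a = [a]

-- Final coalgebra of T(a) = Int × Strm(Int) × Strm(Int) × a:
-- a corner, two edges, and an inner triangle.
data Tri = Tri Integer (Strm Integer) (Strm Integer) Tri

unfoldT :: (b -> (Integer, Strm Integer, Strm Integer, b)) -> b -> Tri
unfoldT f b = let (x, ys, zs, b') = f b in Tri x ys zs (unfoldT f b')

unfoldS :: (b -> (a, b)) -> b -> Strm a
unfoldS f = unfoldr (Just . f)

-- The step function of Example 40; the seed is one edge of the triangle.
step :: Strm Integer -> (Integer, Strm Integer, Strm Integer, Strm Integer)
step (x : y : xs) = (x, y : xs, y : xs, zs)
  where zs = (2 * y) : zipWith (+) xs zs
step _ = error "step: seed must be infinite"

-- nu(corner) = unfold_S(corner . out_T): keep corners, drop the edges.
nuCorner :: Tri -> Strm Integer
nuCorner = unfoldS (\(Tri x _ _ u) -> (x, u))

-- corner . step, computed without building the edges.
cornerstep :: Strm Integer -> (Integer, Strm Integer)
cornerstep (x : y : xs) = (x, zs)
  where zs = (2 * y) : zipWith (+) xs zs
cornerstep _ = error "cornerstep: seed must be infinite"

-- The two sides of Theorem 38 for phi = corner.
viaTriangle, direct :: Strm Integer
viaTriangle = nuCorner (unfoldT step (repeat 1))
direct      = unfoldS cornerstep (repeat 1)
```

Both take 6 viaTriangle and take 6 direct yield [1,2,6,20,70,252], the central binomial coefficients; the direct version never constructs the triangle.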
Example 40 As a simple application of Theorem 38, consider Pascal’s Triangle:

  1  1  1  1  1
  1  2  3  4
  1  3  6
  1  4
  1
  ...

The triangular shape can be expressed as the final coalgebra of the bifunctor T defined by T(α) = Int × Strm(Int) × Strm(Int) × α, giving a corner, two infinite edges and an inner structure. Pascal’s Triangle itself is unfold_T(step) (repeat 1), where

  step (x : y : xs) = let zs = (2 × y) : zipAdd (xs, zs)
                      in (x, y : xs, y : xs, zs)

Pascal’s Triangle has many nice properties. One of them is that the nth element of the central column 1, 2, 6, 20, ... is the number of: non-decreasing sequences of n integers drawn from 0..n; direct routes on a grid making n steps East and n steps North in total; directed, convex polyominoes having semiperimeter n + 2; and so on [29, Sequence A000984]. Extraction of this middle column is achieved by ν(corner), where the natural transformation corner : T →̇ S_Int is defined by corner (x, ys, zs, u) = (x, u). By Theorem 38,

  ν(corner) · unfold_T(step) = unfold_{S_Int}(cornerstep)

where

  cornerstep (x : y : xs) = let zs = (2 × y) : zipAdd (xs, zs)
                            in (x, zs)

This yields a direct method of computing the sequence, without having to generate Pascal’s Triangle first. □

The destructor out of final co-algebras is natural in its shape functor.

Theorem 41 The destructor out of final co-algebras is a hont. That is, out : H →̈ K where covariant hofunctors H, K from C^C to C are given by:
• H(F) = νF;
• H(φ) = ν(φ) : νF → νG for φ : F →̇ G;
• K(F) = F(νF);
• K(φ) = (G(ν(φ)) · φ_{νF}) : F(νF) → G(νG) for φ : F →̇ G;
where ν(φ) is as defined in Theorem 38 above. That is,

  G(ν(φ)) · φ_{νF} · out_F = out_G · ν(φ)
□
Proof As with the proof of Theorem 35, the required property easily follows from the definition of ν(φ) and the evaluation rule of unfold. □

There are generalizations of fold and unfold, from so-called (co-)iteration to primitive (co-)recursion; these too are honts.

Theorem 42 Meertens [23] captures the familiar pattern of primitive recursion by defining the paramorphism:

  para_F : (F(α × µF) ⇒ α) → (µF ⇒ α)

This too is a hont: para : H →̈ K where contravariant hofunctor H from C^C to C^op is given by:
• H(F) = (F(α × µF) ⇒ α);
• H(φ) = (· G(id × µ(φ)) · φ_{α×µF}) : (G(α × µG) ⇒ α) → (F(α × µF) ⇒ α) for φ : F →̇ G
and µ(φ) and K are as defined in Definitions 29 and 31 respectively. □

Theorem 43 Uustalu and Vene [31] dualize Meertens’ paramorphisms by defining the apomorphism:

  apo_F : (α ⇒ F(α + νF)) → (α ⇒ νF)

This is another hont: apo : H →̈ K where covariant hofunctor H from C^C to C is given by:
• H(F) = (α ⇒ F(α + νF));
• H(φ) = (φ_{α+νG} · F(id + ν(φ)) ·) : (α ⇒ F(α + νF)) → (α ⇒ G(α + νG)) for φ : F →̇ G
and K and ν(φ) are as defined in Theorem 38 above. □

6 Conclusions
We have shown that the datatype-generic fold operator enjoys a kind of higher-order parametricity result, relating different instances connected by a natural transformation. Similar results apply to many other datatype-generic operators in the origami programming style. In contrast, in approaches to datatype-generic programming that rely on case analyses on the shape of data, such properties have to be much more carefully engineered.

The large collection of honts in Section 5 begs a question: are all datatype-generic operators honts? We believe not. It seems reasonable that the Generic Haskell function enc of Section 2.2 enjoys a kind of parametricity property, by which (say) the left spine encoding of a tree can be obtained either by projecting onto the left spine and encoding that, or directly on the tree; that is, the list and tree instances of enc are naturally related. But that seems rather more unlikely for (say) a context-sensitive pretty-printing function, inserting linebreaks in renditions of subterms in order to fit within the remaining space left by outer terms.
6.1
Inspiration
Our inspiration for the higher-order natural transformations described in this paper is Paul Hoogendijk’s very elegant work with Roland Backhouse [15, 16] on generic ‘zip’ functions zip_{F,G} : F ◦ G → G ◦ F that commute or transpose two functors F and G. The general case requires a relational setting, because such a transposition might be partial (for instance, yielding no result on mismatched data structures, such as when zipping a pair of lists of differing length) and non-deterministic (for instance, yielding results of arbitrary shape, such as when zipping with a constant functor F). However, the essence of the idea can be seen from considering the special case when G corresponds to a fixed-shape datatype such as Pair. In this case, the partially-parametrized remainder is conventionally called a ‘generic unzip’ unzip_F : F ◦ Pair → Pair ◦ F, and can easily be given an explicit definition:
unzip_F = F(fst) △ F(snd)
where fst and snd are the projections from Pair, and △ generates a Pair using two element-generating functions. It is not hard to show that unzip is a hont : (◦Pair) → (Pair◦). The higher-order naturality property arising from this observation was used in [4] to transform an O(n log n)-time algorithm for computing bit-reversal permutations to O(n)-time.
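The explicit definition of the generic unzip transcribes almost directly into Haskell. In this sketch the homogeneous Pair type, the projections fstP and sndP, and the function fork (playing the role of △) are our own illustrative names; the Functor class stands in for the functor F.

```haskell
import Data.Maybe (maybeToList)

-- A fixed-shape datatype holding exactly two elements of the same type.
data Pair a = Pair a a deriving (Eq, Show)

instance Functor Pair where
  fmap h (Pair x y) = Pair (h x) (h y)

fstP, sndP :: Pair a -> a
fstP (Pair x _) = x
sndP (Pair _ y) = y

-- The △ combinator: build a Pair from two element-generating functions.
fork :: (a -> b) -> (a -> b) -> a -> Pair b
fork f g x = Pair (f x) (g x)

-- unzip_F = F(fst) △ F(snd), for any functor f.
unzipF :: Functor f => f (Pair a) -> Pair (f a)
unzipF = fork (fmap fstP) (fmap sndP)

-- Higher-order naturality at one instance: the natural transformation
-- maybeToList :: Maybe a -> [a] may be applied before or after unzipping.
naturalAt :: Maybe (Pair Int) -> Bool
naturalAt x = fmap maybeToList (unzipF x) == unzipF (maybeToList x)
```

For example, unzipF [Pair 1 2, Pair 3 4] evaluates to Pair [1,3] [2,4], and naturalAt holds for both Nothing and Just arguments.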
6.2
Higher-order theorems for free
Theorem 42 can be proven using the higher-order naturality of fold and in, in terms of which para is defined. Similarly, we expect that any operator built from fold and in (or unfold and out) could be shown to be a hont between appropriate functors, using a kind of second-order parametricity theorem [28, 32, 25], but we have not yet worked out the details.
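One common way of building para from fold and in is the tupling construction para f = fst · fold (f △ (in · F snd)); we assume, without having checked every detail against the earlier sections, that this matches the definition the text has in mind. The sketch below compares it with a directly recursive version; paraRec, paraViaFold, NatF and factAlg are hypothetical names introduced for the illustration.

```haskell
-- Least fixed point of a shape functor f (written µF in the text).
newtype Fix f = In { out :: f (Fix f) }

fold :: Functor f => (f a -> a) -> Fix f -> a
fold alg = alg . fmap (fold alg) . out

-- Paramorphism by direct recursion, for comparison.
paraRec :: Functor f => (f (a, Fix f) -> a) -> Fix f -> a
paraRec alg = alg . fmap (\t -> (paraRec alg t, t)) . out

-- Paramorphism built only from fold and In: the fold computes the result
-- and a rebuilt copy of the input side by side, then the copy is discarded.
paraViaFold :: Functor f => (f (a, Fix f) -> a) -> Fix f -> a
paraViaFold alg = fst . fold step
  where
    step x = (alg x, In (fmap snd x))

-- Shape functor of the natural numbers.
data NatF x = ZeroF | SuccF x

instance Functor NatF where
  fmap _ ZeroF     = ZeroF
  fmap h (SuccF x) = SuccF (h x)

fromInt :: Integer -> Fix NatF
fromInt n
  | n <= 0    = In ZeroF
  | otherwise = In (SuccF (fromInt (n - 1)))

toInt :: Fix NatF -> Integer
toInt = fold alg
  where
    alg ZeroF     = 0
    alg (SuccF n) = n + 1

-- Factorial as primitive recursion: the step needs the predecessor itself.
factAlg :: NatF (Integer, Fix NatF) -> Integer
factAlg ZeroF          = 1
factAlg (SuccF (r, n)) = (toInt n + 1) * r
```

Both paraRec factAlg (fromInt 5) and paraViaFold factAlg (fromInt 5) evaluate to 120. Because paraViaFold is assembled purely from fold, In and functor actions, it is the kind of operator to which the free-theorem argument of this subsection would apply.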
6.3
Related work
The equation in Corollary 11 capturing the higher-order naturality of fold has appeared before; for example, it is Theorem 6.12 of the second author’s PhD thesis [26], and Equation 29 in the influential ‘bananas paper’ [24]. However, to the best of our knowledge, the fact that this property is a naturality property has not been noted previously. Although Hoogendijk [15] coins the term ‘parametric polytypism’ (for what we call ‘parametric datatype-genericity’), and contrasts it with ‘ad hoc polytypism’, the only hont he considers is the generic zip, described above. In particular, although Hoogendijk certainly makes use of fold, there is no evidence from his writing that he realised that it too was a hont. The closest observation we are aware of in the literature is an offhand remark (‘catamorphisms on different types can be related, but the precise details are not clear to us’) by Jeuring and Jansson [17]. The extension to other datatype-generic operators such as unfold appears to be novel, albeit perhaps not very surprising.
6.4
Acknowledgements
We would like to thank the members of the Algebra of Programming group at Oxford, for helpful discussions on this work; in particular, Bruno Oliveira showed us how to carry out the fusion in Example 40. Philip Wadler, Patty Johann and Neil Ghani all provided advice that improved our presentation. Neither this paper nor the work it reports has been submitted for publication anywhere else. However, presentations on the material have been made at the Workshop on Generic Programming (Portland, Oregon, September 2006) and at IFIP WG2.1 Meeting #62 (Namur, Belgium, December 2006).
References
[1] Roland Backhouse and Jeremy Gibbons, editors. Summer School on Generic Programming, volume 2793 of Lecture Notes in Computer Science. Springer-Verlag, 2003.
[2] Roland Backhouse, Jeremy Gibbons, Ralf Hinze, and Johan Jeuring, editors. Spring School on Datatype-Generic Programming, volume 4719 of Lecture Notes in Computer Science. Springer-Verlag, 2007.
[3] Richard Bird and Oege de Moor. The Algebra of Programming. Prentice-Hall, 1996.
[4] Richard Bird, Jeremy Gibbons, and Geraint Jones. Program optimisation, naturally. In J. W. Davies, A. W. Roscoe, and J. C. P. Woodcock, editors, Millennial Perspectives in Computer Science. Palgrave, 2000.
[5] Luca Cardelli and Peter Wegner. On understanding types, data abstraction and polymorphism. ACM Computing Surveys, 17(4):471–522, December 1985.
[6] Jeremy Gibbons. Origami programming. In Jeremy Gibbons and Oege de Moor, editors, The Fun of Programming, Cornerstones in Computing, pages 41–60. Palgrave, 2003.
[7] Jeremy Gibbons. Datatype-generic programming. In Backhouse et al. [2].
[8] Jeremy Gibbons and Geraint Jones. The under-appreciated unfold. In Proceedings of the Third ACM SIGPLAN International Conference on Functional Programming, pages 273–279, Baltimore, Maryland, September 1998.
[9] Cordelia Hall, Kevin Hammond, Simon Peyton Jones, and Philip Wadler. Type classes in Haskell. ACM Transactions on Programming Languages and Systems, 18(2):109–138, 1996.
[10] Ralf Hinze. A generic programming extension for Haskell. In Erik Meijer, editor, Third Haskell Workshop, 1999.
[11] Ralf Hinze. Polytypic values possess polykinded types. In Roland Backhouse and José Nuno Oliveira, editors, Mathematics of Program Construction, volume 1837 of Lecture Notes in Computer Science, pages 2–27. Springer-Verlag, 2000.
[12] Ralf Hinze and Johan Jeuring. Generic Haskell: Applications. In Backhouse and Gibbons [1], pages 57–97.
[13] Ralf Hinze and Johan Jeuring. Generic Haskell: Practice and theory. In Backhouse and Gibbons [1], pages 1–56.
[14] Ralf Hinze, Johan Jeuring, and Andres Löh. Comparing approaches to generic programming in Haskell. In Backhouse et al. [2].
[15] Paul Hoogendijk. A Generic Theory of Datatypes. PhD thesis, Technische Universiteit Eindhoven, 1997.
[16] Paul Hoogendijk and Roland Backhouse. When do datatypes commute? In Eugenio Moggi and Giuseppe Rosolini, editors, Category Theory and Computer Science, volume 1290 of Lecture Notes in Computer Science, pages 242–260. Springer-Verlag, September 1997.
[17] Johan Jeuring and Patrick Jansson. Polytypic programming. In John Launchbury, Erik Meijer, and Tim Sheard, editors, Advanced Functional Programming, volume 1129 of Lecture Notes in Computer Science. Springer-Verlag, 1996.
[18] Ralf Lämmel and Simon Peyton Jones. Scrap your boilerplate: A practical design pattern for generic programming. In Types in Language Design and Implementation, 2003.
[19] Ralf Lämmel and Simon Peyton Jones. Scrap more Boilerplate: Reflection, zips, and generalised casts. In International Conference on Functional Programming, pages 244–255. ACM Press, 2004.
[20] Ralf Lämmel and Simon Peyton Jones. Scrap your Boilerplate with class: Extensible generic functions. In International Conference on Functional Programming, pages 204–215. ACM Press, September 2005.
[21] Andres Löh. Exploring Generic Haskell. PhD thesis, Utrecht University, 2004.
[22] Saunders Mac Lane. Categories for the Working Mathematician. Springer-Verlag, 1971.
[23] Lambert Meertens. Paramorphisms. Formal Aspects of Computing, 4(5):413–424, 1992.
[24] Erik Meijer, Maarten Fokkinga, and Ross Paterson. Functional programming with bananas, lenses, envelopes and barbed wire. In John Hughes, editor, Functional Programming Languages and Computer Architecture, volume 523 of Lecture Notes in Computer Science, pages 124–144. Springer-Verlag, 1991.
[25] John C. Mitchell and Albert R. Meyer. Second-order logical relations. In Rohit Parikh, editor, Logics of Programs, volume 193 of Lecture Notes in Computer Science, pages 225–236, 1985.
[26] Ross A. Paterson. Reasoning about Functional Programs. PhD thesis, University of Queensland, 1987.
[27] Simon Peyton Jones. The Haskell 98 Language and Libraries: The Revised Report. Cambridge University Press, 2003.
[28] John C. Reynolds. Towards a theory of type structure. In B. Robinet, editor, Colloque sur la Programmation, volume 19 of Lecture Notes in Computer Science, pages 408–425. Springer-Verlag, 1974.
[29] N. J. A. Sloane. On-line encyclopedia of integer sequences. http://www.research.att.com/~njas/sequences/. Accessed May 2007.
[30] M. B. Smyth and G. D. Plotkin. The category-theoretic solution of recursive domain equations. SIAM Journal on Computing, 11(4):761–783, November 1982.
[31] Varmo Vene and Tarmo Uustalu. Functional programming with apomorphisms (corecursion). Proceedings of the Estonian Academy of Sciences: Physics, Mathematics, 47(3):147–161, 1998. 9th Nordic Workshop on Programming Theory.
[32] Philip Wadler. Theorems for free! In Functional Programming Languages and Computer Architecture, pages 347–359. ACM, 1989.