Last modified: Thu Oct 19, 2001

Data Field Extensions to Haskell

Warning: this documentation was originally written for the Haskell 1.3 version of the language and has not yet been updated to the Haskell 98 version. Minor details are thus obsolete, most notably the eval class is gone in Haskell 98 which has changed the typing of some Data Field Haskell functions.

Table of Contents

1 The Datafield Module
2 The Pord Module
3 The Show class
4 The Eval Class
5 forall-abstraction
  5.1 Other Modifications to the Prelude
  5.2 Semantics of forall-abstraction
    5.2.1 Semantics for Pattern-matching
    5.2.2 Translation of forall-abstractions
6 for-abstraction

3. The Show class


class Show a  where 
  showsPrec        :: Int -> a -> ShowS
  showsPrec'       :: Int -> a -> ShowS
  showList         :: [a] -> ShowS
  showsType        :: a -> ShowS
  showsOutofbounds :: a -> ShowS

  showList xs = handle (showLs xs) (showsOutofbounds (head xs))
    where showLs []     = showString "[]"
          showLs (x:xs) = showChar '[' . shows x . 
                           (handle (showl xs) 
                                   (showString ", " . showsOutofbounds x))

          showl []     = showChar ']'
          showl (x:xs) = showString ", " . shows x . 
                           (handle (showl xs) 
                                   (showString ", " . showsOutofbounds x))

  showsPrec p x = handle (showsPrec' p x) (showsOutofbounds x)

  showsOutofbounds _ = showString "<OUB>"

To make the printing of the value outofBounds configurable for each type, we add a method showsOutofbounds to the Show class. To minimize the necessary modifications of the instances for the basic types, we add a method showsPrec' which should work as showsPrec in the Haskell report. The new showsPrec is, as default, a wrapper which checks is the result of applying showPrec' is outofBounds, and if so, calls showsOutofbounds.

showsOutofbounds is, as default, defined as printing <OUB>.

4. The Eval Class


infixr 0 `seq`
infixr 0 `hseq`

class Eval a where
   seq         :: a -> b -> b
   strict      :: (a -> b) -> a -> b
   hseq        :: a -> b -> b
   hyperstrict :: (a -> b) -> a -> b
   
   strict f x  =  x `seq` f x
   hyperstrict f x = x `hseq` f x

We add methods hseq and hyperstrict to the Eval class. They are hyperstrict analogs to the seq and strict methods. However, while seq and strict are strict both in the error value outofBounds and _|_, hseq and hyperstrict are hyperstrict only in _|_.

The new Eval instance derived by the compiler for a type T with constructors C1, ..., Cn is as shown below.

instance (Eval a1, ..., Eval ak) => Eval (T a1 ... ak) where
  x `seq` y = case x of
               C1 _ ... _ -> y
                        _ -> y

  x `hseq` y = 
    let x' = case x of
               C1 x1 ... xk1 -> 
                 x1 `hseq` (x2 `hseq` ... (xk1 `hseq` y)...)
               C2 x1 ... xk2 -> 
                 x1 `hseq` (x2 `hseq` ... (xk2 `hseq` y)...)
               		.
               		.
               		.
               Cn x1 ... xkn -> 
                 x1 `hseq` (x2 `hseq` ... (xkn `hseq` y)...)
    in if isoutofBounds x' then y else y

5. forall-abstraction

forall-abstraction is a construct for specifying data field in a implicit way. The idea is specify a data field very much as one would specify a partial function with lambda-abstraction. A partial function has a domain, and similarly a data fields given by forall-abstraction has a bound which is defined as to approximate the domain of the corresponding partial function. Syntactically, forall-abstraction works as lambda-abstraction:

forall apat1 ... apatn -> exp

As for lambda-abstractions, the general form above is equivalent to

forall x1 ... xn -> case ( x1, ..., xn ) of
( pat1, ..., patn ) -> exp

where x1,..., xn are fresh identifiers. The types of the identifiers being abstracted over must be instances of the Pord and Ix classes. So if the type of

\x1 ... xn -> expr

is

c => a1 -> ... -> an -> t

where the ai are type variables and c is a context, i.e a list (C1 u1, ...Cn un) where the Ci are class identifiers and the ui are type variables, then the type of

forall x1 ... xn -> expr

is

c' => Datafield a1 (Datafield a2 (... Datafield an t))

where c' is c with type assertions (Pord a1, Ix a1, ...Pord an, Ix an) added.

If we have a type of the above form where the ai are not type variables, the situation is a bit more complicated, since only type variables may appear in contexts. If ai = C v1 ...vn, where C is a type constructor and vi are type variables, then C must be an instance of Pord and Ix, and we must add contexts (Pord v1, Ix v1, ..., Pord vn, Ix vn) to c. The above applies recursively if the vi are not type variables. For instance, the expression forall ((x,y),z) -> x will have the type

(Pord a, Ix a, Pord b, Ix b, Pord b, Ix c) =>
Datafield ((a,b),c) -> a

5.1 Other Modifications to the Prelude

The putChar and putStr functions are modified so that they print <OUB> if outofBounds is encountered. That is:

putChar c = if isoutofBounds c then
		putStr "<OUB>"
	    else
		primPutchar c

putStr xs = if isoutofBounds xs then
	      putStr "<OUB>"
	    else
              case xs of
		[]     -> return ()
		(x:xs) -> do putChar x
			     putStr xs

5.2 Semantics of forall-abstraction

In the Haskell Report, the semantics of advanced syntactic structures are given as translation into the Haskell kernel, a very simple subset of Haskell. In this tradition, we give the semantics of forall-abstraction as a translation from the Haskell kernel with forall-abstractions into the Haskell kernel without forall-abstractions. However, for convenience we do not assume that let-expressions have been removed. We do assume all case-expressions have been "flattened", and written on the form

case v of { K x1 ... xn -> e ;
 _ -> e' }

Where v and x1, ..., xn are variables.

5.2.1 Semantics for Pattern-matching

The formal semantics for case-expressions in The Haskell 1.4 Report, Section 3.17.3 is changed slightly. The rule

(b)case v of { p1 match1;  ... ; pn matchn }
=  case v of { p1 match1 ;
                _  -> ... case v of {
                           pn matchn
                           _  -> error "No match" }...}
 where each matchi has the form:
  | gi,1  -> ei,1 ; ... ; | gi,mi -> ei,mi where { declsi }

is replaced by

(b')case v of { p1 match1;  ... ; pn matchn }
=  case v of { p1 match1 ;
                _  -> ... case v of {
                           pn matchn
                           _  -> caseNoMatch }...}
 where each matchi has the form:
  | gi,1  -> ei,1 ; ... ; | gi,mi -> ei,mi where { declsi }

We need caseNoMatch to formally distinguish pattern-matching errors from other errors in the semantics for forall-abstractions. caseNoMatch can be implemented by the compiler as

caseNoMatch = error "No match"

so this change does not change the semantics of pattern matching in practice.

5.2.2 Translation of forall-abstractions

(FORALL1) forall x1 ... xn -> e
= forall x1 -> ... -> forall xn -> e
(FORALL2) forall x -> e
= datafield (\ x -> e) B(e,(x),Ø)

The translation of forall-abstractions is given above. A forall-abstraction with multiple arguments is first translated into nested forall-abstractions with one argument each. The forall-abstractions with single arguments are translated into an application of the datafield constructor on a lambda-abstraction and a bound which is derived from the expression according to the B-scheme, shown here. It is defined in such a way that the bound of forall x -> e should provide a reasonable overapproximation of the domain of the partial function \x -> e. A small deviation from this is that functions occurring under a forall always are considered as strict as regards propagation of bounds, see the (APP2)-rule below.

Below, x and v stand for variables, while e and t stand for Haskell core-expressions.

Some notes on the translation: The parameters in B(e,x,Y) are a Haskell core expression e, a tuple x of identifiers, alternatively written (x1, ..., xn) (where n might be 1), and a set Y of identifiers. e is the expression being analyzed. x is the argument which we analyze e as a data field over. At the beginning, this is the argument of the forall-abstraction, and is thus a single identifier, but since case-expressions may bind new identifiers to the components of a tuple, we also need to find the applications of data fields to those identifiers. This is done by analyzing the subexpression where the binding has effect with respect to the tuple containing the new identifiers. The set Y keeps track of identifiers which are bound after the identifier being abstracted over. We must keep track of these since the analysis should treat them differently than the identifiers which are formal parameters to the forall-abstraction.

By abuse of notation, we will write x also for {x1, ..., xn}.

To keep the description more readable, prod_n e1 ... en is written either as e1 x ... x en, as xni = 1 ei, or, if all factors are identical, as en. We assume that all bound variables are distinct. Due to font restrictions, we write "x" for bounds product, "u" for set union, "in" for set membership, and "intersect" for set intersection below.

We also define a family of projection functions on bounds, prmk. Let r be a (set-theoretic) partial function from [1,m] to Haskell expressions, and b be a m-dimensional bound (i.e., a bound which represents a set of m-tuples). The projection prmk(r,b) is the projection of the bound b in the k:th dimension, with additional constraints in the dimensions for which the partial function r is defined. We first define prmk for product bounds. Let pmk be a family of functions with the property pmk(b1 ... bk ... bm) = bk, and

prmk(r, b) = if  cond  then  pmk(b)  else empty

where

cond =  r(i1)  `inBounds`  pmi1(b)  &&  ...  &&  r(ip)  `inBounds`  pmip(b)
dom(r) = {i1, ..., ip}

This definition works for dense bounds as well, if we define

pmk((l1, ..., lk, ..., lm) <:> (u1, ..., uk, ..., um)) = lk  <:>  uk

For sparse bounds we can define pmk as

pmk (sparse l) = sparse (map (\( x1, ..., xk, ..., xm ) -> xk)  l)

and prmk(r, b) as

prmk(r, b) = pmk (b `meet` (predicate p))

where

p = (\(x1, ..., xm)  ->  xi1  ==  r(i1)  &&  ...  &&  xil  ==  r(ip))

For predicate bounds, we have

prmk(r, predicate  p) = \xk -> p (r(1), ..., xk, ..., r(m))

if r(i) is defined for i {1,...,m} \ {k}, and

prmk(r, predicate  p) = universe

otherwise.

For universe and empty we have

prmk(r, universe) = universe

and

prmk(r, empty) = empty

(LAM)B(\ v1 ... vn -> e, x, Y) = B(e, x, Y u {v1,...,vn})
(CASE1)B(case xi of {( v1, ..., vn ) -> e ; _ -> e' }, x, Y)
= ((universe i-1 x B(e, v, Y u x) x universem-i)
`meet` B(e,x,Y u v))
(CASE2)B(case y of { K v1 ... vn  ->  e ; _ -> e'}, x, Y)
=B(e,x, Y u {v1,...,vn}) `join` B(e', x, Y)
(APP1)B((!) e (t1, ..., tm), x, Y)
= T(bounds e, (t1, ..., tm), x, Y), if FV(e) intersect (Y u x) = Ø
(APP2)B(e1 e2, x, Y) = B(e1, x, Y) `meet` B(e2, x, Y)
(LET)B(let { v1 = e1 ; ... ; vn = en } in  e , x, Y)
= B(e1, x, Y u {v1, ..., vn}) `meet` ... `meet` B(en, x, Y u {v1, ..., vn})
`meet` B(e, x, Y u {v1, ..., vn})
(PFAIL)B(caseNoMatch, x, Y) = empty
(AFAIL)B(outofBounds, x, Y) = empty
(DEFAULT)B(e, x, Y) = universe , if none of the other rules apply
(TUPLE) T(b,(t1, ..., tm), (x1, ..., xn), Y)
= xni = 1 `meet`mk = 1 C(b, k, (t1, ..., tm), xi, Y u x)
(COMP) C(b, k, (t1, ..., tm), xi, Y)
= transBound(prmk(r, b), a) if tk =xi + a,
where FV(a) intersect Y = Ø
= prmk(r, b) if tk =xi
= T(prmk(r, b), (t1', ..., tp'), xi, Y \ {xi}) if tk =(t1', ..., tp'),
and xi in FV(tk)
= universe otherwise
where r = {(j,tj) | FV(tj) intersect Y = Ø}

We now explain the rules above.

6. for-abstraction

for-abstraction specifies the bounds of data fields more explicitly than forall-abstraction. The syntax is

for pat in { e1 -> e1' ; ... ; en -> en' }

which is equivalent to

(forall pat -> if inBounds pat (e1) then e1'
else if inBounds pat (e2) then e2'
...
else if inBounds pat (en) then en'
else outofbounds) <\>
(e1) `join` ... `join` (en)

Björn Lisper, bjorn.lisper@mdh.se