void swap(int *x, int *y) {
int xv = *x; // read
int yv = *y;
*x = yv; // write
*y = xv;
}
int main() {
int a = 13;
int b = 14;
swap(&a, &b); // references
printf("%d, %d", a, b);
return 0;
}
let swap x y =
let vx = !x (* read *)
and vy = !y in
x := vy; (* write *)
y := vx;;
let a = ref 13;; (* reference *)
let b = ref 14;;
swap a b;;
Primitives:
ref : 'a -> 'a ref
(!) : 'a ref -> 'a
(:=) : 'a ref -> a -> unit
swap :: IORef a -> IORef a -> IO ()
swap x y = do
vx <- readIORef x
vy <- readIORef y
writeIORef x vy
writeIORef y vx
Primitives:
newIORef :: a -> IO (IORef a)
readIORef :: IORef a -> IO a
writeIORef :: IORef a -> a -> IO ()
Side effects are properly indicated with IO
in types.
STRef
s are more safe than IORef
s because they need less privileges.
swap :: STRef s a -> STRef s a -> ST s ()
swap x y = do
vx <- readSTRef x
vy <- readSTRef y
writeSTRef x vy
writeSTRef y vx
Primitives:
newSTRef :: a -> ST s (STRef s a)
readSTRef :: STRef s a -> ST s a
writeSTRef :: STRef s a -> a -> ST s ()
Imperative style Fibonacci function:
fib :: Integer -> ST s Integer
fib n = do
a <- newSTRef 0
b <- newSTRef 1
replicateM_ n $ do
av <- readSTRef a
bv <- readSTRef b
writeSTRef a bv
writeSTRef b (av + bv)
readSTRef a
Note that the return type of the ST
computation does not depend on s
:
fib :: Integer -> ST s Integer
In this case the ST
computation can be turned into a pure value:
runST :: (forall s. ST s a) -> a
fib' :: Integer -> Integer
fib' n = runST (fib n)
In that way pointers can be used in a pure function.
Still, we need a strictly scheduled computation inside.
swap :: (Ptr a) (Ptr a) *Heap -> *Heap
swap x y h1 = h5
where
(vx, h2) = readPtr x h1
(vy, h3) = readPtr y h2
h4 = writePtr x vy h3
h5 = writePtr y vx h4
Primitives:
newPtr :: a *Heap -> (Ptr a, *Heap)
readPtr :: (Ptr a) *Heap -> (a, *Heap)
writePtr :: (Ptr a) a *Heap -> *Heap
The previous pointer interface is
However, an explicit heap value should be carried through the program which determines the evaluation order overly.
The result is an imperative program in a functional guise.
Reading a pointer does not alter the heap but it have to be done in time:
swap :: (Ptr a) (Ptr a) *Heap -> *Heap
swap x y h
#! vx = sreadPtr x h
vy = sreadPtr y h
= writePtr y vx (writePtr x vy h)
New primitive:
sreadPtr :: (Ptr a) Heap -> a
Note that Heap
is a subtype of *Heap
.
An Int
-pointer read and a Char
-pointer write may be interchanged safely.
This is modeled with typed heaps.
Primitives (as used in the Clean compiler sources):
newHeap :: .(Heap a)
newPtr :: a *(Heap a) -> (Ptr a, *(Heap a))
readPtr :: (Ptr a) *(Heap a) -> (a, *(Heap a))
sreadPtr :: (Ptr a) (Heap a) -> a
writePtr :: (Ptr a) a *(Heap a) -> *(Heap a)
Still a problem: Reading a Ptr Char
in a Heap Char
fails if the pointer was constructed in another Heap Char
.
We distinguish between different Heap Char
values by adding a phantom type variable: Heap k Char
.
newPtr :: a -> Heap k a -> (Ptr k, Heap k a)
sreadPtr :: Ptr k -> Heap k a -> a
writePtr :: Ptr k -> a -> Heap k a -> Heap k a
Note that the interface use Ptr k
instead of Ptr k a
because a
is not needed.
If the result of a heap-consuming computation does not contain the phantom typevar then we get a heap for free:
runHCC :: (forall k. Heap k a -> b) -> b
Finite maps are functions with finite domain.
Related phrases: dictionary (Python), hash (Perl), association list.
empty :: Map k a
lookup :: Ord k => k -> Map k a -> Maybe a
insert :: Ord k => k -> a -> Map k a -> Map k a
delete :: Ord k => k -> Map k a -> Map k a
We will need an additional function:
modify :: Ord k => k -> Maybe a -> Map k a -> Map k a
modify k Nothing m = delete k m
modify k (Just a) m = insert k a m
Heap k (Maybe a)
~ Map (Id k) a
newtype Id k = Id Int deriving (Eq, Ord)
We allow only Maybe
-typed heaps, so we can use an interface similar to finite maps.
Map
here is the abstract heap (not a finite map):
lookup :: Id k -> Map k a -> Maybe a
insert :: Id k -> a -> Map k a -> Map k a
delete :: Id k -> Map k a -> Map k a
Instead of including newPtr
, pointers are created with the map (this decison pays back later):
runICC :: (forall k. Map k a -> [Id k] -> b) -> b
runICC
runs an identifier consuming computation, which receives a map (heap) and an infinite list of identifiers (pointers) allowed to be used with that map.
data DList k a
= Empty
| NonEmpty
{ first :: Id k
, last :: Id k
, nodes :: Map k (DListNode k a)
}
data DListNode k a =
{ previous :: Maybe (Id k)
, next :: Maybe (Id k)
, value :: a
}
(<|) :: a -> DList k a -> Id k -> DList k a
(|>) :: DList k a -> a -> Id k -> DList k a
It is a problem that at insertions free Id
s are needed. This new version solves that problem:
data DList k a
= Empty
| NonEmpty
{ first :: Id k
, last :: Id k
, nodes :: Map k (DListNode k a)
, freeIds :: [Id k] -- stored free Ids
}
(<|) :: a -> DList k a -> DList k a
(|>) :: DList k a -> a -> DList k a
This version simplifies the creation of DList
s:
data DList a
= Empty
| forall k . NonEmpty -- encapsulated heap
{ first :: Id k
, last :: Id k
, nodes :: Map k (DListNode k a)
, freeIds :: [Id k]
}
singleton :: a -> DList a
(<|) :: a -> DList a -> DList a
(|>) :: DList a -> a -> DList a
Code for singleton
:
singleton :: a -> DList a
singleton x = runICC $ \emptyMap (firstId: otherIds) ->
NonEmpty
{ first = firstId
, last = firstId
, nodes = insert firstId x emptyMap
, freeIds = otherIds
}
But DList
s can not be joined because if we open two NonEmpty
values, the phantom variables can not be unified by the type system (which is right).
If k1
≠ k2
then Id k1
can not be used instead of Id k2
. This is right, because this type variables marks "different regions of memory".
But sometimes memory regions should be joined.
Id (k1 :|: k2)
is the joined set of Id k1
and Id k2
.
:|:
is an infix type constructor with kind * -> * -> *
:
data (a :|: b)
-- no constructors
Id (k1 :|: k2)
is the joined set of Id k1
and Id k2
.
A value with type Id k1
is acceptable when a value with type Id (k1 :|: k2)
is needed.
In other words, Id k1
is a subtype of Id (k1 :|: k2)
.
There is no subtyping in Haskell so we use explicit conversion functions:
left :: Id k1 -> Id (k1 :|: k2)
right :: Id k2 -> Id (k1 :|: k2)
One can join two maps (two heaps or two "memory regions") with union
:
union :: Map k1 a -> Map k2 a -> Map (k1 :|: k2) a
A simplification first: the freeIds
field is not needed because any number of free Id
s can be obtained by joining a new "memory region":
data DList a
= Empty
| forall k . NonEmpty
{ first :: Id k
, last :: Id k
, nodes :: Map k (DListNode k a)
-- , freeIds :: [Id k] -- not needed
}
(><) :: DList a -> DList a -> DList a
Empty >< y = y
x >< Empty = x
x >< y = NonEmpty
{ first = left (first x)
, last = right (last y)
, nodes = ... (fmap left (nodes x)
`union` fmap right (nodes y))
}
...
contains code which redirectsnext (last x)
to first y
andprevious (first y)
to (last x)
.
Redirecting next (last x)
to first y
is complicated because a DListNode
record have to be updated.
This could be improved if three different maps were used for previous
, next
and value
values. But a pointer can only point to one object.
Solution: Maps are tagged with type-level integers. A pointer can be a key in several maps with different integers.
We will use (Map I0 k a, Map I1 k b, Map I2 k c)
instead of Map k (a, b, c)
.
To understand the implementation:
The finite map Map i k a
represents a scattered memory fragment with the following properties:
a
-typed values.i
th field.i
is a type-level integer (I0
, I1
, I2
, ... in the implementation).k
is an additional tag (for example, to separate two doubly linked lists)data DList a
= Empty
| forall k . NonEmpty
{ first :: Id k
, last :: Id k
, previous :: Map I0 k (Id k)
, next :: Map I1 k (Id k)
, value :: Map I2 k a
}
Basic solution: There are a family of functions
runICC1 :: (forall k. Map I0 k a -> [Id k] -> b) -> b
runICC2 :: (forall k. Map I0 k a -> Map I1 k a -> [Id k] -> b) -> b
runICC3 :: (forall k. Map I0 k a -> Map I1 k a -> Map I2 k a -> [Id k] -> b) -> b
...
Instead of that, the current implementation use a variant of the function
runICC :: (forall k. Maps i k -> [Id k] -> b) -> b
where Maps
is a GADT which can be unfolded into i
maps.
The implementation is as efficient as if mutable references were used:
Map
s are not present in the generated code (for example, NonEmpty
has two fields).Id
s are replaced by pointers to records (arrays actually).TODOs:
Maybe
s still cause some performance loss.Guarantees by the type system:
TODOs:
Pros:
i
th fields of different records (if the i
th fields has the same type).Cons:
The library has a simple semantics.
This is demonstrated by a small pure functional implementation of the interface functions.
Sets can be modeled as maps to unit values.
Identifiers can refer to static data.
For example, if a sequence is implemented by a doubly linked map, previous
and next
are mutable but value
is static. So two maps are sufficient.