%-*- mode: Latex; abbrev-mode: true; auto-fill-function: do-auto-fill -*-

%include lhs2TeX.fmt
%include myFormat.fmt

\chapter{Pattern-Matching Details}
\label{ch:patterns}

\index{pattern!matching||(} 
In this chapter we will look at Haskell's pattern-matching process in
greater detail.

Haskell defines a fixed set of patterns for use in case expressions
and function definitions.  Pattern matching is permitted using the
constructors of any type, whether user-defined or pre-defined in
Haskell.  This includes tuples, strings, numbers, characters, etc.
For example, here's a contrived function that matches against a tuple
of ``constants:''
\begin{spec}
contrived :: ([a], Char, (Int, Float), String, Bool) -> Bool
contrived ([],  'b',  (1,   2.0),   "hi",   True) = False
\end{spec}
This example also demonstrates that {\em nesting} of patterns is
permitted (to arbitrary depth).

Technically speaking, {\em formal parameters} to functions are also
patterns---it's just that they {\em never fail to match a value}.  As
a ``side effect'' of a successful match, the formal parameter is bound
to the value it is being matched against.  For this reason patterns in
any one equation are not allowed to have more than one occurrence of
the same formal parameter.
\index{pattern!refutable}
\index{pattern!irrefutable}

A pattern that may fail to match is said to be {\em refutable}; for
example, the empty list \hs{[]} is refutable.  Patterns such as formal
parameters that never fail to match are said to be {\em irrefutable}.
There are three other kinds of irrefutable patterns, which are
summarized below.

\index{pattern!as-pattern}
\paragraph*{As-Patterns} Sometimes it is convenient to name a
pattern for use on the right-hand side of an equation.  For example, a
function that duplicates the first element in a list might be written
as: 
\begin{spec}
f (x:xs)                = x:x:xs
\end{spec}
Note that \hs{x:xs} appears both as a pattern on the left-hand side,
and as an expression on the right-hand side.  To improve readability,
we might prefer to write \hs{x:xs} just once, which we can achieve
using an {\em as-pattern} as follows:\footnote{Another advantage to
doing this is that a naive implementation might otherwise completely
reconstruct \hs{x:xs} rather than re-use the value being matched
against.}
\begin{spec}
f s@(x:xs) = x:s
\end{spec}
Technically speaking, as-patterns always result in a successful match,
although the sub-pattern (in this case \hs{x:xs}) could, of course,
fail.

\index{pattern!wildcard}
\paragraph*{Wildcards} Another common situation is matching against
a value we really care nothing about.  For example, the functions
\hs{head} and \hs{tail} can be written as:
\begin{spec}
head (x:_)             = x
tail (_:xs)            = xs
\end{spec}
in which we have ``advertised'' the fact that we don't care what a
certain part of the input is.  Each wildcard will independently match
anything, but in contrast to a formal parameter, each will bind
nothing; for this reason more than one are allowed in an equation.

\index{pattern!lazy}
\paragraph*{Lazy Patterns}
There is one other kind of pattern allowed in Haskell.  It is called a
{\em lazy pattern}, and has the form \hs{~pat}.  Lazy patterns are
{\em irrefutable}: matching a value $v$ against \hs{~pat} always
succeeds, regardless of \hs{pat}.  Operationally speaking, if an
identifier in \hs{pat} is later ``used'' on the right-hand-side, it
will be bound to that portion of the value that would result if \hs{v}
were to successfully match \hs{pat}, and $\bot$ otherwise.

Lazy patterns are useful in contexts where infinite data structures
are being defined recursively.  For example, infinite lists are an
excellent vehicle for writing {\em simulation} programs, and in this
context the infinite lists are often called {\em streams}.  
%% Streams were discussed at length in Chapter \ref{ch:streams}.

\section*{Pattern-Matching Semantics}

So far we have discussed how individual patterns are matched, how some
are refutable, some are irrefutable, etc.  But what drives the overall
process?  In what order are the matches attempted?  What if none
succeed?  This section addresses these questions.

Pattern matching can either {\em fail}, {\em succeed} or {\em
diverge}.  A successful match binds the formal parameters in the
pattern.  Divergence occurs when a value needed by the pattern
diverges (i.e.\ is non-terminating) or results in an error ($\bot$).
The matching process itself occurs ``top-down, left-to-right.''
Failure of a pattern anywhere in one equation results in failure of
the whole equation, and the next equation is then tried.  If all
equations fail, the value of the function application is $\bot$, and
results in a run-time error.

For example, if \hs{bot} is a divergent or erroneous computation, and
if \hs{[1,2]} is matched against \hs{[0,bot]}, then \hs{1} fails to
match \hs{0}, so the result is a failed match.  But if \hs{[1,2]} is
matched against \hs{[bot,0]}, then matching \hs{1} against \hs{bot}
causes divergence (i.e.~$\bot$).

\index{pattern!guard}

The only other twist to this set of rules is that top-level patterns
may also have a boolean {\em guard}, as in this definition of a
function that forms an abstract version of a number's sign:
\begin{spec}
sign x |  x >  0        =   1
       |  x == 0        =   0
       |  x <  0        =  -1
\end{spec}
Note here that a sequence of guards is given for a single pattern; as
with patterns, these guards are evaluated top-down, and the first that
evaluates to \hs{True} results in a successful match.

\paragraph*{An Example}

The pattern-matching rules can have subtle effects on the meaning of
functions.  For example, consider this definition of \hs{take}:
\begin{spec}
take  0     _           =  []
take  _     []          =  []
take  n     (x:xs)      =  x : take (n-1) xs
\end{spec}
and this slightly different version (the first 2 equations have been
reversed):
\begin{spec}
take1  _     []         =  []
take1  0     _          =  []
take1  n    (x:xs)      =  x : take1 (n-1) xs
\end{spec}
Now note the following:
\[\begin{array}{lcl}
  \hs{take  0 bot}  &\ \ \ \red\ \ \ & \hs{[]} \\
  \hs{take1 0 bot}  &\ \ \ \red\ \ \ & \bot \\[.1in]
  \hs{take  bot []} &\ \ \ \red\ \ \ & \bot \\
  \hs{take1 bot []} &\ \ \ \red\ \ \ & \hs{[]}
\end{array}\]
We see that \hs{take} is ``more defined'' with respect to its second
argument, whereas \hs{take1} is more defined with respect to its first.
It is difficult to say in this case which definition is better.  Just
remember that in certain applications, it may make a difference.  (The
Standard Prelude includes a definition corresponding to \hs{take}.)

\section*{Case Expressions}
\indexkw{case}

Pattern matching provides a way to ``dispatch control'' based on
structural properties of a value.  However, in many circumstances we
don't wish to define a {\em function} every time we need to do this.
Haskell's {\em case expression} provides a way to solve this problem.
Indeed, the meaning of pattern matching in function definitions is
specified in the Haskell Report in terms of case expressions, which
are considered more primitive.  In particular, a function definition
of the form:
\[\begin{array}{l}
\hs{f} p_{11} ... p_{1k} \hs{=} e_{1} \\
... \\
\hs{f} p_{n1} ... p_{nk} \hs{=} e_{n}
\end{array}\]
where each $p_{ij}$ is a pattern, is semantically equivalent to:
\[ \hs{f x1 x2 ... xk = case (x1, ..., xk) of}
   \begin{array}[t]{l}
   (p_{11}, ..., p_{1k}) \rightarrow e_{1} \\
   ... \\
   (p_{n1}, ..., p_{nk}) \rightarrow e_{n}
   \end{array}
\]
where the \hs{xi} are new identifiers.  For example, the
definition of \hs{take} given earlier is equivalent to:
\begin{spec}
take m ys = case (m,ys) of
              (0,_)       ->  []
              (_,[])      ->  []
              (n,x:xs)    ->  x : take (n-1) xs
\end{spec}
For type correctness, the types of the right-hand sides of a case
expression or set of equations comprising a function definition must
all be the same; more precisely, they must all share a common
principal type.

The pattern-matching rules for case expressions are the same as we
have given for function definitions.
\index{pattern!matching||)}