*********************************
Type Directed Search ``:search``
*********************************
Idris' ``:search`` command searches for terms according to their
approximate type signature (much like
`Hoogle `__ for Haskell). For
example::
Idris> :search e -> List e -> List e
= Prelude.List.(::) : a -> List a -> List a
Cons cell
= Prelude.List.intersperse : a -> List a -> List a
Insert some separator between the elements of a list.
> Prelude.List.delete : Eq a => a -> List a -> List a
< assert_smaller : a -> b -> b
Assert to the totality checker than y is always structurally
smaller than x (which is typically a pattern argument)
< Prelude.Basics.const : a -> b -> a
Constant function. Ignores its second argument.
The best results are listed first. As we can see, ``(::)`` and
``intersperse`` are exact matches; the ``=`` symbol to the left of those
results tells us the types of ``(::)`` and ``intersperse`` are
effectively the same as the type that was searched.
The next result is ``delete``, whose type is more specific than the type
that was searched; that's indicated by the ``>`` symbol. If we had a
function with the signature ``e -> List e -> List e``, we could have
given it the type ``Eq a => a -> List a -> List a``, but not necessarily
the other way around.
The final two results, ``assert_smaller`` and ``const``, have types more
general than the type that was searched, and so they have ``<`` symbols
to their left. For example, ``e -> List e -> List e`` would be a valid
type for ``assert_smaller``. The correspondence for ``const`` is more
complicated than any of the four previous results. ``:search`` shows
this result because we could change the order of the arguments! That is,
the following definition would be legal:
.. code:: idris
f : e -> List e -> List e
f x xs = const xs x
About :search results
---------------------
:search's functionality is based on the notion of type isomorphism.
Informally, two types are isomorphic if we can identify terms of one
type exactly with terms of the other. For example, we can consider the
types ``Nat -> a -> List a`` and ``a -> Nat -> List a`` to be
isomorphic, because if we have ``f : Nat -> a -> List a``, then
``flip f : a -> Nat -> List a``. Similarly, if
``g : a -> Nat -> List a``, then ``flip g : Nat -> a -> List a``.
With :search, we create a partial order on types; that is, given two
types ``A`` and ``B``, we may choose to say that ``A <= B``, ``A >= B``,
or both (in which case we say ``A == B``), or neither. For :search, we
say that ``A >= B`` if all of the terms inhabiting ``A`` correspond to
terms of ``B``, but it need not necessarily be the case that *all* the
terms of ``B`` correspond to terms of ``A``. Here's an example:
::
a -> a >= Nat -> Nat
The left-hand type has just a single inhabitant, ``id``, which
corresponds to the term ``id {a = Nat}``, which has the right-hand type.
However, there are various terms inhabiting the right-hand type (such as
``S``) which cannot correspond with terms of type ``a -> a``.
We can consider the partial order for ``:search`` to be, in some sense,
inductively generated by several classes of "edits" which are described
below.
Possible edits
~~~~~~~~~~~~~~
Here is a simple approximate list of the edits that are possible in
``:search``. They are not entirely formal, and do not necessarily
reflect the ``:search`` command's actual behavior. For example, the
*argument application* rule may be used directly on arguments that are
bound after other arguments, without using several applications of the
*argument transposition* rule.
- **Argument transposition**
::
A : Type B : Type a : A, b : B |- M : Type
----------------------------------------------------------------------------
(x : A) -> (y : B) -> [x,y/a,b]M == (y : B) -> (x : A) -> [x,y/a,b]M
Score: 1 point
Example:
::
a -> Vect n a -> Vect (S n) a == Vect n a -> a -> Vect (S n) a
Note that in order for it to make sense to change the order of
arguments, neither of the arguments' types may depend on the value bound
by the other argument!
- **Symmetry of equality**
::
A = B : Type t : Type |- M : Type
----------------------------------------
[A = B/t]M == [B = A/t]M
Score: 1 point
Example:
::
(x,y,z : Nat) -> x + (y + z) = (x + y) + z
==
(x,y,z : Nat) -> (x + y) + z = x + (y + z)
Note that this rule means that we can flip equalities anywhere they
occur (i.e., not only in the return type).
- **Argument application**
::
e : A |- M : Type y1 : T1, ..., yn : Tn |- x : A
-----------------------------------------------------------------
(z : A) -> [z/e]M >= (y1 : T1) -> ... -> (yn :Tn) -> [x/e]M
Score: <= : 3 points, >= : 9 points
Examples:
::
a -> a >= Nat -> Nat
a -> a >= List e -> List e
Vect k (Fin k) >= Vect 5 (Fin 5)
Note that the ``n`` shown in the scheme above may be 0; that is, there
are no Pi terms to be added on the right side. For example, that's the
case for the first example shown above. This is probably the most
important, and most widely used, rule of all.
- **Type class application**
::
C : Type -> TypeClass
, y1 : T1, ..., yn : Tn |- A : Type, instance : C A
, t : Type |- M : Type
-----------------------------------------------------------------
C a => [a/t]M >= (y1 : T1) -> ... -> (yn : Tn) -> [A/t]M
Score: <= : 4 points, >= : 12 points
Examples
::
Ord a => a >= Int
Show (List e) => List e -> String >= Show a => List a -> String
This rule is used by looking at the instances for a particular type
class. While the scheme is shown only for single-parameter type classes,
it naturally generalizes to multi-parameter type classes. This rule is
particularly useful in conjunction with argument application. Again,
note that the ``n`` in the scheme above may be 0.
- **Type class introduction**
::
t : Type |- M : Type C : Type -> TypeClass
-----------------------------------------------
(t : Type) -> M >= C t => M
Score: <= : 2 points, >= : 6 points
Example:
::
a -> a -> Bool >= Eq a => a -> a -> Bool
Scoring and listing search results
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
When a type ``S`` is searched, the type is compared against the types of
all of the terms which are currently in context. When :search compares
two types ``S`` and ``T``, it essentially tries to find a chain of
inequalities
::
R1 R2 Rn Rn+1
S <= A1 <= ... <= An <= T
using the edit rules listed above. It also tries to find chains going
the other way (i.e., showing ``S >= T``) as well. Each rule has an
associated score which indicates how drastic of a change the rule
corresponds to. These scores are listed above. Note that for the rules
which are not symmetric, the score depends on the direction in which the
rule is used. Finding types which are more general that the searched
typed (``S <= T``) is preferred to finding types which are less general.
The score for the entire chain is, at minimum, the sum of the scores of
the individual rules (some non-linear interactions may be added). The
:search function tries to find the chain between ``S`` and ``T`` which
results in the lowest score, and this is the score associated to the
search result for ``T``.
Search results are listed in order of ascending score. The symbol which
is shown along with the search result reflects the type of the chain
which resulted in the minimum score.
Implementation of :search
-------------------------
Practically, naive and undirected application of the rules enumerated
above is not possible; not only is this obviously inefficient, but the
two application rules (particularly *argument application*) are really
impossible to use without context given by other types. Therefore, we
use a heuristic algorithm that is meant to be practical, though it might
not find ways to relate two types which may actually be related by the
rules listed above.
Suppose we wish to match two types, ``S`` and ``T``. We think of the
problem as a non-deterministic state machine. There is a ``State``
datatype which keeps track of how well we've matched ``S`` and ``T`` so
far. It contains:
- Names of argument variables (Pi-bound variables) in either type which
have yet to be matched
- A directed acyclic graph (DAG) of arguments (Pi-bindings) for ``S``
and ``T`` which have yet to be matched
- A list of typeclass constraints for ``S`` and ``T`` which have yet to
be matched
- A record of the rules which have been used so far to get to this
point
A function ``nextSteps : State -> [State]`` finds the next states which
may follow from a given state. Some states, where everything has been
matched, are considered final. The algorithm can be roughly broken down
into multiple stages; if we start from having two types, ``S`` and
``T``, which we wish to match, they are as follows:
1. For each of ``S`` and ``T``, split the types up into their return
types and directed acyclic graphs of the arguments, where there is an
edge from argument A to argument B if the term bound in A appears in
the type of B. The topological sorts of the DAG represent all the
possible ways in which the arguments may be permuted.
2. For type ``T``, recursively find (saturated) uses of the ``=`` type
constructor and produce a list of modified versions of ``T``
containing all possible flips of the ``=`` constructor (this
corresponds to the *symmetry of equality rule*).
3. For each modified type for ``T``, try to unify the return type of the
modified ``T`` with ``S``, considering arguments from both ``S`` and
``T`` to be holes, so that the unifier may match pieces of the two
types. For each modified version of ``T`` where this succeeds, an
initial ``State`` can be made. The arguments and typeclasses are
updated accordingly with the results of unification. The remainder of
the algorithm involves applying ``nextSteps`` to these states until
either no states remain (corresponding to no path from ``S`` to
``T``) or a final state is found. ``nextSteps`` also has several
stages:
4. Try to unify arguments of ``S`` with arguments of ``T``, much like is
done with the return types. We work "backwards" through the
arguments: we try matching all remaining arguments of ``S`` which
lack outgoing edges in the DAG of remaining arguments (that is, the
bound value doesn't appear in the type of any other remaining
arguments) with the all of the corresponding remaining arguments of
``T``. This is done recursively until no arguments remain for both
``S`` and ``T``; otherwise, we give up at this point. This step
corresponds to application of the *argument application rule*, as
well as the *argument transposition* rule.
5. Now, we try to match the type classes. First, we take all possible
subsets of type class constraints for ``S`` and ``T``. So if ``S``
and ``T`` have a total of ``n`` type class constraints, this produces
``2^n`` states for every state, and this quickly becomes infeasible
as ``n`` grows large. This is probably the biggest bottleneck of the
algorithm at the moment. This step corresponds to applications of the
*type class introduction* rule.
6. Try to match type class constraints for ``S`` with those for ``T``.
We attempt to unify each type class constraint for ``S`` with each
constraint for ``T``. This may result in applications of the *type
class application* rule. Once we are unable to match any more type
class constraints between ``S`` and ``T``, we proceed to the final
step.
7. Try instantiating type classes with their instances (in either ``S``
or ``T``). This corresponds to applications of the *type class
application* rule. After instantiating a type class, we hopefully
open up more opportunities to match typeclass constraints of ``S``
with those of ``T``, so we return to the previous step.
The code for :search is located in the `Idris.TypeSearch
module `__.
Aggregating results
~~~~~~~~~~~~~~~~~~~
The search for chains of rules/edits which relate two types can be
viewed as a shortest path problem where nodes correspond to types and
edges correspond to rules relating two types. The weights or distances
on each edge correspond to the score of each rule. We then may imagine
that we have a single start node, our search type ``S``, and several
final nodes: all of the types for terms which are currently in context.
The problem, then, is to find the shortest paths (where they exist) to
all of the final nodes. In particular, we wish to find the "closest"
types (those with the minimum score) first, as we'd like to display them
first.
This problem nicely maps to usage of Dijkstra's algorithm. We search for
all types simultaneously so we can find the closest ones with the
minimum amount of work. In practice, this results in using a priority
queue of priority queues. We first ask "which goal type should we work
on next?", and then ask "which state should we expand upon next?" By
using this strategy, the best results can be shown quickly, even if it
takes a bit of time to find worse results (or at least rule them out).
Miscellaneous Notes
-------------------
Whether arguments are explicit or implicit does not affect search
results.