LogicGrowsOnTrees: a parallel implementation of logic programming using distributed tree exploration

[ bsd3, control, distributed-computing, library, logic, parallelism ]

NOTE: In addition to the following package description, see the README included below.

You can think of this package in two equivalent ways. First, you can think of it as an implementation of logic programming that is designed to be parallelized using workers that have no memory shared between them (hence, "distributed"). Second, you can think of this package as providing infrastructure for exploring a tree in parallel. The connection between these two perspectives is that logic programming involves making nondeterministic choices, and each such choice is equivalent to a branch point in a tree representing the search space of the logic program. In the rest of the reference documentation we will focus on the tree perspective, simply because a lot of the functionality makes the most sense from the perspective of working with trees, but one is always free to ignore this and write a logic program using the standard approach of using MonadPlus to indicate choice and failure; the Tree implementation of this typeclass will take care of the details of turning your logic program into a tree. (If you are not familiar with this approach, then see <http://github.com/gcross/LogicGrowsOnTrees/blob/master/TUTORIAL.md TUTORIAL.md>.)

To use this package, you first write a function that builds a tree (say, by using logic programming); the LogicGrowsOnTrees module provides functionality to assist in this. You may have your function either return a generic MonadPlus or MonadExplorable (where the latter lets you cache expensive intermediate calculations so that they do not have to be performed again if the same path is re-explored later), or you may have it return a Tree (or one of its impure friends) directly. You can then test your tree using the exploration functions in the LogicGrowsOnTrees module.
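For instance, here is a minimal sketch of that workflow (the name evenPairs and the printed message are mine; it assumes an exploration function along the lines of exploreTree :: Monoid a => Tree a -> a from the LogicGrowsOnTrees module, together with the WordSum monoid described below):

import Control.Monad (MonadPlus, guard)

import LogicGrowsOnTrees (Tree, allFrom, exploreTree)  -- exploreTree assumed: Monoid a => Tree a -> a
import LogicGrowsOnTrees.Utils.WordSum (WordSum(..))

-- A tiny logic program written against a generic MonadPlus: count the pairs
-- (x,y) with 1 <= x < y <= 10 whose sum is even.  (Hypothetical example, not
-- taken from the package.)
evenPairs :: MonadPlus m => m WordSum
evenPairs = do
    x <- allFrom [1 .. 10 :: Int]
    y <- allFrom [x + 1 .. 10]
    guard (even (x + y))
    return (WordSum 1)

-- Instantiating the same program at the Tree type and summing the WordSum
-- results over all the leaves counts the solutions (there are 20 of them).
main :: IO ()
main = case exploreTree (evenPairs :: Tree WordSum) of
    WordSum count -> putStrLn ("found " ++ show count ++ " even-sum pairs")

Because evenPairs is written against a generic MonadPlus, the same function can also be run in the list monad while debugging.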

WARNING: If you need something like state in your tree, then you should stack the state monad (or whatever else you want) on top of Tree rather than below it. If you stack a monad below TreeT, it will be affected by the order in which the tree is explored, which is almost never what you want; in particular, if you are not careful you will break the assumption made by the checkpointing and parallelization infrastructure that it does not matter in what order the tree is explored, or even whether some parts are explored twice or not at all in a given run. If side-effects that are not undone by backtracking are indeed what you want, then you need to make sure that your side-effects do not break this assumption; for example, a monad that memoizes a pure function is perfectly fine. By contrast, if you are working within the IO monad and writing results to a database rather than returning them (and duplicate results would cause problems), then you need to check that you aren't writing the same result twice, for example by using the LogicGrowsOnTrees.Location functionality to identify where you are in the tree and querying whether your current location is already listed in the database.
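A minimal sketch of the recommended layering (the names are mine, and exploreTree is the same assumption as above): the state is stacked on top of Tree with StateT and then removed with evalStateT before the tree is explored, so each branch of the search sees its own copy of the state.

import Control.Monad (guard)
import Control.Monad.State (StateT, evalStateT, get, modify)

import LogicGrowsOnTrees (Tree, allFrom, exploreTree)  -- exploreTree assumed: Monoid a => Tree a -> a
import LogicGrowsOnTrees.Utils.WordSum (WordSum(..))

-- State stacked on TOP of Tree: modifications made along one branch are
-- invisible to every other branch, so the exploration order cannot leak
-- into the results.  (Hypothetical example, not taken from the package.)
search :: StateT Int Tree WordSum
search = do
    x <- allFrom [1 .. 3 :: Int]
    modify (+ x)          -- per-branch running total, "undone" by backtracking
    y <- allFrom [1 .. 3 :: Int]
    modify (+ y)
    total <- get
    guard (total == 4)
    return (WordSum 1)

-- Run the state layer first, leaving an ordinary Tree to explore.
countTotalsOfFour :: WordSum
countTotalsOfFour = exploreTree (evalStateT search 0)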

If you want to see examples of generating a tree to solve a problem, then see the LogicGrowsOnTrees.Examples.MapColoring and LogicGrowsOnTrees.Examples.Queens modules, which have some basic examples of using logic programming to find and/or count the solutions to a given map coloring problem and a given n-queens problem. The LogicGrowsOnTrees.Examples.Queens.Advanced module has my own solution to the n-queens problem, which uses symmetry breaking to prune the search tree, cutting the runtime by about a factor of three.

Once your tree has been debugged, you can start taking advantage of the major features of this package. If you are interested in checkpointing, but not parallelization, then you can use the step functions in the LogicGrowsOnTrees.Checkpoint module to sequentially explore a tree one node at a time, saving the current checkpoint as often as you desire; at any time the exploration can be aborted and resumed later. Most likely, though, you will be interested in using the parallelization infrastructure rather than just the checkpointing infrastructure. The parallelization infrastructure uses a supervisor/worker model, and is designed such that the logic used to keep track of the workers and the current progress is abstracted away into the LogicGrowsOnTrees.Parallel.Common.Supervisor module; one then uses one of the provided adapters (or possibly your own) to connect the abstract model to a particular means of running multiple computations in parallel, such as multiple threads, multiple processes on the same machine, multiple processes on a network, and MPI; the first option is included in this package and the others are provided in separate packages. Parallelization is obtained by stealing workloads from workers; specifically, a selected worker will look back at the (non-frozen) choices it has made so far, pick the first one, freeze it (so that it won't backtrack and try the other branch), and then hand the other branch to the supervisor which will then give it to a waiting worker.

To use the parallelization infrastructure, you have two choices. First, you can use an adapter directly; the exploration functions provided by the adapter are relatively simple (compared to the alternative discussed in a moment) and give you maximum control over the adapter, but the downside is that you will have to re-implement features such as regular checkpointing and forwarding information from the command line to the workers yourself. Second, you can use the infrastructure in LogicGrowsOnTrees.Parallel.Main, which automates most of the process for you, including parsing the command line, sending information to the workers, determining how many workers (if applicable) to start up, and offering the user command-line options to specify whether, where, and how often to checkpoint; this infrastructure is also completely adapter independent, which means that when switching from one adapter to another all you have to do is change one of the arguments in your call to the main function you are using in LogicGrowsOnTrees.Parallel.Main. The downside is that the call to use this functionality is a bit more complex than the call to use a particular adapter, precisely because of its generality.

If you want to see examples of using the LogicGrowsOnTrees.Parallel.Main module, check out the example executables in the examples/ subdirectory of the source distribution.

If you are interested in writing a new adapter, then you have a couple of options. First, if your adapter can spawn and destroy workers on demand, then you should look at the LogicGrowsOnTrees.Parallel.Common.Workgroup module, as it has infrastructure designed for this case; look at LogicGrowsOnTrees.Parallel.Adapter.Threads for an example of using it. Second, if your adapter does not meet this criterion, then you should look at the LogicGrowsOnTrees.Parallel.Common.Supervisor module; your adapter will need to run within the SupervisorMonad, with its own state contained in its own monad below SupervisorMonad in the stack; for an example, look at the LogicGrowsOnTrees-network package.

NOTE: This package uses the hslogger package for logging; if you set the log level to INFO or DEBUG (either by calling the functions in hslogger yourself or by using the -l command line option if you are using Main) then many status messages will be printed to the screen (or wherever else the log has been configured to be written).
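For example, raising the global log level by calling hslogger directly looks like this (a minimal sketch using hslogger's System.Log.Logger API, independent of this package):

import System.Log.Logger (Priority(DEBUG), rootLoggerName, setLevel, updateGlobalLogger)

main :: IO ()
main = do
    -- Raise the global log level so that INFO and DEBUG messages are emitted.
    updateGlobalLogger rootLoggerName (setLevel DEBUG)
    putStrLn "logging enabled"  -- the rest of your program goes here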

The modules are organized as follows:

LogicGrowsOnTrees
    basic infrastructure for building and exploring trees
LogicGrowsOnTrees.Checkpoint
    infrastructure for creating and stepping through checkpoints
LogicGrowsOnTrees.Examples.MapColoring
    simple examples of computing all possible colorings of a map
LogicGrowsOnTrees.Examples.Queens
    simple examples of solving the n-queens problem
LogicGrowsOnTrees.Examples.Queens.Advanced
    a very complicated example of solving the n-queens problem using symmetry breaking
LogicGrowsOnTrees.Location
    infrastructure for when you want to know your current location within a tree
LogicGrowsOnTrees.Parallel.Adapter.Threads
    the threads adapter
LogicGrowsOnTrees.Parallel.Common.Message
    common infrastructure for exchanging messages between worker and supervisor
LogicGrowsOnTrees.Parallel.Common.Process
    common infrastructure for the case where a worker has specific communication channels for sending and receiving messages; it might seem like this should always be the case, but it is not true for threads, where the supervisor has direct access to the worker thread, nor for MPI, which has its own idiosyncratic communication model
LogicGrowsOnTrees.Parallel.Common.RequestQueue
    infrastructure for sending requests to the SupervisorMonad from another thread
LogicGrowsOnTrees.Parallel.Common.Supervisor
    common infrastructure for keeping track of the state of the workers and of the system as a whole, including determining when the run is over
LogicGrowsOnTrees.Parallel.Common.Worker
    contains the workhorse of the parallel infrastructure: a thread that steps through a given workload while continuously polling for requests
LogicGrowsOnTrees.Parallel.Common.Workgroup
    common infrastructure for the case where workers can be added to and removed from the system on demand
LogicGrowsOnTrees.Parallel.ExplorationMode
    specifies the various modes in which the exploration can be done
LogicGrowsOnTrees.Parallel.Main
    a unified interface to the various adapters that automates much of the process, such as processing the command line, forwarding the needed information to the workers, and performing regular checkpointing if requested via a command-line argument
LogicGrowsOnTrees.Parallel.Purity
    specifies the purity of the tree being explored
LogicGrowsOnTrees.Path
    infrastructure for working with paths through the search tree
LogicGrowsOnTrees.Utils.Handle
    a couple of utility functions for exchanging serializable data over handles
LogicGrowsOnTrees.Utils.IntSum
    a monoid that contains an Int to be summed over
LogicGrowsOnTrees.Utils.PerfectTree
    provides algorithms for generating various simple trees
LogicGrowsOnTrees.Utils.WordSum
    a monoid that contains a Word to be summed over
LogicGrowsOnTrees.Utils.Word_
    a newtype wrapper that provides an ArgVal instance for Word
LogicGrowsOnTrees.Workload
    infrastructure for working with Workloads

Of the above modules, the ones you will be using most often are LogicGrowsOnTrees (for building trees), one of the adapter modules (such as LogicGrowsOnTrees.Parallel.Adapter.Threads), and possibly LogicGrowsOnTrees.Parallel.Main. If you are counting the number of solutions, then you will also want to look at LogicGrowsOnTrees.Utils.WordSum. Finally, if your program takes a Word as a command line argument or option then you might find the LogicGrowsOnTrees.Utils.Word_ module to be useful. The other modules provide lower-level functionality; in particular the LogicGrowsOnTrees.Parallel.Common.* modules are primarily geared towards people writing their own adapter.



Flags

Automatic Flags
Name              Description                             Default
warnings          Enables most warnings.                  Disabled
pattern-warnings  Enables only pattern match warnings.    Disabled
examples          Enable building the examples.           Disabled
tutorial          Enable building the tutorial examples.  Disabled

Use -f <flag> to enable a flag, or -f -<flag> to disable that flag.

Versions 1.0.0, 1.0.0.0.1, 1.1, 1.1.0.1, 1.1.0.2
Change log CHANGELOG.md
Dependencies AbortT-mtl (>=1.0 && <1.1), AbortT-transformers (>=1.0 && <1.1), base (>4 && <5), bytestring (>=0.9 && <0.11), cereal (>=0.3 && <0.5), cmdtheline (>=0.2 && <0.3), composition (>=0.2 && <1.1), containers (>=0.4 && <0.6), data-ivar (>=0.30 && <0.31), derive (>=2.5.11 && <2.6), directory (>=1.1 && <1.3), hslogger (>=1.2 && <1.3), hslogger-template (>=2.0 && <2.1), lens (>=3.8 && <4.1), LogicGrowsOnTrees, MonadCatchIO-transformers (>=0.3 && <0.4), monoid-statistics (>=0.3 && <0.4), mtl (>=2.1 && <2.2), multiset (>=0.2 && <0.3), old-locale (>=1.0 && <1.1), operational (>=0.2 && <0.3), prefix-units (>=0.1 && <0.2), pretty (>=1.1 && <1.2), PSQueue (>=1.1 && <1.2), sequential-index (>=0.2 && <0.3), split (>=0.2 && <0.3), stm (>=2.3 && <2.5), time (>=1.4 && <1.5), transformers (>=0.2 && <0.4), void (>=0.6 && <0.7), yjtools (>=0.9.7 && <0.10) [details]
License BSD-3-Clause
Author Gregory Crosswhite
Maintainer Gregory Crosswhite <gcrosswhite@gmail.com>
Category Control, Distributed Computing, Logic, Parallelism
Bug tracker https://github.com/gcross/LogicGrowsOnTrees/issues
Source repo head: git clone git://github.com/gcross/LogicGrowsOnTrees.git
this: git clone git://github.com/gcross/LogicGrowsOnTrees.git (tag 1.1.0.2)
Uploaded by GregoryCrosswhite at 2014-03-09T04:25:10Z
Reverse Dependencies 4 direct, 0 indirect
Executables tutorial-13, tutorial-12, tutorial-11, tutorial-10, tutorial-9, tutorial-8, tutorial-7, tutorial-6, tutorial-5, tutorial-4, tutorial-3, tutorial-2, tutorial-1, count-all-trivial-tree-leaves, print-some-nqueens-solutions-using-push, print-some-nqueens-solutions-using-pull, print-an-nqueens-solution, print-all-nqueens-solutions, count-all-nqueens-solutions, readme-full, readme-simple
Downloads 5096 total (13 in the last 30 days)

Readme for LogicGrowsOnTrees-1.1.0.2


What is LogicGrowsOnTrees?

LogicGrowsOnTrees is a library that lets you use a standard Haskell domain specific language (MonadPlus and friends) to write logic programs (by which we mean programs that make non-deterministic choices and have guards to enforce constraints) that you can run in a distributed setting.

Could you say that again in Haskellese?

LogicGrowsOnTrees provides a logic programming monad designed for distributed computing; specifically, it takes a logic program (written using MonadPlus), represents it as a (lazily generated) tree, and then explores the tree in parallel.

What do you mean by "distributed"?

By "distributed" I mean parallelization that does not require shared memory but only some form of communication. In particular, there is a sibling package to this one that provides an adapter for MPI, which gives you immediate access to large numbers of nodes on most supercomputers. In fact, the following is the result of an experiment to see how well the time needed to solve the N-Queens problem scales with the number of workers for N=17, N=18, and N=19 on a local cluster:

[Figure: solution time versus number of workers for N=17, N=18, and N=19]

The above was obtained by running a job, which counts the number of solutions, three times for each number of workers and problem size, and then taking the shortest time of each set of three*; the maximum number of workers for this experiment (256) was limited by the size of the cluster. From the above plot we see that scaling is generally good with the exception of the N=18 case for 128 workers and above, which is not necessarily a big deal since the total running time is under 10 seconds.

* All of the data points for each value of N were usually within a small percentage of one another, save for (oddly) the left-most data point (i.e., the one with the fewest workers) for each problem size, which varied from 150%-200% of the best time; the full data set is available in the scaling/ directory.

When would I want to use this package?

This package is useful when you have a large space that can be defined efficiently using a logic program that you want to explore to satisfy some goal, such as finding all elements, counting the number of elements, finding just one or a few elements, etc.

LogicGrowsOnTrees is particularly useful when your solution space has a lot of structure, as it gives you full control over the non-deterministic choices that are made. This lets you entirely avoid making choices that you know will end in failure, and it lets you factor out symmetries so that only one solution is generated out of each equivalence class. For example, if permutations of a solution are considered equivalent, then you can factor out this symmetry by only choosing later parts of a potential solution that are greater than earlier parts, as sketched below.
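As an illustration of factoring out a permutation symmetry (a hypothetical sketch, not taken from the package; increasingSubsets is my own name), the following chooses the elements of a subset in strictly increasing order, so each subset is generated exactly once rather than once per permutation:

import Control.Monad (MonadPlus, msum)

-- Choose a k-element subset of [lo..hi] with its elements in increasing order.
-- (Hypothetical example, not from the package.)
increasingSubsets :: MonadPlus m => Int -> Int -> Int -> m [Int]
increasingSubsets 0 _  _  = return []
increasingSubsets k lo hi = do
    x  <- msum (map return [lo .. hi])          -- nondeterministically pick the next element
    xs <- increasingSubsets (k - 1) (x + 1) hi  -- later elements must be strictly larger
    return (x : xs)

Run in the list monad, increasingSubsets 2 1 3 produces [[1,2],[1,3],[2,3]] rather than all six ordered pairs; run as a Tree, it produces a correspondingly smaller search tree.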

What does a program written using this package look like?

The following is an example of a program (also given in examples/readme-simple.hs) that counts the number of solutions to the n-queens problem for a board size of 10:

NOTE: I have optimized this code to be (hopefully) easy to follow, rather than to be fast.

import Control.Monad
import qualified Data.IntSet as IntSet

import LogicGrowsOnTrees
import LogicGrowsOnTrees.Parallel.Main
import LogicGrowsOnTrees.Parallel.Adapter.Threads
import LogicGrowsOnTrees.Utils.Word_
import LogicGrowsOnTrees.Utils.WordSum

-- Code that counts all the solutions for a given input board size.
nqueensCount 0 = error "board size must be positive"
nqueensCount n =
    -- Start with...
    go n -- ...n queens left...
       0 -- ... at row zero...
       -- ... with all columns available ...
       (IntSet.fromDistinctAscList [0..fromIntegral n-1])
       IntSet.empty -- ... with no occupied negative diagonals...
       IntSet.empty -- ... with no occupied positive diagonals.
  where
    -- We have placed the last queen, so this is a solution!
    go 0 _ _ _ _ = return (WordSum 1)

    -- We are still placing queens.
    go n
       row
       available_columns
       occupied_negative_diagonals
       occupied_positive_diagonals
     = do
        -- Pick one of the available columns.
        column <- allFrom $ IntSet.toList available_columns

        -- See if this spot conflicts with another queen on the negative diagonal.
        let negative_diagonal = row + column
        guard $ IntSet.notMember negative_diagonal occupied_negative_diagonals

        -- See if this spot conflicts with another queen on the positive diagonal.
        let positive_diagonal = row - column
        guard $ IntSet.notMember positive_diagonal occupied_positive_diagonals

        -- This spot is good!  Place a queen here and move on to the next row.
        go (n-1)
           (row+1)
           (IntSet.delete column available_columns)
           (IntSet.insert negative_diagonal occupied_negative_diagonals)
           (IntSet.insert positive_diagonal occupied_positive_diagonals)

main =
    -- Explore the tree generated (implicitly) by nqueensCount in parallel.
    simpleMainForExploreTree
        -- Use threads for parallelism.
        driver

        -- Function that processes the result of the run.
        (\(RunOutcome _ termination_reason) -> do
            case termination_reason of
                Aborted _ -> error "search aborted"
                Completed (WordSum count) -> putStrLn $ "found " ++ show count ++ " solutions"
                Failure _ message -> error $ "error: " ++ message
        )

        -- The logic program that generates the tree to explore.
        (nqueensCount 10)

This program requires that the number of threads be specified via -n # on the command line, where # is the number of threads. You can use -c to have the program create a checkpoint file on a regular basis and -i to set how often the checkpoint is made (defaults to once per minute); if the program starts up and sees the checkpoint file then it automatically resumes from it. To find out more about the available options, use --help which provides an automatically generated help screen.

The above uses threads for parallelism, which means that you have to compile it with the -threaded option. If you want to use processes instead of threads (which can be more efficient, as it avoids the additional overhead of the threaded runtime), then install LogicGrowsOnTrees-processes and replace Threads with Processes in the adapter import. If you want workers to run on different machines, then install LogicGrowsOnTrees-network and replace Threads with Network. If you have access to a cluster with a large number of nodes, you will want to install LogicGrowsOnTrees-MPI and replace Threads with MPI.

If you would prefer that the problem size be specified at run-time via a command-line argument rather than hard-coded at compile time, then you can use the more general mechanism illustrated as follows (a complete listing is given in examples/readme-full.hs):

import Control.Applicative
import System.Console.CmdTheLine
...
main =
    -- Explore the tree generated (implicitly) by nqueensCount in parallel.
    mainForExploreTree
        -- Use threads for parallelism.
        driver

        -- Use a single positional required command-line argument to get the board size.
        (getWord
         <$>
         (required
          $
          pos 0
            Nothing
            posInfo
              { posName = "BOARD_SIZE"
              , posDoc = "board size"
              }
         )
        )

        -- Information about the program (for the help screen).
        (defTI { termDoc = "count the number of n-queens solutions for a given board size" })

        -- Function that processes the result of the run.
        (\n (RunOutcome _ termination_reason) -> do
            case termination_reason of
                Aborted _ -> error "search aborted"
                Completed (WordSum count) -> putStrLn $
                    "for a size " ++ show n ++ " board, found " ++ show count ++ " solutions"
                Failure _ message -> error $ "error: " ++ message
        )

        -- The logic program that generates the tree to explore.
        nqueensCount

Where can I learn more?

Read TUTORIAL.md for a tutorial of how to write and run logic programs using this package, USERS_GUIDE.md for a more detailed explanation of how things work, and the haddock documentation available at http://hackage.haskell.org/package/LogicGrowsOnTrees.

What platforms does it support?

The following three packages have been tested on Linux, OSX, and Windows using the latest Haskell Platform (2013.2.0.0):

  • LogicGrowsOnTrees (+ Threads adapter)

  • LogicGrowsOnTrees-processes

  • LogicGrowsOnTrees-network

LogicGrowsOnTrees-MPI has been tested as working on Linux and OSX using OpenMPI, and since it only uses very basic functionality (just sending, probing, and receiving messages) it should work on any MPI implementation.

(I wasn't able to try Microsoft's MPI implementation because it would only let me install the 64-bit version, as my test machine was 64-bit, while Haskell on Windows is only 32-bit.)

Why would I use this instead of Cloud Haskell?

This package is higher level than Cloud Haskell in that it takes care of all the work of parallelizing your logic program for you. In fact, if one wished one could potentially write an adapter for LogicGrowsOnTrees that lets one use Cloud Haskell as the communication layer.

Why would I use this instead of MapReduce?

MapReduce and LogicGrowsOnTrees can both be viewed (in a very rough sense) as mapping a function over a large data set and then performing a reduction on it. The primary difference between them is that MapReduce is optimized for the case where you have a huge data set that already exists (which means in particular that optimizing I/O operations is a big deal), whereas LogicGrowsOnTrees is optimized for the case where your data set needs to be generated on the fly using a (possibly quite expensive) operation that involves making many non-deterministic choices some of which lead to dead-ends (that produce no results). Having said that, LogicGrowsOnTrees can also be used like MapReduce by having your function generate data by reading it from files or possibly from a database.

Why would I use this instead of a SAT/SMT/CLP/etc. solver?

First, it should be mentioned that one could use LogicGrowsOnTrees to implement these solvers. That is, a solver could be written that uses the mplus function whenever it needs to make a non-deterministic choice (e.g., when guessing whether a boolean variable should be true or false) and mzero to indicate failure (e.g., when it has become clear that a particular set of choices cannot result in a valid solution), and then the solver gets to use the parallelization framework of this package for free! (For an example of such a solver, see the incremental-sat-solver package (which was not written by me).)

Having said that, if your problem can most easily and efficiently be expressed as an input to a specialized solver, then this package might not be as useful to you. However, even in this case you might still want to consider using this package if there are constraints that you cannot express easily or efficiently using one of the specialized solvers, because this package gives you complete control over how choices are made. This means that you can, for example, enforce a constraint by only making choices that are guaranteed to satisfy it, rather than generating choices that may or may not satisfy it and then performing an additional step to filter out all the ones that don't.
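As a small illustration of the difference (a hypothetical sketch, not taken from the package's examples; both function names are my own): each function below picks a column for the next queen, but the second never even generates the conflicting candidates.

import Control.Monad (MonadPlus, guard)
import qualified Data.IntSet as IntSet

import LogicGrowsOnTrees (allFrom)

-- Generate-and-filter: propose every column, then discard the unavailable ones.
-- (Hypothetical example, not from the package.)
chooseColumnFiltered :: MonadPlus m => Int -> IntSet.IntSet -> m Int
chooseColumnFiltered board_size available = do
    column <- allFrom [0 .. board_size - 1]
    guard (column `IntSet.member` available)
    return column

-- Constraint by construction: only the available columns are ever proposed,
-- so no failing branches exist in the first place.
chooseColumnDirect :: MonadPlus m => IntSet.IntSet -> m Int
chooseColumnDirect available = allFrom (IntSet.toList available)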

What is the overhead of using LogicGrowsOnTrees?

With a single worker thread, using LogicGrowsOnTrees costs up to roughly twice as much time as using the List monad. Fortunately, it is possible to eliminate most of this overhead if you can switch to using the List monad near the bottom of the tree. For example, my optimized n-queens solver switches to a loop in C when fewer than eleven queens remain to be placed. This is not "cheating" for two reasons: first, the hard part is the symmetry-breaking code, which would have been difficult to implement and test in C due to its complexity, and second, one cannot rewrite all of the code in C, because then one would lose access to the automatic checkpointing and parallelization features.
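The following is a hypothetical sketch of that hybrid approach on a toy problem (countBitStrings and the cutoff are my own; exploreTree is the same assumption as earlier): the upper levels of the tree are built from nondeterministic choices, so they can be parallelized and checkpointed, while the last few levels are enumerated in the plain list monad and contribute a single pre-summed leaf.

import LogicGrowsOnTrees (Tree, allFrom)
import LogicGrowsOnTrees.Utils.WordSum (WordSum(..))

-- Count all bit strings of length n, switching to the list monad for the
-- bottom `cutoff` levels of the tree.  (Hypothetical example.)
countBitStrings :: Int -> Tree WordSum
countBitStrings = go
  where
    cutoff = 3
    go k
      | k <= cutoff =
          -- Finish the remaining levels in the list monad and return the count.
          return (WordSum (fromIntegral (length (sequence (replicate k [False, True])))))
      | otherwise = do
          _bit <- allFrom [False, True]  -- one nondeterministic choice = one branch
          go (k - 1)

Exploring countBitStrings 20 should then yield WordSum (2^20), while only the top seventeen levels are represented as Tree branch points.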

Why Haskell?

Haskell has many strengths that made it ideal for this project:

  1. Laziness

    Haskell has lazy* evaluation, which means that it does not evaluate anything until the value is required to make progress; this means that ordinary functions can act as control structures. In particular, when you use mplus a b to signal a non-deterministic choice, neither a nor b will be evaluated unless the corresponding branch of the decision tree is actually explored (a tiny illustration of this follows after this list). This is very powerful because it allows us to explore the decision tree of a logic program as much or as little as we want, and to pay only for the parts that we choose to explore.

    * Technically Haskell is "non-strict" rather than "lazy", which means there might be times in practice when it evaluates something more than is strictly needed.

  2. Purity

    Haskell is a pure language, which means that functions have no (observable) side-effects other than returning a value*; in particular, this implies that all operations on data must be immutable, which means that they result in a new value (that may reference parts or even all of the old value) rather than modifying the old value. This is an incredible boon because it means that when we backtrack up to explore another branch of the decision tree we do not have to perform an undo operation to restore the old values from the new values because the old values were never lost! All you have to do is "forget" about the new values and you are done. Furthermore, most data structures in Haskell are designed to have efficient immutable operations which try to re-use as much of an old value as possible in order to minimize the amount of copying needed to construct the new value.

    (Having said all of this, although it is strongly recommended that your logic program be pure by making it have type Tree, as this will cause the type system to enforce purity, you can add various kinds of side-effects by using type TreeT instead; a time when it might make sense to do this is if there is a data set that will be constant over the run which is large enough that you want to read it in from various files or a database as you need it. In general if you use side-effects then they need to be non-observable, which means that they are not affected by the order in which the tree is explored or whether particular parts of the tree are explored more than once.)

    * Side-effects are implemented by, roughly speaking, having some types represent actions that cause side-effects when executed.

  3. Powerful static type system

    When writing a very complicated program you want as much help as possible in making it correct, and Haskell's powerful type system helps you a lot here by harnessing the power of static analysis to ensure that all of the parts fit together correctly and to enforce invariants that you have encoded in the type system.
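Here is the tiny illustration of the laziness point promised above, using the list monad as the MonadPlus (my own example, independent of the package):

import Control.Monad (mplus)

-- Only the branch we actually demand is evaluated; the undefined in the
-- second branch is never touched because we stop after the first result.
firstBranchOnly :: [Int]
firstBranchOnly = take 1 (return 1 `mplus` return undefined)
-- firstBranchOnly == [1]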

I have more questions!

Then please contact the author (Gregory Crosswhite) at gcrosswhite@gmail.com! :-)