This is the Holumbus-MapReduce Framework
Version 0.1.0
Stefan Schmidt sts@holumbus.org
http://holumbus.fh-wedel.de
About
-----
Holumbus is a set of Haskell libraries. This package contains the
Holumbus-MapReduce library for building and running distributed MapReduce
systems.
This library depends on the Holumbus-Distribution and Holumbus-Storage libraries.
If you want to run some of the examples, e.g. the distributed Web-Crawler
and Indexer, the Holumbus-Searchengine library must also be installed.
Contents
--------
Examples   Some example applications and utilities.
Programs   The applications you need to run a distributed MapReduce system.
source     Source code of the Holumbus-MapReduce library.
Requirements
------------
So far, this library has only been tested under Linux. Please let us know if
there are any problems under Windows or other operating systems.
The Holumbus-MapReduce library requires at least GHC 6.10 and the
following packages (available via Hackage):
containers
hslogger
directory
network
time
bytestring
binary
hxt
Holumbus-Distribution
Installation
------------
A Cabal file is provided, so Holumbus-MapReduce can be installed in the
standard Cabal way:
$ runhaskell Setup.hs configure
$ runhaskell Setup.hs build
$ runhaskell Setup.hs install # with root privileges
If you prefer the old way, use make:
$ make build
$ make install # with root privileges
Steps to get the system running
-------------------------------
1. Compile and install the Holumbus-Distribution library.
2. Compile and install the Holumbus-MapReduce framework.
If you install with Cabal, you can skip step 3, because the Master is
built and installed together with the library. Otherwise, use make:
$ make build
$ make install # with root privileges
3. Compile the Master-Program.
It is located in Programs/
$ make programs
4. Start the PortRegistry from the Holumbus-Distribution library.
5. Start the Master.
If you want to start the Master on a different machine from the PortRegistry,
you have to copy the file "registry.xml" from the tmp directory of the
PortRegistry machine to the tmp directory of the machine the Master should
run on.
$ cd Programs/Master
$ ./Master
Alternatively, start the Master program that was installed with Cabal.
Now the system itself is running, but so far you have only started the
registry for communication and the Master for the MapReduce system.
To run your own system, you have to create a MapReduce Client and a MapReduce
Worker. Here, we'll show you how to run the examples, which in turn show how
to create your own system.
6. Compile the Examples.
Some examples (e.g. the Crawler) require the Holumbus-Searchengine
framework to be installed. You can get it from the Holumbus homepage
(http://holumbus.fh-wedel.de). Please read its documentation for further
instructions and details. The Searchengine library is needed only for
these examples, not for the Holumbus-MapReduce library itself.
$ make examples # some examples require the Searchengine lib
7. Start the Workers.
At present, every worker has to run in its own directory, so create a
separate subdirectory for each worker you want to run and copy the
executable into it. If you want to run the workers on different machines,
you have to copy the file "registry.xml" from your tmp directory to the tmp
directory of each machine a worker will run on.
8. Start the Client.
Now you can start the client and wait for the results. If you want to run
the client on a different machine from the PortRegistry, you have to copy
the file "registry.xml" from the tmp directory there, too.
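The per-worker directory layout from step 7 could be prepared with a small
shell function like the one below. This is a sketch under assumptions: the
function name setup_workers and the directory names worker1, worker2, ...
are hypothetical, and the path of the worker executable depends on which
example you built.

```shell
#!/bin/sh
# Sketch only: give each worker its own directory with a copy of the
# executable. setup_workers and the workerN names are hypothetical.

# Usage: setup_workers WORKER_EXECUTABLE BASE_DIR COUNT
setup_workers() {
    binary=$1
    base=$2
    count=$3
    i=1
    while [ "$i" -le "$count" ]; do
        dir="$base/worker$i"
        # One private directory per worker, as step 7 requires.
        mkdir -p "$dir" || return 1
        cp "$binary" "$dir/" || return 1
        i=$((i + 1))
    done
}
```

Each worker would then be started from inside its own directory, for
example by changing into worker1 and running the copied executable there.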