Sirkel is a DHT based on Chord
To use, compile Main.hs with ghc -threaded --make Main.hs
afterwards run as many Sirkel instances you want on the network you have.
It only supports LAN or boxes directly connected to the internet yet.
to put data into the DHT write "put " preceded by the data and press enter.
example:
put abc123
The data will the be saved in the DHT.
All nodes can now "get" the data using the SHA-1 hash of the data.
When you write the put
the output will be a list of keys. These keys are the SHA-1
hashes of the blocks that your data was chunked intoo before it was put.
to get some data back, just write get [key1, key2, key3]
where key1 ... key3
are the keys that where output from the put
.
Note that to get your data correctly the order of the keys in the get
matters. Each key represents a block of data. First each of them are retrieved from
the DHT, then they are concatenated together to form the final data. If you mess up the order,
the concatenation will be messed up and therefore also the deserialization.
In Main.hs there is a initState. The fields in this state determines how your DHT works.
initState = NodeState {
self = undefined
, fingerTable = Map.empty -- ^ The fingerTable
, blockDir = "/tmp/"
, predecessor = undefined
, timeout = 10 -- ^ The timout latency of ping
, m = 160 -- ^ The number of bits in a key, ususaly 160
, r = 5 -- ^ the number of successors
, b = 2 -- ^ the nuber of replicas
, blockSize = 10 -- ^ the number of bytes a block is
}
self
is a reference to ourselves and should be left undefined, it is populated when you initialize the DHT.
fingerTable
is our "address book". It tells us whom to ask what, and also who most likely to know what.
blockDir
is currently not used since blocks are stored in RAM
predecessor
is a reference to our predecessor in the Chord ring. It should be left empty for the same reason as self.
timeout
is not used yet but will be used to determine how long to way for a reply from the DHT.
m
is the number of bits that makes up the keyspace of the Chord ring. Note that it can not be changed without changing the hashing algorithm which is not supported yet.
r
is one of the most important parameters. It determines how many immediate successors you keep. This has consequences that we'll look on soon.
b
is the number of replicas of each block that should be kept, the node responsible for a block will make shure that its b
successors has a copy of it's blocks so that in the event of node failure they can take over the role as owners of the block.
blockSize
is the numbers of bytes a block should have. When you try to put data, it will first be chunked into blocks of this size, then each block will be hashed and put separately.
Now, more on the r
number. Each node keeps r
successors.
That means that if we are asked for the successor of a key that preceeds
our r
th successor we know that our r
th successor is the successor of
that key. We do not know anything about r
s successors though, because
we only keep a list of the first r
of us and the r
th successor is the
last in that list. Therefore, in all calls that "get" something, findSuccessors
,
getBlock
and getObject
, there is a howMany
argument. This specifies how
many successors we need back. At most we can get r
successors back since no node
keeps a list of more than that. But say if we only need the first successor of a key.
Then we call findSuccessors
with howMany = 1
which lets not just the immediate predecessor
of the key answer our call, but all the predecessors that have at-least one successor to that key.
This boils down to that the r
nodes before the key can answer. Therefore the argument howMany
lets you specify how "exact" your query is. If you choose howMany = r
then only one node in the
ring can answer that query. For howMany <= r
, r + 1 - howMany
nodes can answer your query.
This means that if you don't need that detailed information about the successors of a key, maybe
just the first one, the reliability and speed of your query increases.
This is especially important for small networks. If the r
parameter is larger than the number
of Sirkel nodes then all nodes know all others and there will be no network traffic to resolve
a query. This off-course comes to a price. You have to store the contact information about r
nodes so for a million nodes network r
must be kept at a reasonable level.
For questions or anything else: