Introduction
------------

This command supports a generalised version of
sha1sum/sha256sum/sha512sum programs that are available on a standard
linux system. It supports generating checksum files and verifying them
for all the hashes exposed by the raaz library. The purpose of writing
this application is the following.

1. To give an example of of a non-trivial program written to use the
   raaz library.

2. To make sure that the implementations of hashes in this library are
   not too off in terms of performance.

The command line options of this command is similar to that of sha1sum
and hence can be used as a replacement.

This file is a literate haskell file and hence can be compiled
directly. The text is in markdown and hence you should be able to
produce the documentation for

We start by enabling some pragmas and importing some stuff which can
be ignored.

> {-# LANGUAGE GADTs           #-}
> {-# LANGUAGE RankNTypes      #-}
> {-# LANGUAGE RecordWildCards #-}
> {-# LANGUAGE ConstraintKinds #-}

> module Command.Checksum ( checksum ) where
>
> import Control.Applicative
> import Control.Monad
> import Data.List             (intercalate)
> import Data.Monoid
> import Data.String
> import Data.Version          (showVersion)
> import System.Environment
> import System.Exit
> import System.IO             (stdin, stderr, hPutStrLn)
> import System.Console.GetOpt

> import Raaz     hiding (Result)
> import Raaz.Hash.Sha1


Verification Tokens
-------------------

Programs like sha1sum is typically used to verify that the contents of
a set of files have not been modified or corrupted. This program does
the following:

1. In compute mode it computes a set of verification tokens which
   uniquely identify the contents of the file.

2. In verification mode it takes a set of tokens are verify them.

Verification tokens are computed using the cryptographic hash. We
allow the use of any of the hashes exposed by the raaz library. Thus
for us, any hash that satisfies the constraint `TokenHash` should be
usable in computing and verifying tokens.


> type TokenHash h = (Hash h, Recommendation h, Show h, IsString h)
>

The verification token is defined below. To make it opaque, we
existentially quantify over the underlying digest.

>
> data Token  = forall h . TokenHash h
>             => Token { tokenFile    :: FilePath
>                      , tokenDigest  :: h
>                      }
>


A token can be verified easily. First we define the result type

> type Result = Either FilePath FilePath
>
> verify :: Token -> IO Result
> verify (Token{..}) = do c <- (==tokenDigest) <$> hashFile tokenFile
>                         return $ if c then Right tokenFile else Left tokenFile


Computing tokens.
-----------------

To compute the verification token, we need a way to specify the
algorithm.  The following proxy helps us in this.

> data Algorithm h   = Algorithm

Here `h` varies over all the hashes supported by the library. We now
need an easy way to tabulate all the hash algorithm that we
support. Existential types comes to the rescue once more.

> data SomeAlgorithm = forall h . TokenHash h => SomeAlgorithm (Algorithm h)

Here is the table of algorithms that we support currently.

> algorithms :: [(String, SomeAlgorithm)]
> algorithms =  [ ("broken-sha1"  , SomeAlgorithm (Algorithm :: Algorithm SHA1)   )
>               , ("sha256", SomeAlgorithm (Algorithm :: Algorithm SHA256) )
>               , ("sha512", SomeAlgorithm (Algorithm :: Algorithm SHA512) )
>               -- Add new algorithms here.
>               ]

We now define the computation function. There are two variants, one for arbitrary files
and the other for standard input.

> -- | Compute the token using a given algorithm.
> token :: TokenHash h
>       => Algorithm h  -- ^ The hashing algorithm to use.
>       -> FilePath     -- ^ The file to compute the token for.
>       -> IO Token
> token algo fp = Token fp <$> hashIt algo
>   where hashIt :: TokenHash h => Algorithm h -> IO h
>         hashIt _ = hashFile fp
>
> tokenStdin :: TokenHash h => Algorithm h -> IO Token
> tokenStdin algo = Token "-" <$> hashIt algo
>   where hashIt :: TokenHash h => Algorithm h -> IO h
>         hashIt _ = hashSource stdin
>


Printed form of tokens
----------------------

To inter-operate with programs like sha1sum, we follow the same
printed notation. The appropriate show instances for token is the
following. The format is `line := digest space mode filename`. The mode
has something to do with whether it is binary or text (we always put
a space for it).

> instance Show Token where
>   show (Token{..}) = show tokenDigest ++ "  " ++ tokenFile

We also define the associated parsing function which has to take the
the underlying algorithm as a parameter.

> parse :: TokenHash h => Algorithm h -> String -> Token
> parse algo str = Token { tokenFile   = drop 2 rest
>                        , tokenDigest = parseDigest algo digest
>                        }
>   where parseDigest    :: TokenHash h => Algorithm h -> String -> h
>         parseDigest _  = fromString
>         (digest, rest) = break (==' ') str -- break at the space.


The main function.
------------------

The overall structure of the code is clear the details follow.

> checksum :: [String] -> IO ()
> checksum =  parseOpts >=> handleArgs

> handleArgs :: (Options, [FilePath])
>            -> IO ()
> handleArgs (opts@Options{..}, files) = do
>   when optHelp printHelp       -- When the help option is given print it and exit
>   flip (either badAlgorithm) optAlgo $ \ algo -> do
>     if optCheck  -- if asked to check.
>       then verifyMode opts  algo files >>= optPrintCount
>       else computeMode      algo files


> badAlgorithm :: String -> IO ()
> badAlgorithm name = errorBailout ["Bad hash algorithm " ++ name]


The compute mode.
-----------------

There are two important modes of operation for this program, _the
compute mode_ and the _verify mode_. In the compute mode, we are given
an a set of files and we need to print out the verification tokes for
those files.

> computeMode :: SomeAlgorithm  -- The algorithm to use
>             -> [FilePath]     -- files for which tokes need to be
>                               -- computed.
>             -> IO ()
> computeMode (SomeAlgorithm algo) files
>   | null files = tokenStdin algo >>= print  -- No files means compute it for stdin.
>   | otherwise  = mapM_ printToken files     -- Print the token for each file.
>   where printToken = token algo >=> print


The verification mode of the algorithm is a bit more complicated than
the compute mode. Given a list of tokens let us first read
them. Recall the tokens are listed, one per line with the digest
followed by a space followed by the filename.


> verifyMode :: Options
>            -> SomeAlgorithm
>            -> [FilePath]
>            -> IO Int
> verifyMode (Options{..}) algo files = verifyFiles algo files >>= foldM fldr (0 :: Int)
>   where fldr n = either whenFailed whenOkey
>           where whenOkey    :: FilePath -> IO Int
>                 whenOkey    = optOkey   >=> const (return n)     -- when okey do the okey action and keep the count
>                 whenFailed  = optFailed >=> const (return (n+1)) -- when failed do the failed action and increment

This function verify the token list given in a list of files. Each
file contains a list of tokens and each of these tokens have to be
verified.

> verifyFiles :: SomeAlgorithm
>             -> [FilePath]
>             -> IO [Result]
>
> verifyFiles (SomeAlgorithm algo) files
>   | null files = getContents >>= verifyTokenList
>   | otherwise  = concat <$> mapM verifyFile files
>   where
>     verifyFile      = readFile >=> verifyTokenList
>     verifyTokenList = mapM mapper . lines
>     mapper          = verify . parse algo


This function prints the help for the program.

> printHelp :: IO ()
> printHelp = do putStrLn $ usage []
>                exitSuccess


Command line parsing
--------------------

The options supported by the program is given by the following data
type. Fields should be self explanatory.

> data Options =
>   Options { optHelp    :: Bool
>           , optCheck   :: Bool
>           , optAlgo    :: Either String SomeAlgorithm
>           , optOkey    :: FilePath -> IO () -- ^ handle successful tokens
>           , optFailed  :: FilePath -> IO () -- ^ handle failed tokens.
>           , optPrintCount   :: Int -> IO () -- ^ print failure counts.
>           }


The default options for the command is as follows.

> defaultOpts =
>   Options { optHelp       = False
>           , optCheck      = False
>           , optAlgo       = Right sha512Algorithm
>           , optOkey       = \ fp -> putStrLn (fp ++ ": OK")
>           , optFailed     = \ fp -> putStrLn (fp ++ ": FAILED")
>           , optPrintCount = printCount
>           }
>   where sha512Algorithm = SomeAlgorithm (Algorithm :: Algorithm SHA512)
>         printCount n  = when (n > 0) $ do
>           putStrLn $ show n ++ " failures."
>           exitFailure
>

We use the getOpts library to parse the command lines.  The options
are summarised in the following list. The `Endo` monoid helps in
summarising the changes to the option set.

> options :: [OptDescr (Endo Options)]
> options =
>   [ Option ['h'] ["help"]    (NoArg setHelp)    "print the help"
>   , Option ['c'] ["check"]   (NoArg setCheck)   "check instead of compute"
>   , Option ['q'] ["quiet"]   (NoArg setQuiet)   "print failure only"
>   , Option ['s'] ["status"]  (NoArg setStatusOnly)
>     "no output only return status"
>   , Option ['a'] ["algo"]    (ReqArg setAlgo "HASH")
>     $ "hash algorithm to use " ++ "[" ++ algOpts ++ "]. Default sha512"
>   ]
>   where setHelp      = Endo $ \ opt -> opt { optHelp    = True }
>         setCheck     = Endo $ \ opt -> opt { optCheck   = True }
>         setAlgo  str = Endo $ \ opt -> opt { optAlgo    = a    }
>                  where a = maybe (Left str) Right $ lookup str algorithms
>         algOpts          = intercalate "|" $ map fst algorithms
>         setQuiet      = Endo $ \ opt ->  opt { optOkey   = noPrint }
>         setStatusOnly = Endo $ \ opt ->  opt { optFailed      = noPrint
>                                              , optOkey        = noPrint
>                                              , optPrintCount  = returnStatus
>                                              }
>         noPrint           = const $ return ()
>         returnStatus n
>           | n > 0         = exitFailure
>           | otherwise     = exitSuccess
>


The usage message for the program.

> usage :: [String] -> String
> usage errs
>       | null errs = usageInfo header options
>       | otherwise = "raaz checksum: " ++ unlines errs ++ usageInfo header options
>   where header ="Usage: raaz checksum [OPTIONS] FILE1 FILE2 ..."


Parsing the options.

> parseOpts :: [String] -> IO (Options, [FilePath])
> parseOpts args = case getOpt Permute options args of
>                    (o,n,[])   -> return (appEndo (mconcat o) defaultOpts, n)
>                    (_,_,errs) -> errorBailout errs

Bail out with an error message.

> errorBailout :: [String]-> IO a
> errorBailout errs = do
>   hPutStrLn stderr $ usage errs
>   exitFailure