# streamly-archive
[![Hackage](https://img.shields.io/hackage/v/streamly-archive.svg?style=flat)](https://hackage.haskell.org/package/streamly-archive)
![CI](https://github.com/shlok/streamly-archive/workflows/CI/badge.svg?branch=master)
Stream data from archives (tar, tar.gz, zip, or any other format [supported by libarchive](https://github.com/libarchive/libarchive/wiki/LibarchiveFormats)) using the Haskell [streamly](https://hackage.haskell.org/package/streamly) library.
## Requirements
Install libarchive on your system.
* Debian Linux: `sudo apt-get install libarchive-dev`.
* macOS: `brew install libarchive`.
## Quick start
```haskell
{-# LANGUAGE ScopedTypeVariables #-}
{-# LANGUAGE TypeApplications #-}
module Main where
import Crypto.Hash
import Data.ByteString (ByteString)
import Data.Function
import Data.Functor
import Data.Maybe
import Streamly.Data.Fold (Fold)
import qualified Streamly.Data.Fold as F
import qualified Streamly.Data.Stream.Prelude as S
import Streamly.External.Archive
main :: IO ()
main = do
-- A fold for converting each archive entry (which is a Header followed by
-- zero or more ByteStrings) into a path and corresponding SHA-256 hash
-- (Nothing for no data).
let entryFold :: Fold IO (Either Header ByteString) (String, Maybe String) =
F.foldlM'
( \(mpath, mctx) e ->
case e of
Left h -> do
mpath' <- headerPathName h
return (mpath', mctx)
Right bs ->
return
( mpath,
Just . (`hashUpdate` bs) $
fromMaybe (hashInit @SHA256) mctx
)
)
(return (Nothing, Nothing))
<&> ( \(mpath, mctx) ->
( show $ fromMaybe (error "path expected") mpath,
show . hashFinalize <$> mctx
)
)
-- Execute the stream, grouping at the headers (the Lefts) using the above
-- fold, and output the paths and SHA-256 hashes along the way.
S.unfold readArchive (id, "/path/to/archive.tar.gz")
& groupByLeft entryFold
& S.mapM print
& S.fold F.drain
```
## Benchmarks
See `./bench/README.md`. Summary (with rough figures from our machine†):
* For 1-byte files, this library has roughly a 70 ns/byte overhead compared to plain Haskell `IO` code, which has roughly a 895 ns/byte overhead compared to plain C.
* For larger (> 10 KB) files, this library performs just as good as plain Haskell `IO` code, which has roughly a 0.15 ns/byte overhead compared to plain C.
† July 2024; NixOS 22.11; Intel i7-12700K (3.6 GHz, 12 cores); Corsair VENGEANCE LPX DDR4 RAM 64GB (2 x 32GB) 3200MHz; Samsung 970 EVO Plus SSD 2TB (M.2 NVMe).