streamly-archive: Stream data from archives using the streamly library.

[ archive, bsd3, codec, library, streaming, streamly ]

Please see the README on GitHub at https://github.com/shlok/streamly-archive#readme

Modules

Streamly
- External
  - Streamly.External.Archive
    - Internal
      - Streamly.External.Archive.Internal.Foreign

Downloads

streamly-archive-0.3.0.tar.gz (Cabal source package)
Package description (as included in the package)

Maintainer's Corner

Package maintainers

shlok

Candidates

0.0.1, 0.0.2, 0.1.0, 0.2.0, 0.3.0

Versions [RSS]	0.0.1, 0.0.2, 0.1.0, 0.2.0, 0.3.0
Change log	ChangeLog.md
Dependencies	base (>=4.7 && <5), bytestring (>=0.10.10.0 && <0.12), containers (>=0.6.2.1 && <0.8), streamly (>=0.10.0 && <0.11), streamly-core (>=0.2.0 && <0.3) [details]
License	BSD-3-Clause
Copyright	2024 Shlok Datye
Author	Shlok Datye
Maintainer	sd-haskell@quant.is
Category	Archive, Codec, Streaming, Streamly
Home page	https://github.com/shlok/streamly-archive#readme
Bug tracker	https://github.com/shlok/streamly-archive/issues
Source repo	head: git clone https://github.com/shlok/streamly-archive
Uploaded	by shlok at 2024-10-07T03:48:06Z
Distributions
Downloads	609 total (17 in the last 30 days)
Rating	2.0 (votes: 1) [estimated by Bayesian average]
Status	Docs uploaded by user; build status unknown (no reports yet)

Readme for streamly-archive-0.3.0

streamly-archive

Stream data from archives (tar, tar.gz, zip, or any other format supported by libarchive) using the Haskell streamly library.

Requirements

Install libarchive on your system.

Debian Linux: sudo apt-get install libarchive-dev
macOS: brew install libarchive

Quick start

{-# LANGUAGE ScopedTypeVariables #-}
{-# LANGUAGE TypeApplications #-}

module Main where

import Crypto.Hash
import Data.ByteString (ByteString)
import Data.Function
import Data.Functor
import Data.Maybe
import Streamly.Data.Fold (Fold)
import qualified Streamly.Data.Fold as F
import qualified Streamly.Data.Stream.Prelude as S
import Streamly.External.Archive

main :: IO ()
main = do
  -- A fold for converting each archive entry (which is a Header followed by
  -- zero or more ByteStrings) into a path and corresponding SHA-256 hash
  -- (Nothing for no data).
  let entryFold :: Fold IO (Either Header ByteString) (String, Maybe String) =
        F.foldlM'
          ( \(mpath, mctx) e ->
              case e of
                Left h -> do
                  mpath' <- headerPathName h
                  return (mpath', mctx)
                Right bs ->
                  return
                    ( mpath,
                      Just . (`hashUpdate` bs) $
                        fromMaybe (hashInit @SHA256) mctx
                    )
          )
          (return (Nothing, Nothing))
          <&> ( \(mpath, mctx) ->
                  ( show $ fromMaybe (error "path expected") mpath,
                    show . hashFinalize <$> mctx
                  )
              )

  -- Execute the stream, grouping at the headers (the Lefts) using the above
  -- fold, and output the paths and SHA-256 hashes along the way.
  S.unfold readArchive (id, "/path/to/archive.tar.gz")
    & groupByLeft entryFold
    & S.mapM print
    & S.fold F.drain
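
To make the grouping step easier to picture, here is a minimal pure-list sketch of "group at the Lefts". It is only a model under stated assumptions, not the library's implementation: a plain list stands in for the stream, a String stands in for a Header's path, a String stands in for each chunk of entry data, and a byte count replaces the SHA-256 hash. The names groupAtHeaders and summarize are hypothetical and do not exist in streamly-archive.

```haskell
module Main where

import Data.Either (isRight, rights)

-- Hypothetical stand-ins: a path plays the role of a Header,
-- and a String plays the role of one ByteString chunk.
type Path = String
type Chunk = String

-- Start a new group at every Left and collect the Rights that follow it.
-- Assumption of this model: data appearing before any header is dropped.
groupAtHeaders :: [Either Path Chunk] -> [(Path, [Chunk])]
groupAtHeaders [] = []
groupAtHeaders (Left path : rest) =
  let (body, rest') = span isRight rest
   in (path, rights body) : groupAtHeaders rest'
groupAtHeaders (Right _ : rest) = groupAtHeaders rest

-- Same result shape as the quick start's fold: Nothing when an entry has
-- no data, otherwise a summary of its chunks (here, the total length).
summarize :: (Path, [Chunk]) -> (Path, Maybe Int)
summarize (path, []) = (path, Nothing)
summarize (path, chunks) = (path, Just (sum (map length chunks)))

main :: IO ()
main =
  mapM_ (print . summarize) . groupAtHeaders $
    [ Left "dir/",
      Left "dir/a.txt",
      Right "hello ",
      Right "world",
      Left "dir/empty.txt"
    ]
```

In this model the directory and the empty file both yield Nothing, while dir/a.txt yields Just 11, which mirrors how the quick start reports Nothing for entries without data.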

Benchmarks

See ./bench/README.md. Summary (with rough figures from our machine^†):

For 1-byte files, this library has an overhead of roughly 70 ns/byte compared to plain Haskell IO code, which in turn has an overhead of roughly 895 ns/byte compared to plain C.
For larger (> 10 KB) files, this library performs just as well as plain Haskell IO code, which has an overhead of roughly 0.15 ns/byte compared to plain C.

^† July 2024; NixOS 22.11; Intel i7-12700K (3.6 GHz, 12 cores); Corsair VENGEANCE LPX DDR4 RAM 64GB (2 x 32GB) 3200MHz; Samsung 970 EVO Plus SSD 2TB (M.2 NVMe).
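
As a rough back-of-the-envelope reading of the 1-byte figures, the sketch below estimates the extra time relative to plain C for a given number of 1-byte files. It assumes the two quoted overheads simply add, which the benchmark summary does not state explicitly; the names overheadVsC and extraSeconds are hypothetical.

```haskell
module Main where

-- Rough figures quoted in the summary, in ns per byte, for 1-byte files.
libraryOverHaskellIO, haskellIOOverC :: Double
libraryOverHaskellIO = 70
haskellIOOverC = 895

-- Assumption: the overheads are additive, so library vs. plain C is the sum.
overheadVsC :: Double
overheadVsC = libraryOverHaskellIO + haskellIOOverC

-- Estimated extra seconds, relative to plain C, for n one-byte files.
extraSeconds :: Int -> Double
extraSeconds n = fromIntegral n * overheadVsC * 1e-9

main :: IO ()
main = print (extraSeconds 1000000)
```

Under that assumption, a million 1-byte files would cost on the order of one extra second compared to plain C on the quoted machine, whereas for files above 10 KB the per-byte overhead is far smaller.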