streamly-archive: Stream data from archives using the streamly library.


Versions [RSS] 0.0.1, 0.0.2, 0.1.0, 0.2.0, 0.3.0
Change log ChangeLog.md
Dependencies base (>=4.7 && <5), bytestring (>=0.10.10.0 && <0.12), containers (>=0.6.2.1 && <0.8), streamly (>=0.10.0 && <0.11), streamly-core (>=0.2.0 && <0.3) [details]
License BSD-3-Clause
Copyright 2024 Shlok Datye
Author Shlok Datye
Maintainer sd-haskell@quant.is
Category Archive, Codec, Streaming, Streamly
Home page https://github.com/shlok/streamly-archive#readme
Bug tracker https://github.com/shlok/streamly-archive/issues
Source repo head: git clone https://github.com/shlok/streamly-archive
Uploaded by shlok at 2024-10-07T03:48:06Z

Readme for streamly-archive-0.3.0


streamly-archive


Stream data from archives (tar, tar.gz, zip, or any other format supported by libarchive) using the Haskell streamly library.

Requirements

Install libarchive on your system.

  • Debian Linux: sudo apt-get install libarchive-dev.
  • macOS: brew install libarchive.

Quick start

{-# LANGUAGE ScopedTypeVariables #-}
{-# LANGUAGE TypeApplications #-}

module Main where

import Crypto.Hash
import Data.ByteString (ByteString)
import Data.Function
import Data.Functor
import Data.Maybe
import Streamly.Data.Fold (Fold)
import qualified Streamly.Data.Fold as F
import qualified Streamly.Data.Stream.Prelude as S
import Streamly.External.Archive

main :: IO ()
main = do
  -- A fold for converting each archive entry (which is a Header followed by
  -- zero or more ByteStrings) into a path and corresponding SHA-256 hash
  -- (Nothing for no data).
  let entryFold :: Fold IO (Either Header ByteString) (String, Maybe String) =
        F.foldlM'
          ( \(mpath, mctx) e ->
              case e of
                Left h -> do
                  mpath' <- headerPathName h
                  return (mpath', mctx)
                Right bs ->
                  return
                    ( mpath,
                      Just . (`hashUpdate` bs) $
                        fromMaybe (hashInit @SHA256) mctx
                    )
          )
          (return (Nothing, Nothing))
          <&> ( \(mpath, mctx) ->
                  ( show $ fromMaybe (error "path expected") mpath,
                    show . hashFinalize <$> mctx
                  )
              )

  -- Execute the stream, grouping at the headers (the Lefts) using the above
  -- fold, and output the paths and SHA-256 hashes along the way.
  S.unfold readArchive (id, "/path/to/archive.tar.gz")
    & groupByLeft entryFold
    & S.mapM print
    & S.fold F.drain
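
To make the "grouping at the headers" idea concrete, here is a minimal list-based sketch that depends only on base. The name groupAtLefts is hypothetical and is not part of this library. It only models the shape of the data: every Left starts a new entry, and the Rights that follow belong to it. It is not how groupByLeft is implemented, and unlike the streaming version it holds all chunks in memory.

```haskell
module Main where

import Data.List (foldl')

-- Hypothetical list-based model of grouping at headers: each Left starts a
-- new group, and every following Right is appended to the current group.
groupAtLefts :: [Either h c] -> [(h, [c])]
groupAtLefts = reverse . map (fmap reverse) . foldl' step []
  where
    step acc (Left h) = (h, []) : acc
    step ((h, cs) : rest) (Right c) = (h, c : cs) : rest
    -- Data before any header has no group to join; this model drops it.
    step [] (Right _) = []

main :: IO ()
main = do
  -- Stand-in for an archive stream: Strings play the role of both the
  -- Header and the ByteString chunks.
  let sample :: [Either String String]
      sample =
        [ Left "a.txt", Right "he", Right "llo",
          Left "dir/",
          Left "b.txt", Right "x"
        ]
  mapM_ print (groupAtLefts sample)
```

An entry with no data, such as the directory in the sample, produces an empty group. This corresponds to the Nothing hash case in the quick start fold.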

Benchmarks

See ./bench/README.md. Summary (with rough figures from our machine):

  • For 1-byte files, this library has roughly a 70 ns/byte overhead compared to plain Haskell IO code, which in turn has roughly an 895 ns/byte overhead compared to plain C.
  • For larger (> 10 KB) files, this library performs just as well as plain Haskell IO code, which has roughly a 0.15 ns/byte overhead compared to plain C.

July 2024; NixOS 22.11; Intel i7-12700K (3.6 GHz, 12 cores); Corsair VENGEANCE LPX DDR4 RAM 64GB (2 x 32GB) 3200MHz; Samsung 970 EVO Plus SSD 2TB (M.2 NVMe).
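
The 1-byte figures in the summary can be combined in a small worked calculation. This sketch assumes that each overhead is an absolute per-byte difference, so the two add up. The constant names are hypothetical and the numbers are the rough ones quoted above, not new measurements.

```haskell
module Main where

-- Rough per-byte overheads for 1-byte files, copied from the summary.
libraryOverHaskellIO :: Double
libraryOverHaskellIO = 70 -- ns/byte, this library vs. plain Haskell IO

haskellIOOverC :: Double
haskellIOOverC = 895 -- ns/byte, plain Haskell IO vs. plain C

-- Assumption: both figures are absolute differences, so they are additive.
libraryOverC :: Double
libraryOverC = libraryOverHaskellIO + haskellIOOverC

main :: IO ()
main = do
  putStrLn $ "this library vs. plain C: ~" ++ show libraryOverC ++ " ns/byte"
  putStrLn $
    "share attributable to this library: ~"
      ++ show (100 * libraryOverHaskellIO / libraryOverC)
      ++ " %"
```

Under that assumption, most of the small-file overhead relative to C comes from plain Haskell IO rather than from this library.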