streamly-archive: Stream data from archives using the streamly library.


Please see the README on GitHub at https://github.com/shlok/streamly-archive#readme



Properties

Versions: 0.0.1, 0.0.2, 0.1.0, 0.2.0, 0.3.0
Change log: ChangeLog.md
Dependencies: base (>=4.7 && <5), bytestring (>=0.10.10.0 && <0.12), containers (>=0.6.2.1 && <0.8), streamly (>=0.10.0 && <0.11), streamly-core (>=0.2.0 && <0.3)
License: BSD-3-Clause
Copyright: 2024 Shlok Datye
Author: Shlok Datye
Maintainer: sd-haskell@quant.is
Category: Archive, Codec, Streaming, Streamly
Home page: https://github.com/shlok/streamly-archive#readme
Bug tracker: https://github.com/shlok/streamly-archive/issues
Source repo: head: git clone https://github.com/shlok/streamly-archive
Uploaded: by shlok at 2024-10-07T03:43:59Z


Readme for streamly-archive-0.3.0


streamly-archive


Stream data from archives (tar, tar.gz, zip, or any other format supported by libarchive) using the Haskell streamly library.

Requirements

Install libarchive on your system.

Quick start

{-# LANGUAGE ScopedTypeVariables #-}
{-# LANGUAGE TypeApplications #-}

module Main where

import Crypto.Hash
import Data.ByteString (ByteString)
import Data.Function
import Data.Functor
import Data.Maybe
import Streamly.Data.Fold (Fold)
import qualified Streamly.Data.Fold as F
import qualified Streamly.Data.Stream.Prelude as S
import Streamly.External.Archive

main :: IO ()
main = do
  -- A fold for converting each archive entry (which is a Header followed by
  -- zero or more ByteStrings) into a path and corresponding SHA-256 hash
  -- (Nothing for no data).
  let entryFold :: Fold IO (Either Header ByteString) (String, Maybe String) =
        F.foldlM'
          ( \(mpath, mctx) e ->
              case e of
                Left h -> do
                  mpath' <- headerPathName h
                  return (mpath', mctx)
                Right bs ->
                  return
                    ( mpath,
                      Just . (`hashUpdate` bs) $
                        fromMaybe (hashInit @SHA256) mctx
                    )
          )
          (return (Nothing, Nothing))
          <&> ( \(mpath, mctx) ->
                  ( show $ fromMaybe (error "path expected") mpath,
                    show . hashFinalize <$> mctx
                  )
              )

  -- Execute the stream, grouping at the headers (the Lefts) using the above
  -- fold, and output the paths and SHA-256 hashes along the way.
  S.unfold readArchive (id, "/path/to/archive.tar.gz")
    & groupByLeft entryFold
    & S.mapM print
    & S.fold F.drain
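
The grouping step above may be easier to picture with a pure, list-based model. The following is a minimal sketch using only base; groupAtLefts is a hypothetical helper written for illustration, not the library's groupByLeft, which works on streams and takes a Fold. In particular, dropping any Rights that appear before the first Left is an assumption of this sketch, not documented library behavior.

```haskell
module Main where

-- Hypothetical list model of "grouping at the Lefts": every Left starts a
-- new group, and the Rights that follow it (up to the next Left) become
-- that group's payload. Rights before the first Left are dropped here;
-- that is an assumption of this sketch.
groupAtLefts :: [Either h b] -> [(h, [b])]
groupAtLefts = snd . foldr step ([], [])
  where
    -- Walking from the right, collect Rights until their Left is reached.
    step (Right b) (pending, groups) = (b : pending, groups)
    step (Left h) (pending, groups) = ([], (h, pending) : groups)

main :: IO ()
main = do
  -- Strings stand in for Headers and Ints for ByteString chunks.
  let entries :: [Either String Int]
      entries = [Left "a", Right 1, Right 2, Left "b", Left "c", Right 3]
  print (groupAtLefts entries)
  -- prints [("a",[1,2]),("b",[]),("c",[3])]
```

In this model, the entry "b" has no data, which corresponds to the Nothing hash case in the quick start fold.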

Benchmarks

See ./bench/README.md. Our benchmark machine (July 2024): NixOS 22.11; Intel i7-12700K (3.6 GHz, 12 cores); Corsair VENGEANCE LPX DDR4 RAM 64GB (2 x 32GB) 3200MHz; Samsung 970 EVO Plus SSD 2TB (M.2 NVMe).