conduit-vfs: Virtual file system for Conduit; disk, pure, and in-memory impls.

[ bsd3, conduit, filesystem, io, library ] [ Propose Tags ]

The goal of this package is to provide a common library and the core implementations for things that can be made to look like filesystems. In this package, a "filesystem" is tree of nodes, where the leaf nodes are files and the rest are directories. A "directory" is defined as a node that contains other nodes, and those other nodes are each keyed by a name. A "file" is defined as a collection of (possibly empty) bytes. This package includes the core types for a Virtual File System (VFS) abstraction for conduit, along with three implementations. The implementations are "disk" (write to the underlying filesystem), "in-memory" (store files in an MVar), and "pure" (pass state in a State monad). Because of the nature of conduits, this library defaults to lazy implementations of various data structures, including lazy ByteStrings and lazy HashMaps. Any overhead in space should be more than warranted by the savings through just-in-time computations. For more information, see the README on GitHub at https://github.com/RobertFischer/vfs-conduit#README.md


[Skip to Readme]

Downloads

Maintainer's Corner

Package maintainers

For package maintainers and hackage trustees

Candidates

  • No Candidates
Versions [RSS] 0.1.0.0, 0.1.0.1, 0.1.0.2, 0.1.0.3
Change log ChangeLog.md
Dependencies base (>=4.7 && <5), bytestring (>=0.10.8.2 && <0.11), classy-prelude (>=1.5.0 && <1.6), conduit (>=1.3.1.1 && <1.4), conduit-extra (>=1.3.1.1 && <1.4), directory (>=1.3.3.0 && <1.4), exceptions (>=0.10.2 && <0.11), extra (>=1.6.16 && <1.7), filepath (>=1.4.2.1 && <1.5), monad-loops (>=0.4.3 && <0.5), mono-traversable (>=1.0.11.0 && <1.1), mtl (>=2.2.2 && <2.3), resourcet (>=1.2.2 && <1.3), text (>=1.2.3.1 && <1.3), transformers (>=0.5.6.2 && <0.6), unix (>=2.7.2.2 && <2.8), unliftio (>=0.2.10 && <0.3), unordered-containers (>=0.2.10.0 && <0.3) [details]
License BSD-3-Clause
Copyright (c)2019 Robert Fischer. All Rights Reserved. See LICENSE for liscensing terms.
Author Robert Fischer
Maintainer smokejumperit@gmail.com
Category Conduit, IO, Filesystem
Home page https://github.com/RobertFischer/vfs-conduit#readme
Bug tracker https://github.com/RobertFischer/vfs-conduit/issues
Source repo head: git clone https://github.com/RobertFischer/vfs-conduit
Uploaded by RobertFischer at 2019-06-12T18:11:08Z
Distributions
Reverse Dependencies 1 direct, 0 indirect [details]
Downloads 1668 total (12 in the last 30 days)
Rating (no votes yet) [estimated by Bayesian average]
Your Rating
  • λ
  • λ
  • λ
Status Docs available [build log]
Last success reported on 2019-06-12 [all 1 reports]

Readme for conduit-vfs-0.1.0.3

[back to package description]

Virtual File System (VFS) Conduit for Haskell

Motivation

I was writing a Haskell program to process a lot of JSON-formatted log records: far too many to hold in memory. This is a perfect situation for conduits, so I built out my pipeline. However, I encountered a bit of a hitch: many of the files were stored in .tar.gz formats, so what I wanted was to recursively navigate into the .tar.gz file and feed its embedded files into my pipeline. What would be even cooler is if I could slurp in S3 files, or files from Dropbox, or any of that, and then process them all the same way.

Concepts

This library is an abstraction of a filesystem that operates via conduits. A filesystem is defined as a particular kind of key-value store. The keys are hierarchical strings, with different levels of the hierarchy being seperated by </>. These levels are referred to as directories. Directories may be listed and navigated, but are neither directly created nor directly destroyed. The leaf values are lazy ByteString values. These values are referred to as files. Files can be listed, removed, and written. A file's path consists of the names of its parent directories, from most distant to immediate, concatenated by </>. The directories and files of a filesystem are that filesystem's nodes. A directory that contains no child nodes (ie: an empty directory) may or may not be deleted, which may cause recursive deletes of ancestor directories if the ancestor is now empty. When a file is written, any missing directories in the file's path will be created.

Provided Virtual File Systems

There are three virtual file system implementations provided as a part of this library:

  • InMemory -- This filesystem uses an MVar to store the state of the filesystem in memory. All the data for all the files are currently stored in-memory as the raw bytes: future versions of this library may persist larger files into compressed bytestrings or temp files.
  • Pure -- This filesystem uses StateT to persist the state of the filesystem. Unlike InMemory, the state is not portable and changes in state are not persisted after the monad resolves. However, this implementation is pure, which means it can be used with pure conduits or as a way to make testing faster and idempotent.
  • Disk -- This virtual file system is the traditional disk-based filesystem, with folders persisted to the operating system's filesystem.

Other Virtual File Systems

There are other VFS implementations in the works:

  • Zip -- Treat a .zip file as a filesystem, and provide conduits that will auto-expand zip entries provided by upstream VFS conduits.
  • Tar -- Treat a .tar file as a filesystem, and provide conduits that will auto-expand tar entries provided by upstream VFS conduits.
  • S3 -- Treat an AWS S3 bucket as a filesystem.
  • Twitter -- Treat Twitter as a file system. (eg: /users/AnthroPunk/status/1042294921216978944 goes to https://twitter.com/AnthroPunk/status/1042294921216978944).
  • SQL -- Treat an HDBC implementation as a filesystem. (eg: /foos/id/12345 retrieves the tuple from the foos table where the id field is 12345)

In addition, we are looking to implement a conduit which automatically uncompresses or compresses .xz, .lzma, .z, and .gz files provided by upstream VFS conduits.