knit-haskell: a minimal Rmarkdown sort-of-thing for haskell, by way of Pandoc


knit-haskell is a beginning attempt at bringing some of the benefits of Rmarkdown to Haskell. It includes an effects stack (using polysemy rather than mtl) which includes logging, a simplified interface to Pandoc and various writer-like effects to intersperse document building with regular code. Also included is a cache (in-memory and persisted to disk) to make caching results of long running computations simple. The cache provides tools for basic dependency tracking. Various helper functions are provided to simplify common operations, making it especially straightforward to build an HTML document from bits of markdown, latex and Lucid or Blaze html. Support is also included for including hvega visualizations and diagrams from the diagrams package. More information is available in the readme.



Properties

Versions 0.1.0.0, 0.2.0.0, 0.3.0.0, 0.4.0.0, 0.5.0.0, 0.6.0.0, 0.6.0.1, 0.7.0.0, 0.8.0.0
Change log ChangeLog.md
Dependencies aeson-pretty (>=0.8.7 && <0.9), base (>=4.12.0 && <4.15), base64-bytestring (>=1.0.0.2 && <1.2), blaze-colonnade (>=1.2.2 && <1.3), blaze-html (>=0.9.1 && <0.10), bytestring (>=0.10.8 && <0.11), case-insensitive (>=1.2.0.11 && <1.3), cereal (>=0.5.7 && <0.6), colonnade (>=1.1 && <1.3), constraints (>=0.10 && <0.13), containers (>=0.5.0 && <0.7), diagrams-lib (>=1.4 && <1.5.0.0), diagrams-svg (>=1.4.1 && <1.5.0.0), directory (>=1.3.3.0 && <1.4.0.0), doctemplates (>=0.2 && <0.9), exceptions (>=0.10.0 && <0.11), Glob (>=0.10.0 && <0.11.0), http-client (>=0.6.4 && <0.8), http-client-tls (>=0.3.5.3 && <0.4), http-types (>=0.12.3 && <0.13), hvega (>=0.2.0 && <0.11), lucid (>=2.9.11 && <2.10), monad-control (>=1.0.2 && <1.1), mtl (>=2.2.2 && <2.3), network (>=2.8.0.0 && <3.2), network-uri (>=2.6.1.0 && <2.8), pandoc (>=2.7.2 && <2.11), polysemy (>=1.3.0 && <1.4), polysemy-plugin (>=0.2.0.0 && <0.3), polysemy-zoo (>=0.6.0 && <0.8), prettyprinter (>=1.2.1 && <1.7), random (>=1.1 && <1.3), say (>=0.1.0 && <0.2), stm (>=2.4.5.1 && <2.6), streamly (>=0.7.2 && <0.7.3), streamly-bytestring (>=0.1.0 && <0.2), svg-builder (>=0.1.1 && <0.2), text (>=1.2.3 && <1.3), time (>=1.8.0 && <2.0.0), transformers-base (>=0.4.5 && <0.5) [details]
License BSD-3-Clause
Copyright 2019 Adam Conner-Sax
Author Adam Conner-Sax
Maintainer adam_conner_sax@yahoo.com
Category Text
Home page https://github.com/adamConnerSax/knit-haskell#readme
Bug tracker https://github.com/adamConnerSax/knit-haskell/issues
Source repo head: git clone https://github.com/adamConnerSax/knit-haskell
Uploaded by adamCS at 2020-07-02T18:01:03Z


Readme for knit-haskell-0.8.0.0


knit-haskell v0.8.0.0


Breaking Changes

Moving from v0.7.x.x to v0.8.x.x requires a change in how configuration parameters given to knit-html and knit-htmls are handled: they are now all placed inside a KnitConfig.
It's a trivial change to make, and it should keep your configuration future-proof as long as you build your KnitConfig by updating the default, e.g.,

myConfig :: KnitConfig
myConfig = (defaultKnitConfig $ Just "myCache") { outerLogPrefix = Just "MyReport"}

Also note that newer versions of Pandoc (2.9+) have their own breaking changes. knit-haskell can be compiled against these as well as the older versions, but there are some major changes which may affect you if you use any of the pandoc functions directly. In particular, Pandoc has now switched to using Text instead of String for most (all?) things.

Introduction

knit-haskell is an attempt to emulate parts of the RMarkdown/knitR experience in haskell. The idea is to be able to build HTML (or, perhaps, some other things Pandoc can write) inside a haskell executable.
This package wraps Pandoc and the PandocMonad, has logging facilities, and supports inserting hvega-, diagrams-, and plots-based visualizations.
All of that is handled via writer-like effects, so additions to the documents can be interspersed with regular haskell code.

As of version 0.8.0.0, the effect stack includes a couple of new features. Firstly, an "Async" effect (Polysemy.Async) for running computations concurrently. Combinators for launching a concurrent action (async), awaiting its result (await), and running some traversable structure of concurrent actions (sequenceConcurrently) are re-exported via Knit.Report. NB: Polysemy returns a Maybe a where the traditional interface returns an a. From the docs: "The Maybe returned by async is due to the fact that we can't be sure an Error effect didn't fail locally."
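Since the concurrent combinators hand back Maybe values, calling code has to decide what a missing result means. The following is a minimal sketch using only base; it does not call Polysemy at all, and the helper names allOrNothing and partitionResults are hypothetical, not part of knit-haskell. It only illustrates two ways one might collapse a structure of Maybe results such as the one sequenceConcurrently is described as returning.

```haskell
import Data.Maybe (catMaybes, isNothing)

-- | All-or-nothing: succeed only if every concurrent action produced a value.
-- (Hypothetical helper, not a knit-haskell function.)
allOrNothing :: Traversable t => t (Maybe a) -> Maybe (t a)
allOrNothing = sequenceA

-- | Best-effort: keep whatever succeeded and count how many results are missing.
-- (Hypothetical helper, not a knit-haskell function.)
partitionResults :: [Maybe a] -> ([a], Int)
partitionResults rs = (catMaybes rs, length (filter isNothing rs))
```

Which of the two is appropriate depends on whether a partial report is still useful when one of the concurrent computations fails.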

Secondly, a persistent (in-memory and on-disk) cache for "shelving" the results of computations during and between runs.
Using the default setup, anything which has a Serialize instance from the cereal package can be cached. You can use a different serializer if you so choose, but you will have to write a bit of code to bridge the serializer's interface and, depending on what the serializer encodes to, you may also have to write your own persistence functions for saving/loading that type to/from disk. See Knit.Effect.Serialize and Knit.Effect.AtomicCache for details.

If you use the cache and you are running in a version-controlled directory, you probably want to add your cache directory, specified in KnitConfig and defaulting to ".knit-haskell-cache", to ".gitignore" or equivalent.

Once data has been loaded from disk or produced once, it remains available in memory (in serialized form) via its key. The cache handles multi-threading gracefully. The in-memory cache is stored in a TVar, so only one thread may make requests at a time. If multiple threads request the same item and it is not currently in memory (a relatively common pattern if multiple analyses of the same data are run asynchronously), the first request will fetch or create the data and the rest will block until the first one gets a result, at which point the blocked threads will receive the now in-memory data and proceed.
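The "first request computes, the rest block" behaviour can be sketched in a few lines. This is a toy model, not the library's implementation: it uses MVar from base rather than the TVar mentioned above, and the names ToyCache, fetchOrCompute and demoSharedRequest are hypothetical. It assumes the computation does not throw (if it did, the waiting threads in this sketch would block forever).

```haskell
import Control.Concurrent (forkIO)
import Control.Concurrent.MVar
import Control.Monad (forM)
import Data.IORef (atomicModifyIORef', newIORef, readIORef)
import qualified Data.Map.Strict as M

-- | Toy cache: each key maps to a slot that stays empty while its value is being computed.
newtype ToyCache k v = ToyCache (MVar (M.Map k (MVar v)))

newToyCache :: IO (ToyCache k v)
newToyCache = ToyCache <$> newMVar M.empty

-- | Fetch a value, running the given computation at most once per key.
fetchOrCompute :: Ord k => ToyCache k v -> k -> IO v -> IO v
fetchOrCompute (ToyCache ref) key compute = do
  -- Hold the map lock only long enough to find an existing slot or reserve a new one.
  (slot, isOwner) <- modifyMVar ref $ \m ->
    case M.lookup key m of
      Just s  -> pure (m, (s, False))
      Nothing -> do
        s <- newEmptyMVar
        pure (M.insert key s m, (s, True))
  if isOwner
    then do
      v <- compute          -- only the thread that reserved the slot does the work
      putMVar slot v
      pure v
    else readMVar slot      -- everyone else blocks until the slot is filled

-- | Start n threads that all request the same key; return their results
-- and the number of times the expensive computation actually ran.
demoSharedRequest :: Int -> IO ([Int], Int)
demoSharedRequest n = do
  cache <- newToyCache
  runs  <- newIORef (0 :: Int)
  let expensive = atomicModifyIORef' runs (\c -> (c + 1, ())) >> pure 42
  dones <- forM [1 .. n] $ \_ -> do
    done <- newEmptyMVar
    _ <- forkIO (fetchOrCompute cache "data" expensive >>= putMVar done)
    pure done
  results <- mapM takeMVar dones
  count   <- readIORef runs
  pure (results, count)
```

In this sketch every thread receives the same value while the computation runs exactly once, because the slot is reserved under the map lock before any work starts.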

Data can be put into the cache via store, and retrieved via retrieve. Retrieval from cache does not actually retrieve the data, but a structure with a time-stamp (Maybe Time.Clock.UTCTime) and a monadic computation which can produce the data:

data WithCacheTime m a where
  WithCacheTime :: Maybe Time.UTCTime -> m a -> WithCacheTime m a

To get the data from a WithCacheTime you can use functions from the library to "ignore" the time and bind the result: ignoreCacheTime :: WithCacheTime m a -> m a or

ignoreCacheTimeM :: Monad m => m (WithCacheTime m a) -> m a
ignoreCacheTimeM = join . fmap ignoreCacheTime
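To make the separation between the time-stamp and the data concrete, here is a small self-contained model. It redefines a toy WithCacheTime locally instead of importing the library, and the accessor cacheTime and the function demo are this sketch's own names (assumptions, not confirmed library API). The point it illustrates is that the time-stamp can be inspected without running the action that yields the data.

```haskell
import Data.IORef (modifyIORef', newIORef, readIORef)
import Data.Time.Calendar (fromGregorian)
import Data.Time.Clock (UTCTime (..))

-- | Toy stand-in for the library type described above.
data WithCacheTime m a = WithCacheTime (Maybe UTCTime) (m a)

-- | Read the time-stamp without running the action (name assumed for this sketch).
cacheTime :: WithCacheTime m a -> Maybe UTCTime
cacheTime (WithCacheTime t _) = t

-- | Drop the time-stamp and keep the action.
ignoreCacheTime :: WithCacheTime m a -> m a
ignoreCacheTime (WithCacheTime _ ma) = ma

-- | Same, for a WithCacheTime that is itself the result of a monadic action.
ignoreCacheTimeM :: Monad m => m (WithCacheTime m a) -> m a
ignoreCacheTimeM mw = mw >>= ignoreCacheTime

-- | Returns (time-stamp, runs before forcing, value, runs after forcing).
demo :: IO (Maybe UTCTime, Int, String, Int)
demo = do
  runs <- newIORef (0 :: Int)
  let stamp  = UTCTime (fromGregorian 2020 7 2) 0
      -- The counter bump stands in for the cost of deserialization.
      action = modifyIORef' runs (+ 1) >> pure "payload"
      cached = WithCacheTime (Just stamp) action
  before <- readIORef runs          -- nothing has run yet
  value  <- ignoreCacheTime cached  -- running the action yields the data
  after  <- readIORef runs
  pure (cacheTime cached, before, value, after)
```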

Though direct storage and retrieval is useful, typically, one would use the cache to store the result of a long-running computation so it need only be run once. This pattern is facilitated via

retrieveOrMake :: k -> WithCacheTime m b -> (b -> m a) -> m (WithCacheTime m a)

which takes a key, a time-stamped set of dependencies of type b, and a (presumably expensive) function that takes those dependencies and produces a monadic computation for the desired result. If the requested data is cached, its time-stamp (the modification time of the file in cache, more or less) is compared to the time-stamp on the dependencies. As long as the dependencies are older than the cached data, an action producing the cached result is returned. If there is no data in the cache for that key, or the in-cache data is too old, the action producing the dependencies is "run" and those dependencies are fed to the given computation, producing the data and caching the result.

NB: The returned monadic computation is not simply the result of applying the dependencies to the given function. That computation is run, if necessary, in order to produce the data, which is then serialized and cached. The returned monadic computation is either the data produced by the given computation, put into the monad via pure, or the result pulled from the cache before it is deserialized. Running the returned computation performs the deserialization so the data can be used. This allows checking the time-stamp of data without deserializing it in order to make the case where it's never actually used more efficient.
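The decision logic described in the last two paragraphs can be sketched as a toy. Everything here is an assumption made for illustration: the cache is an in-memory Map rather than files on disk, time-stamps are ticks of a logical clock (Stamp) rather than UTCTime, show/read stand in for a real serializer, and retrieveOrMakeToy is a hypothetical name with an explicit cache argument that the library function does not have.

```haskell
import Data.IORef (IORef, atomicModifyIORef', modifyIORef', newIORef, readIORef)
import qualified Data.Map.Strict as M

-- | Logical clock tick standing in for a file modification time (an assumption).
type Stamp = Int

data WithCacheTime m a = WithCacheTime (Maybe Stamp) (m a)

-- | Toy store: key -> (time-stamp, serialized value).
data ToyCache = ToyCache
  { clock   :: IORef Stamp
  , entries :: IORef (M.Map String (Stamp, String))
  }

newToyCache :: IO ToyCache
newToyCache = ToyCache <$> newIORef 0 <*> newIORef M.empty

-- | Advance the logical clock and return the new time.
tick :: ToyCache -> IO Stamp
tick c = atomicModifyIORef' (clock c) (\t -> (t + 1, t + 1))

retrieveOrMakeToy
  :: (Show a, Read a)
  => ToyCache -> String -> WithCacheTime IO b -> (b -> IO a) -> IO (WithCacheTime IO a)
retrieveOrMakeToy c key (WithCacheTime depTime depAction) make = do
  found <- M.lookup key <$> readIORef (entries c)
  case found of
    -- Cached and no older than the dependencies: return the stamp now and
    -- defer deserialization until the returned action is run.
    -- (Nothing compares below any Just, so stamp-less dependencies never invalidate.)
    Just (t, bytes) | depTime <= Just t ->
      pure (WithCacheTime (Just t) (pure (read bytes)))
    -- Missing or stale: run the dependencies, rebuild, serialize and re-cache.
    _ -> do
      deps <- depAction
      a    <- make deps
      t    <- tick c
      modifyIORef' (entries c) (M.insert key (t, show a))
      pure (WithCacheTime (Just t) (pure a))
```

Calling retrieveOrMakeToy twice with the same key and unchanged dependencies runs the expensive function once; passing dependencies with a newer stamp forces a rebuild.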

WithCacheTime is an applicative functor, which facilitates its primary use, to store a set of dependencies and the latest time at which something which depends on them could have been computed and still be valid. As an example, suppose you have three long-running computations, the last of which depends on the first two:

longTimeA :: AData
longTimeB :: BData
longTimeC :: AData -> BData -> CData

You might approach caching this sequence thusly:

cachedA <- retrieveOrMake "A.bin" (pure ()) (const longTimeA)
cachedB <- retrieveOrMake "B.bin" (pure ()) (const longTimeB)
let cDeps = (,) <$> cachedA <*> cachedB
cachedC <- retrieveOrMake "C.bin" cDeps $ \(a, b) -> longTimeC a b

and each piece of data will get cached when this is first run. Now suppose you change the computation longTimeA. You realize that the cached data is invalid, so you delete "A.bin" from the cache. The next time this code runs, it will recompute and cache the result of longTimeA, load the BData (serialized) from cache, see that the cached version of CData is out of date, and then deserialize BData and use it and the new AData to recompute and re-cache CData. This doesn't eliminate the need for user intervention: the user still had to manually delete "A.bin" to force re-running longTimeA, but it handles the downstream work of tracking the uses of that data and recomputing where required. I've found this extremely useful.
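The way combined dependencies carry a single time-stamp can be modelled with a small Applicative instance. This is a sketch under stated assumptions, not the library's code: it assumes the combined stamp is the latest of the inputs and that pure carries no stamp, it uses an Int logical clock (Stamp) instead of UTCTime, and it runs over Identity to stay pure.

```haskell
import Data.Functor.Identity (Identity (..))

-- | Logical time-stamp standing in for UTCTime (an assumption).
type Stamp = Int

data WithCacheTime m a = WithCacheTime (Maybe Stamp) (m a)

instance Functor m => Functor (WithCacheTime m) where
  fmap f (WithCacheTime t ma) = WithCacheTime t (fmap f ma)

instance Applicative m => Applicative (WithCacheTime m) where
  -- A pure value has no time-stamp, so it can never make anything stale.
  pure x = WithCacheTime Nothing (pure x)
  -- Combining dependencies keeps the latest stamp (Nothing compares below any Just).
  WithCacheTime t1 mf <*> WithCacheTime t2 ma =
    WithCacheTime (max t1 t2) (mf <*> ma)

-- | Mirror of the cDeps expression above, with made-up stamps and payloads.
combinedDeps :: (Maybe Stamp, (String, String))
combinedDeps =
  let cachedA = WithCacheTime (Just 5) (Identity "A data")
      cachedB = WithCacheTime (Just 3) (Identity "B data")
      WithCacheTime t deps = (,) <$> cachedA <*> cachedB
  in (t, runIdentity deps)
```

Here combinedDeps evaluates to (Just 5, ("A data", "B data")): the pair inherits the newer of the two stamps, so anything cached before tick 5 would count as out of date with respect to it.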

Entries can be cleared from the cache via clear.

The cache types are flexible:

-The default key type is Text but you may use anything with an Ord and Show instance (the latter for logging). The persistence layer will need to be able to turn the key into a key in that layer, e.g., a FilePath.

-The default serializer is the cereal package but you may use another (e.g., the binary package or store).

-The default in-memory storage is a streamly array of bytes (Word8) but this can also be changed.

To change these, the user must provide a serializer capable of serializing any data-type to be stored into the desired in-memory storage type, and a persistence layer which can persist that in-memory type.
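The division of labour between the two pieces might look roughly like the following. All names here (Serializer, Persistence, showRead, inMemory, storeVia, retrieveVia) are hypothetical and are not the types in Knit.Effect.Serialize or Knit.Effect.AtomicCache; show/read stands in for a real serializer and an IORef-held Map stands in for the disk.

```haskell
import Data.IORef (modifyIORef', newIORef, readIORef)
import qualified Data.Map.Strict as M

-- | Bridges a data type and the chosen in-memory representation ct (hypothetical).
data Serializer ct a = Serializer
  { encodeWith :: a -> ct
  , decodeWith :: ct -> Either String a
  }

-- | Saves and loads the in-memory representation under a key (hypothetical).
data Persistence k ct = Persistence
  { persist :: k -> ct -> IO ()
  , restore :: k -> IO (Maybe ct)
  }

-- | A serializer to String via show/read, standing in for cereal, binary or store.
showRead :: (Show a, Read a) => Serializer String a
showRead = Serializer show decode
  where
    decode s = case reads s of
      [(a, "")] -> Right a
      _         -> Left ("cannot decode: " ++ s)

-- | A persistence layer backed by a Map in an IORef, standing in for files on disk.
inMemory :: Ord k => IO (Persistence k ct)
inMemory = do
  ref <- newIORef M.empty
  pure (Persistence (\k v -> modifyIORef' ref (M.insert k v))
                    (\k -> M.lookup k <$> readIORef ref))

-- | Serialize, then persist.
storeVia :: Serializer ct a -> Persistence k ct -> k -> a -> IO ()
storeVia s p k = persist p k . encodeWith s

-- | Restore, then deserialize; Nothing means the key was never stored.
retrieveVia :: Serializer ct a -> Persistence k ct -> k -> IO (Maybe (Either String a))
retrieveVia s p k = fmap (decodeWith s) <$> restore p k
```

The serializer and the persistence layer only have to agree on ct, which is why swapping the in-memory type means supplying both.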

Please see CacheExample for an example using the default serializer and in-memory storage type. See CacheExample2 for an identical example, but with a custom serializer (based on the store package) and using strict ByteStrings as the in-memory cache type.

Notes:

  1. Using Streamly requires some additional support for both Cereal and Polysemy. The encoding/decoding for Cereal are in this library, in Streamly.External.Cereal. The Polysemy issue is more complex: concurrent streamly streams can only be run over a monad with instances of MonadCatch and MonadBaseControl. The former is complex in Polysemy and the latter impossible, for good reason. So knit-haskell contains some helpers for Streamly streams: basically a wrapper over IO which allows use of knit-haskell logging. Concurrent streaming operations can be done over this monad and then, once the stream is serial or the result computed, that monad can be lifted into the regular knit-haskell Polysemy stack.
    See Knit.Utilities.Streamly for more details.

  2. Knit.Report, the main import, provides constraint helpers to use these effects. The clearest way to see how they are used is to look at the examples. The Cache effects are split into their own constraint helper because they have type-parameters and thus add a lot of inference complications. If you don't need them in a function, you need not specify them. Some of that inference can be improved using the polysemy-plugin in the source files where you have issues. Otherwise, you may need to use type applications when calling some functions in Knit.Report.Cache.

Supported Inputs

Examples

There are a few examples in the "examples" directory.

Notes

  1. To use the polysemy-plugin mentioned above, add "polysemy-plugin" to build-depends, and
  2. Add "ghc-options: -fplugin=Polysemy.Plugin" to your package configuration, or {-# OPTIONS_GHC -fplugin=Polysemy.Plugin #-} at the top of any source file with inference issues.

Pandoc effects and writer effects for document building are also provided.