scrapbook: collect posts from sites listed in a YAML config file, using feeds or scraping

[ library, mit, program, web ]

Modules

  • ScrapBook
    • ScrapBook.Cmd
      • ScrapBook.Cmd.Options
      • ScrapBook.Cmd.Run

Versions 0.3.2, 0.3.3, 0.5.0
Change log CHANGELOG.md
Dependencies base (>=4.7 && <5), drinkery, extensible (>=0.5), githash, rio (>=0.1.5), scrapbook, scrapbook-core (>=0.5), yaml
License MIT
Copyright 2018 MATSUBARA Nobutada
Author MATSUBARA Nobutada
Maintainer MATSUBARA Nobutada
Category Web
Home page https://github.com/matsubara0507/scrapbook#readme
Bug tracker https://github.com/matsubara0507/scrapbook/issues
Source repo head: git clone https://github.com/matsubara0507/scrapbook
Uploaded by matsubara0507 at 2020-12-05T07:40:37Z
Distributions
Executables scrapbook
Downloads 1087 total (8 in the last 30 days)
Rating (no votes yet) [estimated by Bayesian average]
Status Docs not available
All reported builds failed as of 2020-12-05

Readme for scrapbook-0.5.0

scrapbook

This is a CLI tool that collects posts from the sites listed in a YAML config file, using feeds or scraping.

Usage

  1. Clone this repository, or add the scrapbook package to extra-deps in stack.yaml.
  2. Run stack install.

e.g.

$ stack exec -- scrapbook -o "example" example/sites.yaml

Docker

$ docker run --rm -v `pwd`/example:/work matsubara0507/scrapbook scrapbook sites.yaml

Build the Docker image:

$ stack --docker build -j 1 Cabal # only needed if the build runs out of memory in Docker
$ stack --docker --local-bin-path=./bin install
$ docker build -t matsubara0507/scrapbook . --build-arg local_bin_path=./bin

Command

scrapbook [options] [input-file]
  -o DIR                --output=DIR                 Write output to DIR instead of stdout.
  -t FORMAT, -w FORMAT  --to=FORMAT, --write=FORMAT  Specify output format. default is `feed`.
                        --version                    Show version

GHCi

>> import Control.Lens ((^.))
>> import Data.Maybe
>> conf <- fromJust <$> readConfig "example/sites.yaml"
>> (Right posts) <- collect . fmap concat $ mapM (fetch . toSite) (conf ^. #sites)
>> collect $ writeFeed "example" (fromJust $ conf ^. #feed) posts
Right ()

Example

See matsubara0507/scrapbook-example.

Documentation

How to write the YAML config file:

# configuration for generating Atom feed (Optional)
feed:
  ## written to the Atom feed as the site title
  title: "Sample Site Posts"
  ## written to the Atom feed as the site URL
  baseUrl: "https://example.com"
  ## file name (Optional)
  ### if omitted, the same name as the input file is used
  name: atom.xml

# Haskeller's site configuration
sites:
    ## Title of site
  - title: "ひげメモ"
    ## Author of site
    author: matsubara0507
    ## URL of site
    url: https://matsubara0507.github.io
    ## Feed URL of site
    ### there are several fields for setting the feed URL
    ### `feed` is the basic field; it automatically branches to Atom or RSS 2.0.
    feed: https://matsubara0507.github.io/feed
  - title: "Kuro's Blog"
    author: "Hiroyuki Kurokawa"
    url: http://kurokawh.blogspot.com/
    ### `atom` is for an Atom feed.
    atom:
      ### feed url of Atom
      url: http://kurokawh.blogspot.com/feeds/posts/default
      ### attributes used as constraints on the link of each Atom entry (Optional)
      ### if omitted, the first link is chosen; if multiple attributes are set, all of them must match.
      linkAttrs:
        rel: alternate
  - title: "あどけない話"
    author: "kazu-yamamoto"
    url: http://d.hatena.ne.jp/kazu-yamamoto
    ### `rss` is for RSS 2.0 feed.
    ### set feed url.
    rss: http://d.hatena.ne.jp/kazu-yamamoto/rss2
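
The `linkAttrs` comment in the config above describes a small selection rule: with no constraints the first link of an entry is taken, and with several constraints a link must satisfy all of them. The following is a minimal, self-contained sketch of that rule as described in the comment, not scrapbook's actual implementation. The names `Attrs` and `selectLink`, and the sample URLs, are hypothetical and exist only for illustration.

```haskell
import Data.Maybe (listToMaybe)

-- Hypothetical model of an Atom <link> element: its attributes as key/value pairs.
-- This type is an assumption for illustration; it is not part of scrapbook's API.
type Attrs = [(String, String)]

-- | Pick the first link whose attributes satisfy every constraint.
-- With an empty constraint list every link matches, so the first link is chosen.
selectLink :: Attrs -> [Attrs] -> Maybe Attrs
selectLink constraints = listToMaybe . filter matches
  where
    -- A link matches only if each constrained attribute is present with the same value.
    matches link = all (\(k, v) -> lookup k link == Just v) constraints

main :: IO ()
main = do
  let links =
        [ [("rel", "self"), ("href", "http://example.com/feed")]
        , [("rel", "alternate"), ("type", "text/html"), ("href", "http://example.com/post/1")]
        ]
  -- One constraint, as in `linkAttrs: { rel: alternate }`.
  print (lookup "href" =<< selectLink [("rel", "alternate")] links)
  -- No constraints: falls back to the first link.
  print (lookup "href" =<< selectLink [] links)
  -- Two constraints that no single link satisfies together.
  print (selectLink [("rel", "self"), ("type", "text/html")] links)
```

Run as a standalone program, this prints `Just "http://example.com/post/1"`, then `Just "http://example.com/feed"`, then `Nothing`. Treating the constraints as a conjunction means that adding attributes can only narrow the set of candidate links, never widen it.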