feed-collect-0.1.0.0: Watch RSS/Atom feeds (and do with them whatever you like).

Safe HaskellNone
LanguageHaskell2010

Web.Feed.Collect

Contents

Description

Intro

This module allows you to run a loop which visits web feeds (RSS, Atom, etc.) periodically and reports new items using a function you provide. You can also insert control commands into the loop, such as adding and removing a feed, so that stopping and restarting it isn't required.

Both http and https URLs are supported.

The original use case which motivated the creation of this library is FunBot.

Running

The run function runs the loop, and it takes a collector callback function (as one of its parameters) to be called when new feed items are found.

Here is a simple usage example.

import Data.Time.Units
import Web.Feed.Collect

collect :: Label -> URL -> Feed -> Item -> IO ()
collect label url feed item = do
    putStrLn $ label ++ " : " ++ url
    putStrLn "Got a new feed item!"
    putStrLn $ showFeed feed
    putStrLn $ showItem item

logError :: Error -> IO ()
logError = print

feeds :: [(Label, Url)]
feeds = [("democ-now", "http://www.democracynow.org/democracynow.rss")]

main :: IO ()
main = run collect Nothing logError Nothing (1 :: Minute) 3 feeds

For quick testing, you can use one of the predefined collector functions. For example, there is collectorNull which discards the feed items and does nothing (you can use it e.g. when testing success of HTTP requests or debugging the library itself), and collectorPretty which writes a short nicely formatted entry to stdout for each new feed item. And there are more.

If new items aren't being detected correctly, you can try using runDebug in place of run. It takes and same arguments and runs the same loop, but also writes a detailed log of the detection process into a file "debug.log". You can use tail -f debug.log from your terminal to watch new log entries get appended to the file.

Using Control Commands

Now let's see how to push control commands into the loop. The 4th parameter run takes is an optional command queue. In the example above we just passed Nothing. Now let's provide an actual queue and use it.

Suppose we are writing a program with a command-line interface. It watches news feeds in the background, and can take commands from the user at the same time on the command-line. Feeds can be added, removed, etc. without restarting the program.

Assume we've written a function named parseCommand, which takes a line of user input and returns a command ready to push into the queue. The parsed commands are of type Command and are created using functions like addFeed, removeFeed and so on, which this module provides. Using our parseCommand we can write the program like this:

import Control.Concurrent (forkIO)
import Data.Time.Units
import System.IO
import Web.Feed.Collect

collect :: Label -> URL -> Feed -> Item -> IO ()
collect = collectPretty

logE :: Error -> IO ()
logE = print

feeds :: [(Label, Url)]
feeds = [("fsf-news", "https://www.fsf.org/static/fsforg/rss/news.xml")]

parseCommand :: String -> Maybe Command
parseCommand line = {- ... - }

main :: IO ()
main = do
    cqueue <- newCommandQueue
    forkIO $ run collect Nothing logE (Just cqueue) (1 :: Minute) 3 feeds
    let loop = do
        line <- getLine
        if line == "quit"
            then putStrLn "Bye!"
            else do
                case parseCommand line of
                    Just cmd -> sendCommand cqueue cmd
                    Nothing  -> putStrLn "Invalid input"
                loop
    loop

Synopsis

Types

type Label = String Source

A short name tag for a feed, for quick reference and internal use.

type Url = String Source

An HTTP or HTTPS URL of a feed or a feed item.

data Command Source

A control command sent to the feed watching loop, and affecting its behavior.

data CommandQueue Source

A queue of control commands for the run loop to execute while running.

Commands can be pushed concurrently from different threads, but there should be only one run invocation reading from the queue (i.e. although this is unlikely anyway, don't pass the same queue to multiple concurrent run calls).

data Error Source

An error occuring while reading from a news feed.

Constructors

HttpError HttpException

Error while creating an HTTP request or receiving a response.

FeedParsingFailed Label Url

Error while parsing the HTTP response body into feed content

Instances

Collectors

collectorNull :: Label -> Url -> Feed -> Item -> IO () Source

An item collector which discards the item, i.e. does nothing.

collectorPrint :: Label -> Url -> Feed -> Item -> IO () Source

An item collector which prints feed and item fields to stdout.

collectorPretty :: Label -> Url -> Feed -> Item -> IO () Source

An item collector which prints short friendly feed and item descriptions to stdout.

collectorLog :: (Label -> FilePath) -> Label -> Url -> Feed -> Item -> IO () Source

An item collector which writes friendly descriptions into a log file, determining the log file's name using the function given as the first argument (the feed label, i.e. second argument, is also used as the argument for that function).

Running

run Source

Arguments

:: (TimeUnit t, MonadIO m) 
=> (Label -> Url -> Feed -> Item -> m ())

Collector, i.e. action to perform when receiving a feed item.

-> Maybe (Label -> Url -> Feed -> [Item] -> m ())

Action to perform when receiving feed items. This is just a chance to provide an efficient shortcut instead of repeated use of the previous parameter. If there is no such shortcut, simply pass Nothing.

-> (Error -> m ())

Error logging action. It will be called when an error occurs while trying to download and parse a feed. Such an error doesn't cause anything to stop. Execution simply goes on to read the next feed, and will try the erronous feed again in the next round.

-> Maybe CommandQueue

A command queue you can use to change settings while the program runs. For example, you can add a new feed or change the interval between polls without relaunching run.

-> t

Time interval between visits of a watched feed.

-> Int

Maximal number of items to collect per visit (if more are available, they will be collected in the next visit)

-> [(Label, Url)]

List of short feed labels (for easy reference and logging), and their URLs to watch.

-> m () 

Watch feeds and perform the given action on received feed items.

runDebug :: (TimeUnit t, MonadIO m) => (Label -> Url -> Feed -> Item -> m ()) -> Maybe (Label -> Url -> Feed -> [Item] -> m ()) -> (Error -> m ()) -> Maybe CommandQueue -> t -> Int -> [(Label, Url)] -> m () Source

Like run, but writes detailed reports into a file named "debug.log". This should help you understand which feeds items are being received and why new items aren't being detected in the way you expect.

For some feeds, the new item detection method used by this library may fail. In that case, please report the issue, and then the method can be updated and extended to support more cases.

Control commands

newCommandQueue :: IO CommandQueue Source

Create a new empty command queue.

sendCommand :: CommandQueue -> Command -> IO () Source

Send a command into the queue.

sendCommands :: CommandQueue -> [Command] -> IO () Source

Send a series of commands into the queue.

addFeed :: Label -> Url -> Command Source

Add a new feed to watch.

removeFeed :: Label -> Command Source

Remove a previously added feed.

feedActive :: Label -> Bool -> Command Source

Set whether a given feed should be watched (active) or not (inactive).

setInterval :: TimeUnit t => t -> Command Source

Set the interval, in microseconds, between feed scans.

setMaxPerVisit :: Int -> Command Source

Set the maximal number of feed items to be collected per feed per scan. If more new items are found, they wait for the next scan.

Utilities

showFeed :: Feed -> String Source

A short one-line description of a feed.

showItem :: Item -> String Source

A short one-line description of a feed item.