wp-archivebot: Subscribe to a wiki's RSS feed and archive external links

[ bsd3, network, program ]

A MediaWiki wiki's RecentChanges and NewPages pages link to every new edit or article. This bot polls the corresponding RSS feeds (easier and more reliable than parsing the HTML), follows the links to each new edit or article, and then uses TagSoup to extract every off-wiki link (e.g. to http://cnn.com).
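The link-extraction step can be sketched roughly as follows. This is a minimal illustration, not the package's actual source: the function name `externalLinks` and the `wikiHost` parameter are hypothetical, and the sketch assumes that an off-wiki link is any absolute http(s) URL that does not mention the wiki's host name.

```haskell
import Data.List (isInfixOf, isPrefixOf, nub)
import Text.HTML.TagSoup (Tag (TagOpen), parseTags)

-- | Return the distinct absolute links in an HTML page that point off the wiki.
-- 'wikiHost' is a hypothetical parameter naming the wiki's own host.
externalLinks :: String -> String -> [String]
externalLinks wikiHost html =
  nub
    [ url
    | TagOpen "a" attrs <- parseTags html  -- keep only opening <a> tags
    , Just url <- [lookup "href" attrs]    -- that carry an href attribute
    , isAbsolute url                       -- skip relative (on-wiki) links
    , not (wikiHost `isInfixOf` url)       -- skip absolute links back to the wiki
    ]
  where
    isAbsolute u = "http://" `isPrefixOf` u || "https://" `isPrefixOf` u

main :: IO ()
main = mapM_ putStrLn (externalLinks "en.wikipedia.org" sample)
  where
    -- A made-up page fragment with one external and two on-wiki links.
    sample =
      concat
        [ "<a href=\"http://cnn.com\">news</a>"
        , "<a href=\"/wiki/Foo\">Foo</a>"
        , "<a href=\"http://en.wikipedia.org/wiki/Bar\">Bar</a>"
        ]
```

Matching on the host name is a deliberately crude filter; a stricter version might parse each URL and compare authorities instead.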

With this list of external links, the bot then fires off requests to http://webcitation.org/, which makes a backup of each linked page (similar to the Internet Archive, but on demand).
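A rough sketch of how one such archive request might be built and sent with the HTTP package is shown below. The endpoint path and the `url`/`email` query parameters are assumptions about WebCite's request format rather than something stated on this page, and `archiveUrl` and `archive` are hypothetical names.

```haskell
import Network.HTTP (getRequest, simpleHTTP, urlEncode)

-- | Build the archive-request URL for one external link.
-- The "/archive?url=...&email=..." shape is an assumed request format.
archiveUrl :: String -> String -> String
archiveUrl email target =
  "http://www.webcitation.org/archive?url=" ++ urlEncode target
    ++ "&email=" ++ urlEncode email

-- | Send the request and ignore the response; only the side effect
-- (the archive service fetching the page) matters here.
archive :: String -> String -> IO ()
archive email target = do
  _ <- simpleHTTP (getRequest (archiveUrl email target))
  return ()

-- Print the URL instead of sending it, so the sketch runs offline.
main :: IO ()
main = putStrLn (archiveUrl "you@example.org" "http://cnn.com")
```

Both values are percent-encoded so that a target URL containing its own `?` or `&` does not corrupt the query string.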

Example: to archive links from every article in the English Wikipedia's RecentChanges:

wp-archivebot gwern0@gmail.com 'http://en.wikipedia.org/w/index.php?title=Special:RecentChanges&feed=rss'

Versions: 0.1
Dependencies: base (>=3 && <4), feed, HTTP, network, parallel, tagsoup
Tested with: ghc ==6.10.2
License: BSD-3-Clause
Author: Gwern
Maintainer: gwern0@gmail.com
Category: Network
Uploaded: by GwernBranwen at 2009-06-04T16:31:50Z
Reverse Dependencies: 1 direct, 0 indirect
Executables: wp-archivebot
Downloads: 1033 total (1 in the last 30 days)
Status: Docs not available. All reported builds failed as of 2017-01-01 (7 reports).