This module provides all the settable options in shpider.
- stayOnDomain :: Bool -> Shpider ()
- setTimeOut :: Long -> Shpider ()
- setStartPage :: String -> Shpider ()
- getStartPage :: Shpider String
- onlyDownloadHtml :: Bool -> Shpider ()
- setCurrentPage :: Page -> Shpider ()
- getCurrentPage :: Shpider Page
- keepTrack :: Shpider ()
- addCurlOpts :: [CurlOption] -> Shpider ()
- setCurlOpts :: [CurlOption] -> Shpider ()
- setThrottle :: Maybe Int -> Shpider ()
Documentation
stayOnDomain :: Bool -> Shpider ()
Setting this to True forbids download and sendForm from accessing any site that is not on the same domain as the URL given in setStartPage.
setTimeOut :: Long -> Shpider ()
Set the CurlTimeout option. Requests will time out after this number of seconds.
setStartPage :: String -> Shpider ()
Set the start page of your shpidering antics. The start page must be an absolute URL; if it is not, this will raise an error.
getStartPage :: Shpider String
Return the starting URL, as set by setStartPage.
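The options above are typically set once at the start of a session. The following is a minimal sketch, assuming that runShpider is exported by Network.Shpider, that Shpider has a MonadIO instance so liftIO can be used, and that the shpider package is installed. The URL is a placeholder.

```haskell
import Control.Monad.IO.Class (liftIO)
import Network.Shpider          -- assumed to export runShpider
import Network.Shpider.Options  -- the option setters documented here

main :: IO ()
main = runShpider $ do
  -- The start page must be an absolute URL (placeholder address).
  setStartPage "http://example.com/"
  -- Restrict download and sendForm to the start page's domain.
  stayOnDomain True
  -- Give up on a request after 10 seconds.
  setTimeOut 10
  -- Read the start page back to confirm what was configured.
  start <- getStartPage
  liftIO (putStrLn ("Crawl starts at " ++ start))
```

Calling setStartPage before stayOnDomain keeps the configuration easy to read, since the domain restriction is defined relative to the start page.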
onlyDownloadHtml :: Bool -> Shpider ()
If onlyDownloadHtml is True, then during download, shpider will first make a HEAD request to see if the content type is text/html or application/xhtml+xml, and will make a GET request only if it is.
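To illustrate the kind of check described here, the sketch below tests whether a Content-Type header value names one of the two HTML media types. It is not shpider's own implementation; mediaType and isHtmlContentType are hypothetical helper names, and ignoring case and header parameters is an assumption of this sketch.

```haskell
import Data.Char (isSpace, toLower)
import Data.List (dropWhileEnd)

-- Hypothetical helper: reduce a Content-Type value to its bare media type
-- by dropping parameters (everything from the first ';'), trimming
-- surrounding whitespace, and lowercasing.
mediaType :: String -> String
mediaType =
  map toLower . dropWhileEnd isSpace . dropWhile isSpace . takeWhile (/= ';')

-- Hypothetical helper: True when the header names an HTML media type.
isHtmlContentType :: String -> Bool
isHtmlContentType header =
  mediaType header `elem` ["text/html", "application/xhtml+xml"]

main :: IO ()
main =
  mapM_ (print . isHtmlContentType)
    [ "text/html; charset=utf-8"
    , "Application/XHTML+XML"
    , "image/png"
    ]
```

Run as a program, this prints True, True and False for the three sample headers.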
setCurrentPage :: Page -> Shpider ()
Set the given page as the currentPage.
getCurrentPage :: Shpider Page
Return the current page.
keepTrack :: Shpider ()
When keepTrack is set, shpider will remember the pages which have been visited.
addCurlOpts :: [CurlOption] -> Shpider ()
Add CURL options to Shpider.
setCurlOpts :: [CurlOption] -> Shpider ()
Set Shpider's CURL options from scratch.
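The difference between the two functions can be sketched as follows. This assumes that runShpider is exported by Network.Shpider and that the CurlOption constructors come from the curl package's Network.Curl module. The user agent string is a placeholder.

```haskell
import Network.Curl (CurlOption (..))
import Network.Shpider          -- assumed to export runShpider
import Network.Shpider.Options

main :: IO ()
main = runShpider $ do
  -- Start from scratch: use exactly these options.
  setCurlOpts
    [ CurlFollowLocation True
    , CurlUserAgent "my-crawler/0.1"  -- placeholder user agent
    ]
  -- Add one more option on top of whatever is already configured.
  addCurlOpts [CurlVerbose True]
```

Use setCurlOpts when you want full control over the option list, and addCurlOpts when you only need to supplement the options already in place.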