amazonka-kendra-2.0: Amazon KendraFrontendService SDK.
Copyright: (c) 2013-2023 Brendan Hay
License: Mozilla Public License, v. 2.0.
Maintainer: Brendan Hay
Stability: auto-generated
Portability: non-portable (GHC extensions)
Safe Haskell: Safe-Inferred
Language: Haskell2010

Amazonka.Kendra.Types.SeedUrlConfiguration

Documentation

data SeedUrlConfiguration

Provides the configuration information for the seed or starting point URLs to crawl.

When selecting websites to index, you must adhere to the Amazon Acceptable Use Policy and all other Amazon terms. Remember that you must only use Amazon Kendra Web Crawler to index your own webpages, or webpages that you have authorization to index.

See: newSeedUrlConfiguration smart constructor.

Constructors

SeedUrlConfiguration' 

Fields

  • webCrawlerMode :: Maybe WebCrawlerMode

    You can choose one of the following modes:

    • HOST_ONLY – crawl only the website host names. For example, if the seed URL is "abc.example.com", then only URLs with host name "abc.example.com" are crawled.
    • SUBDOMAINS – crawl the website host names with subdomains. For example, if the seed URL is "abc.example.com", then "a.abc.example.com" and "b.abc.example.com" are also crawled.
    • EVERYTHING – crawl the website host names with subdomains and other domains that the webpages link to.

    The default mode is set to HOST_ONLY.

  • seedUrls :: [Text]

    The list of seed or starting point URLs of the websites you want to crawl.

    The list can include a maximum of 100 seed URLs.

Instances

FromJSON SeedUrlConfiguration

Defined in Amazonka.Kendra.Types.SeedUrlConfiguration

ToJSON SeedUrlConfiguration

Defined in Amazonka.Kendra.Types.SeedUrlConfiguration

Generic SeedUrlConfiguration

Defined in Amazonka.Kendra.Types.SeedUrlConfiguration

Associated Types

type Rep SeedUrlConfiguration :: Type -> Type

Read SeedUrlConfiguration

Defined in Amazonka.Kendra.Types.SeedUrlConfiguration

Show SeedUrlConfiguration

Defined in Amazonka.Kendra.Types.SeedUrlConfiguration

NFData SeedUrlConfiguration

Defined in Amazonka.Kendra.Types.SeedUrlConfiguration

Methods

rnf :: SeedUrlConfiguration -> ()

Eq SeedUrlConfiguration

Defined in Amazonka.Kendra.Types.SeedUrlConfiguration

Hashable SeedUrlConfiguration

Defined in Amazonka.Kendra.Types.SeedUrlConfiguration

type Rep SeedUrlConfiguration

Defined in Amazonka.Kendra.Types.SeedUrlConfiguration

type Rep SeedUrlConfiguration = D1 ('MetaData "SeedUrlConfiguration" "Amazonka.Kendra.Types.SeedUrlConfiguration" "amazonka-kendra-2.0-IHloXAWlYIS8YTp1gXe6J" 'False) (C1 ('MetaCons "SeedUrlConfiguration'" 'PrefixI 'True) (S1 ('MetaSel ('Just "webCrawlerMode") 'NoSourceUnpackedness 'NoSourceStrictness 'DecidedStrict) (Rec0 (Maybe WebCrawlerMode)) :*: S1 ('MetaSel ('Just "seedUrls") 'NoSourceUnpackedness 'NoSourceStrictness 'DecidedStrict) (Rec0 [Text])))

newSeedUrlConfiguration :: SeedUrlConfiguration

Create a value of SeedUrlConfiguration with all optional fields omitted.

Use generic-lens or optics to modify other optional fields.

The following record fields are available, with the corresponding lenses provided for backwards compatibility:

webCrawlerMode, seedUrlConfiguration_webCrawlerMode - You can choose one of the following modes:

  • HOST_ONLY – crawl only the website host names. For example, if the seed URL is "abc.example.com", then only URLs with host name "abc.example.com" are crawled.
  • SUBDOMAINS – crawl the website host names with subdomains. For example, if the seed URL is "abc.example.com", then "a.abc.example.com" and "b.abc.example.com" are also crawled.
  • EVERYTHING – crawl the website host names with subdomains and other domains that the webpages link to.

The default mode is set to HOST_ONLY.

seedUrls, seedUrlConfiguration_seedUrls - The list of seed or starting point URLs of the websites you want to crawl.

The list can include a maximum of 100 seed URLs.
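
As an illustration of the smart constructor and the two lenses listed above, the following is a minimal sketch rather than a definitive recipe. It assumes the lens operators come from Control.Lens (the lens package; any van Laarhoven lens library should work the same way), that the SUBDOMAINS mode is exposed as the pattern WebCrawlerMode_SUBDOMAINS following the usual amazonka 2.0 naming (the pattern is not documented on this page), and that the seed URL is a placeholder.

```haskell
{-# LANGUAGE OverloadedStrings #-}

import qualified Amazonka.Kendra.Types as Kendra
import Control.Lens ((&), (.~), (?~), (^.))

-- Start from the smart constructor, then fill in both fields via the lenses.
-- The URL is a placeholder and WebCrawlerMode_SUBDOMAINS is an assumed
-- pattern name (amazonka 2.0 convention), not taken from this page.
crawlConfig :: Kendra.SeedUrlConfiguration
crawlConfig =
  Kendra.newSeedUrlConfiguration
    & Kendra.seedUrlConfiguration_seedUrls .~ ["https://abc.example.com"]
    & Kendra.seedUrlConfiguration_webCrawlerMode ?~ Kendra.WebCrawlerMode_SUBDOMAINS

main :: IO ()
main = do
  -- Read the fields back through the same lenses.
  print (crawlConfig ^. Kendra.seedUrlConfiguration_seedUrls)
  print (crawlConfig ^. Kendra.seedUrlConfiguration_webCrawlerMode)
```

The `?~` operator wraps the mode in Just, which matches the Maybe WebCrawlerMode field type; leaving it unset keeps the field as Nothing, in which case the documented default of HOST_ONLY applies.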

seedUrlConfiguration_webCrawlerMode :: Lens' SeedUrlConfiguration (Maybe WebCrawlerMode)

You can choose one of the following modes:

  • HOST_ONLY – crawl only the website host names. For example, if the seed URL is "abc.example.com", then only URLs with host name "abc.example.com" are crawled.
  • SUBDOMAINS – crawl the website host names with subdomains. For example, if the seed URL is "abc.example.com", then "a.abc.example.com" and "b.abc.example.com" are also crawled.
  • EVERYTHING – crawl the website host names with subdomains and other domains that the webpages link to.

The default mode is set to HOST_ONLY.
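
To make the three modes concrete, here is a small, purely illustrative model of which host names would fall in scope for a given seed host. It is a simplification based only on the descriptions above, not Amazon Kendra's actual crawling logic; the Mode type and the inScope function are hypothetical names introduced for this sketch and are not part of amazonka.

```haskell
import Data.List (isSuffixOf)

-- Hypothetical local stand-in for the three documented modes
-- (deliberately not the amazonka WebCrawlerMode type).
data Mode = HostOnly | Subdomains | Everything
  deriving (Eq, Show)

-- Would a link to `host` be in scope when crawling from `seedHost`?
-- Simplified model: compares host names only, ignoring scheme, port and path.
inScope :: Mode -> String -> String -> Bool
inScope HostOnly seedHost host = host == seedHost
inScope Subdomains seedHost host =
  host == seedHost || ("." ++ seedHost) `isSuffixOf` host
inScope Everything _ _ = True

main :: IO ()
main = do
  print (inScope HostOnly "abc.example.com" "a.abc.example.com")   -- False
  print (inScope Subdomains "abc.example.com" "a.abc.example.com") -- True
  print (inScope Subdomains "abc.example.com" "example.com")       -- False
  print (inScope Everything "abc.example.com" "other.example.org") -- True
```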

seedUrlConfiguration_seedUrls :: Lens' SeedUrlConfiguration [Text]

The list of seed or starting point URLs of the websites you want to crawl.

The list can include a maximum of 100 seed URLs.
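
Because the limit of 100 seed URLs is not expressed in the [Text] type, a caller may want to check it before building a request. The following is a minimal sketch of such a client-side check; maxSeedUrls and checkSeedUrls are hypothetical helpers introduced here, not part of amazonka, and the service may apply further validation of its own.

```haskell
-- Documented upper bound on the number of seed URLs.
maxSeedUrls :: Int
maxSeedUrls = 100

-- Return the list unchanged when it is within the limit,
-- or a descriptive error message when it is too long.
checkSeedUrls :: [a] -> Either String [a]
checkSeedUrls urls
  | count > maxSeedUrls =
      Left ("too many seed URLs: " ++ show count
            ++ " (maximum " ++ show maxSeedUrls ++ ")")
  | otherwise = Right urls
  where
    count = length urls

main :: IO ()
main = do
  -- A single placeholder URL is accepted.
  putStrLn (either id (const "ok") (checkSeedUrls ["https://abc.example.com"]))
  -- A list of 101 placeholder entries is rejected with an error message.
  putStrLn (either id (const "ok") (checkSeedUrls (replicate 101 "https://abc.example.com")))
```

The helper is polymorphic in the element type, so it works on a [Text] value before it is passed to seedUrlConfiguration_seedUrls.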