gdelt-0.1.0.0: GDELT V2 (Global Database of Events, Language, and Tone)

Copyright: (c) Marco Zocca 2020
License: MIT
Maintainer: ocramz
Stability: experimental
Portability: portable
Safe Haskell: None
Language: Haskell2010

GDELT.V2.GKG

Description

Data: http://data.gdeltproject.org/gkg/index.html (2013 until present)

Codebook: http://data.gdeltproject.org/documentation/GDELT-Global_Knowledge_Graph_Codebook-V2.1.pdf

This codebook introduces the GDELT Global Knowledge Graph (GKG) Version 2.1, which expands GDELT’s ability to quantify global human society beyond cataloging physical occurrences towards actually representing all of the latent dimensions, geography, and network structure of the global news. It applies an array of highly sophisticated natural language processing algorithms to each document to compute a range of codified metadata encoding key latent and contextual dimensions of the document. To sum up the GKG in a single sentence, it connects every person, organization, location, count, theme, news source, and event across the planet into a single massive network that captures what’s happening around the world, what its context is and who’s involved, and how the world is feeling about it, every single day.

Synopsis

RecordId

data RecordId

GKGRECORDID

(string) Each GKG record is assigned a globally unique identifier. Unlike the EVENT system, which uses semi-sequential numbering to assign numeric IDs to each event record, the GKG system uses a date-oriented serial number. Each GKG record ID takes the form “YYYYMMDDHHMMSS-X” or “YYYYMMDDHHMMSS-TX”. The first portion of the ID is the full date and time of the 15-minute update batch in which the record was created. It is followed by a dash and then by a sequence number that counts all GKG records created as part of that update batch. Records originating from a document that was translated by GDELT Translingual have a capital “T” immediately after the dash, so English and non-English material can be filtered simply by record identifier. Thus, the fifth GKG record created as part of the update batch generated at 3:30AM on February 3, 2015 would have a GKGRECORDID of “20150203033000-5”. If it was based on a translated French-language document, it would have the ID “20150203033000-T5”. This ID can be used to uniquely identify this particular record across the entire GKG database.
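The ID layout above is simple enough to take apart by hand. What follows is a minimal sketch that depends only on base. RecordIdSketch and parseRecordIdSketch are hypothetical names and are not part of this library's API. The library's RecordId stores the date as a LocalTime in its riDate field, whereas the sketch keeps the raw 14-digit timestamp string.

```haskell
import Data.Char (isDigit)

-- | Hypothetical, simplified mirror of a GKG record ID (not the library's
-- RecordId, which stores the timestamp as a LocalTime in riDate).
data RecordIdSketch = RecordIdSketch
  { ridStamp        :: String -- YYYYMMDDHHMMSS of the 15-minute update batch
  , ridTranslingual :: Bool   -- True if the serial carries the "T" marker
  , ridSeqNo        :: Int    -- sequence number within that batch
  } deriving (Eq, Show)

-- | Parse "YYYYMMDDHHMMSS-X" or "YYYYMMDDHHMMSS-TX"; anything else is Nothing.
parseRecordIdSketch :: String -> Maybe RecordIdSketch
parseRecordIdSketch s =
    case break (== '-') s of
      (stamp, '-' : rest)
        | length stamp == 14 && all isDigit stamp ->
            let (transl, digits) = splitT rest
            in if not (null digits) && all isDigit digits
                 then Just (RecordIdSketch stamp transl (read digits))
                 else Nothing
      _ -> Nothing
  where
    -- A leading 'T' marks a record built from a GDELT Translingual document.
    splitT ('T' : ds) = (True, ds)
    splitT ds         = (False, ds)
```

For example, parseRecordIdSketch "20150203033000-T5" yields a record with ridTranslingual set to True and ridSeqNo set to 5. A string without the 14-digit prefix, or with no digits after the dash, yields Nothing.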

Constructors

RecordId 
Instances
Eq RecordId

Defined in GDELT.V2.GKG

Show RecordId

Defined in GDELT.V2.GKG

Generic RecordId

Defined in GDELT.V2.GKG

Associated Types

type Rep RecordId :: Type -> Type

Methods

from :: RecordId -> Rep RecordId x

to :: Rep RecordId x -> RecordId

type Rep RecordId

Defined in GDELT.V2.GKG

type Rep RecordId = D1 (MetaData "RecordId" "GDELT.V2.GKG" "gdelt-0.1.0.0-GKPOXl1qZjP41letUwnafg" False) (C1 (MetaCons "RecordId" PrefixI True) (S1 (MetaSel (Just "riDate") NoSourceUnpackedness NoSourceStrictness DecidedLazy) (Rec0 LocalTime) :*: (S1 (MetaSel (Just "riTranslingual") NoSourceUnpackedness NoSourceStrictness DecidedLazy) (Rec0 Bool) :*: S1 (MetaSel (Just "riSeqNo") NoSourceUnpackedness NoSourceStrictness DecidedLazy) (Rec0 Int))))

V2.1Date

v21date :: Parser (Maybe LocalTime) Source #

V2.1DATE

(integer) This is the date, in YYYYMMDDHHMMSS format, on which the news media used to construct this GKG file was published. NOTE: unlike the main GDELT event stream files, this date represents the date of publication of the document from which the information was extracted. If the article discusses events in the past, the date is NOT time-shifted as it is for the GDELT event stream. This date will be the same for all rows in a file and is redundant from a data processing standpoint, but it is provided to make it easier to load GKG files directly into an SQL database for analysis. NOTE: for some special collections this value may be 0, indicating that the field is either not applicable or not known for those materials. For example, OCR’d historical document collections may not have robust metadata on publication date. NOTE: the GKG 2.0 format still encoded this date in YYYYMMDD format, while GKG 2.1 uses YYYYMMDDHHMMSS.
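The field is a fixed-width YYYYMMDDHHMMSS string (or 0), so it can be sliced by position. The sketch below illustrates this under several assumptions and is not the library's v21date parser. It returns a hypothetical Stamp record of plain Ints rather than a LocalTime. It maps 0 to Nothing, in the spirit of the Maybe in v21date's result type; we have not checked how the library itself treats 0. It also does not check that the components form a valid date.

```haskell
import Data.Char (isDigit)

-- | Calendar and clock components of a V2.1DATE value. Hypothetical type;
-- the library's v21date parser returns Maybe LocalTime instead.
data Stamp = Stamp
  { stYear, stMonth, stDay, stHour, stMinute, stSecond :: Int }
  deriving (Eq, Show)

-- | Decode a V2.1DATE field. The value 0 (used by some special collections
-- when the publication date is unknown) maps to Nothing. So does anything
-- that is not exactly 14 digits, including the 8-digit GKG 2.0 form.
-- Component ranges (month 1-12 and so on) are not validated here.
decodeV21Date :: String -> Maybe Stamp
decodeV21Date "0" = Nothing
decodeV21Date s
  | length s == 14 && all isDigit s =
      Just (Stamp (field 0 4) (field 4 2) (field 6 2)
                  (field 8 2) (field 10 2) (field 12 2))
  | otherwise = Nothing
  where
    -- Read the len-character slice of s starting at index off.
    field off len = read (take len (drop off s))
```

An 8-digit GKG 2.0 value such as "20150203" is rejected. This is one way the format change noted above would surface when older files are processed.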

SourceCollectionIdentifier

data SourceCollectionIdentifier

V2SOURCECOLLECTIONIDENTIFIER

(integer) This is a numeric identifier that refers to the source collection the document came from and is used to interpret the DocumentIdentifier in the next column. In essence, it specifies how to interpret the DocumentIdentifier to locate the actual document. At present, it can hold one of the following values:

1 = WEB (The document originates from the open web and the DocumentIdentifier is a fully-qualified URL that can be used to access the document on the web).

2 = CITATIONONLY (The document originates from a broadcast, print, or other offline source in which only a textual citation is available for the document. In this case the DocumentIdentifier contains the textual citation for the document).

3 = CORE (The document originates from the CORE archive and the DocumentIdentifier contains its DOI, suitable for accessing the original document through the CORE website).

4 = DTIC (The document originates from the DTIC archive and the DocumentIdentifier contains its DOI, suitable for accessing the original document through the DTIC website).

5 = JSTOR (The document originates from the JSTOR archive and the DocumentIdentifier contains its DOI, suitable for accessing the original document through your JSTOR subscription if your institution subscribes to it).

6 = NONTEXTUALSOURCE (The document originates from a textual proxy, such as closed captioning, of a non-textual information source, such as a video, that is available via a URL. The DocumentIdentifier provides the URL of the non-textual original source. At present, this Collection Identifier is used for processing the closed captioning streams of the Internet Archive Television News Archive. Each broadcast in that archive is available via a URL, but the URL offers access only to the video of the broadcast and does not provide any access to the textual closed captioning used to generate the metadata. This code distinguishes URL-based textual material (Collection Identifier 1, WEB) from URL-based non-textual material such as the Television News Archive.)
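The codebook numbers the collections from 1, but a derived Enum instance numbers constructors from 0. If the library's Enum instance is derived in the constructor order shown by its Generic representation (an assumption we have not checked), converting a raw code requires an offset of one. The sketch below uses a local mirror type, CollectionSketch, with the same constructor names. The helpers fromCode, toCode and identifierIsUrl are hypothetical.

```haskell
-- | Local mirror of SourceCollectionIdentifier, with constructors in the
-- order listed by the library's Generic representation.
data CollectionSketch
  = SCIWeb
  | SCICitationOnly
  | SCICore
  | SCIDTIC
  | SCIJSTOR
  | SCINonTextualSource
  deriving (Eq, Show, Enum, Bounded)

-- | Codebook code (1-6) to constructor. A derived Enum counts from 0,
-- so the codebook value is shifted down by one.
fromCode :: Int -> Maybe CollectionSketch
fromCode n
  | n >= 1 && n <= fromEnum (maxBound :: CollectionSketch) + 1 = Just (toEnum (n - 1))
  | otherwise = Nothing

-- | Inverse of fromCode.
toCode :: CollectionSketch -> Int
toCode = (+ 1) . fromEnum

-- | True when the DocumentIdentifier is a URL (WEB or NONTEXTUALSOURCE)
-- rather than a textual citation or a DOI.
identifierIsUrl :: CollectionSketch -> Bool
identifierIsUrl c = c == SCIWeb || c == SCINonTextualSource
```

fromCode returns Maybe so that an unknown future code does not crash a parser. Calling toEnum on a derived instance with an out-of-range value raises a runtime error.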

Instances
Enum SourceCollectionIdentifier

Defined in GDELT.V2.GKG

Eq SourceCollectionIdentifier

Defined in GDELT.V2.GKG

Show SourceCollectionIdentifier

Defined in GDELT.V2.GKG

Generic SourceCollectionIdentifier

Defined in GDELT.V2.GKG

Associated Types

type Rep SourceCollectionIdentifier :: Type -> Type

type Rep SourceCollectionIdentifier

Defined in GDELT.V2.GKG

type Rep SourceCollectionIdentifier = D1 (MetaData "SourceCollectionIdentifier" "GDELT.V2.GKG" "gdelt-0.1.0.0-GKPOXl1qZjP41letUwnafg" False) ((C1 (MetaCons "SCIWeb" PrefixI False) (U1 :: Type -> Type) :+: (C1 (MetaCons "SCICitationOnly" PrefixI False) (U1 :: Type -> Type) :+: C1 (MetaCons "SCICore" PrefixI False) (U1 :: Type -> Type))) :+: (C1 (MetaCons "SCIDTIC" PrefixI False) (U1 :: Type -> Type) :+: (C1 (MetaCons "SCIJSTOR" PrefixI False) (U1 :: Type -> Type) :+: C1 (MetaCons "SCINonTextualSource" PrefixI False) (U1 :: Type -> Type))))

Counts

data CountTy

Instances
Eq CountTy

Defined in GDELT.V2.GKG

Methods

(==) :: CountTy -> CountTy -> Bool

(/=) :: CountTy -> CountTy -> Bool

Show CountTy

Defined in GDELT.V2.GKG

Generic CountTy

Defined in GDELT.V2.GKG

Associated Types

type Rep CountTy :: Type -> Type

Methods

from :: CountTy -> Rep CountTy x

to :: Rep CountTy x -> CountTy

type Rep CountTy

Defined in GDELT.V2.GKG

type Rep CountTy = D1 (MetaData "CountTy" "GDELT.V2.GKG" "gdelt-0.1.0.0-GKPOXl1qZjP41letUwnafg" False) (((C1 (MetaCons "CTAffect" PrefixI False) (U1 :: Type -> Type) :+: C1 (MetaCons "CTArrest" PrefixI False) (U1 :: Type -> Type)) :+: (C1 (MetaCons "CTKidnap" PrefixI False) (U1 :: Type -> Type) :+: C1 (MetaCons "CTKill" PrefixI False) (U1 :: Type -> Type))) :+: ((C1 (MetaCons "CTProtest" PrefixI False) (U1 :: Type -> Type) :+: C1 (MetaCons "CTSeize" PrefixI False) (U1 :: Type -> Type)) :+: (C1 (MetaCons "CTWound" PrefixI False) (U1 :: Type -> Type) :+: C1 (MetaCons "CTOther" PrefixI False) (S1 (MetaSel (Nothing :: Maybe Symbol) NoSourceUnpackedness NoSourceStrictness DecidedLazy) (Rec0 Text)))))

V1Counts

data CountsV1

V1COUNTS

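(semicolon-delimited blocks, with pound symbol (“#”) delimited fields) This is the list of counts found in the document. Each count is a semicolon-separated block, and the fields within a count are separated by the pound symbol. Unlike the primary GDELT event stream, these count records are not issued unique identifier numbers, nor are they dated.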

As an example of how to interpret this field, an entry with CountType=KILL, Number=47, ObjectType=“jihadists” indicates that the article stated that 47 jihadists were killed. This field is identical in format and population to the corresponding field in the GKG 1.0 format.

  • Count Type. (text) This is the value of the NAME field from the Category List spreadsheet, indicating which category the count belongs to. At the time of this writing, this is most often AFFECT, ARREST, KIDNAP, KILL, PROTEST, SEIZE, or WOUND. Other categories may also appear in certain circumstances, either when they occur in context with one of these categories or as new Count categories are added over time. A value of “PROTEST” in this field indicates that this is a count of the number of protesters at a protest.
  • Count. (integer) This is the actual count being reported. If CountType is “PROTEST” and Number is 126, this means that the source article contained a mention of 126 protesters.
  • Object Type. (text) This records any identifying information as to what the number refers to. For example, a mention of “20 Christian missionaries were arrested” will result in “Christian missionaries” being captured here. This field will be blank in cases where no identifying information could be found.
  • Location Type. See the documentation for V1Locations below.
  • Location FullName. See the documentation for V1Locations below.
  • Location CountryCode. See the documentation for V1Locations below.
  • Location ADM1Code. See the documentation for V1Locations below.
  • Location Latitude. See the documentation for V1Locations below.
  • Location Longitude. See the documentation for V1Locations below.
  • Location FeatureID. See the documentation for V1Locations below.
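A V1COUNTS cell can be decoded by splitting on semicolons and then on pound symbols. What follows is a minimal sketch that depends only on base. CountKind mirrors the constructor names of the library's CountTy, as listed in its Generic representation. CountSketch, countKind, parseCount and parseCounts are hypothetical names. The seven location fields are left as raw strings.

```haskell
import Data.Maybe (mapMaybe)

-- | Split a string on every occurrence of a delimiter character.
splitOn :: Char -> String -> [String]
splitOn d s = case break (== d) s of
  (chunk, [])       -> [chunk]
  (chunk, _ : rest) -> chunk : splitOn d rest

-- | Local mirror of the library's CountTy constructors; CTOther keeps
-- any category outside the usual seven.
data CountKind
  = CTAffect | CTArrest | CTKidnap | CTKill
  | CTProtest | CTSeize | CTWound | CTOther String
  deriving (Eq, Show)

countKind :: String -> CountKind
countKind t = case t of
  "AFFECT"  -> CTAffect
  "ARREST"  -> CTArrest
  "KIDNAP"  -> CTKidnap
  "KILL"    -> CTKill
  "PROTEST" -> CTProtest
  "SEIZE"   -> CTSeize
  "WOUND"   -> CTWound
  other     -> CTOther other

-- | Hypothetical record for one V1 count; the seven location fields
-- (documented under V1Locations) are kept as raw strings.
data CountSketch = CountSketch
  { csType     :: CountKind
  , csNumber   :: Int
  , csObject   :: String   -- blank when no object type was identified
  , csLocation :: [String]
  } deriving (Eq, Show)

-- | Parse one '#'-delimited count block of exactly ten fields.
parseCount :: String -> Maybe CountSketch
parseCount block = case splitOn '#' block of
  (ty : n : obj : loc)
    | length loc == 7, [(k, "")] <- reads n ->
        Just (CountSketch (countKind ty) k obj loc)
  _ -> Nothing

-- | Parse a whole V1COUNTS cell. Empty blocks are skipped, and so are
-- malformed ones; a stricter parser might report them instead.
parseCounts :: String -> [CountSketch]
parseCounts = mapMaybe parseCount . filter (not . null) . splitOn ';'
```

Take the codebook's example: a block starting "KILL#47#jihadists#" and followed by seven location fields decodes to a CountSketch with csType CTKill, csNumber 47 and csObject "jihadists". An unrecognized category name maps to CTOther instead of failing, consistent with the note that other categories may appear over time.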
Instances
Eq CountsV1

Defined in GDELT.V2.GKG

Show CountsV1

Defined in GDELT.V2.GKG

Generic CountsV1

Defined in GDELT.V2.GKG

Associated Types

type Rep CountsV1 :: Type -> Type

Methods

from :: CountsV1 -> Rep CountsV1 x

to :: Rep CountsV1 x -> CountsV1

type Rep CountsV1

Defined in GDELT.V2.GKG

V2.1Counts

data CountsV21

V2.1COUNTS

(semicolon-delimited blocks, with pound symbol (“#”) delimited fields) This field is identical to the V1COUNTS field, except that it adds a final field to the end of each entry that records the entry's approximate character offset in the document. The offset allows the entry to be associated with whichever entries from other “V2ENHANCED” fields (or Events) appear in closest proximity to it. Note: unlike the other location-related fields, the Counts field does NOT add ADM2 support at this time. This maintains compatibility with assumptions that many applications make about the contents of the Count field. Applications that need ADM2 support for Counts should cross-reference the FeatureID field of a given Count against the V2Locations field to determine its ADM2 value.
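A V2.1 count is a V1 count plus one trailing field, so it can be parsed by peeling off the last '#'-separated value as the character offset. The sketch below uses hypothetical names throughout. CountV21Sketch loosely mirrors the library's CountsV21, which has the fields c21countsV1 and c21charOffset, but keeps the V1 part as ten raw strings. The function nearest shows one simple way to associate a count with the entry from another V2ENHANCED field whose offset is closest. That proximity rule is our assumption.

```haskell
import Data.List (minimumBy)
import Data.Ord (comparing)

-- | Split a string on every occurrence of a delimiter character.
splitOn :: Char -> String -> [String]
splitOn d s = case break (== d) s of
  (chunk, [])       -> [chunk]
  (chunk, _ : rest) -> chunk : splitOn d rest

-- | Hypothetical mirror of CountsV21: the ten raw V1 count fields plus
-- the approximate character offset of the count in the document.
data CountV21Sketch = CountV21Sketch
  { cvV1Fields   :: [String]
  , cvCharOffset :: Int
  } deriving (Eq, Show)

-- | A V2.1 count block is a V1 block with one extra trailing field.
parseCountV21 :: String -> Maybe CountV21Sketch
parseCountV21 block = case splitOn '#' block of
  fields
    | length fields == 11
    , [(off, "")] <- reads (last fields) ->
        Just (CountV21Sketch (init fields) off)
  _ -> Nothing

-- | Among (offset, entry) pairs taken from another V2ENHANCED field,
-- pick the one whose offset is closest to the count's offset.
nearest :: CountV21Sketch -> [(Int, a)] -> Maybe (Int, a)
nearest _ [] = Nothing
nearest c xs = Just (minimumBy (comparing (dist . fst)) xs)
  where dist o = abs (o - cvCharOffset c)
```

Absolute distance is only one possible proximity measure. An application might instead prefer the nearest entry that precedes the count in the text.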

Constructors

CountsV21 
Instances
Eq CountsV21

Defined in GDELT.V2.GKG

Show CountsV21

Defined in GDELT.V2.GKG

Generic CountsV21

Defined in GDELT.V2.GKG

Associated Types

type Rep CountsV21 :: Type -> Type

type Rep CountsV21

Defined in GDELT.V2.GKG

type Rep CountsV21 = D1 (MetaData "CountsV21" "GDELT.V2.GKG" "gdelt-0.1.0.0-GKPOXl1qZjP41letUwnafg" False) (C1 (MetaCons "CountsV21" PrefixI True) (S1 (MetaSel (Just "c21countsV1") NoSourceUnpackedness NoSourceStrictness DecidedLazy) (Rec0 CountsV1) :*: S1 (MetaSel (Just "c21charOffset") NoSourceUnpackedness NoSourceStrictness DecidedLazy) (Rec0 Int)))

LocationV1

data LocationV1

V1LOCATIONS

(semicolon-delimited blocks, with pound symbol (“#”) delimited fields) This is a list of all locations found in the text, extracted through the Leetaru (2012) algorithm. The algorithm runs more aggressively here than usual in order to extract every possible locative referent, so the results may contain slightly more false positives.

NOTE: some locations have multiple accepted formal or informal names, and this field is collapsed on name rather than on feature, because in some applications the understanding of a geographic feature differs based on which name was used to reference it. Where it is necessary to collapse by feature, use the Geo_FeatureID column rather than the Geo_Fullname column. The Geo_Fullname column captures the name of the location as expressed in the text, so it reflects differences in transliteration, alternative spellings, and alternative names for the same location. For example, Mecca is often spelled Makkah, while Jeddah is commonly spelled Jiddah or Jaddah. The Geo_Fullname column will reflect each of these spellings, while the Geo_FeatureID column resolves each set of spellings to a single unique GNS or GNIS feature identification number. For more information on the GNS and GNIS identifiers, see Leetaru (2012).

This field is identical in format and population to the corresponding field in the GKG 1.0 format. NOTE: from 2/19/2015 through midday 3/1/2015, an error in this field caused the CountryCode field to list the wrong country code in some cases.
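Collapsing by feature, as recommended above, amounts to grouping location mentions by FeatureID instead of by FullName. Below is a minimal sketch that assumes the (FeatureID, FullName) pairs have already been extracted from each V1LOCATIONS block. collapseByFeature is a hypothetical helper. The feature IDs in the example are placeholders, not real GNS or GNIS numbers.

```haskell
import Data.Function (on)
import Data.List (groupBy, nub, sortOn)

-- | Group (FeatureID, FullName) pairs by FeatureID and keep the distinct
-- spellings seen for each feature, in order of first appearance.
collapseByFeature :: [(String, String)] -> [(String, [String])]
collapseByFeature pairs =
  [ (fid, nub (map snd grp))
  | grp@((fid, _) : _) <- groupBy ((==) `on` fst) (sortOn fst pairs)
  ]
```

Given the pairs ("feature-A", "Mecca"), ("feature-B", "Jeddah"), ("feature-A", "Makkah") and ("feature-B", "Jiddah"), the function returns [("feature-A", ["Mecca", "Makkah"]), ("feature-B", ["Jeddah", "Jiddah"])]. That is two features rather than four names.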

  • Location Type. (integer) This field specifies the geographic resolution of the match and holds one of the following values: 1=COUNTRY (match was at the country level), 2=USSTATE (match was to a US state), 3=USCITY (match was to a US city or landmark), 4=WORLDCITY (match was to a city or landmark outside the US), 5=WORLDSTATE (match was to an Administrative Division 1 outside the US, roughly equivalent to a US state). This can be used to filter counts by geographic specificity, for example to extract only those counts with landmark-level resolution for mapping. Matches with codes 1 (COUNTRY), 2 (USSTATE), and 5 (WORLDSTATE) still provide a latitude-longitude pair, namely the centroid of that country or state. For these matches, however, the FeatureID field below contains the textual country or ADM1 code instead of a numeric FeatureID.
  • Location FullName. (text) This is the full human-readable name of the matched location. For a country, it is simply the country name. For US and world states it has the format “State, Country Name”, and for all other matches it has the format “City/Landmark, State, Country”. This can be used to label locations when placing counts on a map. Note: this field reflects the precise name used to refer to the location in the text itself, so it may contain multiple spellings of the same location. Use the FeatureID column to determine whether two location names refer to the same place.
  • Location CountryCode. (text) This is the 2-character FIPS10-4 country code for the location. Note: GDELT continues to use the FIPS10-4 codes under USG guidance while GNS continues its formal transition to the successor Geopolitical Entities, Names, and Codes (GENC) Standard (the US Government profile of ISO 3166).
  • Location ADM1Code. (text) This is the 2-character FIPS10-4 country code followed by the 2-character FIPS10-4 administrative division 1 (ADM1) code for the administrative division housing the landmark. For the United States, the ADM1 part is the 2-character short form of the state’s name (such as “TX” for Texas). Note: see the notice above for CountryCode regarding the transition from FIPS10-4 to GENC. Note: to obtain ADM2 (district-level) assignments for locations, you can either perform a spatial join against a ShapeFile template in any GIS software or cross-walk the FeatureID to the GNIS/GNS databases. The cross-walk provides additional fields such as ADM2 codes and, for GNS, MGRS grid references.
  • Location Latitude. (floating point number) This is the centroid latitude of the landmark for mapping. For a country or administrative division, this is the centroid of that entire country or division.
  • Location Longitude. (floating point number) This is the centroid longitude of the landmark for mapping. For a country or administrative division, this is the centroid of that entire country or division.
  • Location FeatureID. (text OR signed integer) This is the numeric GNS or GNIS FeatureID for this location OR a textual country or ADM1 code. More information on these values can be found in Leetaru (2012). Note: for country or ADM1-level matches, this field will be blank or contain a textual ADM1 code (see above). Note: numeric GNS or GNIS FeatureIDs can be positive or negative; see Leetaru (2012) for more information.
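The seven location fields above can be decoded into typed values. In this sketch, LocTy mirrors the constructor order of the library's LocationTy as shown by its Generic representation. As with the collection codes, the 1-based Location Type is shifted by one for a derived Enum. LocSketch, parseLocation and isLandmarkLevel are hypothetical names. The FeatureID is modelled as Either String Int to capture its "text OR signed integer" nature.

```haskell
import Text.Read (readMaybe)

-- | Split a string on every occurrence of a delimiter character.
splitOn :: Char -> String -> [String]
splitOn d s = case break (== d) s of
  (chunk, [])       -> [chunk]
  (chunk, _ : rest) -> chunk : splitOn d rest

-- | Local mirror of LocationTy; codebook codes 1-5 map to these in order.
data LocTy = LTCountry | LTUSState | LTUSCity | LTWorldCity | LTWorldState
  deriving (Eq, Show, Enum, Bounded)

-- | Hypothetical record for one V1 location block.
data LocSketch = LocSketch
  { lsType      :: LocTy
  , lsFullName  :: String
  , lsCountry   :: String             -- FIPS10-4 country code
  , lsAdm1      :: String             -- FIPS10-4 country code + ADM1 code
  , lsLatLon    :: (Double, Double)   -- centroid for country/state matches
  , lsFeatureId :: Either String Int  -- textual code (or blank) vs numeric ID
  } deriving (Eq, Show)

-- | Parse one '#'-delimited V1 location block of exactly seven fields.
parseLocation :: String -> Maybe LocSketch
parseLocation block = case splitOn '#' block of
  [ty, name, cc, adm1, lat, lon, fid] -> do
    code <- readMaybe ty
    lty  <- if code >= 1 && code <= 5 then Just (toEnum (code - 1)) else Nothing
    la   <- readMaybe lat
    lo   <- readMaybe lon
    -- Numeric GNS/GNIS IDs (possibly negative) become Right; textual
    -- country/ADM1 codes and blanks are kept as Left.
    let feature = maybe (Left fid) Right (readMaybe fid)
    Just (LocSketch lty name cc adm1 (la, lo) feature)
  _ -> Nothing

-- | City- or landmark-level matches, i.e. not country or state centroids.
isLandmarkLevel :: LocSketch -> Bool
isLandmarkLevel l = lsType l `elem` [LTUSCity, LTWorldCity]
```

Consider an illustrative country-level block such as "1#Syria#SY#SY#35#38#SY". It gives lsFeatureId = Left "SY" and isLandmarkLevel = False. Filtering with isLandmarkLevel therefore keeps only matches whose coordinates point at a city or landmark rather than at a country or state centroid.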
Instances
Eq LocationV1

Defined in GDELT.V2.GKG

Show LocationV1

Defined in GDELT.V2.GKG

Generic LocationV1

Defined in GDELT.V2.GKG

Associated Types

type Rep LocationV1 :: Type -> Type

type Rep LocationV1

Defined in GDELT.V2.GKG

data LocationTy

Instances
Enum LocationTy

Defined in GDELT.V2.GKG

Eq LocationTy

Defined in GDELT.V2.GKG

Show LocationTy

Defined in GDELT.V2.GKG

Generic LocationTy

Defined in GDELT.V2.GKG

Associated Types

type Rep LocationTy :: Type -> Type

type Rep LocationTy

Defined in GDELT.V2.GKG

type Rep LocationTy = D1 (MetaData "LocationTy" "GDELT.V2.GKG" "gdelt-0.1.0.0-GKPOXl1qZjP41letUwnafg" False) ((C1 (MetaCons "LTCountry" PrefixI False) (U1 :: Type -> Type) :+: C1 (MetaCons "LTUSState" PrefixI False) (U1 :: Type -> Type)) :+: (C1 (MetaCons "LTUSCity" PrefixI False) (U1 :: Type -> Type) :+: (C1 (MetaCons "LTWorldCity" PrefixI False) (U1 :: Type -> Type) :+: C1 (MetaCons "LTWorldState" PrefixI False) (U1 :: Type -> Type))))