Safe Haskell | None |
---|---|
Language | Haskell2010 |
Link-based datasets from https://linqs.soe.ucsc.edu/data
Synopsis
- restoreContent :: Binary c => FilePath -> IO (Map String (Int16, Seq Int16, c))
- data CitesRow a = CitesRow {}
- data ContentRow i c = CRow {}
- stash :: Binary c => FilePath -> String -> Int -> Parser c -> IO ()
- sourceGraphEdges :: (MonadResource m, MonadThrow m) => FilePath -> Map String (Int16, Seq Int16, c) -> ConduitT i (Maybe (Graph (ContentRow Int16 c))) m ()
- loadGraph :: Binary c => FilePath -> IO (Graph (ContentRow Int16 c))
Documentation
restoreContent
:: Binary c | |
=> FilePath | directory where the data files are saved |
-> IO (Map String (Int16, Seq Int16, c)) | |
Load the graph node data from local storage
data CitesRow a
Who cites whom
Instances
- Eq a => Eq (CitesRow a)
- Show a => Show (CitesRow a)
- Generic (CitesRow a)
- Binary a => Binary (CitesRow a)
- type Rep (CitesRow a)
Defined in Algebra.Graph.IO.Datasets.LINQS
type Rep (CitesRow a) = D1 ('MetaData "CitesRow" "Algebra.Graph.IO.Datasets.LINQS" "algebraic-graphs-io-0.4-E7MKPwsSlpuDfjqM3SouNJ" 'False) (C1 ('MetaCons "CitesRow" 'PrefixI 'True) (S1 ('MetaSel ('Just "cirTo") 'NoSourceUnpackedness 'NoSourceStrictness 'DecidedLazy) (Rec0 a) :*: S1 ('MetaSel ('Just "cirFrom") 'NoSourceUnpackedness 'NoSourceStrictness 'DecidedLazy) (Rec0 a)))
data ContentRow i c
Dataset row of the .content file
The .content file contains descriptions of the papers in the following format:
<paper_id> <word_attributes> <class_label>
The first entry in each line contains the unique string ID of the paper followed by binary values indicating whether each word in the vocabulary is present (indicated by 1) or absent (indicated by 0) in the paper. Finally, the last entry in the line contains the class label of the paper.
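The line format described above can be illustrated with a small, library-independent parser. This is only a sketch: `ContentLine` and `parseContentLine` are hypothetical names that are not part of this module, the class label is kept as raw text, and storing the present words as zero-based indices is an assumption made for the example, not a statement about how `ContentRow` is represented.

```haskell
module Main where

import Data.Int (Int16)

-- | Hypothetical plain-Haskell view of one .content line
-- (NOT the library's ContentRow type).
data ContentLine = ContentLine
  { clPaperId :: String   -- ^ unique string ID of the paper
  , clWords   :: [Int16]  -- ^ zero-based indices of the vocabulary words that are present
  , clLabel   :: String   -- ^ class label, kept as raw text
  } deriving (Eq, Show)

-- | Parse "<paper_id> <word_attributes> <class_label>".
-- Returns Nothing if the label is missing or an attribute is not 0 or 1.
parseContentLine :: String -> Maybe ContentLine
parseContentLine line =
  case words line of
    (pid : rest@(_ : _)) -> do
      let flags = init rest   -- everything between the ID and the label
          label = last rest   -- the final entry is the class label
      bits <- traverse flag flags
      pure (ContentLine pid [i | (i, True) <- zip [0 ..] bits] label)
    _ -> Nothing
  where
    flag "1" = Just True
    flag "0" = Just False
    flag _   = Nothing

main :: IO ()
main = print (parseContentLine "31336 0 1 0 1 Neural_Networks")
```

Keeping only the indices of the present words is one way to store the sparse binary attribute vector compactly; a dense representation would work equally well for small vocabularies.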
Instances
Internal
stash
:: Binary c | |
=> FilePath | directory where the data files will be saved |
-> String | URI of .tar.gz file |
-> Int | dictionary size |
-> Parser c | parser for the document class |
-> IO () | |
Download, decompress, parse, serialize and save the dataset to local storage
sourceGraphEdges
:: (MonadResource m, MonadThrow m) | |
=> FilePath | directory of data files |
-> Map String (Int16, Seq Int16, c) | |
-> ConduitT i (Maybe (Graph (ContentRow Int16 c))) m () | |
Stream out the edges of the citation graph, in which the nodes are decorated with the document metadata.
The full citation graph can be reconstructed by folding over this stream and overlaying the graph edges as they arrive.
This way the graph can be partitioned into training, test and validation subsets at the usage site.
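The fold described above can be sketched in pure code with `overlay` and `empty` from `Algebra.Graph`. In this sketch a plain list stands in for the conduit stream, `overlayAll` is a hypothetical helper that is not part of this module, and skipping `Nothing` elements is an assumption about how missing entries should be treated.

```haskell
module Main where

import Algebra.Graph (Graph, edge, edgeCount, empty, overlay, vertexCount)
import Data.Foldable (foldl')
import Data.Maybe (catMaybes)

-- | Hypothetical helper: overlay every graph fragment that is present,
-- starting from the empty graph. Nothing elements are skipped (an assumption).
overlayAll :: [Maybe (Graph a)] -> Graph a
overlayAll = foldl' overlay empty . catMaybes

main :: IO ()
main = do
  -- A list standing in for the stream of single-edge graphs.
  let chunks = [Just (edge 'a' 'b'), Nothing, Just (edge 'b' 'c')]
      g      = overlayAll chunks
  print (vertexCount g, edgeCount g)
```

With the real stream, the same step function could be passed to a conduit fold sink instead of `foldl'`. Routing each incoming edge to a different accumulator is one way to obtain the training, test and validation subsets mentioned above.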