krapsh-0.1.6.1: Haskell bindings for Spark Dataframes and Datasets

Safe Haskell: None
Language: Haskell2010

Spark.Core.Internal.DatasetStructures

Documentation

data ComputeNode loc a Source #

(internal) The main data structure that represents a data node in the computation graph.

This data structure forms the backbone of computation graphs expressed with spark operations.

loc is a typed locality tag. a is the type of the data, as seen by the Haskell compiler. If the type is erased, it will be a Cell type. (A simplified sketch of how these phantom parameters are used follows the instance list below.)

Constructors

ComputeNode 

Fields

  • _cnNodeId :: NodeId

    The id of the node.

Non-strict because it may be expensive to compute.

  • _cnOp :: !NodeOp

    The operation associated to this node.

  • _cnType :: !DataType

The type of the node.

  • _cnParents :: !(Vector UntypedNode)

    The direct parents of the node. The order of the parents is important for the semantics of the operation.

  • _cnLogicalDeps :: !(Vector UntypedNode)

    A set of extra dependencies that can be added to force an order between the nodes.

The order is not important; they are sorted by ID.

    TODO(kps) add this one to the id

  • _cnLocality :: !Locality

    The locality of this node.

    TODO(kps) add this one to the id

  • _cnName :: !(Maybe NodeName)

The name of the node, if any.

  • _cnLogicalParents :: !(Maybe (Vector UntypedNode))

    A set of nodes considered as the logical input for this node. This has no influence on the calculation of the id and is used for organization purposes only.

  • _cnPath :: NodePath

The path of this node in a computation flow.

This path includes the node name. Non-strict because it may be expensive to compute. By default it only contains the name of the node (i.e. the node is attached to the root).

Instances

Eq (ComputeNode loc a) Source # 

Methods

(==) :: ComputeNode loc a -> ComputeNode loc a -> Bool #

(/=) :: ComputeNode loc a -> ComputeNode loc a -> Bool #

CanRename (ComputeNode loc a) String Source # 

Methods

(@@) :: ComputeNode loc a -> String -> ComputeNode loc a Source #
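The two type parameters are phantom: they carry no runtime data and only constrain how nodes can be combined. The sketch below is a minimal, self-contained illustration of that pattern; the Node record, the locality tags and the aggregate function are hypothetical stand-ins, not the definitions used by this module:

    -- Stand-in locality tags (hypothetical, for illustration only).
    data LocalTag        -- data that lives on the driver
    data DistributedTag  -- data that lives on the cluster

    -- A stripped-down analogue of ComputeNode: loc and a are phantom
    -- parameters, so they exist only at the type level.
    data Node loc a = Node { nodeLabel :: String }

    -- Aggregating a distributed node yields a local node; the phantom tags
    -- let the compiler reject code that mixes the two localities.
    aggregate :: Node DistributedTag a -> Node LocalTag a
    aggregate (Node lbl) = Node ("aggregate(" ++ lbl ++ ")")

In the real module, the synonyms Dataset and LocalData defined below play the role of Node DistributedTag and Node LocalTag.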

data TypedLocality loc Source #

Constructors

TypedLocality 

Instances

type Dataset a = ComputeNode LocDistributed a Source #

A typed collection of distributed data.

Most operations on datasets are type-checked by the Haskell compiler: the type tag associated with this dataset is guaranteed to be convertible to a proper Haskell type. In particular, building a Dataset of dynamic cells is guaranteed never to happen.

If you want to do untyped operations and gain some flexibility, consider using UDataFrames instead.

Computations with Datasets and observables are generally checked for correctness using the type system of Haskell.

type LocalData a = ComputeNode LocLocal a Source #

A unit of data that can be accessed by the user.

This is a typed unit of data. The type is guaranteed to be a proper type accessible by the Haskell compiler (instead of simply a Cell type, which represents types only accessible at runtime).

TODO(kps) rename to Observable
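Because Dataset and LocalData are only synonyms for ComputeNode with a fixed locality tag, the locality of a value is visible directly in signatures. The combinators below are hypothetical and shown only to illustrate how such signatures read; they are not claimed to be part of this package's API:

    -- Hypothetical combinators (illustrative only; bodies elided).

    -- An aggregation collapses a distributed dataset into a driver-side observable.
    rowCount :: Dataset a -> LocalData Int
    rowCount = error "illustrative only"

    -- A transformation keeps its result distributed.
    mapCells :: (a -> b) -> Dataset a -> Dataset b
    mapCells = error "illustrative only"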

type DataFrame = Try (Dataset Cell) Source #

The dataframe type. Any dataset can be converted to a dataframe.

For Spark users: this is different from the definition of a dataframe in Spark, which is a dataset of rows. Because support for single columns is more awkward with rows, it is more natural to generalize datasets to contain cells. When communicating with Spark, though, single cells are wrapped into rows with a single field, as Spark does.
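Because the Dataset is wrapped in Try, an error produced while building the computation graph travels with the value instead of raising immediately. A minimal self-contained sketch of this shape, using stand-in types rather than the library's Try, Dataset and Cell:

    -- Stand-ins for illustration: a Try-like wrapper and a dynamic cell.
    type MyTry a  = Either String a
    data MyCell   = MyCell            -- a dynamically typed piece of data
    type MyFrame  = MyTry [MyCell]    -- analogue of Try (Dataset Cell)

    -- Graph-construction errors stay inside the wrapper until inspected.
    describeFrame :: MyFrame -> String
    describeFrame (Left err) = "invalid frame: " ++ err
    describeFrame (Right _)  = "a well-formed, untyped collection of cells"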

type LocalFrame = Try (LocalData Cell) Source #

An observable whose type can only be inferred at runtime and whose computation may fail at runtime.

Any observable can be converted to an untyped observable.

Untyped observables are more flexible and can be combined in an arbitrary manner, but invalid combinations will only fail during the validation of the Spark computation graph.

TODO(kps) rename to DynObservable
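The trade-off can be illustrated with stand-in types: combining untyped observables always type-checks in Haskell, and a mismatch between their runtime data types is only reported when the graph is validated. The names below are illustrative stand-ins, not this package's API:

    -- Stand-ins: an untyped observable only knows its data type as a value.
    data MyDataType   = MyIntType | MyStringType deriving (Eq, Show)
    data MyObservable = MyObservable { obsDataType :: MyDataType }
    type MyLocalFrame = Either String MyObservable  -- analogue of Try (LocalData Cell)

    -- Combining two untyped observables always compiles; the type check
    -- happens here, at validation time, and failures land in the Left case.
    addObservables :: MyObservable -> MyObservable -> MyLocalFrame
    addObservables x y
      | obsDataType x == MyIntType && obsDataType y == MyIntType =
          Right (MyObservable MyIntType)
      | otherwise =
          Left ("cannot add " ++ show (obsDataType x) ++ " and " ++ show (obsDataType y))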

data NodeEdge Source #

The different kinds of edges in the compute DAG of nodes, at the start of computations.

  • scope edges specify the scope of a node for naming. They are not included in the id.

data StructureEdge Source #

The edges in a compute DAG, after name resolution (which is where most of the checks and computations are performed); a rough sketch of the distinction between the two kinds of edges follows the constructor list below.

  • parent edges are the direct parents of a node, the only ones required for defining computations. They are included in the id.
  • logical edges define logical dependencies between nodes to force a specific ordering of the nodes. They are included in the id.

Constructors

ParentEdge 
LogicalEdge
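As a rough illustration of the two kinds of edges, the sketch below models a resolved node with ordered parents and unordered logical dependencies, and derives a toy identity from them (logical dependencies are normalized by sorting, matching the note above that their order does not matter). This is a simplified model, not the library's internal representation:

    import Data.List (sort)

    -- Mirrors the two constructors above, for illustration.
    data MyEdgeKind = MyParentEdge | MyLogicalEdge deriving (Eq, Show)

    -- Simplified model of a resolved node and its incoming edges.
    data MyNode = MyNode
      { myOpName      :: String
      , myParents     :: [String]  -- ordered: position carries meaning
      , myLogicalDeps :: [String]  -- unordered: only forces scheduling
      } deriving Show

    -- A toy identity: the operation, the parents in order, and the logical
    -- dependencies sorted so that their order cannot influence the result.
    myNodeKey :: MyNode -> String
    myNodeKey n =
      myOpName n ++ "(" ++ unwords (myParents n) ++ ")"
                 ++ "[" ++ unwords (sort (myLogicalDeps n)) ++ "]"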