krapsh-0.1.6.1: Haskell bindings for Spark Dataframes and Datasets

Safe Haskell: None
Language: Haskell2010

Spark.Core.Internal.OpStructures

Description

A description of the operations that can be performed on nodes and columns.

Documentation

data TransformInvariant

The invariant respected by a transform.

Depending on the value of the invariant, different optimizations may be available.

Constructors

Opaque

This operator has no special property. It may depend on the partitioning layout, the number of partitions, the order of elements in the partitions, etc. This sort of operator is unwelcome in Krapsh...

PartitioningInvariant

This operator respects the canonical partition order, but may not output the same number of elements. For example, this could be a flatMap on an RDD (filter, etc.). This operator can be used locally with the signature a -> [a].

DirectPartitioningInvariant

The strongest invariant. It respects the canonical partition order and outputs the same number of elements. This is typically a map. This operator can be used locally with the signature a -> a.
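The two local signatures above can be illustrated with a minimal sketch. It assumes a standalone mirror of the documented type rather than an import from krapsh, and the helpers keepEven, increment, applyPartitioning and applyDirect are hypothetical names introduced only for illustration.

```haskell
-- Standalone mirror of the documented type; not imported from krapsh.
data TransformInvariant
  = Opaque
  | PartitioningInvariant
  | DirectPartitioningInvariant
  deriving (Eq, Show)

-- Local shape of a PartitioningInvariant operator: a -> [a].
-- Hypothetical example: a filter that keeps only even numbers.
keepEven :: Int -> [Int]
keepEven x = [x | even x]

-- Local shape of a DirectPartitioningInvariant operator: a -> a.
-- Hypothetical example: a map that adds one.
increment :: Int -> Int
increment = (+ 1)

-- Applying an (a -> [a]) operator to one partition, modelled as a list.
-- Element order is preserved, but the number of elements may change.
applyPartitioning :: (a -> [a]) -> [a] -> [a]
applyPartitioning = concatMap

-- Applying an (a -> a) operator to one partition.
-- Both element order and element count are preserved.
applyDirect :: (a -> a) -> [a] -> [a]
applyDirect = map
```

Modelling a partition as a plain list is a simplification: it only shows why an a -> [a] operator can drop or add elements while an a -> a operator cannot.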

data Locality

The dynamic value of locality. There is still a tag on it, but it can be easily dropped.

Constructors

Local

The data associated to this node is local. It can be materialized and accessed by the user.

Distributed

The data associated to this node is distributed or not accessible locally. It cannot be accessed by the user.
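As a rough illustration of the distinction between the two constructors, the sketch below mirrors the type locally and adds a hypothetical predicate, isUserAccessible, which is not part of the documented module.

```haskell
-- Standalone mirror of the documented type; not imported from krapsh.
data Locality = Local | Distributed
  deriving (Eq, Show)

-- Hypothetical helper: per the descriptions above, only local data
-- can be materialized and accessed by the user.
isUserAccessible :: Locality -> Bool
isUserAccessible Local       = True
isUserAccessible Distributed = False
```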

PHYSICAL OPERATORS

data StandardOperator

An operator defined by default in the release of Krapsh. All other physical operators can be converted to a standard operator.

Constructors

StandardOperator 

data ScalaStaticFunctionApplication

A Scala method of a singleton object.

data ColOp

The different kinds of column operations. These operations describe the physical operations on columns as supported by Spark SQL. They can operate as column -> column, column -> row, or row -> row. Not all operators are valid for each configuration.

Constructors

ColExtraction !FieldPath

A projection onto a single column. An extraction is always direct.

ColFunction !Text !(Vector ColOp)

A function of other columns. In this case, the other columns may matter. TODO(kps): add whether this function is partition invariant. It should be the case most of the time.

ColLit !DataType !Value

A constant defined for each element. The type should be the same as for the column. A literal is always direct.

ColStruct !(Vector TransformField)

A structure.
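The four constructors form a small expression tree. The sketch below uses simplified stand-ins, since FieldPath, DataType, Value and TransformField are not shown on this page: strings and lists replace Text and Vector, a literal is a pair of placeholder strings, and a struct field is assumed to be a name paired with a ColOp. The function name "plus" and the traversal referencedFields are hypothetical.

```haskell
-- Simplified stand-ins; the real constructors use FieldPath, Text, Vector,
-- DataType, Value and TransformField, which are not shown on this page.
type FieldPath = [String]

data ColOp
  = ColExtraction FieldPath        -- projection onto a single column
  | ColFunction String [ColOp]     -- function name and its column arguments
  | ColLit String String           -- placeholder (type name, rendered value)
  | ColStruct [(String, ColOp)]    -- assumed shape: named fields
  deriving (Eq, Show)

-- Hypothetical traversal: collect every column path an operation reads.
referencedFields :: ColOp -> [FieldPath]
referencedFields (ColExtraction path) = [path]
referencedFields (ColFunction _ args) = concatMap referencedFields args
referencedFields (ColLit _ _)         = []
referencedFields (ColStruct fields)   = concatMap (referencedFields . snd) fields

-- A struct with one computed field and one literal field.
example :: ColOp
example = ColStruct
  [ ("total", ColFunction "plus" [ColExtraction ["price"], ColExtraction ["tax"]])
  , ("one", ColLit "int" "1")
  ]
```

With these stand-ins, referencedFields example evaluates to [["price"],["tax"]]: literals contribute no column references, while functions and structures contribute those of their children.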

Instances

Eq ColOp

Methods

(==) :: ColOp -> ColOp -> Bool

(/=) :: ColOp -> ColOp -> Bool

Show ColOp

Methods

showsPrec :: Int -> ColOp -> ShowS

show :: ColOp -> String

showList :: [ColOp] -> ShowS

DATASET OPERATORS

OBSERVABLE OPERATORS

AGGREGATION OPERATORS

data NodeOp

Constructors

NodeLocalOp StandardOperator

An operation between local nodes: [Observable] -> Observable

NodeLocalLit !DataType !Value

An observable literal.

NodeOpaqueAggregator StandardOperator

Some aggregator that does not respect any particular invariant.

NodeUniversalAggregator UniversalAggregatorOp

A universal aggregator.

NodeStructuredTransform !ColOp

A structured transform, performed either on a local node or a distributed node.

NodeDistributedLit !DataType !(Vector Value)

A distributed dataset (with no partition information).

NodeDistributedOp StandardOperator

An opaque distributed operator.

makeOperator :: Text -> SQLType a -> StandardOperator

Makes a standard operator with no extra value.
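A minimal sketch of what such a constructor function might look like follows. The fields of StandardOperator are not listed on this page, so the record fields opName, opOutputType and opExtra are assumptions, as is the representation of SQLType; String stands in for Text and for the data type so that the example needs only base.

```haskell
-- Hypothetical stand-in: a phantom-typed wrapper around a type name.
newtype SQLType a = SQLType { unSQLType :: String }

-- Hypothetical record; the real fields are not shown on this page.
data StandardOperator = StandardOperator
  { opName       :: String        -- operator name (Text in the real signature)
  , opOutputType :: String        -- placeholder for the output data type
  , opExtra      :: Maybe String  -- placeholder for the optional extra value
  } deriving (Eq, Show)

-- Builds an operator from a name and an output type, leaving the
-- extra value empty.
makeOperator :: String -> SQLType a -> StandardOperator
makeOperator name sqlType = StandardOperator
  { opName       = name
  , opOutputType = unSQLType sqlType
  , opExtra      = Nothing
  }
```

For instance, makeOperator "org.example.MyOp" (SQLType "int" :: SQLType Int) yields an operator whose opExtra is Nothing; the operator name here is made up.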