Safe Haskell | None |
---|---|
Language | Haskell2010 |
A description of the operations that can be performed on nodes and columns.
- type SqlFunctionName = Text
- type UdafClassName = Text
- type OperatorName = Text
- data HdfsPath = HdfsPath Text
- data DataInputStamp = DataInputStamp Text
- data TransformInvariant
- data Locality
- = Local
- | Distributed
- data StandardOperator = StandardOperator {
- soName :: !OperatorName
- soOutputType :: !DataType
- soExtra :: !Value
- data ScalaStaticFunctionApplication = ScalaStaticFunctionApplication {
- sfaObjectName :: !Text
- sfaMethodName :: !Text
- data ColOp
- data TransformField = TransformField {}
- data StructuredTransform
- = InnerOp !ColOp
- | InnerStruct !(Vector TransformField)
- data UdafApplication
- data AggOp
- data AggField = AggField {}
- data AggTransform
- data SemiGroupOperator
- data DatasetTransformDesc
- data UniversalAggregatorOp = UniversalAggregatorOp {}
- data NodeOp2
- data Pointer = Pointer {}
- data NodeOp
- = NodeLocalOp StandardOperator
- | NodeLocalLit !DataType !Value
- | NodeBroadcastJoin
- | NodeOpaqueAggregator StandardOperator
- | NodeGroupedReduction !AggOp
- | NodeReduction !AggTransform
- | NodeAggregatorReduction UniversalAggregatorOp
- | NodeAggregatorLocalReduction UniversalAggregatorOp
- | NodeStructuredTransform !ColOp
- | NodeDistributedLit !DataType !(Vector Value)
- | NodeDistributedOp StandardOperator
- | NodePointer Pointer
- makeOperator :: Text -> SQLType a -> StandardOperator
Documentation
type SqlFunctionName = Text Source #
The name of a SQL function.
It is one of the predefined SQL functions available in Spark.
type UdafClassName = Text Source #
The classpath of a UDAF.
type OperatorName = Text Source #
The name of an operator defined in Karps.
A path in the Hadoop File System (HDFS).
These paths are usually not created by the user directly.
data DataInputStamp Source #
A stamp that defines some notion of uniqueness of the data source.
The general contract is that:
- stamps can be extracted fast (no need to scan the whole dataset)
- if the data gets changed, the stamp will change.
Stamps are used for aggressive caching of operations, so it is better to update stamps conservatively if one is unsure about the freshness of the dataset. For regular files, stamps are computed using the file system time stamps.
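As a rough illustration of the contract above, the sketch below builds a stamp from file metadata only, so no data has to be read. `FileMeta`, `stampFromMeta` and `canReuseCache` are hypothetical names that are not part of this module, and `String` stands in for `Text`; the actual stamp computation in Karps may differ.

```haskell
-- Simplified stand-in for the library type (String instead of Text).
newtype DataInputStamp = DataInputStamp String
  deriving (Eq, Show)

-- Hypothetical record: metadata a file system can report cheaply.
data FileMeta = FileMeta
  { fmPath    :: FilePath
  , fmModTime :: Integer  -- modification time, e.g. seconds since the epoch
  }

-- Build a stamp from metadata only; the dataset itself is never scanned.
stampFromMeta :: FileMeta -> DataInputStamp
stampFromMeta m = DataInputStamp (fmPath m ++ "@" ++ show (fmModTime m))

-- A cached result is reused only when the stamp is unchanged.
canReuseCache :: DataInputStamp -> DataInputStamp -> Bool
canReuseCache cached current = cached == current
```

When in doubt about freshness, producing a new stamp only costs a recomputation, whereas keeping a stale stamp could return outdated cached results.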
data TransformInvariant Source #
The invariant respected by a transform.
Depending on the value of the invariant, different optimizations may be available.
Opaque | This operator has no special property. It may depend on the partitioning layout, the number of partitions, the order of elements in the partitions, etc. This sort of operator is unwelcome in Karps... |
PartitioningInvariant | This operator respects the canonical partition order, but may not have the same number of elements. For example, this could be a flatMap on an RDD (filter, etc.). This operator can be used locally with the signature a -> [a] |
DirectPartitioningInvariant | The strongest invariant. It respects the canonical partition order and it outputs the same number of elements. This is typically a map. This operator can be used locally with the signature a -> a |
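The local signatures quoted above can be made concrete with a small sketch. The type is redefined here so the example is self-contained; the `Ord` instance, `composeInvariant`, and the rule that a pipeline keeps only the weaker invariant are assumptions made for illustration, not part of this module.

```haskell
-- Constructors are listed from weakest to strongest guarantee,
-- so the derived Ord instance can be used to compare invariants.
data TransformInvariant
  = Opaque
  | PartitioningInvariant
  | DirectPartitioningInvariant
  deriving (Eq, Ord, Show)

-- Assumption: chaining two transforms only guarantees the weaker invariant.
composeInvariant :: TransformInvariant -> TransformInvariant -> TransformInvariant
composeInvariant = min

-- Local shape of a PartitioningInvariant operator: a -> [a].
-- A filter keeps element order but may drop elements.
keepEven :: Int -> [Int]
keepEven x = [x | even x]

-- Local shape of a DirectPartitioningInvariant operator: a -> a.
-- A map emits exactly one output per input.
double :: Int -> Int
double = (* 2)

-- Apply both operators locally to the elements of one partition.
runPartition :: [Int] -> [Int]
runPartition = map double . concatMap keepEven
```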
The dynamic value of locality. There is still a tag on it, but it can be easily dropped.
Local | The data associated to this node is local. It can be materialized and accessed by the user. |
Distributed | The data associated to this node is distributed or not accessible locally. It cannot be accessed by the user. |
PHYSICAL OPERATORS
data StandardOperator Source #
An operator defined by default in the release of Karps. All other physical operators can be converted to a standard operator.
StandardOperator | |
data ScalaStaticFunctionApplication Source #
A Scala method of a singleton object.
The different kinds of column operations that are understood by the backend.
These operations describe the physical operations on columns as supported by Spark SQL. They can operate on column -> column, column -> row, or row -> row. Not all operators are valid for each configuration.
ColExtraction !FieldPath | A projection onto a single column. An extraction is always direct. |
ColFunction !SqlFunctionName !(Vector ColOp) | A function of other columns. In this case, the other columns may matter. TODO(kps): add whether this function is partition invariant. It should be the case most of the time. |
ColLit !DataType !Value | A constant defined for each element. The type should be the same as for the column. A literal is always direct. |
ColStruct !(Vector TransformField) | A structure. |
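To show how the four constructors nest, here is a minimal sketch using simplified stand-ins: lists instead of `Vector`, `String` instead of `Text`, and a literal without its `DataType`. The field names `tfName` and `tfValue`, the function name `"plus"`, and the helper `referencedPaths` are hypothetical; this page does not show the fields of `TransformField`.

```haskell
type SqlFunctionName = String
type FieldPath = [String]  -- simplified: a path is a list of field names

data ColOp
  = ColExtraction FieldPath
  | ColFunction SqlFunctionName [ColOp]
  | ColLit String            -- literal rendered as text; DataType omitted
  | ColStruct [TransformField]

-- Hypothetical field names for illustration only.
data TransformField = TransformField
  { tfName  :: String
  , tfValue :: ColOp
  }

-- Collect every column path that an operation reads from.
referencedPaths :: ColOp -> [FieldPath]
referencedPaths (ColExtraction p)    = [p]
referencedPaths (ColFunction _ args) = concatMap referencedPaths args
referencedPaths (ColLit _)           = []
referencedPaths (ColStruct fields)   = concatMap (referencedPaths . tfValue) fields

-- A struct with one field computed from two columns and one literal field.
example :: ColOp
example = ColStruct
  [ TransformField "total"
      (ColFunction "plus" [ColExtraction ["a"], ColExtraction ["b", "c"]])
  , TransformField "one" (ColLit "1")
  ]
```

A traversal like `referencedPaths` is the kind of analysis a recursive operation tree makes easy, for example to find out which input columns a transform actually needs.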
data StructuredTransform Source #
The content of a structured transform.
data UdafApplication Source #
When applying a UDAF, determines if it should only perform the algebraic portion of the UDAF (initialize+update+merge), or if it also performs the final, non-algebraic step.
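The split between the algebraic portion and the final step can be sketched with a mean aggregation. The constructor names `AlgebraicOnly` and `WithFinalStep` are hypothetical (this page does not list the constructors), and the functions below are plain Haskell stand-ins, not Spark's UDAF interface.

```haskell
-- Hypothetical constructor names; the real ones are not shown in this page.
data UdafApplication = AlgebraicOnly | WithFinalStep

-- Intermediate buffer for a mean: (running sum, element count).
type Buffer = (Double, Int)

initialize :: Buffer
initialize = (0, 0)

update :: Buffer -> Double -> Buffer
update (s, n) x = (s + x, n + 1)

merge :: Buffer -> Buffer -> Buffer
merge (s1, n1) (s2, n2) = (s1 + s2, n1 + n2)

-- Final, non-algebraic step: its result cannot be merged any further.
finalize :: Buffer -> Double
finalize (_, 0) = 0
finalize (s, n) = s / fromIntegral n

-- Algebraic portion: update within each partition, merge across partitions.
algebraic :: [[Double]] -> Buffer
algebraic = foldl merge initialize . map (foldl update initialize)

-- Either stop at the mergeable buffer or also run the final step.
applyUdaf :: UdafApplication -> [[Double]] -> Either Buffer Double
applyUdaf AlgebraicOnly = Left . algebraic
applyUdaf WithFinalStep = Right . finalize . algebraic
```

Stopping at the buffer matters because two buffers can still be merged, whereas two means cannot be combined without knowing the counts behind them.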
A field in the resulting aggregation transform.
data AggTransform Source #
data SemiGroupOperator Source #
The representation of a semi-group law in Spark.
This is the basic law used in universal aggregators. It is a function on observables that must respect the following laws:
f :: X -> X -> X
- commutative
- associative
A neutral element is not required for the semi-group laws. However, if used in the context of a universal aggregator, such an element implicitly exists and corresponds to the empty dataset.
OpaqueSemiGroupLaw !StandardOperator | A standard operator that happens to respect the semi-group laws. |
UdafSemiGroupOperator !UdafClassName | The merging portion of a UDAF |
ColumnSemiGroupLaw !SqlFunctionName | A SQL operator that happens to respect the semi-group laws. |
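The reason for requiring both laws can be illustrated with Haskell's own `Semigroup` class from `base`. Note that the class only demands associativity; commutativity is an extra requirement of the laws described here, and `Max` happens to satisfy it. `reducePartitions` and the two layouts are hypothetical names for this sketch.

```haskell
import Data.List.NonEmpty (NonEmpty (..))
import Data.Semigroup (Max (..), sconcat)

-- Reduce each non-empty partition, then combine the partial results.
reducePartitions :: Semigroup a => NonEmpty (NonEmpty a) -> a
reducePartitions = sconcat . fmap sconcat

-- The same three values under two different partition layouts and orders.
layoutA, layoutB :: NonEmpty (NonEmpty (Max Int))
layoutA = (Max 3 :| [Max 7]) :| [Max 5 :| []]
layoutB = (Max 5 :| []) :| [Max 7 :| [Max 3]]
```

Because the operation is associative and commutative, `reducePartitions layoutA` and `reducePartitions layoutB` give the same result regardless of how the elements were grouped or ordered. This sketch uses non-empty partitions, so no neutral element is needed.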
DATASET OPERATORS
data DatasetTransformDesc Source #
OBSERVABLE OPERATORS
AGGREGATION OPERATORS
A pointer to a node that is assumed to be already computed.
NodeLocalOp StandardOperator | An operation between local nodes: [Observable] -> Observable |
NodeLocalLit !DataType !Value | An observable literal |
NodeBroadcastJoin | A special join that broadcasts a value along a dataset. |
NodeOpaqueAggregator StandardOperator | Some aggregator that does not respect any particular invariant. |
NodeGroupedReduction !AggOp | |
NodeReduction !AggTransform | |
NodeAggregatorReduction UniversalAggregatorOp | A universal aggregator. |
NodeAggregatorLocalReduction UniversalAggregatorOp | |
NodeStructuredTransform !ColOp | A structured transform, performed either on a local node or a distributed node. |
NodeDistributedLit !DataType !(Vector Value) | A distributed dataset (with no partition information) |
NodeDistributedOp StandardOperator | An opaque distributed operator. |
NodePointer Pointer |
makeOperator :: Text -> SQLType a -> StandardOperator Source #
Makes a standard operator with no extra value.
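A minimal sketch of what such a constructor function might look like, using simplified stand-ins so that it compiles with `base` alone. The definitions of `DataType`, `SQLType` and `Value`, the accessor `unSQLType`, and the choice of `Null` to represent "no extra value" are all assumptions; the library's real types are richer.

```haskell
type OperatorName = String  -- String stands in for Text

-- Stand-ins for the library's DataType, SQLType and JSON-like Value.
newtype DataType = DataType String deriving (Eq, Show)
newtype SQLType a = SQLType { unSQLType :: DataType }
data Value = Null | StringValue String deriving (Eq, Show)

data StandardOperator = StandardOperator
  { soName       :: !OperatorName
  , soOutputType :: !DataType
  , soExtra      :: !Value
  } deriving (Eq, Show)

-- Assumption: "no extra value" is represented by a null payload.
makeOperator :: OperatorName -> SQLType a -> StandardOperator
makeOperator name sqlType = StandardOperator
  { soName       = name
  , soOutputType = unSQLType sqlType
  , soExtra      = Null
  }
```

The phantom parameter on `SQLType a` carries the static element type, while the operator itself only stores the dynamic `DataType`.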