krapsh-0.1.6.1: Haskell bindings for Spark Dataframes and Datasets

Safe Haskell: None
Language: Haskell2010

Spark.Core.Column

Description

Operations on columns.

Types

type Column ref a = ColumnData ref a

A column of data from a dataset or a dataframe.

This column is typed: the operations on this column will be validated by Haskell's type inference.

type DynColumn = Try (ColumnData UnknownReference Cell)

An untyped column of data from a dataset or a dataframe.

This column is untyped and may not be properly constructed. Any error will be found during the analysis phase at runtime.
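The difference between the two kinds of column can be illustrated with a toy model. This is only a sketch: ToyColumn, ToyTry, ToyCell, ToyDynColumn, values, plus and plus' are hypothetical names that are not part of krapsh, and the real ColumnData and Try types are more involved. It shows a mismatch being rejected by the compiler in the typed case and surfacing as a runtime error value in the untyped case.

```haskell
-- Toy model only: none of these definitions come from krapsh.

-- A typed column: 'ref' is a phantom tag for the originating dataset,
-- 'a' is the type of the elements.
newtype ToyColumn ref a = ToyColumn [a]

-- Accessor for the values held by a typed column.
values :: ToyColumn ref a -> [a]
values (ToyColumn xs) = xs

-- Hypothetical stand-in for Try: an error message or a value.
type ToyTry = Either String

-- An untyped cell, and an untyped column that may have failed to build.
data ToyCell = IntCell Int | StrCell String
  deriving (Eq, Show)

type ToyDynColumn = ToyTry [ToyCell]

-- Typed addition: mixing origins or element types does not compile.
plus :: Num a => ToyColumn ref a -> ToyColumn ref a -> ToyColumn ref a
plus (ToyColumn xs) (ToyColumn ys) = ToyColumn (zipWith (+) xs ys)

-- Untyped addition: a mismatch is only reported when the result is inspected.
plus' :: ToyDynColumn -> ToyDynColumn -> ToyDynColumn
plus' cx cy = do
  xs <- cx
  ys <- cy
  sequence (zipWith addCell xs ys)
  where
    addCell (IntCell x) (IntCell y) = Right (IntCell (x + y))
    addCell x y = Left ("cannot add " ++ show x ++ " and " ++ show y)
```

In this sketch, passing a ToyColumn ref Int and a ToyColumn ref String to plus is a compile-time error, while the equivalent call to plus' compiles and yields a Left value.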

Extractions and collations

asCol :: HasCallStack => Dataset a -> Column a a

Represents a dataframe as a single column.

pack1 :: HasCallStack => Column ref a -> Dataset a

Packs a single column into a dataframe.

pack :: forall ref a b. (StaticColPackable2 ref a b, HasCallStack) => a -> Dataset b

Packs a number of columns with the same references into a single dataset.

The type of the dataset must be provided in order to have proper type inference.

TODO: example.

pack' :: DynColPackable a => a -> DataFrame

Packs a number of columns into a single dataframe.

This operation is checked for same origin and no duplication of columns.

This function accepts columns, lists of columns and tuples of columns (both typed and untyped).

struct :: forall ref a b. (StaticColPackable2 ref a b, HasCallStack) => a -> Column ref b

Packs a number of columns into a single structure, given a return type.

The field names of the columns are discarded, and replaced by the field names of the structure.

struct' :: HasCallStack => [DynColumn] -> DynColumn

Packs a number of columns into a single column (the struct construct).

Columns must have different names, or an error is returned.
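The distinct-name requirement can be sketched with a toy model. ToyField and toyStruct are hypothetical names, not krapsh functions, and Either String stands in for Try. The sketch only demonstrates the duplicate-name check, not how the library builds the resulting struct column.

```haskell
import Data.List (nub, (\\))

-- Toy model only: none of these definitions come from krapsh.

-- A dynamic column reduced to a field name and some values.
type ToyField = (String, [Int])

-- Packs fields into a struct-like list, rejecting duplicate names.
toyStruct :: [ToyField] -> Either String [ToyField]
toyStruct fields
  | null dups = Right fields
  | otherwise = Left ("duplicate column names: " ++ show (nub dups))
  where
    names = map fst fields
    -- Removing one occurrence of each distinct name leaves only the repeats.
    dups = names \\ nub names
```

With these definitions, toyStruct [("x", [1]), ("y", [2])] succeeds, while toyStruct [("x", [1]), ("x", [2])] returns a Left value naming "x".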

castCol :: ColumnReference ref -> SQLType a -> DynColumn -> Try (Column ref a)

Casts a dynamic column to a statically typed column.

In this case, one must supply the reference (which can be obtained from another column with colRef, or from a dataset), and a type (which can be built using the buildType function).

castCol' :: SQLType a -> DynColumn -> Try (Column UnknownReference a)

Casts a dynamic column to a statically typed column, but does not attempt to enforce a single origin at the type level.

This is useful when building a dataset from a dataframe: the origin information cannot be conveyed since it is not available in the first place.
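The two casts can be sketched with a toy model. All names here (ToyCell, ToyColumn, ToyType, ToyRef, UnknownOrigin, toyCast, toyCast') are hypothetical and only mimic the shape of the signatures above. In particular, representing a type as a per-cell decoder is an assumption made for illustration, not how SQLType works.

```haskell
-- Toy model only: none of these definitions come from krapsh.

data ToyCell = IntCell Int | StrCell String
  deriving (Eq, Show)

-- A typed column with a phantom origin tag 'ref'.
newtype ToyColumn ref a = ToyColumn [a]

values :: ToyColumn ref a -> [a]
values (ToyColumn xs) = xs

-- Hypothetical stand-in for SQLType a: it knows how to decode one cell.
newtype ToyType a = ToyType (ToyCell -> Either String a)

intType :: ToyType Int
intType = ToyType decode
  where
    decode (IntCell i) = Right i
    decode other = Left ("expected an integer, got " ++ show other)

-- Hypothetical stand-in for ColumnReference ref: a value-less origin tag.
data ToyRef ref = ToyRef

-- Origin used when no origin information is available.
data UnknownOrigin

-- Cast with an explicit origin tag; fails if any cell has the wrong type.
toyCast :: ToyRef ref -> ToyType a -> Either String [ToyCell]
        -> Either String (ToyColumn ref a)
toyCast _ (ToyType decode) dyn = do
  cells <- dyn
  ToyColumn <$> mapM decode cells

-- Cast without origin information.
toyCast' :: ToyType a -> Either String [ToyCell]
         -> Either String (ToyColumn UnknownOrigin a)
toyCast' = toyCast ToyRef
```

In this sketch the cast is the single point where a runtime type error can appear; once it succeeds, the result is an ordinary typed column.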

colRef :: Column ref a -> ColumnReference ref

A tag with the reference of a column.

This is useful when casting dynamic columns to typed columns.

(//) :: forall from proj to. Projection from proj to => from -> proj -> to

The projector operation.

This is the general projection operation in Spark. It lets you extract columns from datasets or dataframes, or sub-observables from observables.

TODO(kps) put an example here.

data StaticColProjection from to

Algebraic structures that are common to columns and observables.

The class of static projections that are guaranteed to succeed by using the type system.

from is the type of the dataset (which is also a typed dataset); to is the type of the final column.

data DynamicColProjection

The class of projections that require some runtime introspection to confirm that the projection is valid.
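The contrast between static and dynamic projections can be sketched with a toy model. Person, ToyStaticProj, ToyDynProj, ToyRow, projectStatic and projectDyn are hypothetical names introduced for illustration; the library's own projection types and the (//) operator are not reproduced here.

```haskell
-- Toy model only: none of these definitions come from krapsh.

data Person = Person { name :: String, age :: Int }

-- Static projection: a field name plus an accessor whose types are
-- fixed at compile time, so applying it cannot fail.
data ToyStaticProj from to = ToyStaticProj String (from -> to)

ageProj :: ToyStaticProj Person Int
ageProj = ToyStaticProj "age" age

projectStatic :: [from] -> ToyStaticProj from to -> [to]
projectStatic rows (ToyStaticProj _ f) = map f rows

-- Dynamic projection: only a field name, checked against each row at runtime.
newtype ToyDynProj = ToyDynProj String

-- An untyped row: field names paired with string values.
type ToyRow = [(String, String)]

projectDyn :: [ToyRow] -> ToyDynProj -> Either String [String]
projectDyn rows (ToyDynProj field) = mapM get rows
  where
    get row = maybe (Left ("no field named " ++ show field)) Right (lookup field row)
```

Here projectStatic always returns a plain list, whereas projectDyn returns an Either because the requested field may be absent.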

unsafeStaticProjection

Arguments

:: HasCallStack 
=> SQLType from

The start type

-> String

The name of a field assumed to be found in the start type. This only has to be valid for Spark purposes, not internal Haskell representation.

-> StaticColProjection from to 

Lets the users define their own static projections.

Throws an error if the type cannot be found, so it should be used with caution.

String has to be used because of type inference issues.

Column functions

colType :: Column ref a -> SQLType a

The type of a column.

untypedCol :: Column ref a -> DynColumn

Converts a typed column to an untyped column.

colFromObs :: HasCallStack => LocalData a -> Column (LocalData a) a

Takes an observable and makes it available as a column of the same type.

colFromObs' :: HasCallStack => LocalFrame -> DynColumn

Takes a dynamic observable and makes it available as a dynamic column.