karps-0.2.0.0: Haskell bindings for Spark Dataframes and Datasets

Safe HaskellNone
LanguageHaskell2010

Spark.Core.Internal.ContextStructures

Synopsis

Documentation

data SparkSessionConf Source #

The configuration of a remote spark session in Karps.

Constructors

SparkSessionConf 

Fields

  • confEndPoint :: !Text

    The URL of the end point.

  • confPort :: !Int

    The port used to configure the end point.

  • confPollingIntervalMillis :: !Int

    (internal) the polling interval

  • confRequestedSessionName :: !Text

    (optional) the requested name of the session. This name must obey a number of rules: - it must consist of alphanumeric characters, dashes and underscores: [a-zA-Z0-9-_] - if a session with this name already exists on the server, it will be reconnected to

    The default value is "" (a new random context name will be chosen).

  • confUseNodePrunning :: !Bool

    If enabled, attempts to prune the computation graph as much as possible.

    This option is useful in interactive sessions when long chains of computations are extracted: it forces only the missing parts to be executed. The algorithm is experimental, so disabling it is a safe option.

    Disabled by default.
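As an illustration of how such a configuration might be built, here is a standalone sketch that mirrors the record fields documented above. The record definition is a local stand-in so the example compiles without the karps package, and the endpoint, port, and polling values are illustrative assumptions, not defaults taken from the library.

```haskell
{-# LANGUAGE OverloadedStrings #-}
-- Local mirror of the SparkSessionConf record documented above, so this
-- example is self-contained. Field names follow the documentation.
import Data.Text (Text)

data SparkSessionConf = SparkSessionConf
  { confEndPoint              :: !Text  -- URL of the end point
  , confPort                  :: !Int   -- port of the end point
  , confPollingIntervalMillis :: !Int   -- (internal) polling interval
  , confRequestedSessionName  :: !Text  -- "" asks for a fresh random name
  , confUseNodePrunning       :: !Bool  -- experimental graph pruning
  } deriving (Show)

-- A configuration pointing at a hypothetical local server
-- (all concrete values here are illustrative).
localConf :: SparkSessionConf
localConf = SparkSessionConf
  { confEndPoint              = "http://localhost"
  , confPort                  = 8081
  , confPollingIntervalMillis = 500
  , confRequestedSessionName  = ""     -- let the server pick a session name
  , confUseNodePrunning       = False  -- pruning is disabled by default
  }

main :: IO ()
main = print (confPort localConf)
```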

data SparkSession Source #

A session in Spark. Encapsulates all the state needed to communicate with Spark and to perform some simple optimizations on the code.

type SparkState a = SparkStateT IO a Source #

Represents the state of a session and handles the communication with the server.
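`SparkStateT` is not defined in this section, so the sketch below assumes the common shape for such an alias: a `StateT` transformer over `IO` carrying the session state. The `Session` record and `sendCommand` helper are hypothetical stand-ins used only to show the pattern.

```haskell
-- Sketch of how a 'SparkState a = SparkStateT IO a' alias is typically used,
-- assuming SparkStateT is a StateT over the session (an assumption, not the
-- actual karps definition).
import Control.Monad.State (StateT, evalStateT, gets, modify)

-- Hypothetical session state: a name and a count of commands sent so far.
data Session = Session { sessionName :: String, commandCount :: Int }

type SparkState a = StateT Session IO a

-- Illustrative action: record that a command was sent, return the new total.
sendCommand :: String -> SparkState Int
sendCommand _cmd = do
  modify (\s -> s { commandCount = commandCount s + 1 })
  gets commandCount

main :: IO ()
main = do
  n <- evalStateT (sendCommand "collect" >> sendCommand "count")
                  (Session "demo" 0)
  print n  -- 2
```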

type ComputeGraph = ComputeDag UntypedNode StructureEdge Source #

internal

A graph of computations. This graph is a directed acyclic graph. Each node is associated with a global path.

data HdfsPath Source #

A path in the Hadoop File System (HDFS).

These paths are usually not created by the user directly.

Constructors

HdfsPath Text 
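The constructor simply wraps a `Text` value. Below is a standalone sketch; the `newtype` here is a local stand-in so the example compiles without the karps package, and the `childPath` helper is hypothetical, not part of the library.

```haskell
{-# LANGUAGE OverloadedStrings #-}
-- Local stand-in for the HdfsPath wrapper documented above: a thin wrapper
-- around Text, so the example is self-contained.
import Data.Text (Text)
import qualified Data.Text as T

newtype HdfsPath = HdfsPath Text deriving (Eq, Show)

-- Hypothetical helper (not part of karps): append a file name to a
-- directory path, normalizing a trailing slash.
childPath :: HdfsPath -> Text -> HdfsPath
childPath (HdfsPath dir) name =
  HdfsPath (T.dropWhileEnd (== '/') dir <> "/" <> name)

main :: IO ()
main = print (childPath (HdfsPath "hdfs://namenode:9000/data/") "events.parquet")
```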

data NodeCacheInfo Source #

This structure describes the last time a node was observed by the controller, and the state it was in.

This information is used to do smart computation pruning, by assuming that the observables are kept by the Spark processes.

data NodeCacheStatus Source #

The status of a node being computed.

On purpose, it does not store data. This is meant to be only the control plane of the computations.