License | BSD-style |
---|---|
Maintainer | palkovsky.ondrej@gmail.com |
Stability | experimental |
Portability | portable |
Safe Haskell | None |
Language | Haskell2010 |
An incremental applicative-style JSON parser, suitable for high performance memory efficient stream parsing.
The parser is using Data.Aeson types and FromJSON
instance, it can be
easily combined with aeson monadic parsing instances when appropriate.
- data Parser a
- data ParseOutput a
- = ParseYield a (ParseOutput a)
- | ParseNeedData (ByteString -> ParseOutput a)
- | ParseFailed String
- | ParseDone ByteString
- (>^>) :: Parser a -> Parser a -> Parser a
- runParser :: Parser a -> ParseOutput a
- runParser' :: Parser a -> ByteString -> ParseOutput a
- parseByteString :: Parser a -> ByteString -> [a]
- parseLazyByteString :: Parser a -> ByteString -> [a]
- value :: FromJSON a => Parser a
- string :: Parser Text
- bytestring :: Parser ByteString
- safeString :: Int -> Parser Text
- number :: Parser Scientific
- integer :: (Integral i, Bounded i) => Parser i
- real :: RealFloat a => Parser a
- bool :: Parser Bool
- jNull :: Parser ()
- (.:) :: Text -> Parser a -> Parser a
- (.:?) :: Text -> Parser a -> Parser (Maybe a)
- (.|) :: Parser a -> a -> Parser a
- (.!) :: Int -> Parser a -> Parser a
- objectWithKey :: Text -> Parser a -> Parser a
- objectItems :: Parser a -> Parser (Text, a)
- objectValues :: Parser a -> Parser a
- arrayOf :: Parser a -> Parser a
- arrayWithIndexOf :: Int -> Parser a -> Parser a
- indexedArrayOf :: Parser a -> Parser (Int, a)
- nullable :: Parser a -> Parser (Maybe a)
- filterI :: (a -> Bool) -> Parser a -> Parser a
- takeI :: Int -> Parser a -> Parser a
- toList :: Parser a -> Parser [a]
- arrayFound :: a -> a -> Parser a -> Parser a
- objectFound :: a -> a -> Parser a -> Parser a
How to use this library
>>> parseByteString value "[1,2,3]" :: [[Int]] [[1,2,3]]
The value
parser matches any FromJSON
value. The above command is essentially
identical to the aeson decode function; the parsing process can generate more
objects, therefore the results is [a].
Example of json-stream style parsing:
>>> parseByteString (arrayOf integer) "[1,2,3]" :: [Int] [1,2,3]
Parsers can be combinated using <*>
and <|>
operators. The parsers are
run in parallel and return combinations of the parsed values.
JSON: text = [{"name": "John", "age": 20}, {"age": 30, "name": "Frank"} ] >>> let parser = arrayOf $ (,) <$> "name" .: string <*> "age" .: integer >>> parseByteString parser text :: [(Text,Int)] [("John",20),("Frank",30)]
When parsing larger values, it is advisable to use lazy ByteStrings. The parsing is then more memory efficient as less lexical state is needed to be held in memory for parallel parsers.
More examples are available on https://github.com/ondrap/json-stream.
Performance
The parser tries to do the least amount of work to get the job done, skipping over items that are not required. General guidelines to get best performance:
Do not use the value
parser for the whole object if the object is big. Using json-stream
parsers will produce better results with less memory. The integer
parser was optimized in such
a way that the integer numbers skip the conversion to scientific, which is unavoidable in aeson.
It is possible to use the *>
operator to filter objects based on a condition, e.g.:
arrayOf $ id <$> "error" .: number *> "name" .: string
This will return all objects that contain attribute error with number content. The parser will skip trying to decode the name attribute if error is not found.
Constant space decoding
Constant space decoding is possible if the grammar does not specify non-constant
operations. The non-constant operations are value
, string
, toList
and in some instances
<*>
.
The value
parser works by creating an aeson AST and passing it to the
parseJSON
method. The AST can consume a lot of memory before it is rejected
in parseJSON
. To achieve constant space the parsers safeString
, number
, integer
,
real
and bool
must be used; these parsers reject and do not parse data if it does not match the
type.
The object key length is limited to ~64K. Object records with longer key are ignored and unparsed.
Numbers are limited to 200.000 digits. Longer numbers will make the parsing fail.
The toList
parser works by accumulating all matched values. Obviously, number
of such values influences the amount of used memory.
The <*>
operator runs both parsers in parallel and when they are both done, it
produces combinations of the received values. It is constant-space as long as the
number of element produced by child parsers is limited by a constant. This can be achieved by using
.!
and .:
functions combined with constant space
parsers or limiting the number of returned elements with takeI
.
If the source object contains an object with multiple keys with a same name,
json-stream matches the key multiple times. The only exception
is objectWithKey
(.:
and .:?
) that return at most one value for a given key.
Aeson compatibility
The parser uses internally Data.Aeson types, so that the FromJSON instances are
directly usable with the value
parser. It may be more convenient to parse the
outer structure with json-stream and the inner objects with aeson as long as constant-space
decoding is not required.
Json-stream defines the object-access operators .:
, .:?
but in a slightly different albeit more natural way. New operators are .!
for
array access and .|
to handle missing values.
-- JSON: [{"name": "test1", "value": 1}, {"name": "test2", "value": null}, {"name": "test3"}] >>> let person = (,) <$> "name" .: string >>> <*> "value" .: integer .| (-1) >>> let people = arrayOf person >>> parseByteString people (..JSON..) :: [(Text, Int)] [("test1",1),("test2",-1),("test3",-1)]
The Parser
type
A representation of the parser.
Alternative Parser |
json: [{"key1": [1,2], "key2": [5,6], "key3": [8,9]}] >>> let parser = arrayOf $ "key1" .: (arrayOf value) <|> "key2" .: (arrayOf value) >>> parseByteString parser json :: [Int] [1,2,5,6] |
Functor Parser | |
Applicative Parser |
JSON: text = [{"name": "John", "age": 20}, {"age": 30, "name": "Frank"} ] >>> let parser = arrayOf $ (,) <$> "name" .: string <*> "age" .: integer >>> parseByteString parser text :: [(Text,Int)] [("John",20),("Frank",30)] |
data ParseOutput a Source
Result of parsing. Contains continuations to continue parsing.
ParseYield a (ParseOutput a) | Returns a value from a parser. |
ParseNeedData (ByteString -> ParseOutput a) | Parser needs more data to continue parsing. |
ParseFailed String | Parsing failed, error is reported. |
ParseDone ByteString | Parsing finished, unparsed data is returned. |
Operators
(>^>) :: Parser a -> Parser a -> Parser a infixl 3 Source
Match items from the first parser, if none is matched, return items
from the second parser. Constant-space if second parser returns
constant number of items. .|
is implemented using this operator.
>>> let json = "[{\"key1\": [1,2], \"key2\": [5,6], \"key3\": [8,9]}]" >>> let parser = arrayOf $ "key1" .: (arrayOf value) >^> "key2" .: (arrayOf value) >>> parseByteString parser json :: [Int] [1,2] >>> let parser = arrayOf $ "key-non" .: (arrayOf value) >^> "key2" .: (arrayOf value) >>> parseByteString parser json :: [Int] [5,6]
Parsing functions
runParser :: Parser a -> ParseOutput a Source
Run streaming parser, immediately returns ParseNeedData
.
runParser' :: Parser a -> ByteString -> ParseOutput a Source
Run streaming parser with initial input.
parseByteString :: Parser a -> ByteString -> [a] Source
Parse a bytestring, generate lazy list of parsed values. If an error occurs, throws an exception.
parseByteString (arrayOf integer) "[1,2,3,4]" :: [Int] [1,2,3,4]
parseByteString (arrayOf ("name" .: string)) "[{\"name\":\"KIWI\"}, {\"name\":\"BIRD\"}]" ["KIWI","BIRD"]
parseLazyByteString :: Parser a -> ByteString -> [a] Source
Parse a lazy bytestring, generate lazy list of parsed values. If an error occurs, throws an exception.
FromJSON parser
value :: FromJSON a => Parser a Source
Match FromJSON
value. Calls parseJSON on the parsed value.
>>> let json = "[{\"key1\": [1,2], \"key2\": [5,6]}]" >>> parseByteString (arrayOf value) json :: [Value] [Object fromList [("key2",Array (fromList [Number 5.0,Number 6.0])),("key1",Array (fromList [Number 1.0,Number 2.0]))]]
bytestring :: Parser ByteString Source
Match string as a ByteString without decoding the data from UTF8 (strings larger than input chunk, small get always decoded).
Constant space parsers
safeString :: Int -> Parser Text Source
Stops parsing string after the limit is reached. The string will not be matched if it exceeds the size.
number :: Parser Scientific Source
Parse number, return in scientific format.
integer :: (Integral i, Bounded i) => Parser i Source
Parse to bounded integer type (not Integer
).
If you are using integer numbers, use this parser.
It skips the conversion JSON -> Scientific
-> Int
and uses an Int
directly.
Structure operators
(.:) :: Text -> Parser a -> Parser a infixr 7 Source
Synonym for objectWithKey
. Matches key in an object. The .:
operators can be chained.
>>> let json = "{\"key1\": {\"nested-key\": 3}}" >>> parseByteString ("key1" .: "nested-key" .: integer) json :: [Int] [3]
(.|) :: Parser a -> a -> Parser a infixl 6 Source
Return default value if the parsers on the left hand didn't produce a result.
p .| defval = p >^> pure defval
The operator works on complete left side, the following statements are equal:
Record <$> "key1" .: "nested-key" .: value .| defaultValue Record <$> (("key1" .: "nested-key" .: value) .| defaultValue)
(.!) :: Int -> Parser a -> Parser a infixr 7 Source
Synonym for arrayWithIndexOf
. Matches n-th item in array.
>>> parseByteString (arrayOf (1 .! bool)) "[ [1,true,null], [2,false], [3]]" :: [Bool] [True,False]
Structure parsers
objectWithKey :: Text -> Parser a -> Parser a Source
Match only specific key of an object. This function will return only the first matched value in an object even if the source JSON defines the key multiple times (in violation of the specification).
objectItems :: Parser a -> Parser (Text, a) Source
Match all key-value pairs of an object, return them as a tuple. If the source object defines same key multiple times, all values are matched.
objectValues :: Parser a -> Parser a Source
Match all key-value pairs of an object, return only values. If the source object defines same key multiple times, all values are matched. Keys are ignored.
arrayWithIndexOf :: Int -> Parser a -> Parser a Source
Match nith item in an array.
indexedArrayOf :: Parser a -> Parser (Int, a) Source
Match all items of an array, add index to output.
Parsing modifiers
filterI :: (a -> Bool) -> Parser a -> Parser a Source
Let only items matching a condition pass.
>>> parseByteString (filterI (>5) $ arrayOf integer) "[1,2,3,4,5,6,7,8,9,0]" :: [Int] [6,7,8,9]
takeI :: Int -> Parser a -> Parser a Source
Take maximum n matching items.
>>> parseByteString (takeI 3 $ arrayOf integer) "[1,2,3,4,5,6,7,8,9,0]" :: [Int] [1,2,3]
toList :: Parser a -> Parser [a] Source
Gather matches and return them as list.
>>> let json = "[{\"keys\":[1,2], \"values\":[5,6]}, {\"keys\":[9,8], \"values\":[7,6]}]" >>> let parser = arrayOf $ (,) <$> toList ("keys" .: arrayOf integer) <*> toList ("values" .: arrayOf integer) >>> parseByteString parser json :: [([Int], [Int])] [([1,2],[5,6]),([9,8],[7,6])]
SAX-like parsers
arrayFound :: a -> a -> Parser a -> Parser a Source
Generate start/end values when an object is found, in between run a parser. The inner parser is not run if an array is not found.
>>> let test = "[[1,2,3],true,[],false,{\"key\":1}]" :: ByteString >>> parseByteString (arrayOf (arrayFound 10 20 (1 .! integer))) test :: [Int] [10,2,20,10,20]
objectFound :: a -> a -> Parser a -> Parser a Source
Generate start/end values when an array is found, in between run a parser. The inner parser is not run if an array is not found.