parsergen
Introduction
parsergen
is a library aimed at generating fast Haskell parsers for fixed
width packets. It uses a DSL in which these packets can be specified, augmented
with Haskell parsers.
In order to create a packet and a parser for it, usually two files are used,
Foo.hs
and Foo.ths
.
Tutorial
Datatypes and parsers
Syntax
Let's start by defining a datatype in the .ths
file. The syntax here is:
TypeName
ConstructorName [fields prefix]
[Nx] [_]FieldName [!] FieldType [+]FieldWidth [FieldParser]
where
-
TypeName
: Name of the type itself, e.g. Maybe
-
ConstructorName
: Name of constructor with given set of fields. If no prefix
is provided, downcased capital letters from the constructor name will be used
instead.
-
Nx
: Number of times to repeat this matcher
-
FieldName
: Name of the field which will be used (with constructor prefix
prepended)
-
_
: This field will be ignored (skipped if possible or parsed)
-
!
: This field will be strict
-
FieldType
: type name when using existing datatype, e.g. Int
or
ByteString
, or a custom type Foo
-
FieldWidth
: Number for size based parsing, e.g. 12
. This field is needed
to perform some optimisations as well, so you have to specify field width even
if you going to specify FieldParser
.
-
+
: Only for numerical fields: the first character will be treated as the
sign
-
FieldParser
: A parser which will be used to parse it. This can be omitted
for types such as Int
or ByteString
. Otherwise, you can either specify a
fixed string or a parser of the type Parser
.
In the .hs
file, one can now use:
$(genDataTypeFromFile "Foo.ths")
$(genParserFromFile "Foo.ths")
to generate a parser and a datatype for it.
Example
Let's look at an example .ths
file:
Packet
Warning
_PacketType ByteString 4 "WARN"
DangerType DangerType 2 dangerType
ChanceOfSurvival Int 3
LotteryWin
_PacketType ByteString 4 "LOTT"
Amount Money 10
6x WinningEntry LotteryEntry 2
And the .hs
file:
{-# LANGUAGE OverloadedStrings, TemplateHaskell #-}
import Data.ByteString (ByteString)
import ParserGen.Gen
import ParserGen.Repack -- Needed later on
import qualified ParserGen.Parser as P
data DangerType
= Earthquake
| ZombieApocalypse
| RobotUprising
| AngryGirlfriend
deriving (Eq, Show)
dangerType :: P.Parser DangerType
dangerType = do
bs <- P.take 2
case bs of
"EQ" -> return Earthquake
"ZA" -> return ZombieApocalypse
"RI" -> return RobotUprising
"AG" -> return AngryGirlfriend
_ -> fail $ "Unknown danger type: " ++ show bs
newtype Money = Money Int
deriving (Eq, Show)
type LotteryEntry = Int
$(genDataTypeFromFile "Packet.ths")
$(genParserFromFile "Packet.ths")
sampleWarning :: ByteString
sampleWarning = "WARNRI002"
sampleLotteryWin :: ByteString
sampleLotteryWin = "LOTT9999999999040815162342"
main :: IO ()
main = do
print $ P.parse parserForWarning sampleWarning
print $ P.parse parserForLotteryWin sampleLotteryWin
The parsergen
generates:
- The
Packet
datatype
- The parser functions
parserForWarning, parserForLotteryWin :: Parser Packet
Note how we have used three kinds of parsers:
"WARN"
is an example of a hardcoded string which the packet must match
dangerType
is a custom parser, specified in the Haskell file
- We don't specify parsers for numeral types, these are automatically derived
(even for
newtype
s and type
synonyms)
Repackers
A powerful feature from the library, repackers allow us to change the contents
of multiple fields without actually parsing a packet.
Syntax
The syntax looks like this:
repackerForName ConstructorName
FieldName [FieldUnParser]
Example
Let's add the following the bottom of our .ths
file:
repackerForLotteryNumbers LotteryWin
WinningEntry
And the following to our Haskell file:
$(genRepackFromFile "Packet.ths")
which generates the function
repackerForLotteryNumbers :: [LotteryEntry] -> ByteString -> ByteString
Use it like:
print $ repackerForLotteryNumbers [1 .. 6] sampleLotteryWin
Things to note:
- For numerical types, you don't need to specify an unparser, this is only
needed for custom types. These should have the type
SomeType -> ByteString
.
- The repacker will take a list when the field is repeated (e.g.
6x
in this
case) and a single value otherwise