dawgdic: Generation and traversal of compressed directed acyclic dawg graphs

Versions [RSS]	0.1.0, 1.0.0
Change log	CHANGELOG.md
Dependencies	base (>=4.17.2.1 && <4.25.0.0), binary, bitvec, bytestring, containers, deepseq, hashable, optparse-applicative, primitive, vector, vector-algorithms, vector-hashtables [details]
License	BSD-3-Clause
Author	Andrey Prokopenko
Maintainer	persiantiger@yandex.ru
Uploaded	by swamp_agr at 2026-01-10T12:52:46Z
Category	Data
Distributions
Executables	dawgdic
Downloads	11 total (4 in the last 30 days)
Rating	(no votes yet) [estimated by Bayesian average]
Your Rating	λ λ λ
Status	Docs uploaded by user Build status unknown [no reports yet]

Readme for dawgdic-1.0.0

[back to package description]

dawgdic

Overview

dawgdic is a library for building and accessing dictionaries implemented with directed acyclic word graphs (DAWG).

This is a ported version of C++ dawgdic library.

Features

This library offers DAWG, Dictionary, Guide and Completer data types as well as their builders.

DAWG represents word graph. It is provided as intermediate data structure.
Dictionary represents compact layout for DAWG. It offers API to check the presence of the word and associated value in the graph. Also it could be stored into a file and loaded from file.
Guide is being used as a tree-like index to get the faster navigation through the dictionary. Also could be stored into a file and loaded from file.

Input characters must be in range 0-255.

This port does not contain RankedGuide and RankedCompleter (yet).

Comparison with other DAWG libraries

dictionary file size, in bytes:

package/lexicon size (words)	2.6K	26K	370K
dawg: Data.DAWG.Dynamic	370868	2823660	29933964
dawg: Data.DAWG.Static	133138	1030578	11128690
dawg-ord: Data.DAWG.Int	N/A	N/A	N/A
dawg-ord: Data.DAWG.Ord	N/A	N/A	N/A
packed-dawg: Data.DAWG.Packed	15619	128491	1481671
dawgdic: Data.DAWG.Dictionary	3088	141328	1603600
dawgdic: Guide (w/ Dictionary)	4640	212000	2405408

features:

feature	dawg	dawg-ord	packed-dawg	dawgdic
build from list	+	+	+	+
persistence	+	-	+	+
insert	Dynamic	+	-	Builder
delete key	Dynamic	+	-	-
follow character	Static	~edges	lookupPrefix	+
lookup value	+	+	-	+
member	~lookup	~lookup	+	+
keys	+	+	+	+
values	+	+	-	+
toList (assocs)	+	+	-	+
complete word	~submap	-	~toList	+
fuzzy search	-	-	-	-
max size	N/A	N/A	2^22 nodes	N/A

Installation

Add dawgdic dependency to your project and run cabal build.

Building and Querying

Building DAWG from lexicon of words and ignoring insertion failures is as simple as this:

>>> import Data.DAWG.DAWG
>>> dawg <- fromAscList . lines =<< readFile "/path/to/lexicon"

Otherwise, consider mutable builder. It could also be useful if words are associated with values.

buildOrError content = do
  dawgBuilder <- new
  forM_ content \(word, value) -> do
    result <- insert word (Just value) dawgBuilder
    unless result $ error "Insert failed"
  freeze dawgBuilder

Building dictionary and guide:

>>> import qualified Data.DAWG.Dictionary as D
>>> import qualified Data.DAWG.Guide as G
>>> dict <- D.build' dawg
>>> guide <- G.build' dawg dict

Saving dictionary and guide:

>>> D.write "dict.dawg" dict
>>> G.write "guide.dawg" guide

Loading dictionary and guide:

>>> dict <- D.load "dict.dawg"
>>> guide <- G.load "guide.dawg"

Consider using Completer for auto-complete-like queries or if you need to obtain lexicon back from Dictionary and Guide.

Benchmarks

Utilities

Benchmark                                   default(μs)
------------------------------------------- -----------
Utilities/10/Dawg.fromAscList                   5549.76
Utilities/10/Dict.build'                       26981.64
Utilities/10/Dict.contains                        22.46
Utilities/10/Dict.lookup                          22.68
Utilities/10/Dict.follow                          22.42
Utilities/10/Guide.build'                         76.81
Utilities/10/Completer.completeKeys              293.73
Utilities/10/RankedCompleter.completeKeys       3437.14
------------------------------------------- ------------
Utilities/100/Dawg.fromAscList                  66486.21
Utilities/100/Dict.build'                      194881.38
Utilities/100/Dict.contains                       326.16
Utilities/100/Dict.lookup                         323.45
Utilities/100/Dict.follow                         319.86
Utilities/100/Guide.build'                        782.24
Utilities/100/Completer.completeKeys             5157.75
Utilities/100/RankedCompleter.completeKeys      66817.64
------------------------------------------- ------------
Utilities/1000/Dawg.fromAscList                888061.61
Utilities/1000/Dict.build'                    1659798.44
Utilities/1000/Dict.contains                     3627.54
Utilities/1000/Dict.lookup                       3638.10
Utilities/1000/Dict.follow                       3564.64
Utilities/1000/Guide.build'                      7992.73
Utilities/1000/Completer.completeKeys           56788.87
Utilities/1000/RankedCompleter.completeKeys   1333117.49
------------------------------------------- ------------

How to reproduce:

Install bench-show:

cabal install bench-show --overwrite-policy=always

Run and wait (it might take around 3m to complete):

time cabal bench +RTS "-N4 -A64m -n4m -qb0" -RTS  --benchmark-options="--output bench.html --csv results.csv"

Generate report:

bench-show --presentation=Solo report results.csv

Comparison

Benchmark                      default(μs)
------------------------------ -----------
10/dawgdic.follow                    21.04
10/dawg.follow                       85.41
10/dawg-ord.follow                  110.42
10/packed-dawg.follow               147.30
100/dawgdic.follow                  311.05
100/dawg.follow                    1367.81
100/dawg-ord.follow                1774.78
100/packed-dawg.follow             2098.12
1000/dawgdic.follow                3273.02
1000/dawg.follow                  18842.47
1000/dawg-ord.follow              21992.03
1000/packed-dawg.follow           28938.01
------------------------------ -----------
10/dawgdic.lookup value              21.74
10/dawg.lookup value                 61.89
10/dawg-ord.lookup value            209.93
100/dawgdic.lookup value            290.13
100/dawg.lookup value               847.27
100/dawg-ord.lookup value          2973.35
1000/dawgdic.lookup value          3551.44
1000/dawg.lookup value            14217.29
1000/dawg-ord.lookup value        40111.28
10/dawgdic.member                    21.02
------------------------------ -----------
10/dawg.member                       56.43
10/dawg-ord.member                  191.74
10/packed-dawg.member               173.97
100/dawgdic.member                  343.32
100/dawg.member                     873.70
100/dawg-ord.member                2863.96
100/packed-dawg.member             2200.33
1000/dawgdic.member                9288.22
1000/dawg.member                  13878.72
1000/dawg-ord.member              31428.47
1000/packed-dawg.member           29089.83
------------------------------ -----------
10/dawgdic.keys                     108.33
10/dawg.keys                        100.23
10/dawg-ord.keys                    160.51
10/packed-dawg.keys                  50.50
100/dawgdic.keys                   1392.78
100/dawg.keys                      1413.00
100/dawg-ord.keys                  2324.99
100/packed-dawg.keys                692.23
1000/dawgdic.keys                 15628.83
1000/dawg.keys                    16541.79
1000/dawg-ord.keys                27612.77
1000/packed-dawg.keys              7415.48
------------------------------ -----------
10/dawgdic.values                    52.26
10/dawg.values                       72.29
10/dawg-ord.values                  134.79
100/dawgdic.values                  616.02
100/dawg.values                    1060.87
100/dawg-ord.values                1794.82
1000/dawgdic.values                6457.32
1000/dawg.values                  11734.25
1000/dawg-ord.values              21532.54
------------------------------ -----------
10/dawgdic.toList                   107.26
10/dawg.toList                      116.89
10/dawg-ord.toList                  202.35
100/dawgdic.toList                 1367.87
100/dawg.toList                    1608.26
100/dawg-ord.toList                2898.26
1000/dawgdic.toList               16334.34
1000/dawg.toList                  18578.59
1000/dawg-ord.toList              36360.44
------------------------------ -----------
10/dawgdic.complete word            305.66
10/dawg.complete word               240.24
10/packed-dawg.complete word        315.09
100/dawgdic.complete word          5341.40
100/dawg.complete word             4497.49
100/packed-dawg.complete word      4866.73
1000/dawgdic.complete word        59699.40
1000/dawg.complete word           55223.88
1000/packed-dawg.complete word    59961.42
------------------------------ -----------

How to reproduce:

Install bench-show:

cabal install bench-show --overwrite-policy=always

Run and wait (it might take around 7m to complete):

time cabal bench +RTS "-N4 -A64m -n4m -qb0" -RTS  --benchmark-options="--output bench.html --csv results.csv"

Generate report:

bench-show --presentation=Solo report results.csv

Contributing

In cases of issues please attach callstack, provide minimal dictionary lexicon and provide logs with enabled tracing.

git pull
git switch traces
cabal build -ftrace

Acknowledgments

Susumu Yata as original author of C++ library.