dawgdic: Generation and traversal of compressed directed acyclic dawg graphs

[ bsd3, data, library ] [ Propose Tags ] [ Report a vulnerability ]
Versions [RSS] 0.1.0
Change log CHANGELOG.md
Dependencies base (>=4.17.2.1 && <4.25.0.0), binary, bitvec, deepseq, hashable, primitive, vector, vector-binary, vector-hashtables [details]
License BSD-3-Clause
Author Andrey Prokopenko
Maintainer persiantiger@yandex.ru
Category Data
Uploaded by swamp_agr at 2025-08-28T19:22:45Z
Distributions
Downloads 3 total (3 in the last 30 days)
Rating (no votes yet) [estimated by Bayesian average]
Your Rating
  • λ
  • λ
  • λ
Status Docs uploaded by user
Build status unknown [no reports yet]

Readme for dawgdic-0.1.0

[back to package description]

dawgdic

Overview

dawgdic is a library for building and accessing dictionaries implemented with directed acyclic word graphs (DAWG).

This is a ported version of C++ dawgdic library.

Features

This library offers DAWG, Dictionary, Guide and Completer data types as well as their builders.

  • DAWG represents word graph. It is provided as intermediate data structure.
  • Dictionary represents compact layout for DAWG. It offers API to check the presence of the word and associated value in the graph. Also it could be stored into a file and loaded from file.
  • Guide is being used as a tree-like index to get the faster navigation through the dictionary. Also could be stored into a file and loaded from file.

Input characters must be in range 0-255.

This port does not contain RankedGuide and RankedCompleter (yet).

Comparison with other DAWG libraries

  • dictionary file size, in bytes:
package/lexicon size (words) 2.6K 26K 370K
dawg: Data.DAWG.Dynamic 370868 2823660 29933964
dawg: Data.DAWG.Static 133138 1030578 11128690
dawg-ord: Data.DAWG.Int N/A N/A N/A
dawg-ord: Data.DAWG.Ord N/A N/A N/A
packed-dawg: Data.DAWG.Packed 15619 128491 1481671
dawgdic: Data.DAWG.Dictionary 3088 141328 1603600
dawgdic: Guide (w/ Dictionary) 4640 212000 2405408
  • features:
feature dawg dawg-ord packed-dawg dawgdic
build from list + + + +
persistence + - + +
insert Dynamic + - Builder
delete key Dynamic + - -
follow character Static ~edges lookupPrefix +
lookup value + + - +
member ~lookup ~lookup + +
keys + + + +
values + + - +
toList (assocs) + + - +
complete word ~submap - ~toList +
fuzzy search - - - -
max size N/A N/A 2^22 nodes N/A

Installation

Add dawgdic dependency to your project and run cabal build.

Building and Querying

Building DAWG from lexicon of words and ignoring insertion failures is as simple as this:

>>> import Data.DAWG.DAWG
>>> dawg <- fromAscList . lines =<< readFile "/path/to/lexicon"

Otherwise, consider mutable builder. It could also be useful if words are associated with values.

buildOrError content = do
  dawgBuilder <- new
  forM_ content \(word, value) -> do
    result <- insert word (Just value) dawgBuilder
    unless result $ error "Insert failed"
  freeze dawgBuilder

Building dictionary and guide:

>>> import qualified Data.DAWG.Dictionary as D
>>> import qualified Data.DAWG.Guide as G
>>> dict <- D.build' dawg
>>> guide <- G.build' dawg dict

Saving dictionary and guide:

>>> D.write "dict.dawg" dict
>>> G.write "guide.dawg" guide

Loading dictionary and guide:

>>> dict <- D.load "dict.dawg"
>>> guide <- G.load "guide.dawg"

Consider using Completer for auto-complete-like queries or if you need to obtain lexicon back from Dictionary and Guide.

Benchmarks

Utilities

Benchmark                             default(μs)
------------------------------------- ------------
Utilities/10/Dawg.fromAscList             5549.76
Utilities/10/Dict.build'                 26981.64
Utilities/10/Dict.contains                  22.46
Utilities/10/Dict.lookup                    22.68
Utilities/10/Dict.follow                    22.42
Utilities/10/Guide.build'                   76.81
Utilities/10/Completer.completeKeys        385.35
------------------------------------- ------------
Utilities/100/Dawg.fromAscList           66486.21
Utilities/100/Dict.build'               194881.38
Utilities/100/Dict.contains                326.16
Utilities/100/Dict.lookup                  323.45
Utilities/100/Dict.follow                  319.86
Utilities/100/Guide.build'                 782.24
Utilities/100/Completer.completeKeys      7016.36
------------------------------------- ------------
Utilities/1000/Dawg.fromAscList         888061.61
Utilities/1000/Dict.build'             1659798.44
Utilities/1000/Dict.contains              3627.54
Utilities/1000/Dict.lookup                3638.10
Utilities/1000/Dict.follow                3564.64
Utilities/1000/Guide.build'               7992.73
Utilities/1000/Completer.completeKeys    82343.20
------------------------------------- ------------

How to reproduce:

  • Install bench-show:
cabal install bench-show --overwrite-policy=always
  • Run and wait (it might take around 3m to complete):
time cabal bench +RTS "-N4 -A64m -n4m -qb0" -RTS  --benchmark-options="--output bench.html --csv results.csv"
  • Generate report:
bench-show --presentation=Solo report results.csv

Comparison

Benchmark                      default(μs)
------------------------------ -----------
10/dawgdic.follow                    21.04
10/dawg.follow                       85.41
10/dawg-ord.follow                  110.42
10/packed-dawg.follow               147.30
100/dawgdic.follow                  311.05
100/dawg.follow                    1367.81
100/dawg-ord.follow                1774.78
100/packed-dawg.follow             2098.12
1000/dawgdic.follow                3273.02
1000/dawg.follow                  18842.47
1000/dawg-ord.follow              21992.03
1000/packed-dawg.follow           28938.01
------------------------------ -----------
10/dawgdic.lookup value              21.74
10/dawg.lookup value                 61.89
10/dawg-ord.lookup value            209.93
100/dawgdic.lookup value            290.13
100/dawg.lookup value               847.27
100/dawg-ord.lookup value          2973.35
1000/dawgdic.lookup value          3551.44
1000/dawg.lookup value            14217.29
1000/dawg-ord.lookup value        40111.28
10/dawgdic.member                    21.02
------------------------------ -----------
10/dawg.member                       56.43
10/dawg-ord.member                  191.74
10/packed-dawg.member               173.97
100/dawgdic.member                  343.32
100/dawg.member                     873.70
100/dawg-ord.member                2863.96
100/packed-dawg.member             2200.33
1000/dawgdic.member                9288.22
1000/dawg.member                  13878.72
1000/dawg-ord.member              31428.47
1000/packed-dawg.member           29089.83
------------------------------ -----------
10/dawgdic.keys                     108.33
10/dawg.keys                        100.23
10/dawg-ord.keys                    160.51
10/packed-dawg.keys                  50.50
100/dawgdic.keys                   1392.78
100/dawg.keys                      1413.00
100/dawg-ord.keys                  2324.99
100/packed-dawg.keys                692.23
1000/dawgdic.keys                 15628.83
1000/dawg.keys                    16541.79
1000/dawg-ord.keys                27612.77
1000/packed-dawg.keys              7415.48
------------------------------ -----------
10/dawgdic.values                    52.26
10/dawg.values                       72.29
10/dawg-ord.values                  134.79
100/dawgdic.values                  616.02
100/dawg.values                    1060.87
100/dawg-ord.values                1794.82
1000/dawgdic.values                6457.32
1000/dawg.values                  11734.25
1000/dawg-ord.values              21532.54
------------------------------ -----------
10/dawgdic.toList                   107.26
10/dawg.toList                      116.89
10/dawg-ord.toList                  202.35
100/dawgdic.toList                 1367.87
100/dawg.toList                    1608.26
100/dawg-ord.toList                2898.26
1000/dawgdic.toList               16334.34
1000/dawg.toList                  18578.59
1000/dawg-ord.toList              36360.44
------------------------------ -----------
10/dawgdic.complete word            367.97
10/dawg.complete word               248.14
10/packed-dawg.complete word        303.69
100/dawgdic.complete word          6151.51
100/dawg.complete word             4494.85
100/packed-dawg.complete word      4913.83
1000/dawgdic.complete word        71152.49
1000/dawg.complete word           58222.76
1000/packed-dawg.complete word    60828.53
------------------------------ -----------

How to reproduce:

  • Install bench-show:
cabal install bench-show --overwrite-policy=always
  • Run and wait (it might take around 7m to complete):
time cabal bench +RTS "-N4 -A64m -n4m -qb0" -RTS  --benchmark-options="--output bench.html --csv results.csv"
  • Generate report:
bench-show --presentation=Solo report results.csv

Contributing

In cases of issues please attach callstack, provide minimal dictionary lexicon and provide logs with enabled tracing.

cabal build -ftrace

Acknowledgments