spelling-suggest: Spelling suggestion tool with library and command-line interfaces.

[ bsd3, console, library, program, text ]

Given a possibly-misspelled word, this tool spits out one or more properly-spelled words in order of likelihood of similarity.

This functionality is exported as a library via Text.SpellingSuggest (suggest) and as a command-line program "thimk" (an old joke).

Running the program "thimk-makedb" is an optional (but highly recommended) step that speeds up lookups: it creates a precompiled SQLite database of phonetic codes for a dictionary, permitting reasonable performance on enormous dictionaries.


Flags

Automatic Flags

Name     Description     Default
debug                    Enabled

Use -f <flag> to enable a flag, or -f -<flag> to disable that flag.

Versions [RSS] 0.5.0, 0.5.0.1, 0.5.1, 0.5.1.0, 0.5.2.0, 0.5.2.1
Dependencies base (>=4.2 && <5), edit-distance (>=0.2 && <0.3), parseargs (>=0.1.1 && <0.2), phonetic-code (>=0.1 && <0.2), sqlite (>=0.5.1 && <0.6) [details]
License BSD-3-Clause
Copyright Copyright © 2010 Bart Massey and Greg Weber
Author Bart Massey and Greg Weber
Maintainer bart@cs.pdx.edu, greg@gregweber.info
Category Console, Text
Home page https://github.com/gregwebs/haskell-spell-suggest
Source repo head: git clone git://github.com/gregwebs/haskell-spell-suggest.git
this: git clone git://github.com/gregwebs/haskell-spell-suggest.git (tag v0.5.2.1)
Uploaded by BartonMassey at 2012-08-27T05:14:28Z
Reverse Dependencies 1 direct, 0 indirect [details]
Executables thimk-makedb, thimk
Downloads 4366 total (16 in the last 30 days)
Status Docs uploaded by user
Build status unknown [no reports yet]

Readme for spelling-suggest-0.5.2.1

Thimk and spelling-suggest

Spelling word suggestion tool and library
Copyright © 2010 Bart Massey and Greg Weber

This software is licensed under the "3-clause ('new') BSD License". Please see the file COPYING provided with this distribution for license terms.

This package is a newer version of the original package called "thimk".

"thimk" (an old joke) is a command-line spelling word suggestion tool. You give it a possibly-misspelled word, and it spits out one or more properly-spelled words in order of likelihood of similarity.

Thimk is structured as a command-line interface to its spelling-suggest library, originally split out by Greg Weber. You can use this library for other applications also. There is sufficient Haddock to work out how to use it. It is packaged on Hackage as spelling-suggest.

There is little documentation of the thimk command as yet, but the usage message from the program should tell you everything needed to get started with it.

The idea and name for thimk came from an old program that used to hang around Reed College, probably written by Graham Ross and now apparently lost in the mists of time. See this Usenet post for the one very vague reference I've found on the web (in the SEE ALSO section of the referenced manpage). I originally re-implemented thimk in Nickle some years ago, but that implementation has been slow, clunky, and non-portable.

The current implementation is a bit more sophisticated than I recall the original being. By default it first applies a prefilter that discards words with large edit distances from the target, then filters out words whose phonetic code differs from the target's, and finally presents the top results sorted by edit distance.
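
The three stages described above can be sketched in a few lines of Haskell. This is only an illustration under stated assumptions, not the package's actual code: the names editDistance, phoneticCode and suggestSketch are hypothetical, the phonetic code is a simplified untruncated Soundex-style code rather than the one from the phonetic-code package, and the distance cutoff of 2 is an arbitrary choice for the example.

```haskell
module Main where

import Data.Char (isAlpha, toUpper)
import Data.List (sortOn)

-- Levenshtein edit distance, computed one row at a time.
-- (Hypothetical helper; the real package depends on edit-distance.)
editDistance :: String -> String -> Int
editDistance s t = last (foldl nextRow [0 .. length t] s)
  where
    nextRow prev@(p : ps) c = scanl step (p + 1) (zip3 t prev ps)
      where
        step left (tc, diag, up) =
          minimum [left + 1, up + 1, diag + (if tc == c then 0 else 1)]
    nextRow [] _ = []

-- Simplified, untruncated Soundex-style code (an assumption for this
-- sketch): keep the first letter, map later consonants to digit classes,
-- drop vowels and H/W/Y, and collapse adjacent repeats of a digit.
phoneticCode :: String -> String
phoneticCode word =
  case map toUpper (filter isAlpha word) of
    [] -> []
    (c : cs) -> c : collapse (digit c) (map digit cs)
  where
    digit ch
      | ch `elem` "BFPV" = '1'
      | ch `elem` "CGJKQSXZ" = '2'
      | ch `elem` "DT" = '3'
      | ch == 'L' = '4'
      | ch `elem` "MN" = '5'
      | ch == 'R' = '6'
      | otherwise = '0'
    collapse _ [] = []
    collapse prev (d : ds)
      | d == '0' = collapse d ds
      | d == prev = collapse d ds
      | otherwise = d : collapse d ds

-- Prefilter by edit distance, filter by phonetic code, sort by distance.
suggestSketch :: Int -> [String] -> String -> [String]
suggestSketch maxDist dict target =
  map snd $ sortOn fst
    [ (d, w)
    | w <- dict
    , let d = editDistance target w
    , d <= maxDist
    , phoneticCode w == phoneticCode target
    ]

main :: IO ()
main = print (suggestSketch 2 ["think", "thick", "thank", "trunk", "thing", "banana"] "thimk")
-- prints ["think","thank","thing"]
```

In this toy run "trunk" and "banana" are dropped by the distance prefilter, and "thick" is dropped because its code differs from that of "thimk", even though it is only one edit away.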

The Soundex and Phonix phonetic codes are designed for names, but seem to work about the same with other words. I follow the common practice of not truncating the codes for greater precision, although Phonix does truncate its final "sound" for greater recall.

The latest change to the implementation is the addition of an optional precompiled SQLite database of phonetic codes for the entire dictionary, created with "thimk-makedb". This greatly speeds lookup, permitting reasonable performance on enormous dictionaries.
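
To see why precomputing the codes helps, here is a minimal in-memory analogue of the idea, assuming the containers package that ships with GHC. It is not how thimk-makedb stores its data: the real tool writes an SQLite database, whereas this sketch uses a Data.Map keyed by code. The names buildIndex, lookupCode and toyCode are hypothetical, and toyCode (first letter plus word length) is only a stand-in for a real phonetic code.

```haskell
module Main where

import qualified Data.Map.Strict as Map

-- Maps a code to every dictionary word that has that code.
type CodeIndex = Map.Map String [String]

-- Compute each word's code once, up front, and group words by code.
-- 'flip (++)' keeps words in their original dictionary order.
buildIndex :: (String -> String) -> [String] -> CodeIndex
buildIndex code ws = Map.fromListWith (flip (++)) [(code w, [w]) | w <- ws]

-- A lookup only codes the target word and then reads one map entry,
-- instead of recoding the whole dictionary.
lookupCode :: (String -> String) -> CodeIndex -> String -> [String]
lookupCode code idx target = Map.findWithDefault [] (code target) idx

-- Stand-in code for the example only: first letter plus word length.
toyCode :: String -> String
toyCode w = take 1 w ++ show (length w)

main :: IO ()
main = do
  let idx = buildIndex toyCode ["think", "apple", "thank", "trunk", "pear"]
  print (lookupCode toyCode idx "thimk")
```

The candidates returned this way would still need the edit-distance ranking; the index only replaces the step of coding every dictionary word on each lookup.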

Building thimk and spelling-suggest requires my parseargs and phonetic-code packages from Hackage, as well as edit-distance, plus sqlite-0.5.1 or newer if you want to build and use the optional phonetic codes database. It is probably easiest to build using cabal-install, which should take care of most everything for you.

--Bart Massey 2012-08-26