pronounce: A library for interfacing with the CMU Pronouncing Dictionary

[ bsd3, library, text ]

Text.Pronounce is a Haskell library for interfacing with the CMU Pronouncing Dictionary. It is based on Allison Parrish's Python library pronouncing and exports much of the same functionality. The underlying data structure I used to represent the dictionary is a Map from entries to lists of their possible phones, as represented in the CMU dict. Many functions rely on access to the CMU dict and may return more than one result (more on the layout of the CMU dict later), so I decided to capture this underlying dictionary state by using the ReaderT monad transformer with the list monad embedded inside it.
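The design described above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the library's actual source: the names Dictionary, Lookup, phonesFor, stressesFor and sampleDict are made up for this example, String stands in for the text type, and the sample pronunciations are illustrative data.

```haskell
import Control.Monad.Reader (ReaderT (ReaderT), runReaderT)
import Data.Char (isDigit)
import qualified Data.Map as Map

-- Hypothetical names for this sketch; String stands in for the text type.
type Entry = String
type Phones = [String]
type Dictionary = Map.Map Entry [Phones]

-- A computation that reads the dictionary and may yield several results.
type Lookup = ReaderT Dictionary []

-- Every pronunciation recorded for a word (no results if the word is absent).
phonesFor :: Entry -> Lookup Phones
phonesFor word = ReaderT (Map.findWithDefault [] word)

-- The stress pattern of each pronunciation: the digits attached to its vowels.
stressesFor :: Entry -> Lookup String
stressesFor word = concatMap (filter isDigit) <$> phonesFor word

-- Illustrative data only; check real transcriptions against the dictionary.
sampleDict :: Dictionary
sampleDict =
  Map.fromList
    [("CONSOLE", [words "K AH0 N S OW1 L", words "K AA1 N S OW0 L"])]

main :: IO ()
main = do
  print (runReaderT (phonesFor "CONSOLE") sampleDict)
  print (runReaderT (stressesFor "CONSOLE") sampleDict)
```

Because the inner monad is the list monad, a function built on top of phonesFor automatically runs once per pronunciation, which is why stressesFor needs no explicit handling of multiple results.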

To use this library properly, a basic understanding of the CMU Pronouncing Dictionary is assumed. In short, the dictionary maps English words to their pronunciations, transcribed using ARPAbet. This transcription reduces each word to a sequence of phones (vowel and consonant sounds), with stress indicated by a digit at the end of each vowel. In addition, since some words have multiple pronunciations, a single word can have multiple entries.


Most users need not worry about the actual syntax of the CMU dict, however, and should merely note that an entry such as the one for "CONSOLE" consists of a mapping from the Entry "CONSOLE" to some [Phones], a list of the possible sequences of phones for that word (stresses included). For a fuller description of the CMU Pronouncing Dictionary, I recommend visiting the official website or simply looking through the CMU dict itself.
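For readers who do want a feel for the layout, the following sketch shows how lines in the dictionary's text format could be collected into such a mapping. It is not the parser in Text.Pronounce.ParseDict: the names stripVariant, parseLine and buildDict are hypothetical, String stands in for the text type, and the sample lines are written in the dictionary's style for illustration (comment lines starting with ";;;", alternate pronunciations marked with a suffix such as "(1)").

```haskell
import qualified Data.Map as Map
import Data.Maybe (mapMaybe)

type Entry = String
type Phones = [String]

-- Drop a trailing variant marker such as "(1)", leaving other words untouched.
stripVariant :: String -> Entry
stripVariant w = case break (== '(') w of
  (base@(_ : _), _ : rest@(_ : _)) | last rest == ')' -> base
  _ -> w

-- Parse one line into (entry, phones); comment and blank lines give Nothing.
parseLine :: String -> Maybe (Entry, Phones)
parseLine line
  | take 3 line == ";;;" = Nothing
  | otherwise = case words line of
      (w : ps@(_ : _)) -> Just (stripVariant w, ps)
      _ -> Nothing

-- Group all pronunciations of the same word, keeping their original order.
buildDict :: [String] -> Map.Map Entry [Phones]
buildDict ls =
  Map.fromListWith (flip (++)) [(e, [ps]) | (e, ps) <- mapMaybe parseLine ls]

-- Illustrative lines only; check real entries against the dictionary itself.
sampleLines :: [String]
sampleLines =
  [ ";;; a comment line"
  , "CONSOLE  K AH0 N S OW1 L"
  , "CONSOLE(1)  K AA1 N S OW0 L"
  ]

main :: IO ()
main = print (Map.lookup "CONSOLE" (buildDict sampleLines))
```

Both sample lines end up under the single key "CONSOLE", which is the shape the paragraph above describes: one Entry mapped to a list of phone sequences.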

When working with this library, the default is to load the dictionary from an included binary file, but the user also has the option to parse the dictionary from a Unicode text file or to encode the text file into binary themselves. For this last purpose, I have included in the examples folder the script I originally used to encode the dictionary into binary.
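As a rough idea of what encoding a dictionary into binary can look like, here is a sketch using Data.Binary from the binary package, which appears in this package's dependency list. It is not the script from the examples folder, and it makes no claim about how the bundled binary file is laid out: the Dictionary type and the names saveDict, loadDict and sampleDict are assumptions made for illustration.

```haskell
import Data.Binary (decode, decodeFile, encode, encodeFile)
import qualified Data.Map as Map

-- Hypothetical representation; String stands in for the text type.
type Dictionary = Map.Map String [[String]]

-- Serialise a dictionary to a file on disk.
saveDict :: FilePath -> Dictionary -> IO ()
saveDict = encodeFile

-- Read a dictionary back from a file written by saveDict.
loadDict :: FilePath -> IO Dictionary
loadDict = decodeFile

-- Illustrative data only.
sampleDict :: Dictionary
sampleDict =
  Map.fromList
    [("CONSOLE", [words "K AH0 N S OW1 L", words "K AA1 N S OW0 L"])]

main :: IO ()
main = do
  -- Round-trip in memory: encode to a lazy ByteString, then decode again.
  let bytes = encode sampleDict
      restored = decode bytes :: Dictionary
  print (restored == sampleDict)
```

Loading a pre-encoded file this way skips text parsing at start-up, which is a plausible reason to ship the dictionary in binary form by default.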

Finally, I would like to note that Text.Pronounce.ParseDict operates on UTF-8 encoded files, for compatibility with Text, which is Unicode-based, even though the original CMU Pronouncing Dictionary uses Latin-1 encoding. Because of this, a user who wants to use a version of the CMU dictionary other than the included one must convert its encoding to UTF-8 before parsing.
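One possible way to do that conversion from Haskell itself is sketched below, using only System.IO from base. The function name latin1ToUtf8 and the command-line handling are assumptions made for this example; any other transcoding tool would serve equally well.

```haskell
import System.Environment (getArgs)
import System.IO

-- Copy a Latin-1 encoded text file to a new UTF-8 encoded file.
latin1ToUtf8 :: FilePath -> FilePath -> IO ()
latin1ToUtf8 src dst =
  withFile src ReadMode $ \hin ->
    withFile dst WriteMode $ \hout -> do
      hSetEncoding hin latin1   -- decode the input as Latin-1
      hSetEncoding hout utf8    -- encode the output as UTF-8
      contents <- hGetContents hin
      -- Writing consumes the lazily read input while both handles are open.
      hPutStr hout contents

main :: IO ()
main = do
  args <- getArgs
  case args of
    [src, dst] -> latin1ToUtf8 src dst
    _ -> hPutStrLn stderr "usage: latin1-to-utf8 INPUT OUTPUT"
```

The resulting file can then be handed to the text-file parsing path described above.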




Change log
Dependencies base (>=4.10 && <4.12), binary (>=0.8.4 && <0.9), containers (>=0.5 && <0.6), filepath (>=1.4 && <1.5), mtl (>=2.2 && <2.3), safe (>=0.3 && <0.4), text (>=1.2 && <1.3) [details]
License BSD-3-Clause
Author Noah Goodman
Category Text
Home page
Source repo head: git clone
Uploaded by NoahGoodman at 2018-08-23T13:04:28Z
Reverse Dependencies 1 direct, 0 indirect [details]
Downloads 2193 total (1 in the last 30 days)
Rating (no votes yet) [estimated by Bayesian average]
Status Docs available [build log]
Last success reported on 2018-08-23 [all 1 reports]

Readme for pronounce-



A pronunciation and rhyming library that uses the CMU Pronouncing Dictionary

This package is a basic interface for the Carnegie Mellon University Pronouncing Dictionary, based off of Allison Parrish's Python API, pronouncing.


In general, a cabal sandbox is the safest and easiest way to install most Haskell packages, so I recommend running

cabal sandbox init
cabal update
cabal install pronounce

in the project directory where you would like to use Text.Pronounce.


A general overview of the package, along with further information, can be found on Text.Pronounce's Hackage page.

For basic descriptions of the package's exports, the Haddockumentation can also be found on Hackage.