Safe Haskell | Safe-Inferred |
---|
Documentation
segment :: String -> [String]
segment s splits s into a list of sentences.
It looks for punctuation characters that indicate the end of a sentence and tries to ignore uses of punctuation that do not correspond to sentence ends.
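To make the general idea concrete, here is a minimal sketch of a punctuation-based splitter. It is not this module's implementation: the name naiveSegment, the abbreviation list, and the word-by-word strategy are all assumptions chosen for illustration.

```haskell
-- Hypothetical abbreviation list; an assumption, not taken from this module.
abbreviations :: [String]
abbreviations = ["Mr.", "Mrs.", "Dr.", "e.g.", "i.e.", "etc."]

-- A word ends a sentence if it finishes with '.', '!' or '?'
-- and is not one of the known abbreviations.
endsSentence :: String -> Bool
endsSentence w = not (null w) && last w `elem` ".!?" && w `notElem` abbreviations

-- Illustrative splitter (hypothetical name). It works on whitespace-separated
-- words, so runs of whitespace are collapsed to single spaces in the output.
naiveSegment :: String -> [String]
naiveSegment = map unwords . go . words
  where
    go [] = []
    go ws = case break endsSentence ws of
      (pre, [])       -> [pre]                    -- trailing text without final punctuation
      (pre, w : rest) -> (pre ++ [w]) : go rest   -- close the sentence at w
```

With these assumptions, naiveSegment "Dr. Smith arrived. He was late! Was he?" yields three sentences, and the leading "Dr." does not cause a split. The same rule fails when "etc." genuinely ends a sentence, which is the kind of case a fixed rule cannot resolve.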
It's a good idea to read the source code of this module, especially the test suite.
I suspect this task is inherently ambiguous, so it is probably not possible to write an exact segmenter.
It may be worth consulting the literature on how to do segmentation properly, and perhaps implementing something that returns the N most probable segmentations instead.
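As a rough illustration of the N-best idea, the sketch below scores every candidate segmentation and keeps the top N. Everything in it is an assumption: the names boundaryProb, segmentations and nBest are hypothetical, the probabilities are made up, and a real model would be estimated from data. It also enumerates all combinations, so it is only practical for short inputs.

```haskell
import Data.List (sortBy)
import Data.Ord (Down (..), comparing)

-- Hypothetical boundary model: probability that a sentence ends after this word.
-- The numbers are invented for illustration only.
boundaryProb :: String -> Double
boundaryProb w
  | null w = 0
  | w `elem` ["Mr.", "Dr.", "etc.", "e.g."] = 0.2
  | last w `elem` "!?" = 0.95
  | last w == '.' = 0.9
  | otherwise = 0

-- Every segmentation of a word list, paired with its probability.
-- The last word always closes the final sentence.
segmentations :: [String] -> [(Double, [[String]])]
segmentations [] = [(1, [])]
segmentations [w] = [(1, [[w]])]
segmentations (w : ws) = concatMap extend (segmentations ws)
  where
    q = boundaryProb w
    extend (p, rest) =
      [(p * q, [w] : rest) | q > 0]            -- end a sentence after w
        ++ [(p * (1 - q), glue rest) | q < 1]  -- keep w in the following sentence
    glue (s : ss) = (w : s) : ss
    glue [] = [[w]]

-- The n most probable segmentations, best first.
nBest :: Int -> String -> [(Double, [String])]
nBest n =
  take n
    . sortBy (comparing (Down . fst))
    . map (fmap (map unwords))
    . segmentations
    . words
```

Under this toy model, nBest 2 "Dr. Smith left. Bye." ranks the segmentation that keeps "Dr. Smith left." together first, and the one that splits after "Dr." second.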