Copyright | Justin Bailey Chris Kuklewicz Daniel Fischer |
---|---|
License | BSD3 |
Maintainer | Daniel Fischer <daniel.is.fischer@googlemail.com> |
Stability | Provisional |
Portability | non-portable (BangPatterns) |
Safe Haskell | None |
Language | Haskell98 |
Fast search of lazy ByteString
values using the
Knuth-Morris-Pratt algorithm.
A description of the algorithm can be found at http://en.wikipedia.org/wiki/Knuth-Morris-Pratt_algorithm.
Original authors: Justin Bailey (jgbailey at gmail.com) and Chris Kuklewicz (haskell at list.mightyreason.com).
- indices :: ByteString -> ByteString -> [Int64]
- nonOverlappingIndices :: ByteString -> ByteString -> [Int64]
- strictify :: ByteString -> ByteString
Overview
This module provides two functions for finding the occurrences of a
pattern in a target string using the Knuth-Morris-Pratt algorithm.
It exists mostly for systematic reasons, the functions from
Data.ByteString.Lazy.Search are much faster, except for very short
patterns or long patterns with a short period if overlap is allowed.
In the latter case, indices
from this module may be the best choice
since the Boyer-Moore function's performance degrades if there are many
matches and the DFA function's automaton needs much space for long
patterns.
In the former case, for some pattern/target combinations DFA has better
performance, for others KMP, usually the difference is small.
Complexity and Performance
The preprocessing of the pattern is O(patternLength
) in time and space.
The time complexity of the searching phase is O(targetLength
) for both
functions.
In most cases, these functions are considerably slower than the Boyer-Moore variants, performance is close to that of those from Data.ByteString.Search.DFA.
Partial application
Both functions can be usefully partially applied. Given only a pattern, the auxiliary data will be computed only once, allowing for efficient re-use.
Functions
:: ByteString | Strict pattern to find |
-> ByteString | Lazy string to search |
-> [Int64] | Offsets of matches |
:: ByteString | Strict pattern to find |
-> ByteString | Lazy string to search |
-> [Int64] | Offsets of matches |
finds the starting indices of all
non-overlapping occurrences of the pattern in the target string.
It is more efficient than removing indices from the list produced
by nonOverlappingIndices
indices
.
Convenience
strictify :: ByteString -> ByteString Source
transforms a lazy strictify
ByteString
into a strict
ByteString
, to make it a suitable pattern for the searching
functions.