Changelog for tiktoken-1.0.2
1.0.2
-
Correctly handle gaps in ranks
The old implementation assumed that encoding don't have gaps in their ranks, but some do (especially near the end, typically reserved for special tokens). This change fixes the internal implementation to correctly handle those gaps.
-
Fix
o200k_base
regex to match upstreamThe upstream
tiktoken
package uses a flavor of regex that subtly differs from the Haskellpcre-light
package. Specifically, they differ in whether they treat ideographic space (U+3000) as whitespace (which this change fixes).There may be other differences yet to be uncovered, but this is the only one that has arisen so far when comparing to upstream on a large corpus of text.
1.0.1
1.0.0
- Initial release