loc: Line and column positions and ranges in text files

[ apache, data-structures, library, text ] [ Propose Tags ] [ Report a vulnerability ]

The package name loc stands for “location” and is also an allusion to the acronym for “lines of code”.

The Loc type represents a caret position in a text file, the Span type is a nonempty range between two Locs, and the Area type is a set of non-touching Spans.

[Skip to Readme]

Modules

[Index] [Quick Jump]

Data
- Data.Loc

Downloads

loc-0.2.0.0.tar.gz [browse] (Cabal source package)
Package description (as included in the package)

Maintainer's Corner

Package maintainers

chris_martin, Monoid_Mary

For package maintainers and hackage trustees

edit package information

Candidates

No Candidates

Versions [RSS]	0.1.0.0, 0.1.0.1, 0.1.0.2, 0.1.1.0, 0.1.2.0, 0.1.2.1, 0.1.2.2, 0.1.2.3, 0.1.3.0, 0.1.3.1, 0.1.3.2, 0.1.3.3, 0.1.3.4, 0.1.3.6, 0.1.3.8, 0.1.3.10, 0.1.3.12, 0.1.3.14, 0.1.3.16, 0.1.4.0, 0.1.4.1, 0.2.0.0
Change log	changelog.md
Dependencies	base (>=4.16 && <4.19), containers (>=0.6.4 && <0.7), integer-types (>=0.1.2 && <0.2) [details]
License	Apache-2.0
Copyright	2017 Mission Valley Software LLC
Author	Chris Martin
Maintainer	Chris Martin, Julie Moronuki
Uploaded	by chris_martin at 2023-06-26T19:18:27Z
Category	Data Structures, Text
Home page	https://github.com/typeclasses/loc
Bug tracker	https://github.com/typeclasses/loc/issues
Source repo	head: git clone git://github.com/typeclasses/loc.git
Distributions
Reverse Dependencies	1 direct, 1 indirect [details]
Downloads	10028 total (46 in the last 30 days)
Rating	2.0 (votes: 1) [estimated by Bayesian average]
Your Rating	λ λ λ
Status	Docs available [build log] Last success reported on 2023-06-26 [all 1 reports]

Readme for loc-0.2.0.0

[back to package description]

The package name loc stands for “location” and is also an allusion to the acronym for “lines of code”.

Overview of the concepts:

Example text illustrating Loc, Span, and Area

Loc - a cursor position, starting at the origin 1:1
Span - a nonempty contiguous region between two locs
Area - a set of zero or more spans with gaps between them

`Pos`

Since all of the numbers we're dealing with in this domain are positive, we define a "positive integer" type. This is a newtype for Natural that doesn't allow zero.

newtype Pos = Pos Natural
  deriving (Eq, Ord)

instance Num Pos where
  fromInteger = Pos . checkForUnderflow . fromInteger
  Pos x + Pos y = Pos (x + y)
  Pos x - Pos y = Pos (checkForUnderflow (x - y))
  Pos x * Pos y = Pos (x * y)
  abs = id
  signum _ = Pos 1
  negate _ = throw Underflow

checkForUnderflow :: Natural -> Natural
checkForUnderflow n =
  if n == 0 then throw Underflow else n

Pos does not have an Integral instance, because that would require implementing quotRem :: Pos -> Pos -> (Pos, Pos), which doesn't make much sense. Therefore we can't use toInteger on Pos. Instead we use our own ToNat class to convert positive numbers to natural numbers.

class ToNat a where
  toNat :: a -> Natural

instance ToNat Pos where
  toNat (Pos n) = n

`Line`, `Column`

We then add some newtypes to be more specific about whether we're talking about line or column numbers.

newtype Line = Line Pos
  deriving (Eq, Ord, Num, Real, Enum, ToNat)

newtype Column = Column Pos
  deriving (Eq, Ord, Num, Real, Enum, ToNat)

`Loc`

A Loc is a Line and a Column.

data Loc = Loc
  { line   :: Line
  , column :: Column
  }
  deriving (Eq, Ord)

Note that this library has chosen to be remain entirely agnostic of the text that the positions are referring to. Therefore there is no "plus one" operation on Loc, because the next Loc after 4:17 could be either 4:18 or 5:1 - we can't tell without knowing the line lengths.

`Span`

A Span is a start Loc and an end Loc.

data Span = Span
  { start :: Loc
  , end   :: Loc
  } deriving (Eq, Ord)

A Span is not allowed to be empty; in other words, start and end must be different.

There are two functions for constructing a Span. They both reorder their arguments as appropriate to make sure the start comes before the end (so that spans are never backwards). They take different approaches to ensuring that spans are never empty: the first can throw an exception, whereas the second is typed as Maybe.

fromTo :: Loc -> Loc -> Span
fromTo a b =
  maybe (throw EmptySpan) id (fromToMay a b)

fromToMay :: Loc -> Loc -> Maybe Span
fromToMay a b =
  case compare a b of
    LT -> Just (Span a b)
    GT -> Just (Span b a)
    EQ -> Nothing

The choice to use an exclusive upper bound [start, end) rather than two inclusive bounds [start, end] is forced by the decision to be text-agnostic. With inclusive ranges, you couldn't tell whether span 4:16-4:17 abuts span 5:1-5:2 without knowing whether the character at position 4:17 is a newline.

`Area`

Conceptually, an area is a set of spans. To support efficient union and difference operations, Area is defined like this:

data Terminus = Start | End
  deriving (Eq, Ord)

newtype Area = Area (Map Loc Terminus)
  deriving (Eq, Ord)

You can think of this as a sorted list of the spans' start and end positions, along with a tag indicating whether each is a start or an end.

`Show`

We define custom Show and Read instances to be able to write terse tests like:

>>> addSpan (read "1:1-6:1") (read "[1:1-3:1,6:1-6:2,7:4-7:5]")
[1:1-6:2,7:4-7:5]

These are the showsPrec implementations for Loc and Span:

locShowsPrec :: Int -> Loc -> ShowS
locShowsPrec _ (Loc l c) =
  shows l .
  showString ":" .
  shows c

spanShowsPrec :: Int -> Span -> ShowS
spanShowsPrec _ (Span a b) =
  locShowsPrec 10 a .
  showString "-" .
  locShowsPrec 10 b

`Read`

The parser for Pos is based on the parser for Natural, applying mfilter (/= 0) to make the parser fail if the input represents a zero.

posReadPrec :: ReadPrec Pos
posReadPrec =
  Pos <$> mfilter (/= 0) readPrec

As a reminder, the type of mfilter is:

mfilter :: MonadPlus m => (a -> Bool) -> m a -> m a

The Loc parser uses a very typical Applicative pattern:

-- | Parses a single specific character.
readPrecChar :: Char -> ReadPrec ()
readPrecChar = void . readP_to_Prec . const . ReadP.char

locReadPrec :: ReadPrec Loc
locReadPrec =
  Loc              <$>
  readPrec         <*
  readPrecChar ':' <*>
  readPrec

We used mfilter above to introduce failure into the Pos parser; for Span we use empty.

empty :: Alternative f => f a

First we use fromToMay to produce a Maybe Span, and then in the case where the result is Nothing we use empty to make the parser fail.

spanReadPrec :: ReadPrec Span
spanReadPrec =
  locReadPrec      >>= \a ->
  readPrecChar '-' *>
  locReadPrec      >>= \b ->
  maybe empty pure (fromToMay a b)

Comparison to similar packages

`srcloc`

srcloc has a similar general purpose: defining types related to positions in text files.

Some differences:

srcloc's Pos type (comparable to our Loc type) has a FilePath parameter, whereas this library doesn't consider file paths at all.
srcloc has nothing comparable to the Area type.

There are some undocumented aspects of srcloc we find confusing:

What does "character offset" mean?
Does srcloc's Loc type use inclusive or exclusive bounds?