hpdft: A tool for looking through PDF files using Haskell

[ library, mit, pdf, program ]

A command-line PDF-to-text converter. It may take much longer than similar tools, but it can yield better results.

This package can also serve as a library for working with text data in PDF files. You can write your own PDF-to-text converter for particular PDF files, making use of any metadata or special data structures they contain.



Modules


  • PDF
    • PDF.CFF
    • PDF.Character
    • PDF.Cmap
    • PDF.ContentStream
    • PDF.Definition
    • PDF.DocumentStructure
    • PDF.Object
    • PDF.OpenType
    • PDF.Outlines
    • PDF.PDFIO
    • PDF.Type1


Versions: 0.1.0.0, 0.1.0.1, 0.1.0.2, 0.1.0.3, 0.1.0.4, 0.1.0.5, 0.1.0.6, 0.1.1.1, 0.1.1.2, 0.1.1.3
Dependencies: attoparsec (>=0.14.4 && <0.15), base (>=4.18.0 && <4.19), binary (>=0.8.9 && <0.9), bytestring (>=0.11.4 && <0.12), containers (>=0.6.7 && <0.7), directory (>=1.3.8 && <1.4), file-embed (>=0.0.15 && <0.1), hpdft, memory (>=0.18.0 && <0.19), optparse-applicative (>=0.18.1 && <0.19), parsec (>=3.0 && <3.2), regex-base (>=0.94.0 && <0.95), regex-tdfa (>=1.3.2 && <1.4), semigroups (>=0.20 && <0.21), text (>=2.0.2 && <2.1), utf8-string (>=1.0.2 && <1.1), zlib (>=0.6.3 && <0.7)
License: MIT
Author: Keiichiro Shikano
Maintainer: k16.shikano@gmail.com
Category: PDF
Home page: https://github.com/k16shikano/hpdft
Uploaded: by keiichiroShikano at 2023-08-22T05:28:33Z
Reverse dependencies: 1 direct, 0 indirect
Executables: hpdft
Downloads: 3587 total (28 in the last 30 days)
Status: Docs not available. All reported builds failed as of 2023-08-22 (2 reports).

Readme for hpdft-0.1.1.3


hpdft (Haskell PDF Tools)

hpdft is a PDF parsing tool. It can also be used as a command to grab text, metadata, and the outline (i.e. table of contents) from a PDF.

Command usage:

hpdft [-p|--page PAGE] [-r|--ref REF] [-g|--grep RegExp] [-R|--refs]
             [-T|--title] [-I|--info] [-O|--toc] [--trailer] FILE

Available options:
  -p,--page PAGE           Page number (nomble)
  -r,--ref REF             Object reference
  -g,--grep RegExp         grep PDF
  -R,--refs                Show object references in page order
  -T,--title               Show title (from metadata)
  -I,--info                Show PDF metainfo
  -O,--toc                 Show table of contents (from metadata)
  --trailer                Show the trailer of PDF
  FILE                     input pdf file
  -h,--help                Show this help text
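
As a rough illustration of the options above, the following sketch prints and runs a few typical invocations. It assumes hpdft is already on your PATH; `sample.pdf`, the page number 3, the search pattern, and the `run` helper are hypothetical placeholders, not part of hpdft. Each option is used on its own, since the help text does not say how options combine.

```shell
#!/bin/sh
# Sketch only: uses just the options listed in the help text above.
# "sample.pdf" is a hypothetical input; override it with PDF=/path/to/file.pdf.
PDF="${PDF:-sample.pdf}"

# Hypothetical helper: echo the command, then run it only when hpdft is
# installed and the input file exists, so the script is safe to try anywhere.
run() {
  echo "+ hpdft $*"
  if command -v hpdft >/dev/null 2>&1 && [ -f "$PDF" ]; then
    hpdft "$@"
  fi
}

run -T "$PDF"            # show the title (from metadata)
run -I "$PDF"            # show PDF metainfo
run -O "$PDF"            # show the table of contents (from metadata)
run -p 3 "$PDF"          # select page 3 (hypothetical page number)
run -g 'Haskell' "$PDF"  # grep the PDF for a regular expression
run -R "$PDF"            # show object references in page order
run --trailer "$PDF"     # show the PDF trailer
```

If hpdft is missing or the file does not exist, the script only echoes the commands it would have run, which makes it usable as a quick reference.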

Install

Clone this repository and run cabal install from its root directory.

$ git clone https://github.com/k16shikano/hpdft
$ cd hpdft
$ cabal install