hPDB: Protein Databank file format library

[ bioinformatics-, bsd3, library ]

The Protein Data Bank (PDB) file format is the most popular format for holding biological macromolecular data.

This is a very fast sequential parser. It takes under 7s to parse 1HTQ, the largest entry in the PDB at over 70MB, compared with:

  • 11s for RASMOL 2.7.5,

  • 2m15s for BioPython with the Python 2.6 interpreter.

In its parallel incarnation it is most probably the fastest parser for the PDB format.

It aims to deliver not only an event-based interface, but also a high-level data structure for manipulating data, in the spirit of BioPython's PDB parser.
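To illustrate what an event-based interface means here, the following is a minimal sketch that turns each line of a PDB file into an event and lets the consumer fold over the resulting stream. It is not hPDB's API: the Atom and Event types and the functions field, parseLine, parseEvents and countAtoms are hypothetical names chosen for this example, while the library's real events live in Bio.PDB.EventParser.PDBEvents. The only thing taken as given is the fixed-column layout of ATOM/HETATM records in the PDB format.

```haskell
import Data.Char (isSpace)
import Data.Maybe (fromMaybe, listToMaybe)
import Text.Read (readMaybe)

-- Hypothetical atom record; hPDB's own types are richer than this.
data Atom = Atom
  { atomSerial :: Int
  , atomName   :: String
  , resName    :: String
  , chainId    :: Char
  , resSeq     :: Int
  , position   :: (Double, Double, Double)
  } deriving (Show, Eq)

-- Hypothetical event type: one event per input line.
data Event
  = AtomEvent Atom         -- a decoded ATOM or HETATM record
  | OtherRecord String     -- record name of a line we do not decode
  | ParseError Int String  -- line number and the offending line
  deriving (Show, Eq)

-- Extract 1-based, inclusive columns [from..to] and strip surrounding spaces.
field :: Int -> Int -> String -> String
field from to = trim . take (to - from + 1) . drop (from - 1)
  where
    trim = dropWhile isSpace . reverse . dropWhile isSpace . reverse

-- Decode a single line using the fixed-column layout of the PDB format.
parseLine :: Int -> String -> Event
parseLine n line
  | recName `elem` ["ATOM", "HETATM"] = fromMaybe (ParseError n line) atomEvent
  | otherwise = OtherRecord recName
  where
    recName = field 1 6 line
    atomEvent = do
      serial <- readMaybe (field 7 11 line)
      chain  <- listToMaybe (drop 21 line)  -- column 22
      rseq   <- readMaybe (field 23 26 line)
      x      <- readMaybe (field 31 38 line)
      y      <- readMaybe (field 39 46 line)
      z      <- readMaybe (field 47 54 line)
      pure (AtomEvent (Atom serial (field 13 16 line) (field 18 20 line)
                            chain rseq (x, y, z)))

-- Lazily turn file contents into a stream of events.
parseEvents :: String -> [Event]
parseEvents = zipWith parseLine [1 ..] . lines

-- Example consumer: count atoms without building any structure.
countAtoms :: [Event] -> Int
countAtoms evs = length [() | AtomEvent _ <- evs]
```

In GHCi, applying countAtoms . parseEvents to the contents of a PDB file counts its atoms without materialising a structure. The sketch uses String for brevity, so it is far slower than a parser built on bytestring and mmap, as the dependency list suggests hPDB is.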

hPDB - Haskell library for processing atomic biomolecular structures in Protein Data Bank format - Michal Jan Gajda. BMC Research Notes 2013, 6:483.



Modules


  • Bio
    • Bio.PDB
      • EventParser
        • Bio.PDB.EventParser.ExperimentalMethods
        • Bio.PDB.EventParser.HelixTypes
        • Bio.PDB.EventParser.PDBEventParser
        • Bio.PDB.EventParser.PDBEventPrinter
        • Bio.PDB.EventParser.PDBEvents
        • Bio.PDB.EventParser.StrandSense
      • Bio.PDB.Fasta
      • Bio.PDB.IO
        • Bio.PDB.IO.OpenAnyFile
      • Bio.PDB.Iterable
      • Bio.PDB.Structure
        • Bio.PDB.Structure.Elements
        • Bio.PDB.Structure.List
        • Bio.PDB.Structure.Neighbours
        • Bio.PDB.Structure.Vector
      • Bio.PDB.StructureBuilder
      • Bio.PDB.StructurePrinter

Flags

Automatic Flags

  • have-mmap (default: enabled): Use mmap to read input faster.
  • have-sse2 (default: enabled): Use -msse2 for faster code.
  • have-text-format (default: disabled): Do not use text-format, since it may require double-conversion and thus linking of libstdc++, which may break compilation due to GHC bug #5289: http://ghc.haskell.org/trac/ghc/ticket/5289

Use -f <flag> to enable a flag, or -f -<flag> to disable that flag.


Versions: 0.99, 0.999, 0.9999, 0.9999.1, 1.0, 1.1, 1.1.1, 1.1.2, 1.2.0, 1.2.0.1, 1.2.0.2, 1.2.0.3, 1.2.0.4, 1.2.0.5, 1.2.0.6, 1.2.0.7, 1.2.0.8, 1.2.0.9, 1.2.0.10, 1.3.0.0, 1.4.0.0, 1.5.0.0
Change log: changelog
Dependencies: AC-Vector, base (>=4.0 && <4.12), bytestring, containers, deepseq, directory, ghc-prim, iterable (>=3.0), mmap, mtl, Octree (>=0.5), parallel (>=3.0.0.0), QuickCheck (>=2.5.0.0), tagged (>=0.7), template-haskell, text (>=0.11.1.13), text-format (>=0.3.1.0), unordered-containers (>=0.2.5.0), vector, zlib
License: BSD-3-Clause
Copyright: Michal J. Gajda, 2009-2015
Author: Michal J. Gajda
Maintainer: mjgajda@googlemail.com
Category: Bioinformatics
Home page: https://github.com/BioHaskell/hPDB
Bug tracker: mailto:mjgajda@googlemail.com
Source repo: head: git clone https://github.com/BioHaskell/hPDB.git
Uploaded: by MichalGajda at 2018-07-14T20:14:54Z
Reverse Dependencies: 1 direct, 0 indirect
Downloads: 14786 total (53 in the last 30 days)
Rating: 2.0 (votes: 1), estimated by Bayesian average
Status: Docs not available
All reported builds failed as of 2018-07-14 (3 reports)

Readme for hPDB-1.3.0.0


hPDB

Haskell PDB file format parser.


The Protein Data Bank (PDB) file format is the most popular format for holding biomolecule data.

This is a very fast parser. It takes under 7s to parse 1HTQ, the largest entry in the PDB at over 70MB, compared with:

  • 11s for RASMOL 2.7.5,
  • 2m15s for BioPython with the Python 2.6 interpreter.

It aims to deliver not only an event-based interface, but also a high-level data structure for manipulating data, in the spirit of BioPython's PDB parser.
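A high-level structure in the spirit of BioPython is a nested hierarchy that can be traversed at any level. The following minimal sketch shows the idea with plain lists. All type and function names here (Structure, Model, Chain, Residue, Atom, allAtoms, alphaCarbons, centroid) are hypothetical and chosen for illustration; hPDB's actual definitions are in Bio.PDB.Structure and its traversals in Bio.PDB.Iterable, and they may differ substantially.

```haskell
import Data.List (foldl')

-- Hypothetical hierarchy: Structure > Model > Chain > Residue > Atom.
data Atom = Atom
  { atomName :: String
  , coord    :: (Double, Double, Double)
  } deriving (Show, Eq)

data Residue = Residue
  { resName :: String
  , resSeq  :: Int
  , atoms   :: [Atom]
  } deriving (Show, Eq)

data Chain = Chain
  { chainId  :: Char
  , residues :: [Residue]
  } deriving (Show, Eq)

data Model = Model
  { modelId :: Int
  , chains  :: [Chain]
  } deriving (Show, Eq)

newtype Structure = Structure { models :: [Model] }
  deriving (Show, Eq)

-- Flatten the hierarchy into a list of all atoms, in file order.
allAtoms :: Structure -> [Atom]
allAtoms s =
  [a | m <- models s, c <- chains m, r <- residues c, a <- atoms r]

-- Select the alpha carbons (atoms named "CA").
alphaCarbons :: Structure -> [Atom]
alphaCarbons = filter ((== "CA") . atomName) . allAtoms

-- Unweighted geometric centre of all atoms; Nothing for an empty structure.
centroid :: Structure -> Maybe (Double, Double, Double)
centroid s =
  case foldl' step (0 :: Int, 0, 0, 0) (allAtoms s) of
    (0, _, _, _)    -> Nothing
    (n, sx, sy, sz) -> let k = fromIntegral n
                       in Just (sx / k, sy / k, sz / k)
  where
    step (n, sx, sy, sz) a =
      let (x, y, z) = coord a
      in (n + 1, sx + x, sy + y, sz + z)
```

Nested lists keep the sketch short. A real library would more likely use packed arrays at each level so that traversals stay fast on entries with millions of atoms.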

Details on official releases are on Hackage.

This package is also a part of Stackage - a stable subset of Hackage.

Projects for the future:

Please let me know if you would be willing to push the project further.

In particular, one may consider these features:

  • Migrate away from text-format, since it causes portability trouble and slows down printing.
  • Migrate from AC-Vector to another vector library:
    • vector-space
    • or linear
  • Use lens to facilitate access to the data structures.
    • torsion angles within protein/RNA chain.
  • Add Octree to the default data structure (with automatic update).
  • Write a combinator library for generic fast parsing.
  • Check whether GHC 7.8 improved the efficiency of fixed-point arithmetic, since PDB coordinates have a dynamic range of only about 2^20 steps (roughly 20 bits), with a smallest step of 0.001.
  • Implement basic spatial operations, such as RMS superposition (with SVD) and affine transforms on a substructure.
  • Class-based wrappers showing Structure-Model-Chain-Residue-Atom interface with possible wrapping of Repa/Accelerate arrays for fast computation.
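The fixed-point idea from the list above can be sketched as follows: since coordinate fields in PDB files carry exactly three decimals in an 8-character column, a coordinate can be stored as an integer number of thousandths of an Ångström and parsed without going through floating point. This is only an illustration of the proposal, not how hPDB stores coordinates; the Milli type and the functions parseMilli, toAngstrom and sqDistMicro are hypothetical names.

```haskell
import Data.Char (isDigit, isSpace)
import Data.Int (Int32, Int64)

-- Hypothetical fixed-point coordinate: thousandths of an Angstrom.
-- An 8-column field spans -999.999 .. 9999.999, which fits easily in Int32.
newtype Milli = Milli Int32
  deriving (Show, Eq, Ord)

-- Parse a field such as "  27.340" or "-8.250" directly into thousandths.
-- Accepts at most three fractional digits and pads shorter fractions.
-- Assumes the field is short enough for the result to fit in Int32.
parseMilli :: String -> Maybe Milli
parseMilli raw =
  case dropWhile isSpace raw of
    '-' : rest -> negateM <$> unsigned rest
    rest       -> unsigned rest
  where
    negateM (Milli n) = Milli (negate n)
    unsigned s =
      case span isDigit s of
        (whole@(_ : _), '.' : fracAndRest) ->
          case span isDigit fracAndRest of
            (frac, trailing)
              | length frac <= 3 && all isSpace trailing ->
                  let digits = whole ++ take 3 (frac ++ "000")
                  in Just (Milli (fromIntegral (read digits :: Integer)))
            _ -> Nothing
        _ -> Nothing

-- Convert back to a Double in Angstroms.
toAngstrom :: Milli -> Double
toAngstrom (Milli n) = fromIntegral n / 1000

-- Squared distance in units of 1e-6 square Angstroms.
-- Widened to Int64 because squares of Int32 differences can overflow Int32.
sqDistMicro :: (Milli, Milli, Milli) -> (Milli, Milli, Milli) -> Int64
sqDistMicro (Milli x1, Milli y1, Milli z1) (Milli x2, Milli y2, Milli z2) =
  sq (d x1 x2) + sq (d y1 y2) + sq (d z1 z2)
  where
    d a b = fromIntegral a - fromIntegral b :: Int64
    sq v  = v * v
```

With this representation, equality and distance comparisons are exact, and conversion to Double is needed only where an operation such as superposition requires real arithmetic. Whether this is actually faster than Double on a given GHC version is the open question the list item raises.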

Please ask me any questions on Gitter.