crucible-llvm-0.6: Support for translating and executing LLVM code in Crucible
Copyright(c) Galois Inc 2014-2021
LicenseBSD3
MaintainerRob Dockins <rdockins@galois.com>
Stabilityprovisional
Safe HaskellSafe-Inferred
LanguageHaskell2010

Lang.Crucible.LLVM.Translation

Description

This module translates an LLVM Module into a collection of Crucible control-flow graphs, one per function. The tricky parts of this translation are 1) mapping LLVM types onto Crucible types in a sensible way and 2) translating the phi-instructions of LLVM's SSA form.

To handle the translation of phi-functions, we first perform a pre-pass over all the LLVM basic blocks looking for phi-functions and build a datastructure that tells us what assignments to make when jumping from block l to block l'. We then emit instructions to make these assignments in a separate Crucible basic block (see definePhiBlock). Thus, the translated CFG will generally have more basic blocks than the original LLVM.

Points of note:

  • Immediate (i.e., not in memory) structs and packed structs are translated the same.
  • Undefined values generate special Crucible expressions (e.g., BVUndef) to represent arbitrary bitpatterns.
  • The floating-point domain is interpreted by IsSymInterface as either the IEEE754 floating-point domain, the real domain, or a domain with bitvector values and uninterpreted operations.

Some notes on undefined/poison values: (outcome of discussions between JHx and RWD)

  • Continue to add Crucible expressions for undefined values as required (e.g. for floating-point values). Crucible itself is currently treating undefined values as fresh symbolic inputs; it should instead invent a new category of "arbitrary" values that get passed down into the solvers in a way that is dependent on the task at hand. For example, in verification tasks, it is appropriate to treat the arbitrary values the same as symbolic inputs. However, for preimage-finding tasks, the arbitrary values must be treated as universally-quantified, unlike the symbolic inputs which are existentially-quantified.
  • For poison values, our implementation strategy is to assert side conditions onto values that may create poison. As opposed to assertions (which must be satisfied because you passed through a control-flow point) side conditions are intended to mean that a condition must hold when a value is used (i.e., side conditions follow data-flow). So if a poison value is created but not examined (e.g., because we later do a test to determine if the value is safe), that should be allowed.

A (probably) partial list of things we intend to support, but do not yet:

  • Various vector instructions. This includes a variety of instructions that LLVM allows to take vector arguments, but are currently only defined on scalar (nonvector) arguments. (Progress has been made on this, but may not yet be complete).

A list of things that are unsupported and may never be:

  • Computed jumps and blockaddress expressions
  • Multithreading primitives
  • Alternate calling conventions
Synopsis

Documentation

data ModuleTranslation arch Source #

The result of translating an LLVM module into Crucible CFGs.

Instances

Instances details
TestEquality ModuleTranslation Source # 
Instance details

Defined in Lang.Crucible.LLVM.Translation

Methods

testEquality :: forall (a :: k) (b :: k). ModuleTranslation a -> ModuleTranslation b -> Maybe (a :~: b) #

getTranslatedCFG :: ModuleTranslation arch -> Symbol -> IO (Maybe (Declare, AnyCFG LLVM, [LLVMTranslationWarning])) Source #

Given a ModuleTranslation and a function symbol corresponding to a function defined in the module, attempt to look up the symbol name and retrieve the corresponding Crucible CFG. This will load and translate the CFG if this is the first time the given symbol is requested.

Will return Nothing if the symbol does not refer to a function defined in this module.

getTranslatedFnHandle :: ModuleTranslation arch -> Symbol -> IO (Maybe (Declare, SomeHandle)) Source #

Look up the signature and function handle for a function defined in this module translation. This does not trigger translation of the named function into Crucible if it has not already been requested.

Will return Nothing if the symbol does not refer to a function defined in this module.

data LLVMContext arch Source #

Information about the LLVM module.

Constructors

LLVMContext 

Fields

translateModule Source #

Arguments

:: (?transOpts :: TranslationOptions) 
=> HandleAllocator

Generator for nonces.

-> GlobalVar Mem

Memory model to associate with this context

-> Module

Module to translate

-> IO (Some ModuleTranslation) 

Translate a module into Crucible control-flow graphs. Return the translated module and a list of warning messages generated during translation. Note: We may want to add a map from symbols to existing function handles if we want to support dynamic loading.

data LLVMTranslationWarning Source #

A warning generated during translation

Constructors

LLVMTranslationWarning 

Fields

  • Symbol

    Function name in which the warning was generated

  • Position

    Source position where the warning was generated

  • Text

    Description of the warning

Representation of LLVM constant values

data LLVMConst where Source #

Translation-time LLVM constant values.

Constructors

ZeroConst :: !MemType -> LLVMConst

A constant value consisting of all zero bits.

IntConst :: 1 <= w => !(NatRepr w) -> !(BV w) -> LLVMConst

A constant integer value, with bit-width w.

FloatConst :: !Float -> LLVMConst

A constant floating point value.

DoubleConst :: !Double -> LLVMConst

A constant double value.

LongDoubleConst :: !FP80Value -> LLVMConst

A constant long double value (X86_FP80)

StringConst :: !ByteString -> LLVMConst

A constant sequence of bytes

ArrayConst :: !MemType -> [LLVMConst] -> LLVMConst

A constant array value.

VectorConst :: !MemType -> [LLVMConst] -> LLVMConst

A constant vector value.

StructConst :: !StructInfo -> [LLVMConst] -> LLVMConst

A constant structure value.

SymbolConst :: !Symbol -> !Integer -> LLVMConst

A pointer value, consisting of a concrete offset from a global symbol.

UndefConst :: !MemType -> LLVMConst

The undef value is quite strange. See: The LLVM Language Reference, § Undefined Values.

Instances

Instances details
Show LLVMConst Source #

This also can't be derived, but is completely uninteresting.

Instance details

Defined in Lang.Crucible.LLVM.Translation.Constant

Eq LLVMConst Source #

The interesting cases here are: * IntConst: GHC can't derive this because IntConst existentially quantifies the integer's width. We say that two integers are equal when they have the same width *and* the same value. * UndefConst: Two undef values aren't necessarily the same...

Instance details

Defined in Lang.Crucible.LLVM.Translation.Constant

boolConst :: Bool -> LLVMConst Source #

Create an LLVM constant value from a boolean.

intConst Source #

Arguments

:: MonadError String m 
=> Natural

width of the integer constant, w

-> Integer

value of the integer constant, n

-> m LLVMConst 

Create an LLVM constant of a given width. The resulting integer constant value will be the unsigned integer value n mod 2^w.

Translations from LLVM syntax to constant values

transConstantWithType :: (?lc :: TypeContext, MonadError String m, HasPtrWidth wptr) => Typed Value -> m (MemType, LLVMConst) Source #

Compute the constant value of an expression. Fail if the given value does not represent a constant.

transConstant' :: (?lc :: TypeContext, MonadError String m, HasPtrWidth wptr) => MemType -> Value -> m LLVMConst Source #

Compute the constant value of an expression. Fail if the given value does not represent a constant.

transConstantExpr :: forall m wptr. (?lc :: TypeContext, MonadError String m, HasPtrWidth wptr) => ConstExpr -> m LLVMConst Source #

Compute the value of a constant expression. Fails if the expression does not actually represent a constant value.

Intermediate representation for GEP

data GEP (n :: Nat) (expr :: Type) where Source #

Intermediate representation of a GEP. A GEP n expr is a representation of a GEP with n parallel vector lanes with expressions represented by expr values.

Constructors

GEP_scalar_base :: expr -> GEP 1 expr

Start a GEP with a single base pointer

GEP_vector_base :: NatRepr n -> expr -> GEP n expr

Start a GEP with a vector of n base pointers

GEP_scatter :: NatRepr n -> GEP 1 expr -> GEP n expr

Copy a scalar base vector pointwise into a vector of length n.

GEP_field :: FieldInfo -> GEP n expr -> GEP n expr

Add the offset corresponding to the given field pointwise to each pointer

GEP_index_each :: MemType -> GEP n expr -> expr -> GEP n expr

Add an offset corresponding to the given array index (multiplied by the given type size) pointwise to the pointers in each lane.

GEP_index_vector :: MemType -> GEP n expr -> expr -> GEP n expr

Given a vector of offsets (whose length must match the number of lanes), multiply each one by the type size, and add the offsets to the corresponding pointers.

Instances

Instances details
Foldable (GEP n) Source # 
Instance details

Defined in Lang.Crucible.LLVM.Translation.Constant

Methods

fold :: Monoid m => GEP n m -> m #

foldMap :: Monoid m => (a -> m) -> GEP n a -> m #

foldMap' :: Monoid m => (a -> m) -> GEP n a -> m #

foldr :: (a -> b -> b) -> b -> GEP n a -> b #

foldr' :: (a -> b -> b) -> b -> GEP n a -> b #

foldl :: (b -> a -> b) -> b -> GEP n a -> b #

foldl' :: (b -> a -> b) -> b -> GEP n a -> b #

foldr1 :: (a -> a -> a) -> GEP n a -> a #

foldl1 :: (a -> a -> a) -> GEP n a -> a #

toList :: GEP n a -> [a] #

null :: GEP n a -> Bool #

length :: GEP n a -> Int #

elem :: Eq a => a -> GEP n a -> Bool #

maximum :: Ord a => GEP n a -> a #

minimum :: Ord a => GEP n a -> a #

sum :: Num a => GEP n a -> a #

product :: Num a => GEP n a -> a #

Traversable (GEP n) Source # 
Instance details

Defined in Lang.Crucible.LLVM.Translation.Constant

Methods

traverse :: Applicative f => (a -> f b) -> GEP n a -> f (GEP n b) #

sequenceA :: Applicative f => GEP n (f a) -> f (GEP n a) #

mapM :: Monad m => (a -> m b) -> GEP n a -> m (GEP n b) #

sequence :: Monad m => GEP n (m a) -> m (GEP n a) #

Functor (GEP n) Source # 
Instance details

Defined in Lang.Crucible.LLVM.Translation.Constant

Methods

fmap :: (a -> b) -> GEP n a -> GEP n b #

(<$) :: a -> GEP n b -> GEP n a #

data GEPResult expr where Source #

The result of a GEP instruction translation. It records the number of parallel vector lanes in the resulting instruction, the resulting memory type of the instruction, and the sequence of sub-operations required to compute the GEP instruction.

Constructors

GEPResult :: 1 <= n => NatRepr n -> MemType -> GEP n expr -> GEPResult expr 

Instances

Instances details
Foldable GEPResult Source # 
Instance details

Defined in Lang.Crucible.LLVM.Translation.Constant

Methods

fold :: Monoid m => GEPResult m -> m #

foldMap :: Monoid m => (a -> m) -> GEPResult a -> m #

foldMap' :: Monoid m => (a -> m) -> GEPResult a -> m #

foldr :: (a -> b -> b) -> b -> GEPResult a -> b #

foldr' :: (a -> b -> b) -> b -> GEPResult a -> b #

foldl :: (b -> a -> b) -> b -> GEPResult a -> b #

foldl' :: (b -> a -> b) -> b -> GEPResult a -> b #

foldr1 :: (a -> a -> a) -> GEPResult a -> a #

foldl1 :: (a -> a -> a) -> GEPResult a -> a #

toList :: GEPResult a -> [a] #

null :: GEPResult a -> Bool #

length :: GEPResult a -> Int #

elem :: Eq a => a -> GEPResult a -> Bool #

maximum :: Ord a => GEPResult a -> a #

minimum :: Ord a => GEPResult a -> a #

sum :: Num a => GEPResult a -> a #

product :: Num a => GEPResult a -> a #

Traversable GEPResult Source # 
Instance details

Defined in Lang.Crucible.LLVM.Translation.Constant

Methods

traverse :: Applicative f => (a -> f b) -> GEPResult a -> f (GEPResult b) #

sequenceA :: Applicative f => GEPResult (f a) -> f (GEPResult a) #

mapM :: Monad m => (a -> m b) -> GEPResult a -> m (GEPResult b) #

sequence :: Monad m => GEPResult (m a) -> m (GEPResult a) #

Functor GEPResult Source # 
Instance details

Defined in Lang.Crucible.LLVM.Translation.Constant

Methods

fmap :: (a -> b) -> GEPResult a -> GEPResult b #

(<$) :: a -> GEPResult b -> GEPResult a #

translateGEP Source #

Arguments

:: forall wptr m. (?lc :: TypeContext, MonadError String m, HasPtrWidth wptr) 
=> Bool

inbounds flag

-> Type

base type for calculations

-> Typed Value

base pointer expression

-> [Typed Value]

index arguments

-> m (GEPResult (Typed Value)) 

Given the data for an LLVM getelementpointer instruction, preprocess the instruction into a GEPResult, checking types, computing vectorization lanes, etc.

As a concrete example, consider a call to translateGEP inbounds baseTy basePtr elts with the following instruction:

getelementptr [12 x i8], ptr %aptr, i64 0, i32 1

Here:

  • inbounds is False, as the keyword of the same name is missing from the instruction. (Currently, crucible-llvm ignores this information.)
  • baseTy is [12 x i8]. This is the type used as the basis for subsequent calculations.
  • basePtr is ptr %aptr. This pointer is used as the base address to start calculations from. Note that the type of basePtr is not baseTy, but rather a pointer type.
  • The elts are [i64 0, i32 1]. These are the indices that indicate which of the elements of the aggregate object are indexed.

Utility functions

showInstr :: Instr -> String Source #

Pretty print an LLVM instruction

data TranslationOptions Source #

This datatype encodes a variety of tweakable settings that apply during LLVM translation.

Constructors

TranslationOptions 

Fields

  • debugIntrinsics :: !Bool

    Should we translate llvm.dbg intrinsic statements? The upside to translating them is that they encode debugging information about a program that can be useful for external clients. The downside is that they can bloat the size of translated programs, despite being no-ops during simulation.

  • laxArith :: !Bool

    Should we omit proof obligations related to arithmetic overflow and similar assertions?

  • optLoopMerge :: !Bool

    Should we insert merge blocks in loops with early exits (i.e. breaks or returns)? This may improve simulation performance.

defaultTranslationOptions :: TranslationOptions Source #

The default translation options:

  • Do not translate llvm.dbg intrinsic statements
  • Do not produce proof obligations for arithmetic-related assertions.
  • Do not insert merge blocks in loops with early exits.

debugIntrinsicsTranslationOptions :: TranslationOptions Source #

Like defaultTranslationOptions, but llvm.dbg intrinsic statements are translated.

llvmTypesAsCtx :: forall a wptr. HasPtrWidth wptr => [MemType] -> (forall ctx. CtxRepr ctx -> a) -> a Source #

Translate a list of LLVM expressions into a crucible type context, which is passed into the given continuation.

llvmTypeAsRepr :: forall a wptr. HasPtrWidth wptr => MemType -> (forall tp. TypeRepr tp -> a) -> a Source #

Translate an LLVM type into a crucible type, which is passed into the given continuation

llvmRetTypeAsRepr :: forall a wptr. HasPtrWidth wptr => Maybe MemType -> (forall tp. TypeRepr tp -> a) -> a Source #

Translate an LLVM return type into a crucible type, which is passed into the given continuation

llvmDeclToFunHandleRepr :: HasPtrWidth wptr => FunDecl -> (forall args ret. CtxRepr args -> TypeRepr ret -> a) -> a Source #

Compute the function Crucible function signature that corresponds to the given LLVM function declaration.

llvmDeclToFunHandleRepr' :: (?lc :: TypeContext, HasPtrWidth wptr, MonadFail m) => Declare -> (forall args ret. CtxRepr args -> TypeRepr ret -> m a) -> m a Source #

type LLVMPointerType w = IntrinsicType "LLVM_pointer" (EmptyCtx ::> BVType w) Source #

Crucible type of pointers/bitvector values of width w.

pattern PtrWidth :: HasPtrWidth w => w ~ w' => NatRepr w' Source #

pattern LLVMPointerRepr :: () => (1 <= w, ty ~ LLVMPointerType w) => NatRepr w -> TypeRepr ty Source #

This pattern synonym makes it easy to build and destruct runtime representatives of LLVMPointerType w.

allModuleDeclares :: Module -> [Declare] Source #

Return all declarations derived from both external symbols and internal definitions.