Copyright	(c) Galois Inc 2014-2021
License	BSD3
Maintainer	Rob Dockins <rdockins@galois.com>
Stability	provisional
Safe Haskell	Safe-Inferred
Language	Haskell2010

Lang.Crucible.LLVM.Translation

Contents

Representation of LLVM constant values
Translations from LLVM syntax to constant values
Intermediate representation for GEP
Utility functions

Description

This module translates an LLVM Module into a collection of Crucible control-flow graphs, one per function. The tricky parts of this translation are 1) mapping LLVM types onto Crucible types in a sensible way and 2) translating the phi-instructions of LLVM's SSA form.

To handle the translation of phi-functions, we first perform a pre-pass over all the LLVM basic blocks looking for phi-functions and build a datastructure that tells us what assignments to make when jumping from block l to block l'. We then emit instructions to make these assignments in a separate Crucible basic block (see definePhiBlock). Thus, the translated CFG will generally have more basic blocks than the original LLVM.

Points of note:

Immediate (i.e., not in memory) structs and packed structs are translated the same.
Undefined values generate special Crucible expressions (e.g., BVUndef) to represent arbitrary bitpatterns.
The floating-point domain is interpreted by IsSymInterface as either the IEEE754 floating-point domain, the real domain, or a domain with bitvector values and uninterpreted operations.

Some notes on undefined/poison values: (outcome of discussions between JHx and RWD)

Continue to add Crucible expressions for undefined values as required (e.g. for floating-point values). Crucible itself is currently treating undefined values as fresh symbolic inputs; it should instead invent a new category of "arbitrary" values that get passed down into the solvers in a way that is dependent on the task at hand. For example, in verification tasks, it is appropriate to treat the arbitrary values the same as symbolic inputs. However, for preimage-finding tasks, the arbitrary values must be treated as universally-quantified, unlike the symbolic inputs which are existentially-quantified.
For poison values, our implementation strategy is to assert side conditions onto values that may create poison. As opposed to assertions (which must be satisfied because you passed through a control-flow point) side conditions are intended to mean that a condition must hold when a value is used (i.e., side conditions follow data-flow). So if a poison value is created but not examined (e.g., because we later do a test to determine if the value is safe), that should be allowed.

A (probably) partial list of things we intend to support, but do not yet:

Various vector instructions. This includes a variety of instructions that LLVM allows to take vector arguments, but are currently only defined on scalar (nonvector) arguments. (Progress has been made on this, but may not yet be complete).

A list of things that are unsupported and may never be:

Computed jumps and blockaddress expressions
Multithreading primitives
Alternate calling conventions

Synopsis

data ModuleTranslation arch
getTranslatedCFG :: ModuleTranslation arch -> Symbol -> IO (Maybe (Declare, AnyCFG LLVM, [LLVMTranslationWarning]))
getTranslatedFnHandle :: ModuleTranslation arch -> Symbol -> IO (Maybe (Declare, SomeHandle))
transContext :: Getter (ModuleTranslation arch) (LLVMContext arch)
globalInitMap :: Getter (ModuleTranslation arch) GlobalInitializerMap
modTransDefs :: Getter (ModuleTranslation arch) [(Declare, SomeHandle)]
modTransModule :: Getter (ModuleTranslation arch) Module
modTransHalloc :: Getter (ModuleTranslation arch) HandleAllocator
data LLVMContext arch = LLVMContext {
- llvmArch :: ArchRepr arch
- llvmPtrWidth :: forall a. (16 <= ArchWidth arch => NatRepr (ArchWidth arch) -> a) -> a
- llvmMemVar :: GlobalVar Mem
- _llvmTypeCtx :: TypeContext
- llvmGlobalAliases :: Map Symbol (Set GlobalAlias)
- llvmFunctionAliases :: Map Symbol (Set GlobalAlias)
}
llvmTypeCtx :: Simple Lens (LLVMContext arch) TypeContext
translateModule :: (?transOpts :: TranslationOptions) => HandleAllocator -> GlobalVar Mem -> Module -> IO (Some ModuleTranslation)
data LLVMTranslationWarning = LLVMTranslationWarning Symbol Position Text
data LLVMConst where
- ZeroConst :: !MemType -> LLVMConst
- IntConst :: 1 <= w => !(NatRepr w) -> !(BV w) -> LLVMConst
- FloatConst :: !Float -> LLVMConst
- DoubleConst :: !Double -> LLVMConst
- LongDoubleConst :: !FP80Value -> LLVMConst
- StringConst :: !ByteString -> LLVMConst
- ArrayConst :: !MemType -> [LLVMConst] -> LLVMConst
- VectorConst :: !MemType -> [LLVMConst] -> LLVMConst
- StructConst :: !StructInfo -> [LLVMConst] -> LLVMConst
- SymbolConst :: !Symbol -> !Integer -> LLVMConst
- UndefConst :: !MemType -> LLVMConst
boolConst :: Bool -> LLVMConst
intConst :: MonadError String m => Natural -> Integer -> m LLVMConst
transConstant :: (?lc :: TypeContext, MonadError String m, HasPtrWidth wptr) => Typed Value -> m LLVMConst
transConstantWithType :: (?lc :: TypeContext, MonadError String m, HasPtrWidth wptr) => Typed Value -> m (MemType, LLVMConst)
transConstant' :: (?lc :: TypeContext, MonadError String m, HasPtrWidth wptr) => MemType -> Value -> m LLVMConst
transConstantExpr :: forall m wptr. (?lc :: TypeContext, MonadError String m, HasPtrWidth wptr) => ConstExpr -> m LLVMConst
data GEP (n :: Nat) (expr :: Type) where
- GEP_scalar_base :: expr -> GEP 1 expr
- GEP_vector_base :: NatRepr n -> expr -> GEP n expr
- GEP_scatter :: NatRepr n -> GEP 1 expr -> GEP n expr
- GEP_field :: FieldInfo -> GEP n expr -> GEP n expr
- GEP_index_each :: MemType -> GEP n expr -> expr -> GEP n expr
- GEP_index_vector :: MemType -> GEP n expr -> expr -> GEP n expr
data GEPResult expr where
- GEPResult :: 1 <= n => NatRepr n -> MemType -> GEP n expr -> GEPResult expr
translateGEP :: forall wptr m. (?lc :: TypeContext, MonadError String m, HasPtrWidth wptr) => Bool -> Type -> Typed Value -> [Typed Value] -> m (GEPResult (Typed Value))
showInstr :: Instr -> String
testBreakpointFunction :: String -> Bool
data TranslationOptions = TranslationOptions {
- debugIntrinsics :: !Bool
- laxArith :: !Bool
- optLoopMerge :: !Bool
}
defaultTranslationOptions :: TranslationOptions
debugIntrinsicsTranslationOptions :: TranslationOptions
type VarArgs = VectorType AnyType
varArgsRepr :: TypeRepr VarArgs
llvmTypesAsCtx :: forall a wptr. HasPtrWidth wptr => [MemType] -> (forall ctx. CtxRepr ctx -> a) -> a
llvmTypeAsRepr :: forall a wptr. HasPtrWidth wptr => MemType -> (forall tp. TypeRepr tp -> a) -> a
llvmRetTypeAsRepr :: forall a wptr. HasPtrWidth wptr => Maybe MemType -> (forall tp. TypeRepr tp -> a) -> a
llvmDeclToFunHandleRepr :: HasPtrWidth wptr => FunDecl -> (forall args ret. CtxRepr args -> TypeRepr ret -> a) -> a
llvmDeclToFunHandleRepr' :: (?lc :: TypeContext, HasPtrWidth wptr, MonadFail m) => Declare -> (forall args ret. CtxRepr args -> TypeRepr ret -> m a) -> m a
type LLVMPointerType w = IntrinsicType "LLVM_pointer" (EmptyCtx ::> BVType w)
pattern PtrWidth :: HasPtrWidth w => w ~ w' => NatRepr w'
pattern LLVMPointerRepr :: () => (1 <= w, ty ~ LLVMPointerType w) => NatRepr w -> TypeRepr ty
declareFromDefine :: Define -> Declare
allModuleDeclares :: Module -> [Declare]
liftMemType :: (?lc :: TypeContext, MonadError String m) => Type -> m MemType
liftRetType :: (?lc :: TypeContext, MonadError String m) => Type -> m RetType
liftDeclare :: (?lc :: TypeContext, MonadError String m) => Declare -> m FunDecl

Documentation

data ModuleTranslation arch Source #

The result of translating an LLVM module into Crucible CFGs.

Instances

Instances details

TestEquality ModuleTranslation Source #
Instance details Defined in Lang.Crucible.LLVM.Translation Methods testEquality :: forall (a :: k) (b :: k). ModuleTranslation a -> ModuleTranslation b -> Maybe (a :~: b) #

getTranslatedCFG :: ModuleTranslation arch -> Symbol -> IO (Maybe (Declare, AnyCFG LLVM, [LLVMTranslationWarning])) Source #

Given a ModuleTranslation and a function symbol corresponding to a function defined in the module, attempt to look up the symbol name and retrieve the corresponding Crucible CFG. This will load and translate the CFG if this is the first time the given symbol is requested.

Will return Nothing if the symbol does not refer to a function defined in this module.

getTranslatedFnHandle :: ModuleTranslation arch -> Symbol -> IO (Maybe (Declare, SomeHandle)) Source #

Look up the signature and function handle for a function defined in this module translation. This does not trigger translation of the named function into Crucible if it has not already been requested.

Will return Nothing if the symbol does not refer to a function defined in this module.

transContext :: Getter (ModuleTranslation arch) (LLVMContext arch) Source #

globalInitMap :: Getter (ModuleTranslation arch) GlobalInitializerMap Source #

modTransDefs :: Getter (ModuleTranslation arch) [(Declare, SomeHandle)] Source #

modTransModule :: Getter (ModuleTranslation arch) Module Source #

modTransHalloc :: Getter (ModuleTranslation arch) HandleAllocator Source #

data LLVMContext arch Source #

Information about the LLVM module.

Constructors

LLVMContext

Fields

llvmArch :: ArchRepr arch
Map LLVM symbols to their associated state.
llvmPtrWidth :: forall a. (16 <= ArchWidth arch => NatRepr (ArchWidth arch) -> a) -> a
llvmMemVar :: GlobalVar Mem
_llvmTypeCtx :: TypeContext
llvmGlobalAliases :: Map Symbol (Set GlobalAlias)
For each global variable symbol, compute the set of aliases to that symbol
llvmFunctionAliases :: Map Symbol (Set GlobalAlias)
For each function symbol, compute the set of aliases to that symbol

llvmTypeCtx :: Simple Lens (LLVMContext arch) TypeContext Source #

translateModule Source #

Arguments

:: (?transOpts :: TranslationOptions)
=> HandleAllocator	Generator for nonces.
-> GlobalVar Mem	Memory model to associate with this context
-> Module	Module to translate
-> IO (Some ModuleTranslation)

Translate a module into Crucible control-flow graphs. Return the translated module and a list of warning messages generated during translation. Note: We may want to add a map from symbols to existing function handles if we want to support dynamic loading.

data LLVMTranslationWarning Source #

A warning generated during translation

Constructors

LLVMTranslationWarning
Fields Symbol Function name in which the warning was generated Position Source position where the warning was generated Text Description of the warning

Representation of LLVM constant values

data LLVMConst where Source #

Translation-time LLVM constant values.

Constructors

ZeroConst :: !MemType -> LLVMConst	A constant value consisting of all zero bits.
IntConst :: 1 <= w => !(NatRepr w) -> !(BV w) -> LLVMConst	A constant integer value, with bit-width `w`.
FloatConst :: !Float -> LLVMConst	A constant floating point value.
DoubleConst :: !Double -> LLVMConst	A constant double value.
LongDoubleConst :: !FP80Value -> LLVMConst	A constant long double value (X86_FP80)
StringConst :: !ByteString -> LLVMConst	A constant sequence of bytes
ArrayConst :: !MemType -> [LLVMConst] -> LLVMConst	A constant array value.
VectorConst :: !MemType -> [LLVMConst] -> LLVMConst	A constant vector value.
StructConst :: !StructInfo -> [LLVMConst] -> LLVMConst	A constant structure value.
SymbolConst :: !Symbol -> !Integer -> LLVMConst	A pointer value, consisting of a concrete offset from a global symbol.
UndefConst :: !MemType -> LLVMConst	The `undef` value is quite strange. See: The LLVM Language Reference, § Undefined Values.

Instances

Instances details

Show LLVMConst Source #	This also can't be derived, but is completely uninteresting.
Instance details Defined in Lang.Crucible.LLVM.Translation.Constant Methods showsPrec :: Int -> LLVMConst -> ShowS # show :: LLVMConst -> String # showList :: [LLVMConst] -> ShowS #
Eq LLVMConst Source #	The interesting cases here are: * `IntConst`: GHC can't derive this because `IntConst` existentially quantifies the integer's width. We say that two integers are equal when they have the same width and the same value. * `UndefConst`: Two `undef` values aren't necessarily the same...
Instance details Defined in Lang.Crucible.LLVM.Translation.Constant Methods (==) :: LLVMConst -> LLVMConst -> Bool # (/=) :: LLVMConst -> LLVMConst -> Bool #

boolConst :: Bool -> LLVMConst Source #

Create an LLVM constant value from a boolean.

intConst Source #

Arguments

:: MonadError String m
=> Natural	width of the integer constant, `w`
-> Integer	value of the integer constant, `n`
-> m LLVMConst

Create an LLVM constant of a given width. The resulting integer constant value will be the unsigned integer value n mod 2^w.

Translations from LLVM syntax to constant values

transConstant :: (?lc :: TypeContext, MonadError String m, HasPtrWidth wptr) => Typed Value -> m LLVMConst Source #

transConstantWithType :: (?lc :: TypeContext, MonadError String m, HasPtrWidth wptr) => Typed Value -> m (MemType, LLVMConst) Source #

Compute the constant value of an expression. Fail if the given value does not represent a constant.

transConstant' :: (?lc :: TypeContext, MonadError String m, HasPtrWidth wptr) => MemType -> Value -> m LLVMConst Source #

Compute the constant value of an expression. Fail if the given value does not represent a constant.

transConstantExpr :: forall m wptr. (?lc :: TypeContext, MonadError String m, HasPtrWidth wptr) => ConstExpr -> m LLVMConst Source #

Compute the value of a constant expression. Fails if the expression does not actually represent a constant value.

Intermediate representation for GEP

data GEP (n :: Nat) (expr :: Type) where Source #

Intermediate representation of a GEP. A GEP n expr is a representation of a GEP with n parallel vector lanes with expressions represented by expr values.

Constructors

GEP_scalar_base :: expr -> GEP 1 expr	Start a GEP with a single base pointer
GEP_vector_base :: NatRepr n -> expr -> GEP n expr	Start a GEP with a vector of `n` base pointers
GEP_scatter :: NatRepr n -> GEP 1 expr -> GEP n expr	Copy a scalar base vector pointwise into a vector of length `n`.
GEP_field :: FieldInfo -> GEP n expr -> GEP n expr	Add the offset corresponding to the given field pointwise to each pointer
GEP_index_each :: MemType -> GEP n expr -> expr -> GEP n expr	Add an offset corresponding to the given array index (multiplied by the given type size) pointwise to the pointers in each lane.
GEP_index_vector :: MemType -> GEP n expr -> expr -> GEP n expr	Given a vector of offsets (whose length must match the number of lanes), multiply each one by the type size, and add the offsets to the corresponding pointers.

Instances

Instances details

Foldable (GEP n) Source #
Instance details Defined in Lang.Crucible.LLVM.Translation.Constant Methods fold :: Monoid m => GEP n m -> m # foldMap :: Monoid m => (a -> m) -> GEP n a -> m # foldMap' :: Monoid m => (a -> m) -> GEP n a -> m # foldr :: (a -> b -> b) -> b -> GEP n a -> b # foldr' :: (a -> b -> b) -> b -> GEP n a -> b # foldl :: (b -> a -> b) -> b -> GEP n a -> b # foldl' :: (b -> a -> b) -> b -> GEP n a -> b # foldr1 :: (a -> a -> a) -> GEP n a -> a # foldl1 :: (a -> a -> a) -> GEP n a -> a # toList :: GEP n a -> [a] # null :: GEP n a -> Bool # length :: GEP n a -> Int # elem :: Eq a => a -> GEP n a -> Bool # maximum :: Ord a => GEP n a -> a # minimum :: Ord a => GEP n a -> a # sum :: Num a => GEP n a -> a # product :: Num a => GEP n a -> a #
Traversable (GEP n) Source #
Instance details Defined in Lang.Crucible.LLVM.Translation.Constant Methods traverse :: Applicative f => (a -> f b) -> GEP n a -> f (GEP n b) # sequenceA :: Applicative f => GEP n (f a) -> f (GEP n a) # mapM :: Monad m => (a -> m b) -> GEP n a -> m (GEP n b) # sequence :: Monad m => GEP n (m a) -> m (GEP n a) #
Functor (GEP n) Source #
Instance details Defined in Lang.Crucible.LLVM.Translation.Constant Methods fmap :: (a -> b) -> GEP n a -> GEP n b # (<$) :: a -> GEP n b -> GEP n a #

data GEPResult expr where Source #

The result of a GEP instruction translation. It records the number of parallel vector lanes in the resulting instruction, the resulting memory type of the instruction, and the sequence of sub-operations required to compute the GEP instruction.

Constructors

GEPResult :: 1 <= n => NatRepr n -> MemType -> GEP n expr -> GEPResult expr

Instances

Instances details

Foldable GEPResult Source #
Instance details Defined in Lang.Crucible.LLVM.Translation.Constant Methods fold :: Monoid m => GEPResult m -> m # foldMap :: Monoid m => (a -> m) -> GEPResult a -> m # foldMap' :: Monoid m => (a -> m) -> GEPResult a -> m # foldr :: (a -> b -> b) -> b -> GEPResult a -> b # foldr' :: (a -> b -> b) -> b -> GEPResult a -> b # foldl :: (b -> a -> b) -> b -> GEPResult a -> b # foldl' :: (b -> a -> b) -> b -> GEPResult a -> b # foldr1 :: (a -> a -> a) -> GEPResult a -> a # foldl1 :: (a -> a -> a) -> GEPResult a -> a # toList :: GEPResult a -> [a] # null :: GEPResult a -> Bool # length :: GEPResult a -> Int # elem :: Eq a => a -> GEPResult a -> Bool # maximum :: Ord a => GEPResult a -> a # minimum :: Ord a => GEPResult a -> a # sum :: Num a => GEPResult a -> a # product :: Num a => GEPResult a -> a #
Traversable GEPResult Source #
Instance details Defined in Lang.Crucible.LLVM.Translation.Constant Methods traverse :: Applicative f => (a -> f b) -> GEPResult a -> f (GEPResult b) # sequenceA :: Applicative f => GEPResult (f a) -> f (GEPResult a) # mapM :: Monad m => (a -> m b) -> GEPResult a -> m (GEPResult b) # sequence :: Monad m => GEPResult (m a) -> m (GEPResult a) #
Functor GEPResult Source #
Instance details Defined in Lang.Crucible.LLVM.Translation.Constant Methods fmap :: (a -> b) -> GEPResult a -> GEPResult b # (<$) :: a -> GEPResult b -> GEPResult a #

translateGEP Source #

Arguments

:: forall wptr m. (?lc :: TypeContext, MonadError String m, HasPtrWidth wptr)
=> Bool	inbounds flag
-> Type	base type for calculations
-> Typed Value	base pointer expression
-> [Typed Value]	index arguments
-> m (GEPResult (Typed Value))

Given the data for an LLVM getelementpointer instruction, preprocess the instruction into a GEPResult, checking types, computing vectorization lanes, etc.

As a concrete example, consider a call to translateGEP inbounds baseTy basePtr elts with the following instruction:

getelementptr [12 x i8], ptr %aptr, i64 0, i32 1

Here:

inbounds is False, as the keyword of the same name is missing from the instruction. (Currently, crucible-llvm ignores this information.)
baseTy is [12 x i8]. This is the type used as the basis for subsequent calculations.
basePtr is ptr %aptr. This pointer is used as the base address to start calculations from. Note that the type of basePtr is not baseTy, but rather a pointer type.
The elts are [i64 0, i32 1]. These are the indices that indicate which of the elements of the aggregate object are indexed.

Utility functions

showInstr :: Instr -> String Source #

Pretty print an LLVM instruction

testBreakpointFunction :: String -> Bool Source #

data TranslationOptions Source #

This datatype encodes a variety of tweakable settings that apply during LLVM translation.

Constructors

TranslationOptions

Fields

debugIntrinsics :: !Bool
Should we translate llvm.dbg intrinsic statements? The upside to translating them is that they encode debugging information about a program that can be useful for external clients. The downside is that they can bloat the size of translated programs, despite being no-ops during simulation.
laxArith :: !Bool
Should we omit proof obligations related to arithmetic overflow and similar assertions?
optLoopMerge :: !Bool
Should we insert merge blocks in loops with early exits (i.e. breaks or returns)? This may improve simulation performance.

defaultTranslationOptions :: TranslationOptions Source #

The default translation options:

Do not translate llvm.dbg intrinsic statements
Do not produce proof obligations for arithmetic-related assertions.
Do not insert merge blocks in loops with early exits.

debugIntrinsicsTranslationOptions :: TranslationOptions Source #

Like defaultTranslationOptions, but llvm.dbg intrinsic statements are translated.

type VarArgs = VectorType AnyType Source #

varArgsRepr :: TypeRepr VarArgs Source #

llvmTypesAsCtx :: forall a wptr. HasPtrWidth wptr => [MemType] -> (forall ctx. CtxRepr ctx -> a) -> a Source #

Translate a list of LLVM expressions into a crucible type context, which is passed into the given continuation.

llvmTypeAsRepr :: forall a wptr. HasPtrWidth wptr => MemType -> (forall tp. TypeRepr tp -> a) -> a Source #

Translate an LLVM type into a crucible type, which is passed into the given continuation

llvmRetTypeAsRepr :: forall a wptr. HasPtrWidth wptr => Maybe MemType -> (forall tp. TypeRepr tp -> a) -> a Source #

Translate an LLVM return type into a crucible type, which is passed into the given continuation

llvmDeclToFunHandleRepr :: HasPtrWidth wptr => FunDecl -> (forall args ret. CtxRepr args -> TypeRepr ret -> a) -> a Source #

Compute the function Crucible function signature that corresponds to the given LLVM function declaration.

llvmDeclToFunHandleRepr' :: (?lc :: TypeContext, HasPtrWidth wptr, MonadFail m) => Declare -> (forall args ret. CtxRepr args -> TypeRepr ret -> m a) -> m a Source #

type LLVMPointerType w = IntrinsicType "LLVM_pointer" (EmptyCtx ::> BVType w) Source #

Crucible type of pointers/bitvector values of width w.

pattern PtrWidth :: HasPtrWidth w => w ~ w' => NatRepr w' Source #

pattern LLVMPointerRepr :: () => (1 <= w, ty ~ LLVMPointerType w) => NatRepr w -> TypeRepr ty Source #

This pattern synonym makes it easy to build and destruct runtime representatives of LLVMPointerType w.

declareFromDefine :: Define -> Declare Source #

allModuleDeclares :: Module -> [Declare] Source #

Return all declarations derived from both external symbols and internal definitions.

liftMemType :: (?lc :: TypeContext, MonadError String m) => Type -> m MemType Source #

liftRetType :: (?lc :: TypeContext, MonadError String m) => Type -> m RetType Source #

liftDeclare :: (?lc :: TypeContext, MonadError String m) => Declare -> m FunDecl Source #