Depending on your choice of install location, you may need root privileges to run make install. The install scripts copy the necessary header and library files to appropriate locations on the system.
$ ./configure
$ make
$ make install
Example 1. libudis86 Usage Example
    #include <stdio.h>
    #include <udis86.h>

    int main()
    {
        ud_t ud_obj;

        ud_init(&ud_obj);
        ud_set_input_file(&ud_obj, stdin);
        ud_set_mode(&ud_obj, 64);
        ud_set_syntax(&ud_obj, UD_SYN_INTEL);

        while (ud_disassemble(&ud_obj)) {
            printf("\t%s\n", ud_insn_asm(&ud_obj));
        }

        return 0;
    }
This example should give you an idea of how this library can be used. To compile it, link against libudis86:
$ gcc example.c -ludis86 -o example
The following sections describe, in detail, the complete API of libudis86.
All disassembly state is kept in a single object of type ud_t (struct ud). So, to use libudis86 you must create an instance of this object,
    ud_t ud_obj;
and initialize it,
    ud_init(&ud_obj);
You can create multiple such objects and use them with the library, each one maintaining its own disassembly state.
libudis86 exposes decoded instructions in an intermediate form meant
to be useful for programs that want to examine them. This intermediate form
is available as values of certain fields of the ud_t
udis86 object used to disassemble the instruction, as described below.
The program counter (eip/rip) value at which the instruction was decoded is available in ud_obj.pc.
Prefix bytes that affect the disassembly of the instruction are available in the following fields, each of which corresponds to a particular type or class of prefix.
ud_obj.pfx_rex - 64-bit mode REX prefix
ud_obj.pfx_seg - Segment register prefix
ud_obj.pfx_opr - Operand-size prefix (66h)
ud_obj.pfx_adr - Address-size prefix (67h)
ud_obj.pfx_lock - Lock prefix
ud_obj.pfx_rep - Rep prefix
ud_obj.pfx_repe - Repe prefix
ud_obj.pfx_repne - Repne prefix
These fields default to UD_NONE if the respective prefixes were not found.
The instruction mnemonic, in the form of an enumerated constant (enum ud_mnemonic_code), is available in ud_obj.mnemonic. As a convention, all mnemonic constants are composed by prefixing standard instruction mnemonics with UD_I. For example, UD_Imov, UD_Ixor, UD_Ijmp, etc.
The intermediate form of the instruction operands is available as an array of objects of type struct ud_operand. Given a udis86 object ud_obj, the nth operand is available in ud_obj.operand[n].
struct ud_operand has the following fields,
type
size
base
index
scale
offset
lval
The type and size fields determine the type and size of the operand, respectively. The possible types of operands are,
UD_NONE
No operand.
UD_OP_MEM
Memory operand. The intermediate form normalizes all memory address equations to the scale-index-base form. The address equation is available in base, index, and scale. If the offset field has a non-zero value (one of 8, 16, 32, and 64), lval will contain the memory offset. Note that the base and index fields contain the base and index register of the address equation, in the form of an enumerated constant of type enum ud_type. scale contains an integer value that the index register must be scaled by.
UD_OP_PTR
A Segment:Offset pointer operand. size can have two values, 32 (for 16:16 seg:off) and 48 (for 16:32 seg:off). The value is available in lval (lval.ptr.seg and lval.ptr.off).
UD_OP_IMM
Immediate operand. Value available in lval.
UD_OP_JIMM
Immediate operand to branch instruction (relative offsets). Value available in lval.
UD_OP_CONST
Implicit constant operand. Value available in lval.
UD_OP_REG
Operand is a register. The specific register is contained in base in the form of an enumerated constant, enum ud_type.
lval is a union data structure that aggregates integer fields of different sizes, which store values depending on the type of operand.
lval.sbyte - Signed Byte
lval.ubyte - Unsigned Byte
lval.sword - Signed Word
lval.uword - Unsigned Word
lval.sdword - Signed Double Word
lval.udword - Unsigned Double Word
lval.sqword - Signed Quad Word
lval.uqword - Unsigned Quad Word
lval.ptr.seg - Pointer Segment in Segment:Offset
lval.ptr.off - Pointer Offset in Segment:Offset
The following enumerated constants (enum ud_type) are possible values for base and index. Note that a value of UD_NONE simply means that the field is not valid for the current instruction.
    UD_NONE,

    /* 8 bit GPRs */
    UD_R_AL, UD_R_CL, UD_R_DL, UD_R_BL,
    UD_R_AH, UD_R_CH, UD_R_DH, UD_R_BH,
    UD_R_SPL, UD_R_BPL, UD_R_SIL, UD_R_DIL,
    UD_R_R8B, UD_R_R9B, UD_R_R10B, UD_R_R11B,
    UD_R_R12B, UD_R_R13B, UD_R_R14B, UD_R_R15B,

    /* 16 bit GPRs */
    UD_R_AX, UD_R_CX, UD_R_DX, UD_R_BX,
    UD_R_SP, UD_R_BP, UD_R_SI, UD_R_DI,
    UD_R_R8W, UD_R_R9W, UD_R_R10W, UD_R_R11W,
    UD_R_R12W, UD_R_R13W, UD_R_R14W, UD_R_R15W,

    /* 32 bit GPRs */
    UD_R_EAX, UD_R_ECX, UD_R_EDX, UD_R_EBX,
    UD_R_ESP, UD_R_EBP, UD_R_ESI, UD_R_EDI,
    UD_R_R8D, UD_R_R9D, UD_R_R10D, UD_R_R11D,
    UD_R_R12D, UD_R_R13D, UD_R_R14D, UD_R_R15D,

    /* 64 bit GPRs */
    UD_R_RAX, UD_R_RCX, UD_R_RDX, UD_R_RBX,
    UD_R_RSP, UD_R_RBP, UD_R_RSI, UD_R_RDI,
    UD_R_R8, UD_R_R9, UD_R_R10, UD_R_R11,
    UD_R_R12, UD_R_R13, UD_R_R14, UD_R_R15,

    /* segment registers */
    UD_R_ES, UD_R_CS, UD_R_SS, UD_R_DS, UD_R_FS, UD_R_GS,

    /* control registers */
    UD_R_CR0, UD_R_CR1, UD_R_CR2, UD_R_CR3,
    UD_R_CR4, UD_R_CR5, UD_R_CR6, UD_R_CR7,
    UD_R_CR8, UD_R_CR9, UD_R_CR10, UD_R_CR11,
    UD_R_CR12, UD_R_CR13, UD_R_CR14, UD_R_CR15,

    /* debug registers */
    UD_R_DR0, UD_R_DR1, UD_R_DR2, UD_R_DR3,
    UD_R_DR4, UD_R_DR5, UD_R_DR6, UD_R_DR7,
    UD_R_DR8, UD_R_DR9, UD_R_DR10, UD_R_DR11,
    UD_R_DR12, UD_R_DR13, UD_R_DR14, UD_R_DR15,

    /* mmx registers */
    UD_R_MM0, UD_R_MM1, UD_R_MM2, UD_R_MM3,
    UD_R_MM4, UD_R_MM5, UD_R_MM6, UD_R_MM7,

    /* x87 registers */
    UD_R_ST0, UD_R_ST1, UD_R_ST2, UD_R_ST3,
    UD_R_ST4, UD_R_ST5, UD_R_ST6, UD_R_ST7,

    /* extended multimedia registers */
    UD_R_XMM0, UD_R_XMM1, UD_R_XMM2, UD_R_XMM3,
    UD_R_XMM4, UD_R_XMM5, UD_R_XMM6, UD_R_XMM7,
    UD_R_XMM8, UD_R_XMM9, UD_R_XMM10, UD_R_XMM11,
    UD_R_XMM12, UD_R_XMM13, UD_R_XMM14, UD_R_XMM15,

    /* eip/rip */
    UD_R_RIP
void ud_init (ud_t* ud_obj)
ud_t object initializer. This function must be called on a udis86 object before it can be used anywhere else.
void ud_set_input_hook(ud_t* ud_obj, int (*hook)(ud_t*))
This function sets the input source for the library. To retrieve each byte in the stream, libudis86 calls back the function pointed to by hook. The hook function, defined by the user code, must return a single byte of code each time it is called. To signal end-of-input, it must return the constant UD_EOI.
void ud_set_user_opaque_data(ud_t* ud_obj, void* opaque);
Associates a pointer with the udis86 object to be retrieved and used in user functions, such as the input hook callback function.
void* ud_get_user_opaque_data(ud_t* ud_obj);
This function returns the pointer, if any, that was associated with the udis86 object using the ud_set_user_opaque_data function.
void ud_set_input_buffer(ud_t* ud_obj, unsigned char* buffer, size_t size);
Sets the input source for the library to a buffer of fixed size.
void ud_set_input_file(ud_t* ud_obj, FILE* filep);
This function sets the input source for the library to a file pointed to by the passed FILE pointer. Note that the library does not perform any checks, assuming the file pointer to be properly initialized.
void ud_set_mode(ud_t* ud_obj, uint8_t mode_bits);
Sets the mode of disassembly. Possible values are 16, 32, and 64. By default, the library works in 32-bit mode.
void ud_set_pc(ud_t*, uint64_t pc);
Sets the program counter (EIP/RIP). This changes the offset of the assembly output generated, with direct effect on branch instructions.
void ud_set_syntax(ud_t*, void (*translator)(ud_t*));
libudis86 disassembles one instruction at a time into an intermediate form that lets you inspect the instruction and its various aspects individually. But to generate the assembly language output, this intermediate form must be translated. This function sets the translator. There are two built-in translators,
UD_SYN_INTEL - for INTEL (NASM-like) syntax.
UD_SYN_ATT - for AT&T (GAS-like) syntax.
If you do not want libudis86 to translate, you can pass NULL to the function, and no translation is performed thereafter. This is particularly useful when you only want to identify chunks of code and then create the assembly output if needed.
If you want to create your own translator, you must pass a pointer to a function that accepts a pointer to ud_t. This function will be called by libudis86 after each instruction is decoded.
void ud_set_vendor(ud_t*, unsigned vendor);
Sets the vendor whose instruction set is used for decoding. This is only useful for selecting between the VMX and SVM instruction sets, the point at which INTEL and AMD have diverged significantly. At a later stage, support for a more granular selection of instruction sets may be added.
UD_VENDOR_INTEL - for INTEL instruction set.
UD_VENDOR_AMD - for AMD instruction set.
unsigned int ud_disassemble(ud_t*);
Disassembles the next instruction in the input stream. Returns the number of bytes disassembled. A 0 indicates end of input. Note that to restart disassembly after the end of input, you must call one of the input setting functions with the new input source.
unsigned int ud_insn_len(ud_t* u);
Returns the number of bytes disassembled.
uint64_t ud_insn_off(ud_t*);
Returns the starting offset of the disassembled instruction relative to the program counter value specified initially.
char* ud_insn_hex(ud_t*);
Returns pointer to character string holding the hexadecimal representation of the disassembled bytes.
uint8_t* ud_insn_ptr(ud_t* u);
Returns pointer to the buffer holding the instruction bytes. Use ud_insn_len() to determine the length of this buffer.
char* ud_insn_asm(ud_t* u);
If the syntax is specified, returns pointer to the character string holding assembly language representation of the disassembled instruction.
void ud_input_skip(ud_t*, size_t n);
Skips n bytes in the input stream.