Emulator
Ghidra features a powerful PCode emulator, which can be used to emulate whole functions or pieces of code.
Ghidralib wraps this emulator with a class called Emulator
. The basic usage is
as follows:
emu = Emulator()
emu.emulate(0x400000, 0x400010)
print(emu["eax"])
Basics
Looks easy enough, now let's try it in practice. Create and compile a following C program:
#include <stdio.h>
#include <stdlib.h>
int hash(int value) {
return (value * 10) + (value ^ 7);
}
void check(int value) {
if (hash(value) == 189) {
printf("Success!");
}
}
int main(int argc, char *argv[]) {
if (argc != 2) { return 1; }
check(atoi(argv[1]));
return 0;
}
Compile it with gcc -O0 test.c -o test
. Make sure to disable optimisation with
-O0
, so the functions are not optimized out. Now load this program to Ghidra,
and open interactive console. And check that everything is in order:
>>> from ghidralib import *
>>> print(hex(Function("main").address))
0x401190
>>> print(hex(Function("check").address))
0x401159
>>> print(hex(Function("hash").address))
0x401136
Now the main point, let's try the Emulator. We may want to emulate the hash
function
to check how it behaves. For emulation we need a start and end address. You can copy
it from the listing view, but for the demonstration I will print that with python too:
hash_function = Function("hash")
>>> for instr in hash_function.instructions:
... print("0x{:x} {}".format(instr.address, instr))
...
0x401136 PUSH RBP
0x401137 MOV RBP,RSP
0x40113a MOV dword ptr [RBP + -0x4],EDI
0x40113d MOV EDX,dword ptr [RBP + -0x4]
0x401140 MOV EAX,EDX
0x401142 SHL EAX,0x2
0x401145 ADD EAX,EDX
0x401147 ADD EAX,EAX
0x401149 MOV EDX,EAX
0x40114b MOV EAX,dword ptr [RBP + -0x4]
0x40114e XOR EAX,0x7
0x401151 ADD EAX,EDX
0x401153 POP RBP
0x401154 XOR EDX,EDX
0x401156 XOR EDI,EDI
0x401158 RET
Note: we assume Linux x64 ABI everywhere. If you're on Windows or another architecture, you'll need to adjust the code (especially register names).
So we want to emulate between 0x401136 and 0x401158, and the parameter is in EDI.
>>> emu = Emulator()
>>> emu["RDI"] = 10 # We asume Linux x64 ABI
>>> emu.emulate(0x401136, 0x401158)
>>> print(emu["RAX"])
113
Great, we successfully emulated our first function. Instead of using hardcoded addresses, it's usually easier to use object attributes. This is equivalent:
>>> emu = Emulator()
>>> emu["RDI"] = 10
>>> emu.emulate(hash_function.address, hash_function.exitpoints)
>>> print(emu["RAX"])
113
Function.exitpoints is a list that contains all function exit points - perfect for our use-case here.
By the way, instead of indexing like emu["EAX"]
you can use emu.read_register
and emu.write_register
. Consider using the more verbose format when writing
reusable scripts, but emu["EAX"]
is faster to type when working interactively.
Hooks
Often we are interested in the details of the execution, and we want to
process every instruction in some way. We can easily do this using the callback
parameter:
>>> emu = Emulator()
>>> def print_callback(emu):
... instr = Instruction(emu.pc)
... print("executing 0x{:x} {}".format(emu.pc, instr))
>>> emu.emulate(hash_function.address, hash_function.exitpoints, callback=print_callback)
executing 0x401136 PUSH RBP
executing 0x401137 MOV RBP,RSP
executing 0x40113a MOV dword ptr [RBP + -0x4],EDI
executing 0x40113d MOV EDX,dword ptr [RBP + -0x4]
executing 0x401140 MOV EAX,EDX
executing 0x401142 SHL EAX,0x2
executing 0x401145 ADD EAX,EDX
executing 0x401147 ADD EAX,EAX
executing 0x401149 MOV EDX,EAX
executing 0x40114b MOV EAX,dword ptr [RBP + -0x4]
executing 0x40114e XOR EAX,0x7
executing 0x401151 ADD EAX,EDX
executing 0x401153 POP RBP
executing 0x401154 XOR EDX,EDX
executing 0x401156 XOR EDI,EDI
You can change the emulator context in the hook, and you can control the execution using the callback return value. In particular, you can return:
continue
to continue execution normally (this is the default)break
to stop execution immediatelycontinue_then_break
to stop execution after executing the current instructionskip
to skip the current instruction and execute the one immediately after itretry
means that emulator should try to execute the same instruction again. This is only useful if you changed PC in the callback and want to reevaluate it.
So for example, instead of providing the return address directly you can
execute until the ret
instruction:
>>> emu = Emulator()
>>> def execute_until_ret(emu):
... instr = Instruction(emu.pc)
... if instr.mnemonic == "RET":
... return "break"
... return "continue"
>>> emu["RDI"] = 10
>>> emu.emulate(hash_function.address, callback=execute_until_ret)
>>> print(emu["RAX"])
113
By the way, as a reminder, you can use a symbol name almost everywhere instead of
an address. For example, hash_function.address
is equivalent to "hash" and you
can as well do
>>> emu.emulate("hash", callback=execute_until_ret)
This is probably not a good idea in serious scripts, but it's a nice trick for quick hacks.
Exercise: Use your knowledge of the emulator to find a value that will make the
hash
function return 189. Hint: emulate hash
in a for loop and check the return value.
Hooks and external functions
Another thing we can do with hooks is dealing with calls
. For example, check
function looks like this:
>>> for instr in Function("check").instructions:
... print("0x{:x} {}".format(instr.address, instr))
...
0x401159 PUSH RBP
0x40115a MOV RBP,RSP
0x40115d SUB RSP,0x10
0x401161 MOV dword ptr [RBP + -0x4],EDI
0x401164 MOV EAX,dword ptr [RBP + -0x4]
0x401167 MOV EDI,EAX
0x401169 CALL 0x00401136
0x40116e CMP EAX,0xbd
0x401173 JNZ 0x00401189
0x401175 LEA RAX,[0x402004]
0x40117c MOV RDI,RAX
0x40117f MOV EAX,0x0
0x401184 CALL 0x00401030
0x401189 NOP
0x40118a LEAVE
0x40118b XOR EAX,EAX
0x40118d XOR EDI,EDI
0x40118f RET
The first call is to hash
function, but the second one is to printf
. We can't easily
emulate this function, because it's outside of the current program. To avoid crashing
the emulation, we can use the hook to skip the calls:
>>> check_function = Function("check")
>>> def skip_calls(emu):
... instr = Instruction(emu.pc)
... if instr.mnemonic == "CALL":
... return "skip"
... return "continue"
But let's do something else: let's emulate check
function until the CALL hash
instruction,
but skip the call and just return 189
directly. Then emulate until the CALL printf
instruction,
and print the parameter passed to printf
. The callback gets more complicated now:
>>> def emulate_check(emu):
... instr = Instruction(emu.pc)
... if instr.address == 0x401169:
... emu["RAX"] = 189
... return "skip"
... if instr.address == 0x401184:
... string_addr = emu["RDI"] # cstring parameter is in RDI (linux ABI)
... print(emu.read_cstring(string_addr))
... return "break"
>>> check_function = Function("check")
>>> emu = Emulator()
>>> emu.emulate(check_function.address, callback=emulate_check)
Success!
We successfully "tricked" the check function into executing the "success"
branch and trying to print the Success!
string.
But this code is not very nice. We can use emulator hooks to make it clearer. Hooks are pieces of code that can be automatically executed at certain points during emulation. You can register a hook for a specific address:
def printf_hook(emu):
arg = emu.read_cstring(emu["RDI"])
print("printf called with '{}'".format(arg))
return "break"
def hash_hook(emu):
emu["RAX"] = 189
emu.pc = emu.read_u64(emu.sp)
emu.sp += 8
emu = Emulator()
emu.add_hook("printf", printf_hook)
emu.add_hook("hash", hash_hook)
emu.emulate("check")
Note that we again (ab)use automatic symbol resolution here. The last three lines are equivalent to:
emu.add_hook(Symbol("printf").address, printf_hook)
emu.add_hook(Symbol("hash").address, hash_hook)
emu.emulate(Symbol("check").address)
Exercise: Create a hook for atoi
function that will simulate the libc function -
it should parse the string from the parameter and return it in RAX. Test it by emulating
the "call atoi" instruction with a string parameter.
State inspection
Of course, after emulation we are interested in the final state of the emulator.
We already showcased reading and writing registers, and we used read_cstring
function
in the hook. There are also other useful functions:
emu.read_register(reg)
andemu.write_register(reg, val)
- read or write a registeremu[reg]
andemu[reg] = val
- read or write a register, short versionemu.read_u64(addr)
andemu.write_u64(addr, val)
- read or write a 64-bit value at a given addressemu.read_u32(addr)
andemu.write_u32(addr, val)
- read or write a 32-bit value at a given addressemu.read_u16(addr)
andemu.write_u16(addr, val)
- read or write a 16-bit value at a given addressemu.read_u8(addr)
andemu.write_u8(addr, val)
- read or write an 8-bit value at a given addressemu.read_bytes(addr, size)
- readsize
bytes fromaddr
emu.write_bytes(addr, bytes)
- write the givenbytes
toaddr
emu.read_cstring(addr)
- read bytes starting fromaddr
until a null byte is found.emu.read_unicode(addr)
- read 16bit chars starting fromaddr
until a null character is found.emu.read_varnode
andemu.write_varnode
- read or write a varnode
They should all be self-explanatory, except the last one. Varnodes are a Ghidra term for an almost arbitrary value. In particular, Function signature contains information about how variables and parameters map to varnodes:
>>> Function("hash").return_variable
[int <RETURN>@EAX:4]
>>> Function("hash").return_variable.varnode
(register, 0x0, 4)
>>> Function("hash").parameters
[[uint param_1@EDI:4]]
>>> Function("hash").parameters[0].varnode
(register, 0x38, 4)
You can use read_varnode
and write_varnode
to manipulate these values in a pretty generic way.
For example, this is levaraged by Function.emulate
, to emulate functions in a very generic way:
>>> Function("hash").emulate_simple(10)
113
It doesn't get any easier than that. The simple
in the name refers to the return value -
in many cases you will want to use Function.emulate
to get the whole context of the
emulator after execution.
Exercise: Complete the atoi
hook from the previous exercise first. Then create an emulator,
add printf
and atoi
hooks, and execute a main
function with the correct parameters.
This will require you to pass correct argc
and argv
parameters.
Misc features
maxsteps
When you emulate a function, you may want to limit the number of steps it can take:
>>> emu = Emulator()
>>> def callback(emu):
>>> print("executing {:x}'.format(emu.pc))
>>> emu.trace(Function("main").entrypoint, callback=callback, maxsteps=3)
SUB ESP,0x2d4
PUSH EBX
PUSH EBP
Especially if you're emulatingh random pieces of code, setting maxsteps to something reasonable (like 2000 instructions) may save you from accidentaly executing an infinite loops.
Breakpoints
You can set and remove breakpoints using add_breakpoint
and clear_breakpoint
methods
emulate_fast
Ghidra emulator is not very fast, but ghidralib emulate
is even slower - because
we support callbacks, we need to go back and forth between Python and Java.
To make things faster, you can use the emulate_fast
function. It keeps the
main loop of the emulation in Java, which may matter in some cases.
The downside is that it doesn't support callbacks or instruction counting -
you can only emulate until a specific address. As an upside, function hooks
are supprted.
Emulation shorthands
To save precious keystrokes you may combine creating an emulator, running it, and inspecting the result into one step with:
>>> Emulator.new("main", maxsteps=100)["EAX"]
128
This convenience wrapper is equivalent to the following code:
>>> emu = Emulator()
>>> emu.emulate("main", maxsteps=100)
>>> emu["EAX"]
128
Some other objects also provide helpers to do the obvious thing with emulator. For example, you can emulate a function call with:
>>> emu = Function("test").emulate(10)
>>> emu["EAX"]
113
>>> # Or an even shorter version
>>> Function("test").emulate_simple(10)
113
Unicorn compatibility
There is a very, very thin compatibility layer with Unicorn. There are aliases
provided for the following Unicorn methods: reg_write
, reg_read
, mem_write
,
mem_read
, mem_map
, emu_start
. Why? The idea is that many people already
know Unicorn. It may make it a tiny bit easier for them if they can use familiar
method names instead of learning a completely new set.
The goal is not to provide actual compatibility layer - Unicorn is a very different
library and ghidralib
won't replace it. The only goal is really so Unicorn users
can use familiar names if they forget ghidralib equivalents. If you are not
an Unicorn user, don't use them.
Learn more
Check out relevant examples in the examples
directory, especially: