Radare2 has a very rich set of commands and configuration options to perform data and code analysis, to extract useful information from a binary, like pointers, string references, basic blocks, opcode data, jump targets, cross references and much more. These operations are handled by the a (analyze) command family:
|Usage: a[abdefFghoprxstc] [...]
| aa[?] analyze all (fcns + bbs) (aa0 to avoid sub renaming)
| a8 [hexpairs] analyze bytes
| ab[b] [addr] analyze block at given address
| abb [len] analyze N basic blocks in [len] (section.size by default)
| abt [addr] find paths in the bb function graph from current offset to given address
| ac [cycles] analyze which op could be executed in [cycles]
| ad[?] analyze data trampoline (wip)
| ad [from] [to] analyze data pointers to (from-to)
| ae[?] [expr] analyze opcode eval expression (see ao)
| af[?] analyze Functions
| aF same as above, but using anal.depth=1
| ag[?] [options] draw graphs in various formats
| ah[?] analysis hints (force opcode size, ...)
| ai [addr] address information (show perms, stack, heap, ...)
| an [name] [@addr] show/rename/create whatever flag/function is used at addr
| ao[?] [len] analyze Opcodes (or emulate it)
| aO[?] [len] Analyze N instructions in M bytes
| ap find prelude for current offset
| ar[?] like 'dr' but for the esil vm. (registers)
| as[?] [num] analyze syscall using dbg.reg
| av[?] [.] show vtables
| ax[?] manage refs/xrefs (see also afx?)
In fact, a namespace is one of the biggest in radare2 tool and allows to control very different parts of the analysis:
• Code flow analysis
• Data references analysis
• Using loaded symbols
• Managing different type of graphs, like CFG and call graph
• Manage variables
• Manage types
• Emulation using ESIL VM
• Opcode introspection
• Objects information, like virtual tables
Code Analysis
Code analysis is a common technique used to extract information from assembly code.
Radare2 has different code analysis techniques implemented in the core and available in different commands.
As long as the whole functionalities of r2 are available with the API as well as using commands. This gives you the ability to implement your own analysis loops using any programming language, even with r2 oneliners, shellscripts, or analysis or core native plugins.
The analysis will show up the internal data structures to identify basic blocks, function trees and to extract opcode-level information.
The most common radare2 analysis command sequence is aa, which stands for "analyze all". That all is referring to all symbols and entry-points. If your binary is stripped you will need to use other commands like aaa, aab, aar, aac or so.
Take some time to understand what each command does and the results after running them to find the best one for your needs.
[0x08048440]> aa
[0x08048440]> pdf @ main
; DATA XREF from 0x08048457 (entry0)
/ (fcn) fcn.08048648 141
| ;-- main:
| 0x08048648 8d4c2404 lea ecx, [esp+0x4]
| 0x0804864c 83e4f0 and esp, 0xfffffff0
| 0x0804864f ff71fc push dword [ecx-0x4]
| 0x08048652 55 push ebp
| ; CODE (CALL) XREF from 0x08048734 (fcn.080486e5)
| 0x08048653 89e5 mov ebp, esp
| 0x08048655 83ec28 sub esp, 0x28
| 0x08048658 894df4 mov [ebp-0xc], ecx
| 0x0804865b 895df8 mov [ebp-0x8], ebx
| 0x0804865e 8975fc mov [ebp-0x4], esi
| 0x08048661 8b19 mov ebx, [ecx]
| 0x08048663 8b7104 mov esi, [ecx+0x4]
| 0x08048666 c744240c000. mov dword [esp+0xc], 0x0
| 0x0804866e c7442408010. mov dword [esp+0x8], 0x1 ; 0x00000001
| 0x08048676 c7442404000. mov dword [esp+0x4], 0x0
| 0x0804867e c7042400000. mov dword [esp], 0x0
| 0x08048685 e852fdffff call sym..imp.ptrace
| sym..imp.ptrace(unk, unk)
| 0x0804868a 85c0 test eax, eax
| ,=< 0x0804868c 7911 jns 0x804869f
| | 0x0804868e c70424cf870. mov dword [esp], str.Don_tuseadebuguer_ ; 0x080487cf
| | 0x08048695 e882fdffff call sym..imp.puts
| | sym..imp.puts()
| | 0x0804869a e80dfdffff call sym..imp.abort
| | sym..imp.abort()
| `-> 0x0804869f 83fb02 cmp ebx, 0x2
|,==< 0x080486a2 7411 je 0x80486b5
|| 0x080486a4 c704240c880. mov dword [esp], str.Youmustgiveapasswordforusethisprogram_ ; 0x0804880c
|| 0x080486ab e86cfdffff call sym..imp.puts
|| sym..imp.puts()
|| 0x080486b0 e8f7fcffff call sym..imp.abort
|| sym..imp.abort()
|`--> 0x080486b5 8b4604 mov eax, [esi+0x4]
| 0x080486b8 890424 mov [esp], eax
| 0x080486bb e8e5feffff call fcn.080485a5
| fcn.080485a5() ; fcn.080484c6+223
| 0x080486c0 b800000000 mov eax, 0x0
| 0x080486c5 8b4df4 mov ecx, [ebp-0xc]
| 0x080486c8 8b5df8 mov ebx, [ebp-0x8]
| 0x080486cb 8b75fc mov esi, [ebp-0x4]
| 0x080486ce 89ec mov esp, ebp
| 0x080486d0 5d pop ebp
| 0x080486d1 8d61fc lea esp, [ecx-0x4]
\ 0x080486d4 c3 ret
In this example, we analyze the whole file (aa) and then print disassembly of the main() function (pdf). The aa command belongs to the family of auto analysis commands and performs only the most basic auto analysis steps. In radare2 there are many different types of the auto analysis commands with a different analysis depth, including partial emulation: aa, aaa, aab, aaaa, ... There is also a mapping of those commands to the r2 CLI options: r2 -A, r2 -AA, and so on.
It is a common sense that completely automated analysis can produce non sequitur results, thus radare2 provides separate commands for the particular stages of the analysis allowing fine-grained control of the analysis process. Moreover, there is a treasure trove of configuration variables for controlling the analysis outcomes. You can find them in anal.* and emu.* cfg variables' namespaces.
Analyze functions
One of the most important "basic" analysis commands is the set of af subcommands. af means "analyze function". Using this command you can either allow automatic analysis of the particular function or perform completely manual one.
[0x00000000]> af?
Usage: af
| af ([name]) ([addr]) analyze functions (start at addr or $$)
| afr ([name]) ([addr]) analyze functions recursively
| af+ addr name [type] [diff] hand craft a function (requires afb+)
| af- [addr] clean all function analysis data (or function at addr)
| afa analyze function arguments in a call (afal honors dbg.funcarg)
| afb+ fcnA bbA sz [j] [f] ([t]( [d])) add bb to function @ fcnaddr
| afb[?] [addr] List basic blocks of given function
| afbF([0|1]) Toggle the basic-block 'folded' attribute
| afB 16 set current function as thumb (change asm.bits)
| afC[lc] ([addr])@[addr] calculate the Cycles (afC) or Cyclomatic Complexity (afCc)
| afc[?] type @[addr] set calling convention for function
| afd[addr] show function + delta for given offset
| afF[1|0|] fold/unfold/toggle
| afi [addr|fcn.name] show function(s) information (verbose afl)
| afj [tableaddr] [count] analyze function jumptable
| afl[?] [ls*] [fcn name] list functions (addr, size, bbs, name) (see afll)
| afm name merge two functions
| afM name print functions map
| afn[?] name [addr] rename name for function at address (change flag too)
| afna suggest automatic name for current offset
| afo[?j] [fcn.name] show address for the function name or current offset
| afs[!] ([fcnsign]) get/set function signature at current address (afs! uses cfg.editor)
| afS[stack_size] set stack frame size for function at current address
| afsr [function_name] [new_type] change type for given function
| aft[?] type matching, type propagation
| afu addr resize and analyze function from current address until addr
| afv[absrx]? manipulate args, registers and variables in function
| afx list function references
You can use afl to list the functions found by the analysis.
There are a lot of useful commands under afl such as aflj, which lists the function in JSON format and aflm, which lists the functions in the syntax found in makefiles.
There's also afl=, which displays ASCII-art bars with function ranges.
You can find the rest of them under afl?.
Some of the most challenging tasks while performing a function analysis are merge, crop and resize. As with other analysis commands you have two modes: semi-automatic and manual. For the semi-automatic, you can use afm
Apart from those semi-automatic ways to edit/analyze the function, you can hand craft it in the manual mode with af+ command and edit basic blocks of it using afb commands. Before changing the basic blocks of the function it is recommended to check the already presented ones:
[0x00003ac0]> afb
0x00003ac0 0x00003b7f 01:001A 191 f 0x00003b7f
0x00003b7f 0x00003b84 00:0000 5 j 0x00003b92 f 0x00003b84
0x00003b84 0x00003b8d 00:0000 9 f 0x00003b8d
0x00003b8d 0x00003b92 00:0000 5
0x00003b92 0x00003ba8 01:0030 22 j 0x00003ba8
0x00003ba8 0x00003bf9 00:0000 81
Hand craft function
before start, let's prepare a binary file first, for example:
int code_block()
{
int result = 0;
for(int i = 0; i < 10; ++i)
result += 1;
return result;
}
then compile it with gcc -c example.c -m32 -O0 -fno-pie, we will get the object file example.o. open it with radare2.
since we haven't analyzed it yet, the pdf command will not print out the disassembly here:
$ r2 example.o
[0x08000034]> pdf
p: Cannot find function at 0x08000034
[0x08000034]> pd
;-- section..text:
;-- .text:
;-- code_block:
;-- eip:
0x08000034 55 push ebp ; [01] -r-x section size 41 named .text
0x08000035 89e5 mov ebp, esp
0x08000037 83ec10 sub esp, 0x10
0x0800003a c745f8000000. mov dword [ebp - 8], 0
0x08000041 c745fc000000. mov dword [ebp - 4], 0
,=< 0x08000048 eb08 jmp 0x8000052
.--> 0x0800004a 8345f801 add dword [ebp - 8], 1
:| 0x0800004e 8345fc01 add dword [ebp - 4], 1
:`-> 0x08000052 837dfc09 cmp dword [ebp - 4], 9
`==< 0x08000056 7ef2 jle 0x800004a
0x08000058 8b45f8 mov eax, dword [ebp - 8]
0x0800005b c9 leave
0x0800005c c3 ret
our goal is to hand craft a function with the following structure
create a function at 0x8000034 named code_block:
[0x8000034]> af+ 0x8000034 code_block
In most cases, we use jump or call instructions as code block boundaries. so the range of first block is from 0x08000034 push ebp to 0x08000048 jmp 0x8000052. use afb+ command to add it.
[0x08000034]> afb+ code_block 0x8000034 0x800004a-0x8000034 0x8000052
note that the basic syntax of afb+ is afb+ function_address block_address block_size [jump] [fail]. the final instruction of this block points to a new address(jmp 0x8000052), thus we add the address of jump target (0x8000052) to reflect the jump info.
the next block (0x08000052 ~ 0x08000056) is more likeyly an if conditional statement which has two branches. It will jump to 0x800004a if jle-less or equal, otherwise (the fail condition) jump to next instruction -- 0x08000058.:
[0x08000034]> afb+ code_block 0x8000052 0x8000058-0x8000052 0x800004a 0x8000058
follow the control flow and create the remaining two blocks (two branches) :
[0x08000034]> afb+ code_block 0x800004a 0x8000052-0x800004a 0x8000052
[0x08000034]> afb+ code_block 0x8000058 0x800005d-0x8000058
check our work:
[0x08000034]> afb
0x08000034 0x0800004a 00:0000 22 j 0x08000052
0x0800004a 0x08000052 00:0000 8 j 0x08000052
0x08000052 0x08000058 00:0000 6 j 0x0800004a f 0x08000058
0x08000058 0x0800005d 00:0000 5
[0x08000034]> VV
There are two very important commands for this: afc and afB. The latter is a must-know command for some platforms like ARM. It provides a way to change the "bitness" of the particular function. Basically, allowing to select between ARM and Thumb modes.
afc on the other side, allows to manually specify function calling convention. You can find more information on its usage in calling_conventions.
Recursive analysis