File Analysis: File Identification and Profiling in the Linux Environment

Please read pp. 379-488 of MF.

Linux is easier

Working in the Linux environment is a lot easier. Most of the basic tools, such as md5sum and sha1sum, are already installed, so you generally don't need to install any new software. Depending on the distribution, less common programs such as ssdeep may well be available in the distribution's repository.
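If you want the same digests from inside a script (for example, to log hashes as part of an automated triage step), Python's hashlib can produce the values that md5sum and sha1sum print. This is a minimal sketch; the function name file_digests is our own, and it does not attempt ssdeep-style fuzzy hashing.

```python
import hashlib
import sys


def file_digests(path, chunk_size=65536):
    """Return (md5, sha1) hex digests of a file, as md5sum/sha1sum would print them."""
    md5 = hashlib.md5()
    sha1 = hashlib.sha1()
    with open(path, "rb") as f:
        # Read in chunks so large binaries don't have to fit in memory at once.
        for chunk in iter(lambda: f.read(chunk_size), b""):
            md5.update(chunk)
            sha1.update(chunk)
    return md5.hexdigest(), sha1.hexdigest()


if __name__ == "__main__" and len(sys.argv) > 1:
    for name in sys.argv[1:]:
        md5_hex, sha1_hex = file_digests(name)
        print(f"{md5_hex}  {sha1_hex}  {name}")
```

The hex strings should agree with what md5sum and sha1sum report for the same file.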

The old standby for dumping a binary was "od" (octal dump); we now also have programs such as "xxd", "ghex" (a graphical hex editor), and "hexdump", which may well be in a given distribution.
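The core of what these dump programs do is simple enough to sketch. The function below is our own illustration, loosely modeled on the layout of "hexdump -C" rather than an exact reproduction of any tool's output: an offset, the bytes in hex, and a printable-ASCII column.

```python
import sys


def hexdump(data: bytes, width: int = 16) -> str:
    """Format bytes as offset / hex / ASCII lines, roughly like `hexdump -C`."""
    lines = []
    for offset in range(0, len(data), width):
        chunk = data[offset:offset + width]
        hex_part = " ".join(f"{b:02x}" for b in chunk)
        # Non-printable bytes are shown as "." in the ASCII column.
        text = "".join(chr(b) if 0x20 <= b < 0x7F else "." for b in chunk)
        lines.append(f"{offset:08x}  {hex_part:<{width * 3 - 1}}  |{text}|")
    return "\n".join(lines)


if __name__ == "__main__" and len(sys.argv) > 1:
    with open(sys.argv[1], "rb") as f:
        print(hexdump(f.read(256)))  # first 256 bytes only
```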

What is this file? Broad steps...

Of course, one of the handiest tools is the old "file" program, which does a very credible job of identifying a wide range of file types. "readelf -a" and "objdump -a" can also be of some use, and you can look at a binary's namelist (its symbol table) with "nm"...
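Much of what "file" does comes down to matching "magic numbers" near the start of a file against a large database. The sketch below checks only a handful of well-known signatures; the table and its descriptions are our own small sample, not file's actual magic database.

```python
import sys

# A tiny, illustrative subset of well-known leading signatures.
MAGIC = [
    (b"\x89PNG\r\n\x1a\n", "PNG image data"),
    (b"%PDF-", "PDF document"),
    (b"PK\x03\x04", "Zip archive data (also used by JAR, APK, DOCX)"),
    (b"\x1f\x8b", "gzip compressed data"),
    (b"MZ", "MS-DOS/Windows executable (MZ header; may be PE)"),
    (b"#!", "script with an interpreter line"),
]


def identify(data: bytes) -> str:
    """Guess a file type from its first bytes; returns "data" if nothing matches."""
    if data.startswith(b"\x7fELF") and len(data) >= 6:
        # e_ident[4] is the class (1 = 32-bit, 2 = 64-bit),
        # e_ident[5] the byte order (1 = little-endian/LSB, 2 = big-endian/MSB).
        bits = {1: "32-bit", 2: "64-bit"}.get(data[4], "unknown-class")
        order = {1: "LSB", 2: "MSB"}.get(data[5], "unknown-order")
        return f"ELF {bits} {order}"
    for magic, description in MAGIC:
        if data.startswith(magic):
            return description
    return "data"  # the same fallback label file uses for unrecognized content


if __name__ == "__main__" and len(sys.argv) > 1:
    for name in sys.argv[1:]:
        with open(name, "rb") as f:
            print(f"{name}: {identify(f.read(64))}")
```

For ELF files, the result corresponds to the beginning of file's own output, as in the "ELF 64-bit LSB executable" lines in the transcripts in these notes.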

Dynamically linked programs

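You can examine a program's dynamic linkage information with "ldd"; for that matter, you can strace the program (though if it's a suspect one, you should do this in a sandbox!). Be careful with ldd as well: in some circumstances it obtains its information by actually executing the program, so the ldd(1) man page warns against using it on untrusted executables and suggests "objdump -p" (looking at the NEEDED lines) as a safer alternative. You might also try "ltrace"; while it is generally less useful than strace, it might show you interesting library activity.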
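As a sketch of the read-only approach, the following Python reads an ELF file's dynamic section directly and lists its DT_NEEDED entries (the shared libraries the program asks for), the same information "objdump -p" shows as NEEDED lines. It never executes the file. It assumes a 64-bit little-endian ELF with an intact section header table, like the x86-64 binaries elsewhere in these notes; the function names are our own.

```python
import struct
import sys

SHT_DYNAMIC = 6            # section type of .dynamic
DT_NULL, DT_NEEDED = 0, 1  # dynamic-entry tags from elf.h


def _cstr(data: bytes, offset: int) -> str:
    """Decode the NUL-terminated string that starts at offset."""
    return data[offset:data.index(b"\0", offset)].decode("utf-8", errors="replace")


def needed_libraries(path: str) -> list:
    """List DT_NEEDED library names by parsing the file; nothing is executed."""
    with open(path, "rb") as f:
        data = f.read()
    if data[:4] != b"\x7fELF" or len(data) < 64 or data[4] != 2 or data[5] != 1:
        raise ValueError("not a 64-bit little-endian ELF file")
    (e_shoff,) = struct.unpack_from("<Q", data, 40)
    e_shentsize, e_shnum = struct.unpack_from("<HH", data, 58)
    # Elf64_Shdr fields: name, type, flags, addr, offset, size, link, info, addralign, entsize
    sections = [struct.unpack_from("<IIQQQQIIQQ", data, e_shoff + i * e_shentsize)
                for i in range(e_shnum)]
    libs = []
    for sh in sections:
        if sh[1] != SHT_DYNAMIC:
            continue
        strtab_offset = sections[sh[6]][4]  # sh_link -> .dynstr
        # Each Elf64_Dyn entry is 16 bytes: a signed tag and a value.
        for pos in range(sh[4], sh[4] + sh[5], 16):
            d_tag, d_val = struct.unpack_from("<qQ", data, pos)
            if d_tag == DT_NULL:
                break
            if d_tag == DT_NEEDED:
                libs.append(_cstr(data, strtab_offset + d_val))
    return libs


if __name__ == "__main__" and len(sys.argv) > 1:
    for lib in needed_libraries(sys.argv[1]):
        print("NEEDED", lib)
```

Unlike ldd, this does not work out where each library lives on disk; it only reports the names the binary requests.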

Even if a dynamically linked program is "stripped", it still has linkage information available. (If the binary isn't stripped, check whether it contains any debugging info; if so, you might be able to run "gdb" usefully!)
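A quick way to see what a binary still carries is to list its sections. In the sketch below (again assuming a 64-bit little-endian ELF; the names are ours), a missing SHT_SYMTAB section is what "stripped" means in practice, while .dynsym (the dynamic linkage information) normally survives stripping, and .debug_* sections indicate debugging info that makes gdb more useful.

```python
import struct
import sys

SHT_SYMTAB = 2
SHT_DYNSYM = 11


def section_headers(data: bytes) -> list:
    """Return (name, sh_type) for every section of a 64-bit little-endian ELF image."""
    if data[:4] != b"\x7fELF" or len(data) < 64 or data[4] != 2 or data[5] != 1:
        raise ValueError("not a 64-bit little-endian ELF file")
    (e_shoff,) = struct.unpack_from("<Q", data, 40)
    e_shentsize, e_shnum, e_shstrndx = struct.unpack_from("<HHH", data, 58)
    if e_shnum == 0:
        return []  # no section header table (some packed binaries look like this)
    # Each Elf64_Shdr: name, type, flags, addr, offset, size, link, info, addralign, entsize
    headers = [struct.unpack_from("<IIQQQQIIQQ", data, e_shoff + i * e_shentsize)
               for i in range(e_shnum)]
    names_offset = headers[e_shstrndx][4]  # file offset of the section-name string table
    result = []
    for sh in headers:
        start = names_offset + sh[0]
        end = data.index(b"\0", start)
        result.append((data[start:end].decode("ascii", errors="replace"), sh[1]))
    return result


def describe(path: str) -> dict:
    with open(path, "rb") as f:
        sections = section_headers(f.read())
    return {
        "stripped": not any(t == SHT_SYMTAB for _, t in sections),
        "has_dynsym": any(t == SHT_DYNSYM for _, t in sections),
        "debug_sections": [n for n, _ in sections if n.startswith(".debug_")],
    }


if __name__ == "__main__" and len(sys.argv) > 1:
    print(describe(sys.argv[1]))
```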

Antivirus software in the Linux world

Generally, the easiest thing to do is to install packages like "clamav" and "clamav-update". You can then use "clamscan" to scan individual files, and "freshclam" to get the latest signatures.


Please ignore the "Japanese" translation on page 413 of MF; it's not really correct. The definitions for "Kaiten" and "Goraku" are close enough, but "wa" in this case is a particle (not a noun, as the MF book seems to think), and it marks the topic of the sentence ("Kaiten"). Most likely, it's referring to the malware named "kaiten"; see McAfee, and PacketStorm's version of kaiten.c, for instance.

(If not, then it most likely refers to a sushi place ("kaiten sushi" refers to a sushi restaurant that puts the sushi trays on a circular course, such as a conveyor belt or maybe a circular water course); even less likely, but still conceivably, it might also be a reference to a WWII Japanese torpedo system called the "kaiten".)

The Google and AltaVista translators are okay for most languages; for Japanese, though, the best translators tend to be Japanese ones.

Strings, as always

Running "strings" over a suspect binary is almost always worth doing. While it's possible that a good programmer, or a packer or encryptor, has removed or obfuscated all of the strings, it's also possible that nothing of the sort has happened.
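For reference, the heart of "strings" is just "find runs of printable characters at least N long". The sketch below uses a default minimum of 4, which is also GNU strings' default; it only handles single-byte ASCII, so for the UTF-16 strings common in Windows binaries you would want something like "strings -e l" instead. The function name is our own.

```python
import re
import sys


def printable_strings(data: bytes, min_len: int = 4) -> list:
    """Return runs of printable ASCII (space through '~', plus tab) of length >= min_len."""
    pattern = re.compile(rb"[\t\x20-\x7e]{%d,}" % min_len)
    return [m.group().decode("ascii") for m in pattern.finditer(data)]


if __name__ == "__main__" and len(sys.argv) > 1:
    with open(sys.argv[1], "rb") as f:
        for s in printable_strings(f.read()):
            print(s)
```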


Running "ldd" shows you which shared libraries a program will load and where they actually are on the filesystem. You can then use "file" (if you like) to learn more about a particular shared library.

nm and variables in execution

Using "nm" can really show you where things are: it lists the symbol table, so you can figure out exactly where in memory a variable will be located. If nm shows a lot of information (that is, the binary isn't stripped), you can use "gdb" (the GNU debugger) to get a real feel for what the program is doing.
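The lookup that nm does can be sketched by reading the symbol table yourself. This version (64-bit little-endian ELF only; function names are ours) reads .symtab by default, or .dynsym if you ask for it (useful on a stripped binary), and returns each symbol's value, size, type, and name. For an OBJECT symbol in a non-PIE executable, the value is the variable's address in memory; for PIE executables and shared libraries it is an offset from wherever the loader places the image, so the runtime address will differ.

```python
import struct
import sys

SHT_SYMTAB, SHT_DYNSYM = 2, 11
SYMBOL_TYPES = {0: "NOTYPE", 1: "OBJECT", 2: "FUNC", 3: "SECTION", 4: "FILE"}


def _cstr(data: bytes, offset: int) -> str:
    """Decode the NUL-terminated string that starts at offset."""
    return data[offset:data.index(b"\0", offset)].decode("utf-8", errors="replace")


def elf_symbols(path: str, table: int = SHT_SYMTAB) -> list:
    """Return sorted (value, size, type, name) tuples from .symtab or .dynsym."""
    with open(path, "rb") as f:
        data = f.read()
    if data[:4] != b"\x7fELF" or len(data) < 64 or data[4] != 2 or data[5] != 1:
        raise ValueError("not a 64-bit little-endian ELF file")
    (e_shoff,) = struct.unpack_from("<Q", data, 40)
    e_shentsize, e_shnum = struct.unpack_from("<HH", data, 58)
    sections = [struct.unpack_from("<IIQQQQIIQQ", data, e_shoff + i * e_shentsize)
                for i in range(e_shnum)]
    symbols = []
    for sh in sections:
        if sh[1] != table:
            continue
        strtab_offset = sections[sh[6]][4]  # sh_link points at the symbol-name table
        # Each Elf64_Sym is 24 bytes: name, info, other, shndx, value, size.
        for pos in range(sh[4], sh[4] + sh[5], 24):
            st_name, st_info, _other, _shndx, st_value, st_size = struct.unpack_from(
                "<IBBHQQ", data, pos)
            name = _cstr(data, strtab_offset + st_name)
            if name:
                kind = SYMBOL_TYPES.get(st_info & 0xF, str(st_info & 0xF))
                symbols.append((st_value, st_size, kind, name))
    return sorted(symbols)


def find_symbol(path: str, name: str):
    """Return (address, size) of the named symbol, or None if it isn't in .symtab."""
    for value, size, _kind, sym_name in elf_symbols(path):
        if sym_name == name:
            return value, size
    return None


if __name__ == "__main__" and len(sys.argv) > 2:
    print(find_symbol(sys.argv[1], sys.argv[2]))
```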

Sandboxes and strace

You can run "strace -f PROGRAMNAME" in a sandbox (not a machine that you care about!) to see what's going on; this shows you all of the system calls made by the program and its children. If the program creates a lot of tasks (not likely), you can use something like "strace -ff -o FILE PROGRAMNAME" to have strace write each task's output to its own file (FILE.pid, where pid is that task's process ID).
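When the trace is long, a small script can summarize it. The sketch below assumes strace's default text output (with the "[pid NNNN]" prefixes that -f adds, or the per-task files that -ff -o produces); it counts system calls by name and collects the paths passed to open/openat. It does not handle timestamp options such as -t, and it skips "<... resumed>" continuation lines, so a call that strace splits across two lines is counted once. The function names are ours; if you only need totals, strace's own -c option prints a per-syscall count table.

```python
import re
import sys
from collections import Counter

# Optional "[pid 1234] " prefix (strace -f), then the syscall name and "(".
CALL_RE = re.compile(r"^(?:\[pid\s+\d+\]\s+)?([a-z_][a-z0-9_]*)\(")
# open("path", ...) or openat(DIRFD, "path", ...): capture the first quoted path.
OPEN_RE = re.compile(r'^(?:\[pid\s+\d+\]\s+)?open(?:at)?\((?:[^",]*,\s*)?"([^"]*)"')


def summarize(lines) -> Counter:
    """Count system calls by name; signal (---) and exit (+++) lines are skipped."""
    counts = Counter()
    for line in lines:
        m = CALL_RE.match(line)
        if m:
            counts[m.group(1)] += 1
    return counts


def opened_paths(lines) -> list:
    """Return the paths passed to open/openat, in the order they appear."""
    return [m.group(1) for m in map(OPEN_RE.match, lines) if m]


if __name__ == "__main__" and len(sys.argv) > 1:
    with open(sys.argv[1], errors="replace") as f:
        log = f.read().splitlines()
    for name, n in summarize(log).most_common(15):
        print(f"{n:6d}  {name}")
    for path in sorted(set(opened_paths(log))):
        print("opened:", path)
```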

UPX, for example

We can install "upx" easily in the Unix/Linux world. This is a simple packer/compressor that can actually save space.

Using upx:

[langley@host Slidy]$ cp /usr/bin/emacs .
[langley@host Slidy]$ pwd
[langley@host Slidy]$ file emacs
emacs: ELF 64-bit LSB executable, x86-64, version 1 (SYSV), dynamically linked (uses shared libs), for GNU/Linux 2.6.18, stripped
[langley@host Slidy]$ upx emacs
                       Ultimate Packer for eXecutables
                          Copyright (C) 1996 - 2008
UPX 3.03        Markus Oberhumer, Laszlo Molnar & John Reiser   Apr 27th 2008

        File size         Ratio      Format      Name
   --------------------   ------   -----------   -----------
  11102144 ->   2780472   25.04%  linux/ElfAMD   emacs

Packed 1 file.
[langley@host Slidy]$ file emacs
emacs: ELF 64-bit LSB executable, x86-64, version 1 (GNU/Linux), statically linked, stripped
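Notice that after packing, "file" reports the binary as statically linked: the original dynamic-linking information is now inside the compressed payload, so tools like ldd and nm have little to work with until the file is unpacked ("upx -d" restores files packed by stock UPX). Two common heuristics for spotting packing are sketched below: byte entropy (compressed or encrypted data approaches 8 bits per byte) and a search for the "UPX!" marker that UPX writes into its output. Both are rough indicators, not proof; the function names and any threshold you apply are your own choice, and a modified packer can remove the marker.

```python
import math
import sys
from collections import Counter


def shannon_entropy(data: bytes) -> float:
    """Shannon entropy in bits per byte: 0.0 for constant data, 8.0 at most."""
    if not data:
        return 0.0
    n = len(data)
    return -sum((c / n) * math.log2(c / n) for c in Counter(data).values())


def packing_hints(data: bytes) -> dict:
    return {
        "entropy_bits_per_byte": round(shannon_entropy(data), 2),
        "upx_marker": b"UPX!" in data,
    }


if __name__ == "__main__" and len(sys.argv) > 1:
    with open(sys.argv[1], "rb") as f:
        print(packing_hints(f.read()))
```

Running it on a copy of a binary before and after "upx" is an easy way to see the entropy shift for yourself.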

ELF, readelf, and objdump

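Like the PE format in Windows (which is derived from COFF), the Linux ELF format is related to COFF: ELF succeeded COFF in System V Unix, and the two share many concepts, such as file headers, sections, and symbol tables. ELF is also used by other Unix-like systems, such as the BSDs and Solaris. You can look at /usr/include/elf.h for the exact layout, and use "readelf" and "objdump" to look at ELF files.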

There are a number of options that are useful with "readelf":

With "objdump", try "-p" and "-a".
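To see how the layout in /usr/include/elf.h maps onto bytes, here is a sketch that decodes a few fields of the ELF file header (the part "readelf -h" prints). It handles only 64-bit little-endian files, and the small type and machine tables are a subset of the constants in elf.h, picked for illustration; the function name is ours.

```python
import struct
import sys

# Subsets of the ET_* and EM_* constants from elf.h.
ELF_TYPES = {1: "REL (relocatable)", 2: "EXEC (executable)",
             3: "DYN (shared object/PIE)", 4: "CORE"}
ELF_MACHINES = {3: "Intel 80386", 40: "ARM", 62: "x86-64", 183: "AArch64"}

# Elf64_Ehdr: e_ident[16], then type, machine, version, entry, phoff, shoff,
# flags, ehsize, phentsize, phnum, shentsize, shnum, shstrndx (64 bytes total).
EHDR64 = struct.Struct("<16sHHIQQQIHHHHHH")


def parse_elf_header(data: bytes) -> dict:
    if len(data) < EHDR64.size or data[:4] != b"\x7fELF":
        raise ValueError("not an ELF file (or too short)")
    if data[4] != 2 or data[5] != 1:
        raise ValueError("this sketch only handles 64-bit little-endian ELF")
    (_ident, e_type, e_machine, _version, e_entry, e_phoff, e_shoff, _flags,
     _ehsize, _phentsize, e_phnum, _shentsize, e_shnum, _shstrndx) = EHDR64.unpack_from(data)
    return {
        "type": ELF_TYPES.get(e_type, hex(e_type)),
        "machine": ELF_MACHINES.get(e_machine, str(e_machine)),
        "entry": hex(e_entry),
        "program_headers": (e_phoff, e_phnum),   # (file offset, count)
        "section_headers": (e_shoff, e_shnum),   # (file offset, count)
    }


if __name__ == "__main__" and len(sys.argv) > 1:
    with open(sys.argv[1], "rb") as f:
        for key, value in parse_elf_header(f.read(64)).items():
            print(f"{key:16s} {value}")
```

For a binary like OddBinary.exe in the radare2 session that follows, the entry field would be expected to match the address radare2 starts at (0x00402830), since that is where the startup code begins.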


The radare program (now developed as radare2) provides some powerful cross-platform capabilities for binary analysis.

Here's a simple session with radare2 on a Debian box:

$ file OddBinary.exe 
OddBinary.exe: ELF 64-bit LSB executable, x86-64, version 1 (SYSV), dynamically linked (uses shared libs), for GNU/Linux 2.6.24, BuildID[sha1]=8c7de897dd2f5d869d108bed2f2152a68b2f7b0a, stripped
$ radare2 OddBinary.exe 
[0x00402830]> f
0x0046076b 0 section_end..shstrtab
0x00460668 259 section..shstrtab
0x00466700 0 section_end..bss
0x00460680 24704 section..bss
0x00460668 0 section_end..data
0x00460380 744 section..data
0x00460368 0 section_end..got.plt
0x0045ffe8 896 section..got.plt
0x0045ffe8 0 section_end..got
0x0045ffe0 8 section..got
0x0045ffe0 0 section_end..dynamic
0x0045fe00 480 section..dynamic
0x0045fe00 0 section_end..jcr
0x0045fdf8 8 section..jcr
0x0045fdf8 0 section_end..dtors
0x0045fde8 16 section..dtors
0x0045fde8 0 section_end..ctors
0x0045fdd8 16 section..ctors
0x0045fdd8 0 section_end..init_array
0x0045fd58 128 section..init_array
0x0045fa0a 0 section_end..gcc_except_table
0x0045c6d4 13110 section..gcc_except_table
0x0045c6d4 0 section_end..eh_frame
0x004514c8 45580 section..eh_frame
0x004514c4 0 section_end..eh_frame_hdr
0x0044ee40 9860 section..eh_frame_hdr
0x0044ee40 0 section_end..rodata
0x00447140 32000 section..rodata
0x00447126 0 section_end..fini
0x00447118 14 section..fini
0x00447118 0 section_end..text
0x00402830 280808 section..text
0x00402830 0 section_end..plt
0x00402150 1760 section..plt
0x00402150 0 section_end..init
0x00402138 24 section..init
0x00402138 0 section_end..rela.plt
0x00401700 2616 section..rela.plt
0x00401700 0 section_end..rela.dyn
0x00401610 240 section..rela.dyn
0x00401610 0 section_end..gnu.version_r
0x00401530 224 section..gnu.version_r
0x00401530 0 section_end..gnu.version
0x0040143e 242 section..gnu.version
0x0040143d 0 section_end..dynstr
0x00400e40 1533 section..dynstr
0x00400e40 0 section_end..dynsym
0x004002e8 2904 section..dynsym
0x004002e8 0 section_end..gnu.hash
0x00400298 80 section..gnu.hash
0x00400298 0 section_end..note.gnu.buildid
0x00400274 36 section..note.gnu.buildid
0x00400274 0 section_end..note.ABItag
0x00400254 32 section..note.ABItag
0x00400254 0 section_end..interp
0x00400238 28 section..interp
[0x00402830]> pd
   ;      [12] va=0x00402830 pa=0x00002830 sz=280808 vsz=280808 rwx=-r-x .text
            ;-- section..text:
            0x00402830    31ed         xor ebp, ebp
            0x00402832    4989d1       mov r9, rdx
            0x00402835    5e           pop rsi
            0x00402836    4889e2       mov rdx, rsp
            0x00402839    4883e4f0     and rsp, 0xfffffffffffffff0
            0x0040283d    50           push rax
            0x0040283e    54           push rsp
            0x0040283f    49c7c0d0704. mov r8, 0x4470d0
            0x00402846    48c7c140704. mov rcx, 0x447040
            0x0040284d    48c7c7909c4. mov rdi, 0x409c90
            0x00402854    e877faffff   call sym.imp.__libc_start_main
               0x004022d0(unk, unk) ; sym.imp.__libc_start_main
            0x00402859    f4           hlt
            0x0040285a    90           nop
            0x0040285b    90           nop
            0x0040285c    4883ec08     sub rsp, 0x8
            0x00402860    488b0579d72. mov rax, [rip+0x25d779] ; 0x0040ffe0 
            0x00402867    4885c0       test rax, rax
        ,=< 0x0040286a    7402         jz 0x40286e
        |   0x0040286c    ffd0         call rax
        |      0x00000000()        
        `-> 0x0040286e    4883c408     add rsp, 0x8
            0x00402872    c3           ret
            0x00402873    90           nop


While you can try running a suspect binary inside a compatibility layer such as Wine, it's probably better to put it into a sandbox, such as a VirtualBox or qemu virtual machine. This gives you a more realistic environment than Wine, and it's safer. You can also easily do differential analysis between snapshots of the VM's filesystem and (for a Windows guest) its registry to track changes.
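The filesystem half of that differential analysis can be sketched as follows: hash every regular file in a "before" and an "after" copy of the guest filesystem and compare. This assumes you have already exported or mounted both snapshots as ordinary directories on the host (how you do that depends on your VM software); the function names are ours. It skips symlinks and ignores permissions and timestamps, and for a Windows guest the registry hive files will simply show up as changed files, without saying which keys changed.

```python
import hashlib
import os
import sys


def tree_hashes(root: str) -> dict:
    """Map each regular file's path (relative to root) to its SHA-256 digest."""
    result = {}
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            full = os.path.join(dirpath, name)
            if os.path.islink(full) or not os.path.isfile(full):
                continue
            h = hashlib.sha256()
            with open(full, "rb") as f:
                for chunk in iter(lambda: f.read(65536), b""):
                    h.update(chunk)
            result[os.path.relpath(full, root)] = h.hexdigest()
    return result


def diff_trees(before_root: str, after_root: str):
    """Return (added, removed, changed) relative paths between two snapshots."""
    before, after = tree_hashes(before_root), tree_hashes(after_root)
    added = sorted(after.keys() - before.keys())
    removed = sorted(before.keys() - after.keys())
    changed = sorted(p for p in before.keys() & after.keys() if before[p] != after[p])
    return added, removed, changed


if __name__ == "__main__" and len(sys.argv) > 2:
    for label, paths in zip(("added", "removed", "changed"), diff_trees(sys.argv[1], sys.argv[2])):
        for p in paths:
            print(f"{label:8s} {p}")
```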