File Analysis: File Identification and Profiling in the Linux Environment

Please read pp. 379-488 of MF.

Working in the Linux environment is a lot easier. Running most programs, such as md5sum and sha1sum, does not generally require installing any new software. Depending on the distribution, less common programs such as ssdeep may well be in its repository.
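If you would rather compute the same digests from a script (say, to record them in your case notes), here is a minimal Python sketch using only the standard library. The function name "file_digests" and the chunk size are my own choices for illustration, not anything from MF.

```python
import hashlib


def file_digests(path, chunk_size=65536):
    """Return (md5_hex, sha1_hex) for the file at path, reading it in chunks."""
    md5 = hashlib.md5()
    sha1 = hashlib.sha1()
    with open(path, "rb") as handle:
        # Read fixed-size chunks so a large binary never has to fit in memory.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            md5.update(chunk)
            sha1.update(chunk)
    return md5.hexdigest(), sha1.hexdigest()
```

The two values should match what md5sum and sha1sum print for the same file.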

The old standby for dumping a binary was "od" (octal dump); we now have other programs such as "xxd", "ghex", and "hexdump" that might also be in a given distribution.
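To see what these dump programs are doing, here is a rough Python sketch of a hex dump: an offset, the bytes in hex, and the printable ASCII on the right. The layout only approximates what "xxd" or "hexdump -C" print, and the name "hexdump" here is just an illustrative function, not the system utility.

```python
def hexdump(data, width=16):
    """Format bytes as offset, hex pairs, and printable ASCII, one row per line."""
    hex_width = width * 3 - 1  # two hex digits per byte plus separating spaces
    lines = []
    for offset in range(0, len(data), width):
        row = data[offset:offset + width]
        hex_part = " ".join(f"{byte:02x}" for byte in row)
        # Non-printable bytes are shown as "." in the text column.
        text_part = "".join(chr(b) if 32 <= b < 127 else "." for b in row)
        lines.append(f"{offset:08x}  {hex_part:<{hex_width}}  |{text_part}|")
    return "\n".join(lines)
```

Calling print(hexdump(open(PATH, "rb").read(64))) on a binary shows its first 64 bytes, which is usually enough to spot a magic number.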

What is this file? Broad steps...

Of course, one of the handiest tools is the old "file" program, which does a very credible job of identifying quite a few file types. Also of some use can be "readelf -a" and of course "objdump -a". You can look for namelists (symbol tables) with "nm"...

Dynamically linked programs

You can examine a program's dynamic linkage information with "ldd"; for that matter, you can strace the program (though if it's a suspect one, you should try this in a sandbox!). Be aware that some versions of "ldd" may themselves run the program to obtain this information, so treat "ldd" on a suspect binary with the same care. Even if a program is "stripped", a dynamically linked program still has linkage information available. (If the binary isn't stripped, look to see if there is any debugging info; if so, you might be able to run "gdb" usefully!)

Antivirus software in the Linux world

Generally, the easiest thing to do is to install packages like "clamav" and "clamav-update". You can then use "clamscan" to scan individual files, and "freshclam" to get the latest signatures.


Please ignore the "Japanese" translation on page 413 of MF; it's not really correct. The definitions for "Kaiten" and "Goraku" are close enough, but "wa" in this case is a particle (not a noun, as the MF book seems to think), and it marks the subject of the sentence ("Kaiten"). Most likely, it's referring to the malware named "kaiten"; see McAfee and PacketStorm's version of kaiten.c, for instance.

(If not, then it most likely refers to a sushi place ("kaiten sushi" refers to a sushi restaurant that puts the sushi trays on a circular course, such as a conveyor belt or maybe a circular water course); even less likely, but still conceivably, it might also be a reference to a WWII Japanese torpedo system called the "kaiten".)

The Google and AltaVista translators are okay for most languages; for Japanese, though, the best tend to be in Japanese.

Strings, as always

Running "strings" over a suspect binary is almost always worth doing. While it's possible that a good programmer, a packer, or encryption has removed or obfuscated all of the strings, it's also possible that they haven't.
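For a feel of what "strings" does under the hood, here is a minimal Python sketch that pulls runs of printable ASCII out of a byte buffer. It only approximates the real tool (which also handles other encodings and has its own notion of printable characters); the function name and the default minimum length of 4 are choices made for this example.

```python
import re


def extract_strings(data, min_length=4):
    """Return runs of at least min_length printable ASCII bytes found in data."""
    # 0x20 through 0x7e covers space through tilde.
    pattern = re.compile(rb"[\x20-\x7e]{%d,}" % min_length)
    return [match.decode("ascii") for match in pattern.findall(data)]
```

Feeding it the contents of a suspect file (read in binary mode) gives a list you can then search for URLs, paths, IRC commands, and the like.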


Using "ldd" shows you which shared libraries a binary needs and where each one actually resolves to on disk. You can use "file" (if you like) to learn more about a shared library.

nm and variables in execution

Using "nm" lists a binary's symbol table: each symbol's address, type, and name. From that you can figure out where in the program's address space a variable will be located. If "nm" shows a lot of information (that is, the binary isn't stripped), you can use "gdb" (the GNU debugger) to get a real feel for what the program is doing.

Sandboxes and strace

You can do an "strace -f PROGRAMNAME" in a sandbox (not a machine that you care about!) to see what's going on; this will show you all of the system calls made by this program and its children. If you find a lot of processes created (not likely), you can use something like "strace -ff -o FILE PROGRAMNAME" to have each process's strace output written to its own file (named FILE.PID).
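Once you have a trace, a quick tally of which system calls each process made can point you at the interesting parts. The following is a minimal Python sketch, assuming the trace was written with "strace -f -o FILE" so that each line begins with a PID followed by the call; the regular expression reflects that assumed line layout and may need adjusting for your strace version and options.

```python
import re
from collections import Counter, defaultdict

# Assumed line shape: 1234 openat(AT_FDCWD, "/etc/passwd", O_RDONLY) = 3
SYSCALL_LINE = re.compile(r"^(\d+)\s+([A-Za-z_][A-Za-z0-9_]*)\(")


def tally_syscalls(trace_lines):
    """Count system calls per PID from lines of 'strace -f -o FILE' output."""
    counts = defaultdict(Counter)
    for line in trace_lines:
        match = SYSCALL_LINE.match(line)
        if match:  # signal, exit, and "<... resumed>" lines do not match
            pid, name = match.groups()
            counts[int(pid)][name] += 1
    return counts
```

For example, tally_syscalls(open(FILE)) followed by counts[PID].most_common(10) gives the ten most frequent calls for one process.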

UPX, for example

We can install "upx" easily in the Unix/Linux world. This is a simple packer/compressor that can actually save space.

Using upx:

[langley@host Slidy]$ cp /usr/bin/emacs .
[langley@host Slidy]$ pwd
[langley@host Slidy]$ file emacs
emacs: ELF 64-bit LSB executable, x86-64, version 1 (SYSV), dynamically linked (uses shared libs), for GNU/Linux 2.6.18, stripped
[langley@host Slidy]$ upx emacs
                       Ultimate Packer for eXecutables
                          Copyright (C) 1996 - 2008
UPX 3.03        Markus Oberhumer, Laszlo Molnar & John Reiser   Apr 27th 2008

        File size         Ratio      Format      Name
   --------------------   ------   -----------   -----------
  11102144 ->   2780472   25.04%  linux/ElfAMD   emacs

Packed 1 file.
[langley@host Slidy]$ file emacs
emacs: ELF 64-bit LSB executable, x86-64, version 1 (GNU/Linux), statically linked, stripped
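
Note that "file" now reports the packed binary as statically linked. One quick heuristic for spotting an unmodified UPX-packed file is to look for the "UPX!" marker that the packer leaves in its output. Here is a minimal Python sketch; the function names are just for illustration, and remember that a malware author can strip or alter the marker, so not finding it proves nothing.

```python
UPX_MARKER = b"UPX!"


def find_upx_marker(data):
    """Return the offset of the first "UPX!" marker in data, or -1 if absent."""
    return data.find(UPX_MARKER)


def file_has_upx_marker(path):
    """Heuristic only: True if the file at path contains the "UPX!" marker."""
    with open(path, "rb") as handle:
        return find_upx_marker(handle.read()) != -1
```

If the marker is present, "upx -d" on a copy of the file (in your sandbox) is a reasonable next thing to try.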

ELF, readelf, and objdump

Like the PE format in Windows, the Linux ELF format (also used by other Unix-like systems) is related to COFF. You can look at /usr/include/elf.h for the exact layout, and use "readelf" and "objdump" to look at ELF files.

There are a number of options that are useful with "readelf":

With "objdump", try "-p" and "-a".
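
To make the elf.h layout a little more concrete, here is a minimal Python sketch that decodes the first 20 bytes of an ELF file: the magic number, the class and byte-order fields of e_ident, and the e_type and e_machine fields that follow. It covers only a handful of the values defined in elf.h, and the function name and returned dictionary are my own invention, not part of any tool.

```python
import struct

# A small subset of the values defined in elf.h.
ELF_CLASSES = {1: "32-bit", 2: "64-bit"}
ELF_DATA = {1: "LSB", 2: "MSB"}
ELF_TYPES = {1: "REL", 2: "EXEC", 3: "DYN", 4: "CORE"}
ELF_MACHINES = {3: "x86", 40: "ARM", 62: "x86-64", 183: "AArch64"}


def parse_elf_ident(header):
    """Decode the first 20 bytes of an ELF file into a small dict."""
    if len(header) < 20 or header[:4] != b"\x7fELF":
        raise ValueError("not an ELF header")
    ei_class, ei_data = header[4], header[5]
    # e_type and e_machine are 16-bit fields stored in the file's byte order.
    byte_order = "<" if ei_data == 1 else ">"
    e_type, e_machine = struct.unpack_from(byte_order + "HH", header, 16)
    return {
        "class": ELF_CLASSES.get(ei_class, "unknown"),
        "data": ELF_DATA.get(ei_data, "unknown"),
        "type": ELF_TYPES.get(e_type, hex(e_type)),
        "machine": ELF_MACHINES.get(e_machine, hex(e_machine)),
    }
```

Pass it the first 20 bytes of a file read in binary mode and compare the result with what "readelf -h" reports for the same file.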


One possibility for trying out a suspect binary is to use an emulation environment to see what, if anything, you can tell about the program's execution. In the Linux world, there is "wine" for Windows binaries; try "strace -f wine BINARY" and see what happens.