Interpretation and compilation

Both Bash and Perl are interpreted languages. When we "run" a script written in an interpreted language, a separate program called an "interpreter" reads our code and carries it out.

A compiled program is completely different. Its code is loaded into memory and executed directly as machine language: a sequence of bytes that the hardware processor understands as instructions.

Turning text into bits

Let's look at our obligatory "helloworld.c":

#include <stdio.h>

int main(int argc, char **argv)
{
  printf("Hello world!\n");
}

Peering inside

We can use the program xxd to illustrate the difference between a Bash script and a compiled C program:

$ xxd ~/anagrams/anagram-build.sh | head -7
0000000: 2321 2f62 696e 2f62 6173 680a 0a64 6563  #!/bin/bash..dec
0000010: 6c61 7265 202d 4120 6469 6374 696f 6e61  lare -A dictiona
0000020: 7279 0a0a 2366 6f72 2828 203b 203b 2029  ry..#for(( ; ; )
0000030: 290a 7768 696c 6520 7265 6164 0a64 6f0a  ).while read.do.
0000040: 2320 2020 2072 6561 6420 0a23 2020 2020  #    read .#    
0000050: 6966 205b 2024 3f20 2d67 7420 3020 5d0a  if [ $? -gt 0 ].
0000060: 2320 2020 2074 6865 6e0a 2309 6563 686f  #    then.#.echo
$ xxd ~/helloworld | head -100
0000000: 7f45 4c46 0201 0100 0000 0000 0000 0000  .ELF............
0000010: 0200 3e00 0100 0000 1004 4000 0000 0000  ..>.......@.....
0000020: 4000 0000 0000 0000 4811 0000 0000 0000  @.......H.......
0000030: 0000 0000 4000 3800 0900 4000 1e00 1b00  ....@.8...@.....
0000040: 0600 0000 0500 0000 4000 0000 0000 0000  ........@.......
0000050: 4000 4000 0000 0000 4000 4000 0000 0000  @.@.....@.@.....
0000060: f801 0000 0000 0000 f801 0000 0000 0000  ................
0000070: 0800 0000 0000 0000 0300 0000 0400 0000  ................
0000080: 3802 0000 0000 0000 3802 4000 0000 0000  8.......8.@.....
0000090: 3802 4000 0000 0000 1c00 0000 0000 0000  8.@.............
00000a0: 1c00 0000 0000 0000 0100 0000 0000 0000  ................
00000b0: 0100 0000 0500 0000 0000 0000 0000 0000  ................
00000c0: 0000 4000 0000 0000 0000 4000 0000 0000  ..@.......@.....
00000d0: dc06 0000 0000 0000 dc06 0000 0000 0000  ................
00000e0: 0000 2000 0000 0000 0100 0000 0600 0000  .. .............
00000f0: 280e 0000 0000 0000 280e 6000 0000 0000  (.......(.`.....
0000100: 280e 6000 0000 0000 f801 0000 0000 0000  (.`.............
0000110: 0802 0000 0000 0000 0000 2000 0000 0000  .......... .....
0000120: 0200 0000 0600 0000 500e 0000 0000 0000  ........P.......
0000130: 500e 6000 0000 0000 500e 6000 0000 0000  P.`.....P.`.....
0000140: 9001 0000 0000 0000 9001 0000 0000 0000  ................
0000150: 0800 0000 0000 0000 0400 0000 0400 0000  ................
0000160: 5402 0000 0000 0000 5402 4000 0000 0000  T.......T.@.....
0000170: 5402 4000 0000 0000 4400 0000 0000 0000  T.@.....D.......
0000180: 4400 0000 0000 0000 0400 0000 0000 0000  D...............
0000190: 50e5 7464 0400 0000 0c06 0000 0000 0000  P.td............
00001a0: 0c06 4000 0000 0000 0c06 4000 0000 0000  ..@.......@.....
00001b0: 2c00 0000 0000 0000 2c00 0000 0000 0000  ,.......,.......
00001c0: 0400 0000 0000 0000 51e5 7464 0600 0000  ........Q.td....
00001d0: 0000 0000 0000 0000 0000 0000 0000 0000  ................
00001e0: 0000 0000 0000 0000 0000 0000 0000 0000  ................
00001f0: 0000 0000 0000 0000 0800 0000 0000 0000  ................
0000200: 52e5 7464 0400 0000 280e 0000 0000 0000  R.td....(.......
0000210: 280e 6000 0000 0000 280e 6000 0000 0000  (.`.....(.`.....
0000220: d801 0000 0000 0000 d801 0000 0000 0000  ................
0000230: 0100 0000 0000 0000 2f6c 6962 3634 2f6c  ......../lib64/l
0000240: 642d 6c69 6e75 782d 7838 362d 3634 2e73  d-linux-x86-64.s
0000250: 6f2e 3200 0400 0000 1000 0000 0100 0000  o.2.............
0000260: 474e 5500 0000 0000 0200 0000 0600 0000  GNU.............
0000270: 1800 0000 0400 0000 1400 0000 0300 0000  ................
0000280: 474e 5500 8d33 3a85 1127 4fd6 fc16 d8df  GNU..3:..'O.....
0000290: bc41 9748 36ef 88fe 0100 0000 0100 0000  .A.H6...........
00002a0: 0100 0000 0000 0000 0000 0000 0000 0000  ................
00002b0: 0000 0000 0000 0000 0000 0000 0000 0000  ................
00002c0: 0000 0000 0000 0000 0000 0000 0000 0000  ................
00002d0: 1a00 0000 1200 0000 0000 0000 0000 0000  ................
00002e0: 0000 0000 0000 0000 1f00 0000 1200 0000  ................
00002f0: 0000 0000 0000 0000 0000 0000 0000 0000  ................
0000300: 0100 0000 2000 0000 0000 0000 0000 0000  .... ...........
0000310: 0000 0000 0000 0000 005f 5f67 6d6f 6e5f  .........__gmon_
0000320: 7374 6172 745f 5f00 6c69 6263 2e73 6f2e  start__.libc.so.
0000330: 3600 7075 7473 005f 5f6c 6962 635f 7374  6.puts.__libc_st
0000340: 6172 745f 6d61 696e 0047 4c49 4243 5f32  art_main.GLIBC_2
0000350: 2e32 2e35 0000 0000 0200 0200 0000 0000  .2.5............
0000360: 0100 0100 1000 0000 1000 0000 0000 0000  ................
0000370: 751a 6909 0000 0200 3100 0000 0000 0000  u.i.....1.......
0000380: e00f 6000 0000 0000 0600 0000 0300 0000  ..`.............
0000390: 0000 0000 0000 0000 0010 6000 0000 0000  ..........`.....
00003a0: 0700 0000 0100 0000 0000 0000 0000 0000  ................
00003b0: 0810 6000 0000 0000 0700 0000 0200 0000  ..`.............
00003c0: 0000 0000 0000 0000 4883 ec08 e86b 0000  ........H....k..
00003d0: 00e8 fa00 0000 e8d5 0100 0048 83c4 08c3  ...........H....
00003e0: ff35 0a0c 2000 ff25 0c0c 2000 0f1f 4000  .5.. ..%.. ...@.
00003f0: ff25 0a0c 2000 6800 0000 00e9 e0ff ffff  .%.. .h.........
0000400: ff25 020c 2000 6801 0000 00e9 d0ff ffff  .%.. .h.........
0000410: 31ed 4989 d15e 4889 e248 83e4 f050 5449  1.I..^H..H...PTI
0000420: c7c0 a005 4000 48c7 c110 0540 0048 c7c7  ....@.H....@.H..
0000430: f404 4000 e8c7 ffff fff4 9090 4883 ec08  ..@.........H...
0000440: 488b 0599 0b20 0048 85c0 7402 ffd0 4883  H.... .H..t...H.
0000450: c408 c390 9090 9090 9090 9090 9090 9090  ................
0000460: 5548 89e5 5348 83ec 0880 3db0 0b20 0000  UH..SH....=.. ..
0000470: 754b bb40 0e60 0048 8b05 aa0b 2000 4881  uK.@.`.H.... .H.
0000480: eb38 0e60 0048 c1fb 0348 83eb 0148 39d8  .8.`.H...H...H9.
0000490: 7324 660f 1f44 0000 4883 c001 4889 0585  s$f..D..H...H...
00004a0: 0b20 00ff 14c5 380e 6000 488b 0577 0b20  . ....8.`.H..w. 
00004b0: 0048 39d8 72e2 c605 630b 2000 0148 83c4  .H9.r...c. ..H..
00004c0: 085b 5dc3 6666 662e 0f1f 8400 0000 0000  .[].fff.........
00004d0: 4883 3d70 0920 0000 5548 89e5 7412 b800  H.=p. ..UH..t...
00004e0: 0000 0048 85c0 7408 5dbf 480e 6000 ffe0  ...H..t.].H.`...
00004f0: 5dc3 9090 5548 89e5 4883 ec10 897d fc48  ]...UH..H....}.H
0000500: 8975 f0bf fc05 4000 e8e3 feff ffc9 c390  .u....@.........
0000510: 4889 6c24 d84c 8964 24e0 488d 2d03 0920  H.l$.L.d$.H.-.. 
0000520: 004c 8d25 fc08 2000 4c89 6c24 e84c 8974  .L.%.. .L.l$.L.t
0000530: 24f0 4c89 7c24 f848 895c 24d0 4883 ec38  $.L.|$.H.\$.H..8
0000540: 4c29 e541 89fd 4989 f648 c1fd 0349 89d7  L).A..I..H...I..
0000550: e873 feff ff48 85ed 741c 31db 0f1f 4000  .s...H..t.1...@.
0000560: 4c89 fa4c 89f6 4489 ef41 ff14 dc48 83c3  L..L..D..A...H..
0000570: 0148 39eb 75ea 488b 5c24 0848 8b6c 2410  .H9.u.H.\$.H.l$.
0000580: 4c8b 6424 184c 8b6c 2420 4c8b 7424 284c  L.d$.L.l$ L.t$(L
0000590: 8b7c 2430 4883 c438 c30f 1f80 0000 0000  .|$0H..8........
00005a0: f3c3 9090 9090 9090 9090 9090 9090 9090  ................
00005b0: 5548 89e5 5348 83ec 0848 8b05 6808 2000  UH..SH...H..h. .
00005c0: 4883 f8ff 7419 bb28 0e60 000f 1f44 0000  H...t..(.`...D..
00005d0: 4883 eb08 ffd0 488b 0348 83f8 ff75 f148  H.....H..H...u.H
00005e0: 83c4 085b 5dc3 9090 4883 ec08 e86f feff  ...[]...H....o..
00005f0: ff48 83c4 08c3 0000 0100 0200 4865 6c6c  .H..........Hell
0000600: 6f20 776f 726c 6421 0000 0000 011b 033b  o world!.......;
0000610: 2800 0000 0400 0000 d4fd ffff 4400 0000  (...........D...
0000620: e8fe ffff 6c00 0000 04ff ffff 8c00 0000  ....l...........
0000630: 94ff ffff b400 0000 1400 0000 0000 0000  ...............
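
The first bytes of each dump already tell the two files apart: the script begins with the characters #! (23 21), and the compiled program begins with the ELF magic number (7f 45 4c 46). As a rough sketch of how a tool could use this, the C function below classifies a buffer holding the start of a file. The names file_kind and classify_header are made up for this illustration and are not part of any library.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical names for this sketch; not from any standard library. */
enum file_kind { KIND_ELF, KIND_SCRIPT, KIND_UNKNOWN };

/* Classify a file by the first bytes of its contents. */
enum file_kind classify_header(const unsigned char *buf, size_t len)
{
    static const unsigned char elf_magic[4] = { 0x7f, 'E', 'L', 'F' };

    if (len >= 4 && memcmp(buf, elf_magic, 4) == 0)
        return KIND_ELF;      /* compiled program or object file */
    if (len >= 2 && buf[0] == '#' && buf[1] == '!')
        return KIND_SCRIPT;   /* text meant for an interpreter */
    return KIND_UNKNOWN;
}
```

Given the first four bytes of ~/helloworld this sketch would report KIND_ELF, and given the start of anagram-build.sh it would report KIND_SCRIPT.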

Simple compilation

Compiling a C program can be (though not necessarily!) a simple process.

Compiling a "Hello World" can be done in a single line:

$ gcc -o helloworld helloworld.c

More details on compilation

We are hiding some important steps here. The first thing that happens is that the C preprocessor (cpp) is run over the helloworld.c file; we can ask the compiler to stop after this stage with the -E option:

$ gcc -E helloworld.c
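
As a small illustration of what this stage does, the sketch below uses two macros. The names GREETING, SQUARE, greeting, and square_of are invented for this example. Running gcc -E over a file like this should show the macro names replaced by their expansions before the compiler proper ever sees the code.

```c
/* Macros are handled by the preprocessor, before compilation proper. */
#define GREETING "Hello world!"
#define SQUARE(x) ((x) * (x))

/* After preprocessing, this body reads: return "Hello world!"; */
const char *greeting(void)
{
    return GREETING;
}

/* After preprocessing, this body reads: return ((n) * (n)); */
int square_of(int n)
{
    return SQUARE(n);
}
```

The #include line in helloworld.c is handled the same way: the preprocessor pastes the text of stdio.h into the file, which is why the -E output is so much longer than the source.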

Assembly language

The next stage is the translation of the preprocessed C source into assembly language (a human-readable representation of the actual machine language); we can ask the compiler to stop after this stage with the -S option:

$ gcc -S helloworld.c
$ cat helloworld.s
	.file	"helloworld.c"
	.section	.rodata
.LC0:
	.string	"Hello world!"
	.text
	.globl	main
	.type	main, @function
main:
.LFB0:
	.cfi_startproc
	pushq	%rbp
	.cfi_def_cfa_offset 16
	.cfi_offset 6, -16
	movq	%rsp, %rbp
	.cfi_def_cfa_register 6
	subq	$16, %rsp
	movl	%edi, -4(%rbp)
	movq	%rsi, -16(%rbp)
	movl	$.LC0, %edi
	call	puts
	leave
	.cfi_def_cfa 7, 8
	ret
	.cfi_endproc
.LFE0:
	.size	main, .-main
	.ident	"GCC: (Ubuntu/Linaro 4.6.3-1ubuntu5) 4.6.3"
	.section	.note.GNU-stack,"",@progbits

Note that the call to printf has become a call to puts: because our string contains no format specifiers and ends in a newline, the compiler substitutes the simpler function and drops the trailing newline from the string.

From human-readable to computer-readable

We can also ask the compiler to stop after the next stage, the creation of actual machine language, before we "link" the C program with the C runtime and its shared libraries. The -c option does this and leaves us with an object file:

$ gcc -c helloworld.c
$ file helloworld.o
$ xxd helloworld.o
0000000: 7f45 4c46 0201 0100 0000 0000 0000 0000  .ELF............
0000010: 0100 3e00 0100 0000 0000 0000 0000 0000  ..>.............
0000020: 0000 0000 0000 0000 3801 0000 0000 0000  ........8.......
0000030: 0000 0000 4000 0000 0000 4000 0d00 0a00  ....@.....@.....
0000040: 5548 89e5 4883 ec10 897d fc48 8975 f0bf  UH..H....}.H.u..
0000050: 0000 0000 e800 0000 00c9 c300 4865 6c6c  ............Hell
0000060: 6f20 776f 726c 6421 0000 4743 433a 2028  o world!..GCC: (
0000070: 5562 756e 7475 2f4c 696e 6172 6f20 342e  Ubuntu/Linaro 4.
0000080: 362e 332d 3175 6275 6e74 7535 2920 342e  6.3-1ubuntu5) 4.
0000090: 362e 3300 0000 0000 1400 0000 0000 0000  6.3.............
00000a0: 017a 5200 0178 1001 1b0c 0708 9001 0000  .zR..x..........
00000b0: 1c00 0000 1c00 0000 0000 0000 1b00 0000  ................
00000c0: 0041 0e10 8602 430d 0656 0c07 0800 0000  .A....C..V......
00000d0: 002e 7379 6d74 6162 002e 7374 7274 6162  ..symtab..strtab
00000e0: 002e 736

Linking/Loading

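The final stage is linking, where we resolve any outstanding references and combine any other needed modules with our code modules to make our final executable. (With C, we generally need at least the C runtime startup files, such as crt1.o.) References into shared libraries such as libc.so.6 are resolved when the program is loaded, by the dynamic loader named inside the executable (here /lib64/ld-linux-x86-64.so.2).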
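
To make "outstanding references" concrete, here is a minimal sketch that declares puts by hand instead of including stdio.h. The compiler only needs the declaration to emit a call; finding the actual code for puts is left to the linker and, for a shared C library, the loader. The function name say_hello is made up for this example.

```c
/* Declared by hand, not defined here: the object file will contain an
   unresolved reference to puts that is fixed up at link/load time. */
int puts(const char *s);

/* Returns whatever puts returns: a non-negative value on success. */
int say_hello(void)
{
    return puts("Hello world!");
}
```

Compiling a file like this with gcc -c should leave the target of the call blank, much like the e8 00 00 00 00 bytes visible in the helloworld.o dump, until the link step fills it in.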

Automating all of this!

The traditional program in the Unix world to automate the process of compilation is called make. It lets us write a set of rules describing how the units in a compilation (or compilations!) depend on each other and how to create each one.

Our first Makefile

helloworld: helloworld.c
	gcc -o helloworld helloworld.c

This is a complete Makefile. (The command line must begin with a tab character, not spaces.) Let's try it out:

$ make
make: `helloworld' is up to date.
$ rm helloworld
$ make
gcc -o helloworld helloworld.c
$ make
make: `helloworld' is up to date.
$ touch helloworld.c
$ make
gcc -o helloworld helloworld.c

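So make is quite intelligent about when to re-create a binary: using the dependency information we provided in the first line, it rebuilds helloworld only when that file is missing or when helloworld.c has been modified more recently.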
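
The decision make takes here can be sketched in a few lines of C using the POSIX stat call. This is only an approximation under simple assumptions (one target, one source, whole-second timestamps); the function name needs_rebuild is hypothetical, and the real make does considerably more.

```c
#include <sys/stat.h>

/* Sketch only: returns 1 if target should be rebuilt from source,
   0 if target is up to date, and -1 if source cannot be examined. */
int needs_rebuild(const char *target, const char *source)
{
    struct stat src, tgt;

    if (stat(source, &src) != 0)
        return -1;                      /* nothing to build from */
    if (stat(target, &tgt) != 0)
        return 1;                       /* target missing: build it */
    return src.st_mtime > tgt.st_mtime; /* source newer: rebuild */
}
```

In the session above, rm helloworld corresponds to the "target missing" case and touch helloworld.c to the "source newer" case.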

More with make

We can quieten make down by prefixing a command with @, which stops make from echoing it:

helloworld: helloworld.c
	@gcc -o helloworld helloworld.c

We can also use make's very powerful pattern rules (a kind of "wildcard" system) to automate compilation. Here % matches any file name stem, and $* expands to that stem:

%.o: %.c
	gcc -c $*.c 

helloworld: helloworld.o
	gcc -o helloworld helloworld.o

Targets of convenience

Another popular thing to do is add targets that are merely conveniences, such as a clean target:

%.o: %.c
	gcc -c $*.c 

helloworld: helloworld.o
	gcc -o helloworld helloworld.o

# clean does not name a real file, so mark it as phony
.PHONY: clean
clean:
	@rm -f helloworld helloworld.o helloworld.s

Now when we run "make clean", all of the generated files that might be lingering around are removed:

$ make clean
$ make
gcc -c helloworld.c 
gcc -o helloworld helloworld.o

Even more automation with CMake

In recent years, cmake has been gaining some ground as a tool to automate the creation of Makefiles.

We can use cmake with our helloworld program:

$ mkdir helloworld.d
$ cd helloworld.d
$ cp ~/helloworld.c .
$ cat > CMakeLists.txt <<EOF
cmake_minimum_required(VERSION 3.5)
project(helloworld C)
add_executable(helloworld helloworld.c)
EOF
$ mkdir build-dir
$ cd build-dir
$ cmake ..
$ make
$ ./helloworld
Hello world!