Process size, part 1.


R. Clayton (rclayton@monmouth.edu)
(no date)


I compile and link a program:

  /home/nobody/useful_dir/gal>ls -lrt hello*
  -rwxr-xr-x 1 nobody devl 20624 Nov 12 01:32 hello
  -rw-r--r-- 1 nobody devl 88 Nov 12 01:32 hello.c

Can I conclude that the size of the hello-world process is 20,624 bytes?

  Not really. The disk-file size of an executable does not have a strong
  relation to the size of the process created from the executable. If the
  executable file is big, chances are the process will be big too, but not
  necessarily; and small executables don't necessarily lead to small processes
  either. The exact nature and strength of the relation depends on the OS, but
  this imprecision is true for all major general-purpose OSs.

  Although I haven't had a chance to cover it in lecture yet, the internal
  structure of a process is more complicated than we've been assuming.
  However, because you've already covered this when you read Chapter 11
  (particularly Figure 11.11), I'll over-simplify. A process is split into
  three parts: code, uninitialized globals, and initialized globals. You can
  use the size command on *nix to find the size of each of these parts:

    $ cat t.c
    #include <stdio.h>

    int main() {
      printf("hello world!\n");
      return 0;
      }

    $ gcc -o t t.c

    $ ls -l t
    -rwx------ 1 rclayton faculty 6608 Nov 12 10:04 t

    $ size t
       text data bss dec hex filename
       1778 264 32 2074 81a t

    $

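  The executable file for t describes 1778 bytes of code (text), 264 bytes of
  initialized globals (data), and 32 bytes of uninitialized globals (bss, block
  started by symbol), for a total of 2074 bytes (decimal; 81a in hex). Two
  caveats: size's text column also counts read-only data, such as the
  "hello world!\n" string, and only the text and data bytes are actually
  stored in the file; the bss is recorded just as a size and zero-filled at
  run time. The remaining 6608 - (1778 + 264) = 4566 bytes are overhead
  required by the linker, loader, and other executable-manipulation tools,
  including the OS.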

  To emphasize the point, let's throw more stuff into the executable file by
  compiling and linking for debug, which adds debugging information (here in
  the stabs format, because of -gstabs) that normally isn't included.

    $ gcc -gstabs -o t t.c

    $ ls -l t
    -rwx------ 1 rclayton faculty 9088 Nov 12 10:10 t

    $ size t
       text data bss dec hex filename
       1778 264 32 2074 81a t

    $

  The text, data, and bss sizes are the same, but the file size has increased
  by 9088 - 6608 = 2480 bytes. The loadable image (text + data + bss) doesn't
  change because the extra information is used only by the debugger and isn't
  required for execution.

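  We can beat this point into the ground by compiling the program for
  profiling, which does modify the executable: gcc's -pg option adds a call to
  a counting routine at the start of each function and links in support code
  that collects data for the gprof profiler.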

    $ gcc -pg -o t t.c

    $ ls -l t
    -rwx------ 1 rclayton faculty 11396 Nov 12 10:17 t

    $ size t
       text data bss dec hex filename
       5689 280 80 6049 17a1 t

    $

  The text has more than tripled in size (from 1778 to 5689 bytes). The
  instrumentation added to the tiny main() is small, so most of that growth
  is presumably the profiling support code. The data and bss have grown too
  (by 16 and 48 bytes), presumably storage used by the profiling code.

  We can beat this point into the ground in a different way by giving the
  hello-world program some bss:

    $ cat t.c
    #include <stdio.h>

    int data[100];

    int main() {
      printf("hello world!\n");
      return 0;
      }

    $ gcc -o t t.c

    $ ls -l t
    -rwx------ 1 rclayton faculty 6656 Nov 12 10:26 t

    $ size t
       text data bss dec hex filename
       1803 264 432 2499 9c3 t

    $

  As expected, the bss size increased by 400 bytes (= 100*sizeof(int), with
  4-byte ints), and the data size didn't increase because the amount of
  initialized global data hasn't changed. I don't know why the text size
  changed (by 25 bytes); the code is the same in both programs.

  If we add some initialized global data, the data size changes too:

    $ cat t.c
    #include <stdio.h>

    int data[100];

    char tag[] = "A man likes milk, so he owns a million cows.";

    int main() {
      printf("hello world!\n");
      return 0;
      }

    $ gcc -o t t.c

    $ ls -l t
    -rwx------ 1 rclayton faculty 6748 Nov 12 10:35 t

    $ size t
       text data bss dec hex filename
       1827 312 432 2571 a0b t

    $

  Now the bss size is unchanged and the data size has increased by 48 bytes (44
  for the string's characters, 1 for the null byte, and 3 pad bytes to
  maintain an 8-byte alignment for whatever follows). The text size has
  changed again (by 24 bytes), and again I don't know why.

  (Readers who know C and C++ might object to my characterization of data[]
   as uninitialized globals by pointing out that the C and C++ standards
   require globals without an explicit initializer to be initialized to 0.
   That's true, but because the initialization value (0) is known to the OS
   and the executable-manipulation tools, the initialization can be delayed as
   long as possible, usually until a page of bss data is paged in for the
   first time.)



This archive was generated by hypermail 2.0b3 on Fri Dec 03 2004 - 12:00:06 EST