Process size, part 3.


R. Clayton (rclayton@monmouth.edu)
(no date)


The previous two messages in this series attempted to find various ways of
answering the question "How big is a process?" All the answers found were
inadequate in some way, either over- or under-estimating the size, and wildly
disagreeing with one another. This message tries to find some definitive ways
to answer the question.

The first point, which I hope at least some of you are beginning to realize, is
that "How big is a process?" is unlikely to be a sensible question (*). Its
interpretation is ambiguous, requiring many qualifiers and clarifications to
make it more specific (on disk? in primary store? static size? dynamic size?
executing? at what point in the execution? and so on). In addition, a more
specific version of the question will have an answer that's heavily dependent
on a complicated context (what libraries? how linked? which data? how much?),
making it difficult to interpret individual answers and practically impossible
to compare answers.

Most of the estimates so far have come from looking at the static form of the
process (that is, the executable file). Looking at the process during
execution should lead to more useful estimates. It's easier to examine the
hello-world process when it's standing still:

  $ cat t.c
  #include <stdio.h>

  int main() {
    printf("hello world!\nHit return to continue");
    fflush(stdout);
    getchar();
    return 0;
    }

  $ gcc -o t.c
  gcc: no input files

  $ gcc -o t t.c

  $ ./t
  hello world!
  Hit return to continue

With t immobilized, the ps command gives some basic information:

  $ ps -o pid,vsz,comm -a | egrep 'PID|/t'
    PID  VSZ COMMAND
  29904  960 ./t

  $

The -o option specifies the output format; the -a option lists processes
associated with a terminal (not every process on the system);
see the ps man page for details. The ps command tells us that the hello-world
process is using 960 kbytes of virtual memory (VSZ), almost a meg; this is an
order of magnitude more than the core-file estimate gave us (around 88 kbytes)
at approximately the same place in process execution (**).

The pmap command provides a closer look at how that 960 kbytes is spread around
the process (I have removed some output that isn't of interest):

  $ /usr/proc/bin/pmap -x 29904
  29904: ./t
   Address  Kbytes   RSS  Mode   Mapped File
  00010000       8     8  r-x--  t
  00020000       8     8  rwx--  t
  FF280000     688   688  r-x--  libc.so.1
  FF33C000      24    24  rwx--  libc.so.1
  FF342000       8     8  rwx--  libc.so.1
  FF380000      16    16  r-x--  libc_psr.so.1
  FF3A0000       8     8  r-x--  libdl.so.1
  FF3B0000       8     8  rwx--  [ anon ]
  FF3C0000     160   160  r-x--  ld.so.1
  FF3F8000       8     8  rwx--  ld.so.1
  FFBFA000      24    24  rwx--  [ stack ]
  --------  ------  ----
  total Kb     960   960

  $

The process code starts at 10000 hex and its data starts at 20000 hex; each
occupies a single page (Solaris pages are 8 kbytes). The rest of the process -
the stack and library code - is placed at the other end of virtual memory.
The RSS (resident set size) indicates how much of each part of virtual memory
is mapped into physical memory; in this case, all of it is.

The pmap data indicates why the core-file estimate was off: the core doesn't
contain the libraries, which make up over two-thirds of occupied vm. It also
indicates the futility of the question "How big is a process?" Is it 2723
bytes (text+data+bss)? Is it 16k (text+data+bss total page size)? Is it 960k?
Is it about 4 Gbytes (FFBFA000 + 6000 - 10000 hex, the span from the first code
page to the top of the stack)?

(*) is unlikely to be a sensible question.

As always, you should qualify statements of this kind with "for general-purpose
operating systems". It is an eminently sensible question for resource-limited
environments such as those found in embedded and real-time systems (which is
another reason you should fear embedded and real-time systems).

Given the importance of the question, systems supporting embedded and real-time
systems work hard to make it easy (or at least possible) to get good answers.
They do this mostly by making everything static, so size can be accurately
determined at design, compile or link time.

(**) at approximately the same place in process execution.

You might want to argue that it isn't approximately the same place because this
code includes a getchar() while the core-dump code didn't. If you run through
the exercise of adding a getchar() call to the core-dump code, you'll see that
the size increases by only a few tens of bytes.


