User and kernel threads.

From: R. Clayton <rclayton_at_monmouth.edu>
Date: Fri, 26 Sep 2014 12:05:28 -0400
  In a 1:1 mapping of user-to-kernel threads, how are user threads faster?

Faster than what?  Compare a 1:1-bound thread to an m:1-mapped thread.  A 1:1-
bound user thread is (conceptually) always ready to execute because it is bound
to a dedicated execution engine (the kernel thread).  An m:1-mapped user thread
has to get mapped to the kernel thread before it can execute, and so incurs a
double overhead: waiting to be mapped, and then spending cycles on the context
switch (however fast that may be, it's still not zero).  A 1:1-bound user
thread incurs neither of these overheads, and so is faster both in
responsiveness to external events and in putting more cpu cycles to useful
work.
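
To make the mapping step concrete, here is a minimal sketch using the POSIX
ucontext calls (the usual substrate for m:1 libraries; obsolescent in recent
POSIX but still available on Linux/glibc).  The names main_ctx and user_body
are made up for illustration.

    /* Sketch: the "mapping" an m:1 user thread pays for.  Before
       user_body can run, the one kernel thread must swapcontext() into
       its saved state; a 1:1-bound thread skips this step because it is
       always schedulable. */
    #include <stdio.h>
    #include <ucontext.h>

    static ucontext_t main_ctx, user_ctx;
    static char user_stack[64 * 1024];

    static void user_body(void) {
        puts("user thread mapped onto the kernel thread");
    }   /* on return, uc_link sends control back to main_ctx */

    int main(void) {
        getcontext(&user_ctx);
        user_ctx.uc_stack.ss_sp   = user_stack;
        user_ctx.uc_stack.ss_size = sizeof user_stack;
        user_ctx.uc_link          = &main_ctx;
        makecontext(&user_ctx, user_body, 0);

        /* The mapping: save this state, load the user thread's state. */
        swapcontext(&main_ctx, &user_ctx);
        puts("unmapped: back on the kernel thread's original context");
        return 0;
    }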

  Wouldn't that mean that the user-thread is being carried out on/in a kernel
  thread?

Yes, but not in the way suggested by the pictures.  Given an executing user
thread, there's only one thread, the kernel thread, not two.  The thread in
user space is the kernel thread executing in the context (that is, the state)
of a user-space thread.  When it's time to execute that user thread, the
associated kernel thread context switches into the user-thread state and
executes in user space (the swapcontext step in the sketch above).

Understanding how threadless processes are multiplexed on a single cpu is
helpful.  Once that's understood, shrink the mechanism down so the schedulable
unit becomes a thread: whittle down the state wrangled on a context switch,
and add user-space threads so the kernel thread (the core or virtual core) has
something to execute.  A rough sketch of how the state divides appears below.
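
Roughly, and only illustratively (no real kernel's structures look exactly
like this), the state splits like so:

    /* Illustrative only: roughly what a context switch must save and
       restore.  Switching processes wrangles all of it; switching
       threads within one process touches only the per-thread part. */
    struct thread_state {            /* saved/restored per thread */
        void *stack_pointer;
        void *program_counter;
        unsigned long registers[16]; /* general-purpose register file */
    };

    struct process_state {           /* shared by all threads in a process */
        void *page_table_root;       /* the address space; changing this is
                                        what forces TLB and cache flushes */
        int   open_files[64];        /* file-descriptor table (simplified) */
        /* ... credentials, signal handlers, and so on ... */
    };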

  Wouldn't the user thread be slower since there is now a kernel thread that
  the operating system is managing, and a user thread to manage inside the
  kernel thread?

Slower than what?  Managing a kernel thread is much faster than managing a
process, largely because of the amount of state involved but also for other
reasons, such as cache invalidations and page-table management.  Managing user
threads isn't free, but it's relatively cheap, even including the overheads
mentioned above.  There is an argument to be had here, as the events vs.
threads papers mentioned in class indicate, but at that level it's a
contextual argument, not one easily resolved from abstract principles.

  How are threads running concurrently when there are more threads than
  cores/cpus, and similarly more user threads than kernel threads?

There's true concurrency up to the point where the number of kernel threads
equals the number of cores.  After that it's a mixture of true and multiplexed
concurrency.  Multiplexed concurrency isn't necessarily inferior to true
concurrency.  An I/O-bound thread should be moved off the core while it waits;
leaving it there would waste the core (though the waste may be justifiable
when demand for cores is low).  In that case multiplexed concurrency is
superior to true concurrency because it achieves a higher degree of actual
simultaneous execution, even if some of that execution is sitting around
waiting.  On the other hand, compute-bound threads suffer under multiplexed
concurrency in the form of lower throughput.
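
A small experiment makes the multiplexing visible (Linux-specific:
sched_getcpu() is a glibc extension; the thread count 16 is an arbitrary
stand-in for "more than you have cores"):

    /* Build with: cc -pthread oversubscribe.c */
    #define _GNU_SOURCE
    #include <pthread.h>
    #include <sched.h>
    #include <stdio.h>
    #include <unistd.h>

    #define NTHREADS 16

    static void *work(void *arg) {
        long id = (long)arg;
        for (int i = 0; i < 3; i++) {
            printf("thread %ld on core %d\n", id, sched_getcpu());
            usleep(1000);   /* a stand-in I/O wait: the core is freed */
        }
        return NULL;
    }

    int main(void) {
        pthread_t t[NTHREADS];
        printf("cores available: %ld\n", sysconf(_SC_NPROCESSORS_ONLN));
        for (long i = 0; i < NTHREADS; i++)
            pthread_create(&t[i], NULL, work, (void *)i);
        for (int i = 0; i < NTHREADS; i++)
            pthread_join(t[i], NULL);
        return 0;
    }

With 16 threads on, say, 4 cores, the output shows thread ids migrating among
a handful of core numbers: true concurrency up to the core count, multiplexed
concurrency beyond it.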

  Are there any good explanations that provide a good analogy to threads?

The best way to learn about threads is to implement them.  Kernel threads are
difficult because they're non-standard and heavily influenced by system
details.  User threads are a little easier because of POSIX.  The best
approach is probably to start by understanding how (threadless) processes run
on virtual cpus, and then shrink that mechanism down so it handles threads
within processes.  Understanding process-level virtualization provides a
framework for understanding kernel-thread virtualization.
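
For instance, the earlier ucontext sketch extends into a toy round-robin
scheduler (all names illustrative; cooperative only, no preemption):

    #include <stdio.h>
    #include <ucontext.h>

    #define NTHREADS 2
    static ucontext_t sched_ctx, thr_ctx[NTHREADS];
    static char stacks[NTHREADS][64 * 1024];
    static int current;             /* which user thread is mapped now */

    static void yield(void) {       /* give the kernel thread back */
        swapcontext(&thr_ctx[current], &sched_ctx);
    }

    static void thread_fn(void) {
        for (int i = 0; i < 3; i++) {
            printf("user thread %d, step %d\n", current, i);
            yield();
        }                           /* on return, uc_link resumes sched_ctx */
    }

    int main(void) {
        for (int t = 0; t < NTHREADS; t++) {
            getcontext(&thr_ctx[t]);
            thr_ctx[t].uc_stack.ss_sp = stacks[t];
            thr_ctx[t].uc_stack.ss_size = sizeof stacks[t];
            thr_ctx[t].uc_link = &sched_ctx;
            makecontext(&thr_ctx[t], thread_fn, 0);
        }
        /* Round-robin: each thread needs 3 yields plus a final return,
           so four rounds drain both. */
        for (int round = 0; round < 4; round++)
            for (current = 0; current < NTHREADS; current++)
                swapcontext(&sched_ctx, &thr_ctx[current]);
        return 0;
    }

Growing this sketch (a real run queue, blocking, preemption via timers) walks
through most of what a user-thread library has to get right.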