13 - Threads (2)
Reading: MOS 2.2.1 - 2.2.6

Announcements
  No prepared statements.

Outline
  Thread Implementation
    User-space
    Kernel-space
    Hybrid implementation
  Thread Switching
  Multithreaded Code

Thread Implementation
  3 approaches to implementing threads:
  1) User-space: threads were originally implemented as a user-space library.
  2) Kernel-space: thread implementation moved into the kernel.
  3) Hybrid: modern implementations take a hybrid approach.

User-space Threads
  - The thread library maintains its own thread table and scheduling/switching algorithm.
  - The kernel doesn't see threads; it only schedules processes.
  - The thread table lives in process memory.
  - The thread library implements a thread runtime for switching.
  - Close modern example: async-await libraries.

  Advantages:
  - Portability: no need for kernel support
    => easier implementation
    => no syscalls required
       ==> no switching overhead
       ==> faster thread creation
  - Flexibility:
    => processes can use different thread libraries
    => libraries can use different scheduling algorithms

  Disadvantages:
  - Scheduling Complexity:
    => kernel can't see threads, so thread CPU time is less fair
    => scheduling is managed manually with yield calls
  - No Parallelism: the kernel is required for use of multicore parallelism
  - Blocking Operations: syscalls and page faults block all threads

  The disadvantages of user-space threads are significant.

PollEv
  Do user-space threads keep one stack per thread or one stack per process?
  - Per-thread
  - Per-process

Kernel-space Threads
  - The kernel maintains a process table + thread table.
  - The kernel schedules threads instead of processes.
  - The kernel can manage scheduling more efficiently.

  Advantages:
  - Scheduling: kernel-managed scheduling
    => fair CPU time distribution
  - Memory Efficiency: no per-process thread lib
    => no thread library loading
    => no thread table in process memory
  - Parallelism:
    => I/O syscalls don't block other threads
    => multi-core parallelism

  Disadvantages:
  - Syscall Overhead: requires thread syscalls
    => more time in kernel
    => kernel trapping must be offset by gains from parallelism
  - Open Problems:
    => signal behavior for multithreaded processes: kill all the threads? which thread handles?
    => fork behavior for multithreaded processes: which threads to copy?

Hybrid Implementation
  - User thread libraries exist on top of the kernel-provided thread API.
  - The user-space library (e.g. pthreads API) wraps the kernel API.
  - Can have multiple user threads per kernel thread.
  - User threads are typically mapped one-to-one to kernel threads.
  - Multiplexing threads is complex and less common.

Thread Switching
  A) In kernel mode, replace the process structs with thread structs.
  B) In user mode, switching is handled with yield calls from the threads themselves.
     The thread library maintains a scheduler loop that saves and restores thread context on the yields.

  [Figure: the process-switch diagram (P0, kernel, P1 with PCB 0 / PCB 1) redrawn for Thread 0 and Thread 1 with TCB 0 / TCB 1.
   On an interrupt, syscall, or yield: save state to TCB 0; handle interrupt / do syscall; scheduler runs; load state from TCB 1.
   The same steps run in reverse to switch back. Thread states shown: running, ready, ready or blocked.]

  Some differences from process switching:
  - Switching threads within a process is faster:
    no need to change address space => no cache invalidation.
  - Switching threads in different processes is slow:
    swapping the address space results in cache invalidation, which requires write-back to main memory.

Multithreaded Code
  - Programs were originally designed for a single CPU, single-threaded.
  - Many libraries were not originally designed for multithreaded execution.

  Warning
  Library procedures may not support multithreading!
  - e.g. printf was designed with an internal buffer
    => the buffer was written to stdout only on certain conditions (full or newline-terminated)
    => the buffer was shared among threads, which would overwrite one another without synchronization
  - e.g. errno was a shared global
    => used with perror to print detailed error messages
    => syscalls in different threads could overwrite errno
    => errno is a macro now
  Note: printf and errno have long been fixed.

  - Functions must be reentrant / thread-safe to be used in a multithreaded context.
  - It is the programmer's responsibility to know if the functions used are thread-safe.

Rule of Thumb
  Avoid global variables
  - considered poor practice in large codebases
  - shared by default and require synchronization

  Alternatives:
  - Private globals in thread-local storage, implemented with compiler support:
      GCC:
        __thread int i;
        extern __thread struct state s;
        static __thread char *p;
      C11 Standard:
        #include <threads.h>
        thread_local int foo = 0;
  - Getters and setters that wrap synchronization:
        create_global("bufptr");
        set_global("bufptr", &buf);
        bufptr = read_global("bufptr");

Windows Threads
  - Identical output to our pthreads demo.
  - If you have Windows you can try it.