07 - System Calls

Outline
- Announcements
- Syscalls Recap
  - Examples
- Syscall Mechanism
- Syscall Implementation
  - Example: gettimeofday
  - Memory Access
  - Adding a Syscall to Linux
  - Comments

Announcements
- Reading: MOS 1.6
- Quiz 1 will be on Friday, first 10 min of class. 10 MCQ online, no cheat sheet. Don't be late! It will include content from today!
- Rest of PA2 released. If I'm quick, we'll get through almost everything you need today.

Syscalls Recap

System calls are the kernel's API for user-mode programs to interact with hardware and system-critical functions:
1) hardware access
   - read/write/manage files in storage devices
   - read/write to I/O devices
2) create processes
3) change security permissions

User-mode programs
- see syscalls as normal functions
- cannot access kernel stuff

Kernel code only executes under 2 conditions:
- interrupt
- system call

Examples
(See also MOS Figure 1-18)

Process Management
  pid = fork()                             create a child process identical to parent
  pid = waitpid(pid, &statloc, options)    wait for child to terminate
  s = execve(name, argv, environp)         replace a process's core image
  exit(status)                             terminate process execution and return status

File System
  fd = open(file, mode, ...)               open a file
  s = close(fd)                            close an open file
  n = read(fd, buffer, len)                read data from file to buffer
  n = write(fd, buffer, len)               write data from buffer to file
  pos = lseek(fd, offset, whence)          move the file pointer
  s = stat(name, &buf)                     get file status information
  s = mkdir(name, mode)                    create directory
  s = rmdir(name)                          remove empty directory
  s = link(name1, name2)                   create an entry name2 that points to name1
  s = unlink(name)                         remove an entry
  s = mount(special, name, flag)           mount a file system
  s = umount(special)                      unmount a file system

Miscellaneous
  s = chdir(dirname)                       change working directory
  s = chmod(name, mode)                    change file's protection bits
  s = kill(pid, signal)                    send signal to process
  sec = time(&seconds)                     get seconds since 1/1/1970

Syscall Mechanism

System Call requires CPU context switch
- Changes execution mode (user -> kernel)
  > Recall the CPU status/control register
- Enters kernel code, different address range
  > Kernel checks calling process permissions

[Figure: timeline. User Mode Process -> (syscall invoked) -> Kernel Mode Execution -> (syscall returns) -> User Mode Process]

System Calls are OS-dependent, often not portable
- e.g. UNIX calls completely different from Windows

Hardware invocation mechanism is arch dependent
- Trap Model: similar to hardware interrupt mechanism

2 general approaches:
(1) Interrupt Instruction
    > Idea: fake an interrupt (hardware interrupts trigger kernel-mode)
(2) Special Instruction
    > Idea: special hardware logic to perform mode and context switch
    > Ex. Intel & AMD originally had sysenter/syscall, with different hardware mechanisms. Now, both use syscall in 64-bit systems.

Syscall Implementation

Syscall parameters are passed via registers. For example, in x86 assembly (from glibc source):

            .text
    ENTRY (syscall)
            movq %rdi, %rax         /* Syscall number -> rax.  */
            movq %rsi, %rdi         /* shift arg1 - arg5.  */
            movq %rdx, %rsi
            movq %rcx, %rdx
            movq %r8, %r10
            movq %r9, %r8
            movq 8(%rsp),%r9        /* arg6 is on the stack.  */
            syscall                 /* Do the system call.  */
            cmpq $-4095, %rax       /* Check %rax for error.  */
            jae SYSCALL_ERROR_LABEL /* Jump to error handler if error.  */
            ret                     /* Return to caller.  */
    PSEUDO_END (syscall)

On the syscall instruction, the CPU:
1) Saves the process's next instruction address
2) Jumps to fixed kernel entry point
3) Updates ctrl/stat register to kernel-mode

Then the kernel entry code takes over (see arch/x86/entry/entry_64.S).

Kernel entry code determines which syscall to invoke by looking up the syscall number in a table (see arch/x86/entry/syscalls/syscall_64.tbl and do_syscall_64).

The full sequence (MOS Figure 1-17; similar to interrupts in MOS Figure 2-5):
1) Hardware saves program counter
2) Hardware jumps to entry in kernel-mode
3) Kernel assembly procedure saves registers
4) Kernel initializes new stack and dispatches
5) System call code runs
6) Scheduler decides which process to return to
7) Kernel assembly restores process context
8) Kernel restores PC and leaves kernel mode

- Steps 1-4: Hardware & assembly save context
- Step 5: Executes system call proper
- Steps 6-8: Resume a user-mode process

Where things live:
- Internal prototypes: include/linux/syscalls.h
- Arch-specific: arch/x86/include/asm/syscalls.h
- Syscall Table: arch/x86/entry/syscalls/syscall_64.tbl

Example: gettimeofday
(see kernel/time/time.c)

Macro SYSCALL_DEFINE2 declares a syscall with 2 arguments. For each argument, the macro takes the type and name as separate fields.

Return value will be returned to user process.

copy_{to/from}_user copies data between kernel and process address space.

Memory Access

Important:
- Kernel should never trust the user!
- Kernel should not abuse its powers!

Kernel should copy user data to kernel space before manipulating it, then copy it back to user space.

- User-space cannot access kernel-space memory
- Kernel-space can access user-space memory
- Kernel must check user-provided pointers
- Kernel must never execute user-space memory
- Minimize access to user-space memory

[Figure: Kernel Space and User Space. User access to kernel space: Forbidden. Kernel access to user space: Dangerous.]

User Space Memory Access API (uaccess.h):
  get_user             Gets a simple variable from user space
  put_user             Puts a simple variable to user space
  clear_user           Clears a block in user space
  copy_to_user         Copies a block from kernel to user space
  copy_from_user       Copies a block from user to kernel space
  strnlen_user         Gets length of a string buffer in user space
  strncpy_from_user    Copies a string from user to kernel space

Adding a Syscall to Linux

1) Write your syscall function
   - In an existing file (not for PA 2)
   - In a new file if self-contained
     - Add file to build system
2) Add syscall to syscall table (arch-specific)
   - x86: arch/x86/entry/syscalls/syscall_64.tbl
   - Use tabs between fields
3) Add syscall prototype to header file
   - common: include/linux/syscalls.h
   - or arch-specific equivalent
4) Rebuild and reinstall kernel
   - Recompile kernel
   - Reinstall modules
   - Reinstall kernel
   - Reboot with new kernel
5) Test with libc syscall function from user code

The syscall table (see arch/x86/entry/syscalls/syscall_64.tbl) is used by the Linux build system:
- column 1: syscall number
- column 2: architecture availability
- column 3: syscall name
- column 4: internal function name

To add your syscall to the build system:
(Assume syscall code @ my_syscall/my_syscall.c)
- Add a file: my_syscall/Makefile
  - contents: obj-y += my_syscall.o
- Modify top-level Makefile:
  - contents: core-y += my_syscall/

Comments

Pros:
- Adding syscalls is "easy"

Cons:
- Need official syscall number
- Interface changes are permanent
- Must be registered for every architecture
- Not always necessary

Alternative:
- Device or virtual file (kernel modules)
- Can alter behavior of existing syscalls (e.g. read, write, ioctl)