UNIX operating systems create processes using the system call fork () and overlay (load) a binary/executable image using exec () system call. The system call fork () does not have a parameter and the system call exec () few parameters to load a program from a permanent storage (File System). Practically, the system calls fork () and exec () are twins; fork () is the elder brother and exec () is the younger one. Let us first see the functionality of “fork”. As you may know a process is program in execution that has a state such as data, heap, stack, pending signals, open files and environment variables. When you want to execute something, let us say running a command “ls” from the shell, the “shell” process typically calls fork (). When the fork () is successful, the kernel creates another process which is copy of the process. So each successful call to fork () returns twice – once in the called process (aka parent process) and second time in the newly created process (aka child process). As a standard, all the operating systems copy the address space of parent process and create another address space. There is a overhead involved while creating fork’ ing a process.
Shortly after the process is created, either parent or child process is loaded with some other program. Typically, one of the system calls in “exec” family of system calls is used. When “exec” is executed, the entire address space of the called process is recreated. So there is considerable amount time spent in this “double creation”. Surely, there is some kind of optimization can be done to gain substantial performance.
In Linux, fork () does not copy the address space but just simply creates the kernel data structures needed to the new process. Now, both the parent and child process uses the same address space and the entire address space is marked read-only. So, both the parent and child process can continue to read the process address. When any one process tries to write to a page (memory page), the kernel duplicates the address space and creates the address space. But this is unlikely to occur. Another scenario is the loading a new program into memory. When this occurs, the kernel any way creates a fresh address space and starts to execute the loaded program. By this approach, the “double creation” is avoided. The deferring the duplication of address space has given a performance and sometimes procrastination helpsJ.
The functionality of fork () is called CoW – Copy on Write. So from next time, when you see a Linux box and a running process think about
But you have two process after fork (), which process will be scheduled first? Yes, Linux is a masterpiece.
No comments:
Post a Comment