Skip to content

Latest commit

ย 

History

History

04. Process

Folders and files

NameName
Last commit message
Last commit date

parent directory

..
ย 
ย 
ย 
ย 

Process

  • interrupt, process, file, memory, I/O
  • process definition
  • Multiprogramming, Timesharing
  • process descriptor, process queue, run queue
  • pid, process state, time slice, mm, eip
  • task_struct{}, thread_union{}, KMS(Kernel Mode Stack)
  • init_task, current
  • process schedule example
  • How processes are created and destroyed
  • fork, exec, exit, wait
    • fork: duplicate the parent
    • exec: transform the current process into another
    • exit: stop the current process
    • wait: wait until the child dies
    • All processes in Linux is created through fork and exec
  • process vs thread
  • pthread_create
  • kernel_thread
  • shell
  • Linux initialization
  • Where is CPU

1. What is a process

A process is a program loaded in the memory.

2. History of OS

  • 1945 ~ 1955 :
    • No OS.
  • 1955 ~ 1965 :
    • FMS (Fortran Monitoring System). Runs a set of programs one by one.
    • cpu utilization problem: over 90% time the cpu was idle
  • 1965 ~ :
    • Multiprogramming. Load several programs at once. Run one by one. If the current one is blocked because of I/O, go to the next process. Fairness problem: cpu-bound program is favored.
  • Now :
    • Time Sharing. Allocate x ticks to each process. Now all processes will be stopped either if it executes an I/O instruction or the timer expires.

3. Data structures for process

  • process descriptor:
    • one for each process.
    • contains information about the corresponding process such as process ID, ticks
    • remained, register values of the last stop-point, memory location, size, etc.
    • task_struct{} is the process descriptor in Linux. (include/linux/sched.h)
    • "current" points to the current process
    • thread_union{} = task_struct{} + Kernel Mode Stack
    • init_task is the kernel's process descriptor (arch/x86/kernel/init_task.c, include/linux/init_task.h)
          struct task_struct  init_task = INIT_TASK(init_task);
          INIT_TASK(init_task) = {
              ........
              .parent = &init_task,
              .comm = "swapper",
              ........
          }
  • process queue:
    • linked list of process descriptors
    • &init_task is the start of this queue.
    • traversing the process queue
      void display_processes(){
          struct task_struct *temp;
          temp = &init_task;
          for(;;){
              print process id, user id, program name, process state for temp;
              temp = next_task(temp); // find next process
              if (temp == &init_task) break;
          }
          printk("\n");
      }
  • run queue
    • A linked list of runnable processes.
    • The scheduler looks at this queue to pick the next process after each interrupt
    • Each cpu has its own run queue
    • Each run queue is an array of queues based on the priority
      • : struct prio_array{}.queue[]

Exercise-1

3.1) task_struct is defined in include/linux/sched.h (search for "task_struct {").
Which fields of the task_struct contain information for process ID, parent process ID, user ID, process status, children processes, the memory location of the process, the files opened, the priority of the process, program name?

  • Process ID : pid_t pid
  • Parent process ID : struct task_struct parent;
    • /* parent process */
  • User ID : uid_t uid
  • Process status : volatile long state;
    • /* -1 unrunnable, 0 runnable, >0 stopped */
  • Children processes : struct list_head children;
    • /* list of my children */
  • The memory location of the process : struct mm_struct *mm
  • The files opened : struct files_struct *files;
    • /* open file information */
  • The priority of the process : int orio, static prio, normal prio;
  • Program name : char comm[TASK_COMM_LEN]

3.2) Display all processes with ps โ€“ef. Find the pid of ps -ef, the process you have just executed. Find the pid and program name of the parent process of it, then the parent of this parent, and so on, until you see the init_task whose process ID is 0.



$ ps -ef
UID     PID   PPID    C   STIME   TTY        TIME  CMD
root      1      0    1   00:04   ?      00:00:00  init [3]
...     ...    ...  ...     ...   ...         ...  ...
root   4467      1    0   00:05   tty1   00:00:00  /bin/login --
...     ...    ...  ...     ...   ...         ...  ...
root   4486   4467    0   00:05   tty1   00:00:00  -bash
root   4491   4486    0   00:05   tty1   00:00:00  ps -ef
  • ps -ef์˜ PID๋Š” 4491์ด๋‹ค.
  • ps -ef์˜ PPID๋Š” 4486์ด๋ฏ€๋กœ, parent process๋Š” PID๊ฐ€ 4486์ธ -bash์ด๋‹ค.
  • -bash์˜ PPID๋Š” 4467์ด๋ฏ€๋กœ, parent process๋Š” PID๊ฐ€ 1์ธ init [3]์ด๋‹ค.
  • init [3]์˜ PPID๋Š” 0์œผ๋กœ, parent process๋Š” PID๊ฐ€ 0์ธ init_task์ด๋‹ค.

3.3) Define display_processes() in init/main.c (right before the first function definition). Call this function in the beginning of start_kernel().
Confirm that there is only one process in the beginning. Find the location where the number of processes becomes 2. Find the location where the number of processes becomes 3. Find the location where the number of processes is the greatest. Use dmesg to see the result of display_processes().

init/main.c :

  • init/main.c์— display_processes()๋ฅผ ์ •์˜ํ•˜์˜€๋‹ค. ์ฝ”๋“œ๊ฐ€ ์˜๋ฏธํ•˜๋Š” ๋ฐ”๋Š” ๋‹ค์Œ๊ณผ ๊ฐ™๋‹ค.
    • ํฌ์ธํ„ฐํ˜• struct์ธ temp๋ฅผ ์„ ์–ธํ•˜๊ณ , init_task (์ปค๋„์˜ process descriptor) ์„ ๋„ฃ์–ด์ค€๋‹ค.
    • ํ•ด๋‹นํ•˜๋Š” ํ”„๋กœ์„ธ์Šค๋งˆ๋‹ค ๊ฐ๊ฐ pid, pname, state๋ฅผ ์ถœ๋ ฅํ•˜๊ณ , temp๋Š” ๊ทธ ๋‹ค์Œ ํ”„๋กœ์„ธ์Šค๋ฅผ ์ง€์ •ํ•˜๋„๋ก ํ•œ๋‹ค.
    • init_task ๋‹ค์Œ์—๋„ init_task๋ผ๋ฉด for๋ฌธ์„ ํƒˆ์ถœํ•˜๊ณ  ํ•จ์ˆ˜๊ฐ€ ๋๋‚˜๊ฒŒ ๋œ๋‹ค.


start_kernel ํ•จ์ˆ˜๊ฐ€ ํ˜ธ์ถœ๋˜๋ฉด display_processes()๋ฅผ ๊ฐ€์žฅ ๋จผ์ € ํ˜ธ์ถœํ•  ์ˆ˜ ์žˆ๊ฒŒ ํ•˜์˜€๋‹ค.

๋ณ€๊ฒฝ์‚ฌํ•ญ์„ ์ ์šฉํ•˜๊ธฐ ์œ„ํ•ด ์ปค๋„์„ ์ปดํŒŒ์ผํ•˜๊ณ , ์žฌ๋ถ€ํŒ…ํ•˜์˜€๋‹ค.

$ make bzImage
$ cp arch/x86/boot/bzImage /boot/bzImage
$ reboot

dmesg๋กœ ๋ถ€ํŒ… ๋ฉ”์„ธ์ง€๋ฅผ ํ™•์ธํ•ด๋ณด์•˜๋‹ค.

pid: 0, pname: swapper, state: 0

์‹คํ–‰ ๊ฒฐ๊ณผ ํ•˜๋‚˜์˜ ํ”„๋กœ์„ธ์Šค๋งŒ ์žˆ์—ˆ๊ณ , swapper ํ”„๋กœ์„ธ์Šค๊ฐ€ 'pid: 0'์œผ๋กœ ์ƒ์„ฑ๋œ ๊ฒƒ์„ ๋ณผ ์ˆ˜ ์žˆ์—ˆ๋‹ค.

์‹คํ–‰ ์ค‘์ธ ํ”„๋กœ์„ธ์Šค๊ฐ€ 2๊ฐœ๊ฐ€ ๋œ ์ฒซ ๋ฒˆ์งธ ์‹œ์ ์€ kernel_init์˜ ์ตœ์ƒ๋‹จ์—์„œ display_processes๋ฅผ ํ˜ธ์ถœํ–ˆ์„ ๋•Œ์ด๋‹ค.

๊ฐ€์žฅ ๋งŽ์€ ํ”„๋กœ์„ธ์Šค๊ฐ€ ๋ณด์—ฌ์ง€๋Š” ๊ณณ์€ kernel_init์—์„œ init_post๊ฐ€ ํ˜ธ์ถœ๋œ ์ดํ›„์ด๋‹ค.

3.4) Make a system call that, when called, displays all processes in the system. Run an application program that calls this system call and see if this program displays all processes in the system.

ํ”„๋กœ์„ธ์Šค ์ •๋ณด๋ฅผ ์ถœ๋ ฅํ•˜๋Š” system call์„ ๋งŒ๋“œ๋Š” ๊ฒƒ์ด๋ฏ€๋กœ fs/read_write.c์— ํ•จ์ˆ˜๋ฅผ ์ถ”๊ฐ€ํ•œ๋‹ค.

fs/read_write.c :

์ดํ›„ arch/x86/kernel/syscall_table_32.S ์—์„œ ๋น„์–ด์žˆ๋Š” 44๋ฒˆ index์— ํ•ด๋‹น ํ•จ์ˆ˜๋ฅผ ์„ค์ •ํ•ด์ฃผ์—ˆ๋‹ค.

arch/x86/kernel/syscall_table_32.S :

์ดํ›„, ๋ณ€๊ฒฝ์‚ฌํ•ญ์„ ์ ์šฉํ•˜๊ธฐ ์œ„ํ•ด make ๋ช…๋ น์–ด๋กœ ๋ฆฌ๋ˆ…์Šค ์ปค๋„์„ ์ปดํŒŒ์ผํ•˜๊ณ  ์žฌ๋ถ€ํŒ…ํ•˜์˜€๋‹ค.

$ make bzImage
$ cp arch/x86/boot/bzImage /boot/bzImage
$ reboot

์žฌ๋ถ€ํŒ… ํ›„, ๋งŒ๋“  ์‹œ์Šคํ…œ ์ฝœ ํ•จ์ˆ˜๋ฅผ ํ˜ธ์ถœํ•˜๋Š” ํ”„๋กœ๊ทธ๋žจ์„ ์ž‘์„ฑํ•˜์˜€๋‹ค.

syscall_44.c :

#include <unistd.h>

int main(void) {
    syscall(44);
    return 0;
}

๋กœ๊ทธ ๋ ˆ๋ฒจ์„ ๋ฐ”๊พธ๊ณ  ํ”„๋กœ๊ทธ๋žจ์„ ์‹คํ–‰ํ•ด ์‹œ์Šคํ…œ ์ฝœ ํ•จ์ˆ˜๋ฅผ ํ˜ธ์ถœํ•˜๋ฉด ํ”„๋กœ์„ธ์Šค ๋ชฉ๋ก์ด ๋ณด์—ฌ์ง„๋‹ค.

$ echo 8 > /proc/sys/kernel/printk
$ ./syscall_44



3.4.1) Make a system call that, when called, displays all ancestor processes of the calling process in the system. For example, if ex1 calls this system call, you should see: ex1, ex1โ€™s parent, ex1โ€™s parentโ€™s parent, etc. until you reach pid=0 which is Linux itself.

๋ชจ๋“  ๋ถ€๋ชจ์˜ ํ”„๋กœ์„ธ์Šค ์ •๋ณด๋ฅผ ์ถœ๋ ฅํ•˜๋Š” system call์„ ๋งŒ๋“œ๋Š” ๊ฒƒ์ด๋ฏ€๋กœ fs/read_write.c์— ํ•จ์ˆ˜๋ฅผ ์ถ”๊ฐ€ํ•œ๋‹ค.

fs/read_write.c :

my_sys_display_all_ancestors ํ•จ์ˆ˜์—์„œ๋Š”, ํ˜„์žฌ process์˜ ์ •๋ณด๋ฅผ ์–ป๊ธฐ ์œ„ํ•ด, current ํฌ์ธํ„ฐ๋ฅผ ๋ฐ˜ํ™˜ํ•ด์ฃผ๋Š” ์ปค๋„ ํ•จ์ˆ˜์ธ get_current๋ฅผ ์‚ฌ์šฉํ•˜์˜€๋‹ค.


์ดํ›„ arch/x86/kernel/syscall_table_32.S ์—์„œ ๋น„์–ด์žˆ๋Š” 53๋ฒˆ index์— ํ•ด๋‹น ํ•จ์ˆ˜๋ฅผ ์„ค์ •ํ•ด์ฃผ์—ˆ๋‹ค.

arch/x86/kernel/syscall_table_32.S :

์ดํ›„, ๋ณ€๊ฒฝ์‚ฌํ•ญ์„ ์ ์šฉํ•˜๊ธฐ ์œ„ํ•ด make ๋ช…๋ น์–ด๋กœ ๋ฆฌ๋ˆ…์Šค ์ปค๋„์„ ์ปดํŒŒ์ผํ•˜๊ณ  ์žฌ๋ถ€ํŒ…ํ•œ ํ•˜์˜€๋‹ค.

$ make bzImage
$ cp arch/x86/boot/bzImage /boot/bzImage
$ reboot

์žฌ๋ถ€ํŒ… ํ›„, ๋งŒ๋“  ์‹œ์Šคํ…œ ์ฝœ ํ•จ์ˆ˜๋ฅผ ํ˜ธ์ถœํ•˜๋Š” ํ”„๋กœ๊ทธ๋žจ์„ ์ž‘์„ฑํ•˜์˜€๋‹ค.

syscall_53.c :

#include <unistd.h>

int main(void) {
    syscall(53);
    return 0;
}

๋กœ๊ทธ ๋ ˆ๋ฒจ์„ ๋ฐ”๊พธ๊ณ  ํ”„๋กœ๊ทธ๋žจ์„ ์‹คํ–‰ํ•ด ์‹œ์Šคํ…œ ์ฝœ ํ•จ์ˆ˜๋ฅผ ํ˜ธ์ถœํ•˜๋ฉด ํ˜„์žฌ ํ”„๋กœ์„ธ์Šค์™€ ์กฐ์ƒ๋“ค์˜ ํ”„๋กœ์„ธ์Šค ๋ชฉ๋ก์ด ๋ณด์—ฌ์ง„๋‹ค.

$ echo 8 > /proc/sys/kernel/printk
$ ./syscall_53

3.5) Run three user programs, f1, f2, and f3, and run another program that calls the above system call as follows.
State 0 means runnable and 1 means blocked. Observe the state changes in f1, f2, f3 and explain what these changes mean.

f1.c :

int i,j; double x=1.2;
for(i=0;i<100;i++){
   for(j=0;j<10000000;j++){ // make f1 busy for a while
       x=x*x;
   }
   // and then sleep 1sec
   usleep(1000000);
}

f2.c :

int i,j; double x=1.2;
for(i=0;i<100;i++){
   for(j=0;j<10000000;j++){ // make f2 busy for a while
       x=x*x;
   }
   // and then sleep 2sec
   usleep(2000000);
}

f3.c :

int i,j; double x=1.2;
for(i=0;i<100;i++){
   for(j=0;j<10000000;j++){ // make f3 busy for a while
       x=x*x;
   }
   // and then sleep 3sec
   usleep(3000000);
}

ex1.c :

      for(i=0;i<100;i++){
         sleep(5);
         syscall(17); // show all processes
                     // assuming the system call number in exercise (3.4) is 44
      }
$ echo 8 > /proc/sys/kernel/printk
$ ./f1&
$ ./f2&
$ ./f3&
$ ./ex1




















์‹คํ–‰ ๊ฒฐ๊ณผ, f1, f2, f3์˜ ์ƒํƒœ๊ฐ€ ๋ฐ”๋€Œ๋Š” ๊ฒƒ์„ ํ™•์ธํ•  ์ˆ˜ ์žˆ์—ˆ๋‹ค. ex1์˜ state๋Š” 0์œผ๋กœ ์œ ์ง€๋˜์—ˆ๋‹ค.



f1 > f2 > f3 > ex1 ์ˆœ์œผ๋กœ ์ข…๋ฃŒ๋˜์—ˆ๋‹ค.

3.6) Modify your my_sys_display_all_processes() so that it can also display the remaining time slice of each process (current->rt.time_slice) and repeat 3.5) as below to see the effect.
chrt -rr 30 ./f1 will run f1 with priority value = max_priority-30 (lower priority means higher priority).
-rr is to set scheduling policy to SCHED_RR (whose max_priority is 99).

init/main.c :

display_processes ํ•จ์ˆ˜์˜ printk์— current->rt.time_slice๋ฅผ ์ถ”๊ฐ€ํ•ด์ฃผ๊ณ ,
๋ณ€๊ฒฝ์‚ฌํ•ญ์„ ์ ์šฉํ•˜๊ธฐ ์œ„ํ•ด ์ปค๋„์„ ์ปดํŒŒ์ผํ•˜๊ณ , ์žฌ๋ถ€ํŒ…ํ•˜์˜€๋‹ค.

$ make bzImage
$ cp arch/x86/boot/bzImage /boot/bzImage
$ reboot

chrt๋Š” ํ”„๋กœ์„ธ์Šค์˜ real-time ์†์„ฑ์„ ์ˆ˜์ •ํ•˜๋Š” ๋ช…๋ น์–ด๋กœ, -rr์€ ์Šค์ผ€์ค„๋ง ์ •์ฑ…์„ SCHED_RR์œผ๋กœ ์„ค์ •ํ•˜๋Š” ๊ฒƒ์„ ์˜๋ฏธํ•œ๋‹ค.
์ด๋•Œ, ํ”„๋กœ์„ธ์Šค๋Š” ๋‚ฎ์€ ์ˆซ์ž๋ฅผ ๊ฐ€์งˆ์ˆ˜๋ก ๋†’์€ ์šฐ์„ ์ˆœ์œ„๋ฅผ ๊ฐ€์ง„๋‹ค.

$ chrt โ€“rr 30 ./f1&
$ chrt -rr 30 ./f2&
$ chrt -rr 30 ./f3&
$ chrt -rr 30 ./ex1













์‹คํ–‰ ๊ฒฐ๊ณผ ๊ฐ ํ”„๋กœ์„ธ์Šค๊ฐ€ ์‹คํ–‰๋  ๋•Œ์˜ ์ž”์—ฌ ํƒ€์ž„ ์Šฌ๋ผ์ด์Šค๋ฅผ ๋ณด์—ฌ์ค€๋‹ค.
์ดˆ ๋‹จ์œ„๋กœ ์ถ”์ธก๋˜๋ฉฐ 6~9๊ฐœ์˜ ํ”„๋กœ์„ธ์Šค๊ฐ€ ๊ฐ™์€ ๊ฐ’์„ ๊ฐ€์ง€๋Š” ๊ฒƒ์œผ๋กœ ๋ณด์•„ ๊ต‰์žฅํžˆ ์งง์€ ์‹œ๊ฐ„์— ์—ฌ๋Ÿฌ ํ”„๋กœ์„ธ์Šค๋“ค์ด ์ฒ˜๋ฆฌ๋˜๋Š” ๊ฒƒ์„ ์•Œ ์ˆ˜ ์žˆ๋‹ค.

์ฒ˜์Œ์—๋Š” 250 ์ •๋„๊ฐ€ ์ฃผ์–ด์ง€๊ณ  0์— ๋„๋‹ฌํ•˜๋ฉด 25 ์ •๋„๊ฐ€ ์ƒˆ๋กœ ์ฑ„์›Œ์กŒ๋‹ค. ์ด ๊ณผ์ •์ด ex1์ด ๋๋‚  ๋•Œ๊นŒ์ง€ ๋ฐ˜๋ณต๋˜์—ˆ๋‹ค.

4. Kernel code for process data structures and the scheduler

include/linux/sched.h :

struct task_struct {
   long state;  // 0: runnable, >0 : stopped or dead
   int prio;    // priority
   const struct sched_class *sched_class; // scheduling functions depending on
                                          // scheduling class of this process
   struct sched_entity se;  // scheduling info
   struct list_head tasks;  // points to next task
   struct mm_struct *mm;    // memory occupied by this process
   pid_t pid;
   struct task_struct *parent;
   struct list_head children;
   uid_t uid; // owner of this process
   char comm[TASK_COMM_LEN]; // program name
   struct thread_struct thread; // pointer to saved registers
   struct fs_struct *fs;
   struct files_struct *files;
   struct signal_struct *signal;
   struct sighand_struct *sighand;
}

struct sched_class {  // fair class has func name such as task_tick_fair, enqueue_task_fair..
                      // rt class has task_tick_rt, enqueue_task_rt, ...
   void (*enqueue_task)(struct rq *rq, struct task_struct *p, ...);
   void (*dequeue_task)(struct rq *rq, struct task_struct *p, ...);
   struct task_struct *(*pick_next_task)(struct rq *rq);
   void (*task_tick)(struct rq *rq, struct task_struct *p,...);
   .........
}

struct sched_entity {
   u64 sum_exec_runtime;
   u64 vruntime;  // actual runtime normalized(weighted) by the number of
                  // runnable processes. unit is nanosecond
   ................
}

struct list_head is little tricky. It does not point to the next item directly.
For example,
struct list_head tasks;
does not mean (current->tasks).next points to the next task.

include/linux/list.h :

struct list_head{
            struct list_head  *next, *prev;
};
  • (current->tasks).next simply points to another struct list_head that is included in the next process. To find the next process, use macro: list_entry() or next_tasks()
    • list_entry( (current->tasks).next, struct task_struct, tasks)
    • or
    • next_task(current)

We can display all processes in the system by

       struct task_struct *temp;
       temp = &init_task;
       for(;;) {
          printk("pid %d ",temp->pid);
          temp = list_entry(temp->tasks.next, struct task_struct, tasks);
          if (temp == &init_task) break;
       }

To display run queues, it is more difficult.
Each process has a priority (โ€œprio" field in task_struct), and there are 0 to 139 priorities.
For each priority we have different run queue. run_list points to the next process in the run queue with the priority of the corresponding process.
this_rq() will point to the โ€œstruct rq" structure for the current cpu. This structure contains 140 run queues.

kernel/sched.c :

union  thread_union{
    struct  thread_info  thread_info;
    unsigned long  stack[THREAD_SIZE/sizeof(long)];  // 8192 bytes
}
#define next_task(p)  list_entry(rcu_dereference((p)->tasks.next), struct task_struct, tasks)
#define for_each_process(p)  for(p=&init_task;(p=next_task(p))!=&init_task;) do

arch/i386/kernel/init_task.c :

union thread_union init_thread_union;
struct task_struct  init_task = INIT_TASK(init_task);

include/asm-i386/thread_info.h :

struct thread_info{
   struct task_struct *task;        // main task structre
   struct exec_domain *exec_domain; // execution domain
   long               flags;
   long               status;
   __u32              cpu;          // cpu for this thread
   mm_segment_t       addr_limit;   // 0-0xBFFFFFFF (3G bytes) for user-thread
                                    // 0-0xFFFFFFFF (4G bytes) for kernel-thread
}

kernel/sched.c :

#define DEF_TIMESLICE (100*HZ/1000)     // 100 ms for default time slice. HZ=1000
                                        // HZ is num of timer interrupts per second.
struct prio_array {  // run queue
   unsigned int nr_active;
   struct list_head queue[MAX_PRIO];    // run queues for various priorities
};
void __activate_task(p, struct rq *rq) { // wake up p
   struct prio_array * target = rq->active;
   enqueue_task(p, target);
   p->array = array;
}

process priority: each process has priority in โ€œprio" (value 0..139) 0..99 is for real time task. 100..139 for user process

140 priority list

The kernel calls schedule() at the end of each ISR(Interrupt Service Routine) to pick the next process.

kernel/sched.c :

void schedule() {
   struct task_struct *prev, *next;
   struct prio_array *array;

   prev = current;
   rq = this_rq();  // run queue of the belonging cpu
   array = rq->active;  // active run queue
   deactivate_task(prev, rq);
   idx = sched_find_first_bit(array->bitmap);
   queue = array->queue + idx;
   /** old code
   next = list_entry(queue->next, struct task_struct, run_list); // next task
   array = next->array;
   **/
   next=pick_next_task(rq, prev);
   rq->curr = next;  // next is the curr task
   context_switch(rq, prev,next); // move to next
}

struct task_struct * pick_next_task(..){
   class=sched_class_highest;
   p=class->pick_next_task(rq);
   return p;
}

struct task_struct *pick_next_task_fair(rq){ // for cfs case
      cfs_rq=&rq->cfs;
      se=pick_next_entiry(cfs_rq);
      p = task_of(se);
      return p;
}

struct sched_entity *pick_next_entity(..){
   rb_entry(.....); // find the next task in rb tree
}

#define this_rq()  (&__get_cpu_var(runqueues))
static DEFINE_PER_CPU_SHARED_ALIGNED(struct rq, runqueues);
#define DEFINE_PER_CPU_SHARED_ALIGNED(type, name) \
__attribute__((__section__(โ€œ.data.percpu"))) __typeof__(type) per_cpu__##name

The above will make

static  struct  rq  per_cpu_ruqueues;

void wake_up_new_task(struct task_struct *p, ..){
   struct rq *rq, *this_rq;
   int  this_cpu, cpu;
   rq = task_rq_lock(p, ...);  // the runqueue of the cpu this task belongs to
   cpu = task_cpu(p);  // cpu p belongs to
   __activate_task(p, rq);  // insert p in rq
}

void scheduler_tick(){ // timer interrupt calls this to
                     // decrease time slice of current p (in old code)
                      // in new code (after 2.6.23), it increases curr->se->vruntime
   int cpu=smp_processor_id();
   struct rq *rq=cpu_rq(cpu);
   struct task_struct *curr=rq->curr;
   curr->sched_class->task_tick(rq, curr, 0); // task_running_tick() in old code
   .......
}

/** old code
void task_running_tick(rq, p){
   if (!--p->time_slice){ // decrease it. if 0, reschedule
      dequeue_task(p, rq->active);
      set_tsk_need_resched(p);
      p->time_slice=task_timeslice(p);  // reset time slice
      enqueue_task(p, rq->active); // put back at the end
   }
}
**/

void task_tick_fair(rq, curr, ..){
      se=&curr->se;  // sched_entity
      cfs_rq=cfs_rq_of(se);
      entity_tick(cfs_rq, se, ...);
}
void entity_tick(cfs_rq, ...){
   update_curr(cfs_rq);
   .........
}
void update_curr(cfs_rq){
   struct sched_entity *curr=cfs_rq->curr;
   now=rq_of(cfs_rq)->clock;
   delta_exec=now - curr->exec_start; // running time so far for curr
   __update_curr(cfs_rq, curr, delta_exec);
   curr->exec_start = now;
}
void __update_curr(cfs_rq, struct sched_entity *curr, delta_exec){
   curr->sum_exec_runtime += delta_exec;
   delta_exec_weighted=calc_delta_fair(delta_exec, curr);
   curr->vruntime += delta_exec_weighted;
}

5. fork

When the kernel starts, we have only one process: init_task, which represents the kernel itself. Other processes are created by fork.

1) fork system call duplicates a process.

example:

#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>
int x;
void main(){
   x= fork();
   if (x!=0) {
      printf("korea %d\n", x);
      while (1);
   }
   else {
      printf("china\n");
      while (1);
   }
}

What is the result of above code?

2) Algorithm of fork

fork() ==> mov $2, %eax
int $0x80
==> system_call ==> arch/x86/kernel/process_32.c/sys_fork
=> kernel/fork.c/do_fork()

fork is translated into 2 assembly instructions as below by C library:

        mov $2, %eax
        int $0x80
  • The ISR for interrupt 0x80 is system_call which calls in turn sys_fork when eax=2. sys_fork calls do_fork and do_fork does followings:
    • (1) copy the body of the parent process
    • (2) copy thread_union of the parent process, and adjust some information
    • (3) insert child into the process queue
    • (4) return 0 to the child, and return childโ€™s pid to the parent

3) pthread_create

pthread_create is similar to fork() except it does not copy the body of the parent.

p1.c:

#include <stdio.h>
#include <stdlib.h>
#include <pthread.h>

void * foo(void * aa){
   printf("hello from child\n");
   return NULL;
}

void main(){
   pthread_t x;
   pthread_create(&x, NULL, foo, NULL); // make a child which starts at foo
   printf("hello from parent\n");
}
$ gcc โ€“o p1 p1.c โ€“lpthread
$./p1
hello from child
hello from parent

4) kernel_thread

Use kernel_thread() in Linux kernel which is similar to pthread_create().

    start_kernel() {
       trap_init();
       init_IRQ();
       time_init();
       console_init();
       ...............
       rest_init();
    }
    rest_init() {
       .........
       kernel_thread(kernel_init, ...........);
       pid=kernel_thread(kthreadd, ....);
       schedule();
       cpu_idle();
    }

The above Linux code calls kernel_thread(kernel_init, ....).
After this call the kernel is duplicated (but only the thread_union of the kernel is duplicated), and the child's starting location is kernel_init().
Similarly, after kernel_thread(kthreadd,...), another child is born whose starting location is kthreadd.
Since we have three processes, they will be scheduled one by one.

6. exec

1) exec system call transforms one process to another

ex1.c :

void main(){
   printf("I am ex1\n");
}

ex2.c :

void main(){
   execve("./ex1", 0,0);
   printf("I am ex2\n");
}
$ gcc ex1.c -o ex1
$ gcc ex2.c -o ex2
$ ex2

What will be the result?

2) Algorithm of exec

exec ==> mov $11, %eax ==> system_call ==> sys_execve
int $0x80

exec is translated into 2 assembly instructions as below by C library:

        mov $11, %eax
        int $0x80
  • The ISR for interrupt 0x80 is system_call which calls in turn sys_execve when eax=11. (sys_execve is in arch/x86/kernel/process_32.c)
    sys_execve calls do_execve(fs/exec.c) which does following things:
    • (1) remove old body
    • (2) load new body
    • (3) update the task_struct
    • (4) update the KMS (the stack portion in thread_union) such that
      • KMS.eip = starting location of the new body

3) example code

exec1.c :

#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>
int x;
void main(){
        char * exec2 = "./exec2";
        char * argv[2];
        argv[0] = exec2;
        argv[1] = 0;

        x=fork();
        if (x!=0){
                printf("korea %d\n",x);
                execve("./exec2", argv, 0);
                printf("exec failed\n");
        }
        else{
                printf("japan\n");
                for(;;);
        }
}

exec2.c :

#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>
int x;
void main(){
   printf("china\n");
}
$ gcc -o exec2 exec2.c
$ gcc -o exec1 exec1.c
exec1

7. shell

shell uses fork() and exec() to run the command:

#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>
#include <errno.h>

int main() {
   int x, y;
   char buf[50];
   char * argv[2];

   for(;;) {
      printf("$ ");
      scanf("%s", buf); // get command. no arg can be input this way
      argv[0] = buf;
      argv[1] = NULL;

      x = fork();
      if (x == 0) { // child
          printf("I am child to execute %s\n", buf);
          y = execve(buf, argv, 0);
          if (y < 0) {
             printf("exec failed. errno is %d\n", errno);
             exit(1);
          }
      } else {
         wait();
      }
   }
   return 0;
}

8. Initialization process of Linux

start_kernel()
==> rest_init()
==> kernel_thread(kernel_init, ....); // now we have two processes (init_task and kernel_init)
==> kernel_thread(kthreadd, ...); // init_task runs first and create another thread.

// now we have three processes(init_task, kernel_init, kthredd)

==> schedule(); // init_task calls schedule. the scheduler picks kernel_init.
                // prio of init_task is 140. prio of the other two is 120.
==> kernel_init()
==> do_basic_setup()
...........
init_post();
==> init_post()
==> run_init_process("/sbin/init", ......);
==> kernel_execve(โ€œ/sbin/init", ...); //kernel_init is transformed into /sbin/init.
==> /sbin/init
==> for (i=0;i < number of programs listed in /etc/inittab; i++) {
       x=fork();
       if (x==0){ // child
           execve(next program listed in /etc/inittab, ...);
       }
     } // parent goes back to the loop to create the next child
     for(;;){ // parent
       waits here;
     }
==> fork();
==> child init calls execve(โ€œ/sbin/agetty",..); // child init
                                                //  is transformed into /sbin/agetty
==> when user logins to this server /sbin/agetty execs to /bin/login
    and display
          login:
==> when user types login ID and password correctly /bin/login makes a child and
     execs the child to the shell as specified in /etc/passwd, which is usually /bin/bash
          root:x:0:0:root:/root:/bin/bash
          ...............
==> /bin/bash runs the shell code: display '#', read command, fork, let the child exec to the command, etc.
==> when user types ps -ef, shell forks and execs the child to ps -ef
init_task->/sbin/init->/sbin/agetty->/bin/login->/bin/bash->ps โ€“ef

9. What happens when you enter a Linux command?

Shell code again

   .........
   for(;;){
      printf("$ ");
      scanf("%s", buf); // get command. no arg can be input this way
      .................
      x=fork();
      if (x==0){ // child
          y=execve(buf, argv, 0);
          ............
      } else wait();
   }
    1. Shell runs printf("$"), and this library function calls write(1, "$", 1) which will display prompt. (printf => write => INT 128 => sys_write => display "$")
    1. Shell runs scanf("%s", buf);, and this library function calls read(0, buf, n) which will make the shell sleep until the user enters a command.
      Making shell sleep means setting shell's state to TASK_INTERRUPTIBLE (a blocked state) and taking it out of the run queue.
      Since shell cannot be scheduled, the scheduler picks kernel (pid=0) as the next process and runs cpu_idle(). (scanf => read => INT 128 => sys_read => make shell sleep; cpu jumps to cpu_idle)
    1. A user types command.
      $ ls<Enter>
    1. Each key typing will raise keyboard interrupt.
      press l => INT 33 => atkbd_interrupt => store l => cpu goes back to cpu_idle()
      => release l => INT 33 => atkbd_interrupt => ignore key release => cpu goes back to cpu_idle()
      => press s => INT 33 => ......
      => press => INT 33 => atkbd_interrupt => store ls in shell's buf and wakes up shell.
    • Waking up shell means set its state to TASK_RUNNING and put it back to the run queue. Now the scheduler picks shell as the next process and shell resumes execution.
    1. shell runs x=fork(), and fork() will make a child.
      (fork => INT 128 => do_fork() => make a child; assume the scheduler picks parent first)
    1. parent shell runs wait(), and wait() will make it sleep. Now the scheduler picks child.
      (wait => INT 128 => sys_wait)
    1. child shell runs execve("ls", ....) which will change it to /bin/ls program. The scheduler picks the child again (parent is still sleeping).
      (execve => INT 128 => do_execve)
    1. /bin/ls runs and shows all file names in the current directory in the screen.
      At the end, it calls exit(). exit() will make it a zombie (set its state to TASK_ZOMBIE) and sends a signal to parent.
      This signal wakes up parent (set its state to TASK_RUNNING).
      The scheduler now picks parent.
    1. parent goes back to the beginning of for(;;) loop and runs printf("$").
      $

10. Exercise-2

1) Run the program below. What happens? Explain the result.

ex1.c :

void main(){
   int x;
   x=fork();
   printf("x:%d\n", x);
}

fork๋Š” ์ž์‹ ์˜ body์™€ process descriptor๋ฅผ ๋ณต์‚ฌํ•ด child process๋ฅผ ๋งŒ๋“ค์–ด๋‚ธ๋‹ค.
fork๊ฐ€ ์„ฑ๊ณตํ•˜๋ฉด ์ž์‹ ํ”„๋กœ์„ธ์Šค์—์„œ๋Š” 0์„ ๋ฐ˜ํ™˜ํ•˜๊ณ (์‹คํŒจ์‹œ -1), ๋ถ€๋ชจ ํ”„๋กœ์„ธ์Šค์—์„œ๋Š” ์ž์‹์˜ pid๋ฅผ ๋ฐ˜ํ™˜ํ•œ๋‹ค.

0์€ ์ž์‹ ํ”„๋กœ์„ธ์Šค์˜ printf๋กœ๋ถ€ํ„ฐ ์ถœ๋ ฅ๋œ ๊ฐ’์ด๊ณ , 4506์€ ๋ถ€๋ชจ ํ”„๋กœ์„ธ์Šค์˜ printf๋กœ๋ถ€ํ„ฐ ์ถœ๋ ฅ๋œ ๊ฒƒ์ด๋‹ค.

2) Try below and explain the result.

ex1.c :

void main(){
   fork();
   fork();
   fork();
   for(;;);
}

์œ„ ์ฝ”๋“œ๋Š” fork๋ฅผ 3๋ฒˆํ•˜๊ณ  ๋ฌดํ•œ๋ฃจํ”„๋ฅผ ๋Œ๊ณ  ์žˆ๋‹ค.

$ gcc โ€“o ex1 ex1.c
$ ./ex1 &
$ ps โ€“ef

์ด 8๊ฐœ์˜ ./ex1 ํ”„๋กœ์„ธ์Šค๊ฐ€ ์ƒ์„ฑ๋˜๋Š”๋ฐ, fork ํ•จ์ˆ˜๊ฐ€ 3๊ฐœ์ด๊ธฐ ๋•Œ๋ฌธ์— 2^3=8๊ฐœ์˜ ํ”„๋กœ์„ธ์Šค๊ฐ€ ์ƒ์„ฑ๋˜์—ˆ๋‹ค.




3) Run following code. What happens? Explain the result.

ex1.c :

#include <stdio.h>
#include <unistd.h>

void main(){
   int i; float y=3.14;
   fork();
   fork();
   for(;;){
      for(i=0;i<1000000000;i++) y=y*0.4;
      printf("%d\n", getpid());
   }
}

2๋ฒˆ๊ณผ ๋น„์Šทํ•˜์ง€๋งŒ, fork()๊ฐ€ 2๋ฒˆ ์žˆ๊ธฐ ๋•Œ๋ฌธ์— ์ด 4(=2^2)๊ฐœ์˜ ./ex1 ํ”„๋กœ์„ธ์Šค๊ฐ€ ์ƒ์„ฑ๋œ๋‹ค.
4๊ฐœ์˜ ํ”„๋กœ์„ธ์Šค์—์„œ y์— ๋Œ€ํ•œ ์—ฐ์‚ฐ์„ ์ˆ˜ํ–‰ํ•˜๊ณ  ์ž์‹ ์˜ PID๋ฅผ ์ถœ๋ ฅํ•œ๋‹ค.

๋ฌดํ•œ๋ฃจํ”„์ด๊ธฐ ๋•Œ๋ฌธ์— ํ”„๋กœ์„ธ์Šค๊ฐ€ ์ข…๋ฃŒ๋  ๋•Œ๊นŒ์ง€ ๊ณ„์† ์—ฐ์‚ฐ์„ ์ˆ˜ํ–‰ํ•˜๊ณ  PID๋ฅผ ์ถœ๋ ฅํ•  ๊ฒƒ์ด๋‹ค.

4) Try below and explain the result.

ex1.c :

void main(){
   char *argv[10];
   argv[0] = "./ex2";
   argv[1] = 0;
   execve(argv[0], argv, 0);
}

ex2.c :

void main(){
   printf("korea\n");
}

execve๋Š” ํ˜„์žฌ ํ”„๋กœ์„ธ์Šค๋ฅผ ์ž…๋ ฅ ๋ฐ›์€ ํ”„๋กœ๊ทธ๋žจ์œผ๋กœ ํ”„๋กœ์„ธ์Šค๋ฅผ ๊ต์ฒดํ•ด ์ƒˆ๋กœ ์‹œ์ž‘ํ•˜๋Š” ํ•จ์ˆ˜์ด๋‹ค. ์ฒซ ๋ฒˆ์งธ ์ธ์ž๋กœ ํ”„๋กœ๊ทธ๋žจ ๊ฒฝ๋กœ๋ฅผ ๋ฐ›๊ณ  ๋‘ ๋ฒˆ์งธ ์ธ์ž๋กœ ํ”„๋กœ๊ทธ๋žจ์˜ argv์— ๋„˜์–ด๊ฐˆ ๊ฐ’์„ ๋ฐ›๋Š”๋‹ค.

$ gcc โ€“o ex1 ex1.c
$ gcc โ€“o ex2 ex2.c
$ ./ex1

ex1์—์„œ execve๋ฅผ ํ˜ธ์ถœํ–ˆ๊ธฐ ๋•Œ๋ฌธ์— ํ˜„์žฌ ํ”„๋กœ์„ธ์Šค body๊ฐ€ ex2์˜ body๋กœ ๊ต์ฒด๋˜๊ณ  ์ƒˆ๋กœ ์‹คํ–‰๋˜๊ธฐ ๋•Œ๋ฌธ์— ex2์˜ ์ถœ๋ ฅ์ธ "korea"๊ฐ€ ์ถœ๋ ฅ๋˜์—ˆ๋‹ค.

5) Run following code and explain the result.

void main() {
   char *argv[10];
   argv[0] = "/bin/ls";
   argv[1] = 0;
   execve(argv[0], argv, 0);
}

4๋ฒˆ๊ณผ ์œ ์‚ฌํ•˜์ง€๋งŒ, ์‹คํ–‰๋˜๋Š” ํŒŒ์ผ ์ด๋ฆ„์ด "/bin/ls"๋กœ ๋ฐ”๋€Œ์—ˆ๋‹ค.

ls ๋ช…๋ น์–ด๋Š” /bin/ls์ด๋ผ๋Š” ํ”„๋กœ๊ทธ๋žจ์„ ์‹คํ–‰ํ•˜๋Š” ๋ช…๋ น์–ด์ด๋‹ค.
execve๋กœ /bin/ls์„ ์‹คํ–‰ํ–ˆ๊ธฐ ๋•Œ๋ฌธ์—, ls๋ฅผ ์‹คํ–‰ํ•œ ๊ฒฐ๊ณผ์™€ ๊ฐ™์€ ๋‚ด์šฉ์ด ์ถœ๋ ฅ๋˜์—ˆ๋‹ค.

6) Run following code and explain the result.

argv๋Š” ๋ฌธ์ž์—ด ๋ฐฐ์—ด๋กœ C์–ธ์–ด์—์„œ ๋ฌธ์ž์—ด๊ณผ ๊ฐ™์ด ๋งˆ์ง€๋ง‰ ์š”์†Œ๋ฅผ NULL ํ‘œ์‹œํ•จ์œผ๋กœ์จ ๋ฐฐ์—ด์˜ ๋์„ ๋‚˜ํƒ€๋‚ธ๋‹ค.

void main() {
   char *argv[10];
   argv[0] = "/bin/ls";
   argv[1] = "-a";
   argv[2] = 0;
   execve(argv[0], argv, 0);
}

5๋ฒˆ๊ณผ ์œ ์‚ฌํ•˜์ง€๋งŒ, argv์˜ ๋‘ ๋ฒˆ์งธ ์š”์†Œ๋กœ "-a"๊ฐ€ ๋“ค์–ด์™”๋‹ค.

ls์˜ -a ์˜ต์…˜์€ ์ˆจ๊ฒจ์ง„ ํŒŒ์ผ์ด๋‚˜ ๋””๋ ‰ํ† ๋ฆฌ๋ฅผ ์ถœ๋ ฅํ•˜๋Š” ์˜ต์…˜์ด๋‹ค.

ls -a๋ฅผ ์ง์ ‘ ์‹คํ–‰ํ•ด๋ณด์•˜์„ ๋•Œ, ๋™์ผํ•œ ์ถœ๋ ฅ ๊ฒฐ๊ณผ๋ฅผ ์–ป์„ ์ˆ˜ ์žˆ์—ˆ๋‹ค.

7) Run following code and explain the result.

p1.c :

#include <stdio.h>
#include <stdlib.h>
#include <pthread.h>
void * foo(void * aa) {
   printf("hello from child\n");
   return NULL;
}
void main() {
   pthread_t x;
   pthread_create(&x, NULL, foo, NULL);   // make a child which starts at foo
   printf("hi from parent\n");
   pthread_join(x, NULL);                 // wait for the child
}

ํ”„๋กœ์„ธ์Šค(process)๋ž€ ๋‹จ์ˆœํžˆ ์‹คํ–‰ ์ค‘์ธ ํ”„๋กœ๊ทธ๋žจ(program)์ด๋ผ๊ณ  ํ•  ์ˆ˜ ์žˆ๋‹ค.

์ฆ‰, ์‚ฌ์šฉ์ž๊ฐ€ ์ž‘์„ฑํ•œ ํ”„๋กœ๊ทธ๋žจ์ด ์šด์˜์ฒด์ œ์— ์˜ํ•ด ๋ฉ”๋ชจ๋ฆฌ ๊ณต๊ฐ„์„ ํ• ๋‹น๋ฐ›์•„ ์‹คํ–‰ ์ค‘์ธ ๊ฒƒ์„ ๋งํ•œ๋‹ค. ์ด๋Ÿฌํ•œ ํ”„๋กœ์„ธ์Šค๋Š” ํ”„๋กœ๊ทธ๋žจ์— ์‚ฌ์šฉ๋˜๋Š” ๋ฐ์ดํ„ฐ์™€ ๋ฉ”๋ชจ๋ฆฌ ๋“ฑ์˜ ์ž์› ๊ทธ๋ฆฌ๊ณ  ์Šค๋ ˆ๋“œ๋กœ ๊ตฌ์„ฑ๋œ๋‹ค.

์Šค๋ ˆ๋“œ(thread)๋ž€ ํ”„๋กœ์„ธ์Šค(process) ๋‚ด์—์„œ ์‹ค์ œ๋กœ ์ž‘์—…์„ ์ˆ˜ํ–‰ํ•˜๋Š” ์ฃผ์ฒด๋ฅผ ์˜๋ฏธํ•œ๋‹ค. ๋ชจ๋“  ํ”„๋กœ์„ธ์Šค์—๋Š” ํ•œ ๊ฐœ ์ด์ƒ์˜ ์Šค๋ ˆ๋“œ๊ฐ€ ์กด์žฌํ•˜์—ฌ ์ž‘์—…์„ ์ˆ˜ํ–‰ํ•œ๋‹ค.
์ถœ์ฒ˜ : TCP School - 69) ์Šค๋ ˆ๋“œ์˜ ๊ฐœ๋…

pthread_create๋Š” ์Šค๋ ˆ๋“œ๋ฅผ ์ƒ์„ฑํ•˜๋Š” ํ•จ์ˆ˜์ด๋‹ค. ์ฒซ ๋ฒˆ์งธ ์ธ์ž thread๋Š” ์Šค๋ ˆ๋“œ๊ฐ€ ์ƒ์„ฑ๋˜์—ˆ์„ ๋•Œ, ์ด๋ฅผ ์‹๋ณ„ํ•˜๊ธฐ ์œ„ํ•œ ๊ฐ’์ด๋‹ค. ์„ธ ๋ฒˆ์งธ ์ธ์ž๋Š” ์Šค๋ ˆ๋“œ๊ฐ€ ์‹คํ–‰๋  ๋•Œ, ์‚ฌ์šฉ๋  ํ•จ์ˆ˜๋ฅผ ๋„ฃ์–ด์ค€๋‹ค.

$ gcc โ€“o p1 p1.c โ€“lpthread
$ ./p1

<pthread.h> ํ—ค๋”์˜ ํ•จ์ˆ˜๋ฅผ ์‚ฌ์šฉํ•˜๋ ค๋ฉด PThread ๋ผ์ด๋ธŒ๋Ÿฌ๋ฆฌ๋ฅผ ๋งํฌํ•ด์•ผ ํ•œ๋‹ค. gcc์—์„œ -l์€ ๋ผ์ด๋ธŒ๋Ÿฌ๋ฆฌ๋ฅผ ๋งํฌํ•˜๋Š” ์˜ต์…˜์œผ๋กœ -lpthread๋Š” /usr/lib/libpthread.so๋ฅผ ๋งํฌํ•œ๋‹ค.

์‹คํ–‰๊ฒฐ๊ณผ๋Š” ์•„๋ž˜์™€ ๊ฐ™๋‹ค.

8) Run following code and explain the difference.

p1.c :

#include <stdio.h>
int y=0;

int main() {
   int x;
   x = fork();
   if (x == 0) {
      y = y + 2;
      printf("process child:%d\n", y);
   } else {
      y = y + 2;
      printf("process parent:%d\n", y);
   }
   return 0;
}

p2.c :

#include <stdio.h>
#include <stdlib.h>
#include <pthread.h>

int y = 0;

void * foo(void *aa) { // aa is arguments passed by parent, if any.
   y = y + 2;
   printf("thread child:%d\n", y);
   return NULL;
}

void main() {
   pthread_t x;
   pthread_create(&x, NULL, foo, NULL);
   y = y + 2;
   printf("thread parent:%d\n", y);
   pthread_join(x, NULL);                // wait for the child
}

p1.c๋Š” process์˜ parent, child๋ฅผ ๋น„๊ต,
p2.c๋Š” thread์˜ parent, child๋ฅผ ๋น„๊ตํ•˜๋Š” ์ฝ”๋“œ์ด๋‹ค.

p1.c์™€ p2.c ๋ชจ๋‘ ์ „์—ญ ๋ณ€์ˆ˜ y๋ฅผ ์„ ์–ธํ•ด์ฃผ์—ˆ๋‹ค.

p1.c๋Š” y์˜ ๊ฐ’์œผ๋กœ parent์™€ child ๋ชจ๋‘ ๋™์ผํ•˜๊ฒŒ 2๊ฐ€ ์ถœ๋ ฅ๋˜์—ˆ์ง€๋งŒ, ๊ฐ๊ฐ์˜ ํ”„๋กœ์„ธ์Šค๋งˆ๋‹ค ๊ฐœ๋ณ„์ ์ธ y๋ฅผ ๊ฐ€์ง€๊ณ  ์žˆ๊ธฐ ๋•Œ๋ฌธ์— ์ด๋Š” ์„œ๋กœ์˜ ์˜ํ–ฅ์„ ๋ฐ›์ง€ ์•Š์€ ๊ฐ’์ด๋‹ค.

p2.c์—์„œ๋Š” ์Šค๋ ˆ๋“œ๊ฐ€ ํ•œ ํ”„๋กœ์„ธ์Šค ๋‚ด์˜ ์ „์—ญ๋ณ€์ˆ˜ y๋ฅผ ๊ณต์œ ํ•˜๊ธฐ ๋•Œ๋ฌธ์— child process์—์„œ 2๋ฅผ ์ถœ๋ ฅํ•˜๊ณ , parent process์—์„œ 2์ธ y์— 2๋ฅผ ๋”ํ•œ 4๋ฅผ ์ถœ๋ ฅํ–ˆ๋‹ค.

11. Exercise-3

1) Try the shell code in section 7. Try Linux command such as /bin/ls, /bin/date, etc.

shell.c :


Section 7์˜ shell code๋ฅผ ์ฐธ๊ณ ํ•˜์—ฌ shell ํ”„๋กœ๊ทธ๋žจ์„ ๋งŒ๋“ค์–ด๋ณด์•˜๋‹ค.

/bin/ls, /bin/date, ๊ทธ๋ฆฌ๊ณ  /bin/pwd/ ๋ช…๋ น์–ด๋ฅผ ์ž…๋ ฅํ•ด๋ณด์•˜๋Š”๋ฐ ๊ฒฐ๊ณผ๊ฐ’์ด ์ •ํ™•ํžˆ ๋‚˜์˜จ ๊ฒƒ์„ ํ™•์ธํ•  ์ˆ˜ ์žˆ์—ˆ๋‹ค.

2) Print the pid of the current process (current->pid) inside rest_init() and kernel_init(). The pid printed inside rest_init() will be 0, but the pid inside kernel_init() is 1. 0 is the pid of the kernel itself.
Why do we have pid=1 inside kernel_init()?
Find a location where current->pid will print 2.

init/main.c :

์œ„์™€ ๊ฐ™์ด rest_init๊ณผ kernel_init ํ•จ์ˆ˜ ์‹œ์ž‘ ๋ถ€๋ถ„์— ํ˜„์žฌ PID๋ฅผ ์ถœ๋ ฅํ•˜๋Š” ์ฝ”๋“œ๋ฅผ ์‚ฝ์ž…ํ–ˆ๋‹ค.

๋ณ€๊ฒฝ์‚ฌํ•ญ์„ ์ ์šฉํ•˜๊ธฐ ์œ„ํ•ด ์ปค๋„์„ ์ปดํŒŒ์ผํ•˜๊ณ , ์žฌ๋ถ€ํŒ…ํ•˜์˜€๋‹ค.

$ make bzImage
$ cp arch/x86/boot/bzImage /boot/bzImage
$ reboot

dmesg๋กœ ๋ถ€ํŒ… ๋ฉ”์„ธ์ง€๋ฅผ ํ™•์ธํ•ด๋ณด์•˜๋‹ค.

rest_init์—์„œ์˜ PID๋Š” 0, kernel_init์—์„œ๋Š” 1์ด ์ถœ๋ ฅ๋˜์—ˆ๋‹ค.

rest_init ํ•จ์ˆ˜ ๋‚ด๋ถ€์—์„œ๋Š” kernel_thread ํ•จ์ˆ˜๋กœ kernel_init์ด๋ผ๋Š” task๋ฅผ ๋งŒ๋“ค๋ฉฐ PID๋ฅผ 1๋กœ ์ง€์ •ํ•œ๋‹ค. ์ด๋•Œ, kernel_init์€ ํ”„๋กœ์„ธ์Šค์ด๊ธฐ ๋•Œ๋ฌธ์— init_task์™€ process body๋ฅผ ๊ณต์œ ํ•œ๋‹ค.


๋‹ค์Œ์œผ๋กœ current->pid์˜ ์ถœ๋ ฅ์ด 2๊ฐ€ ๋˜๋Š” ๊ณณ์„ ์•Œ์•„๋ณด๊ธฐ ์œ„ํ•ด ps ๋ช…๋ น์–ด๋ฅผ ์‚ฌ์šฉํ•˜์˜€๋‹ค.

$ ps -ef

kthreadd์˜ PID๊ฐ€ 2์ด๋ฏ€๋กœ current->pid์˜ ์ถœ๋ ฅ์ด 2๊ฐ€ ๋˜๋Š” ๊ณณ์€ kthreadd๊ฐ€ ์‹คํ–‰๋œ ์ดํ›„๋ผ๊ณ  ์˜ˆ์ธกํ•  ์ˆ˜ ์žˆ๋‹ค.

3) The last function call in start_kernel() is rest_init(). If you insert printk() after rest_init(), it is not displayed during the system booting. Explain the reason.

init/main.c :

    void start_kernel(){
        ............
        printk("before rest_init\n");  // this will be printed out
        rest_init();
        printk("after rest_init\n");   // but this will not.
    }

start_kernel()์—์„œ ๋งˆ์ง€๋ง‰์œผ๋กœ ํ˜ธ์ถœ๋˜๋Š” rest_init๋ฅผ ๋ณด๋ฉด ํ•จ์ˆ˜์˜ ์ •์˜ ์ค‘ cpu_idle ํ•จ์ˆ˜๋ฅผ ํ˜ธ์ถœํ•œ๋‹ค.

arch/x86/kernel/process_64.c:
cpu_idle ํ•จ์ˆ˜๋Š” "login:"๋ฅผ ํ™”๋ฉด์— ์ถœ๋ ฅํ•œ ํ›„, ์‚ฌ์šฉ์ž์˜ ์ž…๋ ฅ ์ „๊นŒ์ง€ ๋ฌดํ•œ๋ฃจํ”„๋ฅผ ๋Œ๋ฉฐ ์Šค์ผ€์ฅด๋ง์„ ๊ธฐ๋‹ค๋ฆฐ๋‹ค. ๊ณ„์† ๋ฌดํ•œ ๋ฃจํ”„๋ฅผ ๋Œ๊ณ  ์žˆ๊ธฐ ๋•Œ๋ฌธ์—, ๊ฐ™์€ ํ”„๋กœ์„ธ์Šค์˜ rest_init ์ดํ›„์˜ ์ฝ”๋“œ๋Š” ์‹คํ–‰๋˜์ง€ ์•Š๋Š”๋‹ค. ๋”ฐ๋ผ์„œ rest_init ์ดํ›„์˜ ์ฝ”๋“œ๋Š” ๋ถ€ํŒ… ๊ณผ์ •์—์„œ ์‹คํ–‰๋  ์ˆ˜ ์—†๋‹ค.

4) The CPU is either in some application program or in Linux kernel. You always should be able to say where is the CPU currently. Suppose we have a following program (ex1.c).

ex1.c:

   void main(){
      printf("korea\n");
   }

When the shell runs this, CPU could be in shell program or in ex1 or in kernel.
Explain where is CPU for each major step of this program. You should indicate the CPU location whenever the cpu changes its location among these three programs.
Start the tracing from the moment when the shell prints a prompt until it prints next prompt.

shell: printf(โ€œ$โ€);        // CPU๋Š” shell์— ์žˆ์œผ๋ฉฐ, shell์—์„œ ์ž…๋ ฅ ๊ฐ€๋Šฅ์„ ๋‚˜ํƒ€๋‚ด๋Š” ๋ฌธ์ž๋ฅผ ์ถœ๋ ฅ
    => write(1, โ€œ$โ€, 1);   // CPU๋Š” C ๋ผ์ด๋ธŒ๋Ÿฌ๋ฆฌ์— ์žˆ์œผ๋ฉฐ, STDOUT_FILENO(=1)์— โ€œ$โ€์„ 1๊ธ€์ž ์ถœ๋ ฅ
    => INT 128             // CPU๋Š” C ๋ผ์ด๋ธŒ๋Ÿฌ๋ฆฌ์— ์žˆ์œผ๋ฉฐ, ์‹œ์Šคํ…œ ์ฝœ ์ธํ„ฐ๋ŸฝํŠธ์ธ 128๋ฒˆ์„ ํ˜ธ์ถœ
    => mov eax 4           // CPU๋Š” C ๋ผ์ด๋ธŒ๋Ÿฌ๋ฆฌ์— ์žˆ์œผ๋ฉฐ, write์˜ ์‹œ์Šคํ…œ ์ฝœ ๋ฒˆํ˜ธ๋Š” 4
kernel: sys_write()        // CPU๋Š” kernel์— ์žˆ์œผ๋ฉฐ, โ€œ$โ€ ๋ฌธ์ž์—ด์„ ์ถœ๋ ฅํ•˜๊ธฐ ์œ„ํ•œ ์‹œ์Šคํ…œ ์ฝœ ํ˜ธ์ถœ

shell: scanf(โ€œ%sโ€, buf);   // CPU๋Š” shell์— ์žˆ์œผ๋ฉฐ, ์‚ฌ์šฉ์ž๊ฐ€ ์ž…๋ ฅํ•œ ๋ฌธ์ž์—ด์„ ์ฝ์Œ
    => read(1, buf, n);    // CPU๋Š” C ๋ผ์ด๋ธŒ๋Ÿฌ๋ฆฌ์— ์žˆ์œผ๋ฉฐ, STDIN_FILENO(=1)์—์„œ len ๋งŒํผ ์ฝ์–ด buf์— ์ €์žฅ
    => INT 128             // CPU๋Š” C ๋ผ์ด๋ธŒ๋Ÿฌ๋ฆฌ์— ์žˆ์œผ๋ฉฐ, ์‹œ์Šคํ…œ ์ฝœ ์ธํ„ฐ๋ŸฝํŠธ์ธ 128๋ฒˆ์„ ํ˜ธ์ถœ
    => mov eax 3           // CPU๋Š” C ๋ผ์ด๋ธŒ๋Ÿฌ๋ฆฌ์— ์žˆ์œผ๋ฉฐ, read์˜ ์‹œ์Šคํ…œ ์ฝœ ๋ฒˆํ˜ธ๋Š” 3
kernel: sys_read();        // CPU๋Š” kernel์— ์žˆ์œผ๋ฉฐ, ์ž…๋ ฅํ•œ ๋ฌธ์ž์—ด์„ ์ฝ์–ด์˜ค๋Š” ์‹œ์Šคํ…œ ์ฝœ ํ˜ธ์ถœ

/* (ํ‚ค๋ณด๋“œ ์ž…๋ ฅ ๋ฐœ์ƒ) */
shell: INT 33              // CPU๋Š” C ๋ผ์ด๋ธŒ๋Ÿฌ๋ฆฌ์— ์žˆ์œผ๋ฉฐ, ํ‚ค๋ณด๋“œ ์ธํ„ฐ๋ŸฝํŠธ์ธ 33๋ฒˆ ํ˜ธ์ถœ
kernel: atkbd_interrupt    // CPU๋Š” C ๋ผ์ด๋ธŒ๋Ÿฌ๋ฆฌ์— ์žˆ์œผ๋ฉฐ, ํ‚ค๋ณด๋“œ ๋ฒ„ํผ์— ์ž…๋ ฅํ•œ ๋ฌธ์ž ์ €์žฅ
/* (ENTER๊ฐ€ ์ž…๋ ฅ๋  ๋•Œ๊นŒ์ง€ ์œ„ ๊ณผ์ • ๋ฐ˜๋ณต) */

/* (ENTER ์ž…๋ ฅ) */         // CPU๊ฐ€ shell๋กœ ๋˜๋Œ์•„๊ฐ

shell: x=fork();           // CPU๋Š” shell์— ์žˆ์œผ๋ฉฐ, ํ”„๋กœ๊ทธ๋žจ์„ ์‹คํ–‰ํ•˜๊ธฐ ์œ„ํ•œ ์ž์‹ ํ”„๋กœ์„ธ์Šค ์ƒ์„ฑ
    => INT 128             // CPU๋Š” C ๋ผ์ด๋ธŒ๋Ÿฌ๋ฆฌ์— ์žˆ์œผ๋ฉฐ, ์‹œ์Šคํ…œ ์ฝœ ์ธํ„ฐ๋ŸฝํŠธ์ธ 128๋ฒˆ์„ ํ˜ธ์ถœ
kernel: sys_fork()         // CPU๋Š” kernel์— ์žˆ์œผ๋ฉฐ, fork์˜ ์‹œ์Šคํ…œ ์ฝœ ํ˜ธ์ถœ

shell: printf(โ€œI am child~%s\nโ€, buf); // CPU๋Š” shell์— ์žˆ์Œ
    => write(1, โ€œI am~โ€, n)            // CPU๋Š” C ๋ผ์ด๋ธŒ๋Ÿฌ๋ฆฌ์— ์žˆ์œผ๋ฉฐ, STDOUT_FILENO(=1)์— โ€œI am~โ€์„ n๊ธ€์ž๋งŒํผ ์ถœ๋ ฅ
    => INT 128                         // CPU๋Š” C ๋ผ์ด๋ธŒ๋Ÿฌ๋ฆฌ์— ์žˆ์œผ๋ฉฐ, ์‹œ์Šคํ…œ ์ฝœ ์ธํ„ฐ๋ŸฝํŠธ์ธ 128๋ฒˆ์„ ํ˜ธ์ถœ
kernel: sys_write()                    // CPU๋Š” kernel์— ์žˆ์œผ๋ฉฐ, โ€œI am~โ€๋ฌธ์ž์—ด์„ ์ถœ๋ ฅํ•˜๊ธฐ ์œ„ํ•œ ์‹œ์Šคํ…œ ์ฝœ ํ˜ธ์ถœ

shell: y=execve(buf, argv, 0);   // CPU๋Š” shell์— ์žˆ์œผ๋ฉฐ, ์ž…๋ ฅ๋ฐ›์€ ํ”„๋กœ๊ทธ๋žจ์„ ์‹คํ–‰
    => INT 128                   // CPU๋Š” C ๋ผ์ด๋ธŒ๋Ÿฌ๋ฆฌ์— ์žˆ์œผ๋ฉฐ, ์‹œ์Šคํ…œ ์ฝœ ์ธํ„ฐ๋ŸฝํŠธ์ธ 128๋ฒˆ์„ ํ˜ธ์ถœ
kernel: sys_execve()             // CPU๋Š” kernel์— ์žˆ์œผ๋ฉฐ, execve์˜ ์‹œ์Šคํ…œ ์ฝœ ํ˜ธ์ถœ

ex1: printf(โ€œkorea\nโ€);          // CPU๋Š” ex1์— ์žˆ์Œ
    => write(1, โ€œkorea\nโ€, 6);   // CPU๋Š” C ๋ผ์ด๋ธŒ๋Ÿฌ๋ฆฌ์— ์žˆ์œผ๋ฉฐ,
    => INT 128                   // CPU๋Š” C ๋ผ์ด๋ธŒ๋Ÿฌ๋ฆฌ์— ์žˆ์œผ๋ฉฐ, ์‹œ์Šคํ…œ ์ฝœ ์ธํ„ฐ๋ŸฝํŠธ์ธ 128๋ฒˆ์„ ํ˜ธ์ถœ
kernel: sys_write()              // CPU๋Š” kernel์— ์žˆ์œผ๋ฉฐ, โ€œkoreaโ€๋ฌธ์ž์—ด์„ ์ถœ๋ ฅํ•˜๊ธฐ ์œ„ํ•œ ์‹œ์Šคํ…œ ์ฝœ ํ˜ธ์ถœ
    => exit(0)

shell: else wait();        // CPU๋Š” shell์— ์žˆ์Œ
    => INT 128             // CPU๋Š” C ๋ผ์ด๋ธŒ๋Ÿฌ๋ฆฌ์— ์žˆ์œผ๋ฉฐ, ์‹œ์Šคํ…œ ์ฝœ ์ธํ„ฐ๋ŸฝํŠธ์ธ 128๋ฒˆ์„ ํ˜ธ์ถœ
kernel: sys_wait4          // CPU๋Š” kernel์— ์žˆ์Œ

shell: printf("$");        // CPU๋Š” ๋‹ค์‹œ shell์— ์žˆ์œผ๋ฉฐ, ํ”„๋กœ๊ทธ๋žจ์ด ์ข…๋ฃŒ๋˜๋ฉด ๋‹ค์‹œ shell์—์„œ ์ž…๋ ฅ ๊ฐ€๋Šฅ์„ ๋‚˜ํƒ€๋‚ด๋Š” ๋ฌธ์ž๋ฅผ ์ถœ๋ ฅ

5) What happens if the kernel calls kernel_init directly instead of calling kernel_thread(kernel_init, ...) in rest_init()?
Call kernel_init with NULL argument and explain why the kenel falls into panic.

init/main.c :

rest_init ํ•จ์ˆ˜์˜ ์ •์˜์—์„œ kernel_thread ์ฝ”๋“œ๋ฅผ ์ฃผ์„ ์ฒ˜๋ฆฌํ•œ ํ›„ kernel_init(NULL);๋กœ ์ง์ ‘ ํ˜ธ์ถœํ•˜๋„๋ก ํ•˜์˜€๋‹ค.

์ดํ›„ recompile ๋ฐ ์žฌ๋ถ€ํŒ…ํ•˜์˜€๋‹ค.

๋งˆ์ง€๋ง‰ ์ค„์— "Kernel panic"์ด๋ผ๋Š” ์—๋Ÿฌ ๋ฉ”์„ธ์ง€๊ฐ€ ์ถœ๋ ฅ๋œ ํ›„ ๋ถ€ํŒ…์ด ๋” ์ด์ƒ ์ง„ํ–‰๋˜์ง€ ์•Š์•˜๋‹ค.

kernel_thread๋Š” ํ”„๋กœ์„ธ์Šค ๋””์Šคํฌ๋ฆฝํ„ฐ๋ฅผ ๋ณต์‚ฌํ•˜๋Š”๋ฐ, kernel_thread ์—†์ด kernel_init์„ ์‹คํ–‰ํ•˜๊ฒŒ ๋˜๋ฉด kernel_init ๋‚ด๋ถ€์˜ ๋ฌดํ•œ๋ฃจํ”„๋กœ ์ธํ•ด ์ดํ›„ ํ”„๋กœ์„ธ์Šค๊ฐ€ ์‹คํ–‰๋˜์ง€ ์•Š๋Š”๋‹ค.

๋˜ํ•œ, kernel_execve์œผ๋กœ ๋‹ค๋ฅธ ํ”„๋กœ์„ธ์Šค๋ฅผ ์‹คํ–‰ํ•˜๋ฉด ํ˜„์žฌ body๋ฅผ ์ œ๊ฑฐํ–ˆ๊ธฐ ๋•Œ๋ฌธ์— ์˜ค๋ฅ˜๊ฐ€ ๋ฐœ์ƒํ–ˆ๋‹ค.

6) Trace fork, exec, exit, wait system call to find the corresponding code for the major steps of each system call.

fork

fork๋Š” sys_fork ํ•จ์ˆ˜๋ฅผ ํ˜ธ์ถœํ•œ๋‹ค.

arch/x86/kernel/process_32.c :

kernel/fork.c :

do_fork :




copy_process :


......


dup_task_struct :


.....

sys_fork -> do_fork -> copy_process -> dup_task_struct

do_fork์—์„œ returnํ•˜๋Š” nr์€ ๋ณต์‚ฌ๋œ ํ”„๋กœ์„ธ์Šค์˜ PID์ด๋‹ค.

exec

exec๋Š” sys_exec ํ•จ์ˆ˜๋ฅผ ํ˜ธ์ถœํ•œ๋‹ค.

arch/x86/kernel/process_32.c :

fs/exec.c :


......

sys_execve -> do_execve -> open_exec -> sched_exec

do_execve์—์„œ ํŒŒ์ผ์„ ์—ด๊ณ , ์Šค์ผ€์ค„์— ๋“ฑ๋กํ•˜๊ณ , argv๋“ฑ์˜ ๊ฐ’์„ ๋„˜๊ฒจ์ค€๋‹ค.

exit

exit๋Š” sys_exit ํ•จ์ˆ˜๋ฅผ ํ˜ธ์ถœํ•œ๋‹ค.

kernel/exit.c :

sys_exit :


do_exit :

......

......

......

sys_exit -> do_exit -> exit_signals -> exit_mm -> exit_thread -> exit_notify -> schedule

์‹œ๊ทธ๋„ ๋ณด๋‚ด๊ณ (๋“ฑ๋ก๋œ ํ•จ์ˆ˜ ํ˜ธ์ถœ), ๋ฉ”๋ชจ๋ฆฌ ํšŒ์ˆ˜ํ•˜๊ณ , ์Šค๋ ˆ๋“œ ์ข…๋ฃŒํ•˜๊ณ , ๋ถ€๋ชจ ํ”„๋กœ์„ธ์Šค์— ์•Œ๋ฆฌ๊ณ (์‹œ๊ทธ๋„ ์ „์†ก), ์Šค์ผ€์ค„๋ง์œผ๋กœ ์™„์ „ํžˆ ์ œ๊ฑฐํ•œ๋‹ค.

wait

wait(&wstatus)๋Š” waitpid(-1, &wstatus, 0)์ด๋‹ค. ๋”ฐ๋ผ์„œ waitpid๋ฅผ ์ฐพ์•„์•ผ ํ•œ๋‹ค.

waitpid๋Š” sys_waitpid ํ•จ์ˆ˜๋ฅผ ํ˜ธ์ถœํ•œ๋‹ค.

kernel/exit.c :

sys_waitpid :

sys_waitpid๋ฅผ ์ฐพ์•„๊ฐ€๋ฉด,
sys_waitpid๋Š” ํ˜ธํ™˜์„ฑ์„ ์œ„ํ•ด ๋‚จ๊ฒจ๋‘์—ˆ์„๋ฟ, sys_wait4์œผ๋กœ ๊ตฌํ˜„๋œ๋‹ค๊ณ  ์ฃผ์„์ด ๋‚จ๊ฒจ์žˆ๋‹ค.
sys_wait4๋ฅผ ์ฐพ์•„๋ณด์ž.


sys_wait4 :


do_wait :




......

sys_waitpid -> sys_wait4 -> do_wait -> wait_task_stopped / wait_task_zombie / wait_task_continued

7) Explain the result of following:

$ ./startsys;./sysnum;./stopsys

where, startsys sets the kernel flag so that system call number can be displayed, stopsys resets it, and sysnum calls printf.

sysnum.c :

void main(){
   printf("hi\n");
}

startsys.c :

void main(){
   syscall(31); // start printing sysnum
}

stopsys.c :

void main(){
   syscall(32); // stop printing sysnum
}

8) When the shell runs

         execve(argv[0], argv, 0);

how the Linux knows the value of argv?

9) Write a program that causes divide-by-zero fault:

        int x,y;
        y=0;
        x=20/y;

This program, when run, will print:
Floating point exception
and dies. It dies because of divide-by-zero exception. Modify the kernel such that the system prints instead (when this program runs):
Divide-by-zero exception
Floating point exception

Tip) to make a call to a new function from entry.S, you need to protect registers as follows:

       SAVE_ALL
       call new_function
       RESTORE_REGS

11. exit

All programs end with exit() system call. Even If the programmer didn't put exit() in his code, the compiler will provide it in crtso (C run-time start-off function).

1) algorithm

  • remove body
  • make it a zombie
  • send SIGCHLD to the parent
  • adopt children to init process
  • schedule next process

2) kernel code

exit -> sys_exit -> kernel/exit.c:

do_exit() {
      struct  task_struct  *tsk = current;
      exit_mm(tsk);  // remove body
      exit_sem(tsk);
      __exit_files(tsk);
      __exit_fs(tsk);      // remove resouces
      exit_notify(tsk);  // send SIGCHLD to the parent, ask init to adopt my own child,
                       // set tsk->exit_state = EXIT_ZOMBIE to make it a zombie
      tsk->state = TASK_DEAD;
      schedule();     // call a scheduler
}

12. wait

The parent should wait in wait to collect the child; otherwise the child stays as a zombie consuming 8192 bytes of the memory.

1) algorithm

if child has exit first (that is, if the child is a zombie)
    let it die completely (remove its process descriptor)
else (child is not dead yet)
    block parent
    remove parent from the run queue
    schedule next process
When later the child exits, the parent will get SIGCHLD, wakes up, and be inserted into the run queue.

2) kernel code

wait -> sys_wait4 -> kernel/exit.c :

do_wait() {
   struct  task_struct *tsk;
   DECLARE_WAITQUEUE(wait, current);  // make a "wait" queue
   add_wait_queue(&current->signal->wait_chldexit, &wait);
   current->state = TASK_INTERRUPTIBLE; // block parent
   tsk = current;
   do{
      list_for_each(_p, &task-children){
         p = list_entry(...);
         if (p->exit_state == EXIT_ZOMBIE){ // if child has exit first
            wait_task_zombie(p, ...); // kill it good
            break;
         }
         // otherwise, it's still alive. wait here until it is dead
         wait_task_contiuned(p,...);
         break;
   }
   schedule();
}