07. On-memory File System


On-memory file system

on-disk, on-memory file system, mounting, process and file system, file system calls

0. Accessing a file in EXT2

x=open("/d1/d2/f1", .....);  // find the inode of "/d1/d2/f1"
  • read the super block and find the location of the group descriptor
  • read the group descriptor and find the location of the inode table
  • read the inode table, find inode 2, find the block locations of "/"
  • read the blocks of "/" and find the inode number of "d1"
  • find the inode of "/d1" and find the block locations of "/d1"
  • read the blocks of "/d1" and find the inode number of "d2"
  • find the inode of "/d1/d2" and find the block locations of "/d1/d2"
  • read the blocks of "/d1/d2" and find the inode number of f1
  • find the inode of "/d1/d2/f1"
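The walk above can be sketched in user space. This is a toy model, not ext2's on-disk layout: the table `toy_disk`, the inode numbers 11 to 13, and every `toy_` name are assumptions made up for illustration. Only the starting inode number 2 comes from the steps above.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Toy model (not ext2's on-disk layout): one directory entry maps
 * (parent inode number, name) to a child inode number. */
struct toy_dirent {
    unsigned parent_ino;
    const char *name;
    unsigned ino;
};

#define TOY_ROOT_INO 2u /* the walk starts at inode 2, the inode of "/" */

/* Hypothetical contents of a disk holding /d1/d2/f1. */
static const struct toy_dirent toy_disk[] = {
    { 2,  "d1", 11 },
    { 11, "d2", 12 },
    { 12, "f1", 13 },
};

/* Look up one name inside one directory; returns 0 if it is absent. */
static unsigned toy_lookup(unsigned dir_ino, const char *name, size_t len)
{
    size_t i;
    for (i = 0; i < sizeof toy_disk / sizeof toy_disk[0]; i++) {
        const struct toy_dirent *e = &toy_disk[i];
        if (e->parent_ino == dir_ino && strlen(e->name) == len &&
            memcmp(e->name, name, len) == 0)
            return e->ino;
    }
    return 0;
}

/* Resolve an absolute path one component at a time, starting at inode 2. */
unsigned toy_path_to_ino(const char *path)
{
    unsigned ino = TOY_ROOT_INO;
    assert(path != NULL && path[0] == '/');
    while (*path != '\0' && ino != 0) {
        size_t len;
        while (*path == '/')          /* skip separators */
            path++;
        len = strcspn(path, "/");     /* length of the next component */
        if (len == 0)
            break;                    /* trailing slash or end of path */
        ino = toy_lookup(ino, path, len);
        path += len;
    }
    return ino;
}
```

Each loop iteration corresponds to one "read the blocks of the directory, find the inode number of the next name" pair in the list. On a real disk every iteration costs at least one block read, which is the motivation for the caching described next.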

1. on-disk, on-memory file system

1) on-disk file system: file system data structure on disks. example: EXT2, FAT, ....

2) on-memory file system

  • disk is slow => open, read, write take too much time
  • we cache frequently-used data (superblock, inode, group descriptor,...) into memory
  • when caching, some additional information is added
    • each disk has its own file system, and we need to know which meta block came from which disk

2.1) caching superblock

  • (1)
    • on-disk : ext2_super_block{}
    • on-mem: super_block{}
  • (2) additional info in super_block{} (include/linux/fs.h)
    • s_list : next superblock
    • s_dev: device number. which disk this superblock came from?
    • s_type: file system type?
    • s_op : operations on superblock
    • s_root : root directory of the file system of this superblock
    • s_files : linked list of file{} belonging to this file system
    • s_id : device name of this super block
  • (3) all cached superblocks form a linked list pointed to by "super_blocks" (fs/super.c)

2.2) caching inode

Individual inode is cached when accessed by the system.

  • (1)
    • on-disk : ext2_inode{}
    • on-mem: inode{} (include/linux/fs.h)
  • (2) additional info
    • i_list : next inode
    • i_dentry: corresponding dentry list for this inode
    • i_ino : inode number
    • i_rdev: device number; for a device special file, the device this inode represents (the device that holds the inode is i_sb->s_dev)
    • i_count: usage counter
    • i_op: operations on this inode
    • i_sb: pointer to super_block{} this inode belongs to
    • i_pipe: used if a pipe
  • (3) all cached inodes form a linked list pointed to by "inode_in_use" (fs/inode.c)
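A minimal user-space sketch of "cache on first access", assuming a singly linked list in place of the kernel's list_head. The names `toy_iget`, `toy_inode_in_use`, and the `toy_` structures are hypothetical. The point it shows is that an inode is identified by the pair (superblock, inode number), because inode numbers repeat across disks.

```c
#include <assert.h>
#include <stdlib.h>

/* Toy versions of the cached structures; only the fields discussed above. */
struct toy_sb { unsigned s_dev; };
struct toy_inode {
    struct toy_inode *i_next;  /* stands in for i_list */
    struct toy_sb *i_sb;
    unsigned long i_ino;
    int i_count;
};

static struct toy_inode *toy_inode_in_use; /* head of the cached-inode list */

/* Return the cached inode for (sb, ino), caching a new one on a miss. */
struct toy_inode *toy_iget(struct toy_sb *sb, unsigned long ino)
{
    struct toy_inode *in;
    assert(sb != NULL);
    for (in = toy_inode_in_use; in != NULL; in = in->i_next) {
        if (in->i_sb == sb && in->i_ino == ino) {
            in->i_count++;           /* cache hit: one more user */
            return in;
        }
    }
    in = calloc(1, sizeof *in);      /* cache miss: stands in for a disk read */
    if (in == NULL)
        return NULL;
    in->i_sb = sb;
    in->i_ino = ino;
    in->i_count = 1;
    in->i_next = toy_inode_in_use;   /* link into the in-use list */
    toy_inode_in_use = in;
    return in;
}
```

Asking twice for inode 2 of the same superblock returns the same object with i_count raised to 2, while inode 2 of another superblock gets its own entry.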

2.3) caching other blocks

  • (1) added info
    • a buffer_head{} structure is attached to each cached block: (include/linux/buffer_head.h)
    • b_blocknr : block number
    • b_bdev : device this block belongs to
    • b_size : block size
    • b_data : original block
  • (2) all cached blocks are attached to a hash table, "hash_table_array" (linux 2.4)

2.4) dentry table

  • (1) for each cached directory entry, dentry{} structure is defined
    • For example, when reading "/aa/bb", three dentry objects are created: one for "/", another for "aa", and the last for "bb".
  • (2) dentry{} (include/linux/dcache.h)
    • d_inode: pointer to the corresponding inode
    • d_op : operations on this dentry
    • d_mounted: this dentry is a mounting point if d_mounted > 0
    • d_name: corresponding file name (d_name.name is the actual file name)

2. mounting

All cached file systems are connected into one virtual file system through "mounting".

1) root file system: the first file system cached into the system

  • other file systems are mounted on this root file system

2) mount("/dev/x", "/y/z") or "mount /dev/x /y/z"

  • meaning: mount the file system in /dev/x on /y/z
    • mounted file system: /dev/x
    • mounting point: /y/z
  • mounting process:
    • cache the file system in /dev/x
      • cache superblock of /dev/x : sb
      • cache the root inode of /dev/x : rinode
      • sb->s_root = rinode
    • connect the new file system to the mounting point
      • d_mounted of /y/z += 1
      • allocate vfsmount{} and set
        • mnt_mountpoint = /y/z
        • mnt_root = rinode
        • mnt_sb = sb
      • insert this vfsmount{} into mount_hashtable

```c
struct vfsmount { // include/linux/mount.h. mounting info of this fs
    struct vfsmount *mnt_parent;   // parent vfsmount
    struct dentry *mnt_mountpoint; // mounting point
    struct dentry *mnt_root;       // root of this file system
    struct super_block *mnt_sb;    // super block of this file system
    char *mnt_devname;             // dev name
    .......
};
```
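The connecting step can be sketched with toy structures. Everything prefixed `toy_` is hypothetical, and a small array stands in for mount_hashtable. The sketch only shows how d_mounted and the (mnt_mountpoint, mnt_root) pair let a path walk jump from /y/z into the mounted file system.

```c
#include <assert.h>
#include <stddef.h>

/* Toy structures carrying only the fields named in the steps above. */
struct toy_dentry { const char *name; int d_mounted; };
struct toy_vfsmount {
    struct toy_dentry *mnt_mountpoint;
    struct toy_dentry *mnt_root;
};

#define TOY_MAX_MOUNTS 8
static struct toy_vfsmount toy_mounts[TOY_MAX_MOUNTS]; /* stands in for mount_hashtable */
static int toy_nr_mounts;

/* Attach the file system whose root is new_root onto the dentry mountpoint. */
int toy_mount(struct toy_dentry *mountpoint, struct toy_dentry *new_root)
{
    assert(mountpoint != NULL && new_root != NULL);
    if (toy_nr_mounts == TOY_MAX_MOUNTS)
        return -1;
    mountpoint->d_mounted += 1;
    toy_mounts[toy_nr_mounts].mnt_mountpoint = mountpoint;
    toy_mounts[toy_nr_mounts].mnt_root = new_root;
    toy_nr_mounts++;
    return 0;
}

/* During a path walk: if d is a mount point, continue from the mounted root. */
struct toy_dentry *toy_follow_mount(struct toy_dentry *d)
{
    int i;
    while (d->d_mounted > 0) {
        struct toy_dentry *next = NULL;
        for (i = toy_nr_mounts - 1; i >= 0; i--) { /* most recent mount wins */
            if (toy_mounts[i].mnt_mountpoint == d) {
                next = toy_mounts[i].mnt_root;
                break;
            }
        }
        if (next == NULL)
            break;
        d = next;
    }
    return d;
}
```

After `toy_mount(&yz, &xroot)`, a walk that reaches the dentry for /y/z continues from the root of the mounted file system, so the files of /dev/x appear under /y/z.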
      

3) example

Suppose we have two disks: dev0 and dev1. Suppose they have the file trees as below:

Assume dev0 is the root device (one which has the root file system).

(1) start_kernel() -> kernel_init() -> prepare_namespace()->mount_root()

  • mount_root() caches the root file system:
    • cache the superblock
    • cache the root inode

After this, the system has:

(2) "mount /dev/fd0 /d1"

  • cache the file system in /dev/fd0
    • cache the superblock of /dev/fd0
    • cache the root inode of /dev/fd0
  • cache the inode of /d1
    • cache the block of "/"
    • cache the inode of /d1
  • connect the root inode of /dev/fd0 to /d1

After caching the file system of /dev/fd0:

After caching the block of "/":

After caching the inode of "/d1" and connecting the new file system with this:

After mounting, the final tree looks like:

The above tree will look as below to the user:

3. process and file system

  • each process has "root" and "pwd" to access the root of the file system and the current working directory, respectively.
    • example
      • p1's root is what p1 thinks as "root"
      • p1's pwd is the current location of p1
      • when p1 says "/aa/bb", the system starts at p1's root for the search
      • when p1 says "aa/bb", the system starts at p1's pwd for the search
    • chroot() changes "root" to a "new root"
    • chdir() changes "pwd" to a "new pwd".
  • each process has "fd table" for file accessing
  • the system has "file table" to control the file accessing by a process
  • the on-mem file system is represented by inode_in_use, super_blocks, hash_table_array
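The root/pwd rule reduces to a one-line decision. In this sketch, strings stand in for the root and pwd dentries, and all `toy_` names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Toy per-process state: names standing in for the root and pwd dentries. */
struct toy_fs_struct {
    const char *root;
    const char *pwd;
};

/* Pick the directory where the search for path begins. */
const char *toy_walk_start(const struct toy_fs_struct *fs, const char *path)
{
    assert(fs != NULL && path != NULL);
    return path[0] == '/' ? fs->root : fs->pwd;
}

/* chroot() and chdir() only replace the corresponding field. */
void toy_chroot(struct toy_fs_struct *fs, const char *new_root)
{
    assert(fs != NULL);
    fs->root = new_root;
}

void toy_chdir(struct toy_fs_struct *fs, const char *new_pwd)
{
    assert(fs != NULL);
    fs->pwd = new_pwd;
}
```

Because the two fields are independent, chroot() does not move pwd and chdir() does not move root.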

1) file table

  • for each opened file, we have file{} structure (include/linux/fs.h)
    • f_list: next file{}
    • f_dentry: link to the inode (actually dentry{}) of this file
    • f_op : operations on this file{} (open, read, write, ...)
    • f_pos : file read/write pointer. shows how much has been read/written
    • f_count: number of links to this file{}
    • ..........
  • super_block{}->s_files contains a linked list of file{} for each file system

2) root, pwd, fd table

  • each process has (in task_struct) -- include/linux/sched.h
struct fs_struct  *fs;
struct files_struct  *files;
struct nsproxy    *nsproxy;  // namespace

struct nsproxy{ // include/linux/nsproxy.h
    struct mnt_namespace *mnt_ns;
    ......
};
struct mnt_namespace{ // include/linux/mnt_namespace.h
    struct vfsmount * root;   // vfsmount of this process
    .........
};
  • fs contains root, pwd info
struct fs_struct{  // include/linux/fs_struct.h
    struct path    root,  // the root inode of the file system
                    pwd;  // the present working directory
    .........
};
struct path { // include/linux/path.h
    struct vfsmount *mnt;
    struct dentry *dentry;
};
  • files contains fd table
struct files_struct{ // include/linux/file.h
    struct fdtable *fdt;
    ...........
};
struct fdtable{
    struct file **fd;  // fd table. file{} pointer array.
    .......
};
  • fork system call copies this fs, files structure, too; so, the child inherits the root, pwd, and fd table of the parent.

4. file system calls

1) open

x = open("/aa/bb", O_RDWR, 00777);
  • meaning: find the inode of /aa/bb and open it
  • algorithm:
    • find the inode of /aa/bb
    • cache into memory
    • connect to file table
      • allocate file{}, y, and insert it into the sb->s_files linked list (sb is the superblock of the file system that contains /aa/bb)
      • y->f_dentry = inode of /aa/bb
      • y->f_pos=0
    • find an empty entry in fd table, z, and link to y
      • fd[z] = y
    • return z
  • Example:
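A toy version of the last steps of open(), assuming the inode has already been found and cached. The fixed-size tables, the `f_inode` shortcut (the notes reach the inode through f_dentry), and all `toy_` names are assumptions for illustration.

```c
#include <assert.h>
#include <stddef.h>

#define TOY_NFILE 16  /* size of the system-wide file table (arbitrary) */
#define TOY_NFD    8  /* size of a per-process fd table (arbitrary) */

struct toy_inode { unsigned long i_ino; };
struct toy_file {
    struct toy_inode *f_inode; /* the notes reach this via f_dentry */
    long long f_pos;
    int f_count;               /* 0 means the slot is free */
};

static struct toy_file toy_file_table[TOY_NFILE];

/* Steps of open() after the inode has been found and cached. */
int toy_open(struct toy_file *fd[TOY_NFD], struct toy_inode *inode)
{
    int y, z;
    assert(fd != NULL && inode != NULL);
    for (z = 0; z < TOY_NFD && fd[z] != NULL; z++)
        ;                                   /* find an empty fd entry */
    for (y = 0; y < TOY_NFILE && toy_file_table[y].f_count != 0; y++)
        ;                                   /* find a free file{} */
    if (z == TOY_NFD || y == TOY_NFILE)
        return -1;
    toy_file_table[y].f_inode = inode;
    toy_file_table[y].f_pos = 0;
    toy_file_table[y].f_count = 1;
    fd[z] = &toy_file_table[y];             /* fd[z] = y */
    return z;
}
```

With fd 0, 1, and 2 already occupied, the first call returns 3 and the second returns 4. Opening the same inode twice yields two separate file{} objects, each with its own f_pos.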

2) read

y = read(x, buf, 10)
  • meaning: go to the file pointed to by fd[x] and read 10 bytes into buf with f_op->read()
  • algorithm:
    • go to file{} pointed to by fd[x]
    • go to inode{} pointed to by file{}->f_dentry
    • find the block location we want
    • find the block in hash_table_array
    • if not there, cache the block first
    • read max 10 bytes starting from file{}->f_pos into buf
    • increase file{}->f_pos by actual num of bytes read
    • return the actual num of bytes read
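A sketch of the position bookkeeping in read(), with the file contents held in memory so the block lookup and caching steps are skipped. The structures and `toy_read` are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Toy inode: file contents held in memory instead of in disk blocks. */
struct toy_inode { const char *data; long long i_size; };
struct toy_file  { struct toy_inode *f_inode; long long f_pos; };

/* Copy at most n bytes from the current position and advance f_pos. */
long toy_read(struct toy_file *f, char *buf, size_t n)
{
    long long left;
    assert(f != NULL && buf != NULL);
    left = f->f_inode->i_size - f->f_pos;    /* bytes left before end of file */
    if (left <= 0)
        return 0;                            /* at or past EOF: nothing read */
    if ((long long)n > left)
        n = (size_t)left;                    /* short read near EOF */
    memcpy(buf, f->f_inode->data + f->f_pos, n);
    f->f_pos += (long long)n;                /* advance by the actual count */
    return (long)n;
}
```

On a 15-byte file, three reads of 10 bytes return 10, 5, and 0, which is why the algorithm says "max 10 bytes" and "actual num of bytes read".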

3) write

y = write(x, buf, 10)
  • meaning: go to the file pointed to by fd[x], write max 10 bytes starting from the corresponding f_pos, increase f_pos by the actual num of bytes written, and return the actual num of bytes written.

4) close

close(x);
  • meaning: close the file pointed to by fd[x]
  • algorithm:
    • fd[x]=0
    • file{}->f_count-- , where file{} is the one pointed to by fd[x]

5) lseek

lseek(x, 20, 0)
  • meaning: set f_pos to 20, where f_pos is the file pointer of file x. The third argument 0 is SEEK_SET, so the offset is counted from the start of the file.
  • example:
x=open("/aa/bb", .......);  // open file /aa/bb
read(x, buf, 10);           // read first 10 bytes into "buf"
lseek(x, 50, SEEK_SET);     // move f_pos to offset 50
read(x, buf, 10);           // read 10 bytes starting from offset 50

6) dup

y = dup(x);
  • meaning: copy fd[x] into fd[y]
  • example:
x = open("/aa/bb", ........);  // fd[x] points to /aa/bb
y = dup(x);               // fd[y] also points to /aa/bb
read(x, buf, 10);    // read first 10 bytes
read(y, buf, 10);    // read next 10 bytes
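The example works because dup() copies a pointer, not the file{} itself. A toy sketch of dup() and close() on a per-process fd table follows; choosing the lowest empty slot is what POSIX specifies for dup(), while the structures and `toy_` names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

#define TOY_NFD 8
struct toy_file { long long f_pos; int f_count; };

/* Copy fd[x] into the lowest empty slot; both slots then share one file{}. */
int toy_dup(struct toy_file *fd[TOY_NFD], int x)
{
    int y;
    assert(fd != NULL);
    if (x < 0 || x >= TOY_NFD || fd[x] == NULL)
        return -1;
    for (y = 0; y < TOY_NFD; y++) {
        if (fd[y] == NULL) {
            fd[y] = fd[x];
            fd[y]->f_count++;    /* one more link to the same file{} */
            return y;
        }
    }
    return -1;
}

/* close(x): drop one reference on the file{}, then clear the slot. */
int toy_close(struct toy_file *fd[TOY_NFD], int x)
{
    assert(fd != NULL);
    if (x < 0 || x >= TOY_NFD || fd[x] == NULL)
        return -1;
    fd[x]->f_count--;
    fd[x] = NULL;
    return 0;
}
```

Since fd[x] and fd[y] point to the same file{}, advancing f_pos through one descriptor is visible through the other. Closing one of them only lowers f_count, and the file{} stays in use through the remaining descriptor.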

7) link

y = link("/aa/bb", "/aa/newbb");
  • meaning: /aa/newbb is now pointing to the same file as /aa/bb
  • algorithm:
    • make file newbb in /aa directory
    • give it the same inode as /aa/bb
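The two steps can be sketched with a toy directory that maps names to inodes. The fixed-size `toy_dir` and all `toy_` names are hypothetical, and `i_nlink` is modeled as a plain counter of names.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define TOY_MAX_ENTRIES 8
struct toy_inode { unsigned long i_ino; unsigned i_nlink; };
struct toy_entry { const char *name; struct toy_inode *inode; };
struct toy_dir   { struct toy_entry e[TOY_MAX_ENTRIES]; int n; };

/* Return the inode a name refers to, or NULL if the name is absent. */
static struct toy_inode *toy_find(const struct toy_dir *d, const char *name)
{
    int i;
    for (i = 0; i < d->n; i++)
        if (strcmp(d->e[i].name, name) == 0)
            return d->e[i].inode;
    return NULL;
}

/* Add newname to the directory, pointing at the inode oldname already uses. */
int toy_link(struct toy_dir *d, const char *oldname, const char *newname)
{
    struct toy_inode *in;
    assert(d != NULL && oldname != NULL && newname != NULL);
    in = toy_find(d, oldname);
    if (in == NULL || toy_find(d, newname) != NULL || d->n == TOY_MAX_ENTRIES)
        return -1;
    d->e[d->n].name = newname;
    d->e[d->n].inode = in;       /* same inode, no data is copied */
    d->n++;
    in->i_nlink++;               /* the inode now has one more name */
    return 0;
}
```

No file data is copied: after the call the directory holds two names for one inode, which is why both names report the same inode number.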

5. homework

1) Your Gentoo Linux has two disk partitions: /dev/sda3 and /dev/sda1. Which one is the root file system? Where is the mounting point for the other one? Use mount command to answer this.

$ mount

/dev/sda3 is mounted on / and /dev/sda1 is mounted on /boot. Therefore /dev/sda3 is the root file system partition and /dev/sda1 is the boot partition.

1-1) Redo 1) after mounting myfd to temp directory as you did in hw3 in lecture6-fs.docx.

$ mkdir temp
$ mount -o loop myfd temp # connect myfd to temp directory, which is called mounting
$ mount

The output shows that /root/linux-2.6.25.10/myfd is now additionally mounted on /root/linux-2.6.25.10/temp.

2) Add another entry in /boot/grub/grub.conf as below. This boot selection does not use initrd directive to prevent initramfs loading (initramfs is a temporary in-ram file system used for performance improvement).

An entry was added to /boot/grub/grub.conf as below.

$ vi /boot/grub/grub.conf
title=MyLinux3
root (hd0,0)
kernel /boot/bzImage root=/dev/sda3

Then the kernel was compiled and the machine rebooted.

$ cd linux-2.6.25.10
$ make bzImage
$ cp arch/x86/boot/bzImage /boot/bzImage
$ reboot

After rebooting, My Linux3 was selected.

3) The kernel calls mount_root to cache the root file system. Starting from start_kernel, find out the chain of intermediate functions that eventually calls mount_root. Confirm your prediction by printing out a message at each intermediate function of this chain until you reach mount_root().

init/main.c - start_kernel :

start_kernel calls rest_init.

init/main.c - rest_init :

rest_init starts kernel_init (as a kernel thread).

init/main.c - kernel_init :

kernel_init calls prepare_namespace, which is in init/do_mounts.c.

init/do_mounts.c - prepare_namespace :

prepare_namespace calls mount_root.

init/do_mounts.c - mount_root :

mount_root caches the root file system.

4) Find the data type for each added variable for super_block, inode, buffer_head, and dentry.

include/linux/fs.h:

struct super_block {
    struct list_head s_list; /* Keep this first */
    dev_t s_dev;             /* search index; _not_ kdev_t */
    unsigned long s_blocksize;
    unsigned char s_blocksize_bits;
    unsigned char s_dirt;
    unsigned long long s_maxbytes; /* Max file size */
    struct file_system_type *s_type;
    const struct super_operations *s_op;
    struct dquot_operations *dq_op;
    struct quotactl_ops *s_qcop;
    const struct export_operations *s_export_op;
    unsigned long s_flags;
    unsigned long s_magic;
    struct dentry *s_root;
    struct rw_semaphore s_umount;
    struct mutex s_lock;
    ...
};

struct inode {
    struct hlist_node i_hash;
    struct list_head i_list;
    struct list_head i_sb_list;
    struct list_head i_dentry;
    unsigned long i_ino;
    atomic_t i_count;
    unsigned int i_nlink;
    uid_t i_uid;
    gid_t i_gid;
    dev_t i_rdev;
    u64 i_version;
    loff_t i_size;
#ifdef __NEED_I_SIZE_ORDERED
    seqcount_t i_size_seqcount;
#endif
    struct timespec i_atime;
    struct timespec i_mtime;
    struct timespec i_ctime;
    unsigned int i_blkbits;
    blkcnt_t i_blocks;
    unsigned short i_bytes;
    umode_t i_mode;
    spinlock_t i_lock; /* i_blocks, i_bytes, maybe i_size */
    struct mutex i_mutex;
    ...
};

include/linux/buffer_head.h:

struct buffer_head {
    unsigned long b_state;           /* buffer state bitmap (see above) */
    struct buffer_head *b_this_page; /* circular list of page's buffers */
    struct page *b_page;             /* the page this bh is mapped to */

    sector_t b_blocknr; /* start block number */
    size_t b_size;      /* size of mapping */
    char *b_data;       /* pointer to data within the page */

    struct block_device *b_bdev;
    bh_end_io_t *b_end_io;             /* I/O completion */
    void *b_private;                   /* reserved for b_end_io */
    struct list_head b_assoc_buffers;  /* associated with another mapping */
    struct address_space *b_assoc_map; /* mapping this buffer is associated with */
    atomic_t b_count;                  /* users using this buffer_head */
};

include/linux/dcache.h:

struct dentry {
    atomic_t d_count;
    unsigned int d_flags;  /* protected by d_lock */
    spinlock_t d_lock;     /* per dentry lock */
    struct inode *d_inode; /* Where the name belongs to - NULL is negative */
    /*
     * The next three fields are touched by __d_lookup.  Place them here
     * so they all fit in a cache line.
     */
    struct hlist_node d_hash; /* lookup hash list */
    struct dentry *d_parent;  /* parent directory */
    struct qstr d_name;

    struct list_head d_lru; /* LRU list */
    /*
     * d_child and d_rcu can share memory
     */
    union {
        struct list_head d_child; /* child of parent list */
        struct rcu_head d_rcu;
    } d_u;
    struct list_head d_subdirs; /* our children */
    struct list_head d_alias;   /* inode alias list */
    unsigned long d_time;       /* used by d_revalidate */
    struct dentry_operations *d_op;
    struct super_block *d_sb; /* The root of the dentry tree */
    void *d_fsdata;           /* fs-specific data */
#ifdef CONFIG_PROFILING
    struct dcookie_struct *d_cookie; /* cookie, if any */
#endif
    int d_mounted;
    unsigned char d_iname[DNAME_INLINE_LEN_MIN]; /* small names */
};

5) Change the kernel such that it displays all superblocks before it calls mount_root and after mount_root. Boot with "My Linux3" to see what happens.

To display all superblocks, the code below was added before the definition of prepare_namespace.

void display_superblocks(void)
{
    struct super_block *sb;

    /* walk the global list of cached superblocks (super_blocks, fs/super.c) */
    list_for_each_entry(sb, &super_blocks, s_list) {
        /* i_ino is unsigned long, so it is printed with %lu */
        printk("dev name:%s dev maj num:%d dev minor num:%d root ino:%lu\n",
                sb->s_id, MAJOR(sb->s_dev), MINOR(sb->s_dev),
                sb->s_root->d_inode->i_ino);
    }
}

Then, inside prepare_namespace, display_superblocks() was called immediately before and after the call to mount_root.

To apply the change, the kernel was compiled and rebooted, and the boot messages were checked.

$ make bzImage
$ cp arch/x86/boot/bzImage /boot/bzImage
$ reboot
# Boot with "My Linux3"
$ dmesg > x
$ vi x

After mount_root is called, "dev name: sda3, dev major num: 8, dev minor num: 3, root ino: 2" is additionally printed.

A device number uniquely identifies a device. The device file names are listed under "/dev", and ls -l shows the major and minor number of each device. The major number identifies the device type, and the minor number distinguishes individual devices within that type. See "Documentation/devices.txt" for details.
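As a sketch of how one device number carries both values: the macros below mirror the bit layout of the kernel's include/linux/kdev_t.h (20 low bits for the minor number, the remaining high bits for the major number). They carry a `TOY_` prefix because this is a user-space copy for illustration, and `toy_split_dev` is a hypothetical helper.

```c
#include <assert.h>
#include <stddef.h>

/* Same bit layout as the kernel's include/linux/kdev_t.h, renamed with a
 * TOY_ prefix because this is a user-space copy for illustration. */
#define TOY_MINORBITS 20
#define TOY_MINORMASK ((1U << TOY_MINORBITS) - 1)
#define TOY_MAJOR(dev) ((unsigned int)((dev) >> TOY_MINORBITS))
#define TOY_MINOR(dev) ((unsigned int)((dev) & TOY_MINORMASK))
#define TOY_MKDEV(ma, mi) (((unsigned int)(ma) << TOY_MINORBITS) | (mi))

/* Split a device number into its major and minor parts. */
void toy_split_dev(unsigned int dev, unsigned int *major, unsigned int *minor)
{
    assert(major != NULL && minor != NULL);
    *major = TOY_MAJOR(dev);
    *minor = TOY_MINOR(dev);
}
```

For the sda3 line printed above, `TOY_MKDEV(8, 3)` packs major 8 and minor 3 into one value, and `toy_split_dev` recovers the pair.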

6) Change the kernel such that it displays all cached inodes before it calls mount_root and after mount_root. Boot with "My Linux3" to see what happens.

To display all cached inodes, use the code below.

extern struct list_head inode_in_use;

void display_all_inodes(void)
{
    struct inode *in;

    /* walk the list of in-use cached inodes (inode_in_use, fs/inode.c) */
    list_for_each_entry(in, &inode_in_use, i_list) {
        /* i_ino is unsigned long, so it is printed with %lu */
        printk("dev maj num:%d dev minor num:%d inode num:%lu sb dev:%s\n",
               MAJOR(in->i_rdev), MINOR(in->i_rdev), in->i_ino, in->i_sb->s_id);
    }
}

6-1) Modify display_all_inodes such that it can also display the file name and file byte size of each file represented by the inode.

6-2) Make a system call that displays file name and file byte size of all inodes in use. Show only the first 100 files. Look at the result with dmesg command.

6-3) Modify your system call in 6-2) so that it can display mounting points. Mount myfd to temp directory and confirm your system call can detect it.

7) The pid=1 process (kernel_init) eventually execs to /sbin/init with run_init_process("/sbin/init"); by calling kernel_execve("/sbin/init", ....) in init/main.c/init_post(). Change the kernel such that it execs to /bin/sh. Boot the kernel, and you will find you cannot access /boot/grub/grub.conf. Explain why.

init/main.c :

When the kernel is loaded, it initializes and configures hardware such as memory, processors, and I/O. It reads the compressed initramfs image from a predetermined location in memory, unpacks it directly into "/sysroot/", and loads all necessary drivers. The kernel then creates the root device, mounts the root partition read-only, and frees unused memory.

Once the kernel is loaded, it runs the "/sbin/init" program to set up the user environment. "/sbin/init" is the top-level process (pid = 1); it directs the rest of the boot process and sets up the environment for the user.

"/sbin/init" checks the structure of the file systems, mounts them, starts server daemons, waits for user logins, and so on. If "/bin/sh" is executed instead of "/sbin/init", none of this happens, so "/dev/sda1" is never mounted on "/boot" and /boot/grub/grub.conf cannot be accessed.

8) Try following code. Make /aa/bb and type some text with length longer than 50 bytes. Explain the result.

$ cd /      # go to /
$ mkdir aa  # create the /aa directory
$ cd aa     # move into aa
$ vi bb     # create /aa/bb with vi

/aa/bb

$ vi ex1.c

ex1.c :

#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>

int main(void)
{
    char buf[100];
    int x = open("/aa/bb", O_RDONLY, 00777);
    int y = read(x, buf, 10);
    buf[y] = '\0';
    printf("we read %s\n", buf);

    lseek(x, 20, SEEK_SET);
    y = read(x, buf, 10);
    buf[y] = '\0';
    printf("we read %s\n", buf);

    int x1 = dup(x);
    y = read(x1, buf, 10);
    buf[y] = '\0';
    printf("we read %s\n", buf);

    link("/aa/bb", "/aa/newbb");
    int x2 = open("/aa/newbb", O_RDONLY, 00777);
    y = read(x2, buf, 10);
    buf[y] = '\0';
    printf("we read %s\n", buf);

    return 0;
}

At the first printf, buf holds the first 10 bytes read from /aa/bb, so "0123456789" is printed.

Before the second printf, lseek with SEEK_SET sets the file position of x to offset 20 from the start of the file. buf then holds the 10 bytes read starting at byte 20 of the file, so "9876543210" is printed.

Before the third printf, x1 is duplicated from x with dup, so both descriptors share one f_pos. buf holds the 10 bytes read starting right after the position where the second read stopped, so "klmnopqrst" is printed.

Before the fourth printf, link makes /aa/newbb refer to the same file as /aa/bb. Opening /aa/newbb creates a new file{} whose f_pos starts at 0, so buf holds the first 10 bytes of the file and "0123456789" is printed.

9) Check the inode number of /aa/bb and /aa/newbb and confirm they are same.

$ ls -i /aa/*

The inode number of both /aa/bb and /aa/newbb is "502947", which confirms that they are the same.

10) Try fork() and confirm the parent and child can access the same file.

ex2.c :

#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>

int main(void)
{
    char buf[100];
    int x = open("/aa/bb", O_RDONLY, 00777);
    int y = fork();
    int z;

    if (y == 0)
    {
        z = read(x, buf, 10);
        buf[z] = '\0';
        printf("child read %s\n", buf);
    }
    else
    {
        z = read(x, buf, 10);
        buf[z] = '\0';
        printf("parent read %s\n", buf);
    }

    return 0;
}

The output confirms that the parent and the child access the same file. When a process forks, the fd table is copied, so x in both processes points to the same file{} and the two processes share its f_pos. Therefore the parent continues reading from where the child stopped.

11) (Using chroot and chdir) Do following and explain the result of ex1.

a. Make f1 in several places with different content (in /, in /root, and in /root/d1) as follows.

$ cd  /
$ echo hello1 > f1
$ cd
$ echo hello2 > f1
$ mkdir d1
$ echo hello3 > d1/f1

b. Make ex3.c that will display "/f1" before and after chroot, and "f1" before and after chdir as follows.

ex3.c :

#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>

void display_root_f1(void) // display the content of "/f1"
{
    char buf[100];
    int x = open("/f1", O_RDONLY);
    int y = read(x, buf, 100);
    buf[y] = '\0';
    printf("%s\n", buf);
}

void display_f1(void)      // display the content of "f1"
{
    char buf[100];
    int x = open("f1", O_RDONLY);
    int y = read(x, buf, 100);
    buf[y] = '\0';
    printf("%s\n", buf);
}

int main(void)
{
    display_root_f1(); // display the content of "/f1"
    chroot(".");
    display_root_f1(); // display the content of "/f1"
    display_f1();      // display the content of "f1"
    chdir("d1");
    display_f1();      // display the content of "f1"
    return 0;
}

  • The first display_root_f1 shows the content of the f1 that was created after cd /, so hello1 is printed.
  • chroot(".") changes the root to the current directory, which is the home directory.
  • When display_root_f1 runs again after the root has changed, the current directory is the root, so the content of the f1 in the current directory is printed, which is hello2.
  • The first display_f1 prints the content of the f1 in the current directory, so hello2 is printed again.
  • After chdir("d1") changes the current directory to d1, display_f1 prints the f1 created inside d1, so hello3 is printed.

12) Make a new system call, my_show_fpos(), which will display the current process ID and the file position for fd=3 and fd=4 of the current process. Use this system call to examine file position as follows. (Use %lld to print the file position since f_pos is long long integer)

arch/x86/kernel/syscall_table_32.S :

The my_show_fpos system call is registered as number 56.

fs/read_write.c :

asmlinkage void my_show_fpos(void)
{
    printk("fd=3, f_pos=%lld\n", current->files->fdt->fd[3]->f_pos);
    printk("fd=4, f_pos=%lld\n", current->files->fdt->fd[4]->f_pos);
}

ex4.c :

#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>

void my_show_fpos()
{
    syscall(56);
}

int main(void)
{
    char buf[25];
    int x = open("f1", O_RDONLY);
    int y = open("f2", O_RDONLY);

    my_show_fpos(); // f_pos right after opening two files
    read(x, buf, 10);
    read(y, buf, 20);
    my_show_fpos(); // f_pos after reading some bytes

    return 0;
}

$ make bzImage
$ cp arch/x86/boot/bzImage /boot/bzImage
$ reboot
# after reboot
$ echo 8 > /proc/sys/kernel/printk
$ ./ex4

x and y are file descriptors 3 and 4, respectively. Since 10 bytes were read from x and 20 bytes from y, f_pos for fd 3 went from 0 to 10 and f_pos for fd 4 went from 0 to 20.

13) Modify your my_show_fpos() such that it also displays the address of f_op->read and f_op->write function for fd 0, fd 1, fd 2, fd 3, and fd 4, respectively. Find the corresponding function names in System.map. Why the system uses different functions for fd 0, 1, 2 and fd 3 or 4?

fs/read_write.c :

asmlinkage void my_show_fpos(void)
{
    int i; /* declared first: kernel code avoids declarations after statements */

    printk("fd=3, f_pos=%lld\n", current->files->fdt->fd[3]->f_pos);
    printk("fd=4, f_pos=%lld\n", current->files->fdt->fd[4]->f_pos);

    // Update: print the f_op->read and f_op->write addresses for fd 0..4
    for (i = 0; i < 5; i++) {
        printk("fd=%d, read=%p\n", i, current->files->fdt->fd[i]->f_op->read);
        printk("fd=%d, write=%p\n", i, current->files->fdt->fd[i]->f_op->write);
    }
}

$ make bzImage
$ cp arch/x86/boot/bzImage /boot/bzImage
$ reboot
# after reboot
$ echo 8 > /proc/sys/kernel/printk
$ ./ex4

The system call now prints the addresses of the read and write functions. Looking up the printed addresses in System.map in the Linux source tree gives the corresponding function names. System.map is generated in the Linux source directory on every compile.

14) Use my_show_fpos() to explain the result of the following code. File f1 has "ab" and File f2 has "q". When you run the program, File f2 will have "ba". Explain why f2 has "ba" after the execution.

#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>

void my_show_fpos()
{
    syscall(56);
}

int main(void)
{
    char buf[10];
    int f1 = open("./f1", O_RDONLY, 00777);
    int f2 = open("./f2", O_WRONLY, 00777);
    printf("f1 and f2 are %d %d\n", f1, f2); // make sure they are 3 and 4
    if (fork() == 0)
    {
        my_show_fpos();
        read(f1, buf, 1);
        sleep(2);
        my_show_fpos();
        write(f2, buf, 1);
    }
    else
    {
        sleep(1);
        my_show_fpos();
        read(f1, buf, 1);
        write(f2, buf, 1);
    }

    return 0;
}



fork produces two processes that share f_pos. The child runs first and prints the initial state of f1 and f2; both f_pos values are 0. It then reads one byte from "f1" into buf, so the child's buf now holds ['a'].

While the child sleeps for 2 seconds, the parent prints the state of f1 and f2, which shows that the f_pos of f1 has increased by the number of bytes read. The parent reads one more byte into buf, so its buf holds ['b']. Local variables such as buf are not shared between the two processes.

The parent writes its buf to "f2", and one second later the child writes its buf to "f2". Because the f_pos of f2 is shared as well, the child's byte lands at offset 1, right after the parent's byte, so "f2" becomes "ba".
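The interleaving can be replayed in a single process. In this sketch each toy file{} carries the f_pos that both processes share after fork(), while the two buf variables stay private. All `toy_` names are hypothetical, and the order of calls is the one the sleep() calls make likely, not a guarantee.

```c
#include <assert.h>
#include <stddef.h>

/* One toy file{}: contents, length, and the f_pos shared after fork(). */
struct toy_file { char data[8]; long long size; long long f_pos; };

/* Read one byte at f_pos; returns the number of bytes read (0 at EOF). */
static long toy_read1(struct toy_file *f, char *c)
{
    if (f->f_pos >= f->size)
        return 0;
    *c = f->data[f->f_pos++];
    return 1;
}

/* Write one byte at f_pos, growing the file if the write passes its end. */
static void toy_write1(struct toy_file *f, char c)
{
    assert(f->f_pos < (long long)sizeof f->data);
    f->data[f->f_pos++] = c;
    if (f->f_pos > f->size)
        f->size = f->f_pos;
}

/* Replay the interleaving that the sleep() calls in the program produce. */
void toy_replay(struct toy_file *f1, struct toy_file *f2)
{
    char child_buf = 0, parent_buf = 0;  /* buf is private to each process */
    assert(f1 != NULL && f2 != NULL);
    (void)toy_read1(f1, &child_buf);     /* child reads:   f1 f_pos 0 -> 1 */
    (void)toy_read1(f1, &parent_buf);    /* parent reads:  f1 f_pos 1 -> 2 */
    toy_write1(f2, parent_buf);          /* parent writes: f2 f_pos 0 -> 1 */
    toy_write1(f2, child_buf);           /* child writes:  f2 f_pos 1 -> 2 */
}
```

Starting from f1 = "ab" and f2 = "q", the replay leaves "ba" in f2: the parent's 'b' overwrites 'q' at offset 0 and the child's 'a' follows at offset 1.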

15) Find corresponding kernel code for each step below in open and read system calls:

  • x=open(fpath, .......);
      1. find empty fd
      2. search the inode for "fpath"
        • 2-1) if "fpath" starts with "/", start from "fs->root" of the current process
        • 2-2) otherwise, start from "fs->pwd"
        • 2-3) visit each directory in "fpath" to find the inode of the "fpath"
        • 2-4) at each step, if the directory is a mounting point, follow it into the mounted file system
      3. find empty file{} entry and fill in relevant information
      4. chaining
      5. return fd
  • read(x, buf, n);
      1. go to the inode for x
      2. read n bytes starting from the current file position
      3. save the data in buf
      4. increase the file position by n

16) Make a file, /f1. Write some text in it.

$ cd /
$ vi f1
..........
$

Try to read this file before "mount_root", after "mount_root", after sys_mount(".", "/", ...), and after sys_chroot(".") in init/do_mounts.c/prepare_namespace(). Explain what happens and why. For this problem, the kernel_init process should exec to /sbin/init.