Lab link

Part A: User Environments and Exception Handling

inc/env.h defines the Env structure that describes a user environment:

struct Env {
  struct Trapframe env_tf;	// Saved registers
  struct Env *env_link;		// Next free Env
  envid_t env_id;			// Unique environment identifier
  envid_t env_parent_id;		// env_id of this env's parent
  enum EnvType env_type;		// Indicates special system environments
  unsigned env_status;		// Status of the environment
  uint32_t env_runs;		// Number of times environment has run

  // Address space
  pde_t *env_pgdir;		// Kernel virtual address of page dir
};

env_status can take the following values:

enum {
  ENV_FREE = 0,  // Indicates that the Env structure is inactive,
                 // and therefore on the env_free_list.
  ENV_DYING,  // Indicates that the Env structure represents
              // a zombie environment. A zombie environment
              // will be freed the next time it traps to the kernel.
              // We will not use this flag until Lab 4.
  ENV_RUNNABLE,  // Indicates that the Env structure represents
                 // an environment that is waiting to run on the processor.
  ENV_RUNNING,  // Indicates that the Env structure represents the
                // currently running environment.
  ENV_NOT_RUNNABLE  // Indicates that the Env structure represents a
                    // currently active environment, but it is not
                    // currently ready to run: for example, because
                    // it is waiting for an interprocess communication (IPC)
                    // from another environment.
};

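All environments live in the envs array; curenv points to the one currently running, and the free ones are chained together through env_free_list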

struct Env *envs = NULL;		// All environments
struct Env *curenv = NULL;		// The current env
static struct Env *env_free_list;	// Free environment list
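
To make the relationship between the array and the free list concrete, here is a minimal user-space sketch. It is not JOS code: DemoEnv, NENV_DEMO, demo_env_init() and demo_env_alloc() are hypothetical names, and the table has only four slots. It shows that walking the array backwards while inserting at the head leaves the list in array order.

```c
#include <assert.h>
#include <stddef.h>

#define NENV_DEMO 4            /* hypothetical small table; JOS uses NENV */

struct DemoEnv {
    int id;                    /* stands in for env_id */
    struct DemoEnv *link;      /* stands in for env_link */
};

static struct DemoEnv demo_envs[NENV_DEMO];
static struct DemoEnv *demo_free_list;

/* Push every slot onto the free list, walking the array backwards so
 * that the list ends up in array order (slot 0 first). */
static void demo_env_init(void)
{
    demo_free_list = NULL;
    for (int i = NENV_DEMO - 1; i >= 0; --i) {
        demo_envs[i].id = 0;
        demo_envs[i].link = demo_free_list;
        demo_free_list = &demo_envs[i];
    }
}

/* Pop the first free slot, or return NULL when the table is exhausted. */
static struct DemoEnv *demo_env_alloc(void)
{
    struct DemoEnv *e = demo_free_list;
    if (e != NULL)
        demo_free_list = e->link;
    return e;
}
```

After demo_env_init(), successive calls to demo_env_alloc() hand out slot 0, then slot 1, and so on, and return NULL once every slot is taken.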

Allocating the Environments Array

Exercise 1

Following the comments, allocate and zero the envs array, then map it read-only at UENVS

void
mem_init(void)
{
  // ...
  //////////////////////////////////////////////////////////////////////
  // Make 'envs' point to an array of size 'NENV' of 'struct Env'.
  // LAB 3: Your code here.
  envs = (struct Env*)boot_alloc(NENV * sizeof(struct Env));
  memset(envs, 0, NENV * sizeof(struct Env));

  // ...

  //////////////////////////////////////////////////////////////////////
  // Map the 'envs' array read-only by the user at linear address UENVS
  // (ie. perm = PTE_U | PTE_P).
  // Permissions:
  //    - the new image at UENVS  -- kernel R, user R
  //    - envs itself -- kernel RW, user NONE
  // LAB 3: Your code here.
  boot_map_region(kern_pgdir, UENVS, PTSIZE, PADDR(envs), PTE_U|PTE_P);

  // ...
}

After rebuilding and running, the kernel reports this error:

kernel panic at kern/env.c:154: PADDR called with invalid kva 00000000

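Something had been inexplicably wiped out by memset(). I assumed the bug was in my own code and spent a long time debugging before a fix posted by another developer solved it: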

Modify the .bss section in kern/kernel.ld

.bss : {
    PROVIDE(edata = .);
    *(.dynbss)
    *(.bss .bss.*)
    *(COMMON)
    PROVIDE(end = .);
}

After rebuilding and running again, the output looks like this:

check_page_free_list() succeeded!
check_page_alloc() succeeded!
check_page() succeeded!
check_kern_pgdir() succeeded!
check_page_free_list() succeeded!
check_page_installed_pgdir() succeeded!
kernel panic at kern/env.c:461: env_run not yet implemented

Creating and Running Environments

JOS has no file system yet, so for now user programs are embedded into the kernel as ELF executable images

Exercise 2

  • env_init()

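    Following the comments, initialize every entry of envs and insert it into env_free_list. The free list must keep the same order as the array, so the loop walks the array backwards and inserts each entry at the head of the list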

    // Mark all environments in 'envs' as free, set their env_ids to 0,
    // and insert them into the env_free_list.
    // Make sure the environments are in the free list in the same order
    // they are in the envs array (i.e., so that the first call to
    // env_alloc() returns envs[0]).
    //
    void
    env_init(void)
    {
      // Set up envs array
      // LAB 3: Your code here.
      int i;
      env_free_list = NULL;
      for (i = NENV - 1; i >= 0; --i) {
        envs[i].env_id = 0;
        envs[i].env_status = ENV_FREE;
        envs[i].env_link = env_free_list;
        env_free_list = &envs[i];
      }
    
      // Per-CPU part of the initialization
      env_init_percpu();
    }
  • env_setup_vm()

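    env_setup_vm() sets up a page directory for the new environment

    The comment hints that kern_pgdir can serve as a template, so copying it over is enough; only the UVPT entry is then pointed at the environment's own page directory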

    //
    // Initialize the kernel virtual memory layout for environment e.
    // Allocate a page directory, set e->env_pgdir accordingly,
    // and initialize the kernel portion of the new environment's address space.
    // Do NOT (yet) map anything into the user portion
    // of the environment's virtual address space.
    //
    // Returns 0 on success, < 0 on error.  Errors include:
    //  -E_NO_MEM if page directory or table could not be allocated.
    //
    static int
    env_setup_vm(struct Env *e)
    {
      int i;
      struct PageInfo *p = NULL;
    
      // Allocate a page for the page directory
      if (!(p = page_alloc(ALLOC_ZERO)))
        return -E_NO_MEM;
    
      // Now, set e->env_pgdir and initialize the page directory.
      //
      // Hint:
      //    - The VA space of all envs is identical above UTOP
      //    (except at UVPT, which we've set below).
      //    See inc/memlayout.h for permissions and layout.
      //    Can you use kern_pgdir as a template?  Hint: Yes.
      //    (Make sure you got the permissions right in Lab 2.)
      //    - The initial VA below UTOP is empty.
      //    - You do not need to make any more calls to page_alloc.
      //    - Note: In general, pp_ref is not maintained for
      //    physical pages mapped only above UTOP, but env_pgdir
      //    is an exception -- you need to increment env_pgdir's
      //    pp_ref for env_free to work correctly.
      //    - The functions in kern/pmap.h are handy.
    
      // LAB 3: Your code here.
      e->env_pgdir = (pde_t *)page2kva(p);
      p->pp_ref++;
      memcpy(e->env_pgdir, kern_pgdir, PGSIZE);
    
      // UVPT maps the env's own page table read-only.
      // Permissions: kernel R, user R
      e->env_pgdir[PDX(UVPT)] = PADDR(e->env_pgdir) | PTE_P | PTE_U;
    
      return 0;
    }
  • region_alloc()

    region_alloc() allocates physical pages and maps them into the environment's address space

    //
    // Allocate len bytes of physical memory for environment env,
    // and map it at virtual address va in the environment's address space.
    // Does not zero or otherwise initialize the mapped pages in any way.
    // Pages should be writable by user and kernel.
    // Panic if any allocation attempt fails.
    //
    static void
    region_alloc(struct Env *e, void *va, size_t len)
    {
      // LAB 3: Your code here.
      // (But only if you need it for load_icode.)
      //
      // Hint: It is easier to use region_alloc if the caller can pass
      //   'va' and 'len' values that are not page-aligned.
      //   You should round va down, and round (va + len) up.
      //   (Watch out for corner-cases!)
    
      uintptr_t va_start = ROUNDDOWN((uintptr_t)va, PGSIZE);
      uintptr_t va_end = ROUNDUP((uintptr_t)(va + len), PGSIZE);
      uintptr_t cur_va;
    
      for (cur_va = va_start; cur_va < va_end; cur_va += PGSIZE) {
        struct PageInfo *p;
    
        if (!(p = page_alloc(ALLOC_ZERO))) {
          panic("region_alloc: page_alloc failed.\n");
        }
    
        if (page_insert(e->env_pgdir, p, (void *)cur_va, PTE_U | PTE_W | PTE_P)) {
          panic("region_alloc: page_insert failed.\n");
        }
      }
    
    }
  • load_icode()

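    The overall flow mainly follows bootmain() in boot/main.c and the post "lab3 抢占式调度" (Lab 3: preemptive scheduling). The program segments have to be copied into the user environment's address space, and the comments on env_run() and env_pop_tf() show that lcr3() is what switches page directories, so load_icode() loads e->env_pgdir before copying and restores kern_pgdir afterwards

    Finally, the environment's entry point must be set; this is adapted from the corresponding statement in bootmain()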

    //
    // Set up the initial program binary, stack, and processor flags
    // for a user process.
    // This function is ONLY called during kernel initialization,
    // before running the first user-mode environment.
    //
    // This function loads all loadable segments from the ELF binary image
    // into the environment's user memory, starting at the appropriate
    // virtual addresses indicated in the ELF program header.
    // At the same time it clears to zero any portions of these segments
    // that are marked in the program header as being mapped
    // but not actually present in the ELF file - i.e., the program's bss section.
    //
    // All this is very similar to what our boot loader does, except the boot
    // loader also needs to read the code from disk.  Take a look at
    // boot/main.c to get ideas.
    //
    // Finally, this function maps one page for the program's initial stack.
    //
    // load_icode panics if it encounters problems.
    //  - How might load_icode fail?  What might be wrong with the given input?
    //
    static void
    load_icode(struct Env *e, uint8_t *binary)
    {
      // Hints:
      //  Load each program segment into virtual memory
      //  at the address specified in the ELF segment header.
      //  You should only load segments with ph->p_type == ELF_PROG_LOAD.
      //  Each segment's virtual address can be found in ph->p_va
      //  and its size in memory can be found in ph->p_memsz.
      //  The ph->p_filesz bytes from the ELF binary, starting at
      //  'binary + ph->p_offset', should be copied to virtual address
      //  ph->p_va.  Any remaining memory bytes should be cleared to zero.
      //  (The ELF header should have ph->p_filesz <= ph->p_memsz.)
      //  Use functions from the previous lab to allocate and map pages.
      //
      //  All page protection bits should be user read/write for now.
      //  ELF segments are not necessarily page-aligned, but you can
      //  assume for this function that no two segments will touch
      //  the same virtual page.
      //
      //  You may find a function like region_alloc useful.
      //
      //  Loading the segments is much simpler if you can move data
      //  directly into the virtual addresses stored in the ELF binary.
      //  So which page directory should be in force during
      //  this function?
      //
      //  You must also do something with the program's entry point,
      //  to make sure that the environment starts executing there.
      //  What?  (See env_run() and env_pop_tf() below.)
    
      // LAB 3: Your code here.
      struct Proghdr *ph, *eph;
      struct Elf *ELFHDR = (struct Elf *)binary;
    
      // is this a valid ELF?
      if (ELFHDR->e_magic != ELF_MAGIC) {
        panic("load_icode: ELF binary image error.");
      }
    
      // load each program segment (ignores ph flags)
      ph = (struct Proghdr *)((uint8_t *)ELFHDR + ELFHDR->e_phoff);
      eph = ph + ELFHDR->e_phnum;
    
      // switch to the environment's address space so segments can be copied to ph->p_va directly
      lcr3(PADDR(e->env_pgdir));
    
      for (; ph < eph; ph++) {
        // load segments with ph->p_type == ELF_PROG_LOAD.
        if (ph->p_type != ELF_PROG_LOAD) {
          continue;
        }
    
        region_alloc(e, (void *)(ph->p_va), ph->p_memsz);
    
        memcpy((void *)ph->p_va, (void *)binary + ph->p_offset, (size_t)(ph->p_filesz));
    
        // Any remaining memory bytes should be cleared to zero.
        // (The ELF header should have ph->p_filesz <= ph->p_memsz.)
    
        if (ph->p_filesz < ph->p_memsz) {
          memset((void *)ph->p_va + ph->p_filesz, 0,
                 ph->p_memsz - ph->p_filesz);
        }
      }
    
      // switch back to the kernel's address space
      lcr3(PADDR(kern_pgdir));
    
      // set the entry point from the ELF header
      e->env_tf.tf_eip = ELFHDR->e_entry;
    
      // Now map one page for the program's initial stack
      // at virtual address USTACKTOP - PGSIZE.
    
      // LAB 3: Your code here.
      region_alloc(e, (void *)USTACKTOP - PGSIZE, PGSIZE);
    }
  • env_create()

    env_create() allocates a new environment and loads the binary into it; just follow the comments

    //
    // Allocates a new env with env_alloc, loads the named elf
    // binary into it with load_icode, and sets its env_type.
    // This function is ONLY called during kernel initialization,
    // before running the first user-mode environment.
    // The new env's parent ID is set to 0.
    //
    void
    env_create(uint8_t *binary, enum EnvType type)
    {
      // LAB 3: Your code here.
      struct Env *env;
      if (env_alloc(&env, 0) < 0) {
        panic("env_create: env_alloc error.");
      }
    
      load_icode(env, binary);
      env->env_type = type;
    }
  • env_run()

    //
    // Context switch from curenv to env e.
    // Note: if this is the first call to env_run, curenv is NULL.
    //
    // This function does not return.
    //
    void
    env_run(struct Env *e)
    {
      // Step 1: If this is a context switch (a new environment is running):
      //       1. Set the current environment (if any) back to
      //          ENV_RUNNABLE if it is ENV_RUNNING (think about
      //          what other states it can be in),
      //       2. Set 'curenv' to the new environment,
      //       3. Set its status to ENV_RUNNING,
      //       4. Update its 'env_runs' counter,
      //       5. Use lcr3() to switch to its address space.
      // Step 2: Use env_pop_tf() to restore the environment's
      //       registers and drop into user mode in the
      //       environment.
    
      // Hint: This function loads the new environment's state from
      //    e->env_tf.  Go back through the code you wrote above
      //    and make sure you have set the relevant parts of
      //    e->env_tf to sensible values.
    
      // LAB 3: Your code here.
    
      if (curenv && curenv->env_status == ENV_RUNNING) {
        curenv->env_status = ENV_RUNNABLE;
      }
    
      curenv = e;
      curenv->env_status = ENV_RUNNING;
      curenv->env_runs++;
      lcr3(PADDR(curenv->env_pgdir));
    
      env_pop_tf(&curenv->env_tf);
    
      panic("env_run not yet implemented");
    }
  • Verification

    After finishing Exercise 2, the kernel crashes right away

    ***
    *** Use Ctrl-a x to exit qemu
    ***
    qemu-system-i386 -nographic -drive file=obj/kern/kernel.img,index=0,media=disk,format=raw -serial mon:stdio -gdb tcp::25000 -D qemu.log
    6828 decimal is 15254 octal!
    Physical memory: 131072K available, base = 640K, extended = 130432K
    check_page_free_list() succeeded!
    check_page_alloc() succeeded!
    check_page() succeeded!
    check_kern_pgdir() succeeded!
    check_page_free_list() succeeded!
    check_page_installed_pgdir() succeeded!
    [00000000] new env 00001000
    EAX=00000000 EBX=00000000 ECX=0000000d EDX=eebfde88
    ESI=00000000 EDI=00000000 EBP=eebfde60 ESP=eebfde54
    EIP=00800bc3 EFL=00000092 [--S-A--] CPL=3 II=0 A20=1 SMM=0 HLT=0
    ES =0023 00000000 ffffffff 00cff300 DPL=3 DS   [-WA]
    CS =001b 00000000 ffffffff 00cffa00 DPL=3 CS32 [-R-]
    SS =0023 00000000 ffffffff 00cff300 DPL=3 DS   [-WA]
    DS =0023 00000000 ffffffff 00cff300 DPL=3 DS   [-WA]
    FS =0023 00000000 ffffffff 00cff300 DPL=3 DS   [-WA]
    GS =0023 00000000 ffffffff 00cff300 DPL=3 DS   [-WA]
    LDT=0000 00000000 00000000 00008200 DPL=0 LDT
    TR =0028 f018db80 00000067 00408900 DPL=0 TSS32-avl
    GDT=     f011b300 0000002f
    IDT=     f018d360 000007ff
    CR0=80050033 CR2=00000000 CR3=003bc000 CR4=00000000
    DR0=00000000 DR1=00000000 DR2=00000000 DR3=00000000
    DR6=ffff0ff0 DR7=00000400
    EFER=0000000000000000
    Triple fault.  Halting for inspection via QEMU monitor.
    QEMU: Terminated

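    The crash happens because the hello binary executes an int instruction to make a system call, but the kernel has not yet set up any interrupt handlers. The CPU therefore raises a general protection exception, cannot handle it either and raises a double fault, cannot handle that either, and finally gives up with a triple fault

    To confirm that the environment really starts, run make gdb, set a breakpoint at env_pop_tf, and single-step to check that execution enters the code listed in hello.asm and reaches the int $0x30 instruction in sys_cputs()

    GDB shows the following: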

    (gdb) b env_pop_tf
    Breakpoint 1 at 0xf010390c: file kern/env.c, line 496.
    (gdb) c
    Continuing.
    The target architecture is assumed to be i386
    => 0xf010390c <env_pop_tf>:     push   %ebp
    
    Breakpoint 1, env_pop_tf (tf=0xf01d0000) at kern/env.c:496
    496     {
    (gdb) s
    => 0xf010391e <env_pop_tf+18>:  mov    0x8(%ebp),%esp
    497             asm volatile(
    (gdb) si
    => 0xf0103921 <env_pop_tf+21>:  popa
    0xf0103921      497             asm volatile(
    (gdb)
    => 0xf0103922 <env_pop_tf+22>:  pop    %es
    0xf0103922 in env_pop_tf (
        tf=<error reading variable: Unknown argument list address for `tf'.>) at kern/env.c:497
    497             asm volatile(
    (gdb)
    => 0xf0103923 <env_pop_tf+23>:  pop    %ds
    0xf0103923      497             asm volatile(
    (gdb)
    => 0xf0103924 <env_pop_tf+24>:  add    $0x8,%esp
    0xf0103924      497             asm volatile(
    (gdb)
    => 0xf0103927 <env_pop_tf+27>:  iret
    0xf0103927      497             asm volatile(
    (gdb)
    => 0x800020:    cmp    $0xeebfe000,%esp # 0xeebfe000 USTACKTOP
    0x00800020 in ?? ()
    (gdb)
    => 0x800026:    jne    0x80002c
    0x00800026 in ?? ()
    (gdb)
    => 0x800028:    push   $0x0
    0x00800028 in ?? ()
    (gdb)
    => 0x80002a:    push   $0x0
    0x0080002a in ?? ()
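
The segment-loading step of load_icode() in Exercise 2 (copy p_filesz bytes, then zero the rest up to p_memsz) can be tried out in user space. The sketch below is only an illustration under assumptions: DemoSegment and demo_load_segment() are hypothetical names, the destination is an ordinary buffer instead of a mapped virtual address, and the bounds checks are an addition that the lab code does not perform.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical, trimmed-down view of one loadable segment. */
struct DemoSegment {
    size_t offset;   /* where the segment's bytes start in the image (p_offset) */
    size_t filesz;   /* bytes present in the image (p_filesz) */
    size_t memsz;    /* bytes the segment occupies in memory (p_memsz) */
};

/* Copy filesz bytes from the image into dst, then zero the tail up to
 * memsz. Returns 0 on success, -1 if the header is inconsistent. */
static int demo_load_segment(uint8_t *dst, const uint8_t *image,
                             size_t image_len, const struct DemoSegment *seg)
{
    if (seg->filesz > seg->memsz)
        return -1;                       /* file part larger than memory part */
    if (seg->offset > image_len || seg->filesz > image_len - seg->offset)
        return -1;                       /* segment runs past the image */
    memcpy(dst, image + seg->offset, seg->filesz);
    memset(dst + seg->filesz, 0, seg->memsz - seg->filesz);
    return 0;
}
```

The zeroed tail is what gives the program its bss: bytes that occupy memory but are not stored in the file.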

Handling Interrupts and Exceptions

Interrupt and exception handling mainly follows Chapter 9, Exceptions and Interrupts

Basics of Protected Control Transfer

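Protected control transfer relies on two mechanisms:

The Interrupt Descriptor Table
The CPU looks up the entry for each interrupt or exception in the interrupt descriptor table (IDT) and loads from it:
  • the value for EIP, which points to the kernel code that handles the exception
  • the value for CS, whose bits 0-1 hold the privilege level at which the handler runs
The Task State Segment
The CPU needs a place to save the old processor state from before the interrupt or exception, such as the original EIP and CS. That save area must be protected, otherwise buggy or malicious user code could corrupt the kernel. So when an interrupt or exception raises the privilege level from user to kernel mode, the processor also switches to a stack in kernel memory. The task state segment (TSS) specifies where that stack lives, and the old processor state is pushed onto it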
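
As a rough illustration of what one IDT entry holds, the sketch below packs the fields of a 32-bit x86 gate descriptor into its 64-bit encoding. The bit positions follow the x86 gate format; demo_pack_gate() and demo_gate_handler() are hypothetical helpers written for this note, not the SETGATE macro that JOS provides.

```c
#include <assert.h>
#include <stdint.h>

#define DEMO_TYPE_INTR_GATE 0xEu  /* 32-bit interrupt gate */
#define DEMO_TYPE_TRAP_GATE 0xFu  /* 32-bit trap gate */

/* Pack the fields of a 32-bit x86 gate descriptor into 64 bits. */
static uint64_t demo_pack_gate(uint32_t handler, uint16_t selector,
                               int is_trap, unsigned dpl)
{
    assert(dpl <= 3);
    uint64_t type = is_trap ? DEMO_TYPE_TRAP_GATE : DEMO_TYPE_INTR_GATE;
    uint64_t gate = 0;
    gate |= (uint64_t)(handler & 0xFFFFu);      /* bits 0-15: handler offset, low half */
    gate |= (uint64_t)selector << 16;           /* bits 16-31: code segment selector */
    gate |= type << 40;                         /* bits 40-43: gate type */
    gate |= (uint64_t)dpl << 45;                /* bits 45-46: descriptor privilege level */
    gate |= (uint64_t)1 << 47;                  /* bit 47: present */
    gate |= (uint64_t)(handler >> 16) << 48;    /* bits 48-63: handler offset, high half */
    return gate;
}

/* Recover the handler address that the CPU would load into EIP. */
static uint32_t demo_gate_handler(uint64_t gate)
{
    return (uint32_t)(gate & 0xFFFFu) | ((uint32_t)(gate >> 48) << 16);
}
```

For example, demo_pack_gate(0xf0101234, 0x08, 0, 3) describes an interrupt gate that user code may invoke, whose handler lives at 0xf0101234 in the segment selected by 0x08.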

Types of Exceptions and Interrupts

  • An Example

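    The divide-by-zero case illustrates the process:

    1. The processor switches to the stack defined by the SS0 and ESP0 fields of the TSS; in JOS these hold GD_KD and KSTACKTOP respectively
    2. The processor pushes the exception parameters onto the kernel stack, starting at KSTACKTOP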

      +--------------------+ KSTACKTOP
      | 0x00000 | old SS   |     " - 4
      |      old ESP       |     " - 8
      |     old EFLAGS     |     " - 12
      | 0x00000 | old CS   |     " - 16
      |      old EIP       |     " - 20 <---- ESP
      +--------------------+
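    3. Divide error is interrupt vector 0, so the processor reads IDT entry 0 and sets CS:EIP to the handler address stored there

    4. The handler function takes over and handles the exception

    For some exceptions the x86 additionally pushes an error code: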

    +--------------------+ KSTACKTOP
    | 0x00000 | old SS   |     " - 4
    |      old ESP       |     " - 8
    |     old EFLAGS     |     " - 12
    | 0x00000 | old CS   |     " - 16
    |      old EIP       |     " - 20
    |     error code     |     " - 24 <---- ESP
    +--------------------+
  • Nested Exceptions and Interrupts

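    The processor switches stacks before pushing the old state only when the interrupt or exception is taken in user mode

    Nested case: if the CPU is already in kernel mode when the interrupt or exception occurs, it simply pushes more values onto the same kernel stack. Since no stack switch happens, SS and ESP are not saved, and the frame looks like this: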

    +--------------------+ <---- old ESP
    |     old EFLAGS     |     " - 4
    | 0x00000 | old CS   |     " - 8
    |      old EIP       |     " - 12
    +--------------------+
  • Setting Up the IDT

    Flow of control from the IDT to trap():

          IDT                   trapentry.S         trap.c
    
    +----------------+
    |   &handler1    |---------> handler1:          trap (struct Trapframe *tf)
    |                |             // do stuff      {
    |                |             call trap          // handle the exception/interrupt
    |                |             // ...           }
    +----------------+
    |   &handler2    |--------> handler2:
    |                |            // do stuff
    |                |            call trap
    |                |            // ...
    +----------------+
           .
           .
           .
    +----------------+
    |   &handlerX    |--------> handlerX:
    |                |             // do stuff
    |                |             call trap
    |                |             // ...
    +----------------+
    Each exception or interrupt sh

Exercise 4

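The main task is to add an entry point for each trap in trapentry.S

_alltraps does the following:

  1. Push values so that the stack looks like a struct Trapframe
  2. Load GD_KD into %ds and %es
  3. pushl %esp to pass a pointer to the Trapframe as the argument to trap()
  4. Call trap()

First, generate an entry point for each interrupt. TRAPHANDLER and TRAPHANDLER_NOEC differ in how they treat the error code: for traps where the CPU pushes an error code, use TRAPHANDLER, which leaves that code in place; otherwise use TRAPHANDLER_NOEC, which pushes a 0 in its place so the frame has the same layout either way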

Table 1: Error code definitions
Description                    Interrupt Number   Error Code
Divide error                   0                  No
Debug exceptions               1                  No
Breakpoint                     3                  No
Overflow                       4                  No
Bounds check                   5                  No
Invalid opcode                 6                  No
Coprocessor not available      7                  No
System error                   8                  Yes (always 0)
Coprocessor Segment Overrun    9                  No
Invalid TSS                    10                 Yes
Segment not present            11                 Yes
Stack exception                12                 Yes
General protection fault       13                 Yes
Page fault                     14                 Yes
Coprocessor error              16                 No
Two-byte SW interrupt          0-255              No
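
The table can be condensed into a small predicate. The sketch below is an illustration only: demo_cpu_pushes_error_code() and demo_words_after_stub() are hypothetical helpers, and the word count assumes an entry stub that pushes a 0 placeholder when the CPU pushed no error code and then always pushes the trap number.

```c
#include <assert.h>
#include <stdbool.h>

/* True when the CPU itself pushes an error code for this vector,
 * following the table above (vector 8 and vectors 10 through 14). */
static bool demo_cpu_pushes_error_code(int vector)
{
    assert(vector >= 0 && vector <= 255);
    return vector == 8 || (vector >= 10 && vector <= 14);
}

/* Number of 32-bit words on the kernel stack once the entry stub has run. */
static int demo_words_after_stub(int vector, bool from_user_mode)
{
    bool cpu_code = demo_cpu_pushes_error_code(vector);
    int cpu_words = 3                          /* EFLAGS, CS, EIP */
                  + (from_user_mode ? 2 : 0)   /* SS, ESP on a privilege change */
                  + (cpu_code ? 1 : 0);        /* error code pushed by the CPU */
    int stub_words = (cpu_code ? 0 : 1)        /* 0 placeholder pushed by the stub */
                   + 1;                        /* trap number pushed by the stub */
    return cpu_words + stub_words;
}
```

Whatever the vector, the count is the same for a given entry mode, which is what lets one struct Trapframe describe every trap.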

Define the entry points in kern/trapentry.S according to the table above

.text

/*
 * Lab 3: Your code here for generating entry points for the different traps.
 */
  TRAPHANDLER_NOEC(divide_entry, T_DIVIDE);
  TRAPHANDLER_NOEC(debug_entry, T_DEBUG);
  TRAPHANDLER_NOEC(nmi_entry, T_NMI);
  TRAPHANDLER_NOEC(brkpt_entry, T_BRKPT);
  TRAPHANDLER_NOEC(oflow_entry, T_OFLOW);
  TRAPHANDLER_NOEC(bound_entry, T_BOUND);
  TRAPHANDLER_NOEC(illop_entry, T_ILLOP);
  TRAPHANDLER_NOEC(device_entry, T_DEVICE);
  TRAPHANDLER(dblflt_entry, T_DBLFLT);

  TRAPHANDLER(tss_entry, T_TSS);
  TRAPHANDLER(segnp_entry, T_SEGNP);
  TRAPHANDLER(stack_entry, T_STACK);
  TRAPHANDLER(gpflt_entry, T_GPFLT);
  TRAPHANDLER(pgflt_entry, T_PGFLT);
  TRAPHANDLER_NOEC(fperr_entry, T_FPERR);

Below is the _alltraps part; pushal pushes the general-purpose registers (see 80386 Programmer's Reference Manual, Opcode PUSHA)

/*
 * Lab 3: Your code here for _alltraps
 */
_alltraps:
  // Push values to match the layout of struct Trapframe
  pushl %ds
  pushl %es
  pushal

  // Load GD_KD into %ds and %es
  mov $GD_KD, %ax
  mov %ax, %ds
  mov %ax, %es

  // Pass a pointer to the Trapframe as the argument to trap()
  pushl %esp

  // Call trap()
  call trap
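
The push order matters because the stack grows downward: whatever is pushed last ends up at the lowest address, which is the first field of the structure. The sketch below mirrors the Trapframe layout as I understand it from inc/trap.h, under stated assumptions: the Demo names are hypothetical, and uint32_t stands in for uintptr_t so that the offsets also hold on a 64-bit host.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Registers saved by pushal; edi is pushed last, so it sits lowest. */
struct DemoPushRegs {
    uint32_t reg_edi, reg_esi, reg_ebp, reg_oesp;
    uint32_t reg_ebx, reg_edx, reg_ecx, reg_eax;
};

struct DemoTrapframe {
    struct DemoPushRegs tf_regs;     /* pushal in _alltraps */
    uint16_t tf_es, tf_padding1;     /* pushl %es in _alltraps */
    uint16_t tf_ds, tf_padding2;     /* pushl %ds in _alltraps */
    uint32_t tf_trapno;              /* pushed by the entry stub */
    uint32_t tf_err;                 /* pushed by the CPU or by the stub */
    uint32_t tf_eip;                 /* pushed by the CPU from here on */
    uint16_t tf_cs, tf_padding3;
    uint32_t tf_eflags;
    uint32_t tf_esp;                 /* only present on a user-to-kernel switch */
    uint16_t tf_ss, tf_padding4;
};

/* 17 words in total: 8 registers, es, ds, trapno, err, and 5 CPU-pushed words. */
_Static_assert(sizeof(struct DemoTrapframe) == 68, "unexpected Trapframe size");
```

Reading the structure from bottom to top gives the push sequence: the CPU part first, then the entry stub, then pushl %ds, pushl %es and pushal.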

Finally, complete the IDT initialization in trap_init() in trap.c

void
trap_init(void)
{
  extern struct Segdesc gdt[];

  // LAB 3: Your code here.
  void divide_entry();
  void debug_entry();
  void nmi_entry();
  void brkpt_entry();
  void oflow_entry();
  void bound_entry();
  void illop_entry();
  void device_entry();
  void dblflt_entry();
  void tss_entry();
  void segnp_entry();
  void stack_entry();
  void gpflt_entry();
  void pgflt_entry();
  void fperr_entry();

  SETGATE(idt[T_DIVIDE], 1, GD_KT, divide_entry, 0);
  SETGATE(idt[T_DEBUG], 1, GD_KT, debug_entry, 3);
  SETGATE(idt[T_NMI], 1, GD_KT, nmi_entry, 0);
  SETGATE(idt[T_BRKPT], 1, GD_KT, brkpt_entry, 3);
  SETGATE(idt[T_OFLOW], 1, GD_KT, oflow_entry, 0);
  SETGATE(idt[T_BOUND], 1, GD_KT, bound_entry, 0);
  SETGATE(idt[T_ILLOP], 1, GD_KT, illop_entry, 0);
  SETGATE(idt[T_DEVICE], 1, GD_KT, device_entry, 0);
  SETGATE(idt[T_DBLFLT], 1, GD_KT, dblflt_entry, 0);

  SETGATE(idt[T_TSS], 1, GD_KT, tss_entry, 0);
  SETGATE(idt[T_SEGNP], 1, GD_KT, segnp_entry, 0);
  SETGATE(idt[T_STACK], 1, GD_KT, stack_entry, 0);
  SETGATE(idt[T_GPFLT], 1, GD_KT, gpflt_entry, 0);
  SETGATE(idt[T_PGFLT], 1, GD_KT, pgflt_entry, 0);
  SETGATE(idt[T_FPERR], 1, GD_KT, fperr_entry, 0);

  // Per-CPU setup
  trap_init_percpu();
}

Q1

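Most handlers share the same logic; what differs is whether the interrupt comes with an error code. With a single shared handler, the stub could not tell whether the CPU had already pushed an error code, nor which trap occurred, so it could not build a uniform Trapframe. Separate entry points let each one push its own trap number and, where needed, a placeholder error code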

Q2

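The page-fault gate in the IDT must have DPL = 0 to get this behavior

int $14 targets a gate whose descriptor privilege level is kernel-only, so user code is not allowed to invoke it. When the user program executes this instruction, the processor raises a general protection fault (vector 13) instead of delivering a page fault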
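
The privilege check behind this answer can be written down as a tiny model. This is a sketch, not processor documentation: demo_soft_int_vector() is a hypothetical helper that only models the DPL check applied to a software int instruction.

```c
#include <assert.h>

#define DEMO_T_GPFLT 13   /* general protection fault vector */

/* Vector actually delivered when code running at privilege level cpl
 * executes `int $vector` against a gate whose DPL is gate_dpl.
 * The check applies to software interrupts only; exceptions raised by
 * the hardware itself ignore the gate's DPL. */
static int demo_soft_int_vector(int vector, int cpl, int gate_dpl)
{
    assert(cpl >= 0 && cpl <= 3 && gate_dpl >= 0 && gate_dpl <= 3);
    if (cpl > gate_dpl)
        return DEMO_T_GPFLT;   /* caller is less privileged than the gate allows */
    return vector;
}
```

With these rules, user code (CPL 3) executing int $14 against a DPL 0 gate ends up at vector 13, while the same instruction against a DPL 3 gate is delivered as requested.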

Part B: Page Faults, Breakpoints Exceptions, and System Calls

Handling Page Faults

When a page fault occurs, the processor stores the linear address that caused the fault in the CR2 register
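
Besides the address in CR2, a page fault also comes with an error code (Table 1). Its low three bits say what kind of access failed. The sketch below decodes them; the bit meanings follow the x86 page-fault error code, while the DEMO_FEC_* names, DemoFaultInfo and demo_decode_pgflt() are hypothetical names chosen for this note.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define DEMO_FEC_PR 0x1u  /* set: protection violation; clear: page not present */
#define DEMO_FEC_WR 0x2u  /* set: write access; clear: read access */
#define DEMO_FEC_U  0x4u  /* set: fault taken in user mode */

struct DemoFaultInfo {
    bool protection;
    bool write;
    bool user;
};

/* Split a page-fault error code into its three low flag bits. */
static struct DemoFaultInfo demo_decode_pgflt(uint32_t err)
{
    struct DemoFaultInfo info;
    info.protection = (err & DEMO_FEC_PR) != 0;
    info.write = (err & DEMO_FEC_WR) != 0;
    info.user = (err & DEMO_FEC_U) != 0;
    return info;
}
```

An error code of 0x6, for instance, describes a user-mode write to a page that is not present.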

Exercise 5

Just dispatch the page fault exception (T_PGFLT) to page_fault_handler(), as the exercise asks

static void
trap_dispatch(struct Trapframe *tf)
{
  // Handle processor exceptions.
  // LAB 3: Your code here.

  // Handling Page Faults
  if (tf->tf_trapno == T_PGFLT) {
    page_fault_handler(tf);
    return ;
  }

  // ...

}

The Breakpoint Exception

Exercise 6

As in Exercise 5, just call monitor() with the Trapframe as its argument

static void
trap_dispatch(struct Trapframe *tf)
{
  // ...

  // The Breakpoint Exception
  if (tf->tf_trapno == T_BRKPT) {
    monitor(tf);
    return ;
  }

  // ...
}

Q3

SETGATE(idt[T_BRKPT], 1, GD_KT, brkpt_entry, 3) sets the gate's privilege level to 3, so a user program can execute int 3 and raise a breakpoint exception from user mode. If the privilege level were 0, executing int 3 in user mode would trigger a general protection fault instead

Q4

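To protect the kernel and isolate user code from kernel code: user programs may enter the kernel only through the gates the kernel has explicitly opened to them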

System calls

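When a user process invokes a system call, the processor enters kernel mode, the processor and the kernel cooperate to save the user process's state, the kernel runs the code for that system call, and then resumes the user process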

Exercise 7

As the exercise requires, fill in three functions: trap_init(), trap_dispatch() and syscall()

  • trap_init()

    Add the following to the .text section of kern/trapentry.S

    .text
      // ...
      TRAPHANDLER_NOEC(syscall_entry, T_SYSCALL);

    System calls are invoked from user mode, so the privilege level is 3

    void
    trap_init(void)
    {
      // ...
      void syscall_entry();
    
      // ...
      SETGATE(idt[T_SYSCALL], 1, GD_KT, syscall_entry, 3);
    
    }
  • trap_dispatch()

    trap_dispatch() is where the kernel calls syscall()

    The hint from the lab text is useful here:

    The system call number will go in %eax, and the arguments (up to five of them) will go in %edx, %ecx, %ebx, %edi, and %esi, respectively. The kernel passes the return value back in %eax.

    The user-side syscall() that issues the interrupt is defined in lib/syscall.c

    static inline int32_t
    syscall(int num, int check, uint32_t a1, uint32_t a2, uint32_t a3, uint32_t a4, uint32_t a5)
    {
      int32_t ret;
    
      // Generic system call: pass system call number in AX,
      // up to five parameters in DX, CX, BX, DI, SI.
      // Interrupt kernel with T_SYSCALL.
      //
      // The "volatile" tells the assembler not to optimize
      // this instruction away just because we don't use the
      // return value.
      //
      // The last clause tells the assembler that this can
      // potentially change the condition codes and arbitrary
      // memory locations.
    
      asm volatile("int %1\n"
                   : "=a" (ret)
                   : "i" (T_SYSCALL),
                     "a" (num),
                     "d" (a1),
                     "c" (a2),
                     "b" (a3),
                     "D" (a4),
                     "S" (a5)
                   : "cc", "memory");
    
      if(check && ret > 0)
        panic("syscall %d returned %d (> 0)", num, ret);
    
      return ret;
    }

    This uses GCC inline assembly; see CS:APP3e Web Aside ASM:EASM: Combining Assembly Code with C Programs

    The general syntax of inline assembly is:

    asm volatile("asm code"
                 : output
                 : input
                 : changed);
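
    Table 2: Inline assembly constraints
    Constraint            Meaning
    "m", "v", "o"         Memory operand
    "r"                   Any general register
    "q"                   One of eax, ebx, ecx, edx
    "i", "h"              Immediate operand
    "E", "F"              Floating-point constant
    "g"                   Any register, memory, or immediate operand
    "a", "b", "c", "d"    The eax, ebx, ecx, edx register respectively
    "S", "D"              The esi, edi register respectively
    "I"                   Constant in the range 0 to 31

    Table 3: Inline assembly output modifiers
    Modifier   Description
    +          The operand is both read and written
    =          The operand is write-only
    %          The operand may be swapped with the next operand if necessary
    &          Early clobber: the operand may be written before all inputs have been read, so it must not share a register with an input

    The asm statement from syscall() again:
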
    asm volatile("int %1\n"
                 : "=a" (ret)
                 : "i" (T_SYSCALL),
                   "a" (num),
                   "d" (a1),
                   "c" (a2),
                   "b" (a3),
                   "D" (a4),
                   "S" (a5)
                 : "cc", "memory");

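    So this statement raises a software interrupt with int, using T_SYSCALL as the vector, and passes the call number and arguments in registers; the return value comes back in eax

    Finally, trap_dispatch() is implemented as follows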

    static void
    trap_dispatch(struct Trapframe *tf)
    {
      // ...
    
      // System calls
      if (tf->tf_trapno == T_SYSCALL) {
        int32_t ret;
        struct PushRegs *regs = &(tf->tf_regs);
        ret = syscall(regs->reg_eax, regs->reg_edx, regs->reg_ecx,
                      regs->reg_ebx, regs->reg_edi, regs->reg_esi);
        regs->reg_eax = (uint32_t)ret;
        return ;
      }
    
      // ...
    }
  • syscall()

    Dispatch on the system call numbers defined in inc/syscall.h

    /* system call numbers */
    enum {
      SYS_cputs = 0,
      SYS_cgetc,
      SYS_getenvid,
      SYS_env_destroy,
      NSYSCALLS
    };

    The dispatcher itself:

    // Dispatches to the correct kernel function, passing the arguments.
    int32_t
    syscall(uint32_t syscallno, uint32_t a1, uint32_t a2, uint32_t a3, uint32_t a4, uint32_t a5)
    {
      // Call the function corresponding to the 'syscallno' parameter.
      // Return any appropriate return value.
      // LAB 3: Your code here.
    
      int ret = 0;
    
      // panic("syscall not implemented");
    
      switch (syscallno) {
        case SYS_cputs:
          sys_cputs((const char *)a1, (size_t)a2);
          break;
        case SYS_cgetc:
          ret = sys_cgetc();
          break;
        case SYS_getenvid:
          ret = sys_getenvid();
          break;
        case SYS_env_destroy:
          ret = sys_env_destroy((envid_t)a1);
          break;
        case NSYSCALLS:
          break;
        default:
          ret = -E_INVAL;
      }
    
      return ret;
    }
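
The register convention used across trap_dispatch() and syscall() can be exercised in user space. The following sketch is a simulation under assumptions: DemoRegs, demo_syscall(), demo_dispatch(), the call number 2 and the error value 3 are stand-ins chosen for this note, and the "system call" just returns a fixed ID.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-ins for the JOS definitions. */
enum { DEMO_SYS_getenvid = 2 };
#define DEMO_E_INVAL 3

/* Saved user registers that carry the call number and arguments. */
struct DemoRegs {
    uint32_t eax, edx, ecx, ebx, edi, esi;
};

static int32_t demo_syscall(uint32_t no, uint32_t a1, uint32_t a2,
                            uint32_t a3, uint32_t a4, uint32_t a5)
{
    (void)a1; (void)a2; (void)a3; (void)a4; (void)a5;
    switch (no) {
    case DEMO_SYS_getenvid:
        return 0x1000;             /* pretend this is the caller's envid */
    default:
        return -DEMO_E_INVAL;      /* unknown system call number */
    }
}

/* Read the call number and arguments from the saved registers and
 * store the result in the saved eax, where the user code will find it
 * once the environment is resumed. */
static void demo_dispatch(struct DemoRegs *r)
{
    int32_t ret = demo_syscall(r->eax, r->edx, r->ecx, r->ebx, r->edi, r->esi);
    r->eax = (uint32_t)ret;
}
```

Writing the result into the saved eax rather than returning it is the important detail: the user program only sees the registers restored from the Trapframe.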

User-mode startup

Exercise 8

You should modify libmain() to initialize the global pointer thisenv to point at this environment’s struct Env in the envs[] array. (Note that lib/entry.S has already defined envs to point at the UENVS mapping you set up in Part A.) Hint: look in inc/env.h and use sys_getenvid.

Just make the change as described.

void
libmain(int argc, char **argv)
{
  // set thisenv to point at our Env structure in envs[].
  // LAB 3: Your code here.
  envid_t envid = sys_getenvid();
  thisenv = envs + ENVX(envid);

  // ...
}
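
Why ENVX(envid) works as an array index can be shown with a small stand-alone sketch. The macro definitions below are assumed to mirror inc/env.h (the low 10 bits of an envid are the slot in envs[]); struct DemoEnv, demo_envs and demo_lookup() are hypothetical stand-ins for illustration.

```c
#include <assert.h>
#include <stdint.h>

typedef int32_t envid_t;

// Assumed to mirror inc/env.h: the low LOG2NENV bits of an envid
// are the environment's index in envs[].
#define LOG2NENV 10
#define NENV (1 << LOG2NENV)
#define ENVX(envid) ((envid) & (NENV - 1))

// Hypothetical stand-in for struct Env; only the id matters here.
struct DemoEnv {
  envid_t env_id;
};

static struct DemoEnv demo_envs[NENV];

// Same lookup as libmain(): index the array with the low bits of the id.
static struct DemoEnv *
demo_lookup(envid_t envid)
{
  assert(envid >= 0);
  return demo_envs + ENVX(envid);
}
```

For example, envid 0x1001 maps to slot 1 and envid 0x2003 maps to slot 3; the upper bits only distinguish successive environments that reuse the same slot.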

Page faults and memory protection

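Operating systems usually rely on hardware support to implement memory protection. When a program uses an invalid address, or an address it has no permission to access, the processor stops the program at the faulting instruction, raises an exception, and traps into the kernel. If the fault is fixable, the kernel fixes it and lets the program continue; otherwise it destroys the program.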

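One common fixable case is automatic stack growth. Most operating systems allocate only a single stack page when they set up a process; when the program later faults by touching an address further down the stack, the kernel allocates the missing page and lets the program continue.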

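Most system calls let a program pass the kernel a pointer to a buffer in user memory to be read or written, and the kernel dereferences that pointer while carrying out the call. This raises two problems: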

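  1. A page fault in kernel mode is potentially much more serious than one in user mode. If the kernel faults while touching its own data, that is a kernel bug and the kernel should panic; when it dereferences a pointer handed in by a program, it needs to be able to attribute any resulting fault to that program.
  2. The kernel usually has more memory permissions than the program, so a program can pass an address that the kernel can read or write but the program itself cannot. To avoid leaking private data or damaging kernel integrity, the kernel has to check such addresses.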

Exercise 9, 10

  • page_fault_handler()

    The comments and the hint say that the low bits of tf_cs tell us whether the fault happened in user mode or in kernel mode.

    void
    page_fault_handler(struct Trapframe *tf)
    {
      // ...
    
      // Handle kernel-mode page faults.
      // Hint: to determine whether a fault happened in user mode or in kernel
      // mode, check the low bits of the tf_cs.
    
      // LAB 3: Your code here.
      if ((tf->tf_cs & 3) == 0) {
        panic("page_fault_handler: page fault in kernel-mode %08x.\n", fault_va);
      }
    
      // ...
    }
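
The tf_cs test can be tried out on its own. The sketch below assumes the selector values GD_KT = 0x08 and GD_UT = 0x18 from inc/memlayout.h; fault_from_kernel() is a hypothetical helper name, not something the lab defines.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

// Selector values assumed to match inc/memlayout.h in JOS.
#define GD_KT 0x08  // kernel text
#define GD_UT 0x18  // user text

// User environments run with the user text selector at privilege level 3.
static_assert((GD_UT | 3) == 0x1b, "user code selector with RPL 3");

// The low two bits of a code segment selector hold the privilege level:
// 0 for the kernel, 3 for user environments.
static bool
fault_from_kernel(uint16_t tf_cs)
{
  return (tf_cs & 3) == 0;
}
```

A trap frame saved while the kernel was running carries cs = 0x08, so the check is true and the handler panics; a frame from a user environment carries cs = 0x1b and falls through to the user-mode path.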
  • user_mem_check()

    user_mem_check() checks whether the environment may access the memory range [va, va+len) with permissions perm | PTE_P.

    Following the comment, extend perm first: perm = perm | PTE_P | PTE_U;

    //
    // Check that an environment is allowed to access the range of memory
    // [va, va+len) with permissions 'perm | PTE_P'.
    // Normally 'perm' will contain PTE_U at least, but this is not required.
    // 'va' and 'len' need not be page-aligned; you must test every page that
    // contains any of that range.  You will test either 'len/PGSIZE',
    // 'len/PGSIZE + 1', or 'len/PGSIZE + 2' pages.
    //
    // A user program can access a virtual address if (1) the address is below
    // ULIM, and (2) the page table gives it permission.  These are exactly
    // the tests you should implement here.
    //
    // If there is an error, set the 'user_mem_check_addr' variable to the first
    // erroneous virtual address.
    //
    // Returns 0 if the user program can access this range of addresses,
    // and -E_FAULT otherwise.
    //
    int
    user_mem_check(struct Env *env, const void *va, size_t len, int perm)
    {
      // LAB 3: Your code here.
    
      // the address is below ULIM
      if ((uintptr_t)va >= ULIM) {
        user_mem_check_addr = (uintptr_t)va;
        return -E_FAULT;
      }
    
      // Range
      uintptr_t start = ROUNDDOWN((uint32_t)va, PGSIZE);
      uintptr_t end = ROUNDUP((uint32_t)(va + len), PGSIZE);
      uintptr_t i;
      pte_t *pte_addr = NULL;
    
      // Permissions
      perm = perm | PTE_P | PTE_U;
    
      for (i = start; i < end; i += PGSIZE) {
        // Every page must be below ULIM, mapped, and carry all bits in perm.
        if (i >= ULIM ||
            (page_lookup(env->env_pgdir, (void *)i, &pte_addr) == NULL) ||
            (*pte_addr & perm) != perm) {
          user_mem_check_addr = i < (uintptr_t)va ? (uintptr_t)va : i;
          return -E_FAULT;
        }
      }
    
      return 0;
    }
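
Two details of a check like this are easy to get wrong: the permission test has to require every bit in perm (testing that any bit is set would accept a kernel-only page, which has PTE_P but not PTE_U), and the loop has to cover every page that overlaps the range. The sketch below illustrates both on the host. The PTE_* and PGSIZE values are assumed to match inc/mmu.h; has_all_perms() and pages_spanned() are hypothetical helpers.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

// Values assumed to match inc/mmu.h in JOS.
#define PGSIZE 4096
#define PTE_P 0x001  // present
#define PTE_W 0x002  // writeable
#define PTE_U 0x004  // user

// True only when the entry carries every bit requested in perm.
static int
has_all_perms(uint32_t pte, uint32_t perm)
{
  return (pte & perm) == perm;
}

// Number of pages that overlap [va, va+len), for len > 0.
static size_t
pages_spanned(uintptr_t va, size_t len)
{
  assert(len > 0);
  uintptr_t start = va - va % PGSIZE;  // page holding the first byte
  uintptr_t last_byte = va + len - 1;
  uintptr_t last = last_byte - last_byte % PGSIZE;  // page holding the last byte
  return (last - start) / PGSIZE + 1;
}
```

An 8-byte buffer at 0x1ffc straddles a page boundary and so touches two pages, which is the len/PGSIZE + 2 case mentioned in the comment above user_mem_check().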
  • sys_cputs()

    Again following the comment, call user_mem_assert() to check that the user may access [s, s+len).

    // Print a string to the system console.
    // The string is exactly 'len' characters long.
    // Destroys the environment on memory errors.
    static void
    sys_cputs(const char *s, size_t len)
    {
      // Check that the user has permission to read memory [s, s+len).
      // Destroy the environment if not.
    
      // LAB 3: Your code here.
      user_mem_assert(curenv, s, len, PTE_U);
    
      // Print the string supplied by the user.
      cprintf("%.*s", len, s);
    }
  • debuginfo_eip()

    Finally, debuginfo_eip() also needs to call user_mem_check() on usd, stabs, and stabstr.

    int
    debuginfo_eip(uintptr_t addr, struct Eipdebuginfo *info)
    {
      // ...
      // Find the relevant set of stabs
      if (addr >= ULIM) {
        // ...
      } else {
        // The user-application linker script, user/user.ld,
        // puts information about the application's stabs (equivalent
        // to __STAB_BEGIN__, __STAB_END__, __STABSTR_BEGIN__, and
        // __STABSTR_END__) in a structure located at virtual address
        // USTABDATA.
        const struct UserStabData *usd = (const struct UserStabData *) USTABDATA;
    
        // Make sure this memory is valid.
        // Return -1 if it is not.  Hint: Call user_mem_check.
        // LAB 3: Your code here.
        if (user_mem_check(curenv, usd, sizeof(struct UserStabData), PTE_U | PTE_P) < 0) {
          return -1;
        }
    
        stabs = usd->stabs;
        stab_end = usd->stab_end;
        stabstr = usd->stabstr;
        stabstr_end = usd->stabstr_end;
    
        // Make sure the STABS and string table memory is valid.
        // LAB 3: Your code here.
        if (user_mem_check(curenv, stabs, (uintptr_t)stab_end - (uintptr_t)stabs, PTE_U | PTE_P) < 0) {
          return -1;
        }
    
        if (user_mem_check(curenv, stabstr, stabstr_end - stabstr, PTE_U | PTE_P) < 0) {
          return -1;
        }
      }
    
      // ...
    }
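
One thing to watch when passing a length to user_mem_check(): subtracting two struct pointers yields a count of elements, not bytes. The sketch below shows the difference on the host. The struct layout is assumed to follow struct Stab in inc/stab.h, with n_value narrowed to uint32_t so the size is 12 bytes on any host; demo_table and stab_bytes() are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

// Layout assumed to follow struct Stab in inc/stab.h (12 bytes).
struct Stab {
  uint32_t n_strx;
  uint8_t n_type;
  uint8_t n_other;
  uint16_t n_desc;
  uint32_t n_value;
};

// Hypothetical table standing in for the range [stabs, stab_end).
static struct Stab demo_table[4];

// Byte length of the half-open range [begin, end).
static size_t
stab_bytes(const struct Stab *begin, const struct Stab *end)
{
  assert(end >= begin);
  return (size_t)(end - begin) * sizeof(struct Stab);
}
```

Here demo_table + 4 - demo_table is 4, while the range actually covers 48 bytes; the string table pointers are char pointers, so for them the pointer difference already is the byte count.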
  • Q

    If you now run user/breakpoint, you should be able to run backtrace from the kernel monitor and see the backtrace traverse into lib/libmain.c before the kernel panics with a page fault. What causes this page fault? You don’t need to fix it, but you should understand why it happens.

    Following those steps produces the panic below.

    K> backtrace
    Stack backtrace:
    ebp efffff00  eip f0100aa8  args 00000001 efffff28 f01d2000 f01068d8 f01066d5
    kern/monitor.c:156: monitor+353
    ebp efffff80  eip f01042ed  args f01d2000 efffffbc f0150508 00000092 f011afd8
    kern/trap.c:190: trap+312
    ebp efffffb0  eip f01043a0  args efffffbc 00000000 00000000 eebfdfc0 efffffdc
    kern/syscall.c:69: syscall+0
    ebp eebfdfc0  eip 00800087  args 00000000 00000000 eebfdff0 00800058 00000000
    failed to get debuginfo for eip effffed0.
    ebp eebfdff0  eip 00800031  args 00000000 00000000Incoming TRAP frame at 0xeffffe64
    kernel panic at kern/trap.c:265: page_fault_handler: page fault in kernel-mode eebfe000.

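    The page fault is caused by a read of 0xeebfe000, which lies outside the user stack. The last frame has ebp = 0xeebfdff0, so backtrace reads its arguments starting at ebp + 8 = 0xeebfdff8; the third argument sits at 0xeebfe000, which is USTACKTOP, and nothing is mapped there.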
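
The address arithmetic can be checked with a short sketch. It assumes USTACKTOP = 0xeebfe000 as in inc/memlayout.h and a backtrace that prints five argument words per frame, as in the output above; arg_addr() and first_arg_past_stack() are hypothetical helpers.

```c
#include <assert.h>
#include <stdint.h>

// Assumed to match inc/memlayout.h: the user stack ends just below this.
#define USTACKTOP 0xeebfe000u

// Address of the i-th argument word, given a frame's saved ebp.
// Layout: [ebp] = caller's ebp, [ebp+4] = return eip, [ebp+8] = arg 0.
static uint32_t
arg_addr(uint32_t ebp, int i)
{
  assert(i >= 0);
  return ebp + 8 + 4 * (uint32_t)i;
}

// Index of the first of the five printed args that lies at or above
// USTACKTOP, or -1 if all five are still below it.
static int
first_arg_past_stack(uint32_t ebp)
{
  for (int i = 0; i < 5; i++) {
    if (arg_addr(ebp, i) >= USTACKTOP)
      return i;
  }
  return -1;
}
```

For ebp = 0xeebfdff0 the first out-of-range argument is index 2, which matches the output: two argument words are printed for the last frame before the trap frame message appears.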