Page Table Deduplication and Copy-on-Write (CoW)
In this homework, we implement two advanced virtual memory features in xv6.
First, we introduce a deduplication system, wherein the kernel scans the system for duplicated pages and frees up memory by mapping a single physical page frame at every virtual memory location that holds identical content (Part 1).
This, however, introduces a major problem should a process try to write to one of these shared pages. Thus, the second part of our homework is to handle this problem using a copy-on-write approach: write-protect the shared pages, and make a new writable copy for any process that needs it (Part 2).
As in hw1, pull the latest code from https://github.com/sysec-uic/xv6-public/, and switch to the hw3 branch.
git pull origin
git checkout hw3
You may want to check what has been modified in hw3 using this link: https://github.com/sysec-uic/xv6-public/compare/master...hw3
Part 1: Page table deduplication (8 pt)
The changes mostly live in kalloc.c. Pay attention to the new functions kretain() and krelease(), as well as a new array of struct frameinfo. Here, kretain() increases the reference count on a page. The page is identified by the virtual address of the kernel's mapping of the physical frame, a.k.a. P2V(framenumber<<12), just as kalloc() and kfree() do it. krelease() decreases the count and, if the count reaches zero, calls kfree() on the page. kalloc() sets the reference count of a freshly allocated page to 1. Finally, all uses of kfree() throughout xv6 were replaced with krelease().
- Attention: whenever you see the term "frame" in xv6, it's about physical pages.
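The reference-counting rules above can be sketched as a small user-space model. This is only an illustration, not the code in kalloc.c: the names struct frame_model, model_alloc(), model_retain(), model_release() and the 8-frame "memory" are all hypothetical, and frames are identified by array index instead of a kernel virtual address.

```c
#include <assert.h>

#define NFRAMES 8            /* hypothetical tiny "physical memory" */

/* Hypothetical stand-in for the homework's struct frameinfo. */
struct frame_model {
  int refcount;              /* number of mappings that share this frame */
  int is_free;               /* 1 while the frame sits on the free list */
};

static struct frame_model frames[NFRAMES];

/* Put every frame back on the free list. */
void model_init(void) {
  for (int i = 0; i < NFRAMES; i++) {
    frames[i].refcount = 0;
    frames[i].is_free = 1;
  }
}

/* Like kalloc(): hand out a free frame with refcount 1, or -1 if none. */
int model_alloc(void) {
  for (int i = 0; i < NFRAMES; i++) {
    if (frames[i].is_free) {
      frames[i].is_free = 0;
      frames[i].refcount = 1;
      return i;
    }
  }
  return -1;
}

/* Like kretain(): one more mapping now points at frame f. */
void model_retain(int f) {
  assert(!frames[f].is_free);
  frames[f].refcount++;
}

/* Like krelease(): drop one reference; the frame is freed at zero. */
void model_release(int f) {
  assert(frames[f].refcount > 0);
  if (--frames[f].refcount == 0)
    frames[f].is_free = 1;
}

/* Count the frames currently on the free list. */
int model_free_count(void) {
  int n = 0;
  for (int i = 0; i < NFRAMES; i++)
    n += frames[i].is_free;
  return n;
}
```

In this model, a frame that is allocated once and retained once only returns to the free list after two releases, which is the property dedup relies on when several page table entries point at the same frame.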
What is primarily missing is the implementation of dedup and copy-on-write, for which placeholders exist in vm.c. You only need to edit vm.c to achieve a fully functional implementation. However, it's okay to edit other files as well.
The program dedup_reader allocates a lot of memory and fills it with lots of identical content. This uses up a lot of physical RAM. It then calls the new system call sys_dedup(), which identifies duplicated pages and frees up most of the memory through virtual memory deduplication. The program then reads from the memory to make sure it still works as expected.
A correct solution finishes dedup in less than 1 second without crashing, and shows an increase in the number of free system pages commensurate with the size of the large allocation at the beginning of the program. A sample output:
$ dedup_reader
Freepages at start: 56805
Freepages after malloc: 54359
Freepages after: 56798
The skeleton dedup() function has been provided at the bottom of vm.c. The new system call sys_dedup() ends up in this function, which for now only prints a message. You only need to implement dedup() to finish this part. Two helper functions (update_checksum() and frames_are_identical() in kalloc.c) are provided for finding the duplicates quickly. Use these functions at your discretion. You can get full points without using them, as long as your solution works as expected.
If your implementation runs correctly but takes a while to execute, dedup() might be repeatedly performing some expensive computations.
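One way to avoid repeating expensive work is to compute a checksum once per frame and run the full byte-by-byte comparison only when two checksums match. The following is a minimal user-space sketch of that idea, not the homework solution: the page size, the mapping[] array standing in for page table entries, and every function name here are hypothetical, and it does not use the real signatures of update_checksum() or frames_are_identical().

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define PGSIZE_MODEL 64      /* hypothetical small page size */
#define NPAGES 6             /* hypothetical number of pages */

static unsigned char mem[NPAGES][PGSIZE_MODEL]; /* model "frames" */
static int mapping[NPAGES];      /* virtual page i -> frame index */
static uint32_t cksum[NPAGES];   /* checksum cached per frame */

/* Cheap rotate-and-xor checksum; any fast hash would serve. */
static uint32_t page_checksum(const unsigned char *p) {
  uint32_t h = 0;
  for (int i = 0; i < PGSIZE_MODEL; i++)
    h = ((h << 5) | (h >> 27)) ^ p[i];
  return h;
}

/* Reset the model: each virtual page gets its own frame. */
void model_reset(void) {
  for (int i = 0; i < NPAGES; i++)
    mapping[i] = i;
}

/* Fill the frame behind a page with one repeated byte. */
void model_fill(int page, unsigned char byte) {
  memset(mem[mapping[page]], byte, PGSIZE_MODEL);
}

/* Which frame does this virtual page currently use? */
int model_frame_of(int page) {
  return mapping[page];
}

/* Merge identical pages; returns how many frames became unused. */
int model_dedup(void) {
  int freed = 0;

  /* Pass 1: checksum every frame exactly once. */
  for (int i = 0; i < NPAGES; i++)
    cksum[i] = page_checksum(mem[i]);

  /* Pass 2: full comparison only when the checksums agree. */
  for (int i = 0; i < NPAGES; i++) {
    if (mapping[i] != i)
      continue;                          /* page i was already merged */
    for (int j = i + 1; j < NPAGES; j++) {
      if (mapping[j] != j)
        continue;                        /* page j was already merged */
      if (cksum[i] != cksum[j])
        continue;                        /* cheap reject */
      if (memcmp(mem[i], mem[j], PGSIZE_MODEL) != 0)
        continue;                        /* checksum collision */
      mapping[j] = i;                    /* share frame i */
      freed++;
    }
  }
  return freed;
}
```

The pairwise loop is still quadratic in the number of pages; the point of the sketch is only that the checksum is computed once per frame rather than once per comparison. Sorting or hashing frames by checksum would be a further refinement.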
Submission
Please follow instructions on Gradescope to submit your solution.
Part 2: Copy-on-Write (CoW) (8 pt)
The second part of this homework asks you to implement the copy-on-write (CoW) mechanism. You need to build on top of your hw3 (part 1) code, and later push your new code changes (with a new git commit) to the same GitHub remote repository.
The program dedup_writer is similar to dedup_reader, except it then writes to some parts of the memory, checks that other parts of memory are unaffected, calls sys_dedup(), writes some more, and calls sys_dedup() again.
For this to work correctly, you need to write-protect your deduped page table entries so that a page fault is triggered when a write occurs. This is then caught by a new case in the big trap.c switch statement, which leads to the copyonwrite() skeleton function in vm.c. You need to implement copyonwrite() to allow writing on the deduped pages by creating a private copy of that page and mapping it there with write enabled.
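The write-fault path described above can be modeled in user space as follows. This is a sketch under stated assumptions, not the copyonwrite() implementation: struct pte_model, cow_fault(), cow_write() and the other names are hypothetical, a "page table entry" is reduced to a frame index plus a writable flag, and the model omits the TLB flush a real kernel needs after changing a mapping. The shortcut for a frame with a single remaining reference is an optional design choice, not something the assignment prescribes.

```c
#include <assert.h>
#include <string.h>

#define PGSZ 32              /* hypothetical small page size */
#define NFR 4                /* hypothetical number of frames */

/* Hypothetical, simplified page table entry. */
struct pte_model {
  int frame;                 /* index of the backing frame */
  int writable;              /* 0 = write-protected */
};

static unsigned char fmem[NFR][PGSZ];
static int fref[NFR];        /* reference count; 0 means free */

/* Allocate a free frame with refcount 1, or return -1. */
static int falloc(void) {
  for (int i = 0; i < NFR; i++)
    if (fref[i] == 0) { fref[i] = 1; return i; }
  return -1;
}

/* Build the post-dedup state: two entries share one read-only frame. */
void cow_setup(struct pte_model *a, struct pte_model *b, unsigned char fill) {
  memset(fref, 0, sizeof fref);
  int f = falloc();
  memset(fmem[f], fill, PGSZ);
  fref[f] = 2;               /* two mappings share the frame */
  a->frame = b->frame = f;
  a->writable = b->writable = 0;
}

/* Handle a write fault on *pte. Returns 0, or -1 if out of frames. */
int cow_fault(struct pte_model *pte) {
  int old = pte->frame;
  if (fref[old] == 1) {      /* sole owner: no copy needed */
    pte->writable = 1;
    return 0;
  }
  int fresh = falloc();
  if (fresh < 0)
    return -1;
  memcpy(fmem[fresh], fmem[old], PGSZ);  /* private copy */
  fref[old]--;               /* this mapping leaves the shared frame */
  pte->frame = fresh;
  pte->writable = 1;
  return 0;
}

/* Store one byte, taking the fault path when write-protected. */
int cow_write(struct pte_model *pte, int off, unsigned char v) {
  if (!pte->writable && cow_fault(pte) < 0)
    return -1;
  fmem[pte->frame][off] = v;
  return 0;
}

/* Load one byte through the mapping. */
unsigned char cow_read(const struct pte_model *pte, int off) {
  return fmem[pte->frame][off];
}
```

In this model, the first writer receives a private writable copy while the other entry keeps the original frame and contents. When that other entry is written later, it is the only remaining reference, so the model simply re-enables writing instead of copying.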
A sample output:
$ dedup_writer
Freepages at start: 56804
Freepages after malloc: 54358
Freepages after dedup: 56798
Freepages after writing a little: 56554
Freepages after dedup: 56796
Freepages after writing the rest: 54600
Freepages after dedup: 56797
Submission
Please follow instructions on Gradescope to submit your solution.