Computing Intelligently: February 2017

Saturday, February 18, 2017

Pipelines in Rust

A classic Operating Systems assignment is to implement a Unix shell. It's a great assignment for several reasons. Students have to:

Get a solid understanding of the concept of a process.
Master some of the more subtle aspects of the file system.
Make direct system calls with subtle semantics.

Since I am teaching my course in Rust, I needed to figure out how I wanted the students to implement pipelines and I/O redirection. The options are:

Rely entirely on the standard library.
Interact directly with libc.

The benefit of relying on the standard library is, first of all, that the resulting code is portable across platforms. Second, there is no need for unsafe blocks. The problem is that the pipe from the standard library only connects a parent and child. The Command object is great, but it doesn't allow for the separation between fork() and execv() that is essential for implementing proper I/O redirection.

The trouble with the standard libc crate is that all of the function calls are unsafe. As far as I am concerned, this is bad news pedagogically because I do not want my students using unsafe blocks unless absolutely necessary. They are still learning the language, and I want them to learn to operate within the safety parameters it gives. Unsafe blocks, while necessary, are for experts, and my students will not be experts after one semester.

Fortunately, I recently discovered the nix crate. (Update: Starting in version 0.19.0, fork() is now marked unsafe. The code below works unmodified up to version 0.18.0. The nature of POSIX requirements for these functions makes them intrinsically unsafe in Rust.) It is a safe alternative to libc that provides everyone's favorite system calls in a Rust-friendly form. The program below executes a simple three-command pipeline. Here are the key concepts for understanding the program:

The fork() system call creates a new process. The new process is a perfect duplicate of the original process. The only difference between them is that fork() returns a different value if it is in the child process. If it is in the parent process, it returns the child's PID.
A parent and child process are connected using a pipe. A pipe is accessed using two file descriptors: one for the pipe's input, one for the pipe's output. The output of the pipe serves as an input stream for a process, and likewise the input of the pipe serves as an output stream for a process.
When a parent and child are connected by a pipe, the parent will typically be receiving data from the child. Therefore, the parent's input will be the pipe's output, and the child's output will be the pipe's input.
When the child is finished generating output, it exits. Then, the parent can finish processing the data it received from the child before it in turn exits as well. It is essential that the child exit first; if the parent exits first, unless other provisions are made both processes end.
To redirect standard input from the pipe instead of from the keyboard, we need to close the file descriptor for the keyboard (always file descriptor 0) and replace it with the file descriptor for the output of the pipe. The dup2() system call accomplishes this.
Similarly, to redirect standard output into the pipe instead of to the monitor, we need to close the monitor's file descriptor (always file descriptor 1) and replace it with the file descriptor for the input of the pipe. Again, we use dup2() for this purpose.
Remember that fork() completely duplicates a process. This includes its open file descriptors. In our case, that includes both ends of the pipe. We need to explicitly close the end of the pipe that the process is not using.
The execvp() system call loads a program and executes it. In doing so, the data for the current process is overwritten completely (except for the open file descriptors). The first argument is the name of the program. The PATH environment variable is searched for that program. The second argument is the complete list of command-line arguments, including the program name.
These parameters to execvp() need to be formatted as fixed-size arrays of C-style strings (which are very different from Rust strings). The program below shows how this conversion can be done.

If you'd like an alternative explanation of how the program works, feel free to view a 15-minute video in which I explain the details of this program.

use nix::unistd::{pipe, fork, ForkResult, close, dup2, execvp};
use std::ffi::CString;
use nix::sys::wait::waitpid;

// Add dependency nix = "0.19" to Cargo.toml

fn main() {
    println!("Executes ls -l | grep Cargo | wc");

    let ls:   Vec<CString> = vec![CString::new("ls").unwrap(),
                         CString::new("-l").unwrap()];
    let grep: Vec<CString> = vec![CString::new("grep").unwrap(),
                         CString::new("Cargo").unwrap()];
    let wc:   Vec<CString> = vec![CString::new("wc").unwrap()];

    match unsafe {fork()}.unwrap() {
        ForkResult::Parent{child} => {
            println!("wc pid is {}", child);
            waitpid(child, Option::None).unwrap();
            println!("Finished! Exiting...");
        },

        ForkResult::Child => {
            let (wc_in, grep_out) = pipe().unwrap();

            match unsafe {fork()}.unwrap() {
                ForkResult::Parent{child} => {
                    close(grep_out).unwrap();

                    println!("grep pid is {}", child);
                    dup2(wc_in, 0).unwrap();
                    let array = wc.into_boxed_slice();
                    execvp(&array[0], &*array).unwrap();
                },

                ForkResult::Child => {
                    close(wc_in).unwrap();

                    let (grep_in, ls_out) = pipe().unwrap();

                    match unsafe {fork()}.unwrap() {
                        ForkResult::Parent {child} => {
                            close(ls_out).unwrap();

                            println!("ls pid is {}", child);
                            dup2(grep_out, 1).unwrap();
                            dup2(grep_in, 0).unwrap();
                            let array = grep.into_boxed_slice();
                            execvp(&array[0], &*array).unwrap();
                        },

                        ForkResult::Child => {
                            close(grep_in).unwrap();

                            dup2(ls_out, 1).unwrap();
                            let array = ls.into_boxed_slice();
                            execvp(&array[0], &*array).unwrap();
                        }
                    }
                }
            }
        }
    }
}

Threads, locks, and deadlocks in Rust

A very compelling feature of the Rust language is compile-time protection against data races. This makes it much easier to write correct threaded programs that share mutable state. The following program illustrates how this works. The program runs eight parallel simulations of a coin-flipping experiment.

extern crate rand;
use std::thread;
use std::sync::{Arc, Mutex};

fn main() {
    let total_flips = Arc::new(Mutex::new(0));
    let completed = Arc::new(Mutex::new(0));
    let runs = 8;
    let target_flips = 10;

    for _ in 0..runs {
        let total_flips = total_flips.clone();
        let completed = completed.clone();
        thread::spawn(move || {
            simulate(target_flips, total_flips);
            let mut completed = completed.lock().unwrap();
            *completed += 1;
        });
    }

    loop {
        let completed = completed.lock().unwrap();
        if *completed == runs {
            let total_flips = total_flips.lock().unwrap();
            println!("Final average: {}", *total_flips / *completed);
            break;
        }
    }
}

fn simulate(target_flips: u64, total_flips: Arc<Mutex<u64>>) {
    let mut consecutive = 0;
    let mut iterations = 0;
    while consecutive < target_flips {
        iterations += 1;
        if rand::random() {
            consecutive += 1;
        } else {
            consecutive = 0;
        }
    }
    println!("iterations: {}", iterations);
    let mut total_flips = total_flips.lock().unwrap();
    *total_flips += iterations;
}

The main() function creates two mutable variables that can be shared among threads. One represents the total flips across all experiments. The other represents the total number of experiments that have completed. They are Arc (atomic reference counted pointers) objects that each contain a Mutex (lockable resource).

As each thread is created, it needs its own reference to these shared variables. To create a new shared reference, we need to clone() each variable. This clone is then moved into the thread as it is created, so that the thread is now the owner of the cloned reference.

To access the shared variables, a thread must acquire a lock. The thread blocks if the lock is already held by another thread. A thread releases a lock it holds when the locked variable goes out of scope. The total_flips variable is locked within the simulate() function, while the completed variable is locked after that function completes. The main thread also locks completed periodically as it checks for completion.

The program works fine as written above. But it would be easy to make a mistake that leads to a deadlock. Consider the following variation.

extern crate rand;
use std::thread;
use std::sync::{Arc, Mutex};

fn main() {
    let total_flips = Arc::new(Mutex::new(0));
    let completed = Arc::new(Mutex::new(0));
    let runs = 8;
    let target_flips = 10;

    for _ in 0..runs {
        let total_flips = total_flips.clone();
        let completed = completed.clone();
        thread::spawn(move || {
            simulate(target_flips, total_flips);
            let mut completed = completed.lock().unwrap();
            *completed += 1;
        });
    }
    
    let completed = completed.lock().unwrap();
    while *completed < runs {}
    let total_flips = total_flips.lock().unwrap();
    println!("Final average: {}", *total_flips / *completed);
}

fn simulate(target_flips: u64, total_flips: Arc<Mutex<u64>>) {
    let mut consecutive = 0;
    let mut iterations = 0;
    while consecutive < target_flips {
        iterations += 1;
        if rand::random() {
            consecutive += 1;
        } else {
            consecutive = 0;
        }
    }
    println!("iterations: {}", iterations);
    let mut total_flips = total_flips.lock().unwrap();
    *total_flips += iterations;
}

In this variation, the main thread retains ownership of the lock on completed. This causes all of the simulating threads to block. Unfortunately, the simulating threads cannot update completed because they are blocked. In the meantime, the main thread is waiting for the simulating threads to complete! This is an example of a deadlock. Although Rust prevents lots of concurrency problems, it can't prevent all of them! When writing multithreaded Rust code, it is always essential to retain locks for the smallest possible scope in order to avoid deadlocks.

I'd like to point out that there are alternatives to locks available in Rust. One intriguing option involves using lock-free data structures. I'd like to explore that library in more depth when time permits.

Strings in Rust

One concept in Rust that those learning the language can find very confusing is the existence of two different types of strings: String and &str. (This post assumes at least a basic understanding of the Rust ownership system.)

The String type represents a block of memory that is owned. Mutation is possible if the variable corresponding to the object is appropriately designated.

The &str type is (generally speaking) borrowed from elsewhere rather than owned. This can happen in a couple of ways. String literals are all &str objects. Methods like split() return iterators into their source strings. The iterated items are then borrowed references to substrings within the source string.

The program below is intended to illustrate these distinctions.

use std::io;
use std::io::Write;

fn main() {
    match run() {
        Ok(_) => println!("Goodbye."),
        Err(e) => println!("Error: {}", e),
    }
}

fn run() -> io::Result<()> {
    let stdin = io::stdin();
    let mut longest_overall = String::from("");
    loop {
        print!("> ");
        io::stdout().flush()?;
        let mut line = String::new();
        stdin.read_line(&mut line)?;

        if line.trim().len() == 0 {
            break;
        }

        let mut line_longest: &str = "";
        for word in line.split_whitespace() {
            if word.len() > line_longest.len() {
                line_longest = word;
            }
            if word.len() > longest_overall.len() {
                longest_overall = String::from(word);
            }
        }
        println!("Longest word in this line: {}", line_longest);
    }
    println!("Longest overall word: {}", longest_overall);
    Ok(())
}

This program accepts command-line input from the user. Each line of input is split up to find the constituent words. The program reports the longest word from the current line. When the program ends, it also reports the longest overall word.

The variable longest_overall is a String. I chose to make it a String because I need for that variable to maintain long-term ownership of the value it is storing, and that's not possible with a &str.

The variable line is a String. It has to be a String because it gets mutated when input is read from the user.

The variable line_longest is a &str. It borrows a substring from line. Because it has the same lifetime as line, borrowing is all we need. Note that both of their lifetimes end when they go out of scope at the end of each iteration of the loop.

This brings us back to the need for longest_overall to be a String. Ultimately, the longest word overall is simply a substring of line, which goes out of scope at the end of each loop iteration. That makes it absolutely necessary to make a long-term copy of that longest word, in order to preserve it in existence after its source is deallocated.

To sum up, when figuring out which type of Rust string to use, pay attention to the lifetime. Remember that &str objects are always borrowed from somewhere else. If the string is needed beyond the lifetime of the source of the borrow, you need a String.

This example does not illustrate this, but many methods and functions expect a &str. If you have a String, and a &str is demanded, just use the as_str() method to lend that String to the function.

Teaching Operating Systems using Rust

This semester (inspired by the example and compelling rationale of David Evans) I have been teaching Operating Systems using the Rust language. (I have explicitly rejected the possibility of teaching in C.) Rust is a great choice for this course for several reasons:

It is sufficiently low-level that one can implement a complete operating system. In particular, it compiles to a binary and does not require a run-time due to its lack of a garbage collector.
Concurrency is a major theme in operating systems. In Rust, a program that experiences data races will not compile. This enables the compiler to assist in teaching the students about safe mutable data structures when running concurrent threads.
In our curriculum, the programming languages we employ in two or more courses are Python, Java, and Haskell. Something that all three of these languages have in common is a heavy reliance on heap allocation. Most stack-allocated variables in those languages are pointers into the heap. In contrast, Rust explicitly provides for both stack-allocated and heap-allocated data, with arguably a bias towards stack-allocation.

My overall strategy for the course is twofold:

Spend the first half of the semester learning the Rust language and getting familiar with OS concepts. To this end, I'm following the outline from David Evans's class, where the students implement a web server and a Unix shell. To get accustomed to Rust, I began the semester by having students write lots of short command-line programs.
Spend the second half of the semester exploring the source code of Redox, an open-source microkernel OS written entirely in Rust. Students will be assigned sections of the source code to present to the class. This will be a way to flesh out the concepts explored in the first half of the course in terms of a concrete implementation.

For a textbook, I am using the freely-available online book Operating Systems: Three Easy Pieces by Remzi and Andrea Arpaci-Dusseau. I first learned about the book from (again) David Evans, and I later learned that my friend and colleague Mark Goadrich had taken their OS course while in graduate school. What I like about this book is how it lives up to its name. The "three easy pieces" are virtualization, concurrency, and persistence. It has been a great tool for conceptually organizing my lectures. I constantly make reference to how each topic we explore relates to these three ideas.

I plan to write some additional posts about the course and our experience with Rust as it proceeds.

What's wrong with C?

(The following post is adapted from an answer I wrote in response to this question.)

I started programming in C++ when I was a sophomore in college. Later, I also learned to program in C (without the ++) when working as a student Unix systems administrator. I wrote somewhere between 10,000 and 20,000 lines of C++ when working on my doctoral dissertation. Early in my career as a professor, I taught both the C++ and C languages in different courses at different times.

I've since given up entirely on the C and C++ languages. I no longer teach them or write any code in them.

So what do I use now? For quick scripting, I use Python. For building something with a GUI, I usually program in Java. For robot programming, I use both Python and Java, depending on the situation and the available libraries. (I have ambitions to begin using Rust for robot programming as well.) I like Haskell a lot, but I have not really found a niche for it in the projects I have been pursuing.

My goal in this post is to outline my rationale for giving up on the C family. Some of these comments apply specifically to C, others to both C and C++. None of these critiques are original; all of these points have been made by other people. I am assembling them here to give a concise rationale as to why one long-time C/C++ programmer decided to move on.

What is great about C is that it is well-suited to low-level programming that directly addresses hardware. Unlike Python, Java, any .NET language, or anything with a garbage collector, it does not have a run-time. The code you write in C is translated directly into assembly language, and aside from libraries you explicitly invoke, the only code that runs is the code you write. If you are writing an operating system or code for an embedded device with real-time constraints, it can be an appealing option. (I now believe that Rust is a strictly superior option in these situations, but that is a subject for a future blog post.)

However, there are numerous problems with C that often make it unproductive or impractical. Problems that also apply to C++ are noted explicitly:

Minimal facilities for data abstraction (C). Higher-level languages like Python and Java provide many advanced mechanisms for data abstraction, including classes, objects, polymorphism, lambdas, and so forth. While all of the above can be imitated using C (especially through clever use of function pointers), it is quite painful.
Minimal data-structure libraries (C). Other languages include data-structure libraries as part of their standard libraries. While third-party data-structure libraries are available for C, it can be a significant inconvenience in comparison to standard libraries shipped with every implementation.
Poor string-handling (C). Using strings in C requires lots of low-level operations. Equivalent work can be done in languages like Python with much less code.
Lack of safety (C, C++). In a safe language, like Java or Python, errors in programs are trapped as they happen. C is an unsafe language. That is, a program with an error results in undefined behavior. This can make C programs insanely difficult to debug. In a safe language, the program throws an exception; you trace the exception, find the problem, and fix it. In an unsafe language, all bets are off. Get very familiar with valgrind.
Poorly defined semantics (C, C++). According to the C standard, the compiler can generate any code it wants in the presence of undefined behavior. Furthermore, even extremely experienced C programmers can fail to understand situations that result in undefined behavior. As compiler writers discover more opportunities to optimize away code that results in undefined behavior, existing programs that are believed to work suddenly start failing simply by updating the compiler. As bug submitter felix-gcc put it, “Guys, your obligation is not just to implement the C standard. Your obligation is also not to break apps that depend on you.” And this isn’t just one developer; even gcc itself and the Linux kernel frequently execute code with undefined behavior. The fact that this can happen and is so pervasive reveals fatal flaws in the C standard itself. For more details about these problems, I recommend the following:

A Guide to Undefined Behavior in C and C++
Proposal for a Friendly Dialect of C. A key quote: “…we are trying rescue the predictable little language that we all know is hiding within the C standard. This language generates tight code and doesn’t make you feel like the compiler is your enemy.”
Proposal for a Boring C compiler. This proposal was written by a cryptographer who is tired of the consequences of undefined-behavior optimizations.
The satirical essay It’s Time to Get Serious About Exploiting Undefined Behavior
The outstanding survey With Undefined Behavior, Anything is Possible

All of that said, what really brings this issue to be important to me is the fact that I am now my department's Operating Systems instructor. I had to conclude that I did not want to teach that course using the C language. I have decided to teach it using Rust, the details of which I will describe in a series of future blog posts.