Saturday, February 18, 2017

Strings in Rust

Newcomers to Rust are often confused by its two main string types: String and &str. (This post assumes at least a basic understanding of the Rust ownership system.)

The String type owns its text, which lives in a heap-allocated, growable buffer of UTF-8 bytes. A String can be mutated in place, provided the variable that holds it is declared with mut.
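
As a minimal sketch of owning and mutating a String (build_greeting is a made-up helper for illustration, not part of the program in this post):

```rust
// Hypothetical helper: builds a greeting by mutating an owned String in place.
fn build_greeting(name: &str) -> String {
    // `mut` on the binding is what permits the in-place edits below.
    let mut greeting = String::from("Hello");
    greeting.push_str(", ");
    greeting.push_str(name);
    greeting.push('!');
    // Ownership of the buffer moves to the caller; nothing is copied here.
    greeting
}

fn main() {
    // This binding is not `mut`, so calling greeting.push('?') here would not compile.
    let greeting = build_greeting("Rust");
    println!("{}", greeting);
}
```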

The &str type is (generally speaking) borrowed from elsewhere rather than owned. This can happen in a couple of ways. String literals are all &str values; they borrow text that is embedded in the compiled program. Methods like split() return iterators over their source strings. Each item the iterator yields is a &str that refers to a substring of the source string rather than a copy of it.
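
Here is a small sketch of both kinds of borrowing (first_field is a hypothetical helper, not something from the standard library or from the program in this post):

```rust
// Hypothetical helper: returns the first comma-separated field of `record`.
// The returned &str points into `record`; no text is copied.
fn first_field(record: &str) -> &str {
    record.split(',').next().unwrap_or("")
}

fn main() {
    // A string literal is a &str that borrows text stored in the program itself.
    let literal: &str = "a string literal";

    let owned = String::from("alpha,beta,gamma");
    let field = first_field(&owned);

    // The slice starts at the same address as the String's buffer,
    // which shows it is a view into `owned` rather than a new allocation.
    println!("{} / {} / {}", literal, field, field.as_ptr() == owned.as_ptr());
}
```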

The program below is intended to illustrate these distinctions.

use std::io;
use std::io::Write;

fn main() {
    match run() {
        Ok(_) => println!("Goodbye."),
        Err(e) => println!("Error: {}", e),
    }
}

fn run() -> io::Result<()> {
    let stdin = io::stdin();
    let mut longest_overall = String::new();
    loop {
        print!("> ");
        io::stdout().flush()?;
        let mut line = String::new();
        stdin.read_line(&mut line)?;

        // A blank line (or end of input, which leaves `line` empty) ends the session.
        if line.trim().is_empty() {
            break;
        }

        let mut line_longest: &str = "";
        for word in line.split_whitespace() {
            if word.len() > line_longest.len() {
                line_longest = word;
            }
            if word.len() > longest_overall.len() {
                longest_overall = String::from(word);
            }
        }
        println!("Longest word in this line: {}", line_longest);
    }
    println!("Longest overall word: {}", longest_overall);
    Ok(())
}

This program reads lines typed by the user on standard input. Each line of input is split up to find the constituent words. The program reports the longest word from the current line. When the user enters a blank line, the program ends and also reports the longest overall word.

The variable longest_overall is a String. I chose to make it a String because I need that variable to maintain long-term ownership of the value it is storing. A &str only borrows, so it cannot keep the text alive on its own.

The variable line is a String. It has to be a String because read_line() takes a &mut String and appends the user's input to it.

The variable line_longest is a &str. It borrows a substring from line. Because it is declared after line in the same loop body and never outlives it, borrowing is all we need. Note that both of them go out of scope at the end of each iteration of the loop.

This brings us back to the need for longest_overall to be a String. The longest word overall starts out as a substring of line, which goes out of scope at the end of each loop iteration. That makes it absolutely necessary to make a long-term copy of that longest word, so that it survives after its source is deallocated.
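
The same borrow-then-copy pattern can be sketched without any user input. The two function names below are made up for this illustration, and the slice of lines stands in for what the user would type:

```rust
// Hypothetical helper: the longest whitespace-separated word in `line`,
// returned as a slice borrowed from `line`. Ties go to the earlier word.
fn longest_word(line: &str) -> &str {
    let mut longest = "";
    for word in line.split_whitespace() {
        if word.len() > longest.len() {
            longest = word;
        }
    }
    longest
}

// Hypothetical helper: the longest word across all lines, as an owned String.
fn longest_across_lines(lines: &[&str]) -> String {
    let mut best = String::new();
    for raw in lines {
        // `line` is created and dropped inside each iteration,
        // just like the input buffer in the program above.
        let line = String::from(*raw);
        let candidate = longest_word(&line);
        if candidate.len() > best.len() {
            // Copy the bytes into an owned String. Keeping `candidate` itself
            // in a longer-lived &str would be rejected by the borrow checker,
            // because `line` is dropped at the end of this iteration.
            best = candidate.to_owned();
        }
    }
    best
}

fn main() {
    let lines = ["a bb", "cccc d", "ee"];
    println!("{}", longest_across_lines(&lines));
}
```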

To sum up, when figuring out which type of Rust string to use, pay attention to the lifetime. Remember that &str objects are always borrowed from somewhere else. If the string is needed beyond the lifetime of the source of the borrow, you need a String.

This example does not illustrate this, but many methods and functions expect a &str. If you have a String and a &str is demanded, call its as_str() method, or simply pass a reference to it (for a String named s, that is &s), to lend that String to the function.
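
A short sketch of lending a String to a function that wants a &str (count_vowels is a hypothetical function invented for this example). Passing a plain reference to the String also works, because Rust automatically converts a &String to a &str at the call site:

```rust
// Hypothetical function that, like many library APIs, only asks to borrow text.
fn count_vowels(text: &str) -> usize {
    text.chars().filter(|c| "aeiouAEIOU".contains(*c)).count()
}

fn main() {
    let owned = String::from("ownership");

    // Explicit borrow through as_str().
    let explicit = count_vowels(owned.as_str());
    // A &String is converted to &str automatically (deref coercion).
    let coerced = count_vowels(&owned);

    println!("{} {}", explicit, coerced);
    // `owned` was only lent out, so it is still usable here.
    println!("{}", owned);
}
```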

2 comments:

  1. This is a big improvement over the previous Strings in Rust tutorial (http://www.computingbook.org/dorina/pubs/strings.html)
  2. LOL! What a talented daughter you have! :)