Monday, January 26, 2015

Java 8: Processing character streams

A common operation in a functional language like Haskell is to apply some function to every character in a string.  For example:

Prelude Data.Char> let flip c = if (isUpper c) then (toLower c) else (toUpper c) 
Prelude Data.Char> map flip "This is a TEST"
"tHIS IS A test"

Using the Java 8 stream libraries to do a similar task is a little tricky:

public class StringDemo1 {
  public static String invertCapitals(String other) {
    return other.chars()
      .mapToObj(StringDemo1::flipCap)
      .map(c -> Character.toString(c))
      .reduce("", (s, c) -> s + c);
  }
 
  // Swaps the case of ASCII letters only; every other char is returned unchanged.
  public static Character flipCap(int c) {
    if (c >= 'A' && c <= 'Z') {
       return (char)(c - 'A' + 'a');
    } else if (c >= 'a' && c <= 'z') {
       return (char)(c - 'a' + 'A');
    } else {
       return (char)c;
    }
  }
}

First, we need to convert the string to a stream, which is what the chars() method does. Unfortunately, Java has no CharStream, so chars() returns an IntStream that holds each char value as an int. We use mapToObj() to turn the IntStream into a Stream<Character>, then map() to turn that into a Stream<String>, and finally reduce() to concatenate the pieces into a single string.
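
To make the intermediate types visible, here is a sketch of the same pipeline with each stage assigned to a local variable. The class name StringDemoSteps and the variable names are illustrative assumptions, not part of the original demo; flipCap is copied in so the example compiles on its own.

```java
import java.util.stream.IntStream;
import java.util.stream.Stream;

public class StringDemoSteps {
  public static String invertCapitals(String other) {
    // chars() yields one int per UTF-16 char in the string.
    IntStream codes = other.chars();
    // mapToObj() boxes each flipped char into a Character.
    Stream<Character> flipped = codes.mapToObj(StringDemoSteps::flipCap);
    // map() turns each Character into a one-character String.
    Stream<String> pieces = flipped.map(c -> Character.toString(c));
    // reduce() concatenates the pieces, starting from the empty string.
    return pieces.reduce("", (s, c) -> s + c);
  }

  // Same ASCII-only logic as flipCap in StringDemo1, repeated here for self-containment.
  public static Character flipCap(int c) {
    if (c >= 'A' && c <= 'Z') {
      return (char) (c - 'A' + 'a');
    } else if (c >= 'a' && c <= 'z') {
      return (char) (c - 'a' + 'A');
    } else {
      return (char) c;
    }
  }

  public static void main(String[] args) {
    System.out.println(invertCapitals("This is a TEST")); // prints "tHIS IS A test"
  }
}
```

Each stage is used exactly once, which matters because a stream cannot be traversed a second time.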

While this does get the job done, it is very inefficient: every reduction step allocates a new String and copies everything accumulated so far.  The following variation uses collect() with a StringBuilder to accumulate the new String efficiently:

public class StringDemo2 {
  public static String invertCapitals(String other) {
    return other.chars()
      .mapToObj(StringDemo1::flipCap)
      .map(c -> Character.toString(c))
      .collect(StringBuilder::new,StringBuilder::append,StringBuilder::append)
      .toString();
  }
}


Using collect() is arguably not as aesthetically pleasing as reduce().  Here is an explanation of the arguments:
  • The first argument creates the mutable container that will be the accumulation target (here, a StringBuilder).
  • The second argument appends an element to the container.
  • The third argument merges two containers, which is needed when the stream is processed in parallel.
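This particular example is odd because StringBuilder::append refers to an overloaded instance method, so the same method reference resolves differently in each position.  As the second argument it resolves to append(String); as the third argument it resolves to append(CharSequence), an interface that StringBuilder implements.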
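
One way to see which overload each argument picks is to spell the three arguments out as typed lambdas. This is only a sketch: the class name CollectArgsDemo, the variable names, and the parallel flag are assumptions made for illustration, and flipCap here returns a primitive char rather than a Character.

```java
import java.util.function.BiConsumer;
import java.util.function.Supplier;
import java.util.stream.IntStream;

public class CollectArgsDemo {
  public static String invertCapitals(String other, boolean parallel) {
    // First argument: creates an empty accumulation target.
    Supplier<StringBuilder> supplier = StringBuilder::new;
    // Second argument: calls append(String) to add one element.
    BiConsumer<StringBuilder, String> accumulator = (sb, s) -> sb.append(s);
    // Third argument: calls append(CharSequence) to merge two partial results.
    BiConsumer<StringBuilder, StringBuilder> combiner = (left, right) -> left.append(right);

    IntStream chars = other.chars();
    if (parallel) {
      chars = chars.parallel(); // gives the combiner a chance to run
    }
    return chars
      .mapToObj(c -> String.valueOf(flipCap(c)))
      .collect(supplier, accumulator, combiner)
      .toString();
  }

  // ASCII-only case swap, equivalent to flipCap in StringDemo1.
  public static char flipCap(int c) {
    if (c >= 'A' && c <= 'Z') {
      return (char) (c - 'A' + 'a');
    } else if (c >= 'a' && c <= 'z') {
      return (char) (c - 'a' + 'A');
    } else {
      return (char) c;
    }
  }

  public static void main(String[] args) {
    System.out.println(invertCapitals("This is a TEST", false)); // prints "tHIS IS A test"
    System.out.println(invertCapitals("This is a TEST", true));  // prints "tHIS IS A test"
  }
}
```

In a sequential stream the combiner is typically never invoked; it only comes into play when the work is split across threads, and because the stream is ordered the merged result keeps the original character order.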

Having compared the aesthetics, what about performance?

I found that the version with collect() processed a 100,000-character string in 19 milliseconds, while the version with reduce() required 6520 milliseconds.
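
A rough way to see where a gap of this size could come from, assuming each concatenation in reduce() builds a fresh String and copies every character accumulated so far: the intermediate strings have lengths 1, 2, ..., n, so the total number of characters written is n(n+1)/2. The sketch below only does that arithmetic for n = 100,000; it ignores allocation and garbage-collection costs, and the class and method names are hypothetical.

```java
public class CopyCount {
  // Total characters written when n one-character strings are concatenated
  // one at a time, each step producing a new String of length 1, 2, ..., n.
  public static long charsWrittenByReduce(long n) {
    return n * (n + 1) / 2;
  }

  public static void main(String[] args) {
    long n = 100_000;
    System.out.println(charsWrittenByReduce(n)); // prints 5000050000
  }
}
```

By contrast, a StringBuilder appends each character once and only occasionally copies its buffer when it grows, so its work stays roughly proportional to n.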

My test program is below.  It provides a nice demonstration of passing functions as parameter values.

import java.util.function.Function;
import java.util.stream.IntStream;

public class StringDemoComparison {
  public static void main(String[] args) {
    String input = 
      IntStream.iterate(1, x -> 1 + x)
               .mapToObj(x -> Character.toString((char)(x % 58 + 65)))
               .limit(100000)
               .collect(StringBuilder::new, StringBuilder::append, StringBuilder::append)
               .toString();
  
    runDemo(StringDemo1::invertCapitals, input);
    runDemo(StringDemo2::invertCapitals, input);
  }
 
  public static void runDemo(Function<String, String> func, String input) {
    long start = System.currentTimeMillis();
    String result = func.apply(input);
    long duration = System.currentTimeMillis() - start;
    System.out.println(result.length());
    System.out.println("Duration for: " + func.toString() + " is: " + duration);
  }
}

