Level 2 · 20 min

Streams & Functional

The Java Stream API (Java 8+) enables declarative, pipeline-style data processing. Understanding lazy evaluation, when computation actually happens, and the pitfalls of parallel streams is essential for writing correct and efficient collection-processing code.

Stream Pipeline: Source → Intermediate → Terminal

A stream pipeline has three parts:

1) Source: Collection.stream(), Stream.of(), Files.lines(), Stream.generate().
2) Intermediate operations: filter, map, flatMap, sorted, distinct, limit, skip. These are lazy, and each returns a new Stream.
3) Terminal operations: collect, forEach, count, reduce, findFirst, anyMatch. These trigger execution of the entire pipeline.

No computation happens until a terminal operation is invoked. This laziness enables short-circuit optimization: Stream.of(1,2,3,4,5).filter(x -> x > 3).findFirst() tests 1, 2, 3 and 4, returns 4 as soon as it matches, and never examines 5.
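To make the laziness visible, here is a minimal sketch that records which elements the filter stage actually inspects. The class name LazyPipelineDemo and the visited list are hypothetical names introduced for illustration. The side effect inside the lambda is for demonstration only and is not something to do in production pipelines.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Optional;
import java.util.stream.Stream;

public class LazyPipelineDemo {

    // Records every element the filter stage inspects, in order (demo only).
    static final List<Integer> visited = new ArrayList<>();

    // Builds the pipeline without running it: filter is registered, not executed.
    static Stream<Integer> buildPipeline() {
        visited.clear();
        return Stream.of(1, 2, 3, 4, 5)
                .filter(x -> {
                    visited.add(x); // side effect purely to observe evaluation
                    return x > 3;
                });
    }

    public static void main(String[] args) {
        Stream<Integer> pipeline = buildPipeline();
        // No terminal operation yet, so the filter has not run.
        System.out.println("before terminal op: " + visited); // prints []

        // findFirst() is the terminal operation that starts evaluation.
        Optional<Integer> first = pipeline.findFirst();
        System.out.println("result: " + first.get());         // prints 4
        System.out.println("after findFirst: " + visited);    // prints [1, 2, 3, 4]
    }
}
```

Element 5 never reaches the filter, because findFirst() stops pulling from the source once it has a result.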

Lazy Evaluation and Short-Circuit Operations

Short-circuit intermediate operations: limit(n) stops after n elements; takeWhile (Java 9) stops as soon as the predicate fails. Short-circuit terminal operations: findFirst(), findAny(), anyMatch(), allMatch(), noneMatch().

The pipeline pushes one element at a time through all stages, rather than running each stage over the whole input in turn. So filter().map().collect() does the following for each element: filter, then map if the element passes, then collect. This enables early exit and avoids materializing intermediate collections. The exception is a stateful operation such as sorted(), which must see every upstream element before it can emit anything downstream.

Effective Java (Item 45) describes stream pipeline evaluation as follows: 'Stream pipelines are evaluated lazily: evaluation doesn't start until the terminal operation is invoked, and data elements that aren't required in order to complete the terminal operation are never computed. This lazy evaluation is what makes it possible to work with infinite streams. Note that a stream pipeline without a terminal operation is a silent no-op, so don't forget to include one.'

Parallel streams (myList.parallelStream()) split the source into chunks that are processed on ForkJoinPool.commonPool(). Effective Java cautions that parallel streams are rarely beneficial for ordered sources combined with stateful intermediate operations (sorted(), distinct()), because the merge cost outweighs the parallelism gain. Meaningful speedups show up mainly when the per-element work is computationally expensive, the data set is large (10,000 or more elements is a rough rule of thumb, not a hard threshold), the source splits cheaply (ArrayList or arrays), and encounter order does not need to be preserved.
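The element-at-a-time behavior and the two short-circuit intermediate operations can be sketched as follows. ShortCircuitDemo and its method names are hypothetical, chosen for illustration. The logging inside the lambdas is only there to expose evaluation order, and takeWhile assumes Java 9 or later.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

public class ShortCircuitDemo {

    // Logs stage activity to show that each element flows through
    // filter and map before the next element is pulled from the source.
    static List<String> trace(List<Integer> input) {
        List<String> log = new ArrayList<>();
        input.stream()
             .filter(x -> { log.add("filter " + x); return x % 2 == 0; })
             .map(x -> { log.add("map " + x); return x * 10; })
             .collect(Collectors.toList()); // terminal op; result ignored here
        return log;
    }

    // Infinite source: limit(n) is what lets the pipeline terminate.
    static List<Integer> firstEvenSquares(int n) {
        return Stream.iterate(1, i -> i + 1)
                .map(i -> i * i)
                .filter(sq -> sq % 2 == 0)
                .limit(n)
                .collect(Collectors.toList());
    }

    // Infinite source: takeWhile (Java 9+) stops at the first failing element.
    static List<Integer> squaresBelow(int bound) {
        return Stream.iterate(1, i -> i + 1)
                .map(i -> i * i)
                .takeWhile(sq -> sq < bound)
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        System.out.println(trace(Arrays.asList(1, 2, 3)));
        // prints [filter 1, filter 2, map 2, filter 3]
        System.out.println(firstEvenSquares(3)); // prints [4, 16, 36]
        System.out.println(squaresBelow(30));    // prints [1, 4, 9, 16, 25]
    }
}
```

In the trace, "map 2" appears before "filter 3", which is the interleaving described above. Without limit or takeWhile, the two Stream.iterate pipelines would never finish.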

Parallel Streams and Collector API

Parallel streams split the source, process chunks concurrently on ForkJoinPool.commonPool(), and merge the results. Requirements for safe parallelization: 1) Non-interfering operations (don't modify the source during processing), 2) Stateless lambdas (no shared mutable state), 3) Associative operations for reduction (a+(b+c) == (a+b)+c). Parallel streams carry overhead from splitting and merging, so they tend to win only on large data sets (more than roughly 10,000 elements, as a rule of thumb) with CPU-intensive per-element work. Measure before committing to them.

The Collector API (Collectors.toList(), toMap(), groupingBy(), partitioningBy()) provides reusable reduction strategies. Method references are often more readable than the equivalent lambdas. Static and unbound forms (Class::method) capture nothing, whereas a bound reference such as this::method still captures this.
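The safety requirements and two of the collectors mentioned above can be sketched together. This is an illustrative example, not a benchmark: ParallelAndCollectorsDemo and its method names are hypothetical, and the inputs are far too small for parallelism to pay off in practice.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.function.Function;
import java.util.stream.Collectors;
import java.util.stream.IntStream;

public class ParallelAndCollectorsDemo {

    // Safe: the lambda is stateless and sum() is an associative reduction.
    static long sumOfSquares(int n) {
        return IntStream.rangeClosed(1, n)
                .parallel()
                .mapToLong(i -> (long) i * i)
                .sum();
    }

    // Safe: collect() gives each worker its own container and merges them,
    // preserving encounter order for an ordered source.
    static List<Integer> evens(int n) {
        return IntStream.rangeClosed(1, n)
                .parallel()
                .filter(i -> i % 2 == 0)
                .boxed()
                .collect(Collectors.toList());
    }

    // UNSAFE, shown only as a comment for contrast: shared mutable state.
    //   List<Integer> out = new ArrayList<>();
    //   IntStream.rangeClosed(1, n).parallel().forEach(out::add); // data race

    // partitioningBy: splits elements into a false group and a true group.
    static Map<Boolean, List<String>> splitByLength(List<String> words, int minLen) {
        return words.stream()
                .collect(Collectors.partitioningBy(w -> w.length() >= minLen));
    }

    // toMap: throws IllegalStateException on duplicate keys unless a merge
    // function is supplied, so duplicates are removed first here.
    static Map<String, Integer> lengthByWord(List<String> words) {
        return words.stream()
                .distinct()
                .collect(Collectors.toMap(Function.identity(), String::length));
    }

    public static void main(String[] args) {
        System.out.println(sumOfSquares(1000)); // prints 333833500
        System.out.println(evens(10));          // prints [2, 4, 6, 8, 10]

        List<String> words = Arrays.asList("a", "bb", "ccc", "bb");
        System.out.println(splitByLength(words, 2).get(true)); // prints [bb, ccc, bb]
        System.out.println(lengthByWord(words).get("ccc"));    // prints 3
    }
}
```

The commented-out forEach variant violates the stateless-lambda requirement: ArrayList is not thread-safe, so elements can be lost or an exception thrown. Letting collect() own the mutable containers avoids the problem.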

Key Takeaways

Streams are lazy — no computation happens until a terminal operation is called. This enables short-circuit optimization.
Parallel streams use ForkJoinPool.commonPool(). Do not use them for I/O-bound operations (blocking calls tie up the shared pool's worker threads) or when a lambda reads or writes shared mutable state.
Collectors.groupingBy() is a powerful reduction — group elements by a classifier and apply a downstream collector to each group.

Code example

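import java.util.List;
import java.util.Map;
import java.util.Optional;
import java.util.stream.Collectors;

// The snippets below assume an Employee class with getName(), getSalary()
// and getDept(), and an existing List<Employee> named employees.

// Lazy pipeline: no computation until collect()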
List<String> result = employees.stream()
    .filter(e -> e.getSalary() > 50_000)
    .map(Employee::getName)
    .sorted()
    .collect(Collectors.toList());

// Short-circuit: stops at first match
Optional<Employee> first = employees.stream()
    .filter(e -> e.getDept().equals("Engineering"))
    .findFirst(); // pipeline stops after first found

// Collectors.groupingBy with downstream
Map<String, Long> countByDept = employees.stream()
    .collect(Collectors.groupingBy(
        Employee::getDept,
        Collectors.counting()));

// Method references
employees.stream()
    .map(Employee::getName)        // unbound instance method ref: e -> e.getName()
    .forEach(System.out::println); // bound instance method ref on the System.out object