How to add elements of a Java 8 stream into an existing List

The Javadoc of Collector shows how to collect elements of a stream into a new List. Is there a one-liner that adds the results to an existing ArrayList?

   


edited Jan 11 '18 at 6:10

Wolfgang Fahl


asked Mar 31 '14 at 4:40

codefx


 

5 Answers

NOTE: nosid's answer shows how to add to an existing collection using forEachOrdered(). This is a useful and effective technique for mutating existing collections. My answer addresses why you shouldn't use a Collector to mutate an existing collection.

The short answer is no: at least in general, you shouldn't use a Collector to modify an existing collection.

The reason is that collectors are designed to support parallelism, even over collections that aren't thread-safe. The way they do this is to have each thread operate independently on its own collection of intermediate results. The way each thread gets its own collection is to call the Collector.supplier() which is required to return a new collection each time.

These collections of intermediate results are then merged, again in a thread-confined fashion, until there is a single result collection. This is the final result of the collect() operation.
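To make the supplier/accumulate/merge steps concrete, here is a minimal sketch of a hand-written collector that counts how often its supplier is invoked. The class name SupplierCallDemo and the counter are illustrative assumptions, not part of the library; the number of supplier calls depends on how the stream happens to be split, so no particular count should be expected.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.stream.Collector;
import java.util.stream.IntStream;

// Hypothetical demo class: shows the three roles a Collector plays.
class SupplierCallDemo {
    // Counts how many intermediate containers the stream asks for.
    static final AtomicInteger supplierCalls = new AtomicInteger();

    static List<Integer> collectInParallel(int n) {
        Collector<Integer, List<Integer>, List<Integer>> toFreshList = Collector.of(
                () -> {                       // supplier: a new, empty list on every call
                    supplierCalls.incrementAndGet();
                    return new ArrayList<Integer>();
                },
                List::add,                    // accumulator: appends to one intermediate list
                (left, right) -> {            // combiner: merges two intermediate lists
                    left.addAll(right);
                    return left;
                });
        return IntStream.range(0, n).boxed().parallel().collect(toFreshList);
    }

    public static void main(String[] args) {
        List<Integer> result = collectInParallel(1_000);
        System.out.println("elements collected: " + result.size());
        System.out.println("supplier calls: " + supplierCalls.get());
    }
}
```

Because every intermediate list is distinct, the merge step never sees the same list on both sides, which is exactly the property that breaks when the supplier hands back one shared list.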

A couple of answers, from Balder and assylias, have suggested using Collectors.toCollection() and then passing a supplier that returns an existing list instead of a new list. This violates the requirement on the supplier, which is that it return a new, empty collection each time.

This will work for simple cases, as the examples in their answers demonstrate. However, it will fail, particularly if the stream is run in parallel. (A future version of the library might change in some unforeseen way that will cause it to fail, even in the sequential case.)

Let's take a simple example:

List<String> destList = new ArrayList<>(Arrays.asList("foo"));
List<String> newList = Arrays.asList("0", "1", "2", "3");
newList.parallelStream()
       .collect(Collectors.toCollection(() -> destList));
System.out.println(destList);

When I run this program, I often get an ArrayIndexOutOfBoundsException. This is because multiple threads are operating on ArrayList, a thread-unsafe data structure. OK, let's make it synchronized:

List<String> destList =
    Collections.synchronizedList(new ArrayList<>(Arrays.asList("foo")));

This will no longer fail with an exception. But instead of the expected result:

[foo, 0, 1, 2, 3]

it gives weird results like this:

[foo, 2, 3, foo, 2, 3, 1, 0, foo, 2, 3, foo, 2, 3, 1, 0, foo, 2, 3, foo, 2, 3, 1, 0, foo, 2, 3, foo, 2, 3, 1, 0]

This is the result of the thread-confined accumulation/merging operations I described above. With a parallel stream, each thread calls the supplier to get its own collection for intermediate accumulation. If you pass a supplier that returns the same collection, each thread appends its results to that collection. Since there is no ordering among the threads, results will be appended in some arbitrary order.

Then, when these intermediate collections are merged, this basically merges the list with itself. Lists are merged using List.addAll(), which says that the results are undefined if the source collection is modified during the operation. In this case, ArrayList.addAll() does an array-copy operation, so it ends up duplicating itself, which is sort-of what one would expect, I guess. (Note that other List implementations might have completely different behavior.) Anyway, this explains the weird results and duplicated elements in the destination.
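The self-merge can be reproduced without any streams at all. The following is a small sketch (the class name SelfMergeDemo is made up for illustration) that does by hand what the combiner ends up doing when both of its arguments are the same list. The List.addAll() specification leaves this case undefined; the doubling shown here is what the OpenJDK ArrayList does, as far as I know, because it copies the argument to an array before appending.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical demo class: merges a list with itself, as the combiner would.
class SelfMergeDemo {
    static List<String> mergeWithItself() {
        List<String> dest = new ArrayList<>(Arrays.asList("foo", "0", "1"));
        // Both "partial results" are the same object, so this appends dest to dest.
        dest.addAll(dest);
        return dest;
    }

    public static void main(String[] args) {
        // On OpenJDK's ArrayList this prints [foo, 0, 1, foo, 0, 1];
        // other List implementations are free to behave differently.
        System.out.println(mergeWithItself());
    }
}
```

Repeat that doubling once per merge step and you get the kind of repeated runs shown in the output above.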

You might say, "I'll just make sure to run my stream sequentially" and go ahead and write code like this

stream.collect(Collectors.toCollection(() -> existingList))

anyway. I'd recommend against doing this. If you control the stream, sure, you can guarantee that it won't run in parallel. I expect that a style of programming will emerge where streams get handed around instead of collections. If somebody hands you a stream and you use this code, it'll fail if the stream happens to be parallel. Worse, somebody might hand you a sequential stream and this code will work fine for a while, pass all tests, etc. Then, some arbitrary amount of time later, code elsewhere in the system might change to use parallel streams which will cause your code to break.

OK, then just make sure to remember to call sequential() on any stream before you use this code:

stream.sequential().collect(Collectors.toCollection(() -> existingList))

Of course, you'll remember to do this every time, right? :-) Let's say you do. Then, the performance team will be wondering why all their carefully crafted parallel implementations aren't providing any speedup. And once again they'll trace it down to your code which is forcing the entire stream to run sequentially.

Don't do it.
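If you still want the elements in the existing list, one option that stays within the collector contract is to collect into a fresh list and then append that list in a single step. This is only a sketch: the helper name appendAll and the class AppendSafely are my own, not library API. It works whether or not the stream is parallel, because Collectors.toList() gets its own new containers and the mutation of the existing list happens afterwards, on the calling thread.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Hypothetical helper class: appends stream results to an existing list.
class AppendSafely {
    static <T> void appendAll(Stream<T> stream, List<T> existingList) {
        // The collector only ever touches lists it created itself.
        List<T> fresh = stream.collect(Collectors.toList());
        // The existing list is mutated once, after the stream has finished.
        existingList.addAll(fresh);
    }

    public static void main(String[] args) {
        List<String> destList = new ArrayList<>(Arrays.asList("foo"));
        List<String> newList = Arrays.asList("0", "1", "2", "3");
        appendAll(newList.parallelStream(), destList);
        System.out.println(destList); // [foo, 0, 1, 2, 3]
    }
}
```

The cost is one temporary list, which is usually a reasonable price for not depending on how the stream is executed.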


edited May 23 '17 at 12:34

Community


answered Mar 31 '14 at 7:40

Stuart Marks


  • Great explanation! - thanks for clarifying this. I'll edit my answer to recommend never doing this with possible parallel streams. – Balder Mar 31 '14 at 7:49

  • +1 always good to read your answers! – assylias Mar 31 '14 at 9:55

  • If the question is, if there is a one-liner to add elements of a stream into an existing list, then the short answer is yes. See my answer. However, I agree with you, that using Collectors.toCollection() in combination with an existing list is the wrong way. – nosid Mar 31 '14 at 10:00

  • True. I guess the rest of us were all thinking of collectors. – Stuart Marks Mar 31 '14 at 14:45

  • Great answer! I am very tempted to use the sequential solution even though you clearly advise against it, because as stated it must work well. But the fact that the Javadoc requires the supplier argument of the toCollection method to return a new, empty collection each time convinces me not to. I really don't want to break the Javadoc contract of core Java classes. – zoom May 25 '16 at 10:52

 

 

The short answer is no (or should be no). EDIT: yeah, it's possible (see assylias' answer below), but keep reading. EDIT2: but see Stuart Marks' answer for yet another reason why you still shouldn't do it!

The longer answer:

The purpose of these constructs in Java 8 is to introduce some concepts of Functional Programming to the language. In Functional Programming, data structures are not typically modified; instead, new ones are created out of old ones by means of transformations such as map, filter, fold/reduce and many others.

If you must modify the old list, simply collect the mapped items into a fresh list:

final List<Integer> newList = list.stream()
                                  .filter(n -> n % 2 == 0)
                                  .collect(Collectors.toList());

and then do list.addAll(newList) — again: if you really must.

(Or construct a new list by concatenating the old one and the new one, and assign it back to the list variable; this is a little more in the spirit of FP than addAll.)
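The concatenating variant could look roughly like this. It is a sketch that uses Stream.concat; the class and method names (ConcatDemo, withEvensAppended) are invented for the example. Neither input is modified: a third list is built and the variable is pointed at it.

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Hypothetical demo class: builds a new list instead of mutating the old one.
class ConcatDemo {
    static List<Integer> withEvensAppended(List<Integer> list) {
        // Old elements first, then the filtered ones, collected into a new list.
        return Stream.concat(list.stream(),
                             list.stream().filter(n -> n % 2 == 0))
                     .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<Integer> list = Arrays.asList(1, 2, 3, 4, 5);
        list = withEvensAppended(list);   // reassign rather than mutate
        System.out.println(list);         // [1, 2, 3, 4, 5, 2, 4]
    }
}
```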

As to the API: even though the API allows it (again, see assylias' answer), you should try to avoid doing that, at least in general. It's best to learn the paradigm (FP) rather than fight it (even though Java generally isn't an FP language), and only resort to "dirtier" tactics if absolutely needed.

The really long answer: (i.e. if you include the effort of actually finding and reading an FP intro/book as suggested)

To find out why modifying existing lists is in general a bad idea and leads to less maintainable code, find a good introduction to Functional Programming (there are hundreds) and start reading. (The exception is when you're modifying a local variable and your algorithm is short and/or trivial, where maintainability is not really in question.) A "preview" explanation would be something like: not modifying data (in most parts of your program) is more mathematically sound and easier to reason about, and it leads to higher-level, less technical definitions of program logic, which are also more human-friendly once your brain transitions away from old-style imperative thinking.


edited Jul 10 '17 at 7:41

Andrii Abramov


answered Mar 31 '14 at 5:04

Erik Allik


  • The short answer is wrong. – assylias Mar 31 '14 at 6:23

  • @assylias: logically, it wasn't wrong because there was the or part; anyway, added a note. – Erik Allik Mar 31 '14 at 6:29 

  • The short answer is right. The one-liners proposed will succeed in simple cases but fail in the general case.– Stuart Marks Mar 31 '14 at 6:42

  • The longer answer is mostly right, but the design of the API is mainly about parallelism and is less about functional programming. Though of course there are many things about FP that are amenable to parallelism, so these two concepts are well aligned. – Stuart Marks Mar 31 '14 at 6:44 

  • @StuartMarks: Interesting: in which cases does the solution provided in assylias' answer break down? (and good points about parallelism—I guess I got too eager to advocate FP) – Erik Allik Mar 31 '14 at 6:53 

 

You just have to make your original list variable refer to the list that Collectors.toList() returns.

Here's a demo:

import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

public class Reference {

  public static void main(String[] args) {
    List<Integer> list = Arrays.asList(1, 2, 3, 4, 5);
    System.out.println(list);

    // Collect just the even numbers and make the variable refer to the new list.
    list = list.stream()
               .filter(n -> n % 2 == 0)
               .collect(Collectors.toList());
    System.out.println(list);
  }
}

And here's how you can add the newly created elements to your original list in just one line.

List<Integer> list = ...;
// add even numbers from the list to the list again.
list.addAll(list.stream()
                .filter(n -> n % 2 == 0)
                .collect(Collectors.toList())
);

That's what this Functional Programming Paradigm provides.
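One caveat when combining the two snippets above: a list created with Arrays.asList, as in the demo, is fixed-size, so calling addAll on it throws UnsupportedOperationException. The sketch below (class and method names are illustrative only) shows the failure and the usual workaround of copying into an ArrayList first.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

// Hypothetical demo class: addAll needs a growable list.
class FixedSizeListDemo {
    static List<Integer> appendEvens(List<Integer> list) {
        // The stream is fully collected before addAll runs, so the source
        // is not modified while it is being traversed.
        list.addAll(list.stream()
                        .filter(n -> n % 2 == 0)
                        .collect(Collectors.toList()));
        return list;
    }

    public static void main(String[] args) {
        List<Integer> fixed = Arrays.asList(1, 2, 3, 4, 5);
        try {
            appendEvens(fixed);
        } catch (UnsupportedOperationException e) {
            System.out.println("Arrays.asList lists cannot grow");
        }

        List<Integer> growable = new ArrayList<>(fixed);
        System.out.println(appendEvens(growable)); // [1, 2, 3, 4, 5, 2, 4]
    }
}
```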


edited Jul 10 '17 at 7:22

Andrii Abramov


answered Mar 31 '14 at 4:51

Aman Agnihotri


  • I meant to say how to add/collect into an existing list not just reassign. – codefx Mar 31 '14 at 4:58

  • Well, technically you can't do that kind of thing in the Functional Programming paradigm, which is what streams are all about. In Functional Programming, state isn't modified; instead, new states are created in persistent data structures, making it safe for concurrency and more functional. The approach I mentioned is what you can do, or you can resort to the old-style object-oriented approach where you iterate over each element and keep or remove elements as you see fit. – Aman Agnihotri Mar 31 '14 at 5:01

 

As far as I can see, all other answers so far used a collector to add elements to an existing list. However, there's a shorter solution, and it works for both sequential and parallel streams. You can simply use the method forEachOrdered in combination with a method reference.

List<String> source = ...;
List<Integer> target = ...;

source.stream()
      .map(String::length)
      .forEachOrdered(target::add);

The only restriction is that source and target must be different lists, because you are not allowed to modify the source of a stream while it is being processed.

Note that this solution works for both sequential and parallel streams. However, it does not benefit from concurrency. The method reference passed to forEachOrdered will always be executed sequentially.
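Here is a runnable version of the same idea with a parallel stream, as a rough sketch; the class name ForEachOrderedDemo and the sample data are made up. The target is a plain ArrayList, which is fine here because forEachOrdered performs the action for one element at a time, in encounter order.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical demo class: appends mapped elements to an existing list.
class ForEachOrderedDemo {
    static List<Integer> appendLengths(List<String> source, List<Integer> target) {
        source.parallelStream()
              .map(String::length)          // may run on several threads
              .forEachOrdered(target::add); // runs one element at a time, in order
        return target;
    }

    public static void main(String[] args) {
        List<String> source = Arrays.asList("a", "bb", "ccc", "dddd");
        List<Integer> target = new ArrayList<>(Arrays.asList(0));
        System.out.println(appendLengths(source, target)); // [0, 1, 2, 3, 4]
    }
}
```

The mapping step can still be parallelized; only the final add calls are serialized.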


edited Jul 10 '17 at 7:21

Andrii Abramov


answered Mar 31 '14 at 6:53

nosid


  • +1 It’s funny how so many people claim that there is no possibility when there is one. Btw. I included forEach(existing::add) as a possibility in an answer two months ago. I should have added forEachOrdered as well… – Holger Apr 2 '14 at 14:19

  • +1 this is the actual answer to this exact question – ssedano Sep 1 '14 at 15:21

  • Is there any reason you used forEachOrdered instead of forEach? – membersound Mar 25 '15 at 10:11

  • @membersound: forEachOrdered works for both sequential and parallel streams. In contrast, forEach might execute the passed function object concurrently for parallel streams. In this case, the function object must be synchronized properly, e.g. by using a Vector<Integer>. – nosid Mar 25 '15 at 11:09

  • @BrianGoetz: I have to admit that the documentation of Stream.forEachOrdered is a bit imprecise. However, I can't see any reasonable interpretation of this specification in which there is no happens-before relationship between any two calls of target::add. Regardless of which threads the method is invoked from, there is no data race. I would have expected you to know that. – nosid Jul 26 '15 at 22:10

 

Erik Allik already gave very good reasons why you will most likely not want to collect elements of a stream into an existing List.

Anyway, you can use the following one-liner, if you really need this functionality.

EDIT: But as Stuart Marks explains in his answer, you should never do this, if the streams might be parallel streams - use at your own risk...

list.stream().collect(Collectors.toCollection(() -> myExistingList));


edited May 23 '17 at 12:18

Community


answered Mar 31 '14 at 6:20

Balder


  • ahh, that's a shame :P – Erik Allik Mar 31 '14 at 6:27

  • This technique will fail horribly if the stream is run in parallel. – Stuart Marks Mar 31 '14 at 6:34

  • It would be the responsibility of the collection provider to make sure it doesn't fail - e.g. by providing a concurrent collection. – Balder Mar 31 '14 at 6:37

  • No, this code violates the requirement of toCollection(), which is that the supplier return a new, empty collection of the appropriate type. Even if the destination is thread-safe, the merging that is done for the parallel case will give rise to incorrect results. – Stuart Marks Mar 31 '14 at 6:39

  • @Balder I've added an answer that should clarify this. – Stuart Marks Mar 31 '14 at 7:44
