Understanding sequential vs parallel stream spliterators in Java 8 and Java 9
A question about spliterators that, at first glance, is not straightforward.
In streams, .parallel() changes how the stream is processed. However, I was expecting the spliterators created from sequential and parallel streams to be the same. For example, in sequential streams .trySplit() is typically never invoked, while in parallel streams it is, in order to hand over the split-off spliterator to another thread.
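The claim about when .trySplit() gets invoked can be checked with a small probe. The following is only a sketch: CountingSpliterator and countSplitCalls are hypothetical names introduced here, and the wrapper counts calls on the root spliterator only, because the halves it hands out are not wrapped again.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Spliterator;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Consumer;
import java.util.stream.StreamSupport;

class SplitCountDemo {

    /** Hypothetical helper: delegates every call and counts trySplit() invocations. */
    static final class CountingSpliterator<T> implements Spliterator<T> {
        private final Spliterator<T> delegate;
        private final AtomicInteger splitCalls;

        CountingSpliterator(Spliterator<T> delegate, AtomicInteger splitCalls) {
            this.delegate = delegate;
            this.splitCalls = splitCalls;
        }

        @Override
        public boolean tryAdvance(Consumer<? super T> action) {
            return delegate.tryAdvance(action);
        }

        @Override
        public Spliterator<T> trySplit() {
            splitCalls.incrementAndGet();
            // The returned prefix is the delegate's own spliterator (not wrapped),
            // so only calls made on this root instance are counted.
            return delegate.trySplit();
        }

        @Override
        public long estimateSize() {
            return delegate.estimateSize();
        }

        @Override
        public int characteristics() {
            return delegate.characteristics();
        }
    }

    /** Runs a forEach over 10,000 elements and reports how often trySplit() was called. */
    static int countSplitCalls(boolean parallel) {
        List<Integer> data = new ArrayList<>();
        for (int i = 0; i < 10_000; i++) {
            data.add(i);
        }
        AtomicInteger splitCalls = new AtomicInteger();
        Spliterator<Integer> sp = new CountingSpliterator<>(data.spliterator(), splitCalls);
        StreamSupport.stream(sp, parallel).forEach(x -> { });
        return splitCalls.get();
    }

    public static void main(String[] args) {
        System.out.println("sequential trySplit calls: " + countSplitCalls(false));
        System.out.println("parallel trySplit calls:   " + countSplitCalls(true));
    }
}
```

On the sequential run the count should stay at zero; the exact count on the parallel run depends on the size of the common pool, so it is printed rather than assumed.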
Differences between stream.spliterator() and stream.parallel().spliterator():
- They may have different characteristics:
Stream.of(1L, 2L, 3L).limit(2); // ORDERED
Stream.of(1L, 2L, 3L).limit(2).parallel(); // SUBSIZED, SIZED, ORDERED
This seems to be another nonsensical policy for stream spliterator characteristics (they seem to be computed better in the parallel case), discussed here: Understanding deeply spliterator characteristics in java 8 and java 9
- They may behave differently when splitting with .trySplit():
Stream.of(1L, 2L, 3L); // NON NULL
Stream.of(1L, 2L, 3L).limit(2); // NULL
Stream.of(1L, 2L, 3L).limit(2).parallel(); // NON NULL
Why do the last two behave differently? Why can't I split a sequential stream if I want to? (It could be useful to discard one of the splits for fast processing, for example.)
- Big impact when transforming a spliterator into a stream:
Spliterator<Long> spliterator = Stream.of(1L, 2L, 3L).limit(2).spliterator();
Stream<Long> stream = StreamSupport.stream(spliterator, true); // No parallel processing!
In this case, the spliterator was created from a sequential stream, which disables the ability to split (.trySplit() returns null). When there is a need later to transform it back into a stream, that stream won't benefit from parallel processing. A shame.
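To make the last point observable, one can record which threads actually process the elements. This is a minimal sketch under the assumption that a spliterator which refuses to split forces the whole traversal onto the invoking thread; threadsUsed is a hypothetical helper name.

```java
import java.util.Set;
import java.util.Spliterator;
import java.util.concurrent.ConcurrentHashMap;
import java.util.stream.Stream;
import java.util.stream.StreamSupport;

class NoParallelDemo {

    /** Collects the names of all threads that process elements of the re-wrapped stream. */
    static Set<String> threadsUsed() {
        // Spliterator taken from a sequential pipeline with a chained operation
        Spliterator<Long> sp = Stream.of(1L, 2L, 3L).limit(2).spliterator();
        Set<String> threads = ConcurrentHashMap.newKeySet();
        // The stream is flagged parallel, but the spliterator cannot be split
        StreamSupport.stream(sp, true)
                     .forEach(x -> threads.add(Thread.currentThread().getName()));
        return threads;
    }

    public static void main(String[] args) {
        System.out.println("threads used: " + threadsUsed());
    }
}
```

With only two elements a splittable source would not guarantee several threads either, so the sketch only illustrates that a single thread does all the work here.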
The big question: as a workaround, what are the major impacts of always converting a stream to parallel before invoking .spliterator()?
// Supports activation of parallel processing later
public static <T> Stream<T> myOperation(Stream<T> stream) {
    boolean isParallel = stream.isParallel();
    Spliterator<T> spliterator = stream.parallel().spliterator();
    return StreamSupport.stream(new Spliterator<T>() {
        // My implementation of the interface here (omitted for clarity)
    }, isParallel).onClose(stream::close);
}
// Now I have the option to use parallel processing when needed:
myOperation(stream).skip(1).parallel()...
This is not a general property of spliterators, but only of the wrapping spliterators that encapsulate a stream pipeline.
When you call spliterator() on a stream that has been generated from a spliterator and has no chained operations, you'll get the source spliterator, which may or may not support trySplit, regardless of the stream's parallel state.
ArrayList<String> list = new ArrayList<>();
Collections.addAll(list, "foo", "bar", "baz");
Spliterator<String> sp1 = list.spliterator(), sp2=list.stream().spliterator();
// true
System.out.println(sp1.getClass()==sp2.getClass());
// not null
System.out.println(sp2.trySplit());
Likewise:
Spliterator<String> sp = Stream.of("foo", "bar", "baz").spliterator();
// not null
System.out.println(sp.trySplit());
But as soon as you chain operations before calling spliterator(), you will get a spliterator wrapping the stream pipeline. Now, it would be possible to implement dedicated spliterators performing the associated operation, like a LimitSpliterator or a MappingSpliterator, but this has not been done, as converting a stream back to a spliterator has been considered a last resort for when the other terminal operations do not fit, not a high-priority use case. Instead, you will always get an instance of a single implementation class that tries to translate the inner workings of the stream pipeline implementation to the spliterator API.
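A quick way to see the single implementation class is to compare the runtime classes of the spliterators returned for different chained operations. This is just a sketch; the concrete class is a JDK-internal detail, so only identity is compared, and the helper names are hypothetical.

```java
import java.util.stream.Stream;

class WrappingClassDemo {

    /** Returns the runtime class of the spliterator produced by the given stream. */
    static Class<?> spliteratorClass(Stream<String> stream) {
        return stream.spliterator().getClass();
    }

    /** True if map(...) and limit(...) pipelines yield spliterators of the same class. */
    static boolean sameWrapperClass() {
        Class<?> mapped = spliteratorClass(Stream.of("foo", "bar", "baz").map(x -> x));
        Class<?> limited = spliteratorClass(Stream.of("foo", "bar", "baz").limit(2));
        return mapped == limited;
    }

    /** True if a stream without chained operations yields a different (source) class. */
    static boolean sourceDiffersFromWrapper() {
        Class<?> source = spliteratorClass(Stream.of("foo", "bar", "baz"));
        Class<?> mapped = spliteratorClass(Stream.of("foo", "bar", "baz").map(x -> x));
        return source != mapped;
    }

    public static void main(String[] args) {
        System.out.println("map and limit share a class: " + sameWrapperClass());
        System.out.println("source class differs:        " + sourceDiffersFromWrapper());
    }
}
```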
This can be quite complicated for stateful operations, most notably sorted, distinct, or skip and limit for a non-SIZED stream. For trivial stateless operations, like map or filter, it would be much easier to provide support, as has even been remarked in a code comment:
Abstract wrapping spliterator that binds to the spliterator of a pipeline helper on first operation. This spliterator is not late-binding and will bind to the source spliterator when first operated on. A wrapping spliterator produced from a sequential stream cannot be split if there are stateful operations present.
…
// @@@ Detect if stateful operations are present or not
// If not then can split otherwise cannot
/**
* True if this spliterator supports splitting
*/
final boolean isParallel;
but it seems that this detection has not been implemented yet, and all intermediate operations are treated like stateful operations.
Spliterator<String> sp = Stream.of("foo", "bar", "baz").map(x -> x).spliterator();
// null
System.out.println(sp.trySplit());
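For contrast, here is roughly what a dedicated spliterator for a stateless map step could look like. This is a hypothetical sketch, not JDK code: the MappingSpliterator class below is an illustration of the idea mentioned above, and it simply forwards trySplit to the source and re-wraps the split-off prefix.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Spliterator;
import java.util.function.Consumer;
import java.util.function.Function;
import java.util.stream.Collectors;
import java.util.stream.StreamSupport;

class MappingSpliteratorSketch {

    /** Hypothetical spliterator applying a stateless mapping while staying splittable. */
    static final class MappingSpliterator<T, R> implements Spliterator<R> {
        private final Spliterator<T> source;
        private final Function<? super T, ? extends R> mapper;

        MappingSpliterator(Spliterator<T> source, Function<? super T, ? extends R> mapper) {
            this.source = source;
            this.mapper = mapper;
        }

        @Override
        public boolean tryAdvance(Consumer<? super R> action) {
            return source.tryAdvance(t -> action.accept(mapper.apply(t)));
        }

        @Override
        public void forEachRemaining(Consumer<? super R> action) {
            source.forEachRemaining(t -> action.accept(mapper.apply(t)));
        }

        @Override
        public Spliterator<R> trySplit() {
            // Mapping is stateless, so splitting can be delegated to the source
            Spliterator<T> prefix = source.trySplit();
            if (prefix == null) {
                return null;
            }
            return new MappingSpliterator<T, R>(prefix, mapper);
        }

        @Override
        public long estimateSize() {
            return source.estimateSize();
        }

        @Override
        public int characteristics() {
            // A mapping may break ordering by value, uniqueness and non-nullness
            return source.characteristics() & ~(SORTED | DISTINCT | NONNULL);
        }
    }

    /** Builds a mapped spliterator over a small fixed list. */
    static Spliterator<String> mapped() {
        List<String> list = Arrays.asList("foo", "bar", "baz");
        return new MappingSpliterator<>(list.spliterator(), String::toUpperCase);
    }

    public static void main(String[] args) {
        System.out.println("can split: " + (mapped().trySplit() != null));
        System.out.println(StreamSupport.stream(mapped(), true).collect(Collectors.toList()));
    }
}
```

Unlike the wrapping spliterator obtained from map(x -> x).spliterator(), this one remains splittable whenever its source is.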
When you try to work around this by always calling parallel, there will be no impact when the stream pipeline consists of stateless operations only. But with a stateful operation, it might change the behavior significantly. E.g., when you have a sorted step, all elements have to be buffered and sorted before you can consume the first element. For a parallel stream, it will likely use a parallelSort, even when you never invoke trySplit.
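For the stateless case, the workaround can be sketched as follows. The helper name resplittable is hypothetical and mirrors the myOperation idea from the question without a custom spliterator in between; it covers only a pipeline of stateless operations, not the sorted caveat described above.

```java
import java.util.List;
import java.util.Spliterator;
import java.util.stream.Collectors;
import java.util.stream.Stream;
import java.util.stream.StreamSupport;

class ParallelSpliteratorWorkaround {

    /**
     * Hypothetical helper: obtains the spliterator in parallel mode so that it
     * stays splittable, then restores the caller's original parallel flag.
     */
    static <T> Stream<T> resplittable(Stream<T> stream) {
        boolean wasParallel = stream.isParallel();
        Spliterator<T> spliterator = stream.parallel().spliterator();
        return StreamSupport.stream(spliterator, wasParallel).onClose(stream::close);
    }

    /** Checks whether the spliterator of a stateless pipeline can split after the workaround. */
    static boolean splitsAfterWorkaround() {
        Stream<String> s = resplittable(Stream.of("foo", "bar", "baz").map(x -> x));
        return s.spliterator().trySplit() != null;
    }

    /** Switches to parallel later on and collects the elements in encounter order. */
    static List<String> collectInParallel() {
        return resplittable(Stream.of("foo", "bar", "baz").map(x -> x))
                .parallel()
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        System.out.println("splittable: " + splitsAfterWorkaround());
        System.out.println(collectInParallel());
    }
}
```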