
Streams: Processing Data Collections Declaratively

Category: IT Articles • 2022-04-06 11:42:42


Introducing streams

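Code written with the Stream API is:

Declarative: more concise and easier to read
Composable: more flexible
Parallelizable: it can run in parallel, which can improve performance on large data sets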

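What is a stream?

It lets you process a collection of data declaratively: you state what you want, not how to compute it
It can be thought of as a high-level iterator over a data set
It can be processed in parallel transparently, without writing any multithreaded code
Short definition: a sequence of elements from a source that supports data-processing operations
Two key characteristics: pipelining (operations are chained into a pipeline) and internal iteration (the library, not your code, drives the loop)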

Return the names of low-calorie dishes (under 400 calories), sorted by calories

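// import static java.util.Comparator.comparing;
// import static java.util.stream.Collectors.toList;
List<Dish> menu = FakeDb.getMenu();

List<String> lowCaloricDishesName =
    // menu.stream() // sequential version
    menu.parallelStream() // process in parallel
    .filter(d -> d.getCalories() < 400) // keep only low-calorie dishes
    .sorted(comparing(Dish::getCalories)) // sort by calories
    .map(Dish::getName) // extract the dish names
    .collect(toList()); // collect into a List

System.out.println(lowCaloricDishesName);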
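You can try the pipeline above without the article's FakeDb using the minimal sequential sketch below. The Dish class, the sample menu and the class name LowCalorieDishes are hypothetical stand-ins, not the author's code.

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

import static java.util.Comparator.comparing;

class LowCalorieDishes {

    // Hypothetical minimal Dish; the article's own Dish and FakeDb are not shown.
    static final class Dish {
        private final String name;
        private final boolean vegetarian;
        private final int calories;

        Dish(String name, boolean vegetarian, int calories) {
            this.name = name;
            this.vegetarian = vegetarian;
            this.calories = calories;
        }

        String getName() { return name; }
        boolean isVegetarian() { return vegetarian; }
        int getCalories() { return calories; }
    }

    // Hypothetical sample menu, used only for this sketch.
    static List<Dish> sampleMenu() {
        return Arrays.asList(
                new Dish("pork", false, 800),
                new Dish("beef", false, 700),
                new Dish("chicken", false, 400),
                new Dish("french fries", true, 530),
                new Dish("rice", true, 350),
                new Dish("season fruit", true, 120),
                new Dish("pizza", true, 550),
                new Dish("prawns", false, 300),
                new Dish("salmon", false, 450));
    }

    // Names of dishes under 400 calories, from fewest to most calories.
    static List<String> lowCaloricDishesName(List<Dish> menu) {
        return menu.stream()
                .filter(d -> d.getCalories() < 400)
                .sorted(comparing(Dish::getCalories))
                .map(Dish::getName)
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        System.out.println(lowCaloricDishesName(sampleMenu())); // [season fruit, prawns, rice]
    }
}
```

Swapping stream() for parallelStream() should give the same list here. sorted() defines the encounter order, and collect(toList()) preserves it.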


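Streams vs. collections

A stream can be traversed only once; traversing it again throws IllegalStateException
Streams use internal iteration (the library drives the loop); collections use external iteration (your code drives the loop, e.g. with for-each or an Iterator)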
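Both differences listed above can be checked directly. The sketch below (hypothetical class StreamsVsCollections) contrasts an explicit Iterator loop with the equivalent stream pipeline. It also shows that reusing a consumed stream throws IllegalStateException.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

class StreamsVsCollections {

    // External iteration: our code drives the loop through an Iterator.
    static List<String> upperCaseExternal(List<String> words) {
        List<String> result = new ArrayList<>();
        Iterator<String> it = words.iterator();
        while (it.hasNext()) {
            result.add(it.next().toUpperCase());
        }
        return result;
    }

    // Internal iteration: the Streams library drives the loop for us.
    static List<String> upperCaseInternal(List<String> words) {
        return words.stream()
                .map(String::toUpperCase)
                .collect(Collectors.toList());
    }

    // Returns true if a second traversal of the same stream is rejected.
    static boolean secondTraversalFails() {
        Stream<String> s = Stream.of("Java 8", "Lambdas", "In", "Action");
        s.forEach(w -> { });          // first traversal consumes the stream
        try {
            s.forEach(w -> { });      // second traversal of the same stream
            return false;
        } catch (IllegalStateException e) {
            return true;              // the stream has already been operated upon
        }
    }

    public static void main(String[] args) {
        List<String> words = Arrays.asList("Java 8", "Lambdas", "In", "Action");
        System.out.println(upperCaseExternal(words)); // [JAVA 8, LAMBDAS, IN, ACTION]
        System.out.println(upperCaseInternal(words)); // [JAVA 8, LAMBDAS, IN, ACTION]
        System.out.println(secondTraversalFails());   // true
    }
}
```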

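Stream operations

Using a stream involves three parts: a data source, a chain of intermediate operations that forms a pipeline, and a terminal operation that executes the pipeline and produces a result
Stream operations fall into two categories: intermediate operations (such as filter, map and sorted), which return another stream and are lazy, and terminal operations (such as forEach, count and collect), which produce a non-stream result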
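Intermediate operations are lazy. They only describe the pipeline, and nothing runs until the terminal operation is called. The sketch below (hypothetical class LazyPipeline) logs each call, using side-effecting lambdas purely for tracing. Because limit(2) short-circuits, the last element is never even filtered.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

class LazyPipeline {

    // Builds a pipeline with logging side effects, then runs it with a terminal operation.
    static List<String> run(List<String> log) {
        Stream<String> pipeline = Stream.of("pork", "beef", "chicken", "rice", "pizza", "salmon")
                .filter(name -> {
                    log.add("filter " + name);
                    return name.length() > 4;
                })
                .map(name -> {
                    log.add("map " + name);
                    return name.toUpperCase();
                })
                .limit(2);
        log.add("pipeline built");                      // no filter/map call has happened yet
        return pipeline.collect(Collectors.toList());   // the terminal operation runs the pipeline
    }

    public static void main(String[] args) {
        List<String> log = new ArrayList<>();
        System.out.println(run(log)); // [CHICKEN, PIZZA]
        log.forEach(System.out::println);
    }
}
```

Note that filter and map are interleaved per element rather than run as two separate passes. This is the pipelining mentioned earlier.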

Using streams

Filtering and slicing

Filtering with a predicate: filter
Removing duplicates: distinct
Truncating a stream: limit
Skipping elements: skip

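// Take the first 2 vegetarian dishes
FakeDb.getMenu().stream()
    .filter(Dish::isVegetarian) // filter with a predicate
    .limit(2) // truncate the stream to its first n elements
    .forEach(System.out::println);

Arrays.asList(1, 2, 2, 3, 3, 3).stream()
    .distinct() // remove duplicates: 1, 2, 3
    .skip(1) // skip the first element, so this prints 2 and 3
    .forEach(System.out::println);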
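A common way to page through a sequence is to combine skip and limit. Below is a minimal sketch, assuming a hypothetical helper page that takes a 0-based page index.

```java
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.IntStream;

class Pagination {

    // Hypothetical helper: returns page pageIndex (0-based) containing up to pageSize items.
    static <T> List<T> page(List<T> items, int pageIndex, int pageSize) {
        return items.stream()
                .skip((long) pageIndex * pageSize) // skip the elements of earlier pages
                .limit(pageSize)                   // keep at most one page of elements
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<Integer> items = IntStream.rangeClosed(1, 10).boxed().collect(Collectors.toList());
        System.out.println(page(items, 0, 3)); // [1, 2, 3]
        System.out.println(page(items, 1, 3)); // [4, 5, 6]
        System.out.println(page(items, 3, 3)); // [10]
        System.out.println(page(items, 4, 3)); // []
    }
}
```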

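Mapping

Apply a function to each element: map
Flattening: flatMap replaces each value in a stream with another stream, then concatenates all of those streams into a single stream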

// Extract the dish names
FakeDb.getMenu().stream()
    .map(Dish::getName)
    .forEach(System.out::println);

// Length of each string
Arrays.asList("Java 8", "Lambdas", "In", "Action").stream()
    .map(String::length)
    .forEach(System.out::println);

Flattening streams: flatMap

Example: turn a list of words into a list of their distinct characters

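// Input ["Hello", "World"], expected output ["H", "e", "l", "o", "W", "r", "d"]
// Attempt 1: does not work
Arrays.asList("Hello", "World").stream()
    .map(w -> w.split("")) // turns each word into an array of characters: Stream<String[]>
    .distinct() // compares whole arrays, not characters
    .forEach(System.out::println); // prints two array references such as [Ljava.lang.String;@1b6d3586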


Arrays.stream() turns an array into a stream.

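// Attempt 2: still does not work
Arrays.asList("Hello", "World").stream()
    .map(w -> w.split("")) // turns each word into an array of characters: Stream<String[]>
    .map(Arrays::stream) // turns each array into its own stream: Stream<Stream<String>>
    .distinct() // compares stream objects, not characters
    .forEach(System.out::println); // prints two Stream references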

Using flatMap

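Arrays.asList("Hello", "World").stream()
    .map(w -> w.split("")) // turns each word into an array of characters
    .flatMap(Arrays::stream) // flattens the generated streams into a single Stream<String>
    // each array is mapped not to its own stream but to the contents of that stream
    .distinct()
    .forEach(System.out::println); // prints H, e, l, o, W, r, d, one per line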
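To check the flatMap version, you can collect into a list instead of printing. The sketch below (hypothetical class DistinctCharacters) also counts what survives distinct() when flatMap is left out. The answer is two arrays, because arrays are compared by identity.

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

class DistinctCharacters {

    // With flatMap: every String[] is flattened into one Stream<String>.
    static List<String> distinctChars(List<String> words) {
        return words.stream()
                .map(w -> w.split(""))      // Stream<String[]>
                .flatMap(Arrays::stream)    // Stream<String>
                .distinct()
                .collect(Collectors.toList());
    }

    // Without flatMap: distinct() compares whole arrays by identity, so nothing is merged.
    static long countWithoutFlatMap(List<String> words) {
        return words.stream()
                .map(w -> w.split(""))
                .distinct()
                .count();
    }

    public static void main(String[] args) {
        List<String> words = Arrays.asList("Hello", "World");
        System.out.println(distinctChars(words));       // [H, e, l, o, W, r, d]
        System.out.println(countWithoutFlatMap(words)); // 2
    }
}
```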


In short, flatMap lets you replace each value in a stream with another stream, then concatenate all of those streams into a single stream.

flatMap examples

// Squares of a list of numbers (plain map, shown for comparison)
Arrays.asList(1, 2, 3, 4, 5, 6).stream()
    .map(n -> n * n)
    .forEach(System.out::println);
System.out.println("----------------------");
// All pairs (i, j) with i from the first list and j from the second
List<Integer> numbers1 = Arrays.asList(1, 2, 3);
List<Integer> numbers2 = Arrays.asList(3, 4);
numbers1.stream()
    .flatMap(i -> numbers2.stream().map(j -> new int[]{i, j}))
    .forEach(a -> System.out.println(a[0] + "\t" + a[1]));
System.out.println("----------------------");
// Only the pairs whose sum is divisible by 3
numbers1.stream()
    .flatMap(i -> numbers2.stream()
             .filter(j -> (i + j) % 3 == 0)
             .map(j -> new int[]{i, j}))
    .forEach(a -> System.out.println(a[0] + "\t" + a[1]));

Finding and matching


The methods covered here are allMatch, anyMatch, noneMatch, findFirst and findAny. All five are short-circuiting terminal operations.

Matching

// Does at least one element match?
boolean match = FakeDb.getMenu().stream()
    .anyMatch(Dish::isVegetarian);
System.out.println(match);

// Do all elements match?
match = FakeDb.getMenu().stream()
    .allMatch(d -> d.getCalories() > 1000);
System.out.println(match);

// Does no element match?
match = FakeDb.getMenu().stream()
    .noneMatch(d -> d.getCalories() > 1000);
System.out.println(match);
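The match operations have two behaviors worth knowing. On an empty stream, anyMatch returns false while allMatch and noneMatch return true (vacuous truth). All three also stop as soon as the answer is known. Below is a sketch of both, with a hypothetical class name and sample numbers.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;

class MatchSemantics {

    // On an empty stream: anyMatch is false, allMatch and noneMatch are vacuously true.
    static boolean[] onEmpty() {
        List<Integer> empty = Collections.emptyList();
        return new boolean[] {
                empty.stream().anyMatch(n -> n > 0),
                empty.stream().allMatch(n -> n > 0),
                empty.stream().noneMatch(n -> n > 0)
        };
    }

    // Counts how many elements anyMatch tests before it short-circuits.
    static int examinedByAnyMatch(List<Integer> numbers, int threshold) {
        AtomicInteger examined = new AtomicInteger();
        numbers.stream().anyMatch(n -> {
            examined.incrementAndGet();
            return n > threshold;
        });
        return examined.get();
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(onEmpty()));                          // [false, true, true]
        System.out.println(examinedByAnyMatch(Arrays.asList(1, 5, 2, 8, 3), 4)); // 2
    }
}
```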

Finding

Optional<Dish> dish = FakeDb.getMenu().stream()
    .filter(Dish::isVegetarian)
    .findAny();

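// How to use the returned Optional<T>:
System.out.println(dish.isPresent()); // true if a value is present, false otherwise
dish.ifPresent(System.out::println); // runs the given code only if a value is present
System.out.println(dish.get()); // returns the value if present, otherwise throws NoSuchElementException
Dish defaultDish = new Dish("pork", false, 800, Dish.Type.MEAT);
System.out.println(dish.orElse(defaultDish)); // returns the value if present, otherwise the given default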

// Find the first element
Optional<Dish> dish2 = FakeDb.getMenu().stream()
    .filter(Dish::isVegetarian)
    .findFirst();
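findFirst returns the first element in encounter order. findAny may return any element, which gives parallel streams more freedom. Instead of calling get(), you can transform the Optional and supply a default in one chain. Below is a small sketch with hypothetical names.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Optional;

class FindFirstExample {

    // The first even number in encounter order, if there is one.
    static Optional<Integer> firstEven(List<Integer> numbers) {
        return numbers.stream()
                .filter(n -> n % 2 == 0)
                .findFirst();
    }

    // Chaining on the Optional avoids calling get() on a possibly empty result.
    static String describeFirstEvenSquare(List<Integer> numbers) {
        return firstEven(numbers)
                .map(n -> "square of first even: " + n * n)
                .orElse("no even number");
    }

    public static void main(String[] args) {
        System.out.println(describeFirstEvenSquare(Arrays.asList(1, 3, 4, 6))); // square of first even: 16
        System.out.println(describeFirstEvenSquare(Arrays.asList(1, 3, 5)));    // no even number
    }
}
```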

Reduction

// Return types of some terminal operations:
// allMatch: boolean
// forEach: void
// findAny: Optional<T>
// collect: R, e.g. List<T>

List<Integer> numbers = Arrays.asList(1, 2, 3, 4, 5);
// Sum the elements, with an initial value (the identity)
int sum = numbers.stream()
    .reduce(0, Integer::sum);
System.out.println(sum);

// Sum the elements without an initial value: returns an Optional, since the stream may be empty
Optional<Integer> sum2 = numbers.stream()
    .reduce(Integer::sum);
System.out.println(sum2.get());

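// Maximum: Integer.MIN_VALUE is the identity for max
int maxValue = numbers.stream().reduce(Integer.MIN_VALUE, Integer::max);
System.out.println(maxValue);

// Minimum: the identity must be Integer.MAX_VALUE; with 0 the result would be 0 instead of 1
int minValue = numbers.stream().reduce(Integer.MAX_VALUE, Integer::min);
System.out.println(minValue);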

// Number of dishes on the menu, computed with map and reduce
int dishCount = FakeDb.getMenu().stream()
    .map(d -> 1)
    .reduce(0, (a, b) -> a + b);
System.out.println(dishCount);
// The built-in count() does the same thing
System.out.println(FakeDb.getMenu().stream().count());
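The first argument of reduce must be an identity for the operation: combining it with any element must return that element. 0 is the identity for sum but not for min, so reduce(0, Integer::min) over positive numbers always yields 0. The sketch below contrasts the options; the class and method names are hypothetical.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.Optional;

class ReduceIdentity {

    // 0 is not an identity for min: min(0, x) == 0 for every positive x.
    static int minWithWrongIdentity(List<Integer> numbers) {
        return numbers.stream().reduce(0, Integer::min);
    }

    // A true identity for min: min(Integer.MAX_VALUE, x) == x for every int x.
    static int minWithIdentity(List<Integer> numbers) {
        return numbers.stream().reduce(Integer.MAX_VALUE, Integer::min);
    }

    // No identity at all: the result is an empty Optional when the stream is empty.
    static Optional<Integer> minWithoutIdentity(List<Integer> numbers) {
        return numbers.stream().reduce(Integer::min);
    }

    public static void main(String[] args) {
        List<Integer> numbers = Arrays.asList(1, 2, 3, 4, 5);
        System.out.println(minWithWrongIdentity(numbers));                    // 0
        System.out.println(minWithIdentity(numbers));                         // 1
        System.out.println(minWithoutIdentity(numbers));                      // Optional[1]
        System.out.println(minWithoutIdentity(Collections.emptyList()));      // Optional.empty
    }
}
```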

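Stateless operations: operations such as map and filter take each element from the input stream and produce zero or one result in the output stream. They keep no internal state between elements.

Stateful operations: operations such as sorted and distinct need to know about elements seen earlier, so they must keep internal state. sorted in particular has to buffer the entire stream before it can emit its first element.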
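peek, which is intended for debugging, can make this difference visible. In the sketch below (hypothetical class name, sequential stream assumed), the stateless map lets each element flow through the whole pipeline before the next one starts. The stateful sorted has to consume every input before it can emit anything.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.stream.Stream;

class StatelessVsStateful {

    // Stateless map: each element goes all the way through before the next one enters.
    static List<String> traceStateless() {
        List<String> log = new ArrayList<>();
        Stream.of(3, 1, 2)
                .peek(n -> log.add("in:" + n))
                .map(n -> n * 10)
                .forEach(n -> log.add("out:" + n));
        return log;
    }

    // Stateful sorted: all elements are buffered before the first one is passed on.
    static List<String> traceStateful() {
        List<String> log = new ArrayList<>();
        Stream.of(3, 1, 2)
                .peek(n -> log.add("in:" + n))
                .sorted()
                .forEach(n -> log.add("out:" + n));
        return log;
    }

    public static void main(String[] args) {
        System.out.println(traceStateless()); // [in:3, out:30, in:1, out:10, in:2, out:20]
        System.out.println(traceStateful());  // [in:3, in:1, in:2, out:1, out:2, out:3]
    }
}
```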

Applying reduction: worked examples

List<Transaction> tList = FakeDb.getTransactions();
//    (1) Find all transactions in 2011 and sort them by value (low to high).
List<Transaction> list01 = tList.stream()
    .filter(t -> t.getYear() == 2011) // transactions from 2011
    .sorted(Comparator.comparing(Transaction::getValue)) // value, low to high
    .collect(Collectors.toList());
System.out.println(list01);

//    (2) What are all the unique cities where the traders work?
List<String> cityList = tList.stream()
    .map(t -> t.getTrader().getCity()) // each trader's city
    .distinct() // remove duplicates
    .collect(Collectors.toList());
System.out.println(cityList);

//    (3) Find all traders from Cambridge and sort them by name.
List<Trader> list3 = tList.stream()
    .map(Transaction::getTrader) // the traders
    .filter(t -> "Cambridge".equals(t.getCity())) // from Cambridge
    .distinct() // remove duplicates
    .sorted(Comparator.comparing(Trader::getName)) // sort by name
    .collect(Collectors.toList());
System.out.println(list3);

//    (4) Return a string of all traders' names, sorted alphabetically.
String nameStr = tList.stream()
    .map(t -> t.getTrader().getName()) // each trader's name
    .distinct() // remove duplicates
    // .sorted(Comparator.comparing(String::toString)) // sort
    .sorted() // sort; natural ordering is enough for strings
    // .collect(Collectors.toList());
    // .reduce("", (n1, n2) -> n1 + n2 + " "); // inefficient: builds a new String at every step
    .collect(Collectors.joining(" "));
System.out.println(nameStr);

//    (5) Are any traders based in Milan?
//        Optional<String> optional5 = tList.stream()
//                .map(Transaction::getTrader)
//                .map(Trader::getCity)
//                .filter(city -> "Milan".equals(city))
//                .findAny();
//        System.out.println(optional5.isPresent());
boolean milanBased = tList.stream()
    .anyMatch(t -> t.getTrader().getCity().equals("Milan"));
System.out.println(milanBased);

//    (6) Print the values of all transactions by traders living in Cambridge.
//        Optional<Integer> optional6 = tList.stream()
//                .filter(t -> "Cambridge".equals(t.getTrader().getCity()))
//                .map(Transaction::getValue)
//                .reduce((a, b) -> a + b);
//        System.out.println(optional6.get());
tList.stream()
    .filter(t -> "Cambridge".equals(t.getTrader().getCity()))
    .map(Transaction::getValue)
    .forEach(System.out::println);

//    (7) What is the highest value among all the transactions?
Optional<Integer> optional7 = tList.stream()
    .map(Transaction::getValue)
    .reduce(Integer::max);
System.out.println(optional7.get());

//    (8) Find the transaction with the smallest value.
Optional<Transaction> optional8 = tList.stream()
    .min(Comparator.comparing(Transaction::getValue)); // returns the transaction itself, not just its value
System.out.println(optional8.get());
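Exercises (7) and (8) return different things: a maximum value versus a whole transaction. FakeDb's contents are not shown, so the sketch below runs (4), (7) and (8) against hypothetical minimal Trader and Transaction classes and hypothetical sample data. For (8), min with a comparator returns the transaction itself.

```java
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;
import java.util.Optional;
import java.util.stream.Collectors;

class TransactionQueries {

    // Hypothetical minimal domain classes; the article's own Trader/Transaction are not shown.
    static final class Trader {
        private final String name;
        private final String city;
        Trader(String name, String city) { this.name = name; this.city = city; }
        String getName() { return name; }
        String getCity() { return city; }
    }

    static final class Transaction {
        private final Trader trader;
        private final int year;
        private final int value;
        Transaction(Trader trader, int year, int value) {
            this.trader = trader;
            this.year = year;
            this.value = value;
        }
        Trader getTrader() { return trader; }
        int getYear() { return year; }
        int getValue() { return value; }
    }

    // Hypothetical sample data for this sketch only.
    static List<Transaction> sampleTransactions() {
        Trader raoul = new Trader("Raoul", "Cambridge");
        Trader mario = new Trader("Mario", "Milan");
        Trader alan = new Trader("Alan", "Cambridge");
        Trader brian = new Trader("Brian", "Cambridge");
        return Arrays.asList(
                new Transaction(brian, 2011, 300),
                new Transaction(raoul, 2012, 1000),
                new Transaction(raoul, 2011, 400),
                new Transaction(mario, 2012, 710),
                new Transaction(mario, 2012, 700),
                new Transaction(alan, 2012, 950));
    }

    // (4) All trader names, de-duplicated and sorted, joined with spaces.
    static String traderNames(List<Transaction> ts) {
        return ts.stream()
                .map(t -> t.getTrader().getName())
                .distinct()
                .sorted()
                .collect(Collectors.joining(" "));
    }

    // (7) Highest transaction value.
    static Optional<Integer> highestValue(List<Transaction> ts) {
        return ts.stream().map(Transaction::getValue).reduce(Integer::max);
    }

    // (8) The transaction with the smallest value; comparingInt avoids boxing the sort key.
    static Optional<Transaction> smallestTransaction(List<Transaction> ts) {
        return ts.stream().min(Comparator.comparingInt(Transaction::getValue));
    }

    public static void main(String[] args) {
        List<Transaction> ts = sampleTransactions();
        System.out.println(traderNames(ts));          // Alan Brian Mario Raoul
        System.out.println(highestValue(ts).get());   // 1000
        smallestTransaction(ts).ifPresent(t ->
                System.out.println(t.getTrader().getName() + " " + t.getValue())); // Brian 300
    }
}
```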

Numeric streams and object streams

// Total calories on the menu
int calories = FakeDb.getMenu().stream()
    .map(Dish::getCalories)
    .reduce(0, Integer::sum);
// This has a hidden boxing cost: each int is boxed into an Integer by map, then unboxed again before being summed
System.out.println(calories);

// Primitive stream specializations
// IntStream, DoubleStream and LongStream specialize the elements to int, double and long respectively, avoiding the hidden boxing cost

// Mapping to a numeric stream
int sum = FakeDb.getMenu().stream() // returns Stream<Dish>
    .mapToInt(Dish::getCalories) // returns IntStream
    .sum(); // IntStream also offers max, min and average
// If the stream is empty, sum returns 0
System.out.println(sum);

// Converting back to an object stream
Stream<Integer> boxedCalories = FakeDb.getMenu().stream()
    .mapToInt(Dish::getCalories)
    .boxed(); // converts to Stream<Integer>

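// Optional primitives: OptionalInt, OptionalDouble, OptionalLong
// sum defaults to 0; max, min and average have no default, because an empty stream has no such value
OptionalInt maxCalories = FakeDb.getMenu().stream()
    .mapToInt(Dish::getCalories)
    .max();
System.out.println(maxCalories.orElse(0)); // supply an explicit default in case the stream was empty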
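When several statistics are needed at once, IntStream.summaryStatistics() computes count, sum, min, max and average in a single pass. The sketch below (hypothetical class NumericStreamStats) also checks the empty-stream defaults: sum() returns 0, while max() and average() return empty optionals.

```java
import java.util.IntSummaryStatistics;
import java.util.stream.IntStream;

class NumericStreamStats {

    // One pass over the data collects count, sum, min, max and average together.
    static IntSummaryStatistics stats(int... values) {
        return IntStream.of(values).summaryStatistics();
    }

    public static void main(String[] args) {
        IntSummaryStatistics s = stats(2, 4, 6, 8);
        System.out.println(s.getCount() + " " + s.getSum() + " " + s.getMin()
                + " " + s.getMax() + " " + s.getAverage());            // 4 20 2 8 5.0
        // Empty stream: sum() falls back to 0, but max() and average() are empty optionals.
        System.out.println(IntStream.empty().sum());                    // 0
        System.out.println(IntStream.empty().max().isPresent());        // false
        System.out.println(IntStream.empty().average().isPresent());    // false
        System.out.println(IntStream.empty().max().orElse(-1));         // -1, an explicit default
    }
}
```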

Numeric ranges

// range excludes the upper bound: [1, 100)
IntStream numbers = IntStream.range(1, 100);
// rangeClosed includes it: [1, 100]
IntStream numbers2 = IntStream.rangeClosed(1, 100);
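Here is a quick check of the two bounds (hypothetical class name):

```java
import java.util.stream.IntStream;

class NumericRanges {

    // range: the upper bound is excluded.
    static long countRange(int from, int to) {
        return IntStream.range(from, to).count();
    }

    // rangeClosed: the upper bound is included.
    static long countRangeClosed(int from, int to) {
        return IntStream.rangeClosed(from, to).count();
    }

    // Even numbers in [from, to].
    static long countEvens(int from, int to) {
        return IntStream.rangeClosed(from, to).filter(n -> n % 2 == 0).count();
    }

    public static void main(String[] args) {
        System.out.println(countRange(1, 100));                  // 99
        System.out.println(countRangeClosed(1, 100));            // 100
        System.out.println(IntStream.rangeClosed(1, 100).sum()); // 5050
        System.out.println(countEvens(1, 100));                  // 50
    }
}
```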

// Pythagorean triples (a, b, c) with 1 <= a <= b <= 100
IntStream.rangeClosed(1, 100).boxed()
    .flatMap(a -> IntStream.rangeClosed(a, 100)
             .filter(b -> Math.sqrt(a * a + b * b) % 1 == 0) // keep b only if c is a whole number
             .mapToObj(b -> new int[]{a, b, (int) Math.sqrt(a * a + b * b)}))
    .forEach(t -> System.out.println(t[0] + ", " + t[1] + ", " + t[2]));

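// Pythagorean triples, optimized: compute the square root only once per (a, b) pair, then filter
// (the values are doubles here, so this prints e.g. 3.0, 4.0, 5.0)
IntStream.rangeClosed(1, 100).boxed()
    .flatMap(a -> IntStream.rangeClosed(a, 100)
             .mapToObj(b -> new double[]{a, b, Math.sqrt(a * a + b * b)}))
    .filter(t -> t[2] % 1 == 0)
    .forEach(t -> System.out.println(t[0] + ", " + t[1] + ", " + t[2]));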

Building streams

// Streams from values
Stream.of("java 8", "Lambdas", "In", "Action")
    .map(String::toUpperCase)
    .forEach(System.out::println);

// Streams from arrays
int[] numbers = {2, 3, 5, 7, 11, 13};
int sum = Arrays.stream(numbers).sum();
System.out.println(sum);

// Streams from files: count the number of distinct words in a file
long uniqueWords = 0;
// try-with-resources closes the file once the stream has been consumed
try (Stream<String> lines = Files.lines(Paths.get("data.txt"), Charset.defaultCharset())) {

    uniqueWords = lines.flatMap(line -> Arrays.stream(line.split(" ")))
        .distinct()
        .count();

} catch (IOException e) {
    e.printStackTrace();
}
System.out.println(uniqueWords);
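The file example above splits on a single space, so runs of spaces or blank lines produce empty strings that are counted as a word. The self-contained sketch below writes a temporary file, splits on \s+ and drops empty tokens. The helper names are hypothetical, and UTF-8 encoding is assumed.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Arrays;
import java.util.List;
import java.util.stream.Stream;

class DistinctWordsInFile {

    // Counts distinct whitespace-separated words; try-with-resources closes the file handle.
    static long countDistinctWords(Path file) throws IOException {
        try (Stream<String> lines = Files.lines(file, StandardCharsets.UTF_8)) {
            return lines
                    .flatMap(line -> Arrays.stream(line.trim().split("\\s+")))
                    .filter(word -> !word.isEmpty()) // a blank line splits into one empty string
                    .distinct()
                    .count();
        }
    }

    // Writes the given lines to a temporary file, counts its distinct words, then deletes it.
    static long countDistinctWordsIn(List<String> content) {
        try {
            Path tmp = Files.createTempFile("words", ".txt");
            try {
                Files.write(tmp, content, StandardCharsets.UTF_8);
                return countDistinctWords(tmp);
            } finally {
                Files.deleteIfExists(tmp);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        List<String> content = Arrays.asList("the quick brown fox", "  jumps over   the lazy dog", "");
        System.out.println(countDistinctWordsIn(content)); // 8
    }
}
```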

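// Streams from functions: infinite streams
// Stream.iterate: each element is computed from the previous one
// Stream.generate: each element is produced by a Supplier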
Stream.iterate(0, n -> n + 2)
    .limit(10)
    .forEach(System.out::println);

// iterate: the series of Fibonacci tuples (0, 1), (1, 1), (1, 2), (2, 3), (3, 5)
Stream.iterate(new int[]{0, 1}, t -> new int[]{t[1], t[0] + t[1]})
    .limit(20)
    .forEach(t -> System.out.println("(" + t[0] + ", " + t[1] + ")"));

// generate: five random numbers
Stream.generate(Math::random)
    .limit(5)
    .forEach(System.out::println);

// An infinite stream of 1s, truncated to 11 elements
IntStream.generate(() -> 1).limit(11).forEach(System.out::println);

// generate: Fibonacci numbers from an IntSupplier that keeps mutable state (only safe on sequential streams)
IntStream.generate(new IntSupplier() {
    private int previous = 0;
    private int current = 1;
    @Override
    public int getAsInt() {
        int oldPrevious = this.previous;
        int nextValue = this.previous + this.current;
        this.previous = this.current;
        this.current = nextValue;
        return oldPrevious;
    }
}).limit(11).forEach(System.out::println);
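The two Fibonacci versions above should produce the same numbers. The sketch below (hypothetical class FibonacciStreams) collects the first n values from each. iterate works on immutable tuples, while generate relies on a supplier with mutable state, which is why the generate version is only safe on a sequential stream.

```java
import java.util.List;
import java.util.function.IntSupplier;
import java.util.stream.Collectors;
import java.util.stream.IntStream;
import java.util.stream.Stream;

class FibonacciStreams {

    // iterate: each tuple (a, b) is derived from the previous one; keep the first component.
    static List<Integer> viaIterate(int n) {
        return Stream.iterate(new int[]{0, 1}, t -> new int[]{t[1], t[0] + t[1]})
                .limit(n)
                .map(t -> t[0])
                .collect(Collectors.toList());
    }

    // generate: the supplier mutates its own fields on every call.
    static List<Integer> viaGenerate(int n) {
        IntSupplier fib = new IntSupplier() {
            private int previous = 0;
            private int current = 1;

            @Override
            public int getAsInt() {
                int oldPrevious = previous;
                int next = previous + current;
                previous = current;
                current = next;
                return oldPrevious;
            }
        };
        return IntStream.generate(fib).limit(n).boxed().collect(Collectors.toList());
    }

    public static void main(String[] args) {
        System.out.println(viaIterate(10));  // [0, 1, 1, 2, 3, 5, 8, 13, 21, 34]
        System.out.println(viaGenerate(10)); // [0, 1, 1, 2, 3, 5, 8, 13, 21, 34]
    }
}
```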

Other notes

Iterating over an array with indices

https://stackoverflow.com/questions/18552005/is-there-a-concise-way-to-iterate-over-a-stream-with-indices-in-java-8

https://www.zhihu.com/question/51841706


// import static java.util.stream.Collectors.toList;

String[] names = {"Sam", "Pamela", "Dave", "Pascal", "Erik"};
// Keep the names whose length is at most their 0-based index
IntStream.range(0, names.length)
    .filter(i -> names[i].length() <= i)
    .mapToObj(i -> names[i])
    .collect(toList()); // List<String>

// Option 1: stream over the indices
IntStream.range(0, names.length)
    .forEach(i -> System.out.println(i + ", " + names[i]));

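// Option 2: an external counter; incrementAndGet() yields 1-based positions,
// and the approach is only reliable on a sequential stream
AtomicInteger index = new AtomicInteger();
List<String> list = Arrays.stream(names)
    .filter(n -> n.length() <= index.incrementAndGet())
    .collect(Collectors.toList());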

// Option 3: Guava's Streams.mapWithIndex
Streams.mapWithIndex(Arrays.stream(names),
                     (str, i) -> i + ", " + str)
    .forEach(System.out::println);

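// Option 4 (jOOλ): not fully understood yet; zipWithIndex() pairs each element with its 0-based index as a Tuple2 (v1 = element, v2 = index)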
Seq.seq(Stream.of(names)).zipWithIndex()
    .filter( namesWithIndex -> namesWithIndex.v1.length() <= namesWithIndex.v2 + 1)
    .toList();

// Option 5 (LazyFutureStream): not fully understood yet; it uses the same zipWithIndex idea
LazyFutureStream.of(names)
                 .zipWithIndex()
                 .filter( namesWithIndex -> namesWithIndex.v1.length() <= namesWithIndex.v2 + 1)
                 .toList();
// Option 6: a one-element array as a mutable counter (sequential streams only)
int[] idx = {-1};
Arrays.stream(names)
    .forEach(name -> System.out.println(++idx[0] + ", " + name));
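The IntStream.range approach from option 1 can also build the indexed strings directly, with no mutable counter. Below is a minimal sketch (hypothetical class IndexedIteration) using the same names array.

```java
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.IntStream;

class IndexedIteration {

    // Pair each element with its 0-based index by streaming over the indices.
    static List<String> withIndex(String[] names) {
        return IntStream.range(0, names.length)
                .mapToObj(i -> i + ", " + names[i])
                .collect(Collectors.toList());
    }

    // Keep the names whose length is at most their 0-based index.
    static List<String> namesNotLongerThanIndex(String[] names) {
        return IntStream.range(0, names.length)
                .filter(i -> names[i].length() <= i)
                .mapToObj(i -> names[i])
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        String[] names = {"Sam", "Pamela", "Dave", "Pascal", "Erik"};
        withIndex(names).forEach(System.out::println);      // 0, Sam ... 4, Erik
        System.out.println(namesNotLongerThanIndex(names)); // [Erik]
    }
}
```

Because the lambdas only read the array and never mutate shared state, this version stays correct if the stream is made parallel.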

Third-party libraries

Guava

Apache

lambdaj

Joda-Time
