Saving a Spark RDD to the local file system in Java
I have an RDD generated with Spark. If I write this RDD out as a CSV file, I am provided with methods like saveAsTextFile(), but that writes the CSV output to HDFS.
I want to write the files to my local file system so that my SSIS process can pick them up and load them into the DB.
I am currently unable to use Sqoop.
Is it possible to do this in Java, rather than writing shell scripts?
If any clarification is needed, please let me know.
saveAsTextFile is able to take local file system paths (e.g. file:///tmp/magic/...). However, if you're running on a distributed cluster, you most likely want to collect() the data back to the driver and then save it with standard file operations.
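A minimal sketch of the second approach in Java. The Spark-specific calls (the `JavaRDD`, its `map` to a CSV line, and `collect()`) are shown as comments because they need a running SparkContext; the collected result is stood in for by a hypothetical list, and `/tmp/output.csv` is an assumed target path. Only the standard-file-I/O part is executed:

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Arrays;
import java.util.List;

public class SaveRddLocally {
    public static void main(String[] args) throws IOException {
        // Option 1 (sketch): point saveAsTextFile at the local file system.
        // On a real cluster each executor writes its own partitions to its
        // OWN local disk, so this is only practical in local mode:
        //   rdd.saveAsTextFile("file:///tmp/magic/output");

        // Option 2: collect() to the driver, then write with plain Java I/O.
        // Only safe when the whole dataset fits in driver memory:
        //   List<String> lines = rdd.map(row -> toCsvLine(row)).collect();
        // Hypothetical stand-in for the collected CSV lines:
        List<String> lines = Arrays.asList("1,alice,30", "2,bob,25");

        // Assumed local target path the SSIS process would pick up from.
        Path out = Paths.get("/tmp/output.csv");
        Files.write(out, lines);
        System.out.println("Wrote " + Files.readAllLines(out).size() + " rows");
    }
}
```

Writing a single file this way also avoids the part-00000, part-00001, ... directory layout that saveAsTextFile produces, which is usually easier for a downstream loader to consume.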