Running SparkPi on a Spark cluster: an example
1. SparkPi.scala source (the official example)
import scala.math.random

import org.apache.spark._

/** Computes an approximation to pi */
object SparkPi {
  def main(args: Array[String]) {
    val conf = new SparkConf().setAppName("Spark Pi")
    val spark = new SparkContext(conf)
    val slices = if (args.length > 0) args(0).toInt else 2
    val n = 100000 * slices
    val count = spark.parallelize(1 to n, slices).map { i =>
      val x = random * 2 - 1
      val y = random * 2 - 1
      if (x*x + y*y < 1) 1 else 0
    }.reduce(_ + _)
    println("Pi is roughly " + 4.0 * count / n)
    spark.stop()
  }
}
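Why this estimates pi: each sample is a uniformly random point in the 2 x 2 square [-1, 1] x [-1, 1], whose area is 4. A point falls inside the unit circle (area pi) with probability pi / 4, so count / n converges to pi / 4 and pi is approximately 4.0 * count / n, which is exactly what the final println computes.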
2. Running it straight from the IntelliJ IDEA IDE fails, because the driver has no master URL configured; modify the code to set it explicitly:
val conf = new SparkConf().setAppName("Spark Pi").setMaster("spark://master:7077")
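Hardcoding spark://master:7077 ties the build to one cluster. A minimal sketch of one alternative is to read the URL from a JVM system property at launch time; note that spark.master.url below is a property name made up for this example, not a Spark built-in:

// Sketch: fall back to the hardcoded URL only when no override is supplied.
// "spark.master.url" is a hypothetical property name chosen for illustration.
val masterUrl = sys.props.getOrElse("spark.master.url", "spark://master:7077")
val conf = new SparkConf().setAppName("Spark Pi").setMaster(masterUrl)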
3. Use the IDE to package the code into a jar; only the compiled sources are needed (leave the other referenced libraries out of the jar). An sbt-based alternative is sketched below.
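If you build with sbt rather than the IDE's packaging dialog (an assumption; the post only describes the IDE route), marking spark-core as "provided" keeps Spark itself out of helloworld.jar, matching the "leave other libraries out" advice above:

// build.sbt -- a minimal packaging sketch. Assumes sbt, and Scala 2.10
// (the series Spark 1.2.0 was built against).

name := "helloworld"

version := "1.0"

scalaVersion := "2.10.4"

libraryDependencies += "org.apache.spark" %% "spark-core" % "1.2.0" % "provided"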
4. Then add the following line to the code in the IDE:
spark.addJar("/home/cec/spark-1.2.0-bin-hadoop2.4/helloworld.jar")
This ships helloworld.jar to every worker in the cluster.
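An equivalent way to ship the jar, sketched below, is to register it on the SparkConf before the context is created; SparkConf.setJars is part of the public API and has the same effect as calling addJar afterwards:

// Sketch: register the application jar up front via setJars instead of
// calling spark.addJar after the SparkContext exists.
val conf = new SparkConf()
  .setAppName("Spark Pi")
  .setMaster("spark://master:7077")
  .setJars(Seq("/home/cec/spark-1.2.0-bin-hadoop2.4/helloworld.jar"))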
5. Run it. The job completes and prints the estimate:
14/12/31 15:28:57 INFO DAGScheduler: Stage 0 (reduce at SparkPi.scala:21) finished in 4.500 s
14/12/31 15:28:58 INFO DAGScheduler: Job 0 finished: reduce at SparkPi.scala:21, took 8.608873 s
Pi is roughly 3.14468
The full modified code:
import scala.math.random

import org.apache.spark.{SparkConf, SparkContext}

/**
 * Created by cec on 12/31/14.
 */
object SparkPi {
  def main(args: Array[String]) {
    val conf = new SparkConf().setAppName("Spark Pi").setMaster("spark://master:7077")
    val spark = new SparkContext(conf)
    spark.addJar("/home/cec/spark-1.2.0-bin-hadoop2.4/helloworld.jar")
    val slices = if (args.length > 0) args(0).toInt else 2
    val n = 100000 * slices
    val count = spark.parallelize(1 to n, slices).map { i =>
      val x = random * 2 - 1
      val y = random * 2 - 1
      if (x*x + y*y < 1) 1 else 0
    }.reduce(_ + _)
    println("Pi is roughly " + 4.0 * count / n)
    spark.stop()
  }
}
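Note that the setMaster and addJar calls are only needed when launching the driver directly from the IDE. When the jar is instead submitted with spark-submit, the master URL and the application jar are supplied on the command line, so the unmodified official source runs as-is.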