Running SparkPi on a Spark cluster: an example
1. SparkPi.scala source (the official example)
import scala.math.random

import org.apache.spark._

/** Computes an approximation to pi */
object SparkPi {
  def main(args: Array[String]) {
    val conf = new SparkConf().setAppName("Spark Pi")
    val spark = new SparkContext(conf)
    val slices = if (args.length > 0) args(0).toInt else 2
    val n = 100000 * slices
    val count = spark.parallelize(1 to n, slices).map { i =>
      val x = random * 2 - 1
      val y = random * 2 - 1
      if (x*x + y*y < 1) 1 else 0
    }.reduce(_ + _)
    println("Pi is roughly " + 4.0 * count / n)
    spark.stop()
  }
}
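Why this estimates pi: each sample is a uniformly random point in the 2 x 2 square [-1, 1] x [-1, 1], whose area is 4. A point falls inside the unit circle (area pi) with probability pi / 4, so count / n converges to pi / 4 and pi is approximately 4.0 * count / n, which is exactly what the final println computes.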
2. Running it straight from the IntelliJ IDEA IDE fails, because the driver has no master URL configured; modify the code to set it explicitly:
val conf = new SparkConf().setAppName("Spark Pi").setMaster("spark://master:7077")
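Hardcoding spark://master:7077 ties the build to one cluster. A minimal sketch of one alternative is to read the URL from a JVM system property at launch time; note that spark.master.url below is a property name made up for this example, not a Spark built-in:

// Sketch: fall back to the hardcoded URL only when no override is supplied.
// "spark.master.url" is a hypothetical property name chosen for illustration.
val masterUrl = sys.props.getOrElse("spark.master.url", "spark://master:7077")
val conf = new SparkConf().setAppName("Spark Pi").setMaster(masterUrl)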
3. Use the IDE to package the code into a jar; only the compiled sources are needed (leave the other referenced libraries out of the jar). An sbt-based alternative is sketched below.
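If you build with sbt rather than the IDE's packaging dialog (an assumption; the post only describes the IDE route), marking spark-core as "provided" keeps Spark itself out of helloworld.jar, matching the "leave other libraries out" advice above:

// build.sbt -- a minimal packaging sketch. Assumes sbt, and Scala 2.10
// (the series Spark 1.2.0 was built against).

name := "helloworld"

version := "1.0"

scalaVersion := "2.10.4"

libraryDependencies += "org.apache.spark" %% "spark-core" % "1.2.0" % "provided"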
4. Then add the following line to the code in the IDE:
spark.addJar("/home/cec/spark-1.2.0-bin-hadoop2.4/helloworld.jar")
This ships helloworld.jar to every worker in the cluster.
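An equivalent way to ship the jar, sketched below, is to register it on the SparkConf before the context is created; SparkConf.setJars is part of the public API and has the same effect as calling addJar afterwards:

// Sketch: register the application jar up front via setJars instead of
// calling spark.addJar after the SparkContext exists.
val conf = new SparkConf()
  .setAppName("Spark Pi")
  .setMaster("spark://master:7077")
  .setJars(Seq("/home/cec/spark-1.2.0-bin-hadoop2.4/helloworld.jar"))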
5. Run it. The job completes and prints the estimate:
14/12/31 15:28:57 INFO DAGScheduler: Stage 0 (reduce at SparkPi.scala:21) finished in 4.500 s
14/12/31 15:28:58 INFO DAGScheduler: Job 0 finished: reduce at SparkPi.scala:21, took 8.608873 s
Pi is roughly 3.14468
The full modified code:
import scala.math.random

import org.apache.spark.{SparkConf, SparkContext}

/**
 * Created by cec on 12/31/14.
 */
object SparkPi {
  def main(args: Array[String]) {
    val conf = new SparkConf().setAppName("Spark Pi").setMaster("spark://master:7077")
    val spark = new SparkContext(conf)
    spark.addJar("/home/cec/spark-1.2.0-bin-hadoop2.4/helloworld.jar")
    val slices = if (args.length > 0) args(0).toInt else 2
    val n = 100000 * slices
    val count = spark.parallelize(1 to n, slices).map { i =>
      val x = random * 2 - 1
      val y = random * 2 - 1
      if (x*x + y*y < 1) 1 else 0
    }.reduce(_ + _)
    println("Pi is roughly " + 4.0 * count / n)
    spark.stop()
  }
}
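Note that the setMaster and addJar calls are only needed when launching the driver directly from the IDE. When the jar is instead submitted with spark-submit, the master URL and the application jar are supplied on the command line, so the unmodified official source runs as-is.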