How to debug PySpark code in a Jupyter Notebook

Problem description:

I am wondering whether I can debug PySpark code in a Jupyter notebook. I have already tried the solution for regular Python code in Jupyter using the ipdb module:

What is the right way to debug in an IPython notebook?

But it does not work in a notebook with the PySpark kernel.

Please note: my question is about debugging PySpark within a Jupyter notebook, not in IntelliJ IDEA or any other Python IDE.

Background:


  • I am on macOS Yosemite.

  • My Spark version is 1.6.2.

  • The Jupyter kernel is Apache Toree PySpark.

  • I have ipdb installed.

Any help would be appreciated.

If you want to play around with and debug PySpark code in a Jupyter notebook, then once Spark is installed and set up (a good guide showing how: https://blog.sicara.com/get-started-pyspark-jupyter-guide-tutorial-ae2fe84f594f), you can import SparkSession and create a local instance:

from pyspark.sql import SparkSession

# local[1] runs Spark in-process with one worker thread, so errors surface directly in the notebook
spark = SparkSession.builder.master("local[1]").appName("pyspark-test").getOrCreate()
df = spark.read.csv("test.csv", header=True)  # read the CSV into a DataFrame for interactive inspection
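With a local master, driver-side code runs in the notebook process, so one practical way to debug is to exercise your row-level logic as a plain Python function before wrapping it in a UDF — breakpoints set in the notebook are generally not hit inside code running on executors. A minimal sketch (the `parse_price` function and the sample values are illustrative, not from the question):

```python
def parse_price(raw):
    # Row-level logic that would normally run inside a PySpark UDF.
    # Debug it here as ordinary Python (ipdb works on the driver side);
    # it can later be wrapped with pyspark.sql.functions.udf.
    return float(raw.strip().lstrip("$"))

# Exercise the function on sample values before applying it to the DataFrame.
sample = ["$3.50", " 7.25 "]
print([parse_price(s) for s in sample])  # [3.5, 7.25]
```

Once the function behaves correctly on sample inputs, register it as a UDF and apply it to the DataFrame; any remaining failures are then much easier to attribute to data issues rather than logic bugs.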