Casting a new derived column in a DataFrame from boolean to integer
Question:
Suppose I have a DataFrame x with this schema:
from pyspark.sql.types import StructType, StructField, DoubleType

xSchema = StructType([
    StructField("a", DoubleType(), True),
    StructField("b", DoubleType(), True),
    StructField("c", DoubleType(), True)])
I then have the DataFrame:
DataFrame[a: double, b: double, c: double]
I would like to add a derived integer column. I am able to create a boolean column:
x = x.withColumn('y', (x.a-x.b)/x.c > 1)
My new schema is:
DataFrame[a: double, b: double, c: double, y: boolean]
However, I would like column y to contain 0 for False and 1 for True.
The cast function can only operate on a column, not on a DataFrame, and the withColumn function can only operate on a DataFrame. How do I add a new column and cast it to integer at the same time?
Answer:
The expression you use evaluates to a Column, so you can cast it directly like this:
x.withColumn('y', ((x.a-x.b) / x.c > 1).cast('integer')) # Or IntegerType()
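For anyone who wants to try this end to end, here is a minimal sketch. It assumes a local Spark installation; the helper names with_int_flag and demo, the app name, and the two sample rows are made up for illustration and are not part of the question.

```python
def with_int_flag(df):
    """Add column y: 1 where (a - b) / c > 1, otherwise 0.

    The comparison yields a boolean Column, and Column.cast converts it
    to an integer before withColumn attaches it to the DataFrame.
    """
    return df.withColumn("y", ((df.a - df.b) / df.c > 1).cast("integer"))


def demo():
    """Build a tiny DataFrame with the schema from the question and cast y."""
    # Imported here so the helper above has no import-time Spark dependency.
    from pyspark.sql import SparkSession
    from pyspark.sql.types import StructType, StructField, DoubleType

    spark = (SparkSession.builder
             .master("local[1]")
             .appName("cast-bool-to-int-demo")  # hypothetical app name
             .getOrCreate())
    try:
        schema = StructType([
            StructField("a", DoubleType(), True),
            StructField("b", DoubleType(), True),
            StructField("c", DoubleType(), True)])
        # Hypothetical sample rows: the first satisfies the condition, the second does not.
        x = spark.createDataFrame([(3.0, 1.0, 1.0), (1.0, 1.0, 2.0)], schema)
        x = with_int_flag(x)
        # Sort in Python so the result does not depend on partition order.
        return sorted(tuple(row) for row in x.collect())
    finally:
        spark.stop()
```

Calling demo() should return the two rows with y as an integer rather than a boolean. Note that withColumn returns a new DataFrame, so the result has to be assigned back (x = x.withColumn(...)) for the change to be kept.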