Filtering a Spark DataFrame based on date
I have a DataFrame of
date, string, string
I want to select dates before a certain period. I have tried the following with no luck:
data.filter(data("date") < new java.sql.Date(format.parse("2015-03-14").getTime))
I'm getting an error stating the following:
org.apache.spark.sql.AnalysisException: resolved attribute(s) date#75 missing from date#72,uid#73,iid#74 in operator !Filter (date#75 < 16508);
As far as I can guess, the query is incorrect. Can anyone show me how the query should be formatted?
I checked that all entries in the DataFrame have values, and they do.
The following solutions are applicable since Spark 1.5:
For less than:
// filter data where the date is less than 2015-03-14
data.filter(data("date").lt(lit("2015-03-14")))
For greater than:
// filter data where the date is greater than 2015-03-14
data.filter(data("date").gt(lit("2015-03-14")))
For equality, you can use either equalTo or ===:
data.filter(data("date") === lit("2015-03-14"))
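The three comparisons above can be tried end to end with a small sketch. It assumes Spark 2.x or later (for SparkSession) running in local mode. The object name, helper names, and sample rows are hypothetical, and the explicit cast of the literal to a date is an addition here that makes the comparison type unambiguous rather than relying on implicit coercion.

```scala
import java.sql.Date
import org.apache.spark.sql.{Column, DataFrame, SparkSession}
import org.apache.spark.sql.functions.lit

object DateFilterSketch {
  // Hypothetical sample data with a DateType "date" column
  def sampleData(spark: SparkSession): DataFrame = {
    import spark.implicits._
    Seq(
      (Date.valueOf("2015-03-10"), "u1", "i1"),
      (Date.valueOf("2015-03-14"), "u2", "i2"),
      (Date.valueOf("2015-03-20"), "u3", "i3")
    ).toDF("date", "uid", "iid")
  }

  // Build a date-typed literal from an ISO yyyy-MM-dd string (assumed input format)
  def dateLit(s: String): Column = lit(s).cast("date")

  // Rows strictly before the cutoff
  def before(df: DataFrame, cutoff: String): DataFrame =
    df.filter(df("date").lt(dateLit(cutoff)))

  // Rows strictly after the cutoff
  def after(df: DataFrame, cutoff: String): DataFrame =
    df.filter(df("date").gt(dateLit(cutoff)))

  // Rows exactly on the given day
  def on(df: DataFrame, day: String): DataFrame =
    df.filter(df("date") === dateLit(day))

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("date-filter-sketch")
      .getOrCreate()
    val data = sampleData(spark)
    before(data, "2015-03-14").show()
    after(data, "2015-03-14").show()
    on(data, "2015-03-14").show()
    spark.stop()
  }
}
```

Note that each helper builds its column reference from the same DataFrame it filters (df("date") inside df.filter), so the filter never refers to a column that belongs to a different DataFrame.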
If your DataFrame date column is of type StringType, you can convert it using the to_date function:
// filter data where the date is greater than 2015-03-14
data.filter(to_date(data("date")).gt(lit("2015-03-14")))
You can also filter by year using the year function:
// filter data where year is greater or equal to 2016
data.filter(year($"date").geq(lit(2016)))
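For the case where the date column is stored as a string, a minimal sketch combining to_date and year might look like the following. It assumes Spark 2.x or later and ISO-formatted (yyyy-MM-dd) strings. The object name, helper names, and sample rows are hypothetical, and col("date") is used instead of $"date" so that no implicits need to be in scope.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.functions.{col, lit, to_date, year}

object StringDateFilterSketch {
  // Hypothetical sample data: the "date" column is a plain string
  def sampleData(spark: SparkSession): DataFrame = {
    import spark.implicits._
    Seq(
      ("2015-03-10", "u1", "i1"),
      ("2016-01-05", "u2", "i2"),
      ("2017-07-21", "u3", "i3")
    ).toDF("date", "uid", "iid")
  }

  // Parse the string column to a date, then keep rows after the cutoff
  def after(df: DataFrame, cutoff: String): DataFrame =
    df.filter(to_date(col("date")).gt(lit(cutoff).cast("date")))

  // Keep rows whose year is greater than or equal to minYear
  def fromYear(df: DataFrame, minYear: Int): DataFrame =
    df.filter(year(to_date(col("date"))).geq(lit(minYear)))

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("string-date-filter-sketch")
      .getOrCreate()
    val data = sampleData(spark)
    after(data, "2015-03-14").show()
    fromYear(data, 2016).show()
    spark.stop()
  }
}
```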