Can't read a DBFS file in a Databricks community edition cluster. FileNotFoundError: [Errno 2] No such file or directory:
Trying to read a Delta log file in a Databricks community edition cluster (Databricks 7.2 version).
df = spark.range(100).toDF("id")
df.show()
df.repartition(1).write.mode("append").format("delta").save("/user/delta_test")

with open('/user/delta_test/_delta_log/00000000000000000000.json', 'r') as f:
    for l in f:
        print(l)
Getting file not found error:
FileNotFoundError: [Errno 2] No such file or directory: '/user/delta_test/_delta_log/00000000000000000000.json'
---------------------------------------------------------------------------
FileNotFoundError Traceback (most recent call last)
<command-1759925981994211> in <module>
----> 1 with open('/user/delta_test/_delta_log/00000000000000000000.json','r') as f:
2 for l in f:
3 print(l)
FileNotFoundError: [Errno 2] No such file or directory: '/user/delta_test/_delta_log/00000000000000000000.json'
I have tried adding /dbfs/ and dbfs:/ prefixes, but nothing worked; I still get the same error.
with open('/dbfs/user/delta_test/_delta_log/00000000000000000000.json', 'r') as f:
    for l in f:
        print(l)
But using dbutils.fs.head, I was able to read the file.
dbutils.fs.head("/user/delta_test/_delta_log/00000000000000000000.json")
'{"commitInfo":{"timestamp":1598224183331,"userId":"284520831744638","userName":"","operation":"WRITE","operationParameters":{"mode":"Append","partitionBy":"[]"},"notebook":{"","isolationLevel":"WriteSerializable","isBlindAppend":true,"operationMetrics":{"numFiles":"1","numOutputBytes":"1171","numOutputRows":"100"}}}\n{"protocol":{"minReaderVersi...etc
How can we read/cat a DBFS file in Databricks with Python's open method?
By default, this data lives on DBFS, and your code needs to understand how to access it. Python's built-in open knows nothing about DBFS - that's why it fails.
But there is a workaround: DBFS is mounted on the nodes at /dbfs, so you just need to prefix your file name with it - instead of /user/delta_test/_delta_log/00000000000000000000.json, use /dbfs/user/delta_test/_delta_log/00000000000000000000.json.
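That path rewrite can be wrapped in a small helper. This is a minimal sketch (the helper name `to_local_path` is my own, not a Databricks API); it also strips an optional dbfs: scheme before prepending the /dbfs FUSE mount:

```python
def to_local_path(dbfs_path: str) -> str:
    """Convert a DBFS path ('dbfs:/...' or '/...') to the local
    /dbfs FUSE-mount path so plain Python open() can read it."""
    if dbfs_path.startswith("dbfs:/"):
        dbfs_path = dbfs_path[len("dbfs:"):]
    return "/dbfs" + dbfs_path

print(to_local_path("/user/delta_test/_delta_log/00000000000000000000.json"))
# → /dbfs/user/delta_test/_delta_log/00000000000000000000.json
```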
Update: on community edition, on DBR 7+, this mount is disabled. The workaround is to use the dbutils.fs.cp command to copy the file from DBFS to a local directory, such as /tmp or /var/tmp, and then read it from there.