Google Cloud Storage: how to (recursively) delete a folder in Python

Problem description:

I am trying to delete a folder in GCS and all of its contents (including sub-directories) with the GCS Python library. I understand that GCS doesn't really have folders (only prefixes?), but I am wondering how I can do that.

I tested this code:

from google.cloud import storage

def delete_blob(bucket_name, blob_name):
    """Deletes a blob from the bucket."""
    storage_client = storage.Client()
    bucket = storage_client.get_bucket(bucket_name)
    blob = bucket.blob(blob_name)

    blob.delete()

delete_blob('mybucket', 'top_folder/sub_folder/test.txt')
delete_blob('mybucket', 'top_folder/sub_folder/')

The first call to delete_blob worked, but the second one did not. How can I delete a folder recursively?

To delete everything starting with a certain prefix (for example, a directory name), you can iterate over the listing of matching objects:

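from google.cloud import storage

bucket_name = 'mybucket'  # the bucket that holds the "folder"
storage_client = storage.Client()
bucket = storage_client.get_bucket(bucket_name)
# list_blobs returns every object whose name starts with the prefix
blobs = bucket.list_blobs(prefix='some/directory')
for blob in blobs:
    blob.delete()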
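
Since GCS has no real folders, a "recursive delete" is just deleting every object under a prefix. Here is a minimal sketch of a reusable helper built on the same two calls (`list_blobs` and `blob.delete()`). The name `delete_folder` is made up for illustration, and it assumes you pass in a bucket object you have already obtained. It appends a trailing slash because a bare prefix such as 'top_folder' would also match sibling names like 'top_folder_old/...'.

```python
def delete_folder(bucket, folder):
    """Delete every object under `folder` and return the deleted names.

    `bucket` is assumed to be a google.cloud.storage Bucket (or any object
    exposing list_blobs(prefix=...)). `delete_folder` is an illustrative
    name, not part of the library.
    """
    # Normalise to exactly one trailing slash so that 'top_folder' does not
    # also match siblings such as 'top_folder_old/'.
    prefix = folder.rstrip('/') + '/'
    deleted = []
    for blob in bucket.list_blobs(prefix=prefix):
        blob.delete()
        deleted.append(blob.name)
    return deleted
```

For the layout in the question, something like `delete_folder(storage.Client().get_bucket('mybucket'), 'top_folder/sub_folder')` should remove test.txt, anything in deeper sub-directories, and a placeholder object named 'top_folder/sub_folder/' if one exists.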

Note that for very large buckets with millions or billions of objects, this may not be a very fast process. For that, you'll want to do something more complex, such as deleting in multiple threads or using lifecycle configuration rules to schedule the objects for deletion.
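
As a rough sketch of the multi-threaded option, the deletes can be fanned out over a thread pool from the standard library. The function name `delete_prefix_threaded` and the default of 8 workers are arbitrary choices for illustration, and the sketch assumes that calling `blob.delete()` from several threads at once is acceptable for your client setup. It also collects the whole listing in memory first, which may itself be too much for a truly huge prefix.

```python
from concurrent.futures import ThreadPoolExecutor


def delete_prefix_threaded(bucket, prefix, max_workers=8):
    """Delete all objects whose names start with `prefix`; return the count.

    `bucket` is assumed to be a google.cloud.storage Bucket (or any object
    exposing list_blobs(prefix=...)). The name and the default worker count
    are illustrative assumptions.
    """
    # Materialise the listing so the deletes can be spread across threads.
    blobs = list(bucket.list_blobs(prefix=prefix))
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # Consuming the map result re-raises any exception from a delete.
        list(pool.map(lambda blob: blob.delete(), blobs))
    return len(blobs)
```

With a real bucket this might be called as `delete_prefix_threaded(storage.Client().get_bucket('mybucket'), 'top_folder/')`. Each delete is still a separate request, so the gain comes only from overlapping network round trips.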