Google Cloud Storage + Python: Is it possible to list objects in a certain folder in GCS?
I'm going to write a Python program to check whether a file is in a certain folder of my Google Cloud Storage. The basic idea is to get the list of all objects in the folder (a list of file names), then check whether the file abc.txt is in that list.
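Once the object names are available as strings, the plan above comes down to a prefix filter and a membership test, because GCS has a flat namespace and a "folder" is only a shared name prefix. This is a minimal sketch, assuming the names have already been fetched; the helper names_in_folder and the sample list all_names are hypothetical.

```python
from typing import Iterable, List


def names_in_folder(object_names: Iterable[str], folder: str) -> List[str]:
    """Return the base names of objects that sit directly inside `folder`.

    GCS has no real directories, so "abc/myfolder" is treated as the
    name prefix "abc/myfolder/".
    """
    prefix = folder.rstrip("/") + "/"
    result = []
    for name in object_names:
        if name.startswith(prefix):
            rest = name[len(prefix):]
            # Skip deeper "subfolders" and the folder placeholder object itself.
            if rest and "/" not in rest:
                result.append(rest)
    return result


# Hypothetical object names, standing in for what a bucket listing returns.
all_names = [
    "abc/myfolder/abc.txt",
    "abc/myfolder/notes.txt",
    "abc/myfolder/sub/deep.txt",
    "abc/myfolder2/abc.txt",
    "other/abc.txt",
]

folder_names = names_in_folder(all_names, "abc/myfolder")
print(folder_names)               # ['abc.txt', 'notes.txt']
print("abc.txt" in folder_names)  # True
```

Appending the slash before matching keeps a sibling such as abc/myfolder2/ from being counted as part of abc/myfolder.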
Now the problem is that Google seems to provide only one way to get the object list, which is uri.get_bucket(); see the code below, which is from https://developers.google.com/storage/docs/gspythonlibrary#listing-objects
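```python
import boto
# GOOGLE_STORAGE and DOGS_BUCKET are constants defined earlier in the linked tutorial
uri = boto.storage_uri(DOGS_BUCKET, GOOGLE_STORAGE)
for obj in uri.get_bucket():
    print('%s://%s/%s' % (uri.scheme, uri.bucket_name, obj.name))
    print(' "%s"' % obj.get_contents_as_string())
```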
The problem with uri.get_bucket() is that it appears to fetch every object in the bucket first, which is not what I want. I only need the list of object names in a particular folder (e.g. gs://mybucket/abc/myfolder), which should be much faster.
Could someone help? Any answer is appreciated!
Update: the answer below applies to the older "Google API Client Libraries" for Python. If you're not already using that client, prefer the newer "Google Cloud Client Library" for Python ( https://googleapis.dev/python/storage/latest/index.html ). With the newer library, the equivalent of the older-client code below is:
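```python
from google.cloud import storage
client = storage.Client()
# Note: 'abc/myfolder' also matches 'abc/myfolder2/...'; add a trailing slash to match only that folder
for blob in client.list_blobs('bucketname', prefix='abc/myfolder'):
    print(blob.name)
```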
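Because GCS has no real directories, list_blobs matches names purely by prefix. It also accepts a delimiter argument: with prefix='abc/myfolder/' and delimiter='/', only objects directly under that prefix come back as blobs, and deeper "subfolders" are reported separately (the official samples read them from the returned iterator's prefixes attribute after iterating). The sketch below is a hypothetical offline emulation of those listing rules over made-up names, so the behavior can be seen without a bucket; emulate_listing is not part of the library.

```python
from typing import Iterable, List, Optional, Set, Tuple


def emulate_listing(
    names: Iterable[str], prefix: str = "", delimiter: Optional[str] = None
) -> Tuple[List[str], Set[str]]:
    """Locally mimic how a GCS listing applies `prefix` and `delimiter`.

    Returns (items, prefixes): the matching object names in lexicographic
    order, plus the collapsed "subdirectory" prefixes when a delimiter is given.
    """
    items: List[str] = []
    prefixes: Set[str] = set()
    for name in sorted(names):
        if not name.startswith(prefix):
            continue
        rest = name[len(prefix):]
        if delimiter and delimiter in rest:
            # Everything up to and including the first delimiter after the
            # prefix is reported as a prefix instead of as an item.
            cut = rest.index(delimiter) + len(delimiter)
            prefixes.add(prefix + rest[:cut])
        else:
            items.append(name)
    return items, prefixes


names = [
    "abc/myfolder/abc.txt",
    "abc/myfolder/sub/deep.txt",
    "abc/myfolder2/abc.txt",
]

# Without a trailing slash, the sibling "myfolder2" matches too.
print(emulate_listing(names, prefix="abc/myfolder"))
# Trailing slash plus delimiter: only direct children; the subfolder becomes a prefix.
print(emulate_listing(names, prefix="abc/myfolder/", delimiter="/"))
```

With the real client, the second call would correspond to client.list_blobs('bucketname', prefix='abc/myfolder/', delimiter='/').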
The answer for the older client follows.
You may find it easier to work with the JSON API, which has a full-featured Python client. It has a function for listing objects that takes a prefix parameter, which you could use to check for a certain directory and its children in this manner:
```python
import json

from googleapiclient import discovery  # older releases: from apiclient import discovery

# Auth goes here if necessary. Create an authorized http object...
client = discovery.build('storage', 'v1')  # add http=... if you need auth
request = client.objects().list(
    bucket="mybucket",
    prefix="abc/myfolder")
while request is not None:
    response = request.execute()
    print(json.dumps(response, indent=2))
    # list_next lives on the collection and returns None when there are no more pages
    request = client.objects().list_next(request, response)
```
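The while loop above depends on pagination: each objects.list response carries a nextPageToken while more results remain (a page holds at most 1,000 entries), and list_next builds the follow-up request from it, returning None after the last page. A minimal offline sketch of that loop follows; fake_list and list_all_names are hypothetical stand-ins, and a real page token is an opaque string rather than an index.

```python
from typing import Dict, List, Optional


def fake_list(names: List[str], page_size: int, page_token: Optional[str] = None) -> Dict:
    """Hypothetical stand-in for one objects.list call.

    Mimics the response shape: an "items" list and, while more results
    remain, a "nextPageToken" (here just the next start index as a string).
    """
    start = int(page_token) if page_token else 0
    end = start + page_size
    response: Dict = {"items": [{"name": n} for n in names[start:end]]}
    if end < len(names):
        response["nextPageToken"] = str(end)
    return response


def list_all_names(names: List[str], page_size: int = 2) -> List[str]:
    """Keep requesting pages until a response has no nextPageToken."""
    collected: List[str] = []
    token: Optional[str] = None
    while True:
        response = fake_list(names, page_size, token)
        collected.extend(item["name"] for item in response.get("items", []))
        token = response.get("nextPageToken")
        if token is None:
            break
    return collected


objects = ["abc/myfolder/a.txt", "abc/myfolder/abc.txt", "abc/myfolder/b.txt"]
print(list_all_names(objects))
```

Stopping only when the token is absent, rather than when a page comes back short, mirrors how the API signals the end of a listing.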
Fuller documentation of the list call is here: https://developers.google.com/storage/docs/json_api/v1/objects/list
And the Google Python API client is documented here: https://code.google.com/p/google-api-python-client/