Run a Google Dataflow template using Python
I want to launch a Google Dataflow template from Python. So far I have been launching Dataflow templates through the Dataflow REST API or the Cloud Functions integration. This is the template launch request I send from Postman:
URL: https://dataflow.googleapis.com/v1b3/projects/{{my-project-id}}/templates:launch?gcsPath=gs://{{my-cloud-storage-bucket}}/temp/cloud-dataprep-template
{
    "jobName": "test-datfalow-job",
    "parameters": {
        "inputLocations": "{\"location1\":\"gs://{{my-cloud-storage-bucket}}/my-folder/**/*\"}",
        "outputLocations": "{\"location1\":\"gs://{{my-cloud-storage-bucket}}/my-output/output.csv\"}"
    },
    "environment": {
        "tempLocation": "gs://{{my-cloud-storage-bucket}}/tmp",
        "zone": "us-central1-f"
    }
}
I don't know whether I can use the google-api-python-client for this, or whether I have to send this HTTP POST myself using Python's requests.post and Google Cloud authentication.
You can do that with the templates launch method from the Dataflow API Client Library for Python, like so:
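import googleapiclient.discovery
from oauth2client.client import GoogleCredentials

# PROJECT_ID, LOCATION, SOME_NAME and PATH_TO_TEMPLATE are placeholders:
# replace them with your own values before running.
project = PROJECT_ID
# Not used by the call below; the regional variant
# projects().locations().templates().launch() takes it as `location`.
location = LOCATION

# Application Default Credentials (for example from
# `gcloud auth application-default login` or a service account).
credentials = GoogleCredentials.get_application_default()

# Build a client for the Dataflow v1b3 REST API.
dataflow = googleapiclient.discovery.build('dataflow', 'v1b3', credentials=credentials)

# Same request as the Postman call: gcsPath points at the template,
# body carries the job name, parameters and environment.
result = dataflow.projects().templates().launch(
    projectId=project,
    body={
        "environment": {
            "zone": "us-central1-f",
            "tempLocation": "gs://{{my-cloud-storage-bucket}}/tmp"
        },
        "parameters": {
            "inputLocations": "{\"location1\":\"gs://{{my-cloud-storage-bucket}}/my-folder/**/*\"}",
            "outputLocations": "{\"location1\":\"gs://{{my-cloud-storage-bucket}}/my-output/output.csv\"}"
        },
        "jobName": SOME_NAME
    },
    gcsPath=PATH_TO_TEMPLATE
).execute()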
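
If you would rather send the HTTP POST yourself, as the question suggests, the following is a minimal sketch of that route, not the only way to do it. It assumes the google-auth and requests packages are installed and that Application Default Credentials are configured. The function names build_launch_request and launch_template are hypothetical helpers introduced only for this illustration. The sketch also uses json.dumps to produce the JSON-encoded inputLocations and outputLocations strings instead of escaping the quotes by hand.

```python
import json


def build_launch_request(project_id, template_path, job_name,
                         input_locations, output_locations,
                         temp_location, zone):
    """Return (url, params, body) for a templates:launch call.

    Hypothetical helper: it mirrors the Postman request from the question.
    """
    url = ("https://dataflow.googleapis.com/v1b3/projects/"
           f"{project_id}/templates:launch")
    # gcsPath is sent as a query parameter, as in the Postman URL.
    params = {"gcsPath": template_path}
    body = {
        "jobName": job_name,
        "parameters": {
            # The template expects these values as JSON-encoded strings.
            "inputLocations": json.dumps(input_locations),
            "outputLocations": json.dumps(output_locations),
        },
        "environment": {"tempLocation": temp_location, "zone": zone},
    }
    return url, params, body


def launch_template(project_id, **kwargs):
    """Send the launch request with Application Default Credentials."""
    # Imported here so build_launch_request works without google-auth.
    import google.auth
    from google.auth.transport.requests import AuthorizedSession

    credentials, _ = google.auth.default(
        scopes=["https://www.googleapis.com/auth/cloud-platform"])
    # AuthorizedSession is a requests.Session that adds the OAuth token.
    session = AuthorizedSession(credentials)
    url, params, body = build_launch_request(project_id, **kwargs)
    response = session.post(url, params=params, json=body)
    response.raise_for_status()
    return response.json()
```

Calling launch_template("my-project-id", template_path=..., job_name=..., input_locations={"location1": ...}, output_locations={"location1": ...}, temp_location=..., zone="us-central1-f") would then send the same request as the Postman example; the project ID and argument values here are placeholders.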