Transferring files from Azure Blob Storage to Google Cloud Storage programmatically

Problem description:

I have a number of files that I transferred into Azure Blob Storage via the Azure Data Factory. Unfortunately, this tool doesn't appear to set the Content-MD5 value for any of the files, so when I pull that value from the Blob Storage API, it's empty.

I'm aiming to transfer these files out of Azure Blob Storage and into Google Cloud Storage. The documentation I'm seeing for Google's Storage Transfer Service at https://cloud.google.com/storage/transfer/reference/rest/v1/TransferSpec#HttpData indicates that I can easily initiate such a transfer if I supply a list of the files with their URL, length in bytes, and an MD5 hash of each.
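To make the required input concrete, here is a minimal sketch of building the URL list that the HttpData transfer consumes: a `TsvHttpData-1.0` header line followed by one tab-separated line per file with its URL, byte length, and base64-encoded MD5 digest (format per the linked docs; the function names and example URL are my own, and in practice the bytes would be streamed from each blob rather than held in memory):

```python
import base64
import hashlib


def tsv_line(url: str, data: bytes) -> str:
    """One URL-list entry: URL, length in bytes, base64-encoded MD5 digest."""
    md5_b64 = base64.b64encode(hashlib.md5(data).digest()).decode("ascii")
    return f"{url}\t{len(data)}\t{md5_b64}"


def build_url_list(entries) -> str:
    """entries: iterable of (url, bytes) pairs. Returns the full TSV document."""
    lines = ["TsvHttpData-1.0"]  # required header line for the URL-list format
    lines.extend(tsv_line(url, data) for url, data in entries)
    return "\n".join(lines)
```

The resulting text file would then be hosted at a publicly readable URL and referenced as the `listUrl` of the transfer job.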

Well, I can easily pull the first two from Azure Storage, but the third doesn't appear to automatically get populated by Azure Storage, nor can I find any way to get it to do so.

Unfortunately, my other options look limited. The possibilities so far:

  1. Download each file to a local machine, determine the hash, and update the blob's Content-MD5 value
  2. See whether I can write an Azure Functions app in the same region that calculates the hash value and writes it to the blob for each object in the container
  3. Use an Amazon S3 egress from Data Factory and then use Google's support for importing from S3 to pull it from there, per https://cloud.google.com/storage/transfer/reference/rest/v1/TransferSpec#AwsS3Data, but this really seems like a waste of bandwidth (and I'd have to set up an Amazon account).
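The hashing step at the heart of options #1 and #2 can be sketched as an incremental MD5 over chunks, so a blob never has to fit in memory. The comments note how this might plug into the `azure-storage-blob` Python SDK, but treat those call names as an assumption to verify against the SDK version in use:

```python
import hashlib


def md5_of_chunks(chunks) -> bytes:
    """Compute an MD5 digest incrementally from an iterable of byte chunks.

    Assumption (unverified here): with the azure-storage-blob SDK the chunks
    could come from blob_client.download_blob().chunks(), and the digest could
    then be written back with blob_client.set_http_headers(
        content_settings=ContentSettings(content_md5=bytearray(digest))).
    """
    h = hashlib.md5()
    for chunk in chunks:
        h.update(chunk)
    return h.digest()
```

Running this inside an Azure Function (or VM) in the same region as the storage account keeps the download traffic off the slow home connection, which is the whole point of option #2.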

Ideally, I want to be able to write a script, hit go, and leave it alone. I don't have the fastest download rate from Azure, so #1 would be less than desirable as it'd take a long time.

Is there any other way to do this?

May 2020 update: Google Cloud Data Transfer now supports Azure Blob storage as a source. This is a no-code solution.

We used this to transfer ~1 TB of files from Azure Blob Storage to Google Cloud Storage. We also have a daily refresh, so any new files in Azure Blob Storage are automatically copied to Cloud Storage.