Write Parquet from AWS Kinesis Firehose to AWS S3

Question:

I would like to ingest data into S3 from Kinesis Firehose, formatted as Parquet. So far I have only found a solution that involves creating an EMR cluster, but I am looking for something cheaper and faster, such as storing the received JSON as Parquet directly from Firehose or using a Lambda function.

Thank you very much, Javi

Good news, this feature was released today!

Amazon Kinesis Data Firehose can convert the format of your input data from JSON to Apache Parquet or Apache ORC before storing the data in Amazon S3. Parquet and ORC are columnar data formats that save space and enable faster queries.

To enable it, go to your Firehose stream and click Edit. You should see a Record format conversion section.
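If you prefer to set this up in code rather than in the console, the same option can probably be configured through the API. The following is a minimal sketch with boto3, not a tested deployment script. It assumes that the record schema lives in an AWS Glue table, that the ARNs, database and table names are placeholders you replace with your own, and that a 64 MB buffer is acceptable (as far as I know that is the minimum when conversion is enabled). The helper names `build_parquet_destination` and `create_stream` are hypothetical.

```python
def build_parquet_destination(bucket_arn, role_arn, glue_database, glue_table, region):
    """Build an ExtendedS3DestinationConfiguration dict with JSON -> Parquet conversion.

    All arguments are placeholders supplied by the caller; nothing here
    talks to AWS, so the dict can be inspected before it is used.
    """
    return {
        "RoleARN": role_arn,
        "BucketARN": bucket_arn,
        # Assumption: format conversion needs a larger buffer than the default.
        "BufferingHints": {"SizeInMBs": 64, "IntervalInSeconds": 300},
        "DataFormatConversionConfiguration": {
            "Enabled": True,
            # Firehose reads the column layout from a Glue table (assumed to exist).
            "SchemaConfiguration": {
                "RoleARN": role_arn,
                "DatabaseName": glue_database,
                "TableName": glue_table,
                "Region": region,
                "VersionId": "LATEST",
            },
            # Incoming records are parsed as JSON ...
            "InputFormatConfiguration": {"Deserializer": {"OpenXJsonSerDe": {}}},
            # ... and written to S3 as Parquet.
            "OutputFormatConfiguration": {"Serializer": {"ParquetSerDe": {}}},
        },
    }


def create_stream(stream_name, destination):
    """Create a Direct PUT delivery stream using the destination dict above."""
    import boto3  # imported lazily so the builder works without boto3 installed

    client = boto3.client("firehose")
    return client.create_delivery_stream(
        DeliveryStreamName=stream_name,
        DeliveryStreamType="DirectPut",
        ExtendedS3DestinationConfiguration=destination,
    )
```

Calling `create_stream("my-stream", build_parquet_destination(...))` with your own values should then create a stream that delivers Parquet files. The IAM role needs access to both the bucket and the Glue table.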

See the documentation for details: https://docs.aws.amazon.com/firehose/latest/dev/record-format-conversion.html