How to differentiate between HDF5 datasets and groups with h5py?

Problem description:

I use the Python package h5py (version 2.5.0) to access my HDF5 files.

I want to traverse the contents of a file and do something with every dataset.

Using the visit method:

import h5py

def print_it(name):
    dset = f[name]
    print(dset)
    print(type(dset))


with h5py.File('test.hdf5', 'r') as f:
    f.visit(print_it)

I obtain the following for a test file:

<HDF5 group "/x" (1 members)>
<class 'h5py._hl.group.Group'>
<HDF5 dataset "y": shape (100, 100, 100), type "<f8">
<class 'h5py._hl.dataset.Dataset'>

This tells me that there is a dataset and a group in the file. However, there is no obvious way to differentiate between datasets and groups except by using type(). Unfortunately, the h5py documentation does not say anything about this topic. It always assumes that you know beforehand which objects are groups and which are datasets, for example because you created the datasets yourself.

I would like something like:

f = h5py.File(..)
for key in f.keys():
    x = f[key]
    print(x.is_group(), x.is_dataset()) # does not exist

How can I differentiate between groups and datasets when reading an unknown HDF5 file in Python with h5py? How can I get a list of all datasets, all groups, and all links?

Unfortunately, there is no built-in method in the h5py API to check this, but you can simply check the type of the item with is_dataset = isinstance(item, h5py.Dataset).
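As a minimal sketch of that isinstance check, the snippet below classifies the direct members of a group. The helper name describe_children and the in-memory file demo.hdf5 are made up for illustration and are not part of h5py; the "other" branch is an assumption meant to cover objects such as committed datatypes.

```python
import h5py


def describe_children(group):
    """Return (name, kind) pairs for the direct members of an h5py group."""
    result = []
    for name, item in group.items():
        if isinstance(item, h5py.Dataset):
            kind = "dataset"
        elif isinstance(item, h5py.Group):
            kind = "group"
        else:
            kind = "other"  # assumption: e.g. a committed (named) datatype
        result.append((name, kind))
    return result


# Hypothetical demo file, kept in memory only so nothing is written to disk.
with h5py.File("demo.hdf5", "w", driver="core", backing_store=False) as f:
    f.create_group("x").create_dataset("y", data=[1.0, 2.0, 3.0])
    f.create_dataset("z", data=[0])
    children = describe_children(f)

print(children)
```

Note that h5py.File is itself a subclass of h5py.Group, so the same helper works on the file object and on any subgroup.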

To list all of a file's contents (though not the file's attributes), you can use Group.visititems with a callable that takes the name and the instance of each item.
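One possible way to turn this into lists of all groups and all datasets is sketched below. The function name list_groups_and_datasets and the demo file are hypothetical; only visititems, create_group, create_dataset and the isinstance checks come from h5py itself.

```python
import h5py


def list_groups_and_datasets(h5file):
    """Walk the file recursively and collect the paths of groups and datasets."""
    groups, datasets = [], []

    def collect(name, obj):
        # name is the path relative to the group visititems was called on
        if isinstance(obj, h5py.Dataset):
            datasets.append(name)
        elif isinstance(obj, h5py.Group):
            groups.append(name)
        # Returning None tells visititems to keep going.

    h5file.visititems(collect)
    return groups, datasets


# Hypothetical demo file, kept in memory only so nothing is written to disk.
with h5py.File("demo.hdf5", "w", driver="core", backing_store=False) as f:
    f.create_group("x").create_dataset("y", data=[1.0, 2.0, 3.0])
    f.create_dataset("z", data=[0])
    groups, datasets = list_groups_and_datasets(f)

print(groups)
print(datasets)
```

The root group itself is not passed to the callback, so it does not appear in the groups list. As for links, to my knowledge visititems does not follow soft or external links; for a single name, group.get(name, getlink=True) returns a link object (such as h5py.HardLink or h5py.SoftLink) that can be inspected instead.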