Reading a `wav` file with `tf.audio.decode_wav`

Problem description:

I am following the tensorflow tutorial for audio recognition at simple_audio. The notebook works very well.

As a next step, I wanted to record my own voice and then run it through the model trained in tensorflow. I first generated a recording:

import sounddevice as sd
from scipy.io import wavfile

# filename is assumed to be defined earlier as a path to a .wav file
seconds = 1
sr = 16000
nchannels = 1

# record one second of mono audio at 16 kHz
myrecording = sd.rec(int(seconds * sr), samplerate=sr, channels=nchannels)
sd.wait()  # block until the recording is finished
wavfile.write(filename, sr, myrecording)

So far so good, I can play my recording. But when I try to load the file with tf.audio.decode_wav similar to this:

import tensorflow as tf

audio_binary = tf.io.read_file(filename)
audio, _ = tf.audio.decode_wav(audio_binary)

I get the following error:

InvalidArgumentError: Bad audio format for WAV: Expected 1 (PCM), but got 3 [Op:DecodeWav]

Any pointers on what might be going wrong are greatly appreciated.

(Would have written this as a comment, but I don't have enough reputation yet)

The default encoding for WAV files is called "16 bit PCM", which means the recorded sound is represented using 16-bit int data before it is written to your WAV file.
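For illustration, here is a small helper (hypothetical, not from the original thread) that reads the format code out of a canonical WAV header, so you can check what a given file actually contains (1 = PCM, 3 = IEEE float):

```python
import struct

def wav_format_code(path):
    """Return the audio-format code from a canonical WAV header.

    1 = integer PCM samples, 3 = IEEE float samples. Assumes the
    'fmt ' chunk starts at byte 12, which holds for files written
    by scipy.io.wavfile.write.
    """
    with open(path, "rb") as f:
        header = f.read(24)
    # bytes 20-21 hold the little-endian uint16 audio-format field
    return struct.unpack("<H", header[20:22])[0]
```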

The documentation for tf.audio.decode_wav() states: "Decode a 16-bit PCM WAV file to a float tensor." Passing a WAV file with any other encoding therefore fails with an error like the one you received. The format code 3 in your error message means WAVE_FORMAT_IEEE_FLOAT: sd.rec records float32 samples by default, and scipy.io.wavfile.write saves a float array as a 32-bit float WAV rather than as 16-bit PCM.
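A minimal sketch of one way around this, assuming `myrecording` is the float32 array produced by `sd.rec` above (a synthetic sine wave stands in for it here): convert the samples to 16-bit integers before writing, so that `scipy.io.wavfile.write` produces a PCM (format 1) file that `tf.audio.decode_wav` accepts.

```python
import numpy as np
from scipy.io import wavfile

sr = 16000
# stand-in for the float32 recording from sd.rec, values in [-1.0, 1.0]
myrecording = np.sin(2 * np.pi * 440 * np.arange(sr) / sr).astype(np.float32)

# scale into the int16 range and cast; an int16 array is written as
# 16-bit PCM (format code 1) rather than 32-bit float (format code 3)
pcm16 = np.int16(np.clip(myrecording, -1.0, 1.0) * 32767)
wavfile.write("recording_pcm16.wav", sr, pcm16)
```

Alternatively, `sd.rec(..., dtype='int16')` records integer samples directly, so no conversion step is needed before writing.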