librosa produces an MFCC spectrogram that is "not detailed"
I am trying to create an MFCC plot with librosa but the plot just doesn't appear to be very detailed. The goal is to present this MFCC spectrogram to a neural network. The audio file I am testing with is around 1 second long and is from the Google Speech Commands dataset. My code is:
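import librosa
import librosa.display  # submodule; not always loaded by "import librosa" alone
import matplotlib.pyplot as plt

WINDOW_SIZE = 20  # analysis window in ms
NFFT = int((WINDOW_SIZE / 1000) * 16000)  # 20 ms at 16 kHz = 320 samples
samples, _ = librosa.load(f, sr=16000)  # f: path to the audio clip
mfccs = librosa.feature.mfcc(y=samples[:16000], sr=16000, n_fft=NFFT, n_mfcc=40)
plt.figure(figsize=(10, 4))
# pass sr so the time axis is labelled for 16 kHz (specshow defaults to 22050)
librosa.display.specshow(mfccs, sr=16000, x_axis='time')
plt.colorbar()
plt.title('MFCC')
plt.tight_layout()
plt.show()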
This is the MFCC spectrogram being produced:
The 0th coefficient has a much larger magnitude than the rest (it tracks the overall log-energy of each frame), so it dominates the color scale and differences in the other bands don't show very well in the plot.
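To see why coefficient 0 dominates: librosa computes MFCCs by applying an orthonormal DCT-II to the dB-scaled mel spectrogram (128 mel bands by default), so coefficient 0 is just the sum of a frame's log-mel values divided by sqrt(n_mels). Since those dB values share a large common offset, coefficient 0 is large while the higher coefficients only pick up the variation across bands. The sketch below reproduces this with plain NumPy on synthetic data; the -40 dB level and the noise are made-up stand-ins, and `dct2_ortho` is a hypothetical helper, not a librosa function.

```python
import numpy as np

def dct2_ortho(x):
    """Orthonormal DCT-II along axis 0 (same convention as norm='ortho')."""
    n = x.shape[0]
    k = np.arange(n)[:, None]          # output coefficient index
    m = np.arange(n)[None, :]          # input band index
    basis = np.cos(np.pi * k * (2 * m + 1) / (2 * n))
    scale = np.full((n, 1), np.sqrt(2.0 / n))
    scale[0, 0] = np.sqrt(1.0 / n)     # row 0 is a scaled sum of all bands
    return (scale * basis) @ x

# Synthetic "log-mel" frames: a common -40 dB offset plus small per-band variation
rng = np.random.default_rng(0)
n_mels, n_frames, n_mfcc = 128, 32, 40
log_mel = -40.0 + 5.0 * rng.standard_normal((n_mels, n_frames))
mfcc = dct2_ortho(log_mel)[:n_mfcc]

# Coefficient 0 sits near -40 * sqrt(128); the others stay close to zero
print("coefficient 0 mean:", mfcc[0].mean())
print("largest |coefficient 1..39|:", np.abs(mfcc[1:]).max())
```

Because every higher DCT basis vector sums to zero across bands, a constant offset only ever shows up in coefficient 0, which is why a shared colorbar is stretched by that one row.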
You may want to normalize the MFCCs so that all coefficients are on the same scale. Compute the mean and standard deviation of each coefficient, then standardize by subtracting the mean and dividing by the standard deviation. You can compute these statistics per clip, or across the whole training set.
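A minimal NumPy sketch of both options, assuming each clip's MFCCs are a `(n_mfcc, n_frames)` array as returned by `librosa.feature.mfcc`. The helper names (`standardize_per_clip`, `fit_training_stats`, `standardize_with_stats`) and the small `EPS` guard are hypothetical choices for illustration, and the usage data is synthetic rather than real MFCCs.

```python
import numpy as np

EPS = 1e-8  # guards against division by zero for a constant coefficient

def standardize_per_clip(mfccs):
    """Standardize each coefficient (row) using this clip's own statistics."""
    mean = mfccs.mean(axis=1, keepdims=True)
    std = mfccs.std(axis=1, keepdims=True)
    return (mfccs - mean) / (std + EPS)

def fit_training_stats(train_clips):
    """Per-coefficient mean/std pooled over all frames of all training clips."""
    frames = np.concatenate(train_clips, axis=1)  # (n_mfcc, total_frames)
    return frames.mean(axis=1, keepdims=True), frames.std(axis=1, keepdims=True)

def standardize_with_stats(mfccs, mean, std):
    """Apply previously fitted (e.g. training-set) statistics to one clip."""
    return (mfccs - mean) / (std + EPS)

# Synthetic stand-in: coefficient 0 is large and wide, the rest are small
rng = np.random.default_rng(0)
scales = np.array([100.0] + [5.0] * 39)[:, None]
offsets = np.array([-450.0] + [0.0] * 39)[:, None]
train = [offsets + scales * rng.standard_normal((40, 32)) for _ in range(10)]

clip_norm = standardize_per_clip(train[0])          # option 1: per clip
mean, std = fit_training_stats(train)               # option 2: training set
train_norm = [standardize_with_stats(c, mean, std) for c in train]
```

With the training-set option, keep `mean` and `std` and apply the same values to validation and test clips, so the network sees inputs on the scale it was trained on.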