I don't really understand FFT and sample rate
I'm really confused here. I am an AI programmer working on a game that is designed to detect beats in songs, among other things. I have no previous knowledge about audio and am just reading through whatever material I can find. While I got the FFT working, I simply don't understand how samples are transformed into different frequencies.

Question 1: what does each frequency stand for? With the algorithm I have, I can transform, for example, 1024 samples into 512 outcomes. So are they a description of the strength of each part of the spectrum at the current second? That doesn't really make sense, since what I remember is that there are 20,000 Hz in a 44.1 kHz audio recording. So how do 512 spectrum samples explain what is happening at that moment?

Question 2: from what I read, a sample is a number that represents the sound wave at this moment. However, I also read that by squaring both the left channel and the right channel and adding them together, you get the current power level. These two seem incoherent to my understanding, and I am really baffled, so please explain away.
-
DFT output
The output is a complex representation of a phasor (Re, Im, frequency) for each basis function (usually a sine wave). The first item is the DC offset, so skip it. All the others are multiples of the same fundamental frequency (sampling rate/N). The output is symmetric (if the input is real only), so use just the first half of the results. Often only the amplitude spectrum (or its square, the power spectrum) is used:
Amplitude=sqrt(Re^2+Im^2)
which is the amplitude of the basis function. If the phase is needed, then
phase=atan2(Im,Re)
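The two formulas above can be tried on a small signal. This is only a minimal sketch using a naive O(N^2) DFT written with the Python standard library; the helper names `dft` and `amplitude_and_phase`, the window size, and the test signal are assumptions made for illustration, and a real application would use an FFT library instead.

```python
import cmath
import math

def dft(samples):
    """Naive O(N^2) DFT; fine for a demo, far too slow for real audio."""
    n = len(samples)
    return [
        sum(x * cmath.exp(-2j * math.pi * k * t / n) for t, x in enumerate(samples))
        for k in range(n)
    ]

def amplitude_and_phase(bin_value):
    """Apply Amplitude=sqrt(Re^2+Im^2) and phase=atan2(Im,Re) to one bin."""
    re, im = bin_value.real, bin_value.imag
    return math.sqrt(re * re + im * im), math.atan2(im, re)

# Hypothetical test signal: DC offset 0.5 plus a cosine of amplitude 1.0
# that fits exactly 3 periods into the window.
N = 32
signal = [0.5 + math.cos(2 * math.pi * 3 * t / N) for t in range(N)]

half = dft(signal)[: N // 2]          # the second half is the mirror image
amps = [amplitude_and_phase(c)[0] for c in half]
phases = [amplitude_and_phase(c)[1] for c in half]

# Skip bin 0 (DC offset) when looking for the strongest frequency.
peak_bin = max(range(1, N // 2), key=lambda i: amps[i])
dc_level = amps[0] / N                # bin 0 is scaled by N
peak_amplitude = 2 * amps[peak_bin] / N   # other bins are scaled by N/2
print(peak_bin, dc_level, peak_amplitude, phases[peak_bin])
```

The raw bin values grow with N, which is why the sketch divides by N (for DC) or N/2 (for the other bins) before comparing them with the amplitudes of the input signal.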
Beware: DFT results depend strongly on the input signal's shape, frequency, and phase shift relative to your basis functions. That causes the output to oscillate around the correct value and to produce wide peaks instead of sharp ones for single frequencies, not to mention aliasing.
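The wide peaks are easy to reproduce. The following sketch (standard library only; `dft`, `significant_bins`, the window size of 64, and the 10 % threshold are all assumptions chosen for the demo) counts how many bins carry noticeable energy when a sine fits the window exactly and when it falls halfway between two bins.

```python
import cmath
import math

def dft(samples):
    """Naive O(N^2) DFT, good enough for a small demo."""
    n = len(samples)
    return [
        sum(x * cmath.exp(-2j * math.pi * k * t / n) for t, x in enumerate(samples))
        for k in range(n)
    ]

def significant_bins(cycles_in_window, n=64, threshold=0.1):
    """Count bins in 1..n/2-1 whose amplitude exceeds threshold * peak."""
    signal = [math.sin(2 * math.pi * cycles_in_window * t / n) for t in range(n)]
    amps = [abs(c) for c in dft(signal)[1 : n // 2]]
    peak = max(amps)
    return sum(1 for a in amps if a > threshold * peak)

on_bin = significant_bins(4.0)    # exactly 4 periods: matches a basis function
off_bin = significant_bins(4.5)   # 4.5 periods: falls between bins 4 and 5
print(on_bin, off_bin)
```

When the sine matches a basis function, a single bin holds all the energy; when it does not, the energy smears over the neighbouring bins, which is the wide-peak effect described above.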
Frequencies
If your sampling rate is 44100 Hz, then the maximum output frequency is half of it, which means the highest frequency present in the data is 22050 Hz. The first half of the DFT output does not contain this frequency, however, so if you ignore the mirrored second half of the results, then:
- for 4 samples, the DFT output frequencies are { -, 11025 } Hz
- for 8 samples, the frequencies are { -, 5512.5, 11025, 16537.5 } Hz
The output frequency is linear in its index from the start, so if you have N=512 samples:
- do a DFT on it
- take the first N/2=256 results
- the i-th result represents frequency f=i*samplerate/N Hz
where i={ 1,...,(N/2)-1 } ... skipping i=0
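The index-to-frequency mapping can be written down in a few lines. This is a sketch only; `bin_frequencies` is a hypothetical helper name, and the 1024-sample case is taken from the question to show what such a window means at 44100 Hz.

```python
def bin_frequencies(sample_rate, n):
    """Frequencies in Hz for bins i = 1 .. n/2 - 1 (bin 0, the DC term, is skipped)."""
    return [i * sample_rate / n for i in range(1, n // 2)]

print(bin_frequencies(44100, 4))   # [11025.0]
print(bin_frequencies(44100, 8))   # [5512.5, 11025.0, 16537.5]

# The 1024-sample window from the question, at 44100 Hz:
n = 1024
resolution_hz = 44100 / n          # spacing between neighbouring bins
window_seconds = n / 44100         # how much audio one DFT covers
print(resolution_hz, window_seconds)
```

So the 512 results do not describe one whole second: they describe roughly 23 ms of audio, with one value about every 43 Hz from just above 0 Hz up to just below 22050 Hz.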
The image shows one of my utility apps, which ties together:
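- 2-channel sound generator (top left)
- 2-channel oscilloscope (top right)
- 2-channel spectrum analyzer (bottom) ... switched to a linear frequency scale to make clear what I mean in the text above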
Zoom the image to see the settings ... I made it as close to the real devices as I could.
Here is a DCT and DFT comparison:
Here is how the DFT output depends on the input signal frequency (aliasing caused by the sampling rate):
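Aliasing itself can be checked numerically without any DFT. As a small sketch (the 30000 Hz tone and the 100-sample length are arbitrary choices for the demo), a sine above half the sampling rate produces exactly the same samples, up to sign, as a lower-frequency sine mirrored around 22050 Hz:

```python
import math

SAMPLE_RATE = 44100
COUNT = 100  # number of samples to compare

def sampled_sine(freq_hz):
    """Sample a unit sine of the given frequency at SAMPLE_RATE."""
    return [math.sin(2 * math.pi * freq_hz * t / SAMPLE_RATE) for t in range(COUNT)]

too_high = sampled_sine(30000)               # above the 22050 Hz limit
alias = sampled_sine(SAMPLE_RATE - 30000)    # 14100 Hz

# The two sample sequences differ only in sign, so after sampling
# the 30000 Hz tone cannot be told apart from a 14100 Hz tone.
max_difference = max(abs(a + b) for a, b in zip(too_high, alias))
print(max_difference)
```

Any analysis done on the samples, the DFT included, will therefore report the 30000 Hz input at 14100 Hz.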
More channels
Summing the power of the channels is safer. If you just add the channels, you could miss some data. For example, let the left channel play a 1 kHz sine wave and the right channel play the exact opposite: if you just sum them, the result is zero, but you can still hear the sound ... (if you are not exactly in the middle between the speakers). If you analyze each channel independently, you need to calculate a DFT for each channel, but if you use the power sum of the channels (or the abs sum), you can obtain the frequencies for all channels at once. Of course, you then need to scale the amplitudes ...
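The 1 kHz example can be reproduced directly. In this sketch the window length of 441 samples (10 ms, exactly 10 periods of 1 kHz at 44100 Hz) and the variable names are assumptions made for the demo.

```python
import math

SAMPLE_RATE = 44100
N = 441  # 10 ms window holding exactly 10 periods of a 1 kHz sine

left = [math.sin(2 * math.pi * 1000 * t / SAMPLE_RATE) for t in range(N)]
right = [-s for s in left]  # the exact opposite of the left channel

# Plain mixing: the two channels cancel sample by sample.
mixed_energy = sum((l + r) ** 2 for l, r in zip(left, right))

# Power sum: square each channel first, then add.
power_sum_energy = sum(l * l + r * r for l, r in zip(left, right))

print(mixed_energy, power_sum_energy)
```

For beat detection the per-window energy is often all that is needed. Keep in mind that squaring is not a linear operation: a squared 1 kHz sine is a DC term plus a 2 kHz component, so the spectrum of the power signal is not the spectrum of the original channels.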
[Notes]
The bigger the N, the nicer the result (fewer aliasing artifacts and closer to the max frequency). For detecting specific frequencies, FIR filter detectors are more precise and faster.
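The answer does not say which FIR detector it has in mind, so the following is only one possible reading: correlate the window against a sine and a cosine at the target frequency, which amounts to evaluating a single DFT bin at an arbitrary frequency. The function name `tone_amplitude` and the test tone are assumptions for illustration.

```python
import math

def tone_amplitude(samples, sample_rate, target_hz):
    """Estimate the amplitude of one frequency by correlating with sin/cos.

    Costs O(N) per frequency, so it is cheaper than a full transform
    when only a handful of frequencies matter.
    """
    n = len(samples)
    w = 2 * math.pi * target_hz / sample_rate
    re = sum(x * math.cos(w * t) for t, x in enumerate(samples))
    im = -sum(x * math.sin(w * t) for t, x in enumerate(samples))
    return 2 * math.sqrt(re * re + im * im) / n

SAMPLE_RATE = 44100
N = 441  # 10 ms window
signal = [0.8 * math.sin(2 * math.pi * 1000 * t / SAMPLE_RATE) for t in range(N)]

at_1k = tone_amplitude(signal, SAMPLE_RATE, 1000)  # the tone that is present
at_3k = tone_amplitude(signal, SAMPLE_RATE, 3000)  # a tone that is absent
print(at_1k, at_3k)
```

Unlike the DFT bins, the target frequency here does not have to be a multiple of samplerate/N, although a tone that does not fit the window a whole number of times will still leak into nearby frequencies.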
I strongly recommend reading DFT and all the sublinks there, and also this: plotting real time Data on (qwt) Oscillocope