I'm doing some feature extraction on audio signals.
With $M$ being a mel filterbank matrix and $S$ being the spectrogram (extracted from the Short Time Fourier Transform of my audio signal), we can compute:
The Log Mel Spectrogram: $X_P = \log(M \times |S|)$
The Log Mel Power Spectrogram: $X_{PS} = \log(M \times |S|^2)$
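For concreteness, here is a minimal NumPy sketch of the two features. Everything in it is an assumption made for illustration and not part of the original setup: the HTK-style mel formula, the hand-rolled triangular filterbank, the Hann-windowed STFT without padding, the synthetic two-tone test signal, and the small `eps` added before the logarithm to avoid `log(0)`. The helper names (`mel_filterbank`, `stft_magnitude`) are hypothetical.

```python
import numpy as np

def hz_to_mel(f):
    # HTK-style mel scale (one common convention, assumed here)
    return 2595.0 * np.log10(1.0 + f / 700.0)

def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

def mel_filterbank(n_mels, n_fft, sr):
    """Triangular mel filterbank M with shape (n_mels, n_fft // 2 + 1)."""
    freqs = np.linspace(0.0, sr / 2.0, n_fft // 2 + 1)
    edges = mel_to_hz(np.linspace(hz_to_mel(0.0), hz_to_mel(sr / 2.0), n_mels + 2))
    M = np.zeros((n_mels, freqs.size))
    for i in range(n_mels):
        lo, mid, hi = edges[i], edges[i + 1], edges[i + 2]
        rising = (freqs - lo) / (mid - lo)
        falling = (hi - freqs) / (hi - mid)
        # Triangle: minimum of the two slopes, floored at zero
        M[i] = np.clip(np.minimum(rising, falling), 0.0, None)
    return M

def stft_magnitude(x, n_fft, hop):
    """|S| with shape (n_fft // 2 + 1, n_frames); Hann window, no padding."""
    window = np.hanning(n_fft)
    n_frames = 1 + (len(x) - n_fft) // hop
    frames = np.stack(
        [x[t * hop : t * hop + n_fft] * window for t in range(n_frames)], axis=1
    )
    return np.abs(np.fft.rfft(frames, axis=0))

# Synthetic test signal: a strong 440 Hz tone plus a weak 3 kHz tone
sr, n_fft, hop, n_mels = 16000, 512, 256, 40
t = np.arange(sr) / sr
x = np.sin(2 * np.pi * 440.0 * t) + 0.1 * np.sin(2 * np.pi * 3000.0 * t)

M = mel_filterbank(n_mels, n_fft, sr)
S_mag = stft_magnitude(x, n_fft, hop)

eps = 1e-10  # assumed floor to keep the logarithm finite
X_P = np.log(M @ S_mag + eps)        # log mel (magnitude) spectrogram
X_PS = np.log(M @ S_mag**2 + eps)    # log mel power spectrogram
```

Note that the only difference between the two features is whether the squaring happens before the filterbank sum; everything else in the pipeline is shared.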
Question: Is there a reason to use one over the other?
Two things come to mind:
Using the Magnitude squared is computationally less expensive (no need for sqrt)
Using the Magnitude squared emphasizes the largest components, which might or might not be desirable to train a model.
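A sketch of why the second point is more than a rescaling, using notation that is not in the setup above: write $m_k \ge 0$ for the weights of one mel filter (one row of $M$) and $a_k = |S_k|$ for the magnitudes it covers in one frame. By the Cauchy-Schwarz inequality,

$$\Big(\sum_k m_k a_k\Big)^2 = \Big(\sum_k \sqrt{m_k}\,\sqrt{m_k}\,a_k\Big)^2 \le \Big(\sum_k m_k\Big)\Big(\sum_k m_k a_k^2\Big),$$

so for that band and frame

$$2\,X_P \le \log\Big(\sum_k m_k\Big) + X_{PS},$$

with equality exactly when $a_k$ is constant over the bins where $m_k > 0$. Two limiting cases: if the spectrum is flat within the band, $X_{PS} = 2X_P - \log\sum_k m_k$; if a single bin $j$ dominates, $X_{PS} \approx 2X_P - \log m_j$. In both cases the features are related by a factor of 2 and a signal-independent offset; in between, the gap depends on how peaky the spectrum is inside each band, which is where squaring before the sum gives extra weight to the largest components.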
Of course I could also compare the model performance when trained on either, but I’m mostly interested in the theoretical aspects, and in hearing from anyone with experience using both.
Any insights?
I'm not familiar with the mel spectrogram, but here are some points worth minding when an intermediate step precedes a nonlinearity:
Said step should be inspected in the context of the transform's theory. For wavelet scattering (a strong alternative to mel features), squaring the scalogram breaks its interpretation as encoding amplitude modulations, which affects higher-order transforms, and breaks the transform's non-expansiveness in the Lipschitz sense, which hurts stability.
If the transform isn't invertible, the step may affect how much information is lost: not at $|S| \rightarrow |S|^2$ itself, but in what follows. It can also change the representation's SNR for different noise profiles. I recommend the measure described here.
These likely aren't worth compromising for the sake of a small performance boost. Your second bullet, however, is a strong argument in favor, and I found one of these two to be sometimes favorable in scattering. For a brute-force investigation, appropriate test signals might help.