Signal Processing Stack Exchange is a question and answer site for practitioners of the art and science of signal, image and video processing.

I'm doing some feature extraction on audio signals.
With $M$ a mel filterbank matrix and $S$ the spectrogram obtained from the Short-Time Fourier Transform (STFT) of my audio signal, we can compute:

  • The Log Mel Spectrogram: $X_P = \log(M \times|S|)$
  • The Log Mel Power Spectrogram: $X_{PS} = \log(M \times|S|^2)$
    Question: Is there a reason to use one over the other?
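A minimal NumPy sketch of the two features, assuming a Hann-windowed STFT and an HTK-style triangular mel filterbank built by hand. The helper names (`stft_magnitude`, `mel_filterbank`, `log_mel_features`) and the parameters `n_fft=512`, `hop=128`, `n_mels=40` and `eps` are illustrative choices, not taken from the question; a library such as librosa would normally supply the filterbank.

```python
import numpy as np


def stft_magnitude(x, n_fft=512, hop=128):
    """|S|: magnitude of a Hann-windowed STFT, shape (n_fft // 2 + 1, n_frames)."""
    window = np.hanning(n_fft)
    n_frames = 1 + (len(x) - n_fft) // hop
    frames = np.stack(
        [x[i * hop : i * hop + n_fft] * window for i in range(n_frames)], axis=1
    )
    return np.abs(np.fft.rfft(frames, axis=0))


def hz_to_mel(f):
    # HTK-style mel scale (one of several conventions in use)
    return 2595.0 * np.log10(1.0 + f / 700.0)


def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)


def mel_filterbank(sr, n_fft=512, n_mels=40):
    """Hypothetical helper: M, triangular filters with peak 1, shape (n_mels, n_fft // 2 + 1)."""
    fft_freqs = np.fft.rfftfreq(n_fft, d=1.0 / sr)
    hz_points = mel_to_hz(np.linspace(0.0, hz_to_mel(sr / 2.0), n_mels + 2))
    M = np.zeros((n_mels, fft_freqs.size))
    for m in range(n_mels):
        left, center, right = hz_points[m : m + 3]
        rising = (fft_freqs - left) / (center - left)
        falling = (right - fft_freqs) / (right - center)
        M[m] = np.maximum(0.0, np.minimum(rising, falling))
    return M


def log_mel_features(x, sr, eps=1e-10):
    """Return (X_P, X_PS) = (log(M|S|), log(M|S|^2)); eps guards against log(0)."""
    mag = stft_magnitude(x)
    M = mel_filterbank(sr)
    X_P = np.log(M @ mag + eps)
    X_PS = np.log(M @ mag**2 + eps)
    return X_P, X_PS


# Usage: a 1 s, 440 Hz tone with a little white noise, sampled at 16 kHz.
sr = 16000
t = np.arange(sr) / sr
rng = np.random.default_rng(0)
x = np.sin(2 * np.pi * 440.0 * t) + 0.01 * rng.standard_normal(sr)
X_P, X_PS = log_mel_features(x, sr)
print(X_P.shape, X_PS.shape)  # (40, 122) (40, 122)
```

Because the filterbank sums several bins before the logarithm, `X_PS` is in general not simply `2 * X_P`; the two features differ by more than a scale factor.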

    Two things come to mind:

  • Using the squared magnitude is computationally cheaper: $|S|^2 = \mathrm{Re}(S)^2 + \mathrm{Im}(S)^2$ needs no square root, whereas $|S|$ does.
  • Using the squared magnitude emphasizes the largest components, which may or may not be desirable when training a model.

    Of course, I could also compare model performance when training on each, but I'm mostly interested in the theoretical aspects, and in hearing from anyone with experience using both.
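Both points can be illustrated on a single mel band. The sketch below uses made-up complex STFT values (one dominant tonal bin plus four weak bins) and hypothetical triangular weights; it shows where the square root enters and how much of the weighted band sum the dominant bin accounts for under $|S|$ versus $|S|^2$.

```python
import numpy as np

# Made-up complex STFT values inside one mel band: a dominant tonal bin
# at index 2 and four weak "noise floor" bins (illustrative numbers only).
S_band = np.array([0.1 + 0.1j, 0.2 - 0.1j, 3.0 + 4.0j, 0.1 + 0.2j, -0.2 + 0.1j])
weights = np.array([0.25, 0.5, 1.0, 0.5, 0.25])  # hypothetical triangular filter

power = S_band.real**2 + S_band.imag**2  # |S|^2: multiply-adds only
magnitude = np.sqrt(power)               # |S|: one extra sqrt per bin


def dominant_share(values):
    """Fraction of the weighted band sum contributed by the largest term."""
    contrib = weights * values
    return contrib.max() / contrib.sum()


share_mag = dominant_share(magnitude)
share_pow = dominant_share(power)
print(f"magnitude: {share_mag:.3f}, power: {share_pow:.3f}")  # magnitude: 0.941, power: 0.997
```

With these values the dominant bin contributes about 94% of the band sum in the magnitude version and over 99% in the power version. In practice `np.abs(S)` gives the magnitude directly; the explicit form is only there to show where the square root appears.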

    Any insights?

    I'm not familiar with the mel spectrogram specifically, but here are some points worth keeping in mind whenever an intermediate step precedes a nonlinearity:

  • The step should be examined in the context of the transform's theory. For wavelet scattering (a strong alternative to mel features), squaring the scalogram breaks its interpretation as encoding amplitude modulations, which affects the higher-order transforms; it also breaks the transform's non-expansiveness in the Lipschitz sense, which undermines stability.
  • If the transform isn't invertible, the step may affect how much information is lost: not at $|S| \rightarrow |S|^2$ itself, which is invertible since $|S| \ge 0$, but in what follows. It can also change the representation's SNR for different noise profiles. I recommend the measure described here.
  • These properties likely aren't worth compromising for the sake of a small performance boost. Your second bullet, however, is a strong argument for one choice or the other, and in scattering I have sometimes found one of the two to be preferable. For a brute-force investigation, appropriate test signals might help.
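For a single pair of complex coefficients $a$ and $b$, the Lipschitz point in the first bullet can be checked directly (a simplified scalar illustration, not the full scattering stability argument). The modulus is non-expansive by the reverse triangle inequality, whereas the squared modulus is not:

$$
\begin{aligned}
\bigl|\,|a| - |b|\,\bigr| &\le |a - b|, \\
\bigl|\,|a|^2 - |b|^2\,\bigr| &= \bigl(|a| + |b|\bigr)\,\bigl|\,|a| - |b|\,\bigr|.
\end{aligned}
$$

The factor $|a| + |b|$ exceeds 1 once the amplitudes are large enough, so squaring can amplify small input differences by an amount that grows with signal level.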

    This is great and very interesting. I'm very happy about the measure you mention, and I'll be stealing that. Thank you very much :D (Jdip, Aug 18, 2022 at 20:17)
