Peer-Reviewed

Speech Enhancement Using Hilbert Spectrum and Wavelet Packet Based Soft-Thresholding

Received: 11 April 2015    Accepted: 18 April 2015    Published: 29 April 2015
Abstract

This paper studies a speech enhancement method that combines Hilbert spectrum analysis with wavelet packet analysis. Independent subspace analysis (ISA) is used to separate the speech and the interfering signals from a single mixture, and a wavelet packet based soft-thresholding algorithm is then used to enhance the quality of the target speech. The mixed signal is projected onto time-frequency (TF) space using the empirical mode decomposition (EMD) based Hilbert spectrum (HS). A finite set of independent basis vectors is then derived from the TF space by applying principal component analysis (PCA) and independent component analysis (ICA) sequentially. The vectors are grouped by hierarchical clustering to represent the independent subspaces corresponding to the component sources in the mixture. The speech recovered by the separation algorithm alone is not of sufficient quality, however, and still contains residual noise. In the next stage, the target speech is therefore enhanced using wavelet packet decomposition (WPD), with speech activity monitored by updating the statistics of the noise or unwanted signals. The mode mixing problem of traditional EMD is resolved by using ensemble EMD. The proposed algorithm is also tested using a short-time Fourier transform (STFT) based spectrogram. The simulation results show noticeable performance in both audio source separation and speech enhancement.
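The soft-thresholding stage mentioned in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes a Haar wavelet, a fixed universal threshold with a median-based noise estimate (rather than the adaptive, speech-activity-driven threshold the paper describes), and it leaves the lowest-frequency band unshrunk. All function names are hypothetical, and the input length must be divisible by 2 to the power of `level`.

```python
import math
import random
import statistics

SQRT2 = math.sqrt(2.0)

def haar_split(x):
    """One orthonormal Haar analysis step: (approximation, detail)."""
    approx = [(x[i] + x[i + 1]) / SQRT2 for i in range(0, len(x), 2)]
    detail = [(x[i] - x[i + 1]) / SQRT2 for i in range(0, len(x), 2)]
    return approx, detail

def haar_merge(approx, detail):
    """Inverse of haar_split: interleave the reconstructed sample pairs."""
    out = []
    for a, d in zip(approx, detail):
        out += [(a + d) / SQRT2, (a - d) / SQRT2]
    return out

def wp_decompose(x, level):
    """Full wavelet packet tree: split every node at every level, return leaves."""
    nodes = [list(x)]
    for _ in range(level):
        nodes = [band for node in nodes for band in haar_split(node)]
    return nodes

def wp_reconstruct(nodes):
    """Merge leaf bands pairwise until only the root signal remains."""
    while len(nodes) > 1:
        nodes = [haar_merge(nodes[i], nodes[i + 1])
                 for i in range(0, len(nodes), 2)]
    return nodes[0]

def soft_threshold(c, t):
    """Shrink c toward zero by t; coefficients with |c| <= t become zero."""
    return math.copysign(max(abs(c) - t, 0.0), c)

def wp_soft_denoise(x, level=3):
    """Assumed scheme: universal threshold, noise level from first-level details."""
    _, d1 = haar_split(x)
    sigma = statistics.median(abs(d) for d in d1) / 0.6745
    t = sigma * math.sqrt(2.0 * math.log(len(x)))
    leaves = wp_decompose(x, level)
    # Keep the all-approximation band as is; shrink every other band.
    shrunk = [leaves[0]] + [[soft_threshold(c, t) for c in band]
                            for band in leaves[1:]]
    return wp_reconstruct(shrunk)

def mse(a, b):
    """Mean squared error between two equal-length sequences."""
    return sum((p - q) ** 2 for p, q in zip(a, b)) / len(a)

if __name__ == "__main__":
    random.seed(0)
    # Piecewise-constant toy signal (well suited to Haar), not real speech.
    clean = [4.0 if (i // 32) % 2 else 0.0 for i in range(256)]
    noisy = [c + random.gauss(0.0, 0.3) for c in clean]
    denoised = wp_soft_denoise(noisy)
    print("MSE before:", mse(noisy, clean), "after:", mse(denoised, clean))
```

The toy signal is chosen so that its clean energy sits entirely in the lowest band. Real speech would call for a smoother wavelet and a threshold that tracks the noise statistics over time, as the abstract indicates.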

Published in Science Journal of Circuits, Systems and Signal Processing (Volume 4, Issue 1)
DOI 10.11648/j.cssp.20150401.12
Page(s) 1-8
Creative Commons

This is an Open Access article, distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution and reproduction in any medium or format, provided the original work is properly cited.

Copyright

Copyright © The Author(s), 2024. Published by Science Publishing Group

Keywords

Speech Enhancement, Ensemble Empirical Mode Decomposition, Source Separation, Independent Subspace Analysis, Hilbert Spectrum, Wavelet Packet Decomposition

Cite This Article
  • APA Style

    Md. Ekramul Hamid, Md. Khademul Islam Molla, Md. Iqbal Aziz Khan, Takayoshi Nakai. (2015). Speech Enhancement Using Hilbert Spectrum and Wavelet Packet Based Soft-Thresholding. Science Journal of Circuits, Systems and Signal Processing, 4(1), 1-8. https://doi.org/10.11648/j.cssp.20150401.12


    ACS Style

    Md. Ekramul Hamid; Md. Khademul Islam Molla; Md. Iqbal Aziz Khan; Takayoshi Nakai. Speech Enhancement Using Hilbert Spectrum and Wavelet Packet Based Soft-Thresholding. Sci. J. Circuits Syst. Signal Process. 2015, 4(1), 1-8. doi: 10.11648/j.cssp.20150401.12


    AMA Style

    Md. Ekramul Hamid, Md. Khademul Islam Molla, Md. Iqbal Aziz Khan, Takayoshi Nakai. Speech Enhancement Using Hilbert Spectrum and Wavelet Packet Based Soft-Thresholding. Sci J Circuits Syst Signal Process. 2015;4(1):1-8. doi: 10.11648/j.cssp.20150401.12


  • BibTeX

    @article{10.11648/j.cssp.20150401.12,
      author = {Md. Ekramul Hamid and Md. Khademul Islam Molla and Md. Iqbal Aziz Khan and Takayoshi Nakai},
      title = {Speech Enhancement Using Hilbert Spectrum and Wavelet Packet Based Soft-Thresholding},
      journal = {Science Journal of Circuits, Systems and Signal Processing},
      volume = {4},
      number = {1},
      pages = {1-8},
      doi = {10.11648/j.cssp.20150401.12},
      url = {https://doi.org/10.11648/j.cssp.20150401.12},
      eprint = {https://article.sciencepublishinggroup.com/pdf/10.11648.j.cssp.20150401.12},
     year = {2015}
    }
    


  • RIS

    TY  - JOUR
    T1  - Speech Enhancement Using Hilbert Spectrum and Wavelet Packet Based Soft-Thresholding
    AU  - Md. Ekramul Hamid
    AU  - Md. Khademul Islam Molla
    AU  - Md. Iqbal Aziz Khan
    AU  - Takayoshi Nakai
    Y1  - 2015/04/29
    PY  - 2015
    N1  - https://doi.org/10.11648/j.cssp.20150401.12
    DO  - 10.11648/j.cssp.20150401.12
    T2  - Science Journal of Circuits, Systems and Signal Processing
    JF  - Science Journal of Circuits, Systems and Signal Processing
    JO  - Science Journal of Circuits, Systems and Signal Processing
    SP  - 1
    EP  - 8
    PB  - Science Publishing Group
    SN  - 2326-9073
    UR  - https://doi.org/10.11648/j.cssp.20150401.12
    VL  - 4
    IS  - 1
    ER  - 


Author Information
  • Md. Ekramul Hamid: Dept. of Computer Science and Engineering, University of Rajshahi, Rajshahi, Bangladesh; Dept. of Electric and Electronic Engineering, Shizuoka University, Hamamatsu-shi, Japan

  • Md. Khademul Islam Molla: Dept. of Computer Science and Engineering, University of Rajshahi, Rajshahi, Bangladesh

  • Md. Iqbal Aziz Khan: Dept. of Computer Science and Engineering, University of Rajshahi, Rajshahi, Bangladesh; Dept. of Electric and Electronic Engineering, Shizuoka University, Hamamatsu-shi, Japan

  • Takayoshi Nakai: Dept. of Electric and Electronic Engineering, Shizuoka University, Hamamatsu-shi, Japan
