Peer-Reviewed

A Method for Voiced/Unvoiced Classification of Noisy Speech by Analyzing Time-Domain Features of Spectrogram Image

Received: 11 September 2017    Accepted: 21 September 2017    Published: 23 October 2017
Abstract

This paper presents a voiced/unvoiced classification algorithm for noisy speech signals based on two acoustic features of the signal. Short-time energy and short-time zero-crossing rate are among the most distinguishing time-domain features for classifying a speech signal into voiced and unvoiced segments. A new approach is developed in which frame-by-frame processing is performed on a narrow-band speech signal using its spectrogram image. Two time-domain features, short-time energy (STE) and short-time zero-crossing rate (ZCR), are used to classify the voiced and unvoiced parts. In the first stage, each frame of the spectrogram under analysis is divided into three separate sub-bands, and the short-time energy ratio pattern of these sub-bands is examined. An energy ratio pattern-matching look-up table is then used to classify the voicing activity. This stage successfully classifies patterns 1 through 4 but fails on the remaining patterns in the look-up table. The remaining patterns are therefore resolved in the second stage, where the frame-wise short-time average zero-crossing rate is compared with a threshold value. In this study, the threshold value is calculated from the short-time average zero-crossing rate of white Gaussian noise (wGn). The accuracy of the proposed method is evaluated using both male and female speech waveforms under different signal-to-noise ratios (SNRs). Experimental results show that the proposed method achieves better accuracy than the conventional methods in the literature.
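The two features named in the abstract, together with a three-band energy ratio and a noise-derived ZCR threshold, can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not specify the spectrogram-image processing, the look-up table, or how the threshold is derived from the noise ZCR, so the sketch works directly on waveform frames and simply takes the mean ZCR of synthetic white Gaussian noise. All function names, the frame length, the hop size, and the equal-width band split are assumptions made for illustration.

```python
import numpy as np

def frame_signal(x, frame_len, hop):
    """Split a 1-D signal into overlapping frames, one frame per row."""
    n_frames = 1 + (len(x) - frame_len) // hop
    idx = np.arange(frame_len)[None, :] + hop * np.arange(n_frames)[:, None]
    return x[idx]

def short_time_energy(frames):
    """Sum of squared samples in each frame."""
    return np.sum(frames ** 2, axis=1)

def zero_crossing_rate(frames):
    """Fraction of adjacent sample pairs in each frame that change sign."""
    signs = np.sign(frames)
    signs[signs == 0] = 1  # treat exact zeros as positive
    return np.mean(signs[:, 1:] != signs[:, :-1], axis=1)

def subband_energy_ratios(frames, n_bands=3):
    """Share of each frame's spectral energy in n_bands equal-width bands.

    The equal-width split is an assumption; the abstract does not give
    the band edges used in the paper.
    """
    window = np.hanning(frames.shape[1])
    power = np.abs(np.fft.rfft(frames * window, axis=1)) ** 2
    bands = np.array_split(power, n_bands, axis=1)
    band_energy = np.stack([b.sum(axis=1) for b in bands], axis=1)
    return band_energy / (band_energy.sum(axis=1, keepdims=True) + 1e-12)

def wgn_zcr_threshold(frame_len, hop, n_samples=16000, seed=0):
    """Mean frame-wise ZCR of synthetic white Gaussian noise.

    Using the plain mean as the threshold is an assumption, not the
    paper's stated rule.
    """
    rng = np.random.default_rng(seed)
    noise = rng.standard_normal(n_samples)
    return zero_crossing_rate(frame_signal(noise, frame_len, hop)).mean()

# Usage on a synthetic, voiced-like tone (150 Hz sine sampled at 8 kHz).
fs = 8000
t = np.arange(fs) / fs
tone = np.sin(2 * np.pi * 150 * t)
frames = frame_signal(tone, frame_len=200, hop=100)

threshold = wgn_zcr_threshold(frame_len=200, hop=100)
ratios = subband_energy_ratios(frames)
low_zcr = zero_crossing_rate(frames) < threshold  # low ZCR suggests voiced
```

For the tone, nearly all spectral energy falls in the lowest band and the ZCR sits well below the noise-derived threshold, which is the kind of contrast a two-stage scheme like the one described would rely on. Real noisy speech would need the paper's actual pattern table and threshold rule.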

Published in Science Journal of Circuits, Systems and Signal Processing (Volume 6, Issue 2)
DOI 10.11648/j.cssp.20170602.12
Page(s) 11-17
Creative Commons

This is an Open Access article, distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution and reproduction in any medium or format, provided the original work is properly cited.

Copyright

Copyright © The Author(s), 2024. Published by Science Publishing Group

Keywords

Voiced/Unvoiced Classification, Spectrogram Image, Short-time Energy Ratio, Energy Ratio Pattern, Short-time Zero-crossing Rate, White Gaussian Noise

Cite This Article
  • APA Style

    Kazi Mahmudul Hassan, Ekramul Hamid, Khademul Islam Molla. (2017). A Method for Voiced/Unvoiced Classification of Noisy Speech by Analyzing Time-Domain Features of Spectrogram Image. Science Journal of Circuits, Systems and Signal Processing, 6(2), 11-17. https://doi.org/10.11648/j.cssp.20170602.12

    ACS Style

    Kazi Mahmudul Hassan; Ekramul Hamid; Khademul Islam Molla. A Method for Voiced/Unvoiced Classification of Noisy Speech by Analyzing Time-Domain Features of Spectrogram Image. Sci. J. Circuits Syst. Signal Process. 2017, 6(2), 11-17. doi: 10.11648/j.cssp.20170602.12

    AMA Style

    Kazi Mahmudul Hassan, Ekramul Hamid, Khademul Islam Molla. A Method for Voiced/Unvoiced Classification of Noisy Speech by Analyzing Time-Domain Features of Spectrogram Image. Sci J Circuits Syst Signal Process. 2017;6(2):11-17. doi: 10.11648/j.cssp.20170602.12

  • @article{10.11648/j.cssp.20170602.12,
      author = {Kazi Mahmudul Hassan and Ekramul Hamid and Khademul Islam Molla},
      title = {A Method for Voiced/Unvoiced Classification of Noisy Speech by Analyzing Time-Domain Features of Spectrogram Image},
      journal = {Science Journal of Circuits, Systems and Signal Processing},
      volume = {6},
      number = {2},
      pages = {11-17},
      doi = {10.11648/j.cssp.20170602.12},
      url = {https://doi.org/10.11648/j.cssp.20170602.12},
      eprint = {https://article.sciencepublishinggroup.com/pdf/10.11648.j.cssp.20170602.12},
     year = {2017}
    }
    

  • TY  - JOUR
    T1  - A Method for Voiced/Unvoiced Classification of Noisy Speech by Analyzing Time-Domain Features of Spectrogram Image
    AU  - Kazi Mahmudul Hassan
    AU  - Ekramul Hamid
    AU  - Khademul Islam Molla
    Y1  - 2017/10/23
    PY  - 2017
    N1  - https://doi.org/10.11648/j.cssp.20170602.12
    DO  - 10.11648/j.cssp.20170602.12
    T2  - Science Journal of Circuits, Systems and Signal Processing
    JF  - Science Journal of Circuits, Systems and Signal Processing
    JO  - Science Journal of Circuits, Systems and Signal Processing
    SP  - 11
    EP  - 17
    PB  - Science Publishing Group
    SN  - 2326-9073
    UR  - https://doi.org/10.11648/j.cssp.20170602.12
    VL  - 6
    IS  - 2
    ER  - 

Author Information
  • Department of Computer Science & Engineering, Jatiya Kabi Kazi Nazrul Islam University, Mymensingh, Bangladesh

  • Department of Computer Science & Engineering, University of Rajshahi, Rajshahi, Bangladesh

  • Department of Computer Science & Engineering, University of Rajshahi, Rajshahi, Bangladesh
