A Method for Voiced/Unvoiced Classification of Noisy Speech by Analyzing Time-Domain Features of Spectrogram Image

Kazi Mahmudul Hassan; Ekramul Hamid; Khademul Islam Molla

doi:10.11648/j.cssp.20170602.12

Published in Science Journal of Circuits, Systems and Signal Processing (Volume 6, Issue 2)

Received: 11 September 2017 Accepted: 21 September 2017 Published: 23 October 2017

Abstract

This paper presents a voiced/unvoiced classification algorithm for noisy speech signals based on two acoustic features of the speech signal. Short-time energy and short-time zero-crossing rate are among the most distinguishing time-domain features for classifying speech into voiced and unvoiced segments. A new approach is developed in which a narrowband speech signal is processed frame by frame using its spectrogram image. Two time-domain features, short-time energy (STE) and short-time zero-crossing rate (ZCR), are used to classify the voiced and unvoiced parts. In the first stage, each frame of the spectrogram is divided into three separate sub-bands and their short-time energy ratio pattern is examined. An energy ratio pattern-matching look-up table is then used to classify the voicing activity. This stage successfully classifies patterns 1 through 4 but fails on the remaining patterns in the look-up table. The remaining patterns are therefore resolved in the second stage, where the frame-wise short-time average zero-crossing rate is compared with a threshold value. In this study, the threshold is calculated from the short-time average zero-crossing rate of white Gaussian noise (wGn). The accuracy of the proposed method is evaluated using both male and female speech waveforms under different signal-to-noise ratios (SNRs). Experimental results show that the proposed method achieves better accuracy than the conventional methods in the literature.
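
The two features and the noise-derived threshold described above can be illustrated with a minimal sketch. It is not the authors' implementation. The 8 kHz sampling rate, the frame and hop lengths, the Hann window, the equal-width split into three sub-bands, the helper names, and the rule that a frame with ZCR below the threshold is treated as voiced are all assumptions made for illustration. The energy ratio look-up table is not given in the abstract, so it is not reproduced here.

```python
import numpy as np

def frame_signal(x, frame_len, hop):
    """Split a 1-D signal into overlapping frames, one frame per row."""
    n_frames = 1 + (len(x) - frame_len) // hop
    idx = np.arange(frame_len)[None, :] + hop * np.arange(n_frames)[:, None]
    return x[idx]

def short_time_energy(frames):
    """Sum of squared samples in each frame (STE)."""
    return np.sum(frames ** 2, axis=1)

def zero_crossing_rate(frames):
    """Fraction of adjacent sample pairs whose signs differ (ZCR)."""
    signs = np.sign(frames)
    signs[signs == 0] = 1  # treat exact zeros as positive
    return np.mean(signs[:, 1:] != signs[:, :-1], axis=1)

def subband_energy_ratios(frames, n_bands=3):
    """Share of each frame's spectral energy in n_bands equal-width bands.

    Equal-width bands are an assumption; the abstract does not state
    where the three sub-band edges lie.
    """
    window = np.hanning(frames.shape[1])
    power = np.abs(np.fft.rfft(frames * window, axis=1)) ** 2
    bands = np.array_split(power, n_bands, axis=1)
    energy = np.stack([b.sum(axis=1) for b in bands], axis=1)
    total = np.maximum(energy.sum(axis=1, keepdims=True), 1e-12)
    return energy / total

# Assumed analysis parameters (hypothetical, not taken from the paper).
fs = 8000            # narrowband sampling rate in Hz
frame_len, hop = 256, 128

# Threshold: mean short-time ZCR of white Gaussian noise.
rng = np.random.default_rng(0)
noise = rng.standard_normal(fs)
noise_frames = frame_signal(noise, frame_len, hop)
zcr_threshold = zero_crossing_rate(noise_frames).mean()

# A 150 Hz tone stands in for a voiced segment in this toy example.
t = np.arange(fs) / fs
tone = np.sin(2 * np.pi * 150 * t)
tone_frames = frame_signal(tone, frame_len, hop)

tone_energy = short_time_energy(tone_frames)
tone_zcr = zero_crossing_rate(tone_frames)
tone_ratios = subband_energy_ratios(tone_frames)

# Assumed second-stage rule: ZCR below the noise threshold means voiced.
tone_is_voiced = tone_zcr < zcr_threshold
```

In this toy setup the noise-derived threshold sits close to 0.5 sign changes per sample pair, since independent Gaussian samples change sign about half the time, while the low-frequency tone crosses zero far less often and concentrates its energy in the lowest sub-band. Real speech would need the paper's actual band edges and look-up table to reproduce the first stage.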

Page(s)	11-17
Creative Commons	This is an Open Access article, distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution and reproduction in any medium or format, provided the original work is properly cited.
Copyright	Copyright © The Author(s), 2024. Published by Science Publishing Group

Keywords

Voiced/Unvoiced Classification, Spectrogram Image, Short-time Energy Ratio, Energy Ratio Pattern, Short-time Zero-crossing Rate, White Gaussian Noise

Cite This Article

APA Style

Kazi Mahmudul Hassan, Ekramul Hamid, Khademul Islam Molla. (2017). A Method for Voiced/Unvoiced Classification of Noisy Speech by Analyzing Time-Domain Features of Spectrogram Image. Science Journal of Circuits, Systems and Signal Processing, 6(2), 11-17. https://doi.org/10.11648/j.cssp.20170602.12

Author Information

Kazi Mahmudul Hassan
Department of Computer Science & Engineering, Jatiya Kabi Kazi Nazrul Islam University, Mymensingh, Bangladesh

Ekramul Hamid
Department of Computer Science & Engineering, University of Rajshahi, Rajshahi, Bangladesh

Khademul Islam Molla
Department of Computer Science & Engineering, University of Rajshahi, Rajshahi, Bangladesh
