Localization of Speaker using Fusion Techniques and Neural Network Algorithms
DOI:
https://doi.org/10.31185/wjps.399
Keywords:
Speaker localization, Data fusion, Feature fusion, RBM, LSTM
Abstract
Sound source localization, especially of speech and speakers, has recently become one of the most significant techniques because it is used in applications such as smart environments, industry, robots, and audio conferencing. These applications demand high accuracy. This paper proposes a speaker localization method for closed spaces that is based on speech signals and employs fusion techniques and neural network (NN) algorithms to improve accuracy. The proposed work classifies the speaker signals in three phases: preprocessing, feature extraction, and classification. A data fusion technique is used to generate the speaker dataset. In the feature extraction phase, a feature fusion technique constructs a feature vector using Generalized Cross-Correlation (GCC) for time delay estimation, and Root-MUSIC and Minimum Variance Distortionless Response (MVDR) for estimating the direction of arrival of the signal source. In the classification phase, two NN algorithms are used: a Restricted Boltzmann Machine (RBM), implemented with the TensorFlow library, and Long Short-Term Memory (LSTM), implemented with the Keras library. The experimental results show accuracies of 99.84% for the RBM and 99.15% for the LSTM.
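The time delay estimation step can be illustrated with a minimal sketch. The abstract names GCC but does not state which weighting is used, so the PHAT weighting below is an assumption, as are the function name `gcc_phat`, the 16 kHz sampling rate, and the synthetic test signal; this is not the authors' implementation.

```python
import numpy as np


def gcc_phat(sig, ref, fs, max_tau=None):
    """Estimate the delay (seconds) of `sig` relative to `ref` with GCC-PHAT.

    A positive result means `sig` arrives later than `ref`.
    """
    # Zero-pad to the combined length so the correlation is linear, not circular.
    n = sig.shape[0] + ref.shape[0]
    sig_f = np.fft.rfft(sig, n=n)
    ref_f = np.fft.rfft(ref, n=n)

    # Cross-power spectrum, whitened by its magnitude (PHAT weighting, assumed here).
    cross = sig_f * np.conj(ref_f)
    cross /= np.abs(cross) + 1e-12

    # Back to the lag domain.
    cc = np.fft.irfft(cross, n=n)

    # Restrict the search to physically plausible lags if a bound is given.
    max_shift = n // 2
    if max_tau is not None:
        max_shift = min(int(fs * max_tau), max_shift)

    # Reorder so that index 0 corresponds to lag -max_shift.
    cc = np.concatenate((cc[-max_shift:], cc[:max_shift + 1]))
    shift = int(np.argmax(np.abs(cc))) - max_shift
    return shift / fs


# Synthetic check: white noise delayed by a known number of samples.
fs = 16000                      # assumed sampling rate in Hz
true_delay = 12                 # delay in samples (hypothetical)
rng = np.random.default_rng(0)
ref = rng.standard_normal(4096)
sig = np.concatenate((np.zeros(true_delay), ref[:-true_delay]))

estimated_delay_samples = round(gcc_phat(sig, ref, fs) * fs)
```

In a feature fusion setting such as the one described, a delay estimate of this kind would be one entry in the feature vector, alongside the direction-of-arrival estimates from Root-MUSIC and MVDR.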
License
Copyright (c) 2024 Sawsan Jaddoa, Rasha H. Ali, Mohammed Najm Abdullah, Buthainah F. Abed

This work is licensed under a Creative Commons Attribution 4.0 International License.