Improving the Performance of Algorithms Used in Speech Recognition

Authors

حسين الأحمد
طارق علي

Abstract

Speech recognition is among the most important modern technologies, and many systems have been developed that differ in their feature extraction and classification methods.

Voice recognition comprises two areas: speech recognition and speaker recognition. This research is limited to speech recognition.

The research proposes improving the performance of single-word recognition systems through an algorithm that combines more than one of the techniques used in feature extraction. It also modifies the neural network to study the effect on recognition, and examines the effect of noise on the proposed system.

Four speech recognition systems were studied. The first used the MFCC algorithm for feature extraction, and the second used the PLP algorithm. The third merged the features of both algorithms with the zero-crossing rate. In the fourth, the neural network used for recognition was modified to reduce its error rate. The effect of noise on these systems was also studied.

The results were compared in terms of recognition rate and neural network training time for each system. The proposed system reached a recognition rate of up to 98%.
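As a rough illustration of steps named in the abstract, the sketch below computes a zero-crossing rate per frame, concatenates it with per-frame MFCC and PLP matrices, and adds noise at a chosen signal-to-noise ratio. It is not the authors' implementation. The frame length, hop size, 13-coefficient width, the random placeholder matrices standing in for real MFCC and PLP features, the white Gaussian noise model, and all function names are assumptions made for illustration.

```python
import numpy as np


def frame_signal(signal, frame_len, hop):
    """Split a 1-D signal into overlapping frames, one frame per row."""
    n_frames = 1 + (len(signal) - frame_len) // hop
    idx = np.arange(frame_len)[None, :] + hop * np.arange(n_frames)[:, None]
    return signal[idx]


def zero_crossing_rate(frames):
    """Fraction of adjacent-sample pairs in each frame whose sign differs."""
    signs = np.signbit(frames)
    return np.mean(signs[:, 1:] != signs[:, :-1], axis=1)


def fuse_features(*feature_sets):
    """Feature-level fusion: stack per-frame features side by side."""
    cols = [np.asarray(f).reshape(len(f), -1) for f in feature_sets]
    return np.hstack(cols)


def add_noise(signal, snr_db, rng):
    """Add white Gaussian noise scaled to a target SNR in decibels."""
    signal_power = np.mean(signal ** 2)
    noise_power = signal_power / (10 ** (snr_db / 10))
    noise = rng.normal(0.0, np.sqrt(noise_power), size=signal.shape)
    return signal + noise


rng = np.random.default_rng(0)
sample_rate = 8000                                  # assumed sampling rate
t = np.arange(sample_rate) / sample_rate
speech_like = np.sin(2 * np.pi * 200 * t)           # stand-in for a recorded word

frames = frame_signal(speech_like, frame_len=200, hop=80)
zcr = zero_crossing_rate(frames)

# Placeholders only: a real system would compute these from the frames.
mfcc = rng.normal(size=(len(frames), 13))
plp = rng.normal(size=(len(frames), 13))

fused = fuse_features(mfcc, plp, zcr)               # one row per frame
noisy = add_noise(speech_like, snr_db=10, rng=rng)  # input for a noise test
```

Each row of `fused` is a single feature vector that a classifier such as a neural network could take as input. Running the same pipeline on `noisy` instead of `speech_like` is one way to probe how recognition degrades as the SNR drops.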

Published

2018-12-30

How to Cite

الأحمد ح, علي ط. Improving the Performance of Algorithms Used in Speech Recognition. Tuj-eng [Internet]. 2018 Dec 30 [cited 2025 Jan 28];40(6). Available from: https://journal.tishreen.edu.sy/index.php/engscnc/article/view/5804

Issue

Vol. 40 No. 6 (2018): Engineering Sciences

Section

Articles

License

This work is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License.

The authors retain copyright and grant the journal the right of first publication, with the commercial right transferred to Tishreen University Journal for Research and Scientific Studies - Engineering Sciences Series.

The work is published under a CC BY-NC-SA 4.0 license, which allows others to share the work with an acknowledgement of the work's authorship and initial publication in this journal. Authors can use a copy of their articles in their scientific activity and on their scientific websites, provided that the place of publication is indicated as Tishreen University Journal for Research and Scientific Studies - Engineering Sciences Series. Readers have the right to send, print, and subscribe to the initial version of the article and to the title of the publisher, Tishreen University Journal for Research and Scientific Studies - Engineering Sciences Series.

The journal uses a CC BY-NC-SA license, which means:

You are free to:

Share — copy and redistribute the material in any medium or format
Adapt — remix, transform, and build upon the material
The licensor cannot revoke these freedoms as long as you follow the license terms.

Attribution — You must give appropriate credit, provide a link to the license, and indicate if changes were made. You may do so in any reasonable manner, but not in any way that suggests the licensor endorses you or your use.
NonCommercial — You may not use the material for commercial purposes.
ShareAlike — If you remix, transform, or build upon the material, you must distribute your contributions under the same license as the original.

No additional restrictions — You may not apply legal terms or technological measures that legally restrict others from doing anything the license permits.
