A review of fuzzy reinforcement learning methods with critic-only architecture

DOR: 20.1001.1.27174409.1397.1.2.2.5

Abstract

This article reviews fuzzy reinforcement learning methods with a critic-only architecture. Fuzzy reinforcement learning combines fuzzy systems, which act as universal approximators, with reinforcement learning. Reinforcement learning is a powerful learning method that tunes system parameters online using only a scalar reward or penalty signal. In the critic-only architecture, a single zero-order Sugeno fuzzy system approximates the action-value function, and the final action is obtained from the values of the candidate actions in the consequent of each fuzzy rule. This paper describes two basic methods for tuning the values of the candidate actions: fuzzy Q-learning (FQL) and fuzzy Sarsa learning (FSL). These two methods generalize standard Q-learning and standard Sarsa learning, respectively. The existence of a mathematical convergence analysis is the advantage of FSL over FQL. The most important advantages and extensions of FSL and FQL are presented. Although examples of divergence exist for FQL, these methods have been applied to many control problems, such as robot motion, robot arm motion, boat motion, computer network routing, and wind farm control.
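The difference between the two update rules can be illustrated with a minimal sketch. It is not the paper's implementation: the one-dimensional state, the triangular membership functions, the epsilon-greedy choice inside each rule, and every name and parameter value below (`memberships`, `select`, `td_update`, `alpha`, `gamma`, and so on) are assumptions made for illustration. Each rule keeps one weight per candidate action, the approximated action value is the firing-strength-weighted sum of the selected weights, and FQL and FSL differ only in how the value of the next state is estimated.

```python
import random

def memberships(x, centers, width):
    """Normalized triangular firing strengths for a 1-D state (assumed shape).

    Assumes evenly spaced centers with spacing equal to `width`.
    """
    x = min(max(x, centers[0]), centers[-1])  # clamp so at least one rule fires
    raw = [max(0.0, 1.0 - abs(x - c) / width) for c in centers]
    total = sum(raw)
    return [m / total for m in raw]

def select(weights, epsilon, rng):
    """Pick one candidate-action index per rule (epsilon-greedy, an assumption)."""
    chosen = []
    for w in weights:
        if rng.random() < epsilon:
            chosen.append(rng.randrange(len(w)))
        else:
            chosen.append(max(range(len(w)), key=w.__getitem__))
    return chosen

def blend(mu, values):
    """Zero-order Sugeno output: firing-strength-weighted sum of rule values."""
    return sum(m * v for m, v in zip(mu, values))

def fql_next_value(weights, mu_next):
    """FQL target: each rule contributes its best weight (off-policy, Q-learning style)."""
    return blend(mu_next, [max(w) for w in weights])

def fsl_next_value(weights, mu_next, chosen_next):
    """FSL target: each rule contributes the weight of the action actually selected next."""
    return blend(mu_next, [weights[i][j] for i, j in enumerate(chosen_next)])

def td_update(weights, mu, chosen, reward, next_value, alpha=0.1, gamma=0.9):
    """Move the selected weights along the TD error, scaled by firing strength."""
    q = blend(mu, [weights[i][j] for i, j in enumerate(chosen)])
    delta = reward + gamma * next_value - q
    for i, j in enumerate(chosen):
        weights[i][j] += alpha * delta * mu[i]
    return delta

# Hypothetical setup: three rules, three candidate actions per rule.
centers, width = [0.0, 0.5, 1.0], 0.5
actions = [-1.0, 0.0, 1.0]
weights = [[0.0] * len(actions) for _ in centers]
rng = random.Random(0)

mu = memberships(0.25, centers, width)
chosen = select(weights, 0.1, rng)
action = blend(mu, [actions[j] for j in chosen])  # final continuous action
mu_next = memberships(0.5, centers, width)
delta = td_update(weights, mu, chosen, reward=1.0,
                  next_value=fql_next_value(weights, mu_next))
```

Replacing `fql_next_value(weights, mu_next)` with `fsl_next_value(weights, mu_next, select(weights, 0.1, rng))` turns the same step into an on-policy, Sarsa-style update; in that case the selected next actions would also be the ones executed in the next step.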
