Coarse Tuning of Fuzzy Reinforcement Learning Architecture using Value Iteration Method

Document Type: Original Article

Authors

1 Department of Computer Engineering, Yazd University, Yazd, Iran.

2 Yazd University, Yazd, Iran.

3 Department of Electrical and Computer Engineering, Isfahan University of Technology, Isfahan, Iran.

Abstract

This research presents a new method for coarse-tuning the fuzzy reinforcement learning architecture using agent-environment interaction data. The method addresses two main challenges of fuzzy reinforcement learning: the slow training process and the determination of the input membership functions of the fuzzy structure. First, the agent interacts with the environment to gather training data. Then, because the state space is continuous, a clustering algorithm is applied: each cluster serves as a state of the environment, and the transition probability matrix and the expected return matrix are estimated from the gathered data, with the transition probability from one cluster to another approximated by counting observed transitions. Finally, the parameters of the fuzzy system are adjusted using a value iteration method of dynamic programming modified for continuous spaces. The proposed method is fully described with an example; it increases learning speed and adjusts the input membership functions of the fuzzy system.
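Since the abstract only sketches the pipeline, the following is a minimal sketch of the general idea under stated assumptions: transitions are (state, action, reward, next-state) tuples gathered from agent-environment interaction, scikit-learn's KMeans stands in for the unspecified clustering algorithm, and plain value iteration stands in for the paper's modified variant. None of these choices are claimed to match the authors' implementation.

```python
import numpy as np
from sklearn.cluster import KMeans

def coarse_tune(transitions, n_clusters, n_actions, gamma=0.95, tol=1e-6):
    """Cluster a continuous state space, estimate an empirical MDP from
    interaction data, and solve it with value iteration.

    transitions: list of (state_vector, action_index, reward, next_state_vector)
    Returns the cluster centers and the value of each cluster.
    """
    states = np.array([t[0] for t in transitions], dtype=float)
    next_states = np.array([t[3] for t in transitions], dtype=float)

    # Each cluster of the continuous state space plays the role of one
    # discrete environment state. KMeans is an assumption; the paper's
    # specific clustering algorithm is not named in the abstract.
    km = KMeans(n_clusters=n_clusters, n_init=10).fit(np.vstack([states, next_states]))
    s_idx = km.predict(states)
    s2_idx = km.predict(next_states)

    # Empirical transition-probability matrix P[s, a, s'] and
    # expected-return matrix R[s, a], both estimated by counting.
    P = np.zeros((n_clusters, n_actions, n_clusters))
    R = np.zeros((n_clusters, n_actions))
    counts = np.zeros((n_clusters, n_actions))
    for (_, a, r, _), si, s2i in zip(transitions, s_idx, s2_idx):
        P[si, a, s2i] += 1.0
        R[si, a] += r
        counts[si, a] += 1.0
    visited = counts > 0
    P[visited] /= counts[visited][:, None]  # normalize counts to probabilities
    R[visited] /= counts[visited]           # mean observed reward per (s, a)

    # Standard value iteration on the clustered model (the paper uses a
    # modified variant for continuous spaces, not reproduced here).
    V = np.zeros(n_clusters)
    while True:
        Q = R + gamma * (P @ V)  # Q[s, a] = R[s, a] + gamma * sum_s' P[s, a, s'] * V[s']
        V_new = Q.max(axis=1)
        if np.max(np.abs(V_new - V)) < tol:
            return km.cluster_centers_, V
        V = V_new
```

The cluster centers found this way could, for instance, seed the centers of the input membership functions, which is the coarse tuning the title refers to, while the converged values give an initial evaluation of each fuzzy region.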

Keywords

