Plug-and-Play RL

Machine learning has enabled artificial intelligence (AI) to attain cognitive abilities once considered innately human, such as perception and reasoning. Moreover, intelligent systems can be more reliable, precise, and agile than humans, and, unlike humans, they are resilient to hazardous conditions such as space or pandemics. We are on the cusp of a transformation in which intelligent and autonomous systems can replace humans in many complex tasks. Examples include automated warehouses, medical and industrial robots, urban traffic control, and smart infrastructures.

The prevalence of intelligent systems in our society brings new research opportunities to enhance and re-envision how they interact with each other and with humans so that their potential can be harnessed jointly. The challenges include uncertainty due to inadequate models of the environment, scalability due to the computational burden of coordinating multiple intelligent systems, non-stationarity induced by the evolving decisions of the systems, and sequential decision-making due to the continuing interactions among them. Hence, the overarching goal of this research plan is to develop the theoretical and algorithmic foundations of learning and autonomy in complex and dynamic systems, and to address these challenges with provable guarantees.

Plug-and-play operation is a desirable feature for sustainable and scalable applications: it removes the need for homogeneous agents and for recalibrating the entire system whenever an autonomous system is introduced or removed, e.g., due to systemic updates or failures. Therefore, we specifically focus on developing a rigorous theory for decentralized multi-agent reinforcement learning that can be deployed in a plug-and-play scheme, independent of the other intelligent systems and their objectives.

In our recent contribution [Sayin et al., 2020], we addressed a long-standing problem in the learning-in-games literature by showing that simple, stylized learning dynamics, known as fictitious play, converge to an equilibrium in zero-sum Markov games. The analysis rests on a two-timescale learning framework that brings game theory and control theory together, and we will leverage this framework to achieve the ambitious goals above.
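To make the two-timescale idea concrete, the following is a minimal sketch of the kind of update it involves, written for an agent i in a zero-sum Markov game; the notation and the specific step-size conditions are illustrative simplifications rather than the exact dynamics analyzed in the papers below. On the fast timescale, agent i updates an empirical belief about its opponent's play at the visited state s; on the slow timescale, it updates a Q-estimate of the continuation payoffs:

\[
  \hat{\pi}_{-i}^{t+1}(s) \;=\; \hat{\pi}_{-i}^{t}(s) + \alpha_t \big( \mathbf{1}\{a_{-i}^{t}\} - \hat{\pi}_{-i}^{t}(s) \big)
  \qquad \text{(belief update, fast timescale)}
\]
\[
  Q_i^{t+1}(s,a) \;=\; Q_i^{t}(s,a) + \beta_t \big( r_i(s,a) + \gamma\, v_i^{t}(s') - Q_i^{t}(s,a) \big)
  \qquad \text{(value update, slow timescale)}
\]
\[
  \text{where } v_i^{t}(s) := \max_{a_i} \, \mathbb{E}_{a_{-i} \sim \hat{\pi}_{-i}^{t}(s)} \big[ Q_i^{t}(s, a_i, a_{-i}) \big]
  \quad \text{and} \quad \beta_t / \alpha_t \to 0 .
\]

Because the ratio of step sizes vanishes, the beliefs track the opponent's empirical play much faster than the value estimates change, so each agent effectively responds to a quasi-stationary stage game at every state; this separation is what allows game-theoretic arguments (about the stage games) and control-theoretic arguments (about value iteration) to be combined.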

Representative papers:

  • M. O. Sayin and K. A. Cetiner, “On the Heterogeneity of Independent Learning Dynamics in Zero-sum Stochastic Games,” submitted to L4DC’22. [pdf]
  • A. Ozdaglar, M. O. Sayin, and K. Zhang, “Independent Learning in Stochastic Games,” invited chapter for the International Congress of Mathematicians 2022 (ICM’22), available at arXiv:2111.11743. [url]
  • M. O. Sayin*, K. Zhang*, D. S. Leslie, T. Başar, and A. Ozdaglar, “Decentralized Q-learning in Zero-sum Markov Games,” in Thirty-Fifth Conference on Neural Information Processing Systems (NeurIPS 2021), available at arXiv:2106.02748. [Also presented at the Workshop on Reinforcement Learning Theory, Int. Conf. Mach. Learn. (ICML), 2021.]
  • M. O. Sayin, F. Parise, and A. Ozdaglar, “Fictitious Play in Zero-sum Stochastic Games,” under review at SIAM Journal on Control and Optimization, available at arXiv:2010.04223, 2020.