Finding and simulating a set of policies for transfer learning in multi-agent reinforcement learning settings
Course of study:
Artificial Intelligence
Kind of thesis:
Theoretical analysis and numerical simulation
Programming languages:
Python
Keywords:
Multi-agent Reinforcement Learning, Transfer Learning, Successor features, Multi-objective environments
Problem:
The VDN algorithm [1] tackles cooperative multi-agent reinforcement learning with a joint reward signal, a setting made difficult by the vast joint action and observation spaces. Some of these problems can be segmented into tasks with distinct reward functions, each of which fits naturally into the standard reinforcement learning framework.
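To make the decomposition concrete, here is a minimal PyTorch sketch of VDN's central idea (illustrative only, not the authors' implementation; class and variable names are hypothetical): the joint action value is the sum of per-agent utilities, so each agent can act greedily on its own utility while the team is trained on the shared reward.

```python
import torch
import torch.nn as nn

class VDNMixer(nn.Module):
    """Value decomposition as in VDN: the joint action value is the sum
    of per-agent values, Q_tot(s, a_1..a_n) = sum_i Q_i(o_i, a_i)."""

    def forward(self, agent_qs: torch.Tensor) -> torch.Tensor:
        # agent_qs: (batch, n_agents) chosen-action values, one per agent
        return agent_qs.sum(dim=1, keepdim=True)  # (batch, 1) joint value

# Training regresses Q_tot toward the shared-reward TD target
# r + gamma * max_a' Q_tot(s', a'); the max decomposes across agents
# because Q_tot is monotone in each Q_i.
```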
Successor features (SF) decouple environment dynamics from the reward, while generalized policy improvement (GPI) derives a new policy from a set of existing ones. Together, they enable the transfer of knowledge across tasks within the RL framework [2].
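Concretely, under the standard assumption of linear rewards r(s, a, s') = phi(s, a, s') . w, each policy pi_i is summarized by its successor features psi^{pi_i}(s, a), so that Q^{pi_i}_w(s, a) = psi^{pi_i}(s, a) . w for any task w, and GPI acts greedily with respect to the best stored policy. A minimal numpy sketch (all names illustrative):

```python
import numpy as np

def gpi_action(psi: np.ndarray, w: np.ndarray) -> int:
    """Generalized policy improvement over a set of stored policies.

    psi: (n_policies, n_actions, d) successor features psi^{pi_i}(s, a)
         at the current state, i.e. expected discounted feature sums.
    w:   (d,) reward weights of the current task, r = phi . w.
    """
    # Q^{pi_i}_w(s, a) = psi^{pi_i}(s, a) . w -- every stored policy is
    # evaluated on the new task with no further environment interaction.
    q = psi @ w                       # (n_policies, n_actions)
    # GPI: act greedily w.r.t. the best stored policy at this state.
    return int(q.max(axis=0).argmax())
```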
The UneVEn algorithm integrates SF with VDN to improve policy exploration [3]. In certain environments, SFs make it possible to combine a set of policies into an optimal solution without additional environment interactions [4], highlighting the parallels between transfer learning via SFs and multi-objective optimization in RL.
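Schematically, the set-construction idea can be pictured as the following loop (a loose sketch inspired by [4], not the paper's actual algorithm; `train_sf_policy` is a hypothetical stand-in for an RL training subroutine): repeatedly pick the task weight the current set covers worst under GPI, train a new SF policy on it, and add its successor features to the set.

```python
import numpy as np

def build_policy_set(candidate_ws, train_sf_policy, n_iters):
    """Grow a set of SF policies covering many linear-reward tasks.

    candidate_ws: list of (d,) task weight vectors to cover.
    train_sf_policy: callable w -> (n_actions, d) successor features at
        the start state of a policy trained on task w (hypothetical).
    """
    psis = []  # successor features of the policies found so far

    def gpi_value(w):
        # Best value the current set achieves on task w under GPI.
        if not psis:
            return -np.inf
        return max(float((psi @ w).max()) for psi in psis)

    for _ in range(n_iters):
        # Train a new policy on the task the current set covers worst.
        w = min(candidate_ws, key=gpi_value)
        psis.append(train_sf_policy(w))
    return psis
```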
This project aims to transfer knowledge between cooperative multi-agent reinforcement learning tasks, specifically by determining the optimal policy set as in [4]. An initial approach might adapt the VDN algorithm [1], drawing inspiration from [2], [3], and [4].
In the top figure, two agents must choose the path that yields the higher return, depending on the rewards given for collecting triangles of different colors. The bottom figure shows a schematic representation of the algorithm, suggesting how new policies are added to the set of optimal policies by solving different tasks.
Goal:
Propose and simulate a new cooperative multi-agent reinforcement learning algorithm with a joint reward signal that combines the ideas of VDN, successor features, and the strategy for finding a set of optimal policies described in [4].
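One way the combination could look, as a purely hypothetical starting point to be refined during the thesis: learn per-agent successor features and mix them additively in analogy to VDN, then let GPI and the policy-set strategy of [4] operate on the resulting joint quantity.

```python
import torch
import torch.nn as nn

class VDNSuccessorMixer(nn.Module):
    """Hypothetical building block: mix per-agent successor features
    additively, in analogy to VDN's value decomposition,
    psi_tot(s, a_1..a_n) = sum_i psi_i(o_i, a_i)."""

    def forward(self, agent_psis: torch.Tensor) -> torch.Tensor:
        # agent_psis: (batch, n_agents, d) per-agent successor features
        return agent_psis.sum(dim=1)  # (batch, d) joint successor features

def joint_q(psi_tot: torch.Tensor, w: torch.Tensor) -> torch.Tensor:
    # Joint task value Q_tot = psi_tot . w; GPI and the policy-set
    # construction of [4] would then operate on this quantity.
    return psi_tot @ w  # (batch,)
```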
Preliminary work:
[1] proposes an algorithm for solving cooperative multi-agent reinforcement learning problems. [2] investigates how to perform transfer learning using SF and GPI, and [4] focuses on an algorithm for finding the set of policies that delivers the optimal solution when using SF and GPI.
Tasks:
This project can include the following tasks:
Read the literature on SF, GPI, MARL, and multi-objective environments.
Choose a multi-agent environment and run some simulations using VDN.
Slightly modify a multi-agent environment to treat it as a multi-objective problem (see the sketch after this list).
Propose an algorithm that combines VDN, SF, and the strategy used in [4] to identify the set of optimal policies.
Run simulations with the proposed algorithm and assess the results.
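For the environment-modification task above, here is a minimal sketch of a wrapper that exposes a reward-feature vector phi alongside the scalar reward, assuming a Gymnasium-style interface; the `features` entry in `info` is a hypothetical convention, and the feature definition itself is task-specific (e.g. which triangle color was collected).

```python
import numpy as np
import gymnasium as gym

class FeatureRewardWrapper(gym.Wrapper):
    """Expose a reward-feature vector phi so the task becomes
    multi-objective: r = phi . w for some weight vector w."""

    def __init__(self, env, n_features: int):
        super().__init__(env)
        self.n_features = n_features

    def step(self, action):
        obs, reward, terminated, truncated, info = self.env.step(action)
        # Hypothetical: the underlying env reports per-objective events
        # (e.g. collected triangle colors) in `info["features"]`.
        phi = np.asarray(info.get("features", np.zeros(self.n_features)))
        info["phi"] = phi
        return obs, reward, terminated, truncated, info
```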
The final tasks will be discussed with the supervisor. Please feel free to get in contact.