[1] Stern R, Sturtevant N, Felner A, et al. Multi-agent pathfinding: Definitions, variants, and benchmarks. arXiv: 1906.08291, 2019.
[2] Hönig W, Kiesel S, Tinka A, et al. Persistent and robust execution of MAPF schedules in warehouses. IEEE Robotics and Automation Letters, 2019, 4 (2): 1125–1131. doi: 10.1109/LRA.2019.2894217
[3] Wurman P R, D’Andrea R, Mountz M. Coordinating hundreds of cooperative, autonomous vehicles in warehouses. AI Magazine, 2008, 29: 9–19. doi: 10.1609/aimag.v29i1.2082
[4] Balakrishnan H, Jung Y. A framework for coordinated surface operations planning at Dallas-Fort Worth international airport. In: AIAA Guidance, Navigation and Control Conference and Exhibit. Hilton Head, USA: AIAA, 2007: 6553.
[5] Baxter J L, Burke E, Garibaldi J, et al. Multi-robot search and rescue: A potential field based approach. Studies in Computational Intelligence, 2007, 76: 9–16. doi: 10.1007/978-3-540-73424-6_2
[6] Zhang Y, Qian Y, Yao Y, et al. Learning to cooperate: Application of deep reinforcement learning for online AGV path finding. In: AAMAS '20: Proceedings of the 19th International Conference on Autonomous Agents and MultiAgent Systems. Auckland, New Zealand: ACM, 2020: 2077–2079.
[7] Corrales P A, Gregori F A. Swarm AGV optimization using deep reinforcement learning. In: MLMI '20: Proceedings of the 2020 3rd International Conference on Machine Learning and Machine Intelligence. New York: ACM, 2020: 65–69.
[8] Yu J, LaValle S. Structure and intractability of optimal multi-robot path planning on graphs. In: AAAI'13: Proceedings of the Twenty-Seventh AAAI Conference on Artificial Intelligence. New York: ACM, 2013: 1443–1449.
[9] Panait L, Luke S. Cooperative multi-agent learning: The state of the art. Autonomous Agents and Multi-Agent Systems, 2005, 11: 387–434. doi: 10.1007/s10458-005-2631-2
[10] Silver D. Cooperative pathfinding. In: First Artificial Intelligence and Interactive Digital Entertainment Conference. Washington, DC, USA: AAAI, 2005: 117–122.
[11] Sharon G, Stern R, Felner A, et al. Conflict-based search for optimal multi-agent pathfinding. Artificial Intelligence, 2015, 219: 40–66. doi: 10.1016/j.artint.2014.11.006
[12] Sharon G, Stern R, Felner A, et al. Conflict-based search for optimal multi-agent path finding. In: Proceedings of the International Symposium on Combinatorial Search. Palo Alto, USA: AAAI, 2012: 190.
[13] Barer M, Sharon G, Stern R, et al. Suboptimal variants of the conflict-based search algorithm for the multi-agent pathfinding problem. In: Proceedings of the Seventh Annual Symposium on Combinatorial Search. Prague, Czech Republic: AAAI, 2014, 5: 19–27.
[14] Boyarski E, Felner A, Stern R, et al. ICBS: Improved conflict-based search algorithm for multi-agent pathfinding. In: IJCAI'15: Proceedings of the 24th International Conference on Artificial Intelligence. New York: ACM, 2015: 740–746.
[15] Sartoretti G, Kerr J, Shi Y, et al. PRIMAL: Pathfinding via reinforcement and imitation multi-agent learning. IEEE Robotics and Automation Letters, 2019, 4: 2378–2385. doi: 10.1109/LRA.2019.2903261
[16] Damani M, Luo Z, Wenzel E, et al. PRIMAL2: Pathfinding via reinforcement and imitation multi-agent learning-lifelong. IEEE Robotics and Automation Letters, 2021, 6: 2666–2673. doi: 10.1109/LRA.2021.3062803
[17] Everett M, Chen Y F, How J P. Motion planning among dynamic, decision-making agents with deep reinforcement learning. In: 2018 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS). Madrid, Spain: IEEE, 2018: 3052–3059.
[18] Li Q, Gama F, Ribeiro A, et al. Graph neural networks for decentralized multi-robot path planning. arXiv: 1912.06095, 2019.
[19] Li Q, Lin W, Liu Z, et al. Message-aware graph attention networks for large-scale multi-robot path planning. IEEE Robotics and Automation Letters, 2021, 6 (3): 5533–5540. doi: 10.1109/LRA.2021.3077863
[20] Liu Z, Chen B, Zhou H, et al. MAPPER: Multi-agent path planning with evolutionary reinforcement learning in mixed dynamic environments. In: 2020 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS). Las Vegas, USA: IEEE, 2020: 11748–11754.
[21] Jansen R, Sturtevant N. A new approach to cooperative pathfinding. In: AAMAS '08: Proceedings of the 7th International Joint Conference on Autonomous Agents and Multiagent Systems. New York: ACM, 2008: 1401–1404.
[22] Ryan M R K. Exploiting subgraph structure in multi-robot path planning. Journal of Artificial Intelligence Research, 2008, 31: 497–542. doi: 10.1613/jair.2408
[23] Wang K H C, Botea A. Fast and memory-efficient multi-agent pathfinding. In: ICAPS'08: Proceedings of the Eighteenth International Conference on Automated Planning and Scheduling. New York: ACM, 2008: 380–387.
[24] Aljalaud F, Sturtevant N R. Finding bounded suboptimal multi-agent path planning solutions using increasing cost tree search. In: Proceedings of the Sixth International Symposium on Combinatorial Search. Washington, DC, USA: AAAI, 2013: 203.
[25] Sharon G, Stern R, Goldenberg M, et al. The increasing cost tree search for optimal multi-agent pathfinding. Artificial Intelligence, 2013, 195: 470–495. doi: 10.1016/j.artint.2012.11.006
[26] Walker T T, Sturtevant N R, Felner A. Extended increasing cost tree search for non-unit cost domains. In: Proceedings of the Twenty-Seventh International Joint Conference on Artificial Intelligence. California: IJCAI, 2018: 534–540.
[27] Surynek P. On propositional encodings of cooperative path-finding. In: 2012 IEEE 24th International Conference on Tools with Artificial Intelligence. Athens, Greece: IEEE, 2012: 524–531.
[28] Surynek P. Compact representations of cooperative path-finding as SAT based on matchings in bipartite graphs. In: 2014 IEEE 26th International Conference on Tools with Artificial Intelligence. Limassol, Cyprus: IEEE, 2014: 875–882.
[29] Surynek P. Reduced time-expansion graphs and goal decomposition for solving cooperative path finding sub-optimally. In: IJCAI'15: Proceedings of the 24th International Conference on Artificial Intelligence. New York: ACM, 2015: 1916–1922.
[30] Surynek P, Felner A, Stern R, et al. Efficient SAT approach to multi-agent path finding under the sum of costs objective. In: ECAI'16: Proceedings of the Twenty-Second European Conference on Artificial Intelligence. New York: ACM, 2016: 810–818.
[31] Yu J, LaValle S M. Multi-agent path planning and network flow. In: Frazzoli E, Lozano-Perez T, Roy N, et al., editors. Algorithmic Foundations of Robotics X. Berlin: Springer, 2013: 157–173.
[32] Yu J, LaValle S M. Optimal multirobot path planning on graphs: Complete algorithms and effective heuristics. IEEE Transactions on Robotics, 2016, 32 (5): 1163–1177. doi: 10.1109/TRO.2016.2593448
[33] Lam E, Le Bodic P, Harabor D, et al. Branch-and-cut-and-price for multi-agent pathfinding. Computers & Operations Research, 2022, 144: 105809. doi: 10.1016/j.cor.2022.105809
[34] Erdem E, Kisa D, Oztok U, et al. A general formal framework for pathfinding problems with multiple agents. Proceedings of the AAAI Conference on Artificial Intelligence, 2013, 27 (1): 290–296. doi: 10.1609/aaai.v27i1.8592
[35] Chen Y F, Liu M, Everett M, et al. Decentralized non-communicating multiagent collision avoidance with deep reinforcement learning. In: 2017 IEEE International Conference on Robotics and Automation (ICRA). Singapore: IEEE, 2017: 285–292.
[36] Wang B, Liu Z, Li Q, et al. Mobile robot path planning in dynamic environments through globally guided reinforcement learning. IEEE Robotics and Automation Letters, 2020, 5 (4): 6932–6939. doi: 10.1109/LRA.2020.3026638
[37] Dosovitskiy A, Beyer L, Kolesnikov A, et al. An image is worth 16×16 words: Transformers for image recognition at scale. arXiv: 2010.11929, 2020.
[38] Haarnoja T, Zhou A, Abbeel P, et al. Soft actor-critic: Off-policy maximum entropy deep reinforcement learning with a stochastic actor. arXiv: 1801.01290, 2018.
Figure 1. Structure of each agent’s state (here, for the red agent). Each agent is represented as a colored square, and its goal as a solid circle of the same color. For the current (red) agent, the local observation contains four channels of information: the positions of obstacles, the positions of nearby agents, the goal positions of those nearby agents, and the position of its own goal (if it lies within the FOV). In addition to the current (red) agent’s state, the states of nearby agents are also required (here, the yellow and blue agents, for example).
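The four-channel local observation described in Figure 1 can be sketched as follows. This is a hypothetical illustration, not the paper's actual implementation: the function name, the grid encoding (0 = free, 1 = obstacle), and the FOV size are all assumptions.

```python
def build_observation(grid, agents, goals, agent_id, fov=9):
    """Return a 4 x fov x fov observation centred on agent_id.

    Channels: 0 = obstacles, 1 = nearby agents, 2 = goals of those
    nearby agents, 3 = the agent's own goal (only if inside the FOV).
    """
    cx, cy = agents[agent_id]
    half = fov // 2
    obs = [[[0] * fov for _ in range(fov)] for _ in range(4)]

    # Channel 0: static obstacles; cells outside the map count as obstacles.
    for dx in range(-half, half + 1):
        for dy in range(-half, half + 1):
            x, y = cx + dx, cy + dy
            if 0 <= x < len(grid) and 0 <= y < len(grid[0]):
                obs[0][dx + half][dy + half] = grid[x][y]
            else:
                obs[0][dx + half][dy + half] = 1

    # Channels 1 and 2: nearby agents and their goals (when visible).
    for other, (ax, ay) in agents.items():
        if other == agent_id:
            continue
        if abs(ax - cx) <= half and abs(ay - cy) <= half:
            obs[1][ax - cx + half][ay - cy + half] = 1
            gx, gy = goals[other]
            if abs(gx - cx) <= half and abs(gy - cy) <= half:
                obs[2][gx - cx + half][gy - cy + half] = 1

    # Channel 3: the agent's own goal, only if it lies within the FOV.
    gx, gy = goals[agent_id]
    if abs(gx - cx) <= half and abs(gy - cy) <= half:
        obs[3][gx - cx + half][gy - cy + half] = 1
    return obs
```

A decentralized policy would call this once per agent per timestep, so each agent plans from its own egocentric view rather than the global map.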
Figure 2. Model overview. The input consists of the local observations of the target agent to be planned (red agent) and of the other agents near it (blue agent, yellow agent, etc.). The deep reinforcement learning framework has three components: an observation encoder, a communication block, and a decision block. The estimated optimal action from the policy network is the output of the model.
Figure 3. The structure of the LA-Encoder. Firstly, the local observation
Figure 4. The structure of the Transformer-based communication block (TC-Block). The TC-Block takes the features obtained from the LA-Encoder as input and treats them as a sequence. The position of each feature tensor also needs to be embedded; we use relative coordinates as the position information. Inspired by Ref. [37], which applies Transformers to image classification, we embed the position information from the X and Y coordinates.
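The positional-embedding step in Figure 4 can be sketched as below, assuming (as in Ref. [37]) that a learned embedding of each agent's relative X and Y coordinate is added to its feature vector before the Transformer layers. The names, dimensions, and table contents are placeholders, not the paper's parameters; in practice the two tables would be learned.

```python
D = 8          # feature dimension (assumed)
HALF = 4       # FOV half-width, so relative coordinates lie in [-4, 4]
MAX_REL = 2 * HALF + 1

# Deterministic stand-ins for learned per-coordinate embedding tables.
pos_emb_x = [[0.01 * (i + 1)] * D for i in range(MAX_REL)]
pos_emb_y = [[0.02 * (j + 1)] * D for j in range(MAX_REL)]

def add_position(features, rel_coords, half=HALF):
    """Add X- and Y-coordinate embeddings to each agent's feature vector.

    features  : list of length-D feature vectors, one per nearby agent
    rel_coords: relative (dx, dy) of each agent w.r.t. the target agent
    """
    out = []
    for feat, (dx, dy) in zip(features, rel_coords):
        ex = pos_emb_x[dx + half]   # embedding of the X coordinate
        ey = pos_emb_y[dy + half]   # embedding of the Y coordinate
        out.append([f + x + y for f, x, y in zip(feat, ex, ey)])
    return out
```

The resulting sequence of position-aware feature vectors is what the Transformer's self-attention then mixes, so agents closer together produce systematically different tokens than distant ones even when their raw observations look alike.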