ABSTRACT
Humans locate and track objects and other humans in their surroundings using audio, vision, or a combination of the two sensory modalities. A common strategy for humans searching for others in an indoor environment is to rely on a sound’s Direction of Arrival (DoA), as well as on their knowledge of whether a room was previously occupied. In this paper, we implement a similar search behavior on a mobile robot for tracking humans. To produce this behavior, we develop an algorithm that performs probabilistic inference of human presence in a given map region using two sensory cues: the DoA of sound and a vision-based estimate of human proximity. A key characteristic of this approach is that the robot can navigate towards a human regardless of whether the sound signal is continuous, sporadic, or absent altogether. We deploy the proposed search behavior on a robot and evaluate how effectively it finds a target person across multiple rooms under varying levels of human sound (e.g., the person calling out occasionally, once, or not at all). Our experimental findings indicate that while audio signals are not essential for localizing the target person, they greatly reduce the search time.
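The abstract does not spell out the inference model. The sketch below shows one generic way to fuse the two cues with a discrete Bayesian filter over map regions. It assumes that each region (e.g., a room) is represented by a centroid, that a DoA reading is a bearing in the map frame scored with a Gaussian on angular error plus a small floor for reflections, and that the vision cue reduces to a detected/not-detected observation of the currently inspected region with assumed detection and false-alarm probabilities. All names (`Region`, `doa_update`, `vision_update`, `next_goal`) and parameter values are hypothetical, and the greedy goal rule is an illustrative choice, not the authors' method.

```python
import math
from dataclasses import dataclass


@dataclass
class Region:
    """Hypothetical map region (e.g. a room) represented by its centroid."""
    name: str
    x: float
    y: float


def normalize(belief):
    """Rescale region weights so they sum to one."""
    total = sum(belief.values())
    return {k: v / total for k, v in belief.items()}


def angle_diff(a, b):
    """Smallest signed difference between two angles, in radians."""
    return math.atan2(math.sin(a - b), math.cos(a - b))


def doa_update(belief, regions, robot_xy, doa, sigma=0.35, floor=0.05):
    """Weight each region by how well its bearing from the robot matches the DoA.

    doa is a bearing in the map frame (radians), or None when no sound was
    heard; in that case the belief is returned unchanged (sporadic/absent audio).
    """
    if doa is None:
        return belief
    rx, ry = robot_xy
    posterior = {}
    for r in regions:
        bearing = math.atan2(r.y - ry, r.x - rx)
        err = angle_diff(doa, bearing)
        # Assumed likelihood: Gaussian on angular error plus a floor for outliers.
        likelihood = math.exp(-0.5 * (err / sigma) ** 2) + floor
        posterior[r.name] = belief[r.name] * likelihood
    return normalize(posterior)


def vision_update(belief, observed_region, detected, p_detect=0.9, p_false=0.05):
    """Update after visually inspecting observed_region (assumed binary detector)."""
    posterior = {}
    for name, prior in belief.items():
        if name == observed_region:
            likelihood = p_detect if detected else 1.0 - p_detect
        else:
            likelihood = p_false if detected else 1.0 - p_false
        posterior[name] = prior * likelihood
    return normalize(posterior)


def next_goal(belief):
    """Greedy rule: head for the region most likely to contain the person."""
    return max(belief, key=belief.get)


# Usage: three rooms around a robot at the origin, uniform prior.
regions = [Region("kitchen", 5.0, 0.0), Region("bedroom", 0.0, 5.0),
           Region("office", -5.0, 0.0)]
belief = {r.name: 1.0 / len(regions) for r in regions}
belief = vision_update(belief, "office", detected=False)  # office looked empty
belief = doa_update(belief, regions, (0.0, 0.0), doa=math.pi / 2)  # voice from +y
print(next_goal(belief))  # bedroom
```

Because `doa_update` is a no-op when nothing is heard, negative visual evidence alone still shifts probability toward unvisited regions, so the search continues without sound; an occasional DoA reading simply concentrates the belief faster. A deployed system would presumably also need a motion model for the person and a travel-cost term in goal selection.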
Index Terms
- Sporadic Audio-Visual Embodied Assistive Robot Navigation For Human Tracking