DOI: 10.1145/3594806.3594845
research-article
Open Access

Sporadic Audio-Visual Embodied Assistive Robot Navigation For Human Tracking

Published: 10 August 2023

ABSTRACT

Humans locate and track objects and other humans in their surroundings using audio, vision, or a combination of the two sensory modalities. A common strategy for humans searching for others in an indoor environment is to rely on a sound’s Direction of Arrival (DoA), as well as their knowledge of whether a room was previously occupied. In this paper, a similar search behavior is implemented on a mobile robot for the purpose of tracking other humans. To produce this search behavior, we develop an algorithm that performs probabilistic inference of human presence in a specific map region using two sensory cues: DoA of sound and a vision-based estimate of human proximity. A key characteristic of this approach is that the robot can navigate towards a human irrespective of whether the sound signal is continuous, sporadic, or absent altogether. We deploy the proposed search behavior on a robot and evaluate its efficacy at finding a target person across multiple rooms, considering varying levels of human sound (e.g., calling out occasionally, once, or not at all). Our experimental findings indicate that while audio signals are not vital in localizing the target person, they greatly reduce the search time.
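The probabilistic inference summarized above can be illustrated with a minimal sketch. This is not the authors' algorithm: it is a plain discrete Bayes update over map regions, written under stated assumptions. The region names, the bearing of each region from the robot, the von Mises-style bearing likelihood with concentration `kappa`, and the detection rates `p_hit` and `p_false` are all hypothetical values chosen for illustration.

```python
import math


def normalize(belief):
    """Rescale a dict of non-negative weights so they sum to 1."""
    total = sum(belief.values())
    return {region: p / total for region, p in belief.items()}


def doa_likelihood(doa_rad, bearing_rad, kappa=4.0):
    """Unnormalized von Mises likelihood of hearing a sound at angle
    doa_rad if the source lies at bearing_rad (assumed sensor model)."""
    return math.exp(kappa * math.cos(doa_rad - bearing_rad))


def update_with_doa(belief, bearings, doa_rad, kappa=4.0):
    """Bayes update of the per-region belief from one DoA reading."""
    posterior = {
        region: p * doa_likelihood(doa_rad, bearings[region], kappa)
        for region, p in belief.items()
    }
    return normalize(posterior)


def update_with_vision(belief, viewed_region, detected, p_hit=0.9, p_false=0.05):
    """Bayes update from a vision check of one region.

    p_hit:   assumed probability of detecting the person if they are there.
    p_false: assumed probability of a false detection if they are elsewhere.
    """
    posterior = {}
    for region, p in belief.items():
        if region == viewed_region:
            likelihood = p_hit if detected else 1.0 - p_hit
        else:
            likelihood = p_false if detected else 1.0 - p_false
        posterior[region] = p * likelihood
    return normalize(posterior)


if __name__ == "__main__":
    # Hypothetical three-room map with a uniform prior.
    regions = ["kitchen", "bedroom", "office"]
    belief = {region: 1.0 / len(regions) for region in regions}
    # Hypothetical bearings (radians) from the robot to each region.
    bearings = {"kitchen": 0.0, "bedroom": math.pi / 2, "office": math.pi}

    # A single call is heard from roughly the bedroom's direction.
    belief = update_with_doa(belief, bearings, doa_rad=math.pi / 2)
    # The robot then looks into the kitchen and sees nobody.
    belief = update_with_vision(belief, "kitchen", detected=False)

    # Navigate toward the most probable region.
    print(max(belief, key=belief.get), belief)
```

In this sketch, a search with no sound at all still makes progress, because each empty room the robot inspects lowers that room's belief. A single DoA reading concentrates the belief earlier, which is in line with the abstract's finding that audio is not vital but reduces search time.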

Supplemental Material

Multi room.mp4 (mp4, 19.5 MB)
Audio based Navigation( Please play with 2x speed and check description ).mp4 (mp4, 54.2 MB)

Published in

PETRA '23: Proceedings of the 16th International Conference on PErvasive Technologies Related to Assistive Environments
July 2023
797 pages
ISBN: 9798400700699
DOI: 10.1145/3594806

Copyright © 2023 ACM

Publisher

Association for Computing Machinery
New York, NY, United States

Qualifiers

• Research
• Refereed limited