Sporadic Audio-Visual Embodied Assistive Robot Navigation For Human Tracking

Authors:
Gaurav Singh, Northeastern University, United States (ORCID: 0009-0002-2078-9523)
Paul Ghanem, Northeastern University, USA (ORCID: 0000-0002-7392-0065)
Taskin Padir, Northeastern University, USA (ORCID: 0000-0001-5123-5801)

PETRA '23: Proceedings of the 16th International Conference on PErvasive Technologies Related to Assistive Environments, July 2023, Pages 99–105
https://doi.org/10.1145/3594806.3594845

Published: 10 August 2023

ABSTRACT

Humans locate and track objects and other humans in their surroundings using audio, vision, or a combination of the two sensory modalities. A common strategy for humans searching for others in an indoor environment is to rely on a sound’s Direction of Arrival (DoA), as well as their knowledge of whether a room was previously occupied. In this paper, a similar search behavior is implemented on a mobile robot for the purpose of tracking other humans. To produce this search behavior, we develop an algorithm that performs probabilistic inference of human presence in a specific map region using two sensory cues: DoA of sound and a vision-based estimate of human proximity. A key characteristic of this approach is that the robot can navigate towards a human irrespective of whether the sound signal is continuous, sporadic, or absent altogether. We deploy the proposed search behavior on a robot and evaluate its efficacy at finding a target person across multiple rooms, considering varying levels of human sound (e.g., calling out occasionally, once, or not at all). Our experimental findings indicate that while audio signals are not vital in localizing the target person, they greatly reduce the search time.
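
The kind of inference described in the abstract can be illustrated with a small discrete Bayes filter over map regions. The sketch below is not the authors' algorithm: the region names, the Gaussian bearing likelihood, and every parameter value (`sigma`, `floor`, `p_detect`, `p_false`) are assumptions chosen only for illustration. It shows how a belief over rooms could be updated by a sound bearing when one is heard, and by a visual check of a room otherwise, so the search still progresses when the sound is sporadic or absent.

```python
import math


def wrap_angle(a):
    """Wrap an angle in radians to the interval [-pi, pi)."""
    return (a + math.pi) % (2 * math.pi) - math.pi


def normalize(belief):
    """Rescale region weights so they sum to 1."""
    total = sum(belief.values())
    return {region: p / total for region, p in belief.items()}


def update_with_doa(belief, centers, robot_xy, doa, sigma=0.5, floor=0.05):
    """Bayes update from one sound bearing (radians, map frame).

    Assumed likelihood: a Gaussian in the angular error between the measured
    DoA and the bearing to each region center, plus a constant floor so that
    reverberation or a spurious detection never rules a region out entirely.
    """
    posterior = {}
    for region, p in belief.items():
        cx, cy = centers[region]
        bearing = math.atan2(cy - robot_xy[1], cx - robot_xy[0])
        err = wrap_angle(doa - bearing)
        posterior[region] = p * (floor + math.exp(-0.5 * (err / sigma) ** 2))
    return normalize(posterior)


def update_with_vision(belief, inspected, detected, p_detect=0.9, p_false=0.05):
    """Bayes update after the camera has inspected one region.

    p_detect: assumed chance of seeing the person if they are in that region.
    p_false:  assumed chance of a false detection if they are elsewhere.
    """
    posterior = {}
    for region, p in belief.items():
        if region == inspected:
            likelihood = p_detect if detected else 1.0 - p_detect
        else:
            likelihood = p_false if detected else 1.0 - p_false
        posterior[region] = p * likelihood
    return normalize(posterior)


def next_goal(belief):
    """Pick the region with the highest current belief as the next target."""
    return max(belief, key=belief.get)


# Hypothetical three-room map with the robot at the origin.
centers = {"kitchen": (4.0, 0.0), "bedroom": (0.0, 4.0), "office": (-4.0, 0.0)}
belief = {region: 1.0 / len(centers) for region in centers}

# A single call is heard from the direction of the bedroom.
belief = update_with_doa(belief, centers, (0.0, 0.0), doa=math.pi / 2)
print(next_goal(belief), belief)

# The robot looks into the bedroom and sees nobody; no further sound arrives.
belief = update_with_vision(belief, "bedroom", detected=False)
print(next_goal(belief), belief)
```

With no sound at all, only `update_with_vision` runs, and the belief drains from each inspected room into the remaining ones, which is a plain exhaustive search. A single bearing concentrates the belief early, which is consistent with the reported finding that audio is not required to find the person but shortens the search.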

Supplemental Material

Multi room.mp4 (mp4, 19.5 MB)

Audio based Navigation( Please play with 2x speed and check description ).mp4 (mp4, 54.2 MB)

Published in

PETRA '23: Proceedings of the 16th International Conference on PErvasive Technologies Related to Assistive Environments
July 2023
797 pages
ISBN:9798400700699
DOI:10.1145/3594806

Copyright © 2023 ACM
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].
Publisher
Association for Computing Machinery
New York, NY, United States
Author Tags
Assistive robotics
Audio-visual embodied navigation
Human search and tracking
Probabilistic inference
Qualifiers
- research-article
- Research
- Refereed limited