AI-Driven Robotic Semantic SLAM: Real-Time Depth and Adaptive Mapping for Autonomous Navigation in Complex Environments

anonymous 3
Preprint 2025

The SAFARI framework integrates Depth Anything V2 for real-time dense depth estimation, CLIP for zero-shot semantic feature classification, and LLaMA for dynamic mapping adjustments. Depth Anything V2 generates consistent depth maps, while CLIP identifies contextual features (e.g., underwater scenes, clutter, reflective surfaces) to guide optimization. LLaMA refines SLAM parameters based on the detected features and scene descriptions, enabling adaptation to dynamic environments such as underwater or cluttered scenarios. The integration of these components within SAFARI yields robust 3D mapping, improved pose optimization, and enhanced scalability in complex conditions.
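The adaptive loop described above can be illustrated with a minimal sketch: CLIP-style scene tags feed a tuner that overrides default SLAM parameters. The tag names, parameter names, and adjustment values here are illustrative assumptions, not SAFARI's published settings.

```python
# Sketch of semantics-driven SLAM parameter tuning. The scene tags,
# parameter names, and override values are illustrative assumptions;
# in SAFARI the overrides would come from an LLM (LLaMA) reading a
# scene description, not from a fixed table.

DEFAULTS = {"tracking_iters": 10, "mapping_iters": 15, "keyframe_every": 8}

# Hypothetical per-scene overrides.
SCENE_RULES = {
    "underwater": {"tracking_iters": 20, "mapping_iters": 30},
    "reflective": {"keyframe_every": 4},
    "cluttered":  {"mapping_iters": 25},
}

def tune_params(scene_tags):
    """Merge the overrides for every detected scene tag onto the defaults."""
    params = dict(DEFAULTS)
    for tag in scene_tags:
        params.update(SCENE_RULES.get(tag, {}))
    return params
```

For example, `tune_params(["underwater", "reflective"])` raises the tracking and mapping iteration counts while also making keyframes more frequent; unknown tags leave the defaults untouched.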

Blueye navigating in the pool.

The state-of-the-art marine pool.

Abstract

This paper introduces SAFARI (Semantic-Aware Framework for Adaptive and Robust Robotic SLAM Implementation), an AI-driven framework tailored for autonomous navigation in complex and dynamic environments. Traditional SLAM systems often struggle with dynamic objects, reflective surfaces, and ambiguous structures because they rely solely on geometric features. SAFARI integrates Depth Anything V2 for real-time depth estimation, CLIP for zero-shot semantic scene understanding, and LLaMA for adaptive optimization. LLaMA enables real-time analysis and dynamic parameter tuning, such as adjusting tracking and mapping iterations based on scene complexity and feature density. A novel semantic-depth fusion module combines semantic insights with geometric data to refine depth maps, optimize keyframe selection, and enhance mapping accuracy. The framework is evaluated on the Replica and KU Marine Pool (KUMP) datasets, the latter designed to simulate underwater robotic navigation with real-world challenges such as low visibility, reflective surfaces, and dynamic currents. SAFARI outperforms traditional methods, achieving significant improvements in Absolute Trajectory Error (ATE), Relative Pose Error (RPE), and perceptual mapping metrics. While SAFARI demonstrates robust performance, challenges persist in handling highly reflective surfaces and achieving precise semantic segmentation. Future work will enhance the semantic-depth fusion algorithms, integrate advanced segmentation techniques, and expand the KUMP dataset for broader robotic applications. SAFARI represents a significant advance in AI-powered SLAM, offering an adaptable and reliable solution for autonomous systems in diverse and challenging environments.
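One way to picture the semantic-depth fusion step described in the abstract is as a per-pixel gating of the depth map by semantic confidence: depth readings on regions the semantic model labels as reflective are suppressed before mapping, since such regions yield unreliable geometry. The label name and confidence threshold below are assumptions for illustration, not SAFARI's published design.

```python
# Illustrative sketch of semantic-depth fusion: suppress depth readings
# on pixels whose "reflective" confidence exceeds a threshold. The
# label and threshold are assumptions; SAFARI's actual fusion module
# is more involved.

REFLECTIVE_CONF_THRESH = 0.6  # assumed cutoff, not from the paper

def fuse_depth(depth, reflective_conf):
    """Zero out depth values wherever reflective confidence is high.

    depth           -- flattened depth map (list of floats, metres)
    reflective_conf -- per-pixel semantic confidence in [0, 1]
    """
    return [
        0.0 if conf > REFLECTIVE_CONF_THRESH else d
        for d, conf in zip(depth, reflective_conf)
    ]
```

Downstream, the zeroed pixels would simply be excluded from pose optimization and map updates, which is a common way SLAM systems handle invalid depth.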

BibTeX


@article{Paper,
  author  = {anonymous and et al.},
  title   = {..},
  journal = {TRO},
  year    = {2025},
  doi     = {......},
  url     = {https://doi.org/.....}
}

@misc{Dataset,
  author    = {anonymous and et al.},
  title     = {....},
  year      = {2025},
  publisher = {},
  version   = {V1},
  doi       = {xxx},
  url       = {https://doi.org/....}
}