AI-Driven Robotic Semantic SLAM: Real-Time Depth and Adaptive Mapping for Autonomous Navigation in Complex Environments

anonymous 3
Preprint 2025

The SAFARI framework integrates Depth Anything V2 for real-time dense depth estimation, CLIP for zero-shot semantic feature classification, and LLaMA for dynamic mapping adjustments. Depth Anything V2 generates consistent depth maps, while CLIP identifies contextual features (e.g., underwater scenes, clutter, reflective surfaces) to guide optimization. LLaMA refines SLAM parameters based on the detected features and scene descriptions, enabling adaptation to dynamic environments such as underwater or cluttered scenarios. The integration of these components within SAFARI yields robust 3D mapping, improved pose optimization, and enhanced scalability in complex conditions.
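The adaptive loop described above can be illustrated with a minimal sketch: CLIP-style scene tags feed a tuner that overrides default SLAM parameters. The tag names, parameter names, and adjustment values here are illustrative assumptions, not SAFARI's published settings.

```python
# Sketch of semantics-driven SLAM parameter tuning. The scene tags,
# parameter names, and override values are illustrative assumptions;
# in SAFARI the overrides would come from an LLM (LLaMA) reading a
# scene description, not from a fixed table.

DEFAULTS = {"tracking_iters": 10, "mapping_iters": 15, "keyframe_every": 8}

# Hypothetical per-scene overrides.
SCENE_RULES = {
    "underwater": {"tracking_iters": 20, "mapping_iters": 30},
    "reflective": {"keyframe_every": 4},
    "cluttered":  {"mapping_iters": 25},
}

def tune_params(scene_tags):
    """Merge the overrides for every detected scene tag onto the defaults."""
    params = dict(DEFAULTS)
    for tag in scene_tags:
        params.update(SCENE_RULES.get(tag, {}))
    return params
```

For example, `tune_params(["underwater", "reflective"])` raises the tracking and mapping iteration counts while also making keyframes more frequent; unknown tags leave the defaults untouched.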

Blueye navigating in the pool.

The state-of-the-art marine pool.

Abstract

This paper introduces SAFARI (Semantic-Aware Framework for Adaptive and Robust Robotic SLAM Implementation), an AI-driven framework tailored for autonomous navigation in complex and dynamic environments. Traditional SLAM systems often struggle with dynamic objects, reflective surfaces, and ambiguous structures because they rely solely on geometric features. SAFARI integrates Depth Anything V2 for real-time depth estimation, CLIP for zero-shot semantic scene understanding, and LLaMA for adaptive optimization. LLaMA enables real-time analysis and dynamic parameter tuning, such as adjusting tracking and mapping iterations based on scene complexity and feature density. A novel semantic-depth fusion module combines semantic insights with geometric data to refine depth maps, optimize keyframe selection, and enhance mapping accuracy. The framework is evaluated on the Replica and KU Marine Pool (KUMP) datasets, the latter designed to simulate underwater robotic navigation with real-world challenges such as low visibility, reflective surfaces, and dynamic currents. SAFARI outperforms traditional methods, achieving significant improvements in Absolute Trajectory Error (ATE), Relative Pose Error (RPE), and perceptual mapping metrics. While SAFARI demonstrates robust performance, challenges persist in handling highly reflective surfaces and achieving precise semantic segmentation. Future work will enhance the semantic-depth fusion algorithms, integrate advanced segmentation techniques, and expand the KUMP dataset for broader robotic applications. SAFARI represents a significant advance in AI-powered SLAM, offering an adaptable and reliable solution for autonomous systems in diverse and challenging environments.
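One way to picture the semantic-depth fusion step described in the abstract is as a per-pixel gating of the depth map by semantic confidence: depth readings on regions the semantic model labels as reflective are suppressed before mapping, since such regions yield unreliable geometry. The label name and confidence threshold below are assumptions for illustration, not SAFARI's published design.

```python
# Illustrative sketch of semantic-depth fusion: suppress depth readings
# on pixels whose "reflective" confidence exceeds a threshold. The
# label and threshold are assumptions; SAFARI's actual fusion module
# is more involved.

REFLECTIVE_CONF_THRESH = 0.6  # assumed cutoff, not from the paper

def fuse_depth(depth, reflective_conf):
    """Zero out depth values wherever reflective confidence is high.

    depth           -- flattened depth map (list of floats, metres)
    reflective_conf -- per-pixel semantic confidence in [0, 1]
    """
    return [
        0.0 if conf > REFLECTIVE_CONF_THRESH else d
        for d, conf in zip(depth, reflective_conf)
    ]
```

Downstream, the zeroed pixels would simply be excluded from pose optimization and map updates, which is a common way SLAM systems handle invalid depth.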

BibTeX


@article{Paper,
  author  = {anonymous and et al.},
  title   = {..},
  journal = {TRO},
  year    = {2025},
  doi     = {......},
  url     = {https://doi.org/.....}
}

@misc{Dataset,
  author    = {anonymous and et al.},
  title     = {....},
  year      = {2025},
  publisher = {},
  version   = {V1},
  doi       = {xxx},
  url       = {https://doi.org/....}
}