Hierarchical Intermediate Feature Description and Alignment across Point Clouds and Images without Human Annotations
Precisely interpreting the complementarity between point clouds and images is crucial for effective environmental perception by automated systems. Traditional methods rely on dense human annotations for training, which are costly to obtain, and their simplistic alignment strategies often fail to generalize to unseen scenarios. This project aims to align point clouds and images without human labels, a challenge that remains largely unexplored despite advances in fully supervised methods. The approach derives semantic and geometric cues from raw data and integrates intra- and cross-modal consensus as supervision signals for neural networks. Key innovations include a task-agnostic pipeline that constructs unified features for group-level alignment between modalities, a self-supervised learning framework for learning transferable features without human annotations, and a new architecture that identifies point-pixel correspondences from consensus-driven features for adaptive alignment. This research aims to advance multi-modal and self-supervised learning, enhancing machine perception across modalities and potentially transforming applications in autonomous driving, robotic manipulation, and augmented reality.
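As an illustration of the cross-modal consensus idea, the sketch below shows a group-level contrastive alignment objective in PyTorch. The row-wise pairing of features, the 128-dimensional embedding size, and the temperature value are assumptions made for illustration; this is a minimal sketch, not the project's actual training objective.

import torch
import torch.nn.functional as F

def group_alignment_loss(point_feats, image_feats, temperature=0.07):
    """InfoNCE-style loss pulling matched point/pixel group features together.

    point_feats, image_feats: (N, D) tensors; row i of each is assumed to
    describe the same scene region (the cross-modal pairing).
    """
    p = F.normalize(point_feats, dim=-1)
    i = F.normalize(image_feats, dim=-1)
    logits = p @ i.t() / temperature            # (N, N) cross-modal similarity
    targets = torch.arange(p.size(0), device=p.device)
    # Symmetric contrastive objective over both matching directions.
    return 0.5 * (F.cross_entropy(logits, targets) +
                  F.cross_entropy(logits.t(), targets))

# Example with random features standing in for hypothetical encoder outputs.
pts = torch.randn(32, 128, requires_grad=True)   # pooled point-group features
imgs = torch.randn(32, 128, requires_grad=True)  # pooled pixel-region features
loss = group_alignment_loss(pts, imgs)
loss.backward()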
Multi-modal Fusion for Autonomous and Intelligent Underwater Localization and Mapping
Marine surveying is vital for managing the complex coastline and port activities of rapidly evolving coastal cities such as Hong Kong. This research project focuses on enhancing marine surveying through innovative Unmanned Underwater Vehicles (UUVs) designed for autonomous operation in Hong Kong's challenging marine environment. These UUVs leverage advanced spatial perception technologies for accurate multi-modal localization and mapping, providing crucial data for maritime navigation and environmental monitoring. The introduction of UUVs represents a significant advancement in marine surveying, offering more precise and efficient surveying capabilities that improve sustainability, reduce operational risks, and enhance navigational safety. This initiative not only positions Hong Kong as a leader in underwater robotics but also has the potential to set a global standard for marine surveying. If successful, the broader impacts of this research could transform underwater robotics and contribute significantly to control theory and computer vision, leading to safer and more cost-effective maritime operations worldwide.
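To make the multi-modal localization idea concrete, the following is a minimal Kalman-filter fusion sketch in Python/NumPy with a planar constant-velocity model, DVL velocity measurements, and an acoustic (USBL) position fix. All matrices, sensor models, and noise levels are illustrative assumptions, not the project's actual estimator.

import numpy as np

def kf_step(x, P, z, H, R, F, Q):
    """One Kalman predict+update step for state x with covariance P."""
    x = F @ x                       # predict state
    P = F @ P @ F.T + Q             # predict covariance
    y = z - H @ x                   # innovation
    S = H @ P @ H.T + R
    K = P @ H.T @ np.linalg.inv(S)  # Kalman gain
    x = x + K @ y
    P = (np.eye(len(x)) - K @ H) @ P
    return x, P

dt = 0.1
F = np.block([[np.eye(2), dt * np.eye(2)],
              [np.zeros((2, 2)), np.eye(2)]])   # [position; velocity] dynamics
Q = 1e-3 * np.eye(4)                            # assumed process noise
x, P = np.zeros(4), np.eye(4)

H_dvl = np.hstack([np.zeros((2, 2)), np.eye(2)])   # DVL observes velocity
H_usbl = np.hstack([np.eye(2), np.zeros((2, 2))])  # USBL observes position

x, P = kf_step(x, P, z=np.array([0.5, 0.0]), H=H_dvl, R=0.01 * np.eye(2), F=F, Q=Q)
x, P = kf_step(x, P, z=np.array([0.04, 0.0]), H=H_usbl, R=0.5 * np.eye(2), F=F, Q=Q)
print(x)  # fused position/velocity estimate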
Local Feature Extraction and Matching for Cross-modal Registration
Accurate cross-modal registration between real-scene images and 3D point clouds is essential for improving spatial location services and smart city applications. The main challenges lie in local feature extraction and matching, where existing methods often fail to capture shared features or establish consistent keypoints across modalities. This project introduces an integrated framework designed to overcome these issues by improving shared feature description, keypoint detection, and feature matching between images and point clouds. The framework begins with adaptive feature description based on pseudo-siamese neural networks, yielding discriminative and compact descriptors tailored to the characteristics of both modalities. A self-supervised keypoint detection mechanism employing non-maximum suppression follows, ensuring unique and repeatable detections. Finally, an optimal transport feature matching model, powered by attentional graph neural networks, provides globally optimal and efficient matching. This approach aims to advance cross-modal learning, metric learning, and deep learning, providing theoretical and practical foundations for cross-modal registration, with significant implications for spatial location services and smart city infrastructure.
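As a rough sketch of how the optimal-transport matching stage could operate, the code below runs log-domain Sinkhorn normalization on a similarity matrix between image and point-cloud descriptors and extracts mutual-nearest-neighbour matches. The dustbin handling for unmatched features and the attentional graph neural network are omitted, and all dimensions, iteration counts, and the temperature are assumed for illustration.

import torch

def sinkhorn_match(desc_img, desc_pts, iters=20, temperature=0.1):
    """Return a soft assignment matrix between image and point descriptors."""
    log_P = desc_img @ desc_pts.t() / temperature      # (M, N) similarity scores
    for _ in range(iters):
        # Alternately normalize rows and columns in log space.
        log_P = log_P - torch.logsumexp(log_P, dim=1, keepdim=True)
        log_P = log_P - torch.logsumexp(log_P, dim=0, keepdim=True)
    return log_P.exp()

# Example: mutual-nearest-neighbour matches from the soft assignment.
P = sinkhorn_match(torch.randn(100, 64), torch.randn(120, 64))
row_best = P.argmax(dim=1)
col_best = P.argmax(dim=0)
mutual = [(i, int(row_best[i])) for i in range(P.size(0))
          if int(col_best[row_best[i]]) == i]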
Unsupervised Metric-Semantic Understanding from Large-Scale 3D Point Clouds
Recent advancements in metric-semantic understanding have led to semantically annotated 3D models that enable more effective environmental interpretation by algorithms. Traditional methods rely heavily on human-generated annotations and treat tasks such as semantic segmentation and object detection as fully supervised problems, an approach that is resource-intensive and struggles to generalize across diverse real-world scenarios. This project proposes a paradigm shift towards unsupervised and zero-shot learning methods that interpret 3D scenes without human labels, overcoming the lack of explicit supervision. It exploits semantic and shape priors from raw point clouds, integrating inductive biases as supervisory signals for neural network training. Innovations include an unsupervised pipeline that uses shape similarities and motion cues for semantic categorization and object boundary detection, a zero-shot learning framework for segmenting unseen object classes in point clouds, and a new architecture for reconstructing continuous 3D surfaces from discrete scans using local shape continuity as pseudo labels. These developments aim to enhance unsupervised and zero-shot learning techniques, refine fully supervised methods, and potentially transform applications in autonomous driving, navigation, and mixed reality.
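To illustrate how pseudo labels might be derived from raw point clouds, the sketch below removes ground points with a simple height threshold and clusters the remainder with DBSCAN, using cluster ids as pseudo instance labels for training a segmentation network. The thresholds and the choice of DBSCAN are assumptions standing in for the shape-similarity and motion cues described above.

import numpy as np
from sklearn.cluster import DBSCAN

def pseudo_instance_labels(points, ground_z=0.2, eps=0.5, min_samples=10):
    """points: (N, 3) array; returns per-point labels, -1 for ground/noise."""
    labels = np.full(len(points), -1, dtype=int)
    non_ground = points[:, 2] > ground_z              # crude ground removal
    if non_ground.any():
        clustering = DBSCAN(eps=eps, min_samples=min_samples)
        labels[non_ground] = clustering.fit_predict(points[non_ground])
    return labels

# Example on synthetic data: two object clusters above a flat "ground" plane.
pts = np.vstack([
    np.random.randn(200, 3) * 0.2 + [0, 0, 1.0],
    np.random.randn(200, 3) * 0.2 + [5, 0, 1.0],
    np.c_[np.random.rand(500, 2) * 10, np.zeros(500)],  # ground points
])
print(np.unique(pseudo_instance_labels(pts)))  # expected: [-1, 0, 1]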