Research

Publications
Improving Generalization of Language-Conditioned Robot Manipulation
Chenglin Cui, Chaoran Zhu, Changjae Oh, Andrea Cavallaro
in Proc. IEEE/RSJ Int. Conf. Intell. Robots Syst. (IROS), Oct. 2025.

Abstract: The control of a robot for manipulation tasks generally relies on visual inputs. Recent advances in vision-language models (VLMs) enable using natural language instructions to condition this input and operate robots in a wider range of environments. However, existing methods require a large amount of data to fine-tune the VLM to operate in unseen environments. In this paper, we present a framework that learns object-arrangement tasks from just a few demonstrations. The framework divides object-arrangement tasks into two stages: a target localization stage, for picking an object, and a region determination stage, for placing the object. We present an instance-level semantic fusion module that matches instance-level image crops with the input text embedding, enabling the model to identify the target objects defined by the natural language instructions. We validate our method in both simulation and real-robot environments. Our method, fine-tuned with few-shot demonstrations, improves generalization capability and shows zero-shot ability in real-robot manipulation scenarios.

BEVMOT: A multi-object detection and tracking method in Bird's-Eye-View via Spatiotemporal Transformers
Chenglin Cui, Song Ruiqi, Li Xinqing and Ai Yunfeng
IEEE Intelligent Transportation Systems Conference (ITSC)

Abstract: Camera-based Bird’s-Eye-View (BEV) 3D object detection is challenging yet essential in autonomous driving perception systems, as it effectively alleviates the impact of object overlap and occlusion. 3D multi-object tracking is one of the most important perception tasks in autonomous driving systems, but it suffers from the narrow view provided by a single camera. In this paper, we present a novel framework called BEVMOT, which combines the multi-object detection and tracking tasks in a unified framework at a considerable inference speed. The encoder of our framework generates a 360-degree panoramic image around the ego car in bird's-eye view, using the multiple cameras mounted on the car to broaden the perception field. An efficient decoder composed of multi-head attention and deformable attention is followed by a multi-object detection and tracking head, which learns object center points and tracking embeddings to obtain object boxes and tracking IDs directly, without NMS post-processing. Meanwhile, the tracking branch uses the tracking embeddings to initialize new trajectories, update existing trajectories, and perform data association frame by frame. Extensive experiments show that our approach achieves strong performance on the nuScenes dataset, exceeding many classic methods in terms of the AMOTA and AMOTP metrics.

A Real-time Road Boundary Detection Approach in Surface Mine Based on Meta Random Forest
with Ai Yunfeng, Song Ruiqi, Tian Bin and Chen Long
IEEE Transactions on Intelligent Vehicles, doi: 10.1109/TIV.2023.3296767

Decoupled Real-Time Trajectory Planning for Multiple Autonomous Mining Trucks in Unloading Areas
with Yang Qingyuan, Ai Yunfeng, Teng Siyu, Gao Yu, Tian Bin and Chen Long
IEEE Transactions on Intelligent Vehicles, doi: 10.1109/TIV.2023.3312813

VSSeg: Point Cloud Segmentation for Unstructured Region in Surface Mine
with Huang Chongqing, Song Ruiqi, Li Xinqing and Ai Yunfeng
IEEE Intelligent Transportation Systems Conference (ITSC)

An Improved Cloth Simulation Filtering Algorithm Based on Mining Point Cloud
with Ren Liangcai, Tang Jianlin, Song Ruiqi, and Ai Yunfeng
International Conference on Cyber-Physical Social Intelligence (ICCSI), doi: 10.1109/ICCSI53130.2021.9736201

Patent
An anti-collision method, device and system of profile-ring for mining unmanned vehicle
with Ai Yunfeng and Yang Qingyuan, [P]. Beijing: CN115973196B, 2023-06-16
Competition
National Second Prize (Basic Four-wheel Group), 15th National College Student Intelligent Car Competition, 2020
Chinese University Class A Competition