Overview
At NVIDIA GTC 2024, Li Auto's intelligent driving R&D lead Jia Peng described the company's approach and recent development progress in autonomous driving.
Summary of the presentation
- Trend: Autonomous driving has become a primary direction in China’s automotive industry, and the use of large language models is increasingly common.
- Technical evolution: The field is shifting from rule-based systems to knowledge-driven approaches, with end-to-end models replacing hand-written perception and decision logic with learned models.
- End-to-end model: A single pipeline covers perception, tracking, prediction, decision, and planning to form a complete autonomous driving system.
- Development framework: A dual fast-and-slow system provides both rapid reflexive responses and deliberative reasoning, improving performance in complex traffic scenarios.
- Long-term view: Cognitive models using multimodal large language models aim to interpret and handle unknown scenes, world models provide realistic simulation environments, and data closed loops enable continuous learning and adaptation.

01 From rule-based to knowledge-driven
Li Auto began in-house autonomous driving development in 2021, starting from L2 functionality and gradually extending to scenarios such as highway NOA. Historically, autonomous driving relied heavily on code and hand-crafted rules. As scenarios grow more complex and demanding, hand-written rules become harder to scale, and the industry is moving toward data-driven methods. Successes from companies such as Tesla illustrate the effectiveness of data-driven approaches.
Li Auto has turned the perception, tracking, prediction, decision, and planning modules into learned models and integrated them into an end-to-end framework. These models can be trained and tested in simulation environments. The end-to-end approach is intended to support autonomous driving across diverse scenarios, with potential improvements in safety and efficiency.
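To make the stage ordering concrete, here is a minimal Python sketch of a perception, tracking, prediction, decision, and planning chain. It is not Li Auto's implementation: every function name, dictionary field, and threshold (the 1-second horizon, the 10 m braking distance) is a hypothetical placeholder, and each stage stands in for what would be a learned model in a real end-to-end system.

```python
from typing import Any, Callable, Dict, List

State = Dict[str, Any]
Stage = Callable[[State], State]

def perception(frame: State) -> State:
    # Placeholder: treat every raw detection as a valid object.
    return {**frame, "objects": list(frame["detections"])}

def tracking(state: State) -> State:
    # Placeholder: assign each object a track id by its index.
    tracks = {i: obj for i, obj in enumerate(state["objects"])}
    return {**state, "tracks": tracks}

def prediction(state: State) -> State:
    # Assumed constant-velocity model over a 1-second horizon.
    horizon_s = 1.0
    predicted = {tid: obj["x"] + obj["v"] * horizon_s
                 for tid, obj in state["tracks"].items()}
    return {**state, "predicted_x": predicted}

def decision(state: State) -> State:
    # Assumed rule: brake if any object is predicted within 10 m ahead.
    too_close = any(0.0 <= x < 10.0 for x in state["predicted_x"].values())
    return {**state, "action": "brake" if too_close else "cruise"}

def planning(state: State) -> State:
    # Turn the discrete action into a target speed for the controller.
    target = 0.0 if state["action"] == "brake" else state["ego_speed"]
    return {**state, "target_speed": target}

# Stage order follows the one listed in the text.
PIPELINE: List[Stage] = [perception, tracking, prediction, decision, planning]

def run_pipeline(frame: State) -> State:
    state = frame
    for stage in PIPELINE:
        state = stage(state)
    return state
```

Calling `run_pipeline({"detections": [{"x": 15.0, "v": -8.0}], "ego_speed": 20.0})` passes one frame through all five stages. In an end-to-end system the stages would be trained jointly rather than written by hand as they are here.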
02 Li Auto development framework
Li Auto has implemented a fast-and-slow system architecture, analogous to human quick reactions and slow deliberation. The fast system handles intuitive, immediate responses, while the slow system performs logical reasoning and decision-making. This framework combines perception, cognition, and decision layers to better handle complex traffic conditions.
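The division of labor between the two systems can be sketched in a few lines of Python. The presentation does not specify how the fast and slow outputs are combined, so the arbitration rule below (a reflexive brake always wins, and the slow system is consulted only for scenes flagged as unfamiliar) is an assumption made for illustration, as are all names and distance thresholds.

```python
from dataclasses import dataclass

@dataclass
class Scene:
    obstacle_distance_m: float
    is_unfamiliar: bool  # hypothetical flag set by an upstream detector

def fast_system(scene: Scene) -> str:
    # Reflexive response: a cheap check that runs on every frame.
    return "brake" if scene.obstacle_distance_m < 10.0 else "cruise"

def slow_system(scene: Scene) -> str:
    # Stand-in for deliberate reasoning (for example a large model);
    # here it is only a more conservative distance rule.
    return "slow_down" if scene.obstacle_distance_m < 50.0 else "cruise"

def drive(scene: Scene) -> str:
    action = fast_system(scene)
    if action == "brake":
        # Assumed rule: a reflexive brake is never overridden.
        return action
    if scene.is_unfamiliar:
        # Assumed rule: defer to the slow system in unfamiliar scenes.
        return slow_system(scene)
    return action
```

Under these assumptions, a familiar scene is handled by the fast path alone, while an unfamiliar scene with an obstacle 30 m ahead is resolved by the slow system.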
Progress has been made across end-to-end models, cognitive models, world models, and data closed loops. End-to-end models aim to support full-scenario autonomous driving, including urban roads and signalized intersections. Cognitive models leverage multimodal large language models to interpret unfamiliar scenes and support decision-making. World models reconstruct and generate realistic environments for simulation. Closed-loop data systems enable continuous learning and adaptation to new scenarios and challenges.
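The data closed loop can be illustrated with a toy Python sketch: deploy, collect the cases the current model cannot handle, fold them back into training, and repeat. The "model" here is only a set of known scenario labels, and `deploy`, `retrain`, and `closed_loop` are hypothetical names; a real system would retrain a neural network on the collected data.

```python
from typing import List, Set, Tuple

def deploy(known: Set[str], encountered: List[str]) -> List[str]:
    # Return the scenarios the current model could not handle.
    return [s for s in encountered if s not in known]

def retrain(known: Set[str], hard_cases: List[str]) -> Set[str]:
    # Toy "training": the model now covers the collected hard cases.
    return known | set(hard_cases)

def closed_loop(known: Set[str],
                fleet_logs: List[List[str]]) -> Tuple[Set[str], List[int]]:
    # Each entry in fleet_logs is one deployment round.
    hard_case_counts = []
    for encountered in fleet_logs:
        hard = deploy(known, encountered)
        hard_case_counts.append(len(hard))
        known = retrain(known, hard)
    return known, hard_case_counts
```

In this sketch the number of hard cases per round falls as coverage grows, which is the behavior a closed loop is meant to produce.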
Conclusion
As autonomous driving technologies evolve, Li Auto continues iterative development toward higher levels of autonomy, integrating end-to-end modeling, cognition, simulation-based world models, and data closed loops to handle increasingly complex driving scenarios.