Optimizing Mobileye's REM: Graviton & Triton for ML Inference

Mobileye’s Road Experience Management (REM) system is fundamental to its autonomous driving ecosystem, generating and maintaining highly accurate, crowdsourced high-definition maps. This process, crucial for vehicle localization, navigation, and identifying road changes, demands computationally intensive operations. This article details Mobileye’s optimization journey for REM’s Change Detection subsystem, specifically focusing on the CDNet deep learning model, aiming to minimize costs and maximize throughput.

Mobileye first had to decide where to run CDNet inference. GPUs delivered far higher raw throughput (54.8 samples/sec versus 5.85 on CPU), but the team opted for CPU-based inference on Amazon EC2 Spot Instances to optimize overall cost efficiency. GPU instances were more expensive and less available on Spot, and because many components of the broader change detection pipeline were better suited to CPUs, a GPU would have sat idle for much of each task. The initial CPU setup had its own bottleneck: each process loaded its own 8.5 GB CDNet model, which limited the number of tasks per instance, and model initialization consumed 50% of task time.
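To make the memory bottleneck concrete, here is a minimal sketch of how per-process model memory caps the number of concurrent tasks. The 8.5 GB figure comes from the text above; the instance size (32 vCPUs, 128 GB of RAM) and the function name are assumptions chosen for illustration, not values from Mobileye's deployment.

```python
import math


def max_concurrent_tasks(instance_mem_gb: float, vcpus: int, per_task_mem_gb: float) -> int:
    """Return how many tasks fit at once on one instance.

    Concurrency is capped by whichever runs out first: memory or vCPUs
    (assuming one task per vCPU).
    """
    if per_task_mem_gb <= 0:
        raise ValueError("per_task_mem_gb must be positive")
    by_memory = math.floor(instance_mem_gb / per_task_mem_gb)
    return min(vcpus, by_memory)


# Hypothetical instance: 32 vCPUs and 128 GB of RAM (assumed, not from the article).
# Each process holds its own 8.5 GB copy of the model.
print(max_concurrent_tasks(instance_mem_gb=128, vcpus=32, per_task_mem_gb=8.5))  # 15
```

Under these assumed numbers, memory runs out at 15 tasks, so more than half of the 32 vCPUs would have no task to run.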

A significant breakthrough came with the integration of Triton Inference Server. With model inference centralized in a single server shared by all tasks on the instance, the memory footprint per task dropped to 2.5 GB, and the average task runtime halved from four minutes to two. This optimization alone doubled efficiency and allowed full CPU utilization, with 32 tasks running on a 32-vCPU instance. Mobileye then reduced the Triton Docker image from 15 GB to 2.7 GB by trimming unnecessary backends, which improved memory efficiency and shortened startup times.
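The centralized pattern can be sketched as follows: instead of loading the model, each task sends its inputs to a Triton server on the same instance over Triton's HTTP/REST inference endpoint (`POST /v2/models/<model>/infer`). This is an illustrative sketch using only the Python standard library, not Mobileye's client code. The model name `cdnet`, the input name `INPUT__0`, the tensor shape, and the URL are all hypothetical.

```python
import json
import urllib.request


def build_infer_request(input_name, shape, values, datatype="FP32"):
    """Build a JSON body for Triton's HTTP inference endpoint.

    `values` is the flattened tensor; its length must match `shape`.
    """
    expected = 1
    for dim in shape:
        expected *= dim
    if len(values) != expected:
        raise ValueError(f"expected {expected} values for shape {shape}, got {len(values)}")
    return {
        "inputs": [
            {
                "name": input_name,
                "shape": list(shape),
                "datatype": datatype,
                "data": list(values),
            }
        ]
    }


def infer(base_url, model_name, payload, timeout=30.0):
    """POST the payload to a running Triton server and return the parsed JSON response."""
    request = urllib.request.Request(
        f"{base_url}/v2/models/{model_name}/infer",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.load(response)


# Hypothetical names and shape; a real model defines these in its configuration.
payload = build_infer_request("INPUT__0", [1, 3], [0.1, 0.2, 0.3])
# result = infer("http://localhost:8000", "cdnet", payload)  # requires a running server
```

Because every task talks to the same server process, the model weights are held in memory once per instance rather than once per task, which is what lets the per-task footprint shrink.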

The final optimization was the adoption of AWS Graviton instances, whose Arm architecture offers Neon vector processing and bfloat16 support for ML workloads. The move significantly increased instance diversification and Spot availability, both crucial for handling millions of daily change detection tasks. Graviton instances also delivered higher throughput (19.4 samples/sec on r8g.8xlarge compared to 13.5 on comparable non-Graviton CPUs) and better price-performance. The migration was seamless because Triton and modern AI/ML frameworks support Graviton. Ultimately, these combined efforts yielded “more than a 2x improvement in throughput,” delivering substantial cost savings and an improved user experience for Mobileye’s customers.
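The throughput comparison above reduces to simple arithmetic. The sketch below computes the relative speedup from the two quoted figures and shows how a price-performance metric could be derived. The function names are illustrative, and the article quotes no hourly prices, so the price argument must come from your own pricing data.

```python
def speedup(new_samples_per_sec: float, baseline_samples_per_sec: float) -> float:
    """Relative throughput of the new setup over the baseline."""
    return new_samples_per_sec / baseline_samples_per_sec


def samples_per_dollar(samples_per_sec: float, hourly_price_usd: float) -> float:
    """Samples processed per dollar of instance time (price supplied by the caller)."""
    return samples_per_sec * 3600 / hourly_price_usd


# Throughput figures quoted in the text: Graviton (r8g.8xlarge) vs. non-Graviton CPU.
print(round(speedup(19.4, 13.5), 2))  # 1.44
```

On the quoted figures, Graviton processes roughly 44% more samples per second. Price-performance additionally depends on the hourly Spot price of each instance type at run time.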

(Source: https://aws.amazon.com/blogs/machine-learning/optimizing-mobileyes-rem-with-aws-graviton-a-focus-on-ml-inference-and-triton-integration/)