Tesla FSD Chip Microarchitecture: A Deep Dive

ampheo

Tesla’s Full Self-Driving (FSD) Chip is a custom-designed ASIC optimized for vision-based autonomous driving. Here’s a breakdown of its microarchitecture, from silicon to software:


1. Key Specifications (HW3/HW4)

Metric            | FSD Chip (HW3)                           | FSD Chip (HW4)
Process Node      | 14nm (Samsung)                           | 7nm (Samsung)
Die Size          | 260 mm²                                  | ~200 mm² (est.)
Transistors       | 6B                                       | 10B+ (est.)
Peak TOPS (INT8)  | ~72 per chip (144 per dual-chip computer) | 256 (est.)
Power Consumption | 36W per chip                             | 45W (est.)
Cameras Supported | 8x 1.2MP                                 | 12x 5MP

2. Block Diagram & Core Components

┌─────────────────────────────────────────────────────┐
│                   Tesla FSD Chip                    │
├───────────────────┬─────────────────┬───────────────┤
│     Dual NPUs     │ GPU             │ CPU Cores     │
│ (Neural Processor)│ (ARM Mali)      │ (ARM A72)     │
├───────────────────┼─────────────────┼───────────────┤
│ - 96x96 MAC array │ - 0.6TFLOPS FP32│ - 12x A72     │
│ - 2GHz clock      │ - Texture units │ - 3 clusters  │
│ - 32MB SRAM each  │                 │   of 4 cores  │
└───────────────────┴─────────────────┴───────────────┘

3. Neural Processing Unit (NPU) – The Secret Sauce

  • Array Structure:

    • 96x96 MAC (Multiply-Accumulate) units per NPU (x2 in HW3).

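    • Optimized for 8-bit integer (INT8) multiplies with 32-bit accumulation, which cover an estimated 95% of Tesla’s NN workloads.

  • On-Chip Memory:

    • 32MB of SRAM per NPU (vs. an estimated 4–8MB in competing chips like NVIDIA Xavier).

    • Keeps weights and activations on-chip, cutting memory access latency by an estimated 5x compared with going out to DRAM (critical for real-time inference).

  • Custom ISA:

    • A small instruction set built around DMA transfers, convolution, deconvolution, inner-product, and element-wise operations, which is enough to run Tesla’s HydraNet multi-task models (simultaneous detection/lane prediction).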
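The peak-throughput figure follows directly from the array dimensions. The lines below are a minimal sketch, assuming each MAC counts as two operations (one multiply, one add) and using the 2GHz clock from the block diagram; the function name `peak_tops` is illustrative and not part of any Tesla toolchain:

```python
def peak_tops(rows: int, cols: int, clock_hz: float, npus: int = 1,
              ops_per_mac: int = 2) -> float:
    """Peak tera-operations per second for `npus` identical MAC arrays."""
    macs_per_cycle = rows * cols * npus
    return macs_per_cycle * ops_per_mac * clock_hz / 1e12

per_npu = peak_tops(96, 96, 2.0e9)            # about 36.9 TOPS
per_chip = peak_tops(96, 96, 2.0e9, npus=2)   # about 73.7 TOPS
print(f"per NPU: {per_npu:.1f} TOPS, per chip: {per_chip:.1f} TOPS")
```

Two such chips on one board land near the 144 TOPS usually quoted for the HW3 computer.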

4. CPU & GPU Components

  • CPU:

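    • 12x ARM Cortex-A72 (64-bit) at 2.2GHz, arranged as three quad-core clusters. Redundancy for functional safety comes mainly from the FSD computer carrying two of these chips and cross-checking their outputs, rather than from lockstepping the A72 clusters.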

    • Runs lightweight tasks (sensor polling, CAN bus communication).

  • GPU:

    • ARM Mali-based GPU rated at roughly 600 GFLOPS FP32 (used for post-processing rather than graphics).

    • Handles non-ML tasks like image warping (for multi-camera stitching).


5. Memory Hierarchy

┌──────────────┐       ┌──────────────┐
│  32MB SRAM   │◄─────►│   Dual NPUs  │ (On-chip)
└──────────────┘       └──────────────┘
       ▲
       │  ~68GB/s bandwidth
┌──────────────┐
│  8GB LPDDR4  │ (Off-chip)
└──────────────┘

  • SRAM-First Design: Minimizes external memory access (power-efficient).

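  • No HBM/GDDR on HW3: Unlike NVIDIA/AMD GPUs, the HW3 design leans on on-chip SRAM for low latency instead of high-bandwidth external memory. HW4 boards are reported to move to GDDR6.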
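To see why an SRAM-first design matters, it helps to compare DRAM bandwidth with NPU throughput. The sketch below assumes a 128-bit LPDDR4 interface at 4266 MT/s (the configuration commonly reported for HW3, not a figure stated in this article) and a per-chip peak of about 73.7 TOPS (two 96x96 arrays at 2GHz). Both inputs are assumptions for illustration:

```python
def dram_bandwidth_gbs(bus_bits: int, transfers_per_s: float) -> float:
    """Peak DRAM bandwidth in GB/s: bytes per transfer times transfers per second."""
    return bus_bits / 8 * transfers_per_s / 1e9

def break_even_intensity(peak_ops_per_s: float, bytes_per_s: float) -> float:
    """Operations per DRAM byte needed before compute, not memory, is the limit."""
    return peak_ops_per_s / bytes_per_s

bw_gbs = dram_bandwidth_gbs(128, 4266e6)                    # about 68.3 GB/s
intensity = break_even_intensity(73.728e12, bw_gbs * 1e9)   # about 1080 ops/byte
sram_fill_ms = 32 * 2**20 / (bw_gbs * 1e9) * 1e3            # about 0.49 ms for 32MB

print(f"{bw_gbs:.1f} GB/s, {intensity:.0f} ops/byte, {sram_fill_ms:.2f} ms")
```

Under these assumptions, any layer that performs fewer than roughly a thousand operations per byte fetched from DRAM would leave the MAC arrays waiting, which is the case for keeping weights and activations in SRAM.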


6. Software Stack Integration

  • Compiler: Custom toolchain converts PyTorch models to NPU-optimized bytecode.

  • Real-Time OS: Lightweight Tesla OS (modified Linux), reportedly with <10μs interrupt latency.

  • HydraNet: A shared backbone feeds many task-specific heads; Tesla has described the full Autopilot stack as 48 networks (e.g., traffic light, obstacle, depth estimation).
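HydraNet is generally described as one shared feature extractor feeding several task-specific heads, so the expensive part of the computation runs once per frame. The toy sketch below illustrates that structure only; the `backbone` function, the head names, and the arithmetic inside them are hypothetical stand-ins, not Tesla's networks:

```python
from typing import Callable, Dict, List

backbone_calls = 0  # counts how often the shared features are computed

def backbone(pixels: List[float]) -> List[float]:
    """Toy stand-in for the shared feature extractor."""
    global backbone_calls
    backbone_calls += 1
    return [sum(pixels) / len(pixels), max(pixels), min(pixels)]

# Hypothetical task heads; each one reads the same shared features.
HEADS: Dict[str, Callable[[List[float]], float]] = {
    "traffic_light": lambda f: f[1] - f[0],
    "obstacle": lambda f: f[1] - f[2],
    "depth": lambda f: 1.0 / (1.0 + f[0]),
}

def hydra_forward(pixels: List[float]) -> Dict[str, float]:
    features = backbone(pixels)  # computed once per frame
    return {name: head(features) for name, head in HEADS.items()}

outputs = hydra_forward([0.1, 0.4, 0.9, 0.2])
print(outputs, "backbone calls:", backbone_calls)
```

Adding a head costs only that head's compute, since the backbone output is reused.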


7. HW3 vs. HW4 Improvements

Feature             | HW3 (2019)           | HW4 (2023)
NPUs                | 2x 96x96 MAC         | 2x 128x128 MAC (est.)
Camera Inputs       | 8x 1.2MP (HDR)       | 12x 5MP (HDR++)
Safety              | ASIL-B (unconfirmed) | ASIL-D (unconfirmed)
Backward Compatible | No                   | Yes (with HW3 cameras)
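The camera row implies a much larger jump in input data than the NPU row suggests. A quick sketch of the arithmetic, using only the camera counts and resolutions from the table (frame rate and bit depth are left out because the article does not state them):

```python
def megapixels_per_frame_set(cameras: int, mp_per_camera: float) -> float:
    """Total megapixels captured when every camera delivers one frame."""
    return cameras * mp_per_camera

hw3_mp = megapixels_per_frame_set(8, 1.2)    # about 9.6 MP
hw4_mp = megapixels_per_frame_set(12, 5.0)   # 60 MP
growth = hw4_mp / hw3_mp                     # about 6.25x

print(f"HW3: {hw3_mp:.1f} MP, HW4: {hw4_mp:.1f} MP, growth: {growth:.2f}x")
```

At equal frame rates, HW4 would have to ingest roughly six times as many pixels as HW3.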

8. Benchmark vs. Competitors

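Chip            | TOPS (INT8) | Power      | SRAM | Use Case
Tesla FSD HW4   | 256 (est.)  | 45W (est.) | 32MB | Vision-only autonomy
NVIDIA Orin     | 254         | 60W        | 8MB  | Multi-sensor fusion
Mobileye EyeQ6H | 34          | 10W        | 16MB | L2+ ADAS

(HW4 figures are estimates; power and SRAM values for all three chips are approximate.)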

Why Tesla’s NPU Wins:

  • An estimated 5x TOPS/mm² advantage over general-purpose GPUs (dedicated silicon for vision NNs).

  • Little or no external memory access for common ops (e.g., convolutions) once weights and activations fit in SRAM.
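Since the Tesla and Orin rows of the table list nearly identical INT8 throughput, the difference shows up in efficiency. A short sketch of the TOPS-per-watt arithmetic using those two rows as given; the HW4 power figure is itself an estimate, so the result is indicative only:

```python
def tops_per_watt(tops: float, watts: float) -> float:
    """Throughput per unit power; higher is better."""
    return tops / watts

# (TOPS, watts) taken from the comparison table.
chips = {"Tesla FSD HW4": (256, 45), "NVIDIA Orin": (254, 60)}
efficiency = {name: tops_per_watt(t, w) for name, (t, w) in chips.items()}

for name, value in efficiency.items():
    print(f"{name}: {value:.2f} TOPS/W")
```

With these inputs the script reports about 5.69 TOPS/W for HW4 and 4.23 TOPS/W for Orin.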


9. Limitations & Trade-Offs

  • No LiDAR/Radar Support: HW4 still lacks hardware accelerators for time-of-flight processing.

  • Fixed-Precision Only: No FP16/FP32 in NPUs (limits future model complexity).

  • Thermal Constraints: The FSD computer is liquid-cooled, and an estimated 45W of sustained draw on HW4 adds to the vehicle's cooling load.
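The fixed-precision limitation means floating-point weights and activations have to be mapped to INT8 before they reach the NPU. The sketch below shows symmetric per-tensor quantization, a common scheme and not necessarily the one Tesla's toolchain uses; the function names and sample values are illustrative:

```python
from typing import List, Tuple

def quantize_int8(values: List[float]) -> Tuple[List[int], float]:
    """Symmetric per-tensor quantization: real value ~= q * scale."""
    scale = max(abs(v) for v in values) / 127 or 1.0  # avoid a zero scale
    q = [max(-128, min(127, round(v / scale))) for v in values]
    return q, scale

def int8_dot(a: List[int], b: List[int]) -> int:
    """Integer multiply-accumulate; Python ints stand in for a wide accumulator."""
    return sum(x * y for x, y in zip(a, b))

weights = [0.5, -1.0, 0.25, 0.0]
acts = [0.2, 0.1, -0.4, 0.3]
qw, sw = quantize_int8(weights)
qa, sa = quantize_int8(acts)

approx = int8_dot(qw, qa) * sw * sa                  # dequantized result
exact = sum(w * a for w, a in zip(weights, acts))    # floating-point reference
print(f"exact: {exact:.4f}, INT8 approximation: {approx:.4f}")
```

The small gap between the two results is the quantization error, which is the price paid for running everything in 8-bit arithmetic.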


10. The Dojo Connection

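  • Dojo D1 Chip: Tesla's training processor, a separate design rather than a scaled-up FSD NPU (354 compute nodes per die, about 362 TFLOPS at BF16/CFP8, high-bandwidth die-to-die fabric).

  • Training-Inference Symmetry: Dojo is designed to train the same networks the FSD NPUs run, though models are still compiled and quantized to INT8 before deployment.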


Key Takeaways

  1. Domain-Specific Design: Tesla’s NPUs are optimized only for camera-based autonomy.

  2. Memory is King: 32MB SRAM avoids the "memory wall" that bottlenecks GPUs.

  3. Vertical Integration: From silicon (designed by Tesla, fabricated by Samsung) to software (HydraNet), Tesla controls the stack.

For sensor fusion (LiDAR/radar), FPGAs and general-purpose SoCs remain common, but for vision-only autonomy at fleet scale, Tesla’s ASIC approach is hard to match.

Written by

ampheo