SQIL:

Saliency-aware Quantized Imitation Learning

Seongmin Park1, Hyungmin Kim1, Sangwoo Kim1, Wonseok Jeon2,
Juyoung Yang2, Byeongwook Jeon2, Yoonseon Oh1, and Jungwook Choi1*

1Hanyang University, 2Hyundai Motor Company
Seoul, Republic of Korea
1{skstjdals, kong4274, kimzl121, yoh21}@hanyang.ac.kr
2{wsjeon, jyjy6711, smiler}@hyundai.com, 1*choij@hanyang.ac.kr


SQIL is the first systematic study of Quantized Imitation Learning, revealing that most quantization-induced failures occur at mission-critical states requiring fine-grained control. By leveraging policy-driven saliency (SIS) and a SIS-weighted 4-bit QAT scheme, SQIL achieves 2–4× efficiency gains while preserving full-precision-level success rates across real-world robotics, autonomous driving, and physics simulation.

Abstract

Deep neural network (DNN)-based policy models, such as vision-language-action (VLA) models, excel at automating complex decision-making from multi-modal inputs. However, scaling these models greatly increases computational overhead, complicating deployment in resource-constrained settings like robot manipulation and autonomous driving. To address this, we propose Saliency-Aware Quantized Imitation Learning (SQIL), which combines quantization-aware training with a selective loss-weighting strategy for mission-critical states. By identifying these states via saliency scores and emphasizing them in the training loss, SQIL preserves decision fidelity under low-bit precision. We validate SQIL's generalization capability across extensive simulation benchmarks with environment variations, real-world tasks, and cross-domain tasks (self-driving, physics simulation), consistently recovering full-precision performance. Notably, a 4-bit weight-quantized VLA model for robotic manipulation achieves up to 2.5x speedup and 2.5x energy savings on an edge GPU with minimal accuracy loss. These results underline SQIL's potential for efficiently deploying large IL-based policy models on resource-limited devices.

Method & Analysis

Quantization compresses policy parameters to low-bit precision, reducing compute and memory. Given full-precision weights \( w^{\text{FP}} \), a quantization step size \( \gamma \), and a bit-width \( b \), we apply symmetric uniform quantization:

$$ w^{Q} = \text{Clip}\left( \left\lfloor \frac{w^{\text{FP}}}{\gamma} \right\rceil, -2^{b-1}, 2^{b-1}-1 \right) $$

This yields a quantized policy \( \pi^Q_\theta \) that is efficient but incurs performance loss at high-sensitivity states, as illustrated in the introductory figure.
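As a concrete illustration, the following PyTorch sketch fake-quantizes a weight tensor with the formula above. The per-tensor, max-based choice of the step size \( \gamma \) is an assumption for illustration, not necessarily the calibration used in the paper.

```python
import torch

def quantize_symmetric(w_fp: torch.Tensor, bits: int = 4) -> torch.Tensor:
    """Symmetric uniform quantization of full-precision weights (per-tensor sketch)."""
    qmin, qmax = -(2 ** (bits - 1)), 2 ** (bits - 1) - 1
    gamma = w_fp.abs().max() / qmax           # step size; max-based scaling is an assumed choice
    w_int = torch.clamp(torch.round(w_fp / gamma), qmin, qmax)
    # Return the dequantized ("fake-quantized") weights; QAT would additionally
    # use a straight-through estimator so gradients flow to w_fp.
    return w_int * gamma
```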

Saliency-based Importance Score (SIS)

To detect such mission-critical states, SQIL computes a Saliency-based Importance Score (SIS):

$$ \text{SIS}(s_t) = \mathbb{E}_k \left[ \left\| \pi(s_t) - \pi(\phi(s_t, k)) \right\|^2 \right] $$

where \( \phi(s_t, k) \) applies a local perturbation to state \( s_t \) at location \( k \), and the expectation is taken over perturbation locations. A high SIS indicates that the policy's decision is strongly sensitive to that state.
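Below is a minimal PyTorch sketch of this computation for image observations, assuming a Gaussian-blur patch as the local perturbation \( \phi(s_t, k) \) and a policy that returns a continuous action vector; the patch size, stride, and blur kernel are illustrative choices, not values taken from the paper.

```python
import torch
import torchvision.transforms.functional as TF

def sis_score(policy, obs: torch.Tensor, patch: int = 16, stride: int = 16) -> float:
    """Saliency-based Importance Score: average squared change in the policy's action
    when local patches of the observation (B, C, H, W) are perturbed (here: blurred)."""
    with torch.no_grad():
        base_action = policy(obs)                        # action for the unperturbed state
        blurred = TF.gaussian_blur(obs, kernel_size=21)  # assumed perturbation phi(s_t, k)
        deltas = []
        _, _, H, W = obs.shape
        for y in range(0, H - patch + 1, stride):
            for x in range(0, W - patch + 1, stride):
                perturbed = obs.clone()
                perturbed[..., y:y + patch, x:x + patch] = blurred[..., y:y + patch, x:x + patch]
                deltas.append(((policy(perturbed) - base_action) ** 2).sum())
        return torch.stack(deltas).mean().item()         # E_k[ ||pi(s) - pi(phi(s, k))||^2 ]
```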

Saliency-aware Quantized Imitation Learning (SQIL)

SQIL enhances imitation learning under quantization by combining two complementary components: quantization-aware training (QAT) and quantization-robust action distillation (QRD). QAT aligns the quantized policy with expert actions, while QRD further reduces quantization errors by matching the output distribution of the quantized policy to that of the full-precision (FP) policy.

To identify which states deserve more focus during distillation, we use the saliency-based importance score (SIS). QRD applies a selective weighting coefficient \(\alpha_t\), assigning larger weights to mission-critical states—those with high SIS values.

$$ \mathcal{L}^{\text{SQIL}}(\theta) = \underbrace{- \log \pi^Q_\theta(a_t|s_t)}_{\text{QAT}} + \underbrace{\alpha_t \cdot D(\pi^Q_\theta(\cdot|s_t) || \pi^{FP}(\cdot|s_t))}_{\text{QRD weighted by SIS}} $$

Here, \( D(\cdot \,||\, \cdot) \) is a discrepancy metric such as the L2 norm, and \( \alpha_t = \beta \) for the top 20% of states by SIS (\( \text{SIS}(s_t) > T \)); otherwise \( \alpha_t = 1 \). This weighting emphasizes learning from the states most affected by quantization. As shown in the experiments, this mechanism significantly reduces action discrepancies and improves control fidelity under 4-bit quantization.
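The sketch below shows one way to assemble this loss in PyTorch, assuming a discrete (tokenized) action space as in VLA models, a squared-L2 discrepancy for \( D \), and per-state SIS values computed beforehand; the value of \( \beta \) and the threshold handling are illustrative rather than the paper's exact settings.

```python
import torch
import torch.nn.functional as F

def sqil_loss(q_logits, fp_logits, expert_action, sis, sis_threshold, beta=2.0):
    """SIS-weighted SQIL loss: QAT (imitation) term plus QRD (distillation) term.
    q_logits: quantized-policy logits (B, A); fp_logits: frozen FP-teacher logits (B, A);
    expert_action: expert action indices (B,); sis: per-state SIS values (B,)."""
    # QAT: negative log-likelihood of the expert action under the quantized policy
    qat = F.cross_entropy(q_logits, expert_action, reduction="none")
    # QRD: L2 discrepancy between quantized and full-precision action distributions
    # (the FP teacher is frozen, so its logits carry no gradient)
    qrd = ((q_logits.softmax(dim=-1) - fp_logits.detach().softmax(dim=-1)) ** 2).sum(dim=-1)
    # Selective weighting: alpha_t = beta for mission-critical (high-SIS) states, else 1
    alpha = torch.where(sis > sis_threshold, torch.full_like(sis, beta), torch.ones_like(sis))
    return (qat + alpha * qrd).mean()
```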


Keyframe (KF) methods identify coarse task transitions (e.g., "drawer open") using object-state or vision-language cues. By measuring control sensitivity, SIS captures finer interaction moments such as grasping or releasing, improving performance under quantization (+1.1% over KF).


Saliency visualization shows how quantization distorts the policy's attention. While the FP policy attends to meaningful regions (e.g., robot arm, bowl, plate), PTQ often misfocuses on irrelevant areas. SQIL successfully restores the focus pattern of the FP policy, producing saliency maps that align closely with expert behavior.

Action Distribution Comparison:
This figure compares the action distributions of FP, PTQ, QAT, QRD, and SQIL in a self-driving task.

PTQ deviates significantly from FP due to quantization noise.
QAT aligns peaks with expert actions but overly sharpens the distribution.
QRD maintains FP-like shape but may underrepresent expert intent.
SQIL combines both benefits—preserving the FP structure while prioritizing expert-like decisions.

Experiments


Despite operating under 4-bit quantization, SQIL outperforms other quantized baselines and matches full-precision performance across real-world and cross-domain tasks, demonstrating its robustness and generality.


In autonomous driving, our 4-bit model achieves up to 3.7× lower latency and 3.1× energy savings.
In robot manipulation, INT4 provides 2.5× speedup and 4× memory reduction, enabling efficient inference on edge devices.

Rollout Videos

Real-World Robot Manipulation: Quantized OpenVLA

Tabletop task: sweep the gray cloth to the left side of the table

Baseline FP

Success

PTQ W4

Failed to grasp the cloth

SQIL W4 (Ours)

Success

Tabletop task: pick up the green cup and put it into the brown cup

Baseline FP

Success

PTQ W4

Failed to place the cup accurately

SQIL W4 (Ours)

Success

BridgeData V2 task: stack purple cup on green cup

Baseline FP

Success

PTQ W4

Failed to pick up the purple cup

SQIL W4 (Ours)

Success

BridgeData V2 task: put eggplant into pot

Baseline FP

Success

PTQ W4

Failed to pick up the eggplant

SQIL W4 (Ours)

Success

Simulation-based Robot Manipulation: Quantized OpenVLA on LIBERO Benchmark

LIBERO-Spatial: pick up the black bowl on the stove and place it on the plate

Baseline FP

Success

PTQ W4

Failure

QAT W4

Failure

SQIL W4 (Ours)

Success

LIBERO-Object: pick up the cream cheese and place it in the basket

Baseline FP

Success

PTQ W4

Failure

QAT W4

Failure

SQIL W4 (Ours)

Success

LIBERO-Goal: push the plate to the front of the stove

Baseline FP

Success

PTQ W4

Failure

QAT W4

Failure

SQIL W4 (Ours)

Success

LIBERO-Long: put the black bowl in the bottom drawer of the cabinet and close it

Baseline FP

Success

PTQ W4

Failure

QAT W4

Failure

SQIL W4 (Ours)

Success

Autonomous Driving: Quantized CILRS on NoCrash-dense Benchmark

Baseline FP
Successfully completed driving without collisions with vehicles or pedestrians
QAT W4
Driving failed due to a collision with a vehicle
SQIL W4
Successfully completed driving without collisions with vehicles or pedestrians

BibTeX

@article{park2025saliency,
  title={Saliency-Aware Quantized Imitation Learning for Efficient Robotic Control},
  author={Park, Seongmin and Kim, Hyungmin and Kim, Sangwoo and Jeon, Wonseok and Yang, Juyoung and Jeon, Byeongwook and Oh, Yoonseon and Choi, Jungwook},
  journal={arXiv preprint arXiv:2505.15304},
  year={2025}
}