Abstract
Deep neural network (DNN)-based policy models, such as vision-language-action (VLA) models, excel at automating complex decision-making from multi-modal inputs. However, scaling these models greatly increases computational overhead, complicating deployment in resource-constrained settings like robot manipulation and autonomous driving. To address this, we propose Saliency-Aware Quantized Imitation Learning (SQIL), which combines quantization-aware training with a selective loss-weighting strategy for mission-critical states. By identifying these states via saliency scores and emphasizing them in the training loss, SQIL preserves decision fidelity under low-bit precision. We validate SQIL's generalization capability across extensive simulation benchmarks with environment variations, real-world tasks, and cross-domain tasks (self-driving, physics simulation), consistently recovering full-precision performance. Notably, a 4-bit weight-quantized VLA model for robotic manipulation achieves up to 2.5x speedup and 2.5x energy savings on an edge GPU with minimal accuracy loss. These results underline SQIL's potential for efficiently deploying large IL-based policy models on resource-limited devices.
Method & Analysis
Quantization compresses policy parameters to low-bit precision, reducing compute and memory. Given full-precision weights \( w^{\text{FP}} \), we apply symmetric uniform quantization:
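A standard form of this operation with bit-width \(b\) (our reconstruction; the paper's exact clipping and scaling conventions may differ) is
\[
w^{Q} \;=\; s \cdot \operatorname{clip}\!\left(\Big\lfloor \tfrac{w^{\text{FP}}}{s} \Big\rceil,\; -2^{\,b-1},\; 2^{\,b-1}-1\right),
\qquad
s \;=\; \frac{\max \left| w^{\text{FP}} \right|}{2^{\,b-1}-1}.
\]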
This yields an efficient quantized policy \( \pi^Q_\theta \), but it incurs performance loss at high-sensitivity states, as illustrated in the introductory figure.
Saliency-based Importance Score (SIS)
To detect such mission-critical states, SQIL computes a Saliency-based Importance Score (SIS):
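One plausible form of the score (a sketch on our part; the exact perturbation set and aggregation may differ in the paper) is
\[
\text{SIS}(s_t) \;=\; \max_{k}\; \left\| \pi^{\text{FP}}_\theta(s_t) \;-\; \pi^{\text{FP}}_\theta\big(\phi(s_t, k)\big) \right\|,
\]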
where \( \phi(s_t, k) \) introduces a local state perturbation at location \(k\). High SIS indicates strong sensitivity in decision-making.
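As a concrete illustration, the following PyTorch sketch scores an image state by masking local patches and measuring the resulting shift in the policy's action output; the function names, patch-based perturbation, fill value, and max-aggregation are our assumptions rather than the paper's exact implementation.

import torch

def sis_score(policy, obs, patch=16, fill=0.0):
    # Sketch of a saliency-based importance score: perturb local image
    # patches (phi(s_t, k)) and track the largest change in the policy's
    # action output. All details here are illustrative assumptions.
    with torch.no_grad():
        base = policy(obs)                                         # reference action for state s_t
        _, _, H, W = obs.shape                                     # obs: (1, C, H, W) image observation
        score = 0.0
        for y in range(0, H, patch):
            for x in range(0, W, patch):
                perturbed = obs.clone()
                perturbed[:, :, y:y + patch, x:x + patch] = fill   # local perturbation at location k
                delta = torch.norm(policy(perturbed) - base)       # action discrepancy
                score = max(score, delta.item())                   # aggregate over locations
    return score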
Saliency-aware Quantized Imitation Learning (SQIL)
SQIL enhances imitation learning under quantization by combining two complementary components: quantization-aware training (QAT) and quantization-robust action distillation (QRD). QAT aligns the quantized policy with expert actions, while QRD further reduces quantization errors by matching the output distribution of the quantized policy to that of the full-precision (FP) policy.
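In practice, QAT is commonly implemented by training through a fake-quantized forward pass with a straight-through estimator (STE); the snippet below is a minimal sketch of that standard recipe, not necessarily the paper's exact quantizer.

import torch

def fake_quant(w, bits=4):
    # Symmetric uniform fake quantization with a straight-through estimator:
    # the forward pass uses quantized weights, while gradients flow to the
    # full-precision weights. This is a standard-recipe sketch.
    qmax = 2 ** (bits - 1) - 1
    scale = w.detach().abs().max().clamp_min(1e-8) / qmax
    w_q = torch.clamp(torch.round(w / scale), -qmax - 1, qmax) * scale
    return w + (w_q - w).detach()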
To identify which states deserve more focus during distillation, we use the saliency-based importance score (SIS). QRD applies a selective weighting coefficient \(\alpha_t\), assigning larger weights to mission-critical states—those with high SIS values.
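A sketch of the resulting weighted distillation objective (the exact composition with the QAT term may differ in the paper) is
\[
\mathcal{L}_{\text{QRD}} \;=\; \sum_{t} \alpha_t \, D\!\left(\pi^{Q}_\theta(\cdot \mid s_t) \,\big\|\, \pi^{\text{FP}}(\cdot \mid s_t)\right).
\]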
Here, \( D(\cdot || \cdot) \) is a discrepancy metric such as the L2 norm, and \(\alpha_t = \beta\) for the top 20% highest SIS states (\( \text{SIS}(s_t) > T \)), otherwise 1. This weighting emphasizes learning from states most affected by quantization. As shown in experiments, this mechanism significantly reduces action discrepancies and improves control fidelity under 4-bit quantization.
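Putting the pieces together, the following PyTorch sketch combines a behavior-cloning (QAT) term with the SIS-weighted distillation (QRD) term; the function names, the MSE discrepancy, and the way the two terms are summed are illustrative assumptions.

import torch
import torch.nn.functional as F

def sqil_loss(q_policy, fp_policy, states, expert_actions, sis, beta=2.0):
    # Sketch of the SQIL objective: imitation loss for the (fake-)quantized
    # policy plus selectively weighted distillation toward the frozen FP policy.
    q_actions = q_policy(states)                            # quantized-policy actions
    with torch.no_grad():
        fp_actions = fp_policy(states)                      # full-precision teacher actions

    bc_loss = F.mse_loss(q_actions, expert_actions)         # QAT: match expert demonstrations

    threshold = torch.quantile(sis, 0.8)                    # T: top-20% SIS cutoff
    alpha = torch.ones_like(sis)
    alpha[sis > threshold] = beta                           # alpha_t = beta for mission-critical states
    per_state = ((q_actions - fp_actions) ** 2).mean(dim=-1)
    qrd_loss = (alpha * per_state).mean()                   # QRD: weighted discrepancy D

    return bc_loss + qrd_loss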

Keyframe (KF) methods identify coarse transitions (e.g., "drawer open") using object state or vision-language cues. SIS instead measures control sensitivity, capturing finer interaction moments such as grasping or releasing, and improves performance under quantization (+1.1% over KF).

Saliency visualization shows how quantization distorts the policy's attention. While the FP policy attends to meaningful regions (e.g., robot arm, bowl, plate), PTQ often misfocuses on irrelevant areas. SQIL successfully restores the focus pattern of the FP policy, producing saliency maps that align closely with expert behavior.

This figure compares the action distributions of FP, PTQ, QAT, QRD, and SQIL in a self-driving task.
• PTQ deviates significantly from FP due to quantization noise.
• QAT aligns peaks with expert actions but overly sharpens the distribution.
• QRD maintains FP-like shape but may underrepresent expert intent.
• SQIL combines both benefits—preserving the FP structure while prioritizing expert-like decisions.
Experiments
Despite operating under 4-bit quantization, SQIL outperforms other quantized baselines and matches full-precision performance across real-world and cross-domain tasks, demonstrating its robustness and generality.
In autonomous driving, our 4-bit model achieves up to 3.7× lower latency and 3.1× energy savings.
In robot manipulation, INT4 provides 2.5× speedup and 4× memory reduction, enabling efficient inference on edge devices.
Rollout Videos
Real-World Robot Manipulation: Quantized OpenVLA
Simulation-based Robot Manipulation: Quantized OpenVLA on LIBERO Benchmark
Autonomous Driving: Quantized CILRS on NoCrash-dense Benchmark
BibTeX
@article{park2025saliency,
  title={Saliency-Aware Quantized Imitation Learning for Efficient Robotic Control},
  author={Park, Seongmin and Kim, Hyungmin and Kim, Sangwoo and Jeon, Wonseok and Yang, Juyoung and Jeon, Byeongwook and Oh, Yoonseon and Choi, Jungwook},
  journal={arXiv preprint arXiv:2505.15304},
  year={2025}
}