FlexRank: Nested Low-Rank Knowledge Decomposition for Adaptive Model Deployment

¹Politecnico di Torino, ²MBZUAI - Mohamed bin Zayed University of Artificial Intelligence, ³Amazon Science, ⁴Vector Institute
riccardo.zaccone@polito.it

About this work

Modern LLMs and Vision Transformers are usually deployed as fixed computational monoliths: one model, one cost, one accuracy point. FlexRank changes this by turning a pretrained model into a family of nested low-rank submodels that share the same weights.

🚀 Train once, deploy everywhere: choose the rank budget at inference time.
🧩 One shared model, many sizes: smaller submodels are nested inside larger ones.
🎯 Budget-aware rank allocation: FlexRank spends parameters where they matter most.
Real inference savings: Gauge-Aligned Reparametrization makes low-rank deployment practical.
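Low-rank layers save compute for a simple reason, shown here with standard arithmetic that does not model Gauge-Aligned Reparametrization itself. Factorize a dense m×n layer W as W ≈ AB, with A of size m×r and B of size r×n. The per-token cost then changes as follows:

```latex
\begin{aligned}
\text{dense:}    \quad & y = Wx,     && mn \ \text{parameters and multiply-adds} \\
\text{low-rank:} \quad & y = A(Bx),  && r(m+n) \ \text{parameters and multiply-adds} \\
\text{saving}    \quad & \iff r(m+n) < mn \iff r < \frac{mn}{m+n}.
\end{aligned}
```

For a square 4096×4096 projection, the break-even rank is 2048. At r = 1024, the factored layer costs half as much: 8,388,608 versus 16,777,216 parameters. A naive factorization therefore pays off only well below full rank. This arithmetic does not capture what FlexRank's Gauge-Aligned Reparametrization adds on top.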

Overview of the FlexRank pipeline
Figure 1. FlexRank starts from a base model, decomposes each linear layer, extracts nested submodels through a global rank ordering, and refines them by distillation from the original model.

Method

FlexRank is built around a simple deployment question: if we can only afford part of the model, which part should we keep? A uniform rank cut is too crude, because different layers and modules contribute differently to the final prediction. FlexRank instead learns an ordered decomposition of the pretrained model and uses it to build a budget-aware hierarchy.

🧱 Step 1 - Decompose: each linear layer is factorized into low-rank components ordered by importance.
🧭 Step 2 - Search: dynamic programming decides how many components to keep in each layer for each budget.
🔁 Step 3 - Consolidate: sampled submodels are distilled from the original model so all budgets work well together.
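Step 1, and the nesting it enables, can be sketched for a single layer. This minimal sketch assumes plain SVD as the importance ordering. FlexRank's own decomposition is data-aware and refined by training. The helper names `decompose`, `submodel_layer`, and `forward` are hypothetical. Each weight matrix is split into rank-1 components sorted by singular value, and a rank-r submodel simply keeps the first r of them.

```python
import numpy as np

def decompose(W):
    """Factor W (out x in) into rank-1 components ordered by singular value.

    Plain SVD stands in for FlexRank's own decomposition (an assumption);
    A[:, i] @ B[i] is the i-th most important component.
    """
    U, s, Vt = np.linalg.svd(W, full_matrices=False)  # s is sorted descending
    sqrt_s = np.sqrt(s)
    A = U * sqrt_s            # (out, k): columns scaled by sqrt(sigma)
    B = sqrt_s[:, None] * Vt  # (k, in): rows scaled by sqrt(sigma)
    return A, B

def submodel_layer(A, B, r):
    """Rank-r submodel: keep only the first r components (a prefix)."""
    return A[:, :r], B[:r, :]

def forward(x, A_r, B_r):
    # Two thin matmuls instead of one dense one: r * (out + in) multiply-adds.
    return A_r @ (B_r @ x)

rng = np.random.default_rng(0)
W = rng.standard_normal((64, 32))
x = rng.standard_normal(32)
A, B = decompose(W)

for r in (4, 8, 16, 32):
    A_r, B_r = submodel_layer(A, B, r)
    err = np.linalg.norm(forward(x, A_r, B_r) - W @ x) / np.linalg.norm(W @ x)
    print(f"rank {r:2d}: params {A_r.size + B_r.size:5d}, rel. error {err:.3f}")
```

Every rank-r layer is a prefix of the full one, so all budgets share the same storage. At full rank, the factored layer (3,072 parameters here) is larger than the dense 64×32 one (2,048 parameters). Factorization therefore only pays off at reduced ranks.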

The result is not a collection of separately trained compressed models. It is one elastic model whose smaller configurations are nested inside the larger ones, making the accuracy-cost curve smooth and easy to deploy.

Why Nestedness Matters

The main idea is simple: elastic models should not be a bag of unrelated submodels. They should form a clean hierarchy where larger models refine the components reused by smaller ones.

Post-training selection: good full model, weak smaller models.
All-submodel training: too much interference between rank choices.
Nested submodel training: compatible budgets, shared knowledge, Pareto-efficient behavior.
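A budget hierarchy is nested exactly when no layer's rank decreases as the budget grows. In that case, every smaller submodel is a slice of the larger ones, and only the largest configuration needs to be stored. The minimal check below assumes that rank profiles are given as per-layer rank lists keyed by budget. This format and the function name `is_nested` are hypothetical.

```python
def is_nested(profiles):
    """Return True if per-layer ranks never shrink as the budget grows.

    profiles: dict mapping budget -> list of per-layer ranks (hypothetical
    format). Nestedness means each smaller submodel reuses a prefix of the
    components kept by every larger one, layer by layer.
    """
    budgets = sorted(profiles)
    for small, large in zip(budgets, budgets[1:]):
        if any(rs > rl for rs, rl in zip(profiles[small], profiles[large])):
            return False
    return True

nested = {0.25: [2, 1, 3], 0.5: [4, 2, 5], 1.0: [8, 4, 8]}
crossing = {0.25: [2, 1, 3], 0.5: [1, 3, 5], 1.0: [8, 4, 8]}  # layer 0 shrinks
print(is_nested(nested), is_nested(crossing))  # True False
```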

Comparison of post-training selection, all-submodel learning, and nested submodel learning
Figures 2-3. Nested submodel learning is the only strategy that recovers the Pareto-efficient hierarchy in the synthetic setting, and FlexRank carries the same principle over to deep networks.

Main Results

FlexRank delivers smooth accuracy-cost trade-offs across LLMs and Vision Transformers. It consistently improves over SVD, DataSVD, and ACIP-style low-rank elastic baselines.

🦙 Llama models: graceful degradation over many parameter budgets.
🖼️ DINOv3 ViTs: strong ImageNet-1K accuracy even after large parameter reductions.
🏁 Beyond low-rank baselines: competitive with structured pruning and depth-elastic methods.

FlexRank main accuracy-cost results on Llama and DINOv3 models
Figures 4-5. FlexRank gives the smoothest degradation across parameter budgets on Llama and DINOv3 models, and remains competitive with pruning and depth-elastic baselines.

Ablations

The ablations show that FlexRank is more than SVD plus training: both the budget-aware rank allocation and the nested training procedure contribute to the final accuracy-cost trade-off.

🔍 Rank profiles are non-uniform. In the heatmaps below, each column corresponds to a GPT-2 module and each row to a target model size. If uniform compression were enough, the heatmaps would look almost flat. Instead, FlexRank preserves more capacity in specific modules, showing that the dynamic-programming search identifies where rank is most valuable.
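The exact objective of FlexRank's search is specified in the paper. One common way to pose budgeted rank allocation, assumed here, is as a multiple-choice knapsack:

- each layer picks how many components to keep;
- each kept component costs that layer's m + n parameters;
- the goal is to minimize the summed per-layer error within the budget.

Below is a minimal dynamic-programming sketch of this formulation. The function `allocate_ranks` and the error tables are hypothetical toy inputs.

```python
import numpy as np

def allocate_ranks(errors, costs, budget):
    """Multiple-choice knapsack DP over layers (illustrative formulation).

    errors[l][k]: error of layer l when keeping k components (k = 0..K_l)
    costs[l]:     parameters per kept component of layer l (e.g. m + n)
    budget:       total parameter budget (integer)
    Returns (total_error, ranks) minimizing summed error within the budget.
    """
    INF = float("inf")
    best = np.full(budget + 1, INF)  # best[b]: min error using exactly b params
    best[0] = 0.0
    choices = []                     # choices[l][b]: rank picked for layer l
    for err_l, c in zip(errors, costs):
        new = np.full(budget + 1, INF)
        arg = np.zeros(budget + 1, dtype=int)
        for k, e in enumerate(err_l):
            used = k * c
            if used > budget:
                break
            cand = np.full(budget + 1, INF)
            cand[used:] = best[: budget + 1 - used] + e
            better = cand < new
            new[better] = cand[better]
            arg[better] = k
        choices.append(arg)
        best = new
    b = int(np.argmin(best))
    total = float(best[b])
    ranks = []
    for l in range(len(errors) - 1, -1, -1):  # backtrack from the last layer
        k = int(choices[l][b])
        ranks.append(k)
        b -= k * costs[l]
    return total, ranks[::-1]

# errors[l][k] = squared singular values dropped, for toy spectra
# (4, 2, 1) and (3, 3, 3); toy cost of 2 parameters per component.
errors = [[21, 5, 1, 0], [27, 18, 9, 0]]
costs = [2, 2]
print(allocate_ranks(errors, costs, budget=8))  # (5.0, [1, 3])
```

With these toy spectra, the flat-spectrum layer receives three components and the steep one receives only one. This is the kind of non-uniform allocation the heatmaps show. The DP table has budget + 1 entries per layer, so a practical version would likely measure budgets in coarser units.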

📈 Initialization alone does not solve elasticity. In Figure 7(a), the DataSVD curves with 256 and 1024 calibration samples almost overlap, showing that a small calibration set is already enough to estimate the layer-wise decomposition. However, the loss remains far from that of the original model at smaller parameter counts, so better initialization alone is not enough.
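The exact DataSVD formulation is given in the paper. The sketch below assumes one common data-aware variant. It whitens the layer with the Cholesky factor of the calibration-input covariance, so that truncation minimizes the output error on calibration data rather than the weight error. The names `data_svd`, `plain_svd`, and `output_error`, and the `eps` ridge, are hypothetical. The demo compares 256 and 1024 calibration samples on held-out inputs.

```python
import numpy as np

def data_svd(W, X, r, eps=1e-6):
    """Rank-r W_r minimizing ||(W - W_r) X^T||_F (up to the small eps ridge).

    W: weights (out x in); X: calibration inputs (n_samples x in).
    Assumed whitening-based variant, not necessarily FlexRank's exact DataSVD.
    """
    C = X.T @ X / len(X) + eps * np.eye(X.shape[1])  # input covariance
    L = np.linalg.cholesky(C)                         # C = L L^T
    U, s, Vt = np.linalg.svd(W @ L, full_matrices=False)
    M = (U[:, :r] * s[:r]) @ Vt[:r]                   # best rank-r fit of W L
    return np.linalg.solve(L.T, M.T).T                # M @ inv(L), back to W space

def plain_svd(W, r):
    """Weight-space truncation, for comparison."""
    U, s, Vt = np.linalg.svd(W, full_matrices=False)
    return (U[:, :r] * s[:r]) @ Vt[:r]

def output_error(W, W_r, X):
    """Relative error of the layer outputs on inputs X (n_samples x in)."""
    return np.linalg.norm((W - W_r) @ X.T) / np.linalg.norm(W @ X.T)

rng = np.random.default_rng(0)
d_in, d_out, r = 16, 24, 4
W = rng.standard_normal((d_out, d_in))
scales = np.linspace(0.1, 3.0, d_in)  # anisotropic input features
X_pool = rng.standard_normal((1024, d_in)) * scales
X_test = rng.standard_normal((2048, d_in)) * scales

print(f"plain SVD:           {output_error(W, plain_svd(W, r), X_test):.3f}")
for n in (256, 1024):
    W_r = data_svd(W, X_pool[:n], r)
    print(f"data-aware, n={n:4d}: {output_error(W, W_r, X_test):.3f}")
```

The calibration set enters only through a d_in×d_in covariance estimate. This is consistent with a few hundred samples already being enough to fix the decomposition.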

🧠 Joint nested training is the key consolidation step. Figure 7(b) shows that independently adapting each layer is still much weaker than training the selected submodels end-to-end: the model needs to repair cross-layer interactions, not only local reconstruction errors. Figure 8 then isolates the role of budget sampling: each single-budget model is strong near the budget it was trained for, but fails to trace the full Pareto curve. FlexRank stays close to the best curve because it distills many nested budgets into the same shared weights.
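Budget sampling can be made concrete with a toy single-layer analogue, under explicit assumptions:

- the "original model" is one linear teacher layer W;
- the elastic student keeps ordered factors A and B;
- each step samples one rank r from a hypothetical budget set and distills only the rank-r prefix toward the teacher's outputs, using plain gradient descent.

FlexRank trains whole networks end-to-end with its own losses and budget distribution, and none of that is modeled here.

```python
import numpy as np

rng = np.random.default_rng(0)
m, n, R, N = 8, 8, 4, 256

# Teacher ("original model"): a linear layer with decaying singular values.
Q1, _ = np.linalg.qr(rng.standard_normal((m, m)))
Q2, _ = np.linalg.qr(rng.standard_normal((n, n)))
sigma = np.array([4.0, 2.0, 1.0, 0.5, 0.25, 0.1, 0.05, 0.02])
W = Q1 @ np.diag(sigma) @ Q2.T
X = rng.standard_normal((n, N))
Y = W @ X  # teacher outputs used as distillation targets

# Student: ordered factors; the rank-r submodel is A[:, :r] @ B[:r].
A = 0.1 * rng.standard_normal((m, R))
B = 0.1 * rng.standard_normal((R, n))

def prefix_loss(A, B, r):
    """Mean squared distillation error of the rank-r prefix submodel."""
    E = A[:, :r] @ B[:r] @ X - Y
    return float(np.sum(E**2) / N)

budgets = [1, 2, 3, 4]  # hypothetical rank budgets sampled during training
before = [prefix_loss(A, B, r) for r in budgets]

lr = 0.02
for step in range(4000):
    r = rng.choice(budgets)        # sample one nested budget per step
    Ar, Br = A[:, :r], B[:r]       # views into the shared factors
    E = Ar @ Br @ X - Y            # distillation residual
    gA = 2.0 / N * E @ (Br @ X).T  # d loss / d A[:, :r]
    gB = 2.0 / N * Ar.T @ E @ X.T  # d loss / d B[:r]
    A[:, :r] -= lr * gA            # only the active prefix is updated
    B[:r] -= lr * gB

after = [prefix_loss(A, B, r) for r in budgets]
optimal = [float(np.sum(sigma[r:] ** 2)) for r in budgets]  # approx., X near-isotropic
for r, b, a, o in zip(budgets, before, after, optimal):
    print(f"rank {r}: loss {b:.3f} -> {a:.3f} (SVD optimum ~ {o:.3f})")
```

Only the sampled prefix receives gradients. Leading components are therefore updated under every budget, while trailing ones are updated only by the larger budgets. This asymmetry tends to push the shared factors toward an importance ordering that serves all budgets at once.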

FlexRank rank allocation heatmaps across model components
Figure 6. The learned GPT-2 profiles are not uniform: FlexRank keeps more rank in the components that matter most for each target size.
FlexRank ablations on DataSVD initialization and nested submodel training
Figures 7-8. Figure 7(a) shows that DataSVD converges with few calibration samples; Figure 7(b) shows that end-to-end submodel training is needed beyond layer-wise adaptation; Figure 8 shows that sampling nested budgets is necessary to cover the full Pareto frontier.

Conclusions

FlexRank makes low-rank compression elastic: a single pretrained model becomes a family of deployable submodels, each selected by budget and backed by shared nested weights.

One model. Many budgets.
Less compute. Smooth degradation.
🌍 Adaptive deployment across heterogeneous hardware, latency, and memory constraints.

How to cite us


@inproceedings{zaccone2026flexrank,
  title={FlexRank: Nested Low-Rank Knowledge Decomposition for Adaptive Model Deployment},
  author={Zaccone, Riccardo and Laskaridis, Stefanos and Ciccone, Marco and Horvath, Samuel},
  booktitle={Forty-third International Conference on Machine Learning},
  year={2026},
  url={https://openreview.net/forum?id=DK0kvnNelx}
}