Given a model and target hardware, Olive (an abbreviation of ONNX LIVE) composes the most suitable optimization techniques to output the most efficient ONNX model(s) for inferencing on the cloud or edge, while taking constraints such as accuracy and latency into consideration.
Here are some recent videos, blog articles and labs that highlight Olive:
- [ Oct 2025 ] Exploring Optimal Quantization Settings for Small Language Models with Olive
- [ Sep 2025 ] Olive examples are relocated to new Olive-recipes repository
- [ Aug 2025 ] Olive 0.9.2 is released with new quantization algorithms
- [ May 2025 ] Olive 0.9.0 is released with support for NPUs
- [ Mar 2025 ] Olive 0.8.0 is released with new quantization techniques
- [ Feb 2025 ] New Notebook available - Finetune and Optimize DeepSeek R1 with Olive
- [ Nov 2024 ] Democratizing AI Model optimization with the new Olive CLI
- [ Nov 2024 ] Unlocking NLP Potential: Fine-Tuning with Microsoft Olive (Ignite Pre-Day Lab PRE016)
- [ Nov 2024 ] Olive supports generating models for MultiLoRA serving on the ONNX Runtime
- [ Oct 2024 ] Windows Dev Chat: Optimizing models from Hugging Face for the ONNX Runtime (video)
- [ May 2024 ] AI Toolkit - VS Code Extension that uses Olive to fine tune models
For a full list of news and blogs, read the news archive.
If you prefer using the command line directly instead of Jupyter notebooks, we've outlined the quickstart commands here.
We recommend installing Olive in a virtual environment or a conda environment.
pip install olive-ai[auto-opt]
pip install transformers onnxruntime-genai
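Optionally, you can confirm the environment is set up with a quick import check before moving on. This is just a sanity check; the module names below are the import names of the packages installed above.

```python
# Optional sanity check for the freshly installed packages.
import onnxruntime as ort
import onnxruntime_genai  # import name of the onnxruntime-genai package
import olive              # import name of the olive-ai package

print("onnxruntime", ort.__version__)
print("available execution providers:", ort.get_available_providers())
```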
Note
Olive has optional dependencies that can be installed to enable additional features. Please refer to Olive package config for the list of extras and their dependencies.
Note
For Windows users: to avoid HF_HUB_DISABLE_SYMLINKS_WARNING
Olive depends on the huggingface_hub library when you download models from Hugging Face. On Windows you will get a warning like:
UserWarning: `huggingface_hub` cache-system uses symlinks by default to efficiently store duplicated files but your machine does not support them in C:\Users\name\.cache\huggingface\hub\model-name.
Caching files will still work but in a degraded version that might require more space on your disk. This warning can be disabled by setting the `HF_HUB_DISABLE_SYMLINKS_WARNING` environment variable. For more details, see https://huggingface.co/docs/huggingface_hub/how-to-cache#limitations.
You can fix this warning using one of the four options below. Note that options 1, 2, and 3 save disk space, while option 4 only suppresses the warning.
- Enable Developer Mode on Windows (one-time setup) documented in the Microsoft Developer Tools Docs.
- Run Python as administrator when using Olive with the huggingface_hub library.
- Reconfigure where the cache is stored; you are not restricted to keeping the cache in the default location only.
- Keep using HF_HUB_DISABLE_SYMLINKS_WARNING=1 to suppress the warning and accept the extra disk usage.
Decide which of these options is possible in your environment (e.g., given company policy) and which fits you best.
The limitation of the huggingface_hub library is also documented in the Hub Client Library Docs.
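If you drive Olive or huggingface_hub from Python rather than the command line, option 4 can also be applied in code by setting the environment variable before the library is imported. A minimal sketch:

```python
import os

# Option 4: suppress the symlink warning and accept the extra disk usage.
# Set this before huggingface_hub is imported (directly or via Olive).
os.environ.setdefault("HF_HUB_DISABLE_SYMLINKS_WARNING", "1")

import huggingface_hub  # imports without emitting the symlink warning
```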
In this quickstart you'll optimize Qwen/Qwen2.5-0.5B-Instruct, whose Hugging Face repo contains model files for several different precisions that are not required by Olive.
Run the automatic optimization:
olive optimize \
    --model_name_or_path Qwen/Qwen2.5-0.5B-Instruct \
    --precision int4 \
    --output_path models/qwen

Tip
PowerShell Users
Line continuation characters in Bash and PowerShell are not interchangeable. If you are using PowerShell, copy and paste the following command, which uses PowerShell-compatible line continuation.

olive optimize `
    --model_name_or_path Qwen/Qwen2.5-0.5B-Instruct `
    --output_path models/qwen `
    --precision int4

The automatic optimizer will:
- Acquire the model from the Hugging Face model repo.
- Quantize the model to `int4` using GPTQ.
- Capture the ONNX Graph and store the weights in an ONNX data file.
- Optimize the ONNX Graph.
Olive can automatically optimize popular model architectures like Llama, Phi, Qwen, Gemma, etc. out of the box - see the detailed list here. You can also optimize other model architectures by providing details on the model's inputs and outputs (io_config), as sketched below.
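For reference, here is a rough, hypothetical sketch of the kind of io_config you would supply for a model outside the out-of-the-box list. The key names follow the Olive documentation, but the input/output names, shapes, and types below are illustrative only; consult the Olive docs for the exact schema your workflow config expects.

```python
# Hypothetical io_config for a simple text-classification model.
# All names, shapes, and types here are illustrative, not tied to a specific model.
io_config = {
    "input_names": ["input_ids", "attention_mask"],
    "input_shapes": [[1, 128], [1, 128]],
    "input_types": ["int64", "int64"],
    "output_names": ["logits"],
    "dynamic_axes": {
        "input_ids": {"0": "batch_size", "1": "sequence_length"},
        "attention_mask": {"0": "batch_size", "1": "sequence_length"},
        "logits": {"0": "batch_size"},
    },
}
```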
The ONNX Runtime (ORT) is a fast and lightweight cross-platform inference engine with bindings for popular programming languages such as Python, C/C++, C#, Java, and JavaScript. ORT enables you to infuse AI models into your applications so that inference is handled on-device.
The sample chat app to run is model-chat.py in the onnxruntime-genai GitHub repository.
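If you prefer to call the optimized model from Python instead of the sample app, a minimal generation loop with onnxruntime-genai looks roughly like the sketch below. The model directory (assumed here to be models/qwen/model, i.e. the folder containing genai_config.json), the chat template, and the exact generator API vary across onnxruntime-genai versions, so treat this as a starting point rather than the canonical app.

```python
import onnxruntime_genai as og

# Assumption: olive optimize wrote the ONNX model and genai_config.json here.
model = og.Model("models/qwen/model")
tokenizer = og.Tokenizer(model)
stream = tokenizer.create_stream()

# Qwen2.5 uses a ChatML-style prompt template.
prompt = "<|im_start|>user\nWhat is ONNX Runtime?<|im_end|>\n<|im_start|>assistant\n"
input_tokens = tokenizer.encode(prompt)

params = og.GeneratorParams(model)
params.set_search_options(max_length=256)

generator = og.Generator(model, params)
generator.append_tokens(input_tokens)

# Stream tokens to stdout as they are generated.
while not generator.is_done():
    generator.generate_next_token()
    print(stream.decode(generator.get_next_tokens()[0]), end="", flush=True)
print()
```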
- We welcome contributions! Please read the contribution guidelines for more details on how to contribute to the Olive project.
- For feature requests or bug reports, file a GitHub Issue.
- For general discussion or questions, use GitHub Discussions.
Copyright (c) Microsoft Corporation. All rights reserved.
Licensed under the MIT License.