If you like our project, please give us a star ⭐ on GitHub to follow the latest updates.
This work proposes ExSearch, which enables LLMs to act as search agents that iteratively seek information, select key knowledge, and record useful evidence before summarizing the final answer. Compared with previous RAG methods, the key to our agentic search is its reasoning technique. Here is a concrete example of our agentic search.
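Conceptually, the search loop can be sketched as follows. This is a minimal illustration, not the actual implementation in ./src: the `llm` and `retrieve` callables are placeholders for your own model and retriever wrappers.

```python
from typing import Callable, List

def agentic_search(
    question: str,
    llm: Callable[[str], str],                  # prompt -> completion (e.g., a vLLM wrapper)
    retrieve: Callable[[str, int], List[str]],  # (query, top_k) -> documents
    max_steps: int = 5,
) -> str:
    """Illustrative agentic search: seek, select, record, then summarize."""
    evidence: List[str] = []
    for _ in range(max_steps):
        # 1. Seek: the LLM proposes the next search query given the evidence so far.
        query = llm(f"Question: {question}\nEvidence: {evidence}\nNext query (or DONE):")
        if query.strip() == "DONE":  # the model decides it has gathered enough
            break
        # 2. Select: keep only the key knowledge from the retrieved documents.
        docs = retrieve(query, 20)
        selected = llm(f"Question: {question}\nDocs: {docs}\nSelect the key passage:")
        # 3. Record: accumulate the selected evidence for later iterations.
        evidence.append(selected)
    # 4. Summarize: produce the final answer from the recorded evidence.
    return llm(f"Question: {question}\nEvidence: {evidence}\nFinal answer:")
```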
- [2025.9.19] Our paper has been accepted by NeurIPS 2025 🎉🎉🎉!
- [2025.5.25] Our code has been released, including the main evaluation code and the training scripts.
- [2025.5.20] The first version of our paper has been released on arXiv. See our paper at this link.
- Install the necessary Python libraries by running the following commands.
conda create -n exsearch python=3.10
conda activate exsearch
pip install -r requirements.txt
# [Optional] install pytrec_eval (the mirror below can be faster than PyPI in some regions)
pip install pytrec_eval -i https://pypi.tuna.tsinghua.edu.cn/simple
# [Optional] set this environment variable when using vLLM.
VLLM_WORKER_MULTIPROC_METHOD=spawn
You can customize your own `torch`, `vllm`, and `transformers` versions based on the backbone LLM you want to use. For Qwen and Mistral, we suggest `vllm==0.6.3`, `torch==2.4.0+cu118`, and `transformers==4.45.0`.
- Set up the retrieval module. Following previous work, we use Wikipedia as our document corpus, which can be found in the DPR repo (Link). We use ColBERT as the retrieval model to pair each query with its top-20 documents. The pre-trained ColBERT checkpoint can be downloaded from either its official repo or this link. You can deploy the ColBERT retriever, or another customized retrieval model, in your local environment to pre-process the dataset.
In this project, the retrieval folder is copied directly from ColBERT. You can follow the README.md in retrieval to set up the retriever; a sketch of querying a deployed retriever is shown below.
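For reference, querying a locally deployed retriever during pre-processing could look like the sketch below. The endpoint URL, parameters, and response schema here are assumptions for illustration; consult the README.md in retrieval for the actual interface.

```python
import requests

# Hypothetical endpoint of a locally deployed retrieval service (adjust to
# however you serve ColBERT or your customized retriever).
SEARCH_URL = "http://localhost:8893/api/search"

def retrieve(query: str, top_k: int = 20) -> list[str]:
    """Return the top-k document texts for a query from the local service."""
    response = requests.get(SEARCH_URL, params={"query": query, "k": top_k})
    response.raise_for_status()
    # Assumed response schema: {"topk": [{"text": ...}, ...]}
    return [hit["text"] for hit in response.json()["topk"]]

print(retrieve("Who wrote The Master and Margarita?")[:3])
```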
- [Optional] Please log in to `wandb` if you use it to record the loss: run `wandb login` (or `wandb login --relogin` to force a relogin).
Before the iterative E&M training, we first train the LLM on a warm-up dataset, similar to the cold-start process in previous work. In this warm-up training, the LLM is trained on a small set of synthetic data to learn basic search and answer-generation patterns; one record might look like the sketch below.
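Purely for illustration, a warm-up trajectory record could take a shape like the following. The field names here are hypothetical; the actual schema is the one used in the released file ./data/hotpot-traj.100.json.

```python
# Hypothetical shape of one warm-up record (illustrative field names only;
# see ./data/hotpot-traj.100.json for the real schema).
record = {
    "question": "Which magazine was started first, Arthur's Magazine or First for Women?",
    "trajectory": [
        {"query": "Arthur's Magazine founding year",
         "evidence": "Arthur's Magazine (1844-1846) was an American literary periodical ..."},
        {"query": "First for Women founding year",
         "evidence": "First for Women is a woman's magazine ... launched in 1989 ..."},
    ],
    "answer": "Arthur's Magazine",
}
```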
PROCEDURE=sft CUDA_VISIBLE_DEVICES=0,1,2,3,4,5,6,7 nohup torchrun --nproc_per_node=8 --master_port=11021 ./src/run.py \
--model_name_or_path HUGGINGFACE_MODEL_NAME_OR_LOCAL_MODEL_CHECKPOINT \
--dataset_name_or_path WARM_UP_DATA_PATH \
--deepspeed ./src/script/ds_z3_config.json \
--output_dir ./mistral24_100 \
--overwrite_cache True \
--warmup_ratio 0.1 \
--report_to wandb \
--run_name test_run \
--logging_steps 1 \
--cutoff_len 8192 \
--max_samples 200000 \
--save_steps 1000 \
--per_device_train_batch_size 1 \
--gradient_accumulation_steps 8 \
--learning_rate 2.0e-6 \
--num_train_epochs 2 \
--lr_scheduler_type cosine \
--bf16 True \
--resume_from_checkpoint LOCAL_CHECKPOINT_PATH &

Set WARM_UP_DATA_PATH to your own data path, such as ./data/hotpot-traj.100.json. The `--resume_from_checkpoint` argument is optional; drop it when training from scratch.
mkdir ./log
PROCEDURE=inference CUDA_VISIBLE_DEVICES=0,1,2,3,4,5,6 python ./src/run.py \
--model_name_or_path HUGGINGFACE_MODEL_NAME_OR_LOCAL_MODEL_CHECKPOINT \
--input_file TRAINING_OR_EVALUATION_FILE_PATH \
--output_dir ./log \
--left 0 \
--right 10000

Set TRAINING_OR_EVALUATION_FILE_PATH to your own training or evaluation data file, such as data/eval_data/hotpotqa_dev.json.
Once you finish the above command, the LLM's output is stored in a local file. Please use this file as the OUTPUT_FILE argument below.
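To sanity-check the rollouts before the entropy step, you can peek at the file. This is a minimal sketch assuming the output is a JSON list; the exact file name and schema are produced by ./src/run.py.

```python
import json

# Load the rollout file written by the inference step (path is a placeholder).
with open("./log/OUTPUT_FILE", "r") as f:
    rollouts = json.load(f)

print(f"{len(rollouts)} rollouts loaded")
print(rollouts[0])  # inspect the first record
```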
PROCEDURE=entropy CUDA_VISIBLE_DEVICES=0,1,2,3,4,5,6,7 python ./src/run.py \
--model_name_or_path HUGGINGFACE_MODEL_NAME_OR_LOCAL_MODEL_CHECKPOINT \
--inference_file OUTPUT_FILE \
--output_dir ./log \
--epo EPO

Here, EPO denotes the current training iteration, e.g., 1 for the first iteration and 2 for the second.
PROCEDURE=align CUDA_VISIBLE_DEVICES=0,1,2,3,4,5,6,7 nohup torchrun --nproc_per_node=8 --master_port=11021 ./src/run.py \
--model_name_or_path HUGGINGFACE_MODEL_NAME_OR_LOCAL_MODEL_CHECKPOINT \
--dataset_name_or_path TRAINING_DATA \
--deepspeed ./src/script/ds_z3_config.json \
--output_dir OUTPUT_CHECKPOINT_FOLDER \
--overwrite_cache True \
--warmup_ratio 0.1 \
--report_to wandb \
--run_name test_run \
--logging_steps 1 \
--cutoff_len 8192 \
--max_samples 300000 \
--save_steps 200 \
--per_device_train_batch_size 2 \
--gradient_accumulation_steps 16 \
--learning_rate 2.0e-6 \
--num_train_epochs 2 \
--lr_scheduler_type cosine \
--bf16 True

Note that:
- Add the `--resume_from_checkpoint OUTPUT_CHECKPOINT_FOLDER` argument if training was interrupted and you want to resume it.
- You can customize arguments such as `per_device_train_batch_size`, `gradient_accumulation_steps`, and `num_train_epochs` based on your own computational resources.
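For reference, the effective batch size of the command above is `per_device_train_batch_size` × `gradient_accumulation_steps` × the number of GPUs, i.e., 2 × 16 × 8 = 256; if you change one factor, it may help to adjust the others to keep this product roughly constant.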
We sincerely thank prior work, including ColBERT, RankGPT, and Llama-Factory.
See the ./data folder for more details. We have released the annotated cold-start data; please download it!
@article{shi2025iterative,
title={Iterative self-incentivization empowers large language models as agentic searchers},
author={Shi, Zhengliang and Yan, Lingyong and Yin, Dawei and Verberne, Suzan and de Rijke, Maarten and Ren, Zhaochun},
journal={arXiv preprint arXiv:2505.20128},
year={2025}
}