If you like our project, please give us a star ⭐ on GitHub to follow the latest updates.
This work proposes ExSearch, which enables LLMs to act as search agents that iteratively seek information, select key knowledge, and record useful evidence before summarizing the final answer. Compared with previous RAG methods, the key to our agentic search is its reasoning technique. Here is a concrete example of our agentic search.
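Conceptually, the search loop can be sketched as follows. This is a minimal illustration, not the actual implementation in ./src: the `llm` and `retrieve` callables are placeholders for your own model and retriever wrappers.

```python
from typing import Callable, List

def agentic_search(
    question: str,
    llm: Callable[[str], str],                  # prompt -> completion (e.g., a vLLM wrapper)
    retrieve: Callable[[str, int], List[str]],  # (query, top_k) -> documents
    max_steps: int = 5,
) -> str:
    """Illustrative agentic search: seek, select, record, then summarize."""
    evidence: List[str] = []
    for _ in range(max_steps):
        # 1. Seek: the LLM proposes the next search query given the evidence so far.
        query = llm(f"Question: {question}\nEvidence: {evidence}\nNext query (or DONE):")
        if query.strip() == "DONE":  # the model decides it has gathered enough
            break
        # 2. Select: keep only the key knowledge from the retrieved documents.
        docs = retrieve(query, 20)
        selected = llm(f"Question: {question}\nDocs: {docs}\nSelect the key passage:")
        # 3. Record: accumulate the selected evidence for later iterations.
        evidence.append(selected)
    # 4. Summarize: produce the final answer from the recorded evidence.
    return llm(f"Question: {question}\nEvidence: {evidence}\nFinal answer:")
```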
- [2025.9.19] Our paper has been accepted by NeurIPS 2025 🎉🎉🎉!
- [2025.5.25] Our code has been released, including the main evaluation code and the training scripts.
- [2025.5.20] The first version of our paper has been released on arXiv. See our paper at this link.
- Install the necessary Python libraries by running the following commands.
conda create -n exsearch python=3.10
conda activate exsearch
pip install -r requirements.txt
# [Optional] install pytrec_eval (the mirror below can be faster than PyPI in some regions)
pip install pytrec_eval -i https://pypi.tuna.tsinghua.edu.cn/simple
# [Optional] set this environment variable when using vLLM.
VLLM_WORKER_MULTIPROC_METHOD=spawn
You can customize your own `torch`, `vllm`, and `transformers` versions based on the backbone LLM you want to use. For Qwen and Mistral, we suggest `vllm==0.6.3`, `torch==2.4.0+cu118`, and `transformers==4.45.0`.
- Set up the retrieval module. Following previous work, we use Wikipedia as our document corpus, which can be found in the DPR repo (Link). We use ColBERT as the retrieval model to pair each query with its top-20 documents. The pre-trained ColBERT checkpoint can be downloaded from either its official repo or this link. You can deploy the ColBERT retriever, or another customized retrieval model, in your local environment to pre-process the dataset.
In this project, the retrieval folder is copied directly from ColBERT. You can follow the README.md in retrieval to set up the retriever; a sketch of querying a deployed retriever is shown below.
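For reference, querying a locally deployed retriever during pre-processing could look like the sketch below. The endpoint URL, parameters, and response schema here are assumptions for illustration; consult the README.md in retrieval for the actual interface.

```python
import requests

# Hypothetical endpoint of a locally deployed retrieval service (adjust to
# however you serve ColBERT or your customized retriever).
SEARCH_URL = "http://localhost:8893/api/search"

def retrieve(query: str, top_k: int = 20) -> list[str]:
    """Return the top-k document texts for a query from the local service."""
    response = requests.get(SEARCH_URL, params={"query": query, "k": top_k})
    response.raise_for_status()
    # Assumed response schema: {"topk": [{"text": ...}, ...]}
    return [hit["text"] for hit in response.json()["topk"]]

print(retrieve("Who wrote The Master and Margarita?")[:3])
```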
- [Optional] Please log in to `wandb` if you use it to record the loss: run `wandb login` (or `wandb login --relogin` to force a relogin).
Before the iterative E&M training, we first train the LLM on a warm-up dataset, similar to the cold-start process in previous work. In this warm-up training, the LLM is trained on a small set of synthetic data to learn basic search and answer-generation patterns; one record might look like the sketch below.
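Purely for illustration, a warm-up trajectory record could take a shape like the following. The field names here are hypothetical; the actual schema is the one used in the released file ./data/hotpot-traj.100.json.

```python
# Hypothetical shape of one warm-up record (illustrative field names only;
# see ./data/hotpot-traj.100.json for the real schema).
record = {
    "question": "Which magazine was started first, Arthur's Magazine or First for Women?",
    "trajectory": [
        {"query": "Arthur's Magazine founding year",
         "evidence": "Arthur's Magazine (1844-1846) was an American literary periodical ..."},
        {"query": "First for Women founding year",
         "evidence": "First for Women is a woman's magazine ... launched in 1989 ..."},
    ],
    "answer": "Arthur's Magazine",
}
```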
PROCEDURE=sft CUDA_VISIBLE_DEVICES=0,1,2,3,4,5,6,7 nohup torchrun --nproc_per_node=8 --master_port=11021 ./src/run.py \
--model_name_or_path HUGGINGFACE_MODEL_NAME_OR_LOCAL_MODEL_CHECKPOINT \
--dataset_name_or_path WARM_UP_DATA_PATH \
--deepspeed ./src/script/ds_z3_config.json \
--output_dir ./mistral24_100 \
--overwrite_cache True \
--warmup_ratio 0.1 \
--report_to wandb \
--run_name test_run \
--logging_steps 1 \
--cutoff_len 8192 \
--max_samples 200000 \
--save_steps 1000 \
--per_device_train_batch_size 1 \
--gradient_accumulation_steps 8 \
--learning_rate 2.0e-6 \
--num_train_epochs 2 \
--lr_scheduler_type cosine \
--bf16 True \
--resume_from_checkpoint LOCAL_CHECKPOINT_PATH &

Set WARM_UP_DATA_PATH to your own data path, such as ./data/hotpot-traj.100.json. The `--resume_from_checkpoint` argument is optional; drop it when training from scratch.
mkdir ./log
PROCEDURE=inference CUDA_VISIBLE_DEVICES=0,1,2,3,4,5,6 python ./src/run.py \
--model_name_or_path HUGGINGFACE_MODEL_NAME_OR_LOCAL_MODEL_CHECKPOINT \
--input_file TRAINING_OR_EVALUATION_FILE_PATH \
--output_dir ./log \
--left 0 \
--right 10000

Set TRAINING_OR_EVALUATION_FILE_PATH to your own training or evaluation data file, such as data/eval_data/hotpotqa_dev.json.
Once you finish the above command, the LLM's output is stored in a local file. Please use this file as the OUTPUT_FILE argument below.
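To sanity-check the rollouts before the entropy step, you can peek at the file. This is a minimal sketch assuming the output is a JSON list; the exact file name and schema are produced by ./src/run.py.

```python
import json

# Load the rollout file written by the inference step (path is a placeholder).
with open("./log/OUTPUT_FILE", "r") as f:
    rollouts = json.load(f)

print(f"{len(rollouts)} rollouts loaded")
print(rollouts[0])  # inspect the first record
```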
PROCEDURE=entropy CUDA_VISIBLE_DEVICES=0,1,2,3,4,5,6,7 python ./src/run.py \
--model_name_or_path HUGGINGFACE_MODEL_NAME_OR_LOCAL_MODEL_CHECKPOINT \
--inference_file OUTPUT_FILE \
--output_dir ./log \
--epo EPO

Here, EPO denotes the current training iteration, e.g., 1 for the first iteration and 2 for the second.
PROCEDURE=align CUDA_VISIBLE_DEVICES=0,1,2,3,4,5,6,7 nohup torchrun --nproc_per_node=8 --master_port=11021 ./src/run.py \
--model_name_or_path HUGGINGFACE_MODEL_NAME_OR_LOCAL_MODEL_CHECKPOINT \
--dataset_name_or_path TRAINING_DATA \
--deepspeed ./src/script/ds_z3_config.json \
--output_dir OUTPUT_CHECKPOINT_FOLDER \
--overwrite_cache True \
--warmup_ratio 0.1 \
--report_to wandb \
--run_name test_run \
--logging_steps 1 \
--cutoff_len 8192 \
--max_samples 300000 \
--save_steps 200 \
--per_device_train_batch_size 2 \
--gradient_accumulation_steps 16 \
--learning_rate 2.0e-6 \
--num_train_epochs 2 \
--lr_scheduler_type cosine \
--bf16 True

Note that:
- Add the `--resume_from_checkpoint OUTPUT_CHECKPOINT_FOLDER` argument if training was interrupted and you want to resume it.
- You can customize arguments such as `per_device_train_batch_size`, `gradient_accumulation_steps`, and `num_train_epochs` based on your own computational resources.
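For reference, the effective batch size of the command above is `per_device_train_batch_size` × `gradient_accumulation_steps` × the number of GPUs, i.e., 2 × 16 × 8 = 256; if you change one factor, it may help to adjust the others to keep this product roughly constant.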
We sincerely thank prior work, including ColBERT, RankGPT, and Llama-Factory.
See the ./data folder for more details. We have released the annotated cold-start data; please download it!
@article{shi2025iterative,
title={Iterative self-incentivization empowers large language models as agentic searchers},
author={Shi, Zhengliang and Yan, Lingyong and Yin, Dawei and Verberne, Suzan and de Rijke, Maarten and Ren, Zhaochun},
journal={arXiv preprint arXiv:2505.20128},
year={2025}
}