Reorder #143

Open
wants to merge 35 commits into
base: main

@XKTZ (Contributor) commented Sep 15, 2024

Pull Request Checklist

Reference Issue

This PR is a superset of the Top Down issue. It reorganizes the various reordering methods, including sliding window, top down, and ListT5's tournament sort. A reorderer can now be specified on the command line with an option such as --reorder_policy="top_down:{\"top_k\": 10, \"pivot\": ${PIVOT}, \"shuffle\": true, \"r\": 1}".
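To illustrate the shape of the option value, here is a minimal sketch of how a `name:{json}` spec could be split into a policy name and its parameters. The helper name `parse_reorder_policy` is hypothetical and is not taken from this PR; the actual parsing in the PR may differ.

```python
import json
from typing import Any, Dict, Tuple


def parse_reorder_policy(spec: str) -> Tuple[str, Dict[str, Any]]:
    """Split a 'name:{json}' spec into (name, params).

    Hypothetical helper for illustration only; not the PR's implementation.
    """
    # Split on the first colon only, since the JSON part contains colons too.
    name, sep, raw = spec.partition(":")
    name = name.strip()
    if not name:
        raise ValueError(f"missing policy name in {spec!r}")
    # A bare name such as 'sliding_window' means "no extra parameters".
    params = json.loads(raw) if sep and raw.strip() else {}
    if not isinstance(params, dict):
        raise ValueError(f"policy parameters must be a JSON object in {spec!r}")
    return name, params


name, params = parse_reorder_policy(
    'top_down:{"top_k": 10, "pivot": 11, "shuffle": true, "r": 1}'
)
print(name, params)
```

Note that the backslash-escaped quotes in the shell command are consumed by the shell, so the program receives plain JSON after the policy name.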

ref:

Checklist Items

Before submitting your pull request, please review these items:

  • Have you followed the contributing guidelines?
  • Have you verified that there are no existing Pull Requests for the same update/change?
  • Have you updated any relevant documentation or added new tests where needed?

PR Type

What kind of change does this PR introduce?

  • Bugfix
  • Feature
  • Code style update (formatting, local variables)
  • Refactoring (no functional changes, no api changes)
  • Documentation content changes
  • Other...
    • Description:

Reproduce

Here is a small shell script to help reproduce the functionality:

DATASETS="dl19 dl20"
WINDOW_SIZE="20"

for dataset in $DATASETS; do
  for window in $WINDOW_SIZE; do

    # The pivot depends on the window size.
    if [[ $window == "20" ]]; then
      PIVOT=11
    else
      PIVOT=9
    fi

    # Top-down reordering
    python src/rank_llm/scripts/run_rank_llm.py --model_path=castorini/rank_zephyr_7b_v1_full \
      --top_k_candidates=100 --dataset=${dataset} --retrieval_method=SPLADE++_EnsembleDistil_ONNX \
      --prompt_mode=rank_GPT --context_size=4096 --vllm_batched --batch_size=12 \
      --variable_passages --reorder_policy="top_down:{\"top_k\": 10, \"pivot\": ${PIVOT}}" \
      --window_size=${window}

    # Sliding window reordering
    #python src/rank_llm/scripts/run_rank_llm.py --model_path=castorini/rank_zephyr_7b_v1_full \
    #  --top_k_candidates=100 --dataset=${dataset} --retrieval_method=SPLADE++_EnsembleDistil_ONNX \
    #  --prompt_mode=rank_GPT --context_size=4096 --vllm_batched --batch_size=12 \
    #  --variable_passages --reorder_policy="sliding_window:{\"step\": 10}" \
    #  --window_size=${window}

    # Tournament sort reordering
    #python src/rank_llm/scripts/run_rank_llm.py --model_path=castorini/rank_zephyr_7b_v1_full \
    #  --top_k_candidates=100 --dataset=${dataset} --retrieval_method=SPLADE++_EnsembleDistil_ONNX \
    #  --prompt_mode=rank_GPT --context_size=4096 --vllm_batched --batch_size=12 \
    #  --variable_passages --reorder_policy="tournament_sort:{\"step\": 10, \"r\": 1}" \
    #  --window_size=${window}

  done
done

File changes

rank_fid.py, rank_gpt.py, rank_listwise_os_llm.py: these now call listwise_rankllm.py's rerank_batch function directly.

listwise_rankllm.py: rerank_batch deprecates the original method. It uses ModelFunction to capture the methods needed for reranking and passes them to the reorder policies, which do the reordering.

reorder_policy.py: the various reorder policies

top_down/tournament...: the implementations of the different policies

xxx_reranker: adds a reorder policy parameter, which defaults to sliding window

rankllm.py: create_prompt now depends on the selected indices instead of a range

reranker.py, run_rank_llm.py: parameter changes
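As a rough sketch of the idea of passing model behavior into a reorder policy, the snippet below implements a generic back-to-front sliding window over candidate indices. The `RankWindow` callable is a hypothetical stand-in for the role ModelFunction plays here; its real interface is not shown in this description, and `sliding_window_reorder` is an illustrative name, not the PR's code.

```python
from typing import Callable, List

# Hypothetical stand-in for the model hook: takes candidate indices and
# returns the same indices reordered, best first.
RankWindow = Callable[[List[int]], List[int]]


def sliding_window_reorder(
    n_items: int, rank_window: RankWindow, window_size: int = 20, step: int = 10
) -> List[int]:
    """Rerank indices 0..n_items-1 with a window that slides from back to front."""
    if window_size <= 0 or step <= 0:
        raise ValueError("window_size and step must be positive")
    order = list(range(n_items))
    end = n_items
    while True:
        start = max(0, end - window_size)
        # Rerank only the current window; items outside it keep their place.
        order[start:end] = rank_window(order[start:end])
        if start == 0:
            break
        # Move the window toward the front by `step` positions.
        end -= step
    return order


# Toy "model": a higher score means more relevant.
scores = [1, 2, 3, 4, 5, 6, 7, 8]


def by_score(window: List[int]) -> List[int]:
    return sorted(window, key=lambda i: scores[i], reverse=True)


order = sliding_window_reorder(len(scores), by_score, window_size=4, step=2)
print(order)
```

In this sketch the window size is how many candidates the model sees at once, while the step is how far the window moves between calls, so each pass carries the best `window_size - step` items forward.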

README.md Outdated
Comment on lines 206 to 207


Member

Remove

Comment on lines 167 to 171
     "--step_size",
     type=int,
-    default=10,
-    help="step size for the sliding window approach",
+    default=20,
+    help="window size for the LLM",
 )
Member

step and window are different?
