
Commit

update iterative_helper
Xueqing Wu committed Oct 3, 2024
1 parent b167875 commit ec90fe5
Showing 3 changed files with 64 additions and 0 deletions.
24 changes: 24 additions & 0 deletions README.md
@@ -132,6 +132,30 @@ python infer_refine.py critic-infer.csv VDebugger/VDebugger-refiner-generalist-1
```
Then you can execute the programs in `critic-refine-infer.csv` as in step 2 of [Generation and Execution of Visual Programs](https://github.com/shirley-wu/vdebugger/tree/main?tab=readme-ov-file#generation-and-execution-of-visual-programs)

## Run VDebugger for Multiple Iterations

To run VDebugger for `T` iterations (`T` > 1), first generate the initial programs and collect their execution feedback as in steps 1 and 3 of [Generation and Execution of Visual Programs](https://github.com/shirley-wu/vdebugger/tree/main?tab=readme-ov-file#generation-and-execution-of-visual-programs). Then repeat the steps below `T` times:
1. Infer critic, as in [Inference of VDebugger](https://github.com/shirley-wu/vdebugger/tree/main?tab=readme-ov-file#inference-of-vdebugger);
2. Infer refiner, as in [Inference of VDebugger](https://github.com/shirley-wu/vdebugger/tree/main?tab=readme-ov-file#inference-of-vdebugger);
3. Collect execution feedback for the new programs generated by refiner, as in step 2 in [Generation and Execution of Visual Programs](https://github.com/shirley-wu/vdebugger/tree/main?tab=readme-ov-file#generation-and-execution-of-visual-programs). The next iteration will be run on top of the feedback collected in this step.

After `T` iterations, evaluate the final programs as in step 2 of [Generation and Execution of Visual Programs](https://github.com/shirley-wu/vdebugger/tree/main?tab=readme-ov-file#generation-and-execution-of-visual-programs).
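The iteration structure above can be sketched in Python with placeholder functions. The function names and file names below are hypothetical stand-ins for the linked commands, not part of this repository:

```python
calls = []  # records which step ran, for illustration only

def infer_critic(feedback_csv):    # step 1: critic inference (see Inference of VDebugger)
    calls.append("critic")
    return "critic-infer.csv"

def infer_refiner(critic_csv):     # step 2: refiner inference (see Inference of VDebugger)
    calls.append("refiner")
    return "critic-refine-infer.csv"

def collect_feedback(program_csv): # step 3: execute programs and collect feedback
    calls.append("execute")
    return "feedback.csv"

T = 2  # number of debugging iterations
# Initial generation + execution happens once, before the loop.
feedback = collect_feedback("initial-programs.csv")
for _ in range(T):
    critic_out = infer_critic(feedback)
    programs = infer_refiner(critic_out)
    feedback = collect_feedback(programs)  # the next iteration runs on this feedback
# Finally, evaluate the programs in `programs` as described above.
```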

Since the major computational overhead comes from program execution (i.e., step 3 of each iteration), you can use the helper scripts `remove_dup.py` and `merge_csv.py` in [vdebugger/iterative_helper](vdebugger/iterative_helper) to reduce redundant execution:
* Before step 3 of each iteration, remove the programs that are unchanged from the last iteration by executing:
```bash
python remove_dup.py PROGRAM_CSV_FROM_LAST_ITERATION PROGRAM_CSV_FOR_THIS_ITERATION
```
which will produce a deduplicated file `PROGRAM_CSV_FOR_THIS_ITERATION_DEDUP` (the script appends a `.remove-dup.csv` suffix).
* Then collect execution feedback for the resulting `PROGRAM_CSV_FOR_THIS_ITERATION_DEDUP`.
* After collecting that feedback, merge the execution feedback from the last iteration with the current iteration's:
```bash
python merge_csv.py EXECUTION_FEEDBACK_FROM_LAST_ITERATION EXECUTION_FEEDBACK_FOR_THIS_ITERATION_DEDUP
```
which will produce a file `EXECUTION_FEEDBACK_FOR_THIS_ITERATION_MERGED` (the script appends a `.merged.csv` suffix) containing the merged execution results. Use these merged results for the next iteration.

Some computation is still repeated within steps 1 and 2 of each iteration, but that overhead is tolerable. If it concerns you, you can modify the scripts yourself to avoid it.
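The sentinel mechanism the two scripts share can be illustrated on toy data. The frames and values below are made up for demonstration (`'['` is the marker the scripts use, presumably because a bare `'['` can never be a valid program):

```python
import pandas as pd

# Toy frames standing in for the program/feedback CSVs of two iterations.
orig = pd.DataFrame({
    "code": ["print(1)", "print(2)", "print(3)"],
    "result": ["1", "2", "3"],
})
new = pd.DataFrame({
    "code": ["print(1)", "query(x)", "print(3)"],  # only row 1 changed
    "result": [None, None, None],
})

# remove_dup.py: mark rows whose code is unchanged with the '[' sentinel,
# so only genuinely new programs need to be executed.
dup = new["code"] == orig["code"]
new.loc[dup, "code"] = "["
assert (new["code"] != "[").sum() == 1  # one program left to execute

# ... execute only the non-sentinel rows and record their results ...
new.loc[new["code"] != "[", "result"] = "42"

# merge_csv.py: copy the previous iteration's result and code back
# into the sentinel rows, restoring a complete feedback file.
sentinel = new["code"] == "["
new.loc[sentinel, "result"] = orig.loc[sentinel, "result"]
new.loc[sentinel, "code"] = orig.loc[sentinel, "code"]
```

Note that both scripts compare rows positionally, so the two CSVs must list the same queries in the same order.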

## Training of VDebugger

If you want to reproduce our training of VDebugger, please use `vdebugger/training_scripts/train_{critic, refiner}.sh`. You will need to install `deepspeed==0.14.0`.
21 changes: 21 additions & 0 deletions vdebugger/iterative_helper/merge_csv.py
@@ -0,0 +1,21 @@
import argparse
import os

import pandas as pd

parser = argparse.ArgumentParser()
parser.add_argument('orig', help="execution feedback CSV from the last iteration")
parser.add_argument('new', help="deduplicated feedback CSV from this iteration")
args = parser.parse_args()

orig = pd.read_csv(args.orig)
new = pd.read_csv(args.new)

# remove_dup.py replaced unchanged programs with the '[' sentinel;
# restore their code and execution results from the previous iteration.
is_dup = new['code'] == '['
new.loc[is_dup, 'result'] = orig.loc[is_dup, 'result']
if 'traced' in new:
    new.loc[is_dup, 'traced'] = orig.loc[is_dup, 'traced']
new.loc[is_dup, 'code'] = orig.loc[is_dup, 'code']

out_fname = args.new.replace('.csv', '.merged.csv')
print("Dump to", out_fname)
assert not os.path.exists(out_fname), "File {} exists".format(out_fname)
new.to_csv(out_fname)
19 changes: 19 additions & 0 deletions vdebugger/iterative_helper/remove_dup.py
@@ -0,0 +1,19 @@
import argparse
import os

import pandas as pd

parser = argparse.ArgumentParser()
parser.add_argument('orig', help="program CSV from the last iteration")
parser.add_argument('new', help="program CSV for this iteration")
args = parser.parse_args()

orig = pd.read_csv(args.orig)
new = pd.read_csv(args.new)

# Mark programs that are unchanged since the last iteration with the '['
# sentinel so they are skipped at execution time; merge_csv.py restores them.
new.loc[new['code'] == orig['code'], 'code'] = '['
print("Remaining valid code", (new['code'] != '[').sum())

out_fname = args.new.replace('.csv', '.remove-dup.csv')
print("Dump to", out_fname)
assert not os.path.exists(out_fname), "File {} exists".format(out_fname)
new.to_csv(out_fname)
