
results file for datasets #12

Open
alomrani opened this issue Jan 10, 2024 · 8 comments
@alomrani

Hi there,

After running your code, I am getting 67 EM for WebQSP with gpt-3.5-turbo, compared to the 76 EM reported in the paper. I was wondering if you could share your results file for comparison.

Thanks,
Mohammad

@liyichen-cly

Hi,

I am experiencing a similar issue, with my results hovering around 0.69 for WebQSP and 0.37 for CWQ. I would greatly appreciate it if the authors could provide some insight into the challenges of reproducing the results.

Best regards,
Liyi

@zh-qifan commented Feb 22, 2024

Hi,

I am also facing the same issue. I ran the experiment for CWQ twice and got around 37% accuracy for gpt-3.5, compared to the 57.1% mentioned in the paper. Could you please provide some suggestions for reproducing the results of the paper?

Best,
Qifan

@GasolSun36 (Collaborator)

Hi,
Sorry for the late reply. We did not save our previous results, but here are some tips for reproducing the numbers in the paper:

  1. The current version of the eval.py file has some problems; we will fix them as soon as possible.
  2. The ChatGPT model we used is gpt-3.5-turbo-0613, so performance may fluctuate slightly with the current, updated model.
  3. The CWQ test set is a file with aliases that we built; it will be uploaded later.
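For anyone comparing scores locally while the official eval.py is being fixed, here is a minimal sketch of what alias-aware exact-match (EM) scoring typically looks like. The function names and the normalize-then-compare scheme are illustrative assumptions, not the repo's actual code:

```python
# Hypothetical sketch of alias-aware exact-match (EM) scoring.
# normalize() and em_score() are illustrative names, not the repo's eval.py.
import string


def normalize(text: str) -> str:
    """Lowercase, strip punctuation, and collapse whitespace."""
    text = text.lower()
    text = "".join(ch for ch in text if ch not in string.punctuation)
    return " ".join(text.split())


def em_score(prediction: str, gold_answers: list[str]) -> int:
    """Return 1 if the prediction matches any gold answer or alias, else 0."""
    pred = normalize(prediction)
    return int(any(pred == normalize(ans) for ans in gold_answers))


# Example: "NYC" only counts as correct if the alias list includes it,
# which is why a missing alias file can depress EM by several points.
print(em_score("New York City", ["New York City", "NYC"]))  # 1
print(em_score("nyc", ["New York City", "NYC"]))            # 1
print(em_score("New York", ["New York City", "NYC"]))       # 0
```

This illustrates why tip 3 matters: without the alias file, predictions that are semantically correct but use a variant surface form score 0.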

@liyichen-cly

Thank you very much for your reply! I have already corrected the retrieval code and adjusted the version of ChatGPT. However, my experimental results did not improve much and are similar to the previous ones. I hope the alias file can be provided and the eval file corrected as soon as possible so the results can be reproduced.

Best,
Liyi

@willer-lu

> Hi, Sorry for the late reply, we did not save the previous results, but here are some tips to reproduce the results of the paper: […]

Which version of GPT-4 did you use?

@GasolSun36 (Collaborator)

> Which version of GPT-4 did you use?

Hi,

We used gpt-4-0613 for all the experiment settings.
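As a side note on reproducibility, the dated snapshot matters: the bare `gpt-3.5-turbo` and `gpt-4` aliases are periodically re-pointed to newer models, so scores drift over time. A minimal sketch of pinning the snapshot in a chat-completions request payload follows; the helper name and the temperature choice are assumptions, not necessarily the authors' exact settings:

```python
# Hypothetical sketch: pin a dated model snapshot so results do not drift
# as the "gpt-3.5-turbo" / "gpt-4" aliases are updated over time.
# build_request() and temperature=0 are illustrative choices.
def build_request(prompt: str, model: str = "gpt-4-0613") -> dict:
    """Build a chat-completions payload with a pinned model snapshot."""
    return {
        "model": model,  # dated snapshot, not the moving alias
        "messages": [{"role": "user", "content": prompt}],
        "temperature": 0,  # reduce run-to-run variance
    }


payload = build_request("Who founded SpaceX?")
print(payload["model"])  # gpt-4-0613
```

Note that even a pinned snapshot can be deprecated and retired, so recording the full payload alongside results is the safest way to document an experiment.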

@yindahu87

> Thank you very much for your reply! I have already corrected the retrieval code and adjusted the version of ChatGPT. However, my experimental results did not improve much and are similar to the previous ones. I hope the alias file can be provided and the eval file corrected as soon as possible so the results can be reproduced.
>
> Best, Liyi

Hello, I have run into some difficulties while reproducing the code. Could you give me some pointers? Thanks.

@youngsasa2021

> Hello, I have run into some difficulties while reproducing the code. Could you give me some pointers? Thanks.

Hello, I have also run into some problems during reproduction. Could we discuss them together? Many thanks!
