results file for datasets #12
Comments
Hi, I am experiencing a similar issue, with my results hovering around 0.69 for WebQSP and 0.37 for CWQ. I would greatly appreciate it if the authors could provide some insight into the challenges of reproducing the results. Best regards,
Hi, I am also facing the same issue. I ran the experiment for CWQ twice and got around 37% accuracy with gpt-3.5, compared to the 57.1% reported in the paper. Could you please provide some suggestions for reproducing the paper's results? Best,
Hi,
Thank you very much for your reply! I have already corrected the retrieval code and adjusted the ChatGPT version. However, my experimental results did not improve much and are similar to the previous ones. I hope the alias file can be provided and the eval file corrected as soon as possible so the results can be reproduced. Best,
Which version of GPT-4 did you use?
We used gpt-4-0613 for all experiment settings.
Hi, I have run into some difficulties while reproducing the code. Could you give me some pointers? Thanks.
Hi, I have also run into some problems during reproduction. Could we discuss them together? Thanks a lot.
Hi there,
After running your code, I am getting 67 EM on WebQSP with gpt-3.5-turbo, compared to the 76 EM reported in the paper. I was wondering if you could share your results file for comparison.
Thanks,
Mohammad