Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Request for Scored Output Files from Algorithm Execution #292

Open
ruixing76 opened this issue Dec 8, 2024 · 2 comments
Open

Request for Scored Output Files from Algorithm Execution #292

ruixing76 opened this issue Dec 8, 2024 · 2 comments

Comments

@ruixing76
Copy link

ruixing76 commented Dec 8, 2024

Is your feature request related to a problem? Please describe.

I need aggregated results so I can analyze helpfulness on Notes level. I think the only way so far is to run the algorithm from scratch so I’m reproducing results using the downloaded data (notes, ratings, notes history, and user enrollment) on a 64-core Intel(R) Xeon(R) Gold 6448H CPU with 500GB memory (correct me if I am wrong). However, after 20 hours, the pre-scoring phase still hasn’t completed. It looks like it won't finish within one day which stops me from working on further analysis.

Since the algorithm runs every hour or so on the server, may I know:

  1. would it be possible to share the output files (scored_notes.tsv, helpfulness_scores.tsv, note_status_history.tsv, and aux_note_info.tsv)?
  2. and the hardware requirement and expected running time if I want to generate aggregated scores for notes myself?

This would greatly help for research analysis, as running the algorithm locally to aggregate helpfulness scores has been quite challenging.

Describe the solution you'd like
Would it be possible to share the output files (scored_notes.tsv, helpfulness_scores.tsv, note_status_history.tsv, and aux_note_info.tsv)? They don’t need to be the latest versions—files aligned with the current download page would be fine.

Describe alternatives you've considered
It would be nice to share the hardware requirement and expected running time if I want to generate aggregated scores for notes from scratch, or any intermediate process.

Additional context
Thank you so much for your contribution on this amazing project! I am a PhD student working on fact-checking in Natural Language Processing and I am very happy to explore and contribute more. I am actively working on this and any help in above questions would be much appreciated!

@ashilgard
Copy link

hi - it's not surprising that the job might take that long when run sequentially. since you seem not to be resource-bound, you could try running with the parallel flag set to True. Let us know if that helps!

@ruixing76
Copy link
Author

hi - it's not surprising that the job might take that long when run sequentially. since you seem not to be resource-bound, you could try running with the parallel flag set to True. Let us know if that helps!

Hi @ashilgard, many thanks for your reply, I will try that! Actually I do have resource limitations and normally we don't have that much CPU and memory (64G at most), I queued for a very long time to run the algorithm It would be great if it's possible to share results and descriptions of the output formats.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Development

No branches or pull requests

2 participants