Request for Scored Output Files from Algorithm Execution #292

ruixing76 · 2024-12-08T19:27:34Z

Is your feature request related to a problem? Please describe.

I need aggregated results so I can analyze helpfulness at the note level. As far as I can tell, the only way to get them right now is to run the algorithm from scratch (correct me if I am wrong), so I’m reproducing the results from the downloaded data (notes, ratings, note status history, and user enrollment) on a 64-core Intel(R) Xeon(R) Gold 6448H CPU with 500GB memory. However, after 20 hours, the pre-scoring phase still hasn’t completed. It looks like it won't finish within one day, which blocks my further analysis.

Since the algorithm runs every hour or so on the server, may I ask:

Would it be possible to share the output files (scored_notes.tsv, helpfulness_scores.tsv, note_status_history.tsv, and aux_note_info.tsv)?
What are the hardware requirements and the expected running time if I want to generate aggregated scores for notes myself?

This would greatly help research analysis, as running the algorithm locally to aggregate helpfulness scores has been quite challenging.

Describe the solution you'd like
Would it be possible to share the output files (scored_notes.tsv, helpfulness_scores.tsv, note_status_history.tsv, and aux_note_info.tsv)? They don’t need to be the latest versions; files aligned with the current download page would be fine.

Describe alternatives you've considered
Alternatively, it would help to know the hardware requirements and the expected running time for generating aggregated note scores from scratch, or for any intermediate step.

Additional context
Thank you so much for your contributions to this amazing project! I am a PhD student working on fact-checking in Natural Language Processing, and I am very happy to explore and contribute more. I am actively working on this, and any help with the above questions would be much appreciated!

ashilgard · 2024-12-11T22:02:44Z

hi - it's not surprising that the job might take that long when run sequentially. since you seem not to be resource-bound, you could try running with the parallel flag set to True. Let us know if that helps!

ruixing76 · 2024-12-12T07:19:02Z

> hi - it's not surprising that the job might take that long when run sequentially. since you seem not to be resource-bound, you could try running with the parallel flag set to True. Let us know if that helps!

Hi @ashilgard, many thanks for your reply, I will try that! Actually, I do have resource limitations: normally we don't have that much CPU and memory (64 GB at most), and I had to queue for a very long time to run the algorithm. It would be great if you could share the results and descriptions of the output formats.
