We fetched the citations once and saved the results in the citations.txt file. Now there are a few new citations on Google Scholar, and we don't want to run the program from the start again. Ideally, we can retrieve the citations incrementally. Moreover, new citations may appear while a crawl is still in progress, and we don't want those to mess things up.
#23 is a good start for supporting this idea, as it added Google Scholar id tags to the BibTeX items.
Yes, I can give it a try this week. Have a couple of deadlines coming up but I can spend some time on this. I will try to minimize structural change.
One thing I will check: maybe we can first gather all of the citation_ids by looping over the HTML pages, then do a diff against the citation_ids from the bib file, and then loop over the remaining citation_ids?
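A minimal sketch of that diff step, under stated assumptions. The thread does not say what the id tag added in #23 is called, so the field name `gsid` is hypothetical, as are the helper names `ids_from_bib` and `remaining_ids`. Gathering the ids from the HTML pages is left out, and `page_ids` stands in for that result.

```python
import re

# Hypothetical field name: the thread only says that #23 added Google Scholar
# id tags to the BibTeX items, not what the tag is called.
GSID_FIELD = re.compile(r'gsid\s*=\s*[{"]([^}"]+)[}"]', re.IGNORECASE)


def ids_from_bib(bib_text):
    """Return the set of Google Scholar ids already saved in the bib text."""
    return set(GSID_FIELD.findall(bib_text))


def remaining_ids(page_ids, bib_text):
    """Return ids seen on the HTML pages that are not yet in the bib file.

    Order is preserved and ids repeated across pages are returned only once.
    """
    seen = ids_from_bib(bib_text)
    todo = []
    for cid in page_ids:
        if cid not in seen:
            seen.add(cid)
            todo.append(cid)
    return todo
```

The second loop would then visit only the ids in `todo`, so entries that are already saved cost no extra requests.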
@yilihong No hurry. This is a non-profit program anyway. I appreciate your contribution very much.
I think the basic idea of looping over the HTML pages should work just fine. The logic is clear. My minor concern is that we have to sleep 100 seconds between requests, so we make no real progress (downloading the BibTeX pages and/or PDF files) while we're looping over the HTML pages to gather all the citation_ids. If we get blocked during this period, we will have wasted the chance to at least get something. This period is about 1/10 of the total running time. This is not a deal breaker, though.
Building a set of gscholar ids from the citation.bib file and checking each citation on the fly seems like a good option from this perspective.
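The on-the-fly variant could look roughly like the sketch below. Everything here is illustrative rather than the program's actual code: `crawl_incrementally`, `pages`, and `fetch` are hypothetical names, `pages` is assumed to yield one list of citation ids per results page, and `fetch` stands in for whatever downloads the BibTeX entry and/or PDF.

```python
def crawl_incrementally(pages, known_ids, fetch):
    """Download only citations whose ids are not in known_ids.

    pages:     iterable of lists of citation ids, one list per results page
               (hypothetical stand-in for parsing the HTML pages).
    known_ids: set of ids already present in the bib file; updated in place.
    fetch:     callable that downloads one citation (hypothetical).
    """
    fetched = []
    for page in pages:
        for cid in page:
            if cid in known_ids:
                # Already saved, or already downloaded earlier in this run.
                continue
            fetch(cid)
            known_ids.add(cid)
            fetched.append(cid)
    return fetched
```

Because each unseen citation is downloaded as soon as it is encountered, a block partway through still leaves the earlier downloads in place. Adding each id to `known_ids` right after fetching also means an entry that shows up again on a later page, for example when new citations shift the listing during the crawl, is not downloaded twice.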