
Retrieve the updated citations #19

Open
liuml07 opened this issue May 28, 2016 · 3 comments

liuml07 (Member) commented May 28, 2016

We retrieved the citations once and saved the results in the citations.txt file. Since then, a few new citations have appeared on Google Scholar. We don't want to re-run the program from scratch; ideally, we should retrieve the citations incrementally. Moreover, new citations may also appear during continuous crawling, and we don't want to mess that up.

#23 is a good start toward supporting this idea, as it added Google Scholar id tags to BibTeX items.

liuml07 (Member, Author) commented Sep 19, 2016

As the discussions in #23 are going well, I think this issue is in good shape to work on. Would you like to take it, @yilihong? Thanks.

@liuml07 liuml07 assigned yilihong and shiqiezi and unassigned shiqiezi Sep 19, 2016
yilihong (Collaborator) commented Sep 19, 2016

@liuml07

Yes, I can give it a try this week. I have a couple of deadlines coming up, but I can spend some time on this. I will try to minimize structural change.

One thing I will check: maybe we can first gather all of the citation_ids by looping over the HTML pages, then do a diff with the citation_ids from the bib file, and finally loop over only the remaining citation_ids?

Any thoughts?
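The gather-then-diff idea above could be sketched roughly as follows. This is a minimal illustration, not the project's actual code: the helper names, the `cites=` query parameter, and the `gscholar_id` BibTeX field format are all assumptions about how the pages and bib file might be laid out.

```python
# Hypothetical sketch: collect every citation_id from the saved Scholar HTML
# pages, subtract the ids already recorded in the bib file, and return only
# the ids that still need crawling.
import re

def ids_from_html(html: str) -> set[str]:
    # Assumption: each citation link carries its Scholar id in a
    # 'cites=...' query parameter; the real markup may differ.
    return set(re.findall(r"cites=(\d+)", html))

def ids_from_bib(bibtex: str) -> set[str]:
    # Assumption: the tag added in #23 looks like 'gscholar_id = {...}'.
    return set(re.findall(r"gscholar_id\s*=\s*\{(\w+)\}", bibtex))

def new_ids(html_pages: list[str], bibtex: str) -> set[str]:
    found: set[str] = set()
    for page in html_pages:
        found |= ids_from_html(page)
    # Diff against what the bib file already contains.
    return found - ids_from_bib(bibtex)
```

For example, if the pages contain ids 111 and 222 and the bib file already records 111, `new_ids` would return only `{"222"}`, and the crawler would fetch just that one.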

liuml07 (Member, Author) commented Sep 19, 2016

@yilihong No hurry. This is a non-profit program anyway. I appreciate your contribution very much.

I think the basic idea of looping over the HTML pages should work just fine; the logic is clear. My minor concern is that we have to sleep 100 seconds between requests, so while we're looping over the HTML pages to gather all of the citation_ids, we make no real progress (downloading the BibTeX pages and/or PDF files). If we get blocked somehow during this period, we will have wasted the chance to at least get something. This period is about 1/10 of the total running time, so it's not a deal breaker.

From this perspective, building a set of gscholar ids from the citations.bib file and checking each citation on the fly doesn't seem bad either.
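The on-the-fly alternative could look something like the sketch below: build the set of known ids from the bib file once, then skip already-seen citations during the same pass that discovers them, so downloads happen throughout the run instead of only after a full listing pass. The function and parameter names are hypothetical, and `fetch` stands in for whatever download-plus-sleep step the crawler actually performs.

```python
from typing import Callable, Iterable, Iterator

def crawl_incrementally(
    citation_ids: Iterable[str],
    known_ids: set[str],
    fetch: Callable[[str], str],
) -> Iterator[tuple[str, str]]:
    """Yield (id, result) only for citations not already in the bib file.

    Ids already recorded are skipped immediately, so no request (and no
    100-second sleep) is spent on them, and real progress is made from
    the first new citation onward.
    """
    for cid in citation_ids:
        if cid in known_ids:
            continue  # already recorded in citations.bib; skip
        yield cid, fetch(cid)  # hypothetical download step
```

Used with a stub fetch, `crawl_incrementally(["a1", "b2"], {"a1"}, lambda cid: "bib:" + cid)` would download only for `"b2"`, which matches the "check each citation on the fly" idea.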
