
Co-occurrence null model - beginner issue #75

Open
cegboy opened this issue Aug 22, 2018 · 12 comments

Comments


cegboy commented Aug 22, 2018

Hi,
I am a complete beginner with R (as I've mentioned to Nick Gotelli already via email) and am struggling to run the co-occurrence null model.
I seem to have managed to run it, but I would appreciate it if someone could check what I am doing.

I used the following to load the file and run the model:

# Run the null model
test <- cooc_null_model(read.csv(choose.files()), algo = "sim9", nReps = 10000, burn_in = 500)

# Summary and plot info
summary(test)
plot(test, type = "burn_in")
plot(test, type = "hist")
plot(test, type = "cooc")

The file I used is attached, as well as the plots.
I got this summary:
Time Stamp: Wed Aug 22 21:10:11 2018
Reproducible:
Number of Replications:
Elapsed Time: 0.79 secs
Metric: c_score
Algorithm: sim9
Observed Index: 2.0895
Mean Of Simulated Index: 1.9061
Variance Of Simulated Index: 0.0033681
Lower 95% (1-tail): 1.8263
Upper 95% (1-tail): 2.0105
Lower 95% (2-tail): 1.8158
Upper 95% (2-tail): 2.0474
Lower-tail P = 0.994
Upper-tail P = 0.0079
Observed metric > 9921 simulated metrics
Observed metric < 60 simulated metrics
Observed metric = 19 simulated metrics
Standardized Effect Size (SES): 3.1605
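As a sanity check, the reported SES can be reproduced from the printed summary values: it is simply the observed index minus the mean of the simulated indices, divided by their standard deviation. A quick check in R using the numbers above (small differences come from rounding of the printed values):

```r
# Reproduce the reported SES from the printed summary values
obs      <- 2.0895     # Observed Index
sim_mean <- 1.9061     # Mean Of Simulated Index
sim_var  <- 0.0033681  # Variance Of Simulated Index

ses <- (obs - sim_mean) / sqrt(sim_var)
round(ses, 2)  # 3.16
```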

Does this seem alright? Also, is there a preference for .csv over .txt?
test iteration graph
test hist
test marcelo cooc
test Marcelo.txt

Many thanks
Marcelo


cegboy commented Aug 22, 2018

Hi,
further to my previous post, I ventured to run the full 247 samples x 452 spp dataset and got these results.
Time Stamp: Wed Aug 22 21:25:09 2018
Reproducible:
Number of Replications:
Elapsed Time: 2.5 mins
Metric: c_score
Algorithm: sim9
Observed Index: 475.34
Mean Of Simulated Index: 403.97
Variance Of Simulated Index: 28.796
Lower 95% (1-tail): 401.41
Upper 95% (1-tail): 416.23
Lower 95% (2-tail): 401.35
Upper 95% (2-tail): 422.74
Lower-tail P > 0.9999
Upper-tail P < 1e-04
Observed metric > 10000 simulated metrics
Observed metric < 0 simulated metrics
Observed metric = 0 simulated metrics
Standardized Effect Size (SES): 13.299
full data iteration graph
full data hist
full data cooc graph

Any thoughts?
Cheers
Marcelo

@cegboy changed the title from "Co-occurrence null model - beginner issue with Error in speciesData[2, 1] : subscript out of bounds" to "Co-occurrence null model - beginner issue" on Aug 22, 2018

cegboy commented Aug 24, 2018

Hi all, any thoughts on this? Especially the black squares?
cheers
Marcelo


emhart commented Aug 24, 2018

Hi @cegboy

I took a look at your example and ran it, and it all looks good. One comment: your file is tab-separated, not comma-separated (an easy fix; maybe it happened during upload). Can you share the other file you're using? It's hard to dig in without seeing the data, but my first (and possibly wrong) guess is that you're not using a long enough burn-in with the bigger dataset. I'm happy to check it out for you.
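The tab-versus-comma point is easy to demonstrate in base R: `read.delim()` (or `read.csv(..., sep = "\t")`) parses tab-separated files, while plain `read.csv()` collapses each row into a single field. A minimal self-contained illustration:

```r
# Demonstrate why a tab-separated file breaks read.csv():
# write a tiny tab-separated table to a temp file, then read it both ways.
tf <- tempfile(fileext = ".txt")
writeLines(c("sp\tsite1\tsite2",
             "A\t1\t0",
             "B\t0\t1"), tf)

ncol(read.csv(tf))    # 1 -- no commas found, each row lands in one column
ncol(read.delim(tf))  # 3 -- tabs parsed correctly
```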


cegboy commented Aug 24, 2018

Hi @emhart, thanks. Yes, I have since changed to .csv. Here is the big test data I am using: Test data EcoSim R.zip. I did try to increase the burn-in to 1000, I think.
Thank you very much for your help.
Marcelo
(BTW, "cegboy" used to be a call sign I used on Atari..)


cegboy commented Aug 24, 2018

Hi Edmund @emhart, I ran the same full 247 samples x 452 spp dataset, this time with no species or site names, just 1, 2, 3, etc. for species and site1, site2, etc. for sites. I used nReps = 10000 and burn_in = 100000.
The black squares were more a question of resolution to fix. I got these results now; plots attached.
Does this look better? Also, is there any literature I can use besides the R documentation to understand and interpret the results?
Thanks
Marcelo
full 247 samples x 452 spp no names hist
full 247 samples x 452 spp no names iteration
full 247 samples x 452 spp no names

Time Stamp: Fri Aug 24 20:27:13 2018
Reproducible:
Number of Replications:
Elapsed Time: 19 mins
Metric: c_score
Algorithm: sim9
Observed Index: 4623706512
Mean Of Simulated Index: 4623705736
Variance Of Simulated Index: 851660
Lower 95% (1-tail): 4623704208
Upper 95% (1-tail): 4623707234
Lower 95% (2-tail): 4623704020
Upper 95% (2-tail): 4623707560
Lower-tail P = 0.7944
Upper-tail P = 0.2056
Observed metric > 7944 simulated metrics
Observed metric < 2056 simulated metrics
Observed metric = 0 simulated metrics
Standardized Effect Size (SES): 0.84054
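One possible explanation for an observed index in the billions on a presence/absence matrix (an assumption, not something verified against the attached file) is that the numeric species labels (1, 2, 3, …) were read in as a data column rather than as row names. For a 0/1 matrix the C-score cannot reach such magnitudes, but a column of large integers can inflate it enormously. Passing `row.names = 1` to `read.csv()` keeps a label column out of the data:

```r
# If the first column holds species labels (even numeric ones like 1, 2, 3),
# read it as row names so it is not treated as presence/absence data.
tf <- tempfile(fileext = ".csv")
writeLines(c("sp,site1,site2",
             "1,1,0",
             "2,0,1"), tf)

ncol(read.csv(tf))                 # 3 -- 'sp' column counted as data
ncol(read.csv(tf, row.names = 1))  # 2 -- label column excluded
```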


cegboy commented Aug 24, 2018

Now the same dataset and parameters, but with species and site names:
Time Stamp: Fri Aug 24 21:10:17 2018

Reproducible:
Number of Replications:
Elapsed Time: 20 mins
Metric: c_score
Algorithm: sim9
Observed Index: 475.34
Mean Of Simulated Index: 401.76
Variance Of Simulated Index: 0.064255
Lower 95% (1-tail): 401.32
Upper 95% (1-tail): 402.17
Lower 95% (2-tail): 401.26
Upper 95% (2-tail): 402.29
Lower-tail P > 0.9999
Upper-tail P < 1e-04
Observed metric > 10000 simulated metrics
Observed metric < 0 simulated metrics
Observed metric = 0 simulated metrics
Standardized Effect Size (SES): 290.26
full 247 samples x 452 spp with names
full 247 samples x 452 spp with names histogram
full 247 samples x 452 spp with names iteration


cegboy commented Aug 25, 2018

Hi @emhart Edmund and @ngotelli Nick,
I ran the model with variations of nReps and burn_in using the full dataset, with site and species names joined by underscores, e.g. Alouatta_belzebul.
Some observations:

  • C-scores for all 4 simulations were the same, as you can see below.
  • Using the default burn-in, it reached only 50%; the other three runs reached 100%.
  • Only the SES showed noticeable differences.
  • Histogram graphs were also less visually enticing when using a higher burn-in, due to scale.
  • Trace graphs also varied visually due to scale.
  • The simulated and observed graphs look mostly the same, and the original issue of the black squares was solved by increasing the resolution.

I am tending to use nReps = 1000 and burn_in = 1000 to report results, as all C-scores are the same and the graph scales look visually more informative. The major variation among simulations was the SES value. How important would that be for the analysis of the results?
Attached are the graphs as well. Thanks for all your help.
Cheers
Marcelo
results marcelo simulations
nreps 5000 burn-in 5000 trace
nreps 5000 burn-in 5000 hist
nreps 5000 burn-in 5000 sim ori
nreps 1000 burn-in 5000 trace
nreps 1000 burn-in 5000 hist
nreps 1000 burn-in 5000 sim ori
nreps 1000 burn-in 1000 trace
nreps 1000 burn-in 1000 hist
nreps 1000 burn-in 1000 sim ori
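For readers wondering what the C-score being compared here actually measures: it is the Stone & Roberts (1990) checkerboard score, the average over all species pairs of (r_i − S)(r_j − S), where r_i and r_j are each species' site totals and S is the number of sites both occupy. A minimal sketch for a toy presence/absence matrix (not EcoSimR's internal implementation):

```r
# Stone & Roberts (1990) C-score: for each species pair, count
# "checkerboard units" (r_i - S)(r_j - S), where r_i and r_j are the
# species' site totals and S the number of sites they share; then average.
c_score <- function(m) {
  r <- rowSums(m)
  shared <- m %*% t(m)  # shared[i, j] = sites occupied by both i and j
  n <- nrow(m)
  units <- c()
  for (i in 1:(n - 1)) {
    for (j in (i + 1):n) {
      units <- c(units, (r[i] - shared[i, j]) * (r[j] - shared[i, j]))
    }
  }
  mean(units)
}

m <- rbind(c(1, 1, 0, 0),   # species A
           c(0, 0, 1, 1),   # species B: perfect checkerboard with A
           c(1, 0, 1, 0))   # species C: shares one site with each
c_score(m)  # 2
```

A larger C-score than the null expectation (a positive SES) indicates more segregated, checkerboard-like co-occurrence than expected by chance, which is how the summaries in this thread are read.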

@cegboy
Copy link
Author

cegboy commented Aug 26, 2018

nreps 1000 burn-in 500 sim ori
nreps 1000 burn-in 500 hist
nreps 1000 burn- in 500 trace


ngotelli commented Aug 26, 2018 via email


cegboy commented Aug 26, 2018

Dear Nick @ngotelli, thank you very much for your insights.
In my case, I was hoping to use the C-scores to compare different categories of protected areas, testing whether there is any significant difference between the species communities being conserved.
You mention that with such big datasets the observed index will always be very distant from the null model. In that case, would you say such a comparison using co-occurrence analysis would be useless? I do plan to add some habitat variables, possibly deforestation as well, but those are future plans.
Any thoughts on the usefulness of co-occurrence analysis in this case, or any other analyses of presence/absence data you would suggest?
Can you attach your papers again, please?
Many thanks
Marcelo


ngotelli commented Aug 26, 2018 via email


cegboy commented Aug 27, 2018 via email
