
Co-occurrence null model - beginner issue #75

Open
cegboy opened this issue Aug 22, 2018 · 12 comments

Comments


cegboy commented Aug 22, 2018

Hi,
I am a complete beginner with R (as I've mentioned to Nick Gotelli already via email) and am struggling to run the co-occurrence null model.
I seem to have managed to run it, but I would appreciate it if someone could check what I am doing.

I used the following to load the file and run the model:

# Run the null model
test <- cooc_null_model(read.csv(choose.files()), algo = "sim9", nReps = 10000, burn_in = 500)

# Summary and plot info
summary(test)
plot(test, type = "burn_in")
plot(test, type = "hist")
plot(test, type = "cooc")

The file I used is attached, as well as the plots.
I got this summary:
Time Stamp: Wed Aug 22 21:10:11 2018
Reproducible:
Number of Replications:
Elapsed Time: 0.79 secs
Metric: c_score
Algorithm: sim9
Observed Index: 2.0895
Mean Of Simulated Index: 1.9061
Variance Of Simulated Index: 0.0033681
Lower 95% (1-tail): 1.8263
Upper 95% (1-tail): 2.0105
Lower 95% (2-tail): 1.8158
Upper 95% (2-tail): 2.0474
Lower-tail P = 0.994
Upper-tail P = 0.0079
Observed metric > 9921 simulated metrics
Observed metric < 60 simulated metrics
Observed metric = 19 simulated metrics
Standardized Effect Size (SES): 3.1605
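As a sanity check, the reported SES can be reproduced from the printed summary values: it is simply the observed index minus the mean of the simulated indices, divided by their standard deviation. A quick check in R using the numbers above (small differences come from rounding of the printed values):

```r
# Reproduce the reported SES from the printed summary values
obs      <- 2.0895     # Observed Index
sim_mean <- 1.9061     # Mean Of Simulated Index
sim_var  <- 0.0033681  # Variance Of Simulated Index

ses <- (obs - sim_mean) / sqrt(sim_var)
round(ses, 2)  # 3.16
```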

Does this seem alright? Also, is there a preference for .csv over .txt?
test iteration graph
test hist
test marcelo cooc
test Marcelo.txt

Many thanks
Marcelo


cegboy commented Aug 22, 2018

Hi,
further to my previous post, I ventured to run the full 247 samples x 452 spp dataset and got these results.
Time Stamp: Wed Aug 22 21:25:09 2018
Reproducible:
Number of Replications:
Elapsed Time: 2.5 mins
Metric: c_score
Algorithm: sim9
Observed Index: 475.34
Mean Of Simulated Index: 403.97
Variance Of Simulated Index: 28.796
Lower 95% (1-tail): 401.41
Upper 95% (1-tail): 416.23
Lower 95% (2-tail): 401.35
Upper 95% (2-tail): 422.74
Lower-tail P > 0.9999
Upper-tail P < 1e-04
Observed metric > 10000 simulated metrics
Observed metric < 0 simulated metrics
Observed metric = 0 simulated metrics
Standardized Effect Size (SES): 13.299
full data iteration graph
full data hist
full data cooc graph

Any thoughts?
Cheers
Marcelo

@cegboy changed the title from "Co-occurrence null model - beginner issue with Error in speciesData[2, 1] : subscript out of bounds" to "Co-occurrence null model - beginner issue" on Aug 22, 2018

cegboy commented Aug 24, 2018

Hi all, any thoughts on this? Especially the black squares?
cheers
Marcelo


emhart commented Aug 24, 2018

Hi @cegboy

I took a look at your example and ran it, and it all looks good. One comment: your file is tab-separated, not comma-separated (an easy fix; maybe it happened during upload). Can you share the other file you're using? It's hard to dig in without seeing the data, but my first (and possibly wrong) guess is that you're not using a long enough burn-in with the bigger dataset. I'm happy to check it out for you.
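The tab-versus-comma point is easy to demonstrate in base R: `read.delim()` (or `read.csv(..., sep = "\t")`) parses tab-separated files, while plain `read.csv()` collapses each row into a single field. A minimal self-contained illustration:

```r
# Demonstrate why a tab-separated file breaks read.csv():
# write a tiny tab-separated table to a temp file, then read it both ways.
tf <- tempfile(fileext = ".txt")
writeLines(c("sp\tsite1\tsite2",
             "A\t1\t0",
             "B\t0\t1"), tf)

ncol(read.csv(tf))    # 1 -- no commas found, each row lands in one column
ncol(read.delim(tf))  # 3 -- tabs parsed correctly
```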


cegboy commented Aug 24, 2018

Hi @emhart, thanks. Yes, I have since changed to .csv. Here is the big test data I am using: Test data EcoSim R.zip. I did try to increase the burn-in to 1000, I think.
Thank you very much for your help.
Marcelo
(BTW, "cegboy" used to be a call sign I used on Atari..)


cegboy commented Aug 24, 2018

Hi Edmund @emhart, I ran the same full 247 samples x 452 spp dataset, this time with no species or site names, just 1, 2, 3, etc. for species and site1, site2, etc. for sites. I used nReps = 10000 and burn_in = 100000.
The black squares were more a question of resolution to fix. I got these results now; plots attached.
Does this look better? Also, is there any literature I can use besides the R documentation to understand and interpret the results?
Thanks
Marcelo
full 247 samples x 452 spp no names hist
full 247 samples x 452 spp no names iteration
full 247 samples x 452 spp no names

Time Stamp: Fri Aug 24 20:27:13 2018
Reproducible:
Number of Replications:
Elapsed Time: 19 mins
Metric: c_score
Algorithm: sim9
Observed Index: 4623706512
Mean Of Simulated Index: 4623705736
Variance Of Simulated Index: 851660
Lower 95% (1-tail): 4623704208
Upper 95% (1-tail): 4623707234
Lower 95% (2-tail): 4623704020
Upper 95% (2-tail): 4623707560
Lower-tail P = 0.7944
Upper-tail P = 0.2056
Observed metric > 7944 simulated metrics
Observed metric < 2056 simulated metrics
Observed metric = 0 simulated metrics
Standardized Effect Size (SES): 0.84054
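One possible explanation for an observed index in the billions on a presence/absence matrix (an assumption, not something verified against the attached file) is that the numeric species labels (1, 2, 3, …) were read in as a data column rather than as row names. For a 0/1 matrix the C-score cannot reach such magnitudes, but a column of large integers can inflate it enormously. Passing `row.names = 1` to `read.csv()` keeps a label column out of the data:

```r
# If the first column holds species labels (even numeric ones like 1, 2, 3),
# read it as row names so it is not treated as presence/absence data.
tf <- tempfile(fileext = ".csv")
writeLines(c("sp,site1,site2",
             "1,1,0",
             "2,0,1"), tf)

ncol(read.csv(tf))                 # 3 -- 'sp' column counted as data
ncol(read.csv(tf, row.names = 1))  # 2 -- label column excluded
```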


cegboy commented Aug 24, 2018

Now the same dataset and parameters, but with species and site names:
Time Stamp: Fri Aug 24 21:10:17 2018

Reproducible:
Number of Replications:
Elapsed Time: 20 mins
Metric: c_score
Algorithm: sim9
Observed Index: 475.34
Mean Of Simulated Index: 401.76
Variance Of Simulated Index: 0.064255
Lower 95% (1-tail): 401.32
Upper 95% (1-tail): 402.17
Lower 95% (2-tail): 401.26
Upper 95% (2-tail): 402.29
Lower-tail P > 0.9999
Upper-tail P < 1e-04
Observed metric > 10000 simulated metrics
Observed metric < 0 simulated metrics
Observed metric = 0 simulated metrics
Standardized Effect Size (SES): 290.26
full 247 samples x 452 spp with names
full 247 samples x 452 spp with names histogram
full 247 samples x 452 spp with names iteration


cegboy commented Aug 25, 2018

Hi @emhart Edmund and @ngotelli Nick,
I ran the model with variations of nReps and burn_in using the full dataset, with site and species names joined by underscores, e.g. Alouatta_belzebul.
Some observations:

  • C-scores for all 4 simulations were the same, as you can see below.
  • Using the default burn-in, it reached only 50%; the other three runs reached 100%.
  • Only the SES showed noticeable differences.
  • Histogram graphs were also less visually enticing when using a higher burn-in, due to scale.
  • Trace graphs also varied visually due to scale.
  • The simulated and observed graphs look mostly the same, and the original issue of the black squares was solved by increasing the resolution.

I am tending to use nReps = 1000 and burn_in = 1000 to report results, as all C-scores are the same and the graph scales look visually more informative. The major variation among simulations was the SES value. How important would that be for the analysis of the results?
Attached are the graphs as well. Thanks for all your help.
Cheers
Marcelo
results marcelo simulations
nreps 5000 burn-in 5000 trace
nreps 5000 burn-in 5000 hist
nreps 5000 burn-in 5000 sim ori
nreps 1000 burn-in 5000 trace
nreps 1000 burn-in 5000 hist
nreps 1000 burn-in 5000 sim ori
nreps 1000 burn-in 1000 trace
nreps 1000 burn-in 1000 hist
nreps 1000 burn-in 1000 sim ori
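For readers wondering what the C-score being compared here actually measures: it is the Stone & Roberts (1990) checkerboard score, the average over all species pairs of (r_i − S)(r_j − S), where r_i and r_j are each species' site totals and S is the number of sites both occupy. A minimal sketch for a toy presence/absence matrix (not EcoSimR's internal implementation):

```r
# Stone & Roberts (1990) C-score: for each species pair, count
# "checkerboard units" (r_i - S)(r_j - S), where r_i and r_j are the
# species' site totals and S the number of sites they share; then average.
c_score <- function(m) {
  r <- rowSums(m)
  shared <- m %*% t(m)  # shared[i, j] = sites occupied by both i and j
  n <- nrow(m)
  units <- c()
  for (i in 1:(n - 1)) {
    for (j in (i + 1):n) {
      units <- c(units, (r[i] - shared[i, j]) * (r[j] - shared[i, j]))
    }
  }
  mean(units)
}

m <- rbind(c(1, 1, 0, 0),   # species A
           c(0, 0, 1, 1),   # species B: perfect checkerboard with A
           c(1, 0, 1, 0))   # species C: shares one site with each
c_score(m)  # 2
```

A larger C-score than the null expectation (a positive SES) indicates more segregated, checkerboard-like co-occurrence than expected by chance, which is how the summaries in this thread are read.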

@cegboy
Copy link
Author

cegboy commented Aug 26, 2018

nreps 1000 burn-in 500 sim ori
nreps 1000 burn-in 500 hist
nreps 1000 burn- in 500 trace


ngotelli commented Aug 26, 2018 via email


cegboy commented Aug 26, 2018

Dear Nick @ngotelli, thank you very much for your insights.
In my case, I was hoping to use the C-scores to compare different categories of protected areas, testing whether there is any significant difference between the species communities being conserved.
You mention that with such big datasets the observed index will always be very distant from the null model. In that case, would you say such a comparison using co-occurrence analysis would be useless? I do plan to add some habitat variables, possibly deforestation as well, but those are future plans.
Any thoughts on the usefulness of co-occurrence analysis in this case, or any other analyses of presence/absence data you would suggest?
Can you attach your papers again, please?
Many thanks
Marcelo


ngotelli commented Aug 26, 2018 via email


cegboy commented Aug 27, 2018 via email
