
Threading issues with Puma #31

Open
cbenning opened this issue Jul 22, 2019 · 12 comments

@cbenning

Started seeing these:

#&lt;NoMethodError: undefined method `borrow_or_take' for nil:NilClass&gt;
/usr/local/rbenv/versions/2.4.0/lib/ruby/gems/2.4.0/gems/aws-xray-sdk-0.11.2/lib/aws-xray-sdk/sampling/default_sampler.rb:85:in `process_matched_rule'
/usr/local/rbenv/versions/2.4.0/lib/ruby/gems/2.4.0/gems/aws-xray-sdk-0.11.2/lib/aws-xray-sdk/sampling/default_sampler.rb:54:in `sample_request?'
/usr/local/rbenv/versions/2.4.0/lib/ruby/gems/2.4.0/gems/aws-xray-sdk-0.11.2/lib/aws-xray-sdk/facets/helper.rb:36:in `should_sample?'
/usr/local/rbenv/versions/2.4.0/lib/ruby/gems/2.4.0/gems/aws-xray-sdk-0.11.2/lib/aws-xray-sdk/facets/rack.rb:30:in `call'
/usr/local/rbenv/versions/2.4.0/lib/ruby/gems/2.4.0/gems/puma-4.0.1/lib/puma/configuration.rb:228:in `call'
/usr/local/rbenv/versions/2.4.0/lib/ruby/gems/2.4.0/gems/puma-4.0.1/lib/puma/server.rb:657:in `handle_request'
/usr/local/rbenv/versions/2.4.0/lib/ruby/gems/2.4.0/gems/puma-4.0.1/lib/puma/server.rb:467:in `process_client'
/usr/local/rbenv/versions/2.4.0/lib/ruby/gems/2.4.0/gems/puma-4.0.1/lib/puma/server.rb:328:in `block in run'
/usr/local/rbenv/versions/2.4.0/lib/ruby/gems/2.4.0/gems/puma-4.0.1/lib/puma/thread_pool.rb:135:in `block in spawn_thread'

However, we aren't using sampling, so I'm not sure what's up.

@ss2305

ss2305 commented Jul 22, 2019

@cbenning The sampling decision coming from the trace header always has the highest precedence. If the trace header doesn't contain a sampling decision, the SDK checks whether sampling is enabled in the recorder. If it isn't enabled, it returns true (i.e., the request is traced).

When you don't configure sampling explicitly, the sampler still falls back to the default rule whenever no path-based rule has been matched. I think that's why you see this error.
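
The precedence order described above can be sketched as follows. This is a hypothetical illustration, not the SDK's actual code; the method name and return values are made up for clarity:

```ruby
# Hypothetical sketch of the sampling decision precedence described above:
# trace header wins, then the recorder's sampling flag, then the sampler.
def should_sample?(header_sampled:, recorder_sampling_enabled:)
  # 1. An explicit decision in the incoming trace header has highest precedence.
  return header_sampled unless header_sampled.nil?
  # 2. If sampling is disabled on the recorder, every request is traced.
  return true unless recorder_sampling_enabled
  # 3. Otherwise defer to the sampler, which applies path-based rules
  #    and falls back to the default rule when none match.
  :ask_sampler
end
```

Note that step 3 is the only path that touches the default rule's reservoir, which is where the stack trace above originates.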

@cbenning

So I guess that means sampling is true by default, and I'm using it without realizing it?

@ss2305

ss2305 commented Jul 22, 2019

@cbenning The default rule traces the first request each second, and five percent of any additional requests across all services sending traces to X-Ray. If the SDK can't reach X-Ray to get sampling rules, it reverts to a default local rule of the first request each second, and five percent of any additional requests per host. This can occur if the host doesn't have permission to call sampling APIs, or can't connect to the X-Ray daemon, which acts as a TCP proxy for API calls made by the SDK.
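
The local fallback rule described above (first request each second, plus 5% of any additional requests) can be sketched like this. This is a hypothetical illustration of the behavior, not the SDK's implementation; class and method names are made up:

```ruby
# Hypothetical sketch of the local fallback rule: trace the first
# request in each wall-clock second, plus a fixed rate (5%) of any
# additional requests in that same second.
class LocalFallbackSampler
  def initialize(rate: 0.05)
    @rate = rate          # fixed-rate portion (5% by default)
    @last_second = nil    # second in which we last traced "for free"
  end

  # `now` and `rand_val` are injectable for testing.
  def sample?(now: Time.now, rand_val: rand)
    sec = now.to_i
    if sec != @last_second
      @last_second = sec  # first request in this second: always trace
      true
    else
      rand_val < @rate    # otherwise trace only the fixed-rate fraction
    end
  end
end
```

The real SDK's reservoir also coordinates the per-second quota with the X-Ray service when centralized sampling is available; this sketch only captures the local fallback behavior.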

You can find more in the documentation here.

@cbenning

This error only happens occasionally; as far as we can tell, it works fine otherwise. Are you suggesting this is a connectivity/latency issue with the local X-Ray daemon?

@ss2305

ss2305 commented Jul 25, 2019

@cbenning If it breaks only sporadically, then yes. However, if you can reproduce the issue consistently, sharing a sample with us would be greatly appreciated.

@cbenning

OK @ss2305, I can't reproduce it reliably, so I'll just treat it as intermittent for now and keep an eye on it.

Thanks.

@cbenning

cbenning commented Jul 29, 2019

@ss2305 Also, this triggers alerts for us; what is a safe way to suppress them? Can I add a default sampling rule? It still feels to me that this shouldn't raise an exception with a stack trace if this is just business as usual in this situation.

Could increasing the Concurrency setting potentially improve this? The X-Ray daemon in our instance is using the default config, but I don't see why it would be unreachable.
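
One possible workaround for the alert noise, under the assumption that you don't need sampling at all: disable it in the recorder config, which should bypass the sampler (and its reservoir) entirely. The config keys below follow the aws-xray-sdk for Ruby configuration hash as I understand it for the 0.11.x line; treat this as a sketch to verify against the SDK docs, not confirmed advice from the maintainers:

```ruby
require 'aws-xray-sdk'

XRay.recorder.configure(
  name: 'my-app',   # hypothetical service name; use your own
  sampling: false   # trace every request, skipping sampling decisions
)
```

Note that with `sampling: false` every request is traced, which may increase X-Ray costs; this trades the sampler error for full tracing rather than fixing the underlying bug.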

@chanchiem

We're taking a deeper look at this and we will update it when we have any new findings. Thanks for letting us know about this issue.

@cbenning

cbenning commented Aug 1, 2019

FYI: increasing Concurrency from 8 to 24 had no effect.

@awssandra awssandra added the bug label Jan 10, 2020
@thegorgon

We're experiencing the same issue: sporadically, requests will fail while making a sampling decision.

Looks like this issue is quite stale. Was there any update or mitigation discovered?

@cbenning

@thegorgon Not that we have discovered. We've turned the sampling rate down to almost nothing and have basically stopped using X-Ray with Ruby.

@willarmiros

@thegorgon I know this is a sporadic issue, but are there any patterns with which you've noticed this consistently fails? Any help in reproducing this, like a sample app (even if the error only occurs intermittently), would be very useful. For some context from an initial inspection, it looks like this error must be happening because a sampling rule doesn't have a reservoir. Given that SamplingRules are always initialized with a non-nil Reservoir, I'd guess something weird may be happening in this rule merge logic.

Are you using Centralized sampling? If so can you describe your use case?
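
The suspected race in the rule-merge logic can be illustrated with a small sketch. This is a hypothetical model, not the SDK's code: if a merge briefly leaves a rule's reservoir nil while a Puma worker thread is making a sampling decision, the reader crashes exactly like the `borrow_or_take' for nil:NilClass error above. Guarding both the merge and the read with the same Mutex closes the window:

```ruby
# Hypothetical model of the suspected race: a rule whose reservoir is
# momentarily nil during a merge can be observed by a concurrent reader.
Rule = Struct.new(:reservoir)

class RuleStore
  def initialize
    @rule = Rule.new(:reservoir_a)
    @lock = Mutex.new
  end

  # Unsafe merge: leaves the reservoir nil mid-update, so a concurrent
  # reader can see nil and blow up (as in the stack trace above).
  def unsafe_merge!(new_reservoir)
    @rule.reservoir = nil
    @rule.reservoir = new_reservoir
  end

  # Safe merge and read take the same lock, so nil is never observable.
  def safe_merge!(new_reservoir)
    @lock.synchronize { @rule.reservoir = new_reservoir }
  end

  def borrow
    @lock.synchronize { @rule.reservoir or raise 'reservoir is nil' }
  end
end
```

Under Puma's threaded model, many worker threads call the sampler concurrently while rule refreshes happen in the background, which is consistent with the error being rare and load-dependent.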
