HTTP POST request to destroy job times out #11

Closed
calebwin opened this issue Aug 4, 2021 · 7 comments
Labels
banyan-jl Concerning Banyan.jl bug Something isn't working

Comments

@calebwin
Contributor

calebwin commented Aug 4, 2021

This was the output:

Black Scholes: Error During Test at /home/calebwin/Projects/banyan-julia/BanyanArrays/test/test_black_scholes.jl:4
  Got exception outside of a @test
  IOError(Base.IOError("read: connection timed out (ETIMEDOUT)", -110) during request(https://hcohsbhhzf.execute-api.us-west-2.amazonaws.com/dev/destroy-job))
  
  Stacktrace:
   [25] post
      @ ~/.julia/packages/HTTP/IAI92/src/HTTP.jl:405 [inlined]
   [26] send_request_get_response(method::Symbol, content::Dict{String, Any})
      @ Banyan ~/Projects/banyan-julia/Banyan/src/utils.jl:313
   [27] destroy_job(job_id::String; failed::Nothing, force::Bool, kwargs::Base.Iterators.Pairs{Union{}, Union{}, Tuple{}, NamedTuple{(), Tuple{}}})
      @ Banyan ~/Projects/banyan-julia/Banyan/src/jobs.jl:160
   [28] destroy_job
      @ ~/Projects/banyan-julia/Banyan/src/jobs.jl:139 [inlined]
   [29] with_job(f::var"#3#6"{var"#55#59"}; kwargs::Base.Iterators.Pairs{Symbol, Any, Tuple{Symbol, Symbol}, NamedTuple{(:job, :destroy_job_on_exit), Tuple{String, Bool}}})
      @ Banyan ~/Projects/banyan-julia/Banyan/src/jobs.jl:239
   [30] run_with_job(test_fn::var"#55#59", name::String)
      @ Main ~/Projects/banyan-julia/BanyanArrays/test/runtests.jl:71
   [31] macro expansion
      @ ~/Projects/banyan-julia/BanyanArrays/test/test_black_scholes.jl:7 [inlined]
   [32] macro expansion
      @ /buildworker/worker/package_linux64/build/usr/share/julia/stdlib/v1.6/Test/src/Test.jl:1151 [inlined]
   [33] top-level scope
      @ ~/Projects/banyan-julia/BanyanArrays/test/test_black_scholes.jl:7
   [34] include
      @ ./client.jl:444 [inlined]
   [35] include_tests_to_run
      @ ~/Projects/banyan-julia/BanyanArrays/test/runtests.jl:12 [inlined]
   [36] include_all_tests()
      @ Main ~/Projects/banyan-julia/BanyanArrays/test/runtests.jl:112
   [37] (::var"#11#12")(j::String)
      @ Main ~/Projects/banyan-julia/BanyanArrays/test/runtests.jl:134
   [38] with_job(f::var"#11#12"; kwargs::Base.Iterators.Pairs{Symbol, String, Tuple{Symbol}, NamedTuple{(:job,), Tuple{String}}})
      @ Banyan ~/Projects/banyan-julia/Banyan/src/jobs.jl:233
   [39] top-level scope
      @ ~/Projects/banyan-julia/BanyanArrays/test/runtests.jl:115
   [40] include(fname::String)
      @ Base.MainInclude ./client.jl:444
   [41] top-level scope
      @ none:6
   [42] eval
      @ ./boot.jl:360 [inlined]
   [43] exec_options(opts::Base.JLOptions)
      @ Base ./client.jl:261
   [44] _start()
      @ Base ./client.jl:485
  
  caused by: AWS.SimpleQueueService.NonExistentQueue -- The specified queue does not exist for this wsdl version.
  HTTP.ExceptionRequest.StatusError(400, "POST", "/054866216572/banyan_2021-08-01-07212011d1ae9ffc818930b16d25028ea332f0_gather.fifo", HTTP.Messages.Response:
  """
  HTTP/1.1 400 Bad Request
  x-amzn-RequestId: 4965cc12-14a0-5d26-bdec-959244a6248c
  Date: Sun, 01 Aug 2021 07:31:18 GMT
  Content-Type: application/json
  Content-Length: 197
  
  {"Error":{"Code":"AWS.SimpleQueueService.NonExistentQueue","Message":"The specified queue does not exist for this wsdl version.","Type":"Sender"},"RequestId":"4965cc12-14a0-5d26-bdec-959244a6248c"}""")
  
  Stacktrace:
    [1] request(::Type{HTTP.ExceptionRequest.ExceptionLayer{HTTP.ConnectionRequest.ConnectionPoolLayer{HTTP.StreamRequest.StreamLayer{Union{}}}}}, ::HTTP.URIs.URI, ::Vararg{Any, N} where N; kw::Base.Iterators.Pairs{Symbol, Union{Nothing, Integer}, Tuple{Symbol, Symbol, Symbol}, NamedTuple{(:iofunction, :verbose, :require_ssl_verification), Tuple{Nothing, Int64, Bool}}})
      @ HTTP.ExceptionRequest ~/.julia/packages/HTTP/IAI92/src/ExceptionRequest.jl:22
    [2] request(::Type{HTTP.MessageRequest.MessageLayer{HTTP.ExceptionRequest.ExceptionLayer{HTTP.ConnectionRequest.ConnectionPoolLayer{HTTP.StreamRequest.StreamLayer{Union{}}}}}}, method::String, url::HTTP.URIs.URI, headers::Base.Vector{Pair{SubString{String}, SubString{String}}}, body::String; http_version::VersionNumber, target::String, parent::Nothing, iofunction::Nothing, kw::Base.Iterators.Pairs{Symbol, Integer, Tuple{Symbol, Symbol}, NamedTuple{(:verbose, :require_ssl_verification), Tuple{Int64, Bool}}})
      @ HTTP.MessageRequest ~/.julia/packages/HTTP/IAI92/src/MessageRequest.jl:51
    [3] request(::Type{HTTP.BasicAuthRequest.BasicAuthLayer{HTTP.MessageRequest.MessageLayer{HTTP.ExceptionRequest.ExceptionLayer{HTTP.ConnectionRequest.ConnectionPoolLayer{HTTP.StreamRequest.StreamLayer{Union{}}}}}}}, method::String, url::HTTP.URIs.URI, headers::Base.Vector{Pair{SubString{String}, SubString{String}}}, body::String; kw::Base.Iterators.Pairs{Symbol, Integer, Tuple{Symbol, Symbol}, NamedTuple{(:verbose, :require_ssl_verification), Tuple{Int64, Bool}}})
      @ HTTP.BasicAuthRequest ~/.julia/packages/HTTP/IAI92/src/BasicAuthRequest.jl:28
    [4] macro expansion
      @ ~/.julia/packages/AWSCore/wNWgl/src/http.jl:42 [inlined]
    [5] macro expansion
      @ ~/.julia/packages/Retry/vS1bg/src/repeat_try.jl:192 [inlined]
    [6] http_request(request::Dict{Symbol, Any})
      @ AWSCore ~/.julia/packages/AWSCore/wNWgl/src/http.jl:20
    [7] macro expansion
      @ ~/.julia/packages/AWSCore/wNWgl/src/AWSCore.jl:411 [inlined]
    [8] macro expansion
      @ ~/.julia/packages/Retry/vS1bg/src/repeat_try.jl:192 [inlined]
    [9] do_request(r::Dict{Symbol, Any}; return_headers::Bool)
      @ AWSCore ~/.julia/packages/AWSCore/wNWgl/src/AWSCore.jl:394
   [10] do_request
      @ ~/.julia/packages/AWSCore/wNWgl/src/AWSCore.jl:391 [inlined]
   [11] service_query(aws::Dict{Symbol, Any}; args::Base.Iterators.Pairs{Symbol, Any, NTuple{4, Symbol}, NamedTuple{(:service, :version, :operation, :args), Tuple{String, String, String, Dict{String, Any}}}})
      @ AWSCore ~/.julia/packages/AWSCore/wNWgl/src/AWSCore.jl:249
   [12] sqs
      @ ~/.julia/packages/AWSCore/wNWgl/src/Services.jl:3091 [inlined]
   [13] sqs
      @ ~/.julia/packages/AWSSQS/dkI0T/src/AWSSQS.jl:35 [inlined]
   [14] #sqs#1
      @ ~/.julia/packages/AWSSQS/dkI0T/src/AWSSQS.jl:36 [inlined]
   [15] sqs_receive_message(queue::Dict{Symbol, Any})
      @ AWSSQS ~/.julia/packages/AWSSQS/dkI0T/src/AWSSQS.jl:207
   [16] receive_next_message(queue_name::Dict{Symbol, Any})
      @ Banyan ~/Projects/banyan-julia/Banyan/src/queues.jl:23
   [17] compute(fut::Future)
      @ Banyan ~/Projects/banyan-julia/Banyan/src/requests.jl:204
   [18] collect(fut::Future)
      @ Banyan ~/Projects/banyan-julia/Banyan/src/requests.jl:324
   [19] (::var"#55#59")(job::String)
      @ Main ~/Projects/banyan-julia/BanyanArrays/test/test_black_scholes.jl:42
   [20] macro expansion
      @ ./timing.jl:210 [inlined]
   [21] (::var"#3#6"{var"#55#59"})(j::String)
      @ Main ~/Projects/banyan-julia/BanyanArrays/test/runtests.jl:76
   [22] with_job(f::var"#3#6"{var"#55#59"}; kwargs::Base.Iterators.Pairs{Symbol, Any, Tuple{Symbol, Symbol}, NamedTuple{(:job, :destroy_job_on_exit), Tuple{String, Bool}}})
      @ Banyan ~/Projects/banyan-julia/Banyan/src/jobs.jl:233
   [23] run_with_job(test_fn::var"#55#59", name::String)
      @ Main ~/Projects/banyan-julia/BanyanArrays/test/runtests.jl:71
   [24] macro expansion
      @ ~/Projects/banyan-julia/BanyanArrays/test/test_black_scholes.jl:7 [inlined]
   [25] macro expansion
      @ /buildworker/worker/package_linux64/build/usr/share/julia/stdlib/v1.6/Test/src/Test.jl:1151 [inlined]
   [26] top-level scope
      @ ~/Projects/banyan-julia/BanyanArrays/test/test_black_scholes.jl:7
   [27] include
      @ ./client.jl:444 [inlined]
   [28] include_tests_to_run
      @ ~/Projects/banyan-julia/BanyanArrays/test/runtests.jl:12 [inlined]
   [29] include_all_tests()
      @ Main ~/Projects/banyan-julia/BanyanArrays/test/runtests.jl:112
   [30] (::var"#11#12")(j::String)
      @ Main ~/Projects/banyan-julia/BanyanArrays/test/runtests.jl:134
   [31] with_job(f::var"#11#12"; kwargs::Base.Iterators.Pairs{Symbol, String, Tuple{Symbol}, NamedTuple{(:job,), Tuple{String}}})
      @ Banyan ~/Projects/banyan-julia/Banyan/src/jobs.jl:233
   [32] top-level scope
      @ ~/Projects/banyan-julia/BanyanArrays/test/runtests.jl:115
   [33] include(fname::String)
      @ Base.MainInclude ./client.jl:444
   [34] top-level scope
      @ none:6
   [35] eval
      @ ./boot.jl:360 [inlined]
   [36] exec_options(opts::Base.JLOptions)
      @ Base ./client.jl:261
   [37] _start()
      @ Base ./client.jl:485
Test Summary: | Error  Total
Black Scholes |     1      1

This could be related to #7. When this happened, destroy_job finished in under 4 s, but it might not have run scancel on the job, which would leave the job running.

@calebwin calebwin added bug Something isn't working banyan-jl Concerning Banyan.jl labels Aug 4, 2021
@calebwin
Contributor Author

calebwin commented Aug 4, 2021

This happens rarely and has been hard to reproduce.

@calebwin
Contributor Author

calebwin commented Aug 7, 2021

This has occurred again. The best fix is probably to retry this operation, along with other idempotent operations like it.
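
A minimal sketch of what such a retry could look like, using `retry` and `ExponentialBackOff` from Julia's Base. The function `flaky_destroy_job` and the job ID are hypothetical stand-ins, not Banyan's actual code; the stand-in simulates two timeouts before succeeding. Retrying like this is only safe because the operation is assumed to be idempotent.

```julia
# Hypothetical stand-in for an idempotent HTTP call such as the destroy-job POST.
# It fails twice with a timeout-like IOError, then succeeds.
const attempts = Ref(0)

function flaky_destroy_job(job_id::String)
    attempts[] += 1
    if attempts[] < 3
        throw(Base.IOError("read: connection timed out (ETIMEDOUT)", -110))
    end
    return "destroyed $job_id"
end

# Retry only on IOError; any other exception propagates immediately.
# `check` receives the backoff state and the exception, and returns true to retry.
is_transient(state, e) = e isa Base.IOError

# Up to 3 retries (4 attempts in total) with exponentially growing delays.
destroy_with_retry = retry(
    flaky_destroy_job;
    delays = ExponentialBackOff(n = 3, first_delay = 0.1, max_delay = 1.0),
    check = is_transient,
)

result = destroy_with_retry("job-123")
println(result, " after ", attempts[], " attempts")
```

Restricting the `check` to `IOError` keeps genuine failures (for example, a 4xx response) from being retried blindly.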

@calebwin
Contributor Author

calebwin commented Aug 9, 2021

In another instance of this error, the following was printed out:

caused by: AWS.SimpleQueueService.NonExistentQueue -- The specified queue does not exist for this wsdl version.
  HTTP.ExceptionRequest.StatusError(400, "POST", "/054866216572/banyan_2021-08-08-021027dfa9bf5636896b50f427a2b4127bdebb_gather.fifo", HTTP.Messages.Response:
  """
  HTTP/1.1 400 Bad Request
  x-amzn-RequestId: e7a6525b-e3e8-595c-86e5-0bd19ab980cc
  Date: Mon, 09 Aug 2021 00:30:09 GMT
  Content-Type: application/json
  Content-Length: 197
  
  {"Error":{"Code":"AWS.SimpleQueueService.NonExistentQueue","Message":"The specified queue does not exist for this wsdl version.","Type":"Sender"},"RequestId":"e7a6525b-e3e8-595c-86e5-0bd19ab980cc"}""")
  
  Stacktrace:
    [1] request(::Type{HTTP.ExceptionRequest.ExceptionLayer{HTTP.ConnectionRequest.ConnectionPoolLayer{HTTP.StreamRequest.StreamLayer{Union{}}}}}, ::HTTP.URIs.URI, ::Vararg{Any, N} where N; kw::Base.Iterators.Pairs{Symbol, Union{Nothing, Integer}, Tuple{Symbol, Symbol, Symbol}, NamedTuple{(:iofunction, :verbose, :require_ssl_verification), Tuple{Nothing, Int64, Bool}}})
      @ HTTP.ExceptionRequest ~/.julia/packages/HTTP/IAI92/src/ExceptionRequest.jl:22
    [2] request(::Type{HTTP.MessageRequest.MessageLayer{HTTP.ExceptionRequest.ExceptionLayer{HTTP.ConnectionRequest.ConnectionPoolLayer{HTTP.StreamRequest.StreamLayer{Union{}}}}}}, method::String, url::HTTP.URIs.URI, headers::Base.Vector{Pair{SubString{String}, SubString{String}}}, body::String; http_version::VersionNumber, target::String, parent::Nothing, iofunction::Nothing, kw::Base.Iterators.Pairs{Symbol, Integer, Tuple{Symbol, Symbol}, NamedTuple{(:verbose, :require_ssl_verification), Tuple{Int64, Bool}}})
      @ HTTP.MessageRequest ~/.julia/packages/HTTP/IAI92/src/MessageRequest.jl:51
    [3] request(::Type{HTTP.BasicAuthRequest.BasicAuthLayer{HTTP.MessageRequest.MessageLayer{HTTP.ExceptionRequest.ExceptionLayer{HTTP.ConnectionRequest.ConnectionPoolLayer{HTTP.StreamRequest.StreamLayer{Union{}}}}}}}, method::String, url::HTTP.URIs.URI, headers::Base.Vector{Pair{SubString{String}, SubString{String}}}, body::String; kw::Base.Iterators.Pairs{Symbol, Integer, Tuple{Symbol, Symbol}, NamedTuple{(:verbose, :require_ssl_verification), Tuple{Int64, Bool}}})
      @ HTTP.BasicAuthRequest ~/.julia/packages/HTTP/IAI92/src/BasicAuthRequest.jl:28
    [4] macro expansion
      @ ~/.julia/packages/AWSCore/wNWgl/src/http.jl:42 [inlined]
    [5] macro expansion
      @ ~/.julia/packages/Retry/vS1bg/src/repeat_try.jl:192 [inlined]
    [6] http_request(request::Dict{Symbol, Any})
      @ AWSCore ~/.julia/packages/AWSCore/wNWgl/src/http.jl:20
    [7] macro expansion
      @ ~/.julia/packages/AWSCore/wNWgl/src/AWSCore.jl:411 [inlined]
    [8] macro expansion
      @ ~/.julia/packages/Retry/vS1bg/src/repeat_try.jl:192 [inlined]
    [9] do_request(r::Dict{Symbol, Any}; return_headers::Bool)
      @ AWSCore ~/.julia/packages/AWSCore/wNWgl/src/AWSCore.jl:394
   [10] do_request
      @ ~/.julia/packages/AWSCore/wNWgl/src/AWSCore.jl:391 [inlined]
   [11] service_query(aws::Dict{Symbol, Any}; args::Base.Iterators.Pairs{Symbol, Any, NTuple{4, Symbol}, NamedTuple{(:service, :version, :operation, :args), Tuple{String, String, String, Dict{String, Any}}}})
      @ AWSCore ~/.julia/packages/AWSCore/wNWgl/src/AWSCore.jl:249
   [12] sqs
      @ ~/.julia/packages/AWSCore/wNWgl/src/Services.jl:3091 [inlined]
   [13] sqs
      @ ~/.julia/packages/AWSSQS/dkI0T/src/AWSSQS.jl:35 [inlined]
   [14] #sqs#1
      @ ~/.julia/packages/AWSSQS/dkI0T/src/AWSSQS.jl:36 [inlined]
   [15] sqs_receive_message(queue::Dict{Symbol, Any})
      @ AWSSQS ~/.julia/packages/AWSSQS/dkI0T/src/AWSSQS.jl:207
   [16] receive_next_message(queue_name::Dict{Symbol, Any})
      @ Banyan ~/Projects/banyan-julia/Banyan/src/queues.jl:23
   [17] compute(fut::Future)
      @ Banyan ~/Projects/banyan-julia/Banyan/src/requests.jl:208
   [18] collect(fut::DataFrame)
      @ Banyan ~/Projects/banyan-julia/Banyan/src/requests.jl:328
   [19] (::var"#15#23")(job::String)
      @ Main ~/Projects/banyan-julia/BanyanDataFrames/test/test_small_dataset.jl:15
   [20] (::var"#3#6"{var"#15#23"})(j::String)
      @ Main ~/Projects/banyan-julia/BanyanDataFrames/test/runtests.jl:77
   [21] with_job(f::var"#3#6"{var"#15#23"}; kwargs::Base.Iterators.Pairs{Symbol, Any, Tuple{Symbol, Symbol}, NamedTuple{(:job, :destroy_job_on_exit), Tuple{String, Bool}}})
      @ Banyan ~/Projects/banyan-julia/Banyan/src/jobs.jl:258
   [22] run_with_job(test_fn::var"#15#23", name::String)
      @ Main ~/Projects/banyan-julia/BanyanDataFrames/test/runtests.jl:72
   [23] macro expansion
      @ ~/Projects/banyan-julia/BanyanDataFrames/test/test_small_dataset.jl:2 [inlined]
   [24] macro expansion
      @ /buildworker/worker/package_linux64/build/usr/share/julia/stdlib/v1.6/Test/src/Test.jl:1151 [inlined]
   [25] top-level scope
      @ ~/Projects/banyan-julia/BanyanDataFrames/test/test_small_dataset.jl:2
   [26] include
      @ ./client.jl:444 [inlined]
   [27] include_tests_to_run
      @ ~/Projects/banyan-julia/BanyanDataFrames/test/runtests.jl:14 [inlined]
   [28] include_all_tests()
      @ Main ~/Projects/banyan-julia/BanyanDataFrames/test/runtests.jl:99
   [29] (::var"#9#10")(j::String)
      @ Main ~/Projects/banyan-julia/BanyanDataFrames/test/runtests.jl:114
   [30] with_job(f::var"#9#10"; kwargs::Base.Iterators.Pairs{Symbol, String, Tuple{Symbol}, NamedTuple{(:job,), Tuple{String}}})
      @ Banyan ~/Projects/banyan-julia/Banyan/src/jobs.jl:258
   [31] top-level scope
      @ ~/Projects/banyan-julia/BanyanDataFrames/test/runtests.jl:104
   [32] include(fname::String)
      @ Base.MainInclude ./client.jl:444
   [33] top-level scope
      @ none:6
   [34] eval
      @ ./boot.jl:360 [inlined]
   [35] exec_options(opts::Base.JLOptions)
      @ Base ./client.jl:261
   [36] _start()

So it seems like some queue is getting deleted too soon.

@calebwin
Contributor Author

On another occasion, this error occurred because a queue does not exist:

caused by: AWS.SimpleQueueService.NonExistentQueue -- The specified queue does not exist for this wsdl version.
  HTTP.ExceptionRequest.StatusError(400, "POST", "/054866216572/banyan_2021-08-10-133354c6c53b402907ff3728141462d685c573_gather.fifo", HTTP.Messages.Response:
  """
  HTTP/1.1 400 Bad Request
  x-amzn-RequestId: fe1329bc-c982-5985-be9f-70304ccddf18
  Date: Tue, 10 Aug 2021 13:45:51 GMT
  Content-Type: application/json
  Content-Length: 197
  
  {"Error":{"Code":"AWS.SimpleQueueService.NonExistentQueue","Message":"The specified queue does not exist for this wsdl version.","Type":"Sender"},"RequestId":"fe1329bc-c982-5985-be9f-70304ccddf18"}""")

@cailinw Perhaps this queue is getting destroyed in multiple places?

@calebwin
Contributor Author

This just failed for describe-jobs as well, in the middle of a POST request.

@cailinw
Contributor

cailinw commented Oct 14, 2021

This is now resolved. The cause was that logs are streamed from the cluster to the client side in multiple parts, and the client side did not wait until all parts of the log had been received before destroying the job.
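
As a rough illustration of the ordering described above (not the actual fix in Banyan), the client can block until every part of the log has arrived and only then tear down the job. Everything here is an assumption made for the sketch: the `LogPart` shape (each part carrying its index and the total count), the `Channel` standing in for the queue, and `fake_destroy_job`.

```julia
# Hypothetical message shape: each streamed log part says which part it is
# and how many parts there are in total. This shape is an assumption.
struct LogPart
    index::Int
    total::Int
    text::String
end

# Read parts from `incoming` until every part has arrived, then return the
# reassembled log. Parts may arrive out of order.
function receive_full_log(incoming::Channel{LogPart})
    parts = Dict{Int,String}()
    total = nothing
    while total === nothing || length(parts) < total
        part = take!(incoming)
        total = part.total
        parts[part.index] = part.text
    end
    return join(parts[i] for i in 1:total)
end

# Hypothetical cleanup hook standing in for the real job teardown.
const destroyed = Ref(false)
fake_destroy_job() = (destroyed[] = true)

# Simulate three parts arriving out of order on an in-memory channel.
incoming = Channel{LogPart}(8)
put!(incoming, LogPart(2, 3, "world "))
put!(incoming, LogPart(1, 3, "hello "))
put!(incoming, LogPart(3, 3, "done"))

full_log = receive_full_log(incoming)  # blocks until all 3 parts are in
fake_destroy_job()                     # teardown happens only after the full log is received
println(full_log)
```

The key point is the ordering: teardown is sequenced after the receive loop exits, so the queue is never deleted while parts are still outstanding.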

@cailinw cailinw closed this as completed Oct 14, 2021