Ruby: Thread and Fiber #21

hungle00 opened this issue Nov 9, 2024 · 0 comments
This article is based on my most recent tech sharing session with my team at https://pixta.vn/.

  1. Fibers and Threads
  2. Example: HTTP request
  3. Example: HTTP server
  4. Fiber Scheduler
  5. Concluding

Fibers and Threads

Thread

thread = Thread.new do
  #...
end
thread.join

Fiber

fiber = Fiber.new do
  #...
end
fiber.resume # transfer / Fiber.schedule

As you can see, they have quite similar syntax, so what are the differences between them?

  • The level:
    • By default in CRuby, each Ruby thread is backed 1:1 by an OS thread.
    • Fibers are implemented at the programming language level, and multiple fibers can run inside one thread.
  • Scheduling mechanism:
    • Threads are scheduled preemptively by almost all modern operating systems.
    • Fibers are scheduled cooperatively: a fiber runs until it explicitly gives up control.

Threads run automatically because they are scheduled by the OS.
With Thread, programmers can only create new threads, give them some work, and use the join method to wait for them to finish (or value to get the block's result). The OS decides when each thread runs and when it is paused to achieve concurrency.

[
  Thread.new do
    # code
  end,
  Thread.new do
    # code
  end
].each(&:join)
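To make the join step concrete, here is a small sketch that uses sleep as a stand-in for blocking work. The method name run_squares is made up for this example. Thread#join waits for a thread to finish, and Thread#value waits and then returns the result of the thread's block.

```ruby
# Hypothetical helper: square each number in its own thread.
def run_squares(numbers)
  threads = numbers.map do |n|
    Thread.new do
      sleep 0.05 # stand-in for blocking work such as I/O
      n * n      # the block's last expression becomes Thread#value
    end
  end

  threads.each(&:join) # wait until every thread has finished
  threads.map(&:value) # collect results in the order the threads were created
end

p run_squares([1, 2, 3]) # prints [1, 4, 9]
```

Note that the code never tells a thread when to run or pause. It only starts the threads and waits for them.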

Meanwhile, Fiber gives us more control.
With Fiber, programmers are free to start, pause, and resume them.

  • Fiber.new { }: creates a new fiber, which is started with resume
  • Fiber.yield: pauses the current fiber and returns control to where the fiber was resumed
  • After suspension, a fiber can be resumed later at the same point with the same execution state.

fib2 = nil

fib = Fiber.new do
  puts "1 - fib started"
  fib2.transfer
  Fiber.yield
  puts "3 - fib resumed"
end

fib2 = Fiber.new do
  puts "2 - control moved to fib2"
  fib.transfer
end

fib.resume
puts ""
fib.resume

Output:

1 - fib started
2 - control moved to fib2

3 - fib resumed
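Because a fiber keeps its local state between resume calls, it can also act as a simple generator. A minimal sketch (build_counter is a made-up name for this example): the value passed to Fiber.yield becomes the return value of resume.

```ruby
# Hypothetical helper: a fiber that produces 1, 2, 3, ... on each resume.
def build_counter
  Fiber.new do
    count = 0
    loop do
      count += 1
      Fiber.yield count # pause here and hand count back to the caller
    end
  end
end

counter = build_counter
p counter.resume # prints 1
p counter.resume # prints 2
p counter.resume # prints 3
```

The local variable count survives each pause, which is what "the same execution state" means in practice.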

Fiber over Thread

  • A fiber is lighter-weight than a thread, so we can spawn more fibers than threads
  • Less context-switching time (an advantage of cooperative scheduling compared to preemptive scheduling)
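As a rough illustration of how cheap fibers are to create, the sketch below parks 1,000 fibers at the same time inside a single thread and then lets them finish. The count of 1,000 and the method name are arbitrary choices for this example. It is not a benchmark.

```ruby
# Hypothetical helper: create `count` fibers, park them all, then finish them.
def park_and_finish_fibers(count)
  finished = 0

  fibers = Array.new(count) do
    Fiber.new do
      Fiber.yield   # park until resumed a second time
      finished += 1
    end
  end

  fibers.each(&:resume) # start every fiber; each one parks at Fiber.yield
  fibers.each(&:resume) # resume each fiber so it can finish
  finished
end

p park_and_finish_fibers(1_000) # prints 1000
```

Every switch here happens at a point the code chose (resume or Fiber.yield), which is the cooperative scheduling described above.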

Fiber scheduler

Fibers were introduced in Ruby 1.9, but before Ruby 3.0 there was no scheduler interface, which limited their usefulness for non-blocking I/O. The Fiber Scheduler has been officially supported since Ruby 3.0.
The Fiber Scheduler consists of two parts:

  • Fiber Scheduler interface: the set of hooks that Ruby 3 defines and calls on blocking operations
  • Fiber Scheduler implementation: an object that implements those hooks, which Ruby itself does not ship

If you want to enable asynchronous behavior in Ruby, you need to set a Fiber Scheduler object for the current thread.

Fiber.set_scheduler(scheduler)

A list of Fiber Scheduler implementations and their main differences can be found in the Fiber Scheduler List project.
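To show how the interface and an implementation fit together, here is a deliberately tiny scheduler that only understands sleep. It is a teaching sketch under several assumptions, not how Async or any other real scheduler works: the class name ToySleepScheduler is made up, it assumes Ruby 3.x, it handles no I/O, and it assumes the scheduler is unset from the main fiber.

```ruby
# A toy scheduler that only knows how to handle sleep.
# Not suitable for real use: no I/O, no timeouts, no error handling.
class ToySleepScheduler
  def initialize
    @sleeping = {} # fiber => monotonic time at which it should wake up
  end

  # Called by Fiber.schedule: create a non-blocking fiber and start it.
  def fiber(&block)
    fiber = Fiber.new(blocking: false, &block)
    fiber.resume
    fiber
  end

  # Called instead of a real sleep when a non-blocking fiber calls sleep.
  # A nil duration (sleep forever) is not handled in this sketch.
  def kernel_sleep(duration = nil)
    @sleeping[Fiber.current] = now + duration
    Fiber.yield # give control back to whoever resumed this fiber
  end

  # Called when the scheduler is unset: wake fibers in wake-up order.
  def close
    until @sleeping.empty?
      fiber, wake_at = @sleeping.min_by { |_fiber, time| time }
      delay = wake_at - now
      sleep(delay) if delay > 0 # a real sleep, assuming a blocking fiber
      @sleeping.delete(fiber)
      fiber.resume
    end
  end

  # Hooks a scheduler must respond to; unsupported in this toy.
  def block(blocker, timeout = nil)
    raise NotImplementedError
  end

  def unblock(blocker, fiber)
    raise NotImplementedError
  end

  def io_wait(io, events, timeout)
    raise NotImplementedError
  end

  private

  def now
    Process.clock_gettime(Process::CLOCK_MONOTONIC)
  end
end

# Hypothetical demo: two fibers sleep concurrently on one thread.
def run_toy_scheduler_demo
  order = []
  Fiber.set_scheduler(ToySleepScheduler.new)
  Fiber.schedule { sleep 0.2; order << :slow }
  Fiber.schedule { sleep 0.1; order << :fast }
  Fiber.set_scheduler(nil) # triggers close, which runs the loop above
  order
end

p run_toy_scheduler_demo
```

The fiber that sleeps for less time finishes first, and the total run time is close to the longest sleep rather than the sum of both. A real implementation also has to multiplex I/O, which is why using an existing scheduler is usually the better choice.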

Async gem

  • One of the most mature and widely used Fiber Scheduler implementations is by Samuel Williams.
  • He not only implemented a Fiber Scheduler but also created the Async gem, which has a robust API for writing concurrent code.

The next part shows how to use Thread, Fiber, and the Async gem to make concurrent HTTP requests.

HTTP requests example

As an example, we will fetch a list of UUIDs from httpbin.org.

require "net/http"
require "json"

def get_uuid
  url = "https://httpbin.org/uuid"
  response = Net::HTTP.get(URI(url))
  JSON.parse(response)["uuid"]
end

Each request takes about 1s to finish.

Sequential version

def get_http_sequentially
  results = []

  10.times do
    results << get_uuid
  end

  results
end

now = Time.now
puts get_http_sequentially
puts "Sequential runtime: #{Time.now - now}" # about 11-12s

Each request takes about 1s, so ten sequential calls take at least 10s in total.

Ruby 3 concurrency tools

Concurrent version with threads

def get_http_via_threads
  # Start all requests first, then wait for each thread and collect its result.
  threads = 10.times.map do
    Thread.new { get_uuid }
  end

  threads.map(&:value)
end
# => 1.3s

Concurrent version with fibers

require "async"

def get_http_via_fibers
  Fiber.set_scheduler(Async::Scheduler.new)
  results = []

  10.times do
    Fiber.schedule do
      results << get_uuid
    end
  end
  results
ensure
  # Unsetting the scheduler lets the scheduled fibers run to completion,
  # so results is filled in before the caller uses it.
  Fiber.set_scheduler(nil)
end
# => 1.2s

Because all requests run concurrently, the total time is roughly that of the slowest request.
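The same effect can be reproduced without the network. The sketch below is an offline stand-in, not the measurement above: fake_request is a made-up helper that sleeps for 0.1s instead of calling httpbin.org, and the exact timings depend on the machine.

```ruby
# Hypothetical stand-in for get_uuid: blocks for a while, then returns a value.
def fake_request(id)
  sleep 0.1
  "response-#{id}"
end

# Run the block and return the elapsed wall-clock time in seconds.
def elapsed
  start = Process.clock_gettime(Process::CLOCK_MONOTONIC)
  yield
  Process.clock_gettime(Process::CLOCK_MONOTONIC) - start
end

def fetch_sequentially(count)
  Array.new(count) { |i| fake_request(i) }
end

def fetch_with_threads(count)
  Array.new(count) { |i| Thread.new { fake_request(i) } }.map(&:value)
end

sequential_time = elapsed { fetch_sequentially(5) } # roughly 5 * 0.1s
threaded_time   = elapsed { fetch_with_threads(5) } # roughly 0.1s
puts format("sequential: %.2fs, threads: %.2fs", sequential_time, threaded_time)
```

While one thread is blocked in sleep (or in a socket read), the others are free to run, so the waits overlap instead of adding up.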


More about Async

Another way to write this with the Async gem is to use the Kernel#Async method instead of setting Async::Scheduler directly:

def get_http_via_async
  results = []

  Async do
    10.times do
      Async do
        results << get_uuid
      end
    end
  end
  results
end

The general structure of Async Ruby programs:

  • You always start with an Async block which is passed a task.
  • That main task is usually used to spawn more Async tasks with task.async.
  • These tasks run concurrently with each other and the main task.

Each task is built on top of a Fiber.

HTTP server example

A minimal HTTP server in Ruby can be built on the TCPServer class from the socket standard library. It looks like this:

require "socket"

socket = TCPServer.new(HOST, PORT)
socket.listen(SOCKET_READ_BACKLOG)

loop do
  conn = socket.accept # wait for a client to connect
  request = RequestParser.call(conn)
  #... status, headers, body
end
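For readers who want something runnable, here is a self-contained sketch of the same accept loop. It is not the server from this article: serve_hello and fetch_raw are made-up helpers, the request parsing is reduced to reading the request line and skipping headers, and the loop stops after a fixed number of requests so the script can exit.

```ruby
require "socket"

# Hypothetical helper: answer `max_requests` connections, one at a time.
def serve_hello(server, max_requests)
  max_requests.times do
    conn = server.accept # wait for a client to connect
    request_line = conn.gets # e.g. "GET /ping HTTP/1.1"
    # Skip the request headers, which end with an empty line.
    while (line = conn.gets) && line != "\r\n"
    end

    body = "Hello from #{request_line.split[1]}\n"
    conn.write "HTTP/1.1 200 OK\r\n" \
               "Content-Type: text/plain\r\n" \
               "Content-Length: #{body.bytesize}\r\n" \
               "Connection: close\r\n\r\n#{body}"
    conn.close
  end
end

# Hypothetical helper: send a bare GET request and return the raw response.
def fetch_raw(port, path)
  TCPSocket.open("127.0.0.1", port) do |socket|
    socket.write "GET #{path} HTTP/1.1\r\nHost: localhost\r\nConnection: close\r\n\r\n"
    socket.read # read until the server closes the connection
  end
end

server = TCPServer.new("127.0.0.1", 0) # port 0 lets the OS pick a free port
port = server.addr[1]
server_thread = Thread.new { serve_hello(server, 1) }
puts fetch_raw(port, "/ping")
server_thread.join
server.close
```

Because accept and the request handling happen in the same loop, a slow client blocks everyone behind it.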

Now we'll make the server handle more than one request at a time.

Thread pool version

pool = ThreadPool.new(size: 5)
loop do
  conn = socket.accept # wait for a client to connect
  pool.schedule do
    # handle each request
    request = RequestParser.call(conn)
  end
end

The idea is to use a thread pool to limit the number of threads running concurrently.
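ThreadPool is not part of Ruby's standard library, so the class used above has to come from somewhere else. As an assumption about what such a class could look like, here is a minimal sketch built on Queue. The name SimpleThreadPool and the shutdown method are made up for this example, and a job that raises will kill its worker thread.

```ruby
# A minimal fixed-size thread pool: `size` workers pull jobs from one queue.
class SimpleThreadPool
  def initialize(size:)
    @jobs = Queue.new
    @workers = Array.new(size) do
      Thread.new do
        # Queue#pop blocks until a job arrives; nil is the stop signal.
        while (job = @jobs.pop)
          job.call
        end
      end
    end
  end

  # Queue a block to be run by the next free worker.
  def schedule(&block)
    @jobs << block
  end

  # Let queued jobs finish, then stop every worker.
  def shutdown
    @workers.size.times { @jobs << nil }
    @workers.each(&:join)
  end
end

pool = SimpleThreadPool.new(size: 3)
results = Queue.new # Queue is thread-safe, so workers can push to it directly
10.times { |i| pool.schedule { results << i * 2 } }
pool.shutdown
puts "#{results.size} jobs finished"
```

No matter how many jobs are scheduled, at most 3 run at once here. The rest wait in the queue, which is how a pool puts an upper bound on the number of threads.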

Async version

Async do
  loop do
    conn = socket.accept # wait for a client to connect
    Async do
      # handle each request
      request = RequestParser.call(conn)
    end
  end
end

Falcon is the best-known app server that uses Async to handle connections.

More details about the implementation and benchmark tests are in this repo.

Concluding

  1. Threads and fibers allow programmers to write concurrent code, which is very useful for handling blocking I/O operations.
  2. As Ruby developers, we don't use Thread directly most of the time. But in web development, a lot of tools use threads or are designed to work with them:
    • A web server like Puma or WEBrick
    • A background job processor like Sidekiq, GoodJob, or Solid Queue
    • An ORM like ActiveRecord or Sequel
    • An HTTP client like HTTParty or RestClient
  3. Fiber (plus the Fiber Scheduler) has only been fully usable since Ruby 3 and may have a bright future due to its advantages over Thread. Here are a couple of the most useful tools built on top of fibers:
    • async-http: a featureful HTTP client
    • falcon: an HTTP server built around the Async core
    • ...