Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Export large file in Rails with Backgroud Job #5

Open
hungle00 opened this issue Apr 4, 2022 · 0 comments
Open

Export large file in Rails with Backgroud Job #5

hungle00 opened this issue Apr 4, 2022 · 0 comments
Labels
ruby About Ruby and Rails

Comments

@hungle00
Copy link
Owner

hungle00 commented Apr 4, 2022

In some systems, you'll have to do something very slow. Maybe you need to export a large dataset to a CSV file, or need to generate a thousand row Exel file that takes some minutes to complete?

Instead of try building these kinds of reports in a normal Rails controller action, this task should be moved into a background job for processing outside the life cycle of a single HTTP request.

But when you move work to a background job, code execution does not stop and you need a way to “connect” back to the user that caused the job to be created. It’s outside of a single request-response life cycle so you cannot rely on normal Rails controllers.

We often need a way to report back progress and send back information once the job is completed.

Example: Export large xlsx file

This sample uses Sidekiq for Backgroud Job and a Stimulus controller to handle client-side polling/updating.
First, we use this command to generate new job rails generate sidekiq:job ExportUser.

Now we move the export file logic to the export_job.rb file, and call the job inside the controller instead.

class UsersController < ApplicationController
  def export
    respond_to do |format|
      format.json do
        job_id = ExportUserJob.perform_async

        render json: {
          jid: job_id
        }
      end
    end
  end

So our controller will now be like this, the csv generation will now be handle in the background instead of in our main app

class ExportUserJob
  include Sidekiq::Job
  include Sidekiq::Status::Worker

  def perform(*args)
    users = User.pluck :id, :name, :email, :address, :phone
    total users.size
    # Create Axlsx package and workbook
    xlsx_package = Axlsx::Package.new
    xlsx_workbook = xlsx_package.workbook
    
    xlsx_workbook.add_worksheet(name: "Users") do |worksheet|
      worksheet.add_row %w(ID Name Email Address Phone)
      
      users.each.with_index(1) do |user, idx|
        worksheet.add_row user
        # Caculate percentage 
        at idx
      end
    end
    
    # Save file into tmp with suffix is jobId
    xlsx_package.serialize Rails.root.join("tmp", "users_export_#{self.jid}.xlsx")
  end
end

However, there was a problem with using Sidekiq, the job is executed in the background so it doesn’t inform the main server when it is completed. So we executed the job, got the exported file, and now what? We cannot give the file back to the server to return it to our client.

Polling + Progress

It is a repeating AJAX call to check the job status and a block of text

The first approach is to combine polling (periodically checking on the job via AJAX calls) with progress reporting. This approach is good in case the user wants to export large reports and wants to monitor progress.

In UserController, We need 2 methods, one for tracking job's status and one for downloading file when job is completed:

class UsersController < ApplicationController
 ...

  def export_status
    respond_to do |format|
      format.json do
        job_id = params[:job_id]
        # Check job status and percentage using JobId
        job_status = Sidekiq::Status.get_all(job_id).symbolize_keys

        render json: {
          status: job_status[:status],
          percentage: job_status[:pct_complete]
        }
      end
    end
  end
  
  def export_download
    job_id = params[:id]
    exported_file_name = "users_export_#{job_id}.xlsx"
    filename = "UserData_#{DateTime.now.strftime("%Y%m%d_%H%M%S")}.xlsx"

    respond_to do |format|
      format.xlsx do
        send_file Rails.root.join("tmp", exported_file_name), type: :xlsx, filename: filename
      end
    end
  end
end

Our Stimulus controller will be looks like this:

import { Controller } from "@hotwired/stimulus"

export default class extends Controller {
  static targets = ["download", "status"];

  export() {
    fetch('/export_user')
      .then(response => response.json())
      .then(data => {
        const jobId = data.jid;
        this.statusTarget.textContent = "Exporting ...";

        this.timer = setInterval(() => {
          this.checkJobStatus(jobId)
        }, 1000);
      });
  }

  checkJobStatus(jobId) {
    fetch(`/export_status?job_id=${jobId}`)
      .then(response => response.json())
      .then(data => {
        const percentage = data.percentage;
        this.statusTarget.textContent = `Exporting ${percentage}%`;
        if(data.status == "error") {
            this.stopCheckJobStatus();
        }else if(data.status === "complete") {
          this.stopCheckJobStatus()
          this.downloadTarget.href = `/export_download.xlsx?id=${jobId}`;
          this.downloadTarget.classList.remove("hidden");
        }
      })
  }

  stopCheckJobStatus() {
    if(this.timer) {
      clearInterval(this.timer);
    }
  }

  disconnect() {
    this.stopCheckJobStatus();
  }
}

This approach is extendable and flexible. You can implement the progress reporting in a variety of ways: a numeric progress bar, a single “status” message, or even a detailed log of progress.

Action Cable to broadcast

In order to return the csv file to the client, we will create a channel using Action Cable and broadcast the file back when it’s ready.
Use rails g channel VideoConversion to generate the new channel

class ExportUserChannel < ApplicationCable::Channel
  def subscribed
    stream_from "export_user"
  end
end

Then we will add the code to broadcast our file in the job so it will return the csv when the job is performed.

class ExportUserJob
  include Sidekiq::Job

  def perform(*args)
    ....
    # Save file into tmp with suffix is jobId
    xlsx_package.serialize Rails.root.join("tmp", "users_export_#{self.jid}.xlsx")

    ActionCable.server.broadcast "export_user", self.jid
  end
end

In controller, we just need 2 methods:

class UsersController < ApplicationController
  def export
    ExportUserJob.perform_async
    head :accepted
  end

  def export_download
  end
end

Next, let’s add the Stimulus controller that will listen for events from this channel and update the UI as it receives them. The important parts are the channel we subscribe to in the export method and the received method

import { Controller } from "@hotwired/stimulus"
import consumer from "../channels/consumer"

export default class extends Controller {
  static targets = ["status"];

  export() {
    fetch('/export_user')
      .then(() => {
        this.statusTarget.textContent = "Exporting ...";
        this.subscribe_channel(this.downloadTarget)
      })
      .catch((error) => {
        console.error('Error:', error);
      });
  }

  subscribe_channel() {
    this.channel = consumer.subscriptions.create("ExportUserChannel", {
      connected() {
        console.log("hello")
      },

      disconnected() {},

      received(data) {
        console.log(data)
        window.location.href = `/export_download.xlsx?id=${data}`
      }
    });
  }
} 

You can find the full source code for this guide on this repo, or for action cable approach in branch export_action-cablel

References

@hungle00 hungle00 added the ruby About Ruby and Rails label Apr 4, 2022
@hungle00 hungle00 changed the title Export large file in Rails with Backgroup Job Export large file in Rails with Backgroud Job Apr 4, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
ruby About Ruby and Rails
Projects
None yet
Development

No branches or pull requests

1 participant