
Write a simple importer

Mark Bussey edited this page Feb 6, 2019 · 20 revisions

Simple Importer Goals:

  • Write a very simple CSV importer
  • Be able to point to the parts of the importer

Setup

(OPTIONAL) Save your current changes

If you have uncommitted changes in your current branch (you can check with git status), save them before starting this lesson, which uses a separate branch:

  • git checkout -b your_branch_name
  • git add .
  • git commit -m 'checkpoint before beginning simple importer'

Check out working branch

git checkout importer_setup

NOTE: If you make experimental changes and want to get back to the minimal code state necessary to run this lesson, you can check the starting code out again using:
git checkout importer_setup

1. Write a test for the importer

As you've come to expect by now, we're going to write a test for our simple importer first. Make a directory in the spec folder for our importer tests:

  mkdir spec/importers

Now, make a file in that folder called simple_importer_spec.rb and paste the following content into it:

# frozen_string_literal: true

require 'rails_helper'
require 'active_fedora/cleaner'

RSpec.describe SimpleImporter do
  let(:one_line_example)       { 'spec/fixtures/csv_files/one_line_example.csv' }
  let(:three_line_example)     { 'spec/fixtures/csv_files/three_line_example.csv' }

  before do
    DatabaseCleaner.clean
    ActiveFedora::Cleaner.clean!
  end

  it "imports a csv" do
    expect { SimpleImporter.new(three_line_example).import }.to change { Image.count }.by 3
  end

  it "puts the title into the title field" do
    SimpleImporter.new(one_line_example).import
    expect(Image.where(title: 'A Cute Dog').count).to eq 1
  end

  it "puts the url into the source field" do
    SimpleImporter.new(one_line_example).import
    expect(Image.where(source: 'https://www.pexels.com/photo/animal-blur-canine-close-up-551628/').count).to eq 1
  end

  it "creates publicly visible objects" do
    SimpleImporter.new(one_line_example).import
    imported_image = Image.first
    expect(imported_image.visibility).to eq 'open'
  end

  it "attaches files" do
    allow(AttachFilesToWorkJob).to receive(:perform_later)
    SimpleImporter.new(one_line_example).import
    expect(AttachFilesToWorkJob).to have_received(:perform_later).exactly(1).times
  end
end

Run your test: rspec spec/importers/simple_importer_spec.rb. You should see an error that says something like:

NameError:
  uninitialized constant SimpleImporter

2. Make a SimpleImporter class

Let's write just enough of our importer to make that error message change. Make a folder called importers in the app folder (mkdir app/importers), and within that make a file called simple_importer.rb. Paste this into it:

class SimpleImporter

  def initialize(file)
    @file = file
    @user = ::User.batch_user
  end

  def import
  end
end

Now run your test again (rspec spec/importers/simple_importer_spec.rb). It will still fail, but for different reasons: it can now find a class called SimpleImporter, but calling import doesn't yet produce the expected results. Writing the test first, then making small changes and re-running the test after each one to see how its output changes, is a good TDD habit, and it's the one we're practicing here.

3. Process the CSV and make an object with metadata

Ideally we would make just one test pass at a time; to save time in this lesson, we're showing the completed code that passes four of our five tests. Replace your simple_importer.rb file with this one:

require 'csv'

class SimpleImporter

  def initialize(file)
    @file = file
    @user = ::User.batch_user
  end

  def import
    CSV.foreach(@file) do |row|
      image = Image.new
      image.depositor = @user.email
      image.title << row[1]
      image.source << row[2]
      image.visibility = Hydra::AccessControls::AccessRight::VISIBILITY_TEXT_VALUE_PUBLIC
      image.save
    end
  end
end

Run your tests again, and four of the five should pass. The only one that still fails is "attaches files", the file attachment test.
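The importer maps columns by position: row[0] is the image filename, row[1] the title, and row[2] the source URL. The sketch below shows what Ruby's standard csv library hands back for one row. The filename dog.jpg is hypothetical (the fixture's actual contents aren't shown in this lesson); the title and URL are the values the tests expect.

```ruby
require 'csv'

# Hypothetical one-line CSV in the column order SimpleImporter assumes:
# filename, title, source URL. "dog.jpg" is made up for illustration.
sample = "dog.jpg,A Cute Dog,https://www.pexels.com/photo/animal-blur-canine-close-up-551628/\n"

# Without headers: true, CSV.parse (like CSV.foreach) returns each row
# as an Array of Strings, so fields can only be addressed by position.
row = CSV.parse(sample).first

filename = row[0] # used to locate the image file to attach
title    = row[1] # goes into image.title
source   = row[2] # goes into image.source

puts "#{filename} | #{title} | #{source}"
```

Because nothing in the row says which column is which, the importer depends entirely on the file keeping this exact column order.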

4. Attach the files

Replace simple_importer.rb again, with this version of the code:

require 'csv'

class SimpleImporter

  def initialize(file)
    @file = file
    @user = ::User.batch_user
  end

  def import
    CSV.foreach(@file) do |row|
      image = Image.new
      image.depositor = @user.email
      image.title << row[1]
      image.source << row[2]
      image.visibility = Hydra::AccessControls::AccessRight::VISIBILITY_TEXT_VALUE_PUBLIC
      # Attach the image file and run it through the actor stack
      # Try entering Hyrax::CurationConcern.actor on a console to see all of the
      # actors this object will run through.
      image_binary = File.open("#{::Rails.root}/spec/fixtures/images/#{row[0]}")
      uploaded_file = Hyrax::UploadedFile.create(user: @user, file: image_binary)
      attributes_for_actor = { uploaded_files: [uploaded_file.id] }
      env = Hyrax::Actors::Environment.new(image, ::Ability.new(@user), attributes_for_actor)
      Hyrax::CurationConcern.actor.create(env)
      image_binary.close
    end
  end
end

Now run your tests again and they should all pass.
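The version above opens the image with File.open and closes it by hand after the actor runs, so if anything in between raises an error, the file handle is left open. The block form of File.open closes the file even when an exception is raised. A minimal sketch in plain Ruby: a Tempfile stands in for the fixture image, and a simulated error stands in for the Hyrax calls.

```ruby
require 'tempfile'

# Stand-in for spec/fixtures/images/<filename> so the sketch runs anywhere.
fixture = Tempfile.new(['dog', '.jpg'])
fixture.write('not really a jpeg')
fixture.flush

opened = nil
begin
  File.open(fixture.path) do |image_binary|
    opened = image_binary
    # In SimpleImporter, Hyrax::UploadedFile.create and the actor call would go here.
    raise 'simulated failure inside the actor stack'
  end
rescue RuntimeError => e
  puts "rescued: #{e.message}"
end

# The block form closed the file even though the block raised.
puts opened.closed? # prints true
```

In SimpleImporter this would mean wrapping the Hyrax::UploadedFile.create and actor calls in File.open(...) do |image_binary| ... end and dropping the explicit image_binary.close. That assumes, as the original code already does, that the file only needs to stay open while those calls run.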

5. Make a rake task and run your importer in development mode

Now that we have an importer, let's actually make it run in our development environment. Make a rake task so we can invoke it easily. Make a file called lib/tasks/simple_import.rake and paste this content into it:

CSV_FILE = "#{::Rails.root}/spec/fixtures/csv_files/three_line_example.csv"

namespace :csv_import do
  desc 'Import the three line sample CSV'
  task :simple_import => [:environment] do |_task|
    SimpleImporter.new(CSV_FILE).import
  end
end

Now invoke the rake task (rake csv_import:simple_import) and go to http://localhost:3000/catalog to see the objects that were created.
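The rake task above always imports the hardcoded CSV_FILE. If you want to point it at a different file, rake tasks can take arguments. Below is a sketch of that variant that runs with plain Ruby and the rake gem. The task name csv_import:import_file and StubImporter are hypothetical, and the stub stands in for SimpleImporter so no Rails environment is needed. In lib/tasks you would instead declare task :import_file, [:csv_file] => [:environment] and call SimpleImporter.new(args[:csv_file]).import.

```ruby
require 'rake'

# Hypothetical stand-in for SimpleImporter: it only records which file
# it was asked to import, so the sketch runs without Rails or Hyrax.
IMPORTED_FILES = []
StubImporter = Struct.new(:file) do
  def import
    IMPORTED_FILES << file
  end
end

DEFAULT_CSV = 'spec/fixtures/csv_files/three_line_example.csv'

namespace :csv_import do
  desc 'Import a CSV file (defaults to the three line sample)'
  task :import_file, [:csv_file] do |_task, args|
    # Fall back to the sample file when no argument is given.
    args.with_defaults(csv_file: DEFAULT_CSV)
    StubImporter.new(args[:csv_file]).import
  end
end

# Programmatic equivalent of running the task from a shell with an argument.
Rake::Task['csv_import:import_file'].invoke('spec/fixtures/csv_files/one_line_example.csv')
p IMPORTED_FILES
```

From a shell, the argument goes in square brackets: rake "csv_import:import_file[spec/fixtures/csv_files/one_line_example.csv]". The quotes keep shells such as zsh from treating the brackets as a glob pattern.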

Note: You can see the changes we made in this section on GitHub.

For discussion:

  1. What is Hydra::AccessControls::AccessRight::VISIBILITY_TEXT_VALUE_PUBLIC? Why use that instead of just saying "open"? What happens if you enter a different value?
  2. What happens if we add a header row to a future version of our CSV file?
  3. What happens if we change the order of the columns in our CSV file?
  4. What happens if we want to attach more than one file per object?
  5. What do you need to do if you want to add another of the core Hyrax metadata fields to the data?
  6. What is the actor stack? What are some of the things that it does?
  7. Can you identify the parts of an importer we talked about? Where is the:
      • top-level kickoff?
      • parser?
      • mapper?
      • record importer?
      • logger?