How To Add To A Database With ActiveRecord
Data migration is a fragile and sometimes complicated and time-consuming process. Whether you are loading data from a legacy application into a new application or you just want to move data from one database to another, you'll most likely need to create a migration script that is accurate, efficient, and fast, especially if you are planning to load a huge amount of data.
There are several ways you can load data from an old Rails app or another application into Rails. In this article, I'll explain a few ways to load data into a PostgreSQL database with Rails. We'll go over their pros and cons so you can choose the method that works best for your situation.
Postgres is an innovative database. According to a recent study by DB-Engines (PDF), PostgreSQL's popularity rating increased by 65 percent from January 2016 to January 2019, while the ratings of MySQL, SQL Server, and Oracle decreased by 10-16 percent during the same period.
PostgreSQL has a strong reputation for handling large data sets. However, with the wrong tools and solutions, its powers can be undermined. So what's the fastest way to load data into a Postgres database in your Rails app? Let's look at four different methods, then we'll see which is the fastest.
- Inserting one record at a time to load data to your Postgres database
  - Pros of single-row inserts with Postgres
  - Cons of single-row inserts with Postgres
- Bulk Inserts with Active Record Import to load data to your Postgres database
  - Pros of Bulk Inserts with Active Record in Ruby on Rails and Postgres
  - Cons of Bulk Inserts with Active Record in Ruby on Rails and Postgres
- Using PostgreSQL COPY with activerecord-copy to load data to your Postgres database
  - Pros of using PostgreSQL COPY with activerecord-copy
  - Cons of using PostgreSQL COPY with activerecord-copy
- Using background jobs to load data to your Postgres database
- Final Thoughts About Loading Large Data Sets into a PostgreSQL Database with Rails
- Speed comparison of different ways to load data into Postgres with Rails
- Other articles and resources you might like
Inserting one record at a time to load data to your Postgres database
One easy way to load data into a Postgres database is to loop through the data and insert records one at a time.
Here's sample code to do this in Rails, assuming we have the source data in a CSV file:
# lib/tasks/one_record_at_a_time.rake
require 'csv'
require 'benchmark'

namespace :import do
  desc "imports data from csv to postgresql"
  task :single_record => :environment do
    filename = 'users.csv' # assumed path to the source CSV file

    # This method loops over the contents of the CSV file and creates a new record for each row.
    def insert_user(filename)
      CSV.foreach(filename, headers: true) do |row|
        User.create(row.to_h)
      end
    end

    puts Benchmark.realtime { insert_user(filename) } # Use Benchmark to measure the speed
  end
end
But there's a problem with this approach. Inserting data one row at a time into a PostgreSQL database is extremely slow. I ran this Rake task to insert over a million records and measured it with Benchmark. The report came back with a result of over 1.3 hours; that's a long time. There's overhead in both the database and the application in processing rows one by one, and additional latency in waiting for the database round trip for each row.
We'll see a better approach in the next section, but for now, here's a summary of the pros and cons of single-row inserts:
Pros of single-row inserts with Postgres
- Doesn't require an external dependency
Cons of single-row inserts with Postgres
- Very slow
- Might lock your session for a long time
- Not suitable for inserting large data sets
- If one insert fails, you're stuck with partially loaded data
Bulk Inserts with Active Record Import to load data to your Postgres database
Running a bulk insert query is a better and faster way to load data into your Postgres database, and the Rails gem activerecord-import makes it easy to load massive amounts of data in bulk in a way that the Active Record ORM can understand and manipulate. Instead of hitting your database multiple times, processing transactions, and doing all the back and forth between your app and the database, the activerecord-import gem allows you to build up large insert queries and run them all at once.
You can install the activerecord-import gem by adding gem 'activerecord-import' to your Gemfile and running bundle install in your terminal. This gem adds an import method to Active Record classes, which means you just need to call import on your model classes to load the data into your database.
Here is an example:
# lib/tasks/active_record_import.rake
require 'csv'
require 'benchmark'

namespace :import do
  desc "imports data from csv to postgresql"
  task :batch_record => :environment do
    filename = 'users.csv' # assumed path to the source CSV file

    users = []
    CSV.foreach(filename, headers: true) do |row|
      users << row
    end

    newusers = users.map do |attrs|
      User.new(attrs.to_h)
    end

    time = Benchmark.realtime { User.import(newusers) }
    puts time
  end
end
Notice how we're building up the records in an array, users, and passing the array to the import method on the User model: User.import(newusers).
That's really all that needs to be done. However, you can choose to pass only specific columns and their values as arrays to the import method if you want to. For example, User.import columns, values, where columns is an array like ["first_name", "last_name"] and values is an array like [['Peter', 'Joseph'], ['Banabas', 'Bob Jones']].
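Here's a minimal sketch of that columns/values form, reusing the example data above (validate: false is a standard activerecord-import option that skips Active Record validations for extra speed; it's optional):
columns = ["first_name", "last_name"]
values  = [['Peter', 'Joseph'], ['Banabas', 'Bob Jones']]
User.import columns, values, validate: false # skip validations only if that's acceptable for your data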
I benchmarked loading a million records into a Postgres database with Rails using this method, and it took just 5.1 minutes. Remember the first method took 1.3 hours? This method is 1,529% (~15x) faster. That's impressive.
Pros of Bulk Inserts with Active Record in Ruby on Rails and Postgres
- Follows Active Record associations, meaning the Rails ORM is able to do its magic with the loaded data
- Faster to load data into your PostgreSQL database
- Doesn't have per-row overhead
- If an insert fails, your transaction will roll back the insert
Cons of Bulk Inserts with Active Record in Ruby on Rails and Postgres
- The activerecord-import gem might conflict with other gems that add an .import method to the Active Record model. However, in cases where this might happen, you can use the .bulk_import method, also attached to your model classes, as an alternative (see the sketch below).
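A minimal sketch of that alternative, assuming the same newusers array from the Rake task above:
User.bulk_import(newusers) # bulk_import is provided by activerecord-import as an alias for import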
See how batch import improved our speed by over 1,529%? That was incredible, right? But there is a still faster way to load data into a Postgres database.
Using PostgreSQL COPY with activerecord-copy to load data to your Postgres database
COPY is the fastest way to load data into a PostgreSQL database; it uses the combined power of a bulk insert and avoids some of the overhead of repeatedly parsing and planning an INSERT.
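To make the idea concrete, here's a rough sketch of what a COPY looks like at the driver level, using the pg gem directly (the users table and its columns are assumed for illustration; the activerecord-copy gem wraps roughly this kind of call for you):
conn = ActiveRecord::Base.connection.raw_connection # the underlying PG::Connection
conn.copy_data "COPY users (first_name, last_name) FROM STDIN WITH (FORMAT csv)" do
  conn.put_copy_data "Peter,Joseph\n"
  conn.put_copy_data "Banabas,Bob Jones\n"
end
# Note: if your table has NOT NULL timestamp columns, include them in the column list and in each row.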
The activerecord-copy gem provides an easy-to-use interface for implementing COPY in your Rails app. You'll need to add the line gem 'activerecord-copy' to your Gemfile and run bundle install in your terminal to install the gem and get ready to use it.
Here is a sample Rake task showing how you can use it:
# lib/tasks/active_record_copy.rake
require 'csv'
require 'benchmark'

namespace :copy do
  desc "imports data from csv to postgresql"
  task :data => :environment do
    filename = 'users.csv' # assumed path to the source CSV file

    def insert_user(filename)
      users = []
      # symbol headers so each row can be accessed as d[:first_name], d[:last_name], etc.
      CSV.foreach(filename, headers: true, header_converters: :symbol) do |row|
        users << row
      end

      time = Time.now.getutc
      User.copy_from_client [:first_name, :last_name, :email, :created_at, :updated_at] do |copy|
        users.each do |d|
          copy << [d[:first_name], d[:last_name], d[:email], time, time]
        end
      end
    end

    puts Benchmark.realtime { insert_user(filename) }
  end
end
The activerecord-copy gem adds a copy_from_client method to all your model classes, as shown in the snippet above (you'll have to define the columns and their values as shown).
Note that when you use the activerecord-copy gem, timestamps are not created for you automatically; you'll have to create them yourself. You'll also notice where I created the timestamp, time = Time.now.getutc; that's because Rails will not create timestamps for you automatically with COPY.
Pros of using PostgreSQL COPY with activerecord-copy
- Doesn't have per-row overhead
- If the insert fails, your transaction will roll back the insert
- Super fast
Cons of using PostgreSQL COPY with activerecord-copy
- You have to set timestamps (created_at, updated_at, etc.) manually
I benchmarked activerecord-copy performance with a transaction of over one million records, as I did for the other methods, and it took about 1.5 minutes. Insanely fast compared to the other methods we've seen in this article.
Using background jobs to load data to your Postgres database
If you frequently load new data into your database, one neat way to improve your app's performance is to run your data loading in a background job. There are several tools that make this possible, for example, Rails' delayed_job gem, Sidekiq, and Resque.
Just like Active Record abstracts away the database, Rails provides Active Job so you can use any of these supported adapters inside your Rails app without worrying about job-specific implementation details. So you could set up a script for Active Record and run it in a background job using Active Job and the delayed_job adapter. That way, you'll be running your data loading in the background.
Let's walk through how to set up an Active Job to run your background process:
- Since you're going to use the delayed_job adapter, install the delayed_job_active_record gem.
- Add gem 'delayed_job_active_record' to your Gemfile.
- Run bundle install on your terminal/command line.
- Run the following commands to generate the delayed jobs table migration and migrate the database: rails g delayed_job:active_record followed by rake db:migrate
- Generate an Active Job by running the following command: rails generate job import_data
- Open the file created in your app/jobs directory, app/jobs/import_data_job.rb, and add your data loading code:
# app/jobs/import_data_job.rb
class ImportDataJob < ApplicationJob
  queue_as :default

  def perform(*args)
    # Write your code here to load records into the database.
    # You can use any of the fast methods we've discussed.
  end
end
- In order for Rails to be aware of the Active Job adapter you want to use, you need to add the adapter to your config file. Simply add this line: config.active_job.queue_adapter = :delayed_job.
# config/application.rb
module YourApp
  class Application < Rails::Application
    # Be sure to have the adapter's gem in your Gemfile
    # and follow the adapter's specific installation
    # and deployment instructions.
    config.active_job.queue_adapter = :delayed_job
  end
end
Depending on how often you want the job to run, you can set the job to be enqueued at a specific time or immediately, following the instructions in the Active Job documentation.
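For example, these are standard Active Job calls for enqueuing the job (the wait times here are arbitrary):
ImportDataJob.perform_later                                      # enqueue to run as soon as a worker is free
ImportDataJob.set(wait: 1.hour).perform_later                    # enqueue to run in an hour
ImportDataJob.set(wait_until: Date.tomorrow.noon).perform_later  # enqueue for a specific time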
One way you can do this is to let the job run asynchronously. Create a Rake task, add ImportDataJob.perform_later to the task, and run it. Example:
namespace :active_jobs do
  desc "imports data from sql to postgresql"
  task :import => :environment do
    ImportDataJob.perform_later
  end
end
Once this is done, you can run the task rake active_jobs:import in your terminal.
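To tie it together, here's a minimal sketch of what the job's perform method could contain if you reuse the activerecord-import approach from earlier (the CSV path and default argument are assumptions for illustration):
# app/jobs/import_data_job.rb
require 'csv'

class ImportDataJob < ApplicationJob
  queue_as :default

  def perform(filename = 'users.csv') # hypothetical default path to the source CSV
    users = []
    CSV.foreach(filename, headers: true) do |row|
      users << User.new(row.to_h)
    end
    User.import(users)
  end
end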
Final Thoughts About Loading Large Data Sets into a PostgreSQL Database with Rails
When considering how to optimize your database performance, it's best to first figure out the optimization options the database already provides. As you may have noticed, most of the tools and techniques in this article leverage the hidden power of the PostgreSQL database. Sometimes, it might just be your implementation slowing down your database performance.
Speed comparison of different ways to load data into Postgres with Rails
Here's a table summarizing the speeds of the various methods discussed in this article.
Method | Speed | Amount of records
---|---|---
One record at a time insert | 1.3 hours | 1,000,000
Bulk inserts with activerecord-import | 5.1 minutes | 1,000,000
PostgreSQL COPY with activerecord-copy | 1.5 minutes | 1,000,000
Using background jobs | < 1 sec (perceived) | 1,000,000
You've learned that if you're loading a huge amount of data into your PostgreSQL database, inserting one record at a time is slow and shouldn't even be considered. For ultimate performance, you want to use COPY. Of course, you've also seen the caveats of each method, and you should weigh all the pros and cons before making your final decision.
Other articles and resources you might like
Using Postgres Row-Level Security in Ruby on Rails
Creating Custom Postgres Data Types in Rails
Efficient Search in Rails with Postgres (PDF eBook)
PostGIS vs. Geocoder in Rails
Advanced Active Record: Using Subqueries in Rails
Full Text Search in Milliseconds with Rails and PostgreSQL
Effectively Using Materialized Views in Ruby on Rails
Similarity in Postgres and Rails using Trigrams
Efficient GraphQL queries in Ruby on Rails & Postgres
Source: https://pganalyze.com/blog/fastest-way-importing-data-into-postgres-with-ruby-rails