Friday, 7 February 2014

Using Strings as ID fields in Rails

Scenario

Rails defaults to using Integers for ID's which works fine when you only working with your own internal data. Sometimes you need to periodically import external data that has a Primary Key (PK) of a string. Lets talk about the options we have.

Example

Let's say we import daily company and contacts from an external API. And we have tables and relationships of:

We record the string external company and contact ID into company_ext_id and contact_ext_id respectively.  Each time we import companies & contacts we need to check the existing keys exist to get the Rails ID for the object.


This gets worse when you are importing contacts that you will need to match the imported company_id (which will probably be a foreign key to companies table).

Now if we used the external tables string ID's, we can then use standard ActiveRecord:-


This is far easier and doesn't require a lookup of Company to get the companies integer ID.

SO, how do you get rails to use strings for IDs. Here is an example migration:-


Things to note here are:-

  • id: false       # in the create_table line
  • t.string :id   # the id needs to be defined now
  • t.string :company_id  # relationship needs to be a string as well  
  • We need to manually add the PK
The allows the standard ActiveRecord belongs_to to work just like normal, without any extra options.


The Issue

When we set a PK in the migration, it has no effect on the development or production environments (as they build from the migrations), although it does effect the test environment (which builds from the schema.rb). The schema.rb generated by rails will see the id field an see it is a PK and then automatically create the below in schema.rb


Rails convention says if its an ID and a PK that its an integer. Note, it does not have the id field listed and the ":force => true" is defaulting the ID to an integer.
So when you:

rake db:test:prepare

It uses the schema.rb and this will make the test db have an integer for an ID causing your tests to fail (if you have any).

To avoid this we need use a feature that Rails supports that allows the schema dump to be sql statements instead of ruby. http://guides.rubyonrails.org/migrations.html#types-of-schema-dumps. To do this add the following line to your config/application.rb

config.active_record.schema_format = :sql

This will now ignore the schema.rb (and no longer write changes to it) and will now produce a db/structure.sql

Now when you load your test DB structure it will correctly build the ID's PK's with strings.

Another option, is to not use a PK (just add a unique index) in your migration and schema.rb will generate the correct format, this seems to be a little bit hacky...

Conclusions

Most people believe that using a string for a PK reduces performance, I have discussed this with a few DBA's and they say it has none to very little effect.
I have used both methods and found recording the external ID and constantly relating very painful and slow. Using strings for ID's takes a little more initial setup although after that it makes no difference in your general development.

No comments:

Post a Comment