Gem Packaging: Best Practices

Posted by Josh Peek, September 1, 2009 @ 7:12 pm

Understand Ruby’s Load Path

When you call load or require on a new file, Ruby searches through the directories in its load path. This allows you to require files relative to the load path without specifying the file’s full system path.

The initial load path contains paths for Ruby’s standard library. There are three aliases that point to Ruby’s global load path array: $:, $-I, and $LOAD_PATH. You can append or prepend your own libraries to this list. The load path can also be modified from the command line with the -I flag.
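A short sketch of the points above, assuming a hypothetical project directory my_project/lib (it does not need to exist for this to run). It checks that the three globals are the same object and prepends a directory the way -I would:

```ruby
# $:, $-I and $LOAD_PATH are three names for the same Array object.
puts $:.equal?($LOAD_PATH)   # prints true
puts $-I.equal?($LOAD_PATH)  # prints true

# Hypothetical project directory, used only for illustration; it does not
# need to exist for the load path to hold it.
my_lib = File.expand_path("my_project/lib")

# Prepending puts it ahead of every other entry (including the standard
# library); appending with push would put it at the end instead.
$LOAD_PATH.unshift(my_lib)
puts $LOAD_PATH.first == my_lib  # prints true

# The same prepend from the command line:  ruby -Imy_project/lib script.rb
```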

Here is the initial load path on my Mac.

  => ["/Users/josh/.rip/active/lib", ...]

There are a few far-too-common mistakes people make with load paths.

Respect the global load path

When you package up your new rubygem to share with the world, you need to be careful with the files you place directly in lib/. Rubygems (and almost all other Ruby package managers) will add your gem’s lib/ to the load path. This means any file placed in the top level of lib/ will be directly requirable by anyone using the gem.

Bad gem example:
  `-- lib
      |-- foo
      |   `-- cgi.rb
      |-- foo.rb
      |-- erb.rb
      `-- set.rb

It may seem harmless to name files whatever you’d like because you are “namespaced” in your own package. But if lib/ is prepended to $LOAD_PATH, it will clobber Ruby’s built-in erb and set libraries: require 'erb' would no longer load Ruby’s built-in erb library, but this package’s version of it.
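To see the clobbering in action without touching the real erb or set, the sketch below builds two throwaway directories that each contain a top-level shadowme.rb (a hypothetical file name; builtin/ merely stands in for Ruby’s own library directory). Whichever directory comes first in $LOAD_PATH wins:

```ruby
require "tmpdir"

Dir.mktmpdir do |root|
  # Two hypothetical directories that both contain a top-level "shadowme.rb".
  builtin_dir = File.join(root, "builtin") # stands in for Ruby's own library dir
  gem_lib     = File.join(root, "gem_lib") # stands in for a gem's lib/
  [builtin_dir, gem_lib].each { |d| Dir.mkdir(d) }

  File.write(File.join(builtin_dir, "shadowme.rb"), "SHADOW_SOURCE = :builtin\n")
  File.write(File.join(gem_lib, "shadowme.rb"), "SHADOW_SOURCE = :gem\n")

  $LOAD_PATH.push(builtin_dir) # the "real" library is already reachable
  $LOAD_PATH.unshift(gem_lib)  # the gem's lib/ is prepended ahead of it

  require "shadowme"
  p SHADOW_SOURCE # prints :gem, the gem's copy won
end
```

Swap shadowme for erb and builtin/ for Ruby’s library directory and you have the bad gem layout above.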

The safe (and correct) way is to namespace your files under another directory. It’s conventional to create a folder within lib/ with the same name as your gem, then put all your dependency files under lib/foo/ instead of at the lib/ root.

This is sort of a gray area. There is no strict rule that you must put all your files under a folder with your package name. It is okay to have multiple files at the root of lib/ as long as you intend for people to require them separately. Namespace internal dependency files that you don’t expect people to require directly.
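A sketch of the namespaced layout, written into a temporary directory so it runs anywhere. The gem name foo, the helper lib/foo/erb.rb and the module Foo::Templates are all hypothetical:

```ruby
require "tmpdir"
require "fileutils"

# Builds a hypothetical gem layout in a throwaway directory:
#   lib/foo.rb      public entry point, the only top-level file
#   lib/foo/erb.rb  internal helper, namespaced under lib/foo/
Dir.mktmpdir do |root|
  lib = File.join(root, "lib")
  FileUtils.mkdir_p(File.join(lib, "foo"))

  File.write(File.join(lib, "foo.rb"), <<~RUBY)
    require "foo/erb" # resolved against the load path, like any other require

    module Foo
    end
  RUBY

  File.write(File.join(lib, "foo", "erb.rb"), <<~RUBY)
    module Foo
      module Templates # hypothetical internal module
      end
    end
  RUBY

  $LOAD_PATH.unshift(lib) # roughly what a package manager does with a gem's lib/
  require "foo"
end

p defined?(Foo::Templates) # prints "constant"
```

Because the helper is required as "foo/erb", a plain require "erb" elsewhere still resolves to Ruby’s own library.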

Requiring other files relative to each other
  require File.join(File.dirname(__FILE__), "foo", "bar")
  # or
  require File.expand_path(File.join(File.dirname(__FILE__), "foo", "bar"))

If you’re using File.dirname(__FILE__) with require, you’re doing something wrong.

The fix is simple: require files relative to the load path.

  require "foo/bar" 

It’s interesting that the three require examples above are all different as far as Ruby is concerned. Ruby can only track which files it has required by the exact path you gave it. The first is relative to your current directory ("./lib/foo/bar"), the second is the full expanded system path ("/usr/local/lib/ruby/gems/foo/lib/foo/bar"), and the third is relative to the load path ("foo/bar"). require treats each as a different file, so it may end up loading the same file multiple times.
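Ruby keeps its record of loaded files in $LOADED_FEATURES, and require skips a name it has already recorded. The sketch below uses a throwaway foo/bar.rb and a hypothetical $bar_loads counter to show a repeated load-path-relative require being skipped. How the recorded path is normalized has varied between Ruby versions, so the exact entry printed at the end may differ on your machine:

```ruby
require "tmpdir"

$bar_loads = 0 # counts how many times the hypothetical file body runs

Dir.mktmpdir do |dir|
  # Hypothetical library file at <dir>/foo/bar.rb
  Dir.mkdir(File.join(dir, "foo"))
  File.write(File.join(dir, "foo", "bar.rb"), "$bar_loads += 1\n")

  $LOAD_PATH.unshift(dir)

  p require("foo/bar") # prints true: first time, the file is loaded
  p require("foo/bar") # prints false: already recorded, nothing runs
  p $bar_loads         # prints 1

  # The record require consults before loading anything again.
  puts $LOADED_FEATURES.grep(%r{foo/bar\.rb\z})
end
```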

Depending on files outside the load path

This is a more severe case of the previous example.

  module Rack
    module Test
      VERSION = File.read(File.join(File.dirname(__FILE__), "..", "..", "VERSION")).strip
      # ...
    end
  end

Your gem’s folders may be separated and reorganized on install. If someone wants to “vendor” your library, they should only have to copy everything under lib/. Everything outside lib/ is not needed to run the code, so never expect your lib or test folders to be one level up. A minimalist installer, such as rip, will only install your bin and lib directories. Any file your package needs to access should be under lib/ and properly namespaced in a folder to avoid collisions. If you try to install this version of rack-test with rip, require 'rack/test' will fail because ../VERSION doesn’t exist.
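One way to follow this advice, sketched under assumptions: the version lives in Ruby code at a hypothetical lib/rack/test/version.rb (not necessarily how rack-test itself resolved this), with a placeholder version number. Copying only lib/ into a vendor directory, as rip or a vendoring user would, is then enough for the require to succeed:

```ruby
require "tmpdir"
require "fileutils"

Dir.mktmpdir do |root|
  # Hypothetical gem checkout: the version lives in Ruby code under lib/,
  # not in a VERSION file beside it.
  src_lib = File.join(root, "rack-test", "lib")
  FileUtils.mkdir_p(File.join(src_lib, "rack", "test"))
  File.write(File.join(src_lib, "rack", "test", "version.rb"), <<~RUBY)
    module Rack
      module Test
        VERSION = "0.0.0" # placeholder, not a real rack-test version
      end
    end
  RUBY

  # "Vendor" the library by copying only lib/, as a minimal installer might.
  vendored_lib = File.join(root, "vendor", "rack-test", "lib")
  FileUtils.mkdir_p(File.dirname(vendored_lib))
  FileUtils.cp_r(src_lib, vendored_lib)

  $LOAD_PATH.unshift(vendored_lib)
  require "rack/test/version"
end

puts Rack::Test::VERSION # prints 0.0.0
```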

Libs don’t need to manage $LOAD_PATH

It’s not the package’s responsibility to set up and manage the load path. Instead, rely on the package manager to set it up for you. When rubygems activates a gem, it adds your package’s lib folder to $LOAD_PATH, ready to be required normally by another lib or application. It’s safe to assume you can require any file in your lib folder relative to the load path.

  unless $LOAD_PATH.include?(File.expand_path(File.dirname(__FILE__)))
    $LOAD_PATH.unshift(File.expand_path(File.dirname(__FILE__)))
  end

It should be safe to remove code like that.

TIP: Set up your test runner to configure your paths for local development

If you’re trying to develop your lib locally, dealing with all this load path stuff seems like a pain in the ass. This is where Rake’s test tasks come in handy.

Rake’s test task will automatically push lib onto your load path when it runs your Test::Unit tests, so you don’t need File.join(File.dirname(__FILE__), "..", "lib", "foo") anywhere in your tests. You may also want to add the test directory to your path if you have a test_helper that you need to require:

  require 'rake/testtask'

  Rake::TestTask.new do |t|
    t.libs << 'test'
  end

Unfortunately, RSpec doesn’t even add lib to your path for you. You can fix this with:

  require 'spec/rake/spectask'

  Spec::Rake::SpecTask.new do |t|
    t.libs << 'lib'
  end

If you want to run a single test, you can add lib to the $LOAD_PATH with a command line flag. (This started a long debate when the change was made to Rails.)

  ruby -Ilib test/test_foo.rb

Provide a VERSION constant

If you release the Awesome gem, provide Awesome::VERSION. When using Rubygems, it’s possible to ask Rubygems for the version of the gem you’re using, but when using an alternate package manager, the only way to find out what version is loaded is by introspection into the Ruby code itself.
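A minimal sketch of the convention, using the Awesome gem from the text with a placeholder version string. Nothing here asks a package manager; the version comes straight from the constant:

```ruby
# Hypothetical lib/awesome/version.rb for the Awesome gem named in the text.
module Awesome
  VERSION = "1.2.3" # placeholder version string
end

# Any caller can find the loaded version by introspection alone, no matter
# which package manager (if any) put the library on the load path.
if defined?(Awesome::VERSION)
  major, minor, patch = Awesome::VERSION.split(".").map(&:to_i)
  puts "Awesome #{Awesome::VERSION}: major=#{major} minor=#{minor} patch=#{patch}"
end
```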

Don’t depend on rubygems

When I use your library, deploy your app, or run your tests I may not want to use rubygems. When you “require ‘rubygems’” in your code, you remove my ability to make that decision. I cannot unrequire rubygems, but you can not require it in the first place. – Ryan Tomayko

It’s safe to remove require "rubygems" from your lib, since whatever code is loading your lib has probably done this already. Requiring it yourself makes it harder (but not impossible) for people to use alternative ways of setting up the load path, because there is no way to “unrequire” rubygems. There are many other package management solutions out there, like rip, bundler, or managing $LOAD_PATH by hand.

Avoid declaring rubygem dependencies in lib/. This means removing calls like gem "foo". They create a hard dependency on Rubygems when you should be specifying your gem dependencies in your gemspec. Rubygems already hooks into require and will automatically resolve these dependencies at runtime. A simple require will trigger the lookup via rubygems, or will just load the file if it’s been added to the load path by another system. Moving dependency specifications out of lib/ is more flexible, because other package managers can resolve them at install time.
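For comparison, a sketch of where the dependency declaration belongs instead: a hypothetical foo.gemspec that declares rack as a runtime dependency. The name, version, summary and file list are placeholders. The trailing puts is only there for the demonstration; a real .gemspec should end with the Gem::Specification.new block so RubyGems can load it:

```ruby
require "rubygems" # gemspecs are read by RubyGems; a no-op where it is already loaded

# Hypothetical foo.gemspec. This file lives outside lib/, so using the Gem::
# API here does not make the library itself depend on rubygems.
spec = Gem::Specification.new do |s|
  s.name    = "foo"
  s.version = "0.1.0"            # placeholder
  s.summary = "Example gem"      # placeholder
  s.files   = Dir["lib/**/*.rb"] # ship lib/ (plus bin/ if you have one)

  # Declare the dependency here instead of calling `gem "rack"` in lib/.
  s.add_dependency "rack"
end

puts spec.dependencies.map(&:name).inspect # prints ["rack"]
```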

In addition to removing gem calls, do not wrap your requires in rescue Gem::LoadError or rescue Gem::Exception. If you need to gracefully skip over load errors, note that Gem::LoadError inherits from LoadError, so rescuing LoadError instead will work.

  # Bad
  begin
    gem "rack"
    require "rack"
  rescue Gem::LoadError
    puts "Could not load 'rack'"
  end

  # Good
  begin
    require "rack"
  rescue LoadError
    puts "Could not load 'rack'"
  end
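The good pattern can be wrapped in a small helper. The sketch below (optional_require is a hypothetical name, not a RubyGems or standard library method) returns whether a library loaded and rescues only LoadError, which also covers Gem::LoadError:

```ruby
# Tries to load an optional library; returns true if it is available.
# Rescuing LoadError also catches Gem::LoadError when RubyGems is in use,
# because Gem::LoadError is a subclass of LoadError.
def optional_require(name)
  require name
  true
rescue LoadError
  warn "Could not load '#{name}'"
  false
end

puts optional_require("json")                 # standard library; prints true
puts optional_require("surely_missing_lib_x") # hypothetical missing library; prints false
```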

Why should I care?

I wasn’t aware of any of these issues until Ryan wrote up his post on why requiring rubygems is wrong at the beginning of 2009. I feel many other Rubyists have overlooked these issues too, since rubygems has been our only solution. The foundation of Ruby library management is built on the load path system, and it’s important for every Ruby gem author to understand how it works.

We’re working on it

Fixing every Ruby library is easier said than done (there are 12,000+ hosted on RubyForge at the time of posting). Rails currently violates a few of these rules, but we’re working hard to fix them. We also need to fix all the gems that we depend on. Ideally, we’d like Rails 3 to boot without rubygems and allow you to use whatever package management strategy you’d like.

Further reading

Rubygems Good Practice (Katz Got Your Tongue?)
Why require ‘rubygems’ In Your Library/App/Tests Is Wrong (Ryan Tomayko)