Copyright © 2010 The G String. All Rights Reserved.
Archive for January, 2010
I’m currently working on a project to develop a content management system for a leading publisher that generates content for a DVD. The system, developed in Ruby on Rails, naturally has a significant amount of background processing to do, with various assets represented by large binary files that often require server-side processing. For instance, we need to resize a file containing the scanned image of a book page and cut it up into individual tiles for use in a Google Maps-style map.
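The tiling step can be planned without any imaging library: given the dimensions of the resized scan and a tile size, work out the rectangle each tile covers. This is a minimal sketch, assuming square 256-pixel tiles (the size Google Maps uses, chosen here as an assumption) and a row/column layout; the `tile_grid` method is hypothetical and not part of the project described in this post.

```ruby
# Hypothetical helper: split an image of the given size into a grid of tiles.
# Returns one hash per tile with its column, row, pixel offset and size.
# Tiles on the right and bottom edges are clipped so they never extend past the image.
def tile_grid(width, height, tile_size = 256)
  unless width > 0 && height > 0 && tile_size > 0
    raise ArgumentError, "dimensions must be positive"
  end

  columns = (width + tile_size - 1) / tile_size   # ceiling division
  rows    = (height + tile_size - 1) / tile_size

  tiles = []
  rows.times do |row|
    columns.times do |col|
      x = col * tile_size
      y = row * tile_size
      tiles << {
        :col    => col,
        :row    => row,
        :x      => x,
        :y      => y,
        :width  => [tile_size, width - x].min,
        :height => [tile_size, height - y].min
      }
    end
  end
  tiles
end

# Example: a 600x300 scan yields 3 columns x 2 rows.
tiles = tile_grid(600, 300)
puts tiles.size   # prints 6
```

With RMagick, each hash maps naturally onto a crop such as `image.crop(t[:x], t[:y], t[:width], t[:height])`, so the geometry can be tested on its own, away from the slow image work.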
Previously, I had happily used backgroundrb for background tasks in Ruby projects. I was wooed, however, by the design of the workling project, which acts as an adapter between several background schedulers, including Starling. The main attraction was the ability to defer the choice of background scheduler until the last possible moment (an important part of agile development), and to switch between different schedulers in different environments. Now I use Spawn in development, and Starling and Background Job in production. Best of all, I can use the NotRemoteRunner in test mode, which means my tests run synchronously, waiting for the results of execution, without changing any code or running a separate background process.
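The adapter idea behind workling can be illustrated in plain Ruby: callers talk to a single dispatcher, and which runner sits behind it is a configuration choice. This is a minimal sketch of the pattern, not workling's actual code; `ImmediateRunner`, `ThreadRunner` and `Dispatcher` are hypothetical stand-ins for runners such as NotRemoteRunner and SpawnRunner.

```ruby
# Hypothetical runner that executes the job immediately in the caller's thread,
# much as a synchronous test runner would. Returns the job's result.
class ImmediateRunner
  def run(worker, method, options)
    worker.new.send(method, options)
  end
end

# Hypothetical runner that executes the job in a background Ruby thread
# and returns the Thread, so callers can join it if they need the result.
class ThreadRunner
  def run(worker, method, options)
    Thread.new { worker.new.send(method, options) }
  end
end

# The single entry point application code uses; the runner is swapped per environment.
module Dispatcher
  class << self
    attr_accessor :runner
  end

  def self.async(worker, method, options = {})
    runner.run(worker, method, options)
  end
end

# Example worker: its methods take an options hash.
class HardWorker
  def double(options)
    options[:n] * 2
  end
end

Dispatcher.runner = ImmediateRunner.new                  # e.g. in test
puts Dispatcher.async(HardWorker, :double, :n => 21)     # prints 42

Dispatcher.runner = ThreadRunner.new                     # e.g. in development
thread = Dispatcher.async(HardWorker, :double, :n => 5)
puts thread.value                                        # prints 10
```

The worker and its callers never change between environments; only the line assigning the runner does, which is what makes deferring the scheduler decision cheap.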
A good theory, but like all things in life, using workling had its share of problems. First of all, due to an issue with memcache and Rails 2.3, a customised workling configuration is required, and it is not well covered by the documentation. Place these snippets in your environment file of choice.
For Spawn runner:
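config.after_initialize do
  require 'memcache'  # loaded explicitly because of the memcache / Rails 2.3 issue
  # Configure the Spawn runner and make it the active dispatcher.
  Workling::Remote::Runners::SpawnRunner.options = { :method => :spawn }
  Workling::Remote.dispatcher = Workling::Remote::Runners::SpawnRunner.new
end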
For NotRemoteRunner (good for testing!):
config.after_initialize do
  require 'memcache'
  # Run workers in-process and synchronously, so tests see the results directly.
  Workling::Remote.dispatcher = Workling::Remote::Runners::NotRemoteRunner.new
end
For Starling:
config.after_initialize do
  Workling::Remote.dispatcher = Workling::Remote::Runners::StarlingRunner.new
end
But the real gotcha comes when dealing with files, or any IO object in Ruby, asynchronously. Because workling makes no assumptions about which background processor you are using, it uses Ruby's Marshal class to serialise your arguments and send them to your worker class, which in the case of Starling could be running on a completely different machine. Marshal cannot serialise IO objects such as an open File; it raises a TypeError instead. In my case, handling image files with Paperclip meant sending a file over the wire, but the documentation is scant on this presumably common use case. The solution is to read the file into a binary String, which can be marshalled and sent over the wire.
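# Read the whole file as a binary String; the block form closes the handle afterwards.
file_data = File.open("path_to_file", "rb") { |file| file.read }
# The job's arguments go in an options hash; a plain String marshals cleanly.
HardWorker.async_foo(:file => file_data)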
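The reason for the conversion is easy to demonstrate with the standard library alone. The sketch below tries to marshal an open File, which raises a TypeError, and then marshals the binary String read from it, which round-trips byte for byte. A temporary file with a few made-up bytes stands in for "path_to_file"; no workling code is involved.

```ruby
require 'tempfile'

# Write a few binary bytes to a temporary file to stand in for an image.
tmp = Tempfile.new('scan')
tmp.binmode
tmp.write([0x89, 0x50, 0x4E, 0x47, 0x00, 0xFF].pack('C*'))
tmp.close

# An open File handle cannot be marshalled: Marshal raises TypeError.
begin
  File.open(tmp.path, 'rb') { |file| Marshal.dump(file) }
  puts 'File marshalled (unexpected)'
rescue TypeError => e
  puts "File rejected: #{e.class}"   # prints File rejected: TypeError
end

# The binary String read from the file marshals and round-trips intact.
file_data = File.open(tmp.path, 'rb') { |file| file.read }
payload   = Marshal.dump({ :file => file_data })
restored  = Marshal.load(payload)
puts restored[:file] == file_data    # prints true

tmp.unlink
```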
This will send the binary data to the HardWorker, which can handle it any way it likes. In my case, I used the RMagick library to read it in from a blob:
require 'RMagick'
# Inside the worker method, the binary String arrives as options[:file].
image_file = Magick::ImageList.new
image_file.from_blob(options[:file])
Be warned! Starling is built upon memcache, which by design limits each stored item to 1MB. This effectively limits any data marshalled to a workling worker via Starling to 1MB, which is going to be a problem if you’re dealing with hi-res images or video. In the end, I used Background Job for my production image-processing needs.
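One defensive option is to measure the marshalled payload before enqueuing and route oversized jobs to a different runner. This is a minimal sketch under stated assumptions: the 1MB limit is taken from the paragraph above, the `payload_too_large?` helper is hypothetical, and the exact limit your memcache client and server enforce may differ slightly because of key and protocol overhead.

```ruby
# Assumed limit, taken from the 1MB item size described above.
MAX_PAYLOAD_BYTES = 1024 * 1024

# Hypothetical helper: true when the marshalled options would exceed the limit.
def payload_too_large?(options, limit = MAX_PAYLOAD_BYTES)
  Marshal.dump(options).bytesize > limit
end

small = { :file => 'x' * 1_000 }
large = { :file => 'x' * (2 * 1024 * 1024) }

puts payload_too_large?(small)   # prints false
puts payload_too_large?(large)   # prints true
```

Checking the marshalled size, rather than the raw file size, accounts for the small overhead Marshal adds around the String.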
The main thing I learnt from background processing, though, is that ‘synchronously’ has a ‘h’ in it. I had to refactor about five classes :/