A few words on rewriting and migrating systems, based on experience. I’m writing this because the painful rejuvenation of Delicious under Avos is very much at the front of my mind right now: I’m both a user caught in the backwash and an industry professional who’s dealt with these things before. While Chad and Steve’s best-known experience is building something new (i.e. YouTube), my experience is more with taking first-generation systems and transforming them into next-generation systems (the parallel with Delicious).
There are a few pertinent lessons:
- Roll out slowly
- Let systems co-exist
- Always have a rollback plan
1. Roll out slowly
I’ve written before about the almost-inevitable pain of big bang launches, following Telegraph.co.uk’s update in 2007. Those lessons still hold true. The only way to avoid that pain while still having a big bang launch is with a vast amount of additional resources (cost and time), and that extra expense is rarely justified by what you gain.
An approach more likely to succeed is to roll out piecemeal, replacing one user feature, or one system component, at a time. This is not sexy for engineers, who typically want greenfield projects and new technology, but nor is it sexy working 18-hour days to fix a torrent of bugs under a hail of angry user feedback.
This approach means breaking up the system into discrete components and replacing those one by one. If the system doesn’t have such discrete components then the first job is to create them, going through the code and making logical fences. No, this isn’t sexy, but let’s repeat: neither is working 18-hour days.
2. Let systems co-exist
It’s a computer science joke that every known software problem can be solved by adding an extra layer of indirection. However, sometimes that’s also a practical solution. An indirection layer allows you to run an old and a new system side by side, and funnel specific users/activity/processes/etc to either one. As you build up the capabilities of the new system you can funnel more and more activity to it.
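As a rough illustration of funnelling users to one system or the other, here is a minimal Python sketch. It is not taken from any of the systems discussed here: the names `bucket`, `choose_system` and `new_system_percent` are made up for the example, and hashing the user ID into a percentage bucket is just one plausible way to decide who goes where.

```python
import hashlib


def bucket(user_id: str) -> int:
    """Map a user ID to a stable bucket in the range 0-99."""
    digest = hashlib.sha256(user_id.encode("utf-8")).hexdigest()
    return int(digest, 16) % 100


def choose_system(user_id: str, new_system_percent: int) -> str:
    """Send a user to the new system if their bucket is below the dial.

    new_system_percent is a hypothetical config value between 0 and 100.
    """
    return "new" if bucket(user_id) < new_system_percent else "old"


if __name__ == "__main__":
    for uid in ["alice", "bob", "carol"]:
        print(uid, choose_system(uid, new_system_percent=25))
```

Because the bucket depends only on the user ID, a given user stays on the same side as long as the dial doesn’t move, and raising the dial only ever moves users from old to new.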
Two architecture case studies:
In the first (Architecture 1, above) we needed to replace a proprietary application server with one over which we had much more control. Critically, the new application server needed to be able to execute the same scripts, of which there were thousands.
Our first-release replacement app server had only the most basic capability, and so could only execute a tiny fraction of the scripts. So we put in a layer of indirection which filtered requests to either the old or the new server. As we enhanced the capability of the new application server we filtered more and more requests to it, until we were finally able to switch off the old one.
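To make the filtering idea concrete, here is a small Python sketch of such a layer. It is only an illustration, not the code we ran: `ScriptRouter`, `enable` and `disable` are invented names, and the two servers are represented by plain functions rather than real network calls.

```python
from typing import Callable, Iterable, Optional, Set

# A "server" here is just a function that takes a script name and returns a response.
Handler = Callable[[str], str]


class ScriptRouter:
    """Send each request to the new server only if it is known to cope."""

    def __init__(self, old_server: Handler, new_server: Handler,
                 supported_by_new: Optional[Iterable[str]] = None) -> None:
        self._old = old_server
        self._new = new_server
        self._supported: Set[str] = set(supported_by_new or [])

    def enable(self, script_name: str) -> None:
        """Start sending this script to the new server."""
        self._supported.add(script_name)

    def disable(self, script_name: str) -> None:
        """Send this script back to the old server."""
        self._supported.discard(script_name)

    def handle(self, script_name: str) -> str:
        target = self._new if script_name in self._supported else self._old
        return target(script_name)


if __name__ == "__main__":
    router = ScriptRouter(lambda s: "old:" + s, lambda s: "new:" + s)
    print(router.handle("home.script"))   # served by the old server
    router.enable("home.script")
    print(router.handle("home.script"))   # now served by the new server
```

Anything not explicitly enabled falls through to the old server, so the new one only ever sees requests it has been declared ready for.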
In the second (Architecture 2) we needed to migrate our user data to a new, more flexible database (MongoDB), despite some of the application code being rather old and tied closely to the legacy database. The solution was to introduce an internal API, ensure all code used that API instead of direct database access, and then add the new database under that. The two systems co-existed for a long time as their capabilities were brought together and tested for consistency.
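A sketch of what that kind of internal API might look like, in Python. Everything here is an assumption made for illustration: `UserDataAPI`, `InMemoryStore` and `find_mismatches` are invented names, the stores are in-memory stand-ins rather than real database clients, and writing to both stores on every save is one possible way to keep them in step, not necessarily how we did it.

```python
from typing import Dict, List, Optional


class InMemoryStore:
    """Stand-in for a real database client (legacy or new)."""

    def __init__(self) -> None:
        self._rows: Dict[str, dict] = {}

    def get(self, user_id: str) -> Optional[dict]:
        row = self._rows.get(user_id)
        return dict(row) if row is not None else None

    def put(self, user_id: str, record: dict) -> None:
        self._rows[user_id] = dict(record)


class UserDataAPI:
    """The only door application code uses to reach user data."""

    def __init__(self, legacy: InMemoryStore, new: InMemoryStore,
                 read_from_new: bool = False) -> None:
        self.legacy = legacy
        self.new = new
        self.read_from_new = read_from_new  # hypothetical switch for reads

    def save_user(self, user_id: str, record: dict) -> None:
        # Assumed approach: write to both stores while they co-exist.
        self.legacy.put(user_id, record)
        self.new.put(user_id, record)

    def load_user(self, user_id: str) -> Optional[dict]:
        primary = self.new if self.read_from_new else self.legacy
        return primary.get(user_id)

    def find_mismatches(self, user_ids: List[str]) -> List[str]:
        """Return the IDs whose records differ between the two stores."""
        return [uid for uid in user_ids
                if self.legacy.get(uid) != self.new.get(uid)]


if __name__ == "__main__":
    api = UserDataAPI(InMemoryStore(), InMemoryStore())
    api.save_user("u1", {"name": "Ann"})
    print(api.load_user("u1"), api.find_mismatches(["u1"]))
```

The point of the design is that the calling code never learns which database answered, so the switch from one to the other happens in one place.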
3. Always have a rollback plan
Once we have a step-by-step approach to moving from one system to another we have to recognise that some steps may not work flawlessly. In that case we have to be able to switch back, because fixing a problem in situ may take too long. This is something that can be built into the indirection layer’s config system, or into the release mechanism (allowing easy re-releases of the previous version), or be switchable in the application’s own config (see Flickr’s approach as one example).
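As one possible shape for a switchable config, here is a minimal Python sketch that remembers each previous set of settings so that the last change can be undone. The class `RoutingConfig` and the flag `use_new_search` are hypothetical names for this example; it does not describe Flickr’s mechanism or any particular tool.

```python
from typing import Any, Dict, List


class RoutingConfig:
    """Keep a history of settings so the last change can be undone quickly."""

    def __init__(self, initial: Dict[str, Any]) -> None:
        self._history: List[Dict[str, Any]] = [dict(initial)]

    @property
    def current(self) -> Dict[str, Any]:
        return dict(self._history[-1])

    def apply(self, **changes: Any) -> Dict[str, Any]:
        """Record a new version of the settings with the given changes."""
        self._history.append({**self._history[-1], **changes})
        return self.current

    def rollback(self) -> Dict[str, Any]:
        """Return to the previous version; the initial settings are never dropped."""
        if len(self._history) > 1:
            self._history.pop()
        return self.current


if __name__ == "__main__":
    config = RoutingConfig({"use_new_search": False})
    config.apply(use_new_search=True)   # step forward
    print(config.current)
    config.rollback()                   # the step misbehaved, so step back
    print(config.current)
```

The useful property is that stepping back is a single cheap operation that needs no new code to be written under pressure.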
It’s still difficult
None of this is easy, of course, but it’s the kind of thing where the effort is in the up-front planning and the initial (possibly time-consuming) prep to tidy up the old system, all for the sake of a smoother rollout. Launches are exciting, but they can be exciting in the intended way at the same time as going boringly to plan.
Meanwhile the folks at Delicious are slowly restoring many lost features, and I wish them well, because we’ll all win if they do.