Decoupling Ruby with RabbitMQ at SF.rb 30 Apr 2016

On Tuesday I spoke at the SF.rb meetup at Instacart HQ in San Francisco. Here are the slides I presented:

Other great talks were from Pam Nangan about the technology gap faced by nonprofit organizations, and by Lillie Chilen on the benefits of having a well-run internship program (and stories of good and bad experiences leading to a well-run program). This is a fantastic meetup, with a focus on diversity of speaker backgrounds and experience. Thank you so much for having me talk! ...

Sharding into big integers 24 Feb 2016

One day you wake up, you grab your laptop, you open your email, you see a strange alert, and you open up your exception tracking service. There you see the following text.

ActiveRecord::StatementInvalid: PG::NumericValueOutOfRange: ERROR: integer out of range

You then close your laptop and climb back into bed. If anyone asks later, you never saw that message. What even is computer?

By default, when Rails migrations create id columns in Postgres, they use the serial primary key native type, which is an integer. When reference columns are specified, Rails also uses an integer. The range of numbers covered by the integer numeric type in Postgres is:

-2147483648 to 2147483647

The range of numbers for bigints is:

-9223372036854775808 to 9223372036854775807
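
To make the two ranges concrete, here is a minimal Ruby sketch. The constants mirror the Postgres limits quoted above; the helper smallest_column_type_for is a hypothetical name I made up for illustration, not part of Rails or the pg gem.

```ruby
# Bounds of PostgreSQL's signed 4-byte integer and 8-byte bigint types.
INTEGER_RANGE = (-(2**31)..(2**31 - 1))
BIGINT_RANGE  = (-(2**63)..(2**63 - 1))

# Hypothetical helper (not a Rails API): names the smaller of the two
# column types that can still hold the given id.
def smallest_column_type_for(id)
  return :integer if INTEGER_RANGE.cover?(id)
  return :bigint if BIGINT_RANGE.cover?(id)

  raise RangeError, "#{id} does not fit in a bigint"
end

smallest_column_type_for(2_147_483_647) # => :integer
smallest_column_type_for(2_147_483_648) # => :bigint
```

The first id that no longer fits in an integer column is 2147483648, which is the value that triggers the error above.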

For most datasets, integers are more than enough for the id space of any table. In the lifetime of a company, the upper bound of integers might never be reached in any database. Unfortunately (or fortunately), we have run into this problem several times now. The first was caused by a bug in which a synchronization between two of our applications unintentionally deleted and recreated the data every day. A moderate-sized dataset of a few hundred million rows had overflowed the integer space of its ids. Whoops!
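
The arithmetic behind that overflow is worth spelling out. Postgres sequences never hand back the ids of deleted rows, so recreating a table every day burns through the id space even though the row count stays flat. Here is a rough sketch; the 300 million rows per day is an assumed figure for illustration (the real number was "a few hundred million"), and days_until_overflow is a hypothetical helper.

```ruby
# Largest value a Postgres integer (4-byte) id column can hold.
INTEGER_ID_LIMIT = 2**31 - 1

# Hypothetical helper: whole days until the id sequence passes the integer
# limit, assuming every row is deleted and re-inserted once per day and
# each re-insert consumes a fresh id from the sequence.
def days_until_overflow(rows_recreated_per_day, next_id: 1)
  remaining_ids = INTEGER_ID_LIMIT - next_id + 1
  remaining_ids / rows_recreated_per_day
end

days_until_overflow(300_000_000) # => 7
```

Under those assumptions a fresh table would run out of integer ids in about a week, while the table itself never grows past 300 million rows.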

This turned out to be a very insidious and unexpected bug, which was particularly interesting to me because of where we have not run into integer overflows in the past. We have services with more than three billion records in a single table space. In those applications we have sharded the data heavily, using thousands of Postgres schemas as logical shards across a set of database zones. Because of the nature of that data, however, we were able to generate unique identifiers for each row based on the data in the row, in the form of base62 encoded strings. Doing so allowed us to shard data without having to worry about unique identifier generation—as a side effect, it completely obviated the possibility of integer overflow errors.
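
As an illustration of the idea (not our production code), here is a minimal sketch of deriving a base62 identifier from the data in a row. Hashing with SHA-256, the null-byte separator, and the names base62_encode and row_identifier are all assumptions for the example; the only thing taken from the description above is that the id is a base62 string computed from the row itself.

```ruby
require "digest"

# 0-9, A-Z, a-z: 62 characters in total.
BASE62_ALPHABET = [*("0".."9"), *("A".."Z"), *("a".."z")].join.freeze

# Encodes a non-negative integer as a base62 string.
def base62_encode(number)
  return BASE62_ALPHABET[0] if number.zero?

  encoded = String.new
  while number > 0
    number, remainder = number.divmod(62)
    encoded.prepend(BASE62_ALPHABET[remainder])
  end
  encoded
end

# Hypothetical: builds a deterministic identifier from the values that
# make a row unique, so no shared sequence is needed across shards.
def row_identifier(*natural_key_parts)
  digest = Digest::SHA256.hexdigest(natural_key_parts.join("\u0000"))
  base62_encode(digest.to_i(16))
end

row_identifier("user-42", "product-7") # same inputs always give the same id
```

Because the identifier depends only on the row's own data, any shard can compute it independently, and there is no integer counter to overflow.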

The second case where we have had to migrate from integers to bigints was another scalability project. In order to scale writes to one of our internal Rails applications, we decided to split the entire database into multiple shards. The upside was that this was quite easy to do using the Multidb gem; Multidb does not natively do everything we need, but the code is extremely simple, readable, and the missing bits were easy for us to fill in ourselves. The downside was that in order to do this sharding, we needed to generate unique ids across shards. ...

Decoupling Distributed Ruby applications with RabbitMQ 16 Feb 2016

A lot of people might be surprised to hear how long it took us to stand up our message bus infrastructure at Wanelo. One of the downsides of focusing on iteration is that fundamental architecture changes can seem extremely intimidating. Embracing iteration means embracing the philosophy that features should be developed with small changes released in small deployments. The idea of spinning up unfamiliar technology, with its necessary Chef code, changing our applications to communicate in a new way, then production-izing the deployment through all the attendant failures and misunderstandings seems... dangerous.

Fortunately "Danger" is our middle name at Wanelo. (Actually, our middle name is "Action." Wanelo Action Jackson). Having now developed using a messaging infrastructure for almost a year and a half, I would no longer develop applications in any other way. ...

Code for replication delay 11 Jun 2014

After some recent changes to autovacuum settings on our main PostgreSQL databases, we’ve encountered regular significant replication delay on our four streaming replicas. Why this is happening is an interesting subject for another blog post, but it reminded me of some assumptions built into our codebase, as well as some interesting complications of API design. ...

One SMF to Rule Them All 27 Nov 2013

Over the past few years I've become a big fan of SmartOS, a distribution of Illumos built specifically with cloud IaaS in mind. One of the features it inherits from Solaris is the Service Management Facility, SMF. For various reasons, SMF is my service management framework of choice—one reason we prefer deployment on SmartOS at Wanelo is the feature-richness and stability of daemon management under SMF.

One complaint that we had was the management of service families, however. Recently, we started using service dependencies to help manage groups of services, particularly in emergency situations when we need to stop or restart many services at once. ...

Threads in a GIL 25 Nov 2013

MRI Ruby has a global interpreter lock (GIL), meaning that even in multi-threaded Ruby code only a single thread can be on-CPU at any given point in time. Other implementations of Ruby have done away with the GIL, but even in MRI threads can be useful. The Sidekiq background worker gem takes advantage of this, running multiple workers in separate threads within a single process.

If the workload of a job blocks on I/O, Ruby can context-switch to other threads and do other work until the I/O finishes. This could happen when the workload reaches out to an external API, shells out to another command, or is accessing the file system. Depending on how a job is written, writes to external storage like an NFS server could block on I/O.

If the workload of a process does not block on I/O, it is CPU-bound and will not benefit from thread switching under a GIL. In this case, multiple processes will be more efficient, and will be able to take better advantage of multi-core systems.
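
A small experiment shows the difference. This is a sketch under assumptions: sleep stands in for a blocking I/O call, the job sizes are arbitrary, and the method names are made up for the example. On MRI I would expect the threaded I/O run to finish in roughly the time of one job, while the threaded CPU run takes about as long as the serial one.

```ruby
require "benchmark"

JOB_COUNT = 4

# Stand-in for a job that blocks on I/O (an API call, a file read).
# MRI releases the GIL while a thread sleeps or waits on I/O.
def simulated_io_job
  sleep 0.2
end

# Stand-in for a CPU-bound job; it holds the GIL while it computes.
def simulated_cpu_job
  200_000.times { |i| Math.sqrt(i) }
end

# Runs the block JOB_COUNT times, one after another.
def run_serially
  Benchmark.realtime { JOB_COUNT.times { yield } }
end

# Runs the block JOB_COUNT times, each in its own thread.
def run_in_threads(&job)
  Benchmark.realtime do
    Array.new(JOB_COUNT) { Thread.new(&job) }.each(&:join)
  end
end

io_serial   = run_serially   { simulated_io_job }
io_threaded = run_in_threads { simulated_io_job }
cpu_serial   = run_serially   { simulated_cpu_job }
cpu_threaded = run_in_threads { simulated_cpu_job }

puts format("I/O-bound: serial %.2fs, threaded %.2fs", io_serial, io_threaded)
puts format("CPU-bound: serial %.2fs, threaded %.2fs", cpu_serial, cpu_threaded)
```

The exact timings will vary by machine and Ruby version, so treat the numbers as a way to see the shape of the behavior rather than as a benchmark.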

So… why not skip threads and just deal with processes? A number of reasons. ...

Today we went to the bad place… 24 Nov 2013

You know the Bad Place. It starts with the words “How hard could it be?” and ends with me in the shower, fully clothed under a stream of hot water, crying. You think you’re in for a nice, quick, pleasant plane ride, but suddenly Brian Blessed is dressed like this and waving a pointy stick at you and you can never go home again…

I think this is how Brian Blessed dresses every day

This Bad Place involved the migration from Selenium running via Capybara in Minitest to capybara-webkit running in RSpec. We have various reasons for preferring this at my work, many of them subjective and based on the collected positive experiences of several people working in this codebase. We believe that the end result will make us happier in the long run.

We’re not yet all the way to the other side, but I thought I would share some learnings. Some of them were more obvious than others, but hopefully others can learn from today’s strange coding journey.

  • We will run JavaScript specs and unit tests in the same spec runner. The database cleaning configuration required to make this work and run quickly is non-obvious.
  • Some of our testing gems have not been updated since the launch of this project. Some gems conflict with newer versions of other testing gems, blocking upgrade.
  • Some of our Selenium tests do not require JavaScript execution or CSS rendering in order to pass.
  • Some of our Selenium tests depend on stubbing at the server.
  • We over-use Thing.any_instance.stub ...

New Job At Wanelo 20 Sep 2012

I have made the move. Starting Monday I'm going to be working at Wanelo, a social media startup working on connecting people with e-commerce. It's an interesting change for me, as two years ago I promised myself I would never work for a social network company. A lot has changed in the last two years, however, and I thought I would take some time to put it down in words (for my own benefit as well as yours, illustrious reader).

For the past two years I have worked at ModCloth, an e-commerce fashion company that works with small and medium-sized fashion designers to sell vintage-inspired clothing. In the course of my time there, I worked on various features including Be the Buyer, a program to crowd-source product acquisition and merchandising. I led a team to re-architect the product categories to be based on search indexes. This not only made the process by which new products appear on the site more automated and robust (before, every product was manually added and sorted in every category by hand), but also allowed us to implement faceted drill-down on almost every category, so customers can easily find in-stock dresses in their color, size and price range. Toward the end, I worked on systems automation to build new servers quickly and automatically. Hopefully I had some influence on the long-term strategy of organizing code into separate applications and services. ...

New Blog 18 Sep 2012

Well, it's time. While the old blog had some relevant data (at least to me), it's time to shuck off the old and begin with the new. As such, I'm going to rely on the Internet Archive to provide historical content.

The analytics of my previous blog and the random connections it would attract were interesting, but there are a few deciding factors to guide this along:

  • I have not updated it in about 2 1/2 years
  • Most of the content is technologically dated
  • Since last writing, I have switched jobs twice (more on that in a later post)