Gogalixir 20 Feb 2024

For a long time I've wanted there to be a single-track conference in the Bay Area focused on Elixir and Erlang, in the vein of gogaruco. I love Code BEAM (and am attending this year!), but there's something to be said for smaller, locally-organized conferences where you get to meet people in your regional programming community—where you don't have to decide between conference talks, but have a single shared experience among all attendees.

My friend and business partner Erik Hanson and I have been talking about this conference for years. We were excited enough about it that we even bought a domain. What we didn't have was any experience in planning or organizing a large event.

This winter, we decided that we could solve the problem of not knowing what we're doing by starting smaller. Thus was the Golden Gate Elixir Meetup born.

read more…

Migrating Databases with Ecto 25 Nov 2022

On a new project I'm working to migrate data between Azure SQL (aka MSSQL) and PostgreSQL, and to migrate a business's software from Visual Basic and C# to Elixir. In the first few weeks of development I've discovered some features of Ecto that I was previously unaware of. Combining these features will allow us to ship a resilient, well-tested application while constraining the possibility of accidentally altering data in production.

defmodule Test.Integration.CrossRepoTransactions do
  use Test.DataCase, async: true

  test "sharing database connections" do
    {:ok, alice} =
      Test.Fixtures.azure_person(:alice)
      |> Azure.Person.changeset()
      # insert/1 (rather than insert!/1) returns an {:ok, record} tuple
      |> Azure.WriteableRepo.insert()

    # Repo.get/2 returns the struct itself (or nil), not an :ok tuple
    assert %Azure.Person{} = Azure.ReadOnlyRepo.get(Azure.Person, alice.id)
  end
end
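The full post walks through the details. As a rough sketch of the underlying mechanics (the repo options, adapter, and Test.DataCase internals below are my assumptions for illustration, not necessarily what the post describes), Ecto's read-only repos, SQL sandbox, and dynamic repos can combine like this:

# A sketch only — otp_app, adapter, and setup details are assumptions.
defmodule Azure.ReadOnlyRepo do
  use Ecto.Repo,
    otp_app: :my_app,
    adapter: Ecto.Adapters.Tds,
    # read_only: true omits insert/update/delete functions entirely,
    # guarding production data against accidental writes.
    read_only: true
end

defmodule Test.DataCase do
  use ExUnit.CaseTemplate

  setup tags do
    # Check out one sandboxed connection on the writeable repo...
    pid = Ecto.Adapters.SQL.Sandbox.start_owner!(Azure.WriteableRepo, shared: not tags[:async])
    on_exit(fn -> Ecto.Adapters.SQL.Sandbox.stop_owner(pid) end)

    # ...then route the read-only repo through the writeable repo's pool for
    # this process, so reads observe writes made inside the test transaction.
    Azure.ReadOnlyRepo.put_dynamic_repo(Azure.WriteableRepo)
    :ok
  end
end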
read more…

ExUnit Patterns for Ease of Mind 12 Sep 2022

In recent months I have been working on a number of projects with Erik Hanson, my primary collaborator of recent years. Some of those projects have led to open source Elixir libraries; others may eventually be open sourced, but are currently private.

In this post, I'd like to share a set of patterns that work together to ease test setup and organization. The examples shown will be for ExUnit and Phoenix, but could be adapted to other languages and frameworks.

defmodule Web.ProfileLiveTest do
  use Test.ConnCase, async: true

  @tag page: :alice, profile: [:alice]
  test "shows profile info", %{pages: %{alice: page}, profiles: %{alice: alice}} do
    page
    |> Test.Pages.ProfilePage.visit()
    |> Test.Pages.ProfilePage.assert_here()
    |> Test.Pages.ProfilePage.assert_profile_info(alice.name, alice.email)
  end
end
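Before the full post, here is one way a setup could expand those tags into the pages and profiles context used above — a sketch with invented helpers (Test.Pages.new_session_for/1, Test.Fixtures.profile/1), not the post's actual implementation:

# A sketch of expanding test tags into context; helpers are invented.
defmodule Test.ConnCase do
  use ExUnit.CaseTemplate

  setup tags do
    # @tag page: :alice becomes context[:pages][:alice], a logged-in session.
    pages =
      tags
      |> Map.get(:page, [])
      |> List.wrap()
      |> Map.new(&{&1, Test.Pages.new_session_for(&1)})

    # @tag profile: [:alice] becomes context[:profiles][:alice], a fixture row.
    profiles =
      tags
      |> Map.get(:profile, [])
      |> List.wrap()
      |> Map.new(&{&1, Test.Fixtures.profile(&1)})

    [pages: pages, profiles: profiles]
  end
end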
read more…

CodeBEAM V America 2021 21 Jun 2021

Back in March I gave a talk at CodeBEAM V America about some of the work that I did at Geometer in 2020. I started work at Geometer the same week that the SF Bay Area began its lockdown for Covid-19—while the purpose of Geometer was to be a startup incubator, on my first day there was an all-hands where Rob, the founder, said that for the foreseeable future we would devote our efforts to pandemic relief.

While working on these projects, I saw parts of the US health system that I knew existed but had no idea would be so discouraging. I also met (virtually) health care workers and department of health officials who were throwing themselves into the work of saving lives, sometimes in spite of technology that should have been solving their problems but instead caused other, greater problems.

TL;DR

  • Before deploying a Broadway pipeline to production, really try to understand GenStage first (see the minimal sketch after this list).
  • AWS promotes Lambda as a general-purpose data processing tool for high-scale workloads. I found Lambda to be incredibly difficult to monitor or debug, with quirks in the runtime that could only be discovered through trial and error.
  • Broadway/Oban/Flow could easily handle much greater scale than we were solving for, in a resilient runtime that was much easier to inspect and debug.
  • Be thoughtful about naming.
    • I started by grouping data pipelines into high-level domains specific to ETL: pipelines pulling data into our system were grouped under Extract, while pipelines pushing data to an external system were grouped under Load. While technically accurate, this was the opposite of what new teammates expected when they saw the word “load.”
    • A clearer vocabulary might have been the one used by the Membrane Framework: source, filter, and sink.
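To make the first bullet concrete, here is a minimal Broadway pipeline sketch. The module name, queue URL, and handler are invented for illustration; this is not the pipeline from the talk.

defmodule Example.Pipeline do
  use Broadway

  def start_link(_opts) do
    Broadway.start_link(__MODULE__,
      name: __MODULE__,
      # The producer is a GenStage stage; processors subscribe to it and pull
      # messages by sending demand upstream, which is why understanding
      # GenStage's demand model matters before tuning concurrency.
      producer: [
        module: {BroadwaySQS.Producer, queue_url: "https://sqs.example.com/queue"},
        concurrency: 1
      ],
      processors: [default: [concurrency: 10]]
    )
  end

  @impl true
  def handle_message(_processor, message, _context) do
    # Transform the message data; raising here marks the message as failed
    # rather than crashing the whole pipeline.
    Broadway.Message.update_data(message, &String.upcase/1)
  end
end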
read more…

Freshening Up 17 Jun 2021

It being 2021, I have decided to air out the website and tune it up. With a working development and deployment pipeline, my intention is to start doing regular updates.

  • Site updates
  • What's been going on?
  • Elixir
read more…

Decoupling Ruby with RabbitMQ at SF.rb 30 Apr 2016

On Tuesday I spoke at the SF.rb meetup at Instacart HQ in San Francisco. Here are the slides that I presented from:

Other great talks came from Pam Nangan, on the technology gap faced by nonprofit organizations, and Lillie Chilen, on the benefits of having a well-run internship program (and stories of the good and bad experiences that led to one). This is a fantastic meetup, with a focus on diversity of speaker backgrounds and experience. Thank you so much for having me!

read more…

Sharding into big integers 24 Feb 2016

One day you wake up, you grab your laptop, you open your email, you see a strange alert, and you open up your exception tracking service. There you see the following text.

ActiveRecord::StatementInvalid: PG::NumericValueOutOfRange:
ERROR: integer out of range

You then close your laptop and climb back into bed. If anyone asks later, you never saw that message. What even is computer?

By default, when Rails migrations create id columns in Postgres, they use the serial native type, a 32-bit integer. Reference columns are likewise created as integers. The range of numbers covered by the integer numeric type in Postgres is:

-2147483648 to +2147483647

The range of numbers for bigints is:

-9223372036854775808 to +9223372036854775807
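For illustration in Elixir (this site's current language, though the original fix was of course a Rails migration), the fix amounts to a column-widening migration. An Ecto sketch, with invented table and column names:

# Illustrative only; table and column names are invented, and the actual
# fix described here was a Rails migration rather than this Ecto one.
defmodule Repo.Migrations.WidenIdsToBigint do
  use Ecto.Migration

  def change do
    # Widen the primary key and every foreign key referencing it, or inserts
    # past 2_147_483_647 will keep raising NumericValueOutOfRange.
    alter table(:events) do
      modify :id, :bigint
    end

    alter table(:event_tags) do
      modify :event_id, :bigint
    end
  end
end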

For most tables, integers provide more than enough id space; in the lifetime of a company, the upper bound might never be reached in any database. Unfortunately (or fortunately), we have now run into this problem several times. The first was caused by a bug: data synchronized between two of our applications was unintentionally deleted and recreated on a daily basis, and a moderate-sized dataset of a few hundred million rows overflowed the integer space of its ids. Whoops!

This turned out to be an insidious and unexpected bug, particularly interesting to me because of where we have not run into integer overflows in the past. We have services with more than three billion records in a single table space. In those applications we have sharded the data heavily, using thousands of Postgres schemas as logical shards across a set of database zones. Because of the nature of that data, however, we were able to generate unique identifiers for each row based on the data in the row, in the form of base62-encoded strings. Doing so allowed us to shard data without having to worry about unique identifier generation—and as a side effect, it completely obviated the possibility of integer overflow errors.
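A sketch of that content-derived id idea, in Elixir rather than the Ruby of the era; the hashing scheme and field choice here are invented for illustration:

# Derive a stable, base62-encoded id from the row's own data, so shards
# never need a coordinated id sequence.
defmodule ContentId do
  @alphabet String.graphemes("0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz")

  # Usage: ContentId.for_row(%{user_id: 123, product_id: 456})
  def for_row(fields) do
    # Identical field data always hashes to the same id.
    <<n::unsigned-integer-size(64), _rest::binary>> =
      :crypto.hash(:sha256, :erlang.term_to_binary(fields))

    base62(n, "")
  end

  defp base62(0, ""), do: "0"
  defp base62(0, acc), do: acc
  defp base62(n, acc), do: base62(div(n, 62), Enum.at(@alphabet, rem(n, 62)) <> acc)
end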

The second case where we had to migrate from integers to bigints was another scalability project. In order to scale writes to one of our internal Rails applications, we decided to split the entire database into multiple shards. The upside was that this was quite easy to do using the Multidb gem; Multidb does not natively do everything we need, but its code is extremely simple and readable, and the missing bits were easy for us to fill in ourselves. The downside was that in order to shard, we needed to generate ids that were unique across shards.

read more…

Decoupling Distributed Ruby applications with RabbitMQ 16 Feb 2016

Note that this is a repost of an article written for the Wanelo blog. Take a look there for other great content.

A lot of people might be surprised to hear how long it took us to stand up our message bus infrastructure at Wanelo. One of the downsides of focusing on iteration is that fundamental architecture changes can seem extremely intimidating. Embracing iteration means embracing the philosophy that features should be developed with small changes released in small deployments. The idea of spinning up unfamiliar technology, with its necessary Chef code, changing our applications to communicate in a new way, then productionizing the deployment through all the attendant failures and misunderstandings seems... dangerous.

Fortunately "Danger" is our middle name at Wanelo. (Actually, our middle name is "Action." Wanelo Action Jackson). Having now developed using a messaging infrastructure for almost a year and a half, I would no longer develop applications in any other way.

read more…

Code for replication delay 11 Jun 2014

Note that this is a repost of an article written for the Wanelo blog. Take a look there for other great content.

After some recent changes to autovacuum settings on our main PostgreSQL databases, we've encountered regular, significant replication delay on our four streaming replicas. Why this happens is an interesting subject for another blog post, but it reminded me of some assumptions built into our codebase, as well as some interesting complications of API design.

read more…

One SMF to Rule Them All 27 Nov 2013

Over the past few years I've become a big fan of SmartOS, a distribution of Illumos built specifically with cloud IaaS in mind. One of the features it inherits from Solaris is the Service Management Facility, SMF. For various reasons, SMF is my service management framework of choice—one reason we prefer deployment on SmartOS at Wanelo is the feature-richness and stability of daemon management under SMF.

One complaint we had, however, was the management of service families. Recently, we started using service dependencies to help manage groups of services, particularly in emergency situations when we need to stop or restart many services at once.

read more…

Threads in a GIL 25 Nov 2013

MRI Ruby has a global interpreter lock (GIL), meaning that even in multi-threaded Ruby code, only a single thread can be on-CPU at any given point in time. Other Ruby implementations have done away with the GIL, but even in MRI, threads can be useful. The Sidekiq background worker gem takes advantage of this, running multiple workers in separate threads within a single process.

If the workload of a job blocks on I/O, Ruby can context-switch to other threads and do other work until the I/O finishes. This can happen when the workload reaches out to an external API, shells out to another command, or accesses the file system. Depending on how a job is written, writes to external storage such as an NFS server could also block on I/O.

If the workload of a process does not block on I/O, it will not benefit from thread switching under a GIL, as it will instead be CPU-bound. In this case, multiple processes will be more efficient and better able to take advantage of multi-core systems.

So… why not skip threads and just deal with processes? A number of reasons.

read more…

Today we went to the bad place… 24 Nov 2013

You know the Bad Place. It starts with the words “How hard could it be?” and ends with me in the shower, fully clothed under a stream of hot water, crying. You think you’re in for a nice, quick, pleasant plane ride, but suddenly Brian Blessed is dressed like this and waving a pointy stick at you and you can never go home again…

I think this is how Brian Blessed dresses every day

This Bad Place involved the migration from Selenium running via Capybara in Minitest to capybara-webkit running in RSpec. We have various reasons for preferring this at my work, many of them subjective and based on the collected positive experiences of several people working in this codebase. We believe that the end result will make us happier in the long run.

We’re not yet all the way to the other side, but I thought I would share some learnings. Some of them were more obvious than others, but hopefully others can learn from today’s strange coding journey.

  • We will run JavaScript specs and unit tests in the same spec runner. The database cleaning configuration required to make this work and run quickly is non-obvious.
  • Some of our testing gems have not been updated since the launch of this project. Some gems conflict with newer versions of other testing gems, blocking upgrades.
  • Some of our Selenium tests do not require JavaScript execution or CSS rendering in order to pass.
  • Some of our Selenium tests depend on stubbing at the server.
  • We over-use Thing.any_instance.stub.
read more…

New Job at Wanelo 20 Sep 2012

I have made the move. Starting Monday I'm going to be working at Wanelo, a social media startup working on connecting people with e-commerce. It's an interesting change for me, as two years ago I promised myself I would never work for a social network company. A lot has changed in the last two years, however, and I thought I would take some time to put it down in words (for my own benefit as well as yours, illustrious reader).

For the past two years I have worked at ModCloth, an e-commerce fashion company that works with small and medium-sized fashion designers to sell vintage-inspired clothing. In my time there, I worked on various features, including Be the Buyer, a program to crowd-source product acquisition and merchandising. I led a team that re-architected product categories to be based on search indexes, which not only made the process by which new products appear on the site more automated and robust (previously, every product was added to and sorted within every category by hand), but also allowed us to implement faceted drill-down on almost every category—customers can easily find in-stock dresses in their color, size, and price range. Toward the end, I worked on systems automation to build new servers quickly and repeatably. Hopefully I had some influence on the long-term strategy of organizing code into separate applications and services.

read more…

New Blog 18 Sep 2012

Well, it's time. While the old blog had some relevant data (at least to me), it's time to shuck off the old and begin with the new. As such, I'm going to rely on the Internet Archive to provide historical content.

The analytics of my previous blog and the random connections it attracted were interesting, but a few deciding factors guided this along:

  • I have not updated it in about 2 1/2 years
  • Most of the content is technologically dated
  • Since last writing, I have switched jobs twice (more on that in a later post)