Bat Rotation

Motivation

We aim to handle production issues without disrupting the heads-down time needed to solve hard problems. The “Bat rotation” does this by assigning a single rotating engineer, “the Bat”, to interruption-heavy work (bugs, support requests, etc.).

Responsibilities

The Bat’s responsibilities are listed below. While focusing on these responsibilities, the Bat is expected to put their regular work (e.g. in-flight feature work) on hold.

1. Exception Triage

The Bat is responsible for the identification, triage, and resolution of exceptions raised by the system. We track exceptions using Honeybadger. The Bat’s goal is to keep the number of unresolved exceptions in Honeybadger at zero.

Process

For any new exception…

  1. The Bat investigates immediately.
  2. Assignment:
    1. If someone on the team recently made a change that caused the exception, the Bat assigns that person.
    2. If there is no obvious owner for the exception, the Bat assigns themself.
  3. The assignee addresses the exception in one of two ways:
    1. Fixes the issue (most cases).
    2. Determines that the issue is not worth fixing immediately (rare) and adds a ticket to the Batwork epic in Shortcut.
  4. The assignee unassigns and resolves the exception in Honeybadger.

2. #user_support Requests

Wunder’s internal users request support for a variety of systems in the #user_support Slack channel.

Requests in this channel are monitored and triaged by the product team. If an issue requires support from engineering, the product team will notify the Bat.

3. Monitoring One-off Alerts

Ideally, all important alerts would be centralized in Honeybadger. However, the Bat must also monitor some non-standard alerts:

  • Various automated alerts in #bat-errors (e.g. invariant warnings)
  • Heroku maintenance emails (e.g. required Postgres maintenance). For each one, the Bat is responsible for creating a Shortcut ticket with a due date using the Shortcut maintenance ticket template, and for replying to the Heroku maintenance email to let the team know that a ticket has been created and is being tracked.

4. “Batwork” Backlog

When not handling urgent issues, the Bat’s goal is to make the Bat rotation better going forward. To do this, the Bat should implement long-term fixes for issues that arose that week or work on tickets in the Batwork epic in Shortcut.


Appendix

Expected Working Hours

The Bat is expected to be available during normal business hours (~8am to 6pm MT, Monday through Friday). The Bat’s responsibility does not extend to nights or weekends except in very rare circumstances (e.g. an outage or a security incident).

The Bat week begins at the start of the workday on Monday. Any new issues that come up before the Monday Bat handoff meeting should be addressed by the incoming Bat. The outgoing Bat should use the time before the handoff meeting to wrap up in-progress Batwork, but should explicitly not take on new Batwork.

If you plan to be out of office during your Bat rotation, find a replacement (this is usually not an issue). Being away from your computer in the evening or over the weekend does not require coverage.

The @bat Slack handle

Each week, the Bat is assigned the @bat Slack handle, which is used in #internal-support to notify the Bat of requests.

Etymology

The exact etymology of “the Bat” is lost to the tides of history. Some possible origins, in order of likelihood:

  • “Bat” refers to a role in the military called “Batman”. The batman was tasked with “miscellaneous tasks the officer does not have time or inclination to do”.
  • “Bat” is shorthand for Batman, the vigilante superhero defending Gotham City against evil.
  • “Bat” has nothing to do with exception triage but fits into Wunder’s storied history of naming things after animals (see: Deducktions and Wellyfish).
  • “Bat” refers to being “at bat” (i.e. “@bat”) in America’s beloved pastime, baseball.

Like all useful language, the meaning of “the Bat” evolves alongside the people who use it. In the end, “the Bat” is whatever your soul wishes it to be.