Serafin Sanchez
bootsncats — a step sequencer you describe in plain English

A drum machine you build by describing the beat — "house beat at 124 with a tom fill at the end." Tweak it on a step grid, then export MIDI, a full mix, or stems. It runs in the browser and your samples stay on your machine.

Next.js
React
TypeScript
Gemini
Web Audio API

Project Overview

bootsncats is a step sequencer you build by describing the beat.

Programming a drum pattern by hand means clicking out every hit on a grid, which is fine once you know what you want and slow when you don't. Most of the time you can say the thing — "house beat at 124 with a tom fill at the end," "make the hats more swung," "add a clap on the backbeat" — long before you'd finish clicking it in. So you type that, and the pattern shows up on the grid.

From there it's a normal sequencer. You play it back, toggle steps, change the tempo, solo a lane to hear it on its own. When it's right, you export — MIDI to drop into a DAW, a full mix to share, or separate stems to mix. It runs in the browser, there's no account, and custom samples are decoded and played locally instead of uploaded.

Screenshot: the chat box above the step grid, with kits and export in the toolbar. Describe a pattern, tweak it on the grid, pick a kit, export.

Key Features

💬 Describing the beat

  • Type a request in plain English and Gemini returns a pattern that loads straight onto the grid
  • Edits stack on what's already there — "add a tom fill" changes the tom, not the whole pattern — and each response says what it changed
  • Genre-aware: house, Dilla swing, UK garage, samba, and others
  • Transport runs through the same box: "play it," "set tempo to 128," "mute the hats"
  • Simple, unambiguous commands skip the model and run locally, so they're instant and deterministic

🎛️ The sequencer

  • Two-bar grid: 32 steps in sixteenth-note mode or 24 in triplet mode, switchable without losing the pattern, plus other tuplets including quintuplets and septuplets
  • Eight lanes — kick, snare, clap, rim, closed hat, open hat, tom, ride — each with solo and mute
  • Per-step velocity (0–127), pitch (±12 semitones), and note length
  • Free notes sit off the grid for ghost notes, flams, and micro-timing
  • Adjustable swing (0–100%), tempo from 60 to 200 BPM, and full undo/redo (⌘Z / ⇧⌘Z)
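
As a rough illustration of how tempo and swing could turn into step timing, here is a minimal sketch. It is not the app's code: the function names are hypothetical, and the swing mapping is an assumption, since this write-up only gives the 0–100% range. The sketch treats 0% as straight and 100% as pushing every off-beat sixteenth a third of a step late, which lands on a triplet feel.

```typescript
// Hypothetical sketch: names and the swing mapping are assumptions, not the app's code.
const STEPS_PER_BEAT = 4; // sixteenth-note mode: four steps per quarter note

// Clamp a value into [min, max]; the ranges used below come from the feature list.
function clamp(value: number, min: number, max: number): number {
  return Math.min(max, Math.max(min, value));
}

// Length of one sixteenth-note step in seconds at the given tempo (60 to 200 BPM).
function stepDurationSec(bpm: number): number {
  return 60 / clamp(bpm, 60, 200) / STEPS_PER_BEAT;
}

// Onset time of a step, in seconds from the start of the loop.
// Assumption: swing delays odd-numbered (off-beat) steps by up to a third of a step.
function stepOnsetSec(stepIndex: number, bpm: number, swingPercent: number): number {
  const step = stepDurationSec(bpm);
  const isOffBeat = stepIndex % 2 === 1;
  const delay = isOffBeat ? (clamp(swingPercent, 0, 100) / 100) * (step / 3) : 0;
  return stepIndex * step + delay;
}
```

At 120 BPM a step lasts 0.125 s, so the 32-step loop runs for 4 s. With swing at 100% under this mapping, step 1 moves from 0.125 s to about 0.167 s while step 2 stays at 0.25 s.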

🥁 Kits and samples

  • 808, 909, or a custom kit
  • Upload your own one-shots per instrument — they're read and played in the browser, never sent to a server
  • Pitch any sample ±12 semitones, with a preview before you commit
  • Reset back to the stock kits in one click

📤 Export

  • MIDI as a standard .mid file: a two-bar loop on General MIDI percussion (channel 10) that preserves tempo, timing, velocity, and note length
  • Full mix as a stereo 16-bit WAV, normalized to −0.1 dBFS
  • Stems as one WAV per instrument in a zip
  • Audio is rendered offline, so export is faster than real time, and swing and groove come through exactly as you hear them
  • Web MIDI out, for sending the pattern to hardware or another instrument live
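
The normalization step in the full-mix export comes down to a little arithmetic, sketched below. This assumes that "normalized to −0.1 dBFS" means peak normalization, and the function names are hypothetical. In the app the WAV encoding itself is handled by wavefile, so the 16-bit conversion here is only to show what the numbers look like.

```typescript
// Hypothetical sketch of peak normalization; assumes the target is a peak level.
const TARGET_DBFS = -0.1;

// Convert a level in dBFS to a linear amplitude (0 dBFS is 1.0).
function dbfsToLinear(db: number): number {
  return Math.pow(10, db / 20);
}

// Scale the buffer so its loudest sample sits at the target level.
function normalizePeak(samples: Float32Array, targetDbfs: number = TARGET_DBFS): Float32Array {
  const peak = samples.reduce((p, s) => Math.max(p, Math.abs(s)), 0);
  if (peak === 0) return samples.slice(); // silence: nothing to scale
  const gain = dbfsToLinear(targetDbfs) / peak;
  return samples.map((s) => s * gain);
}

// Convert floating-point samples in [-1, 1] to signed 16-bit integers.
function toInt16(samples: Float32Array): Int16Array {
  const out = new Int16Array(samples.length);
  samples.forEach((s, i) => {
    const clamped = Math.max(-1, Math.min(1, s));
    out[i] = Math.round(clamped * 32767);
  });
  return out;
}
```

A target of −0.1 dBFS corresponds to a linear peak of about 0.9886, which leaves a sliver of headroom below full scale.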

User Flow

  1. Describe: type the beat you want, or start from one of the prompt ideas
  2. Listen: hit play and hear it back at your tempo
  3. Tweak: toggle steps, change the swing or BPM, solo a lane, ask for an edit
  4. Personalize: switch kits, or upload and pitch your own samples
  5. Export: MIDI, a full mix, or stems — then keep going in your DAW

Architecture & Backend

Generating patterns

  • The prompt is sent to Gemini 3.5 Flash through a Next.js API route, so the key stays server-side and never reaches the client. The model returns a structured set of operations against a typed pattern schema rather than free text, which is what makes the result land cleanly on the grid.
  • Not every request needs the model. A local fast path handles the unambiguous ones — tuplets, transport, plain transforms — without a round trip, which keeps them instant and predictable.
  • Generation quality is measured, not eyeballed. An evals harness (Vitest plus scripts) scores the model's output for each prompt in a dataset and compares models. Cases are tagged seen or held-out, so prompts that double as few-shot examples don't leak into the score.
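
To make the seen / held-out idea concrete, here is a small sketch of split-aware scoring. It is not the project's harness: the case shape, the step-accuracy metric, and every name below are assumptions chosen to keep the example short.

```typescript
// Hypothetical sketch of split-aware scoring; not the project's actual harness.
type EvalSplit = "seen" | "held-out";

interface EvalCase {
  prompt: string;
  split: EvalSplit; // "seen" means the prompt also appears as a few-shot example
  expectedSteps: boolean[]; // one lane of the expected pattern, for simplicity
}

// Fraction of steps where the generated lane agrees with the expected lane.
function stepAccuracy(expected: boolean[], actual: boolean[]): number {
  if (expected.length === 0 || expected.length !== actual.length) return 0;
  const matches = expected.filter((on, i) => on === actual[i]).length;
  return matches / expected.length;
}

// Mean score per split, so held-out results can be read separately from seen ones.
function meanScoreBySplit(
  cases: EvalCase[],
  generate: (prompt: string) => boolean[]
): Record<EvalSplit, number> {
  const sums: Record<EvalSplit, number> = { seen: 0, "held-out": 0 };
  const counts: Record<EvalSplit, number> = { seen: 0, "held-out": 0 };
  for (const c of cases) {
    sums[c.split] += stepAccuracy(c.expectedSteps, generate(c.prompt));
    counts[c.split] += 1;
  }
  return {
    seen: counts.seen > 0 ? sums.seen / counts.seen : 0,
    "held-out": counts["held-out"] > 0 ? sums["held-out"] / counts["held-out"] : 0,
  };
}
```

In a real run `generate` would call the model; passing it in as a function also makes it easy to swap models and compare their held-out scores side by side.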

Audio

  • Playback runs on a Web Audio scheduler. Pitch shifting uses SoundTouchJS, and export goes through an offline render so it's faster than real time and matches what you heard.
  • @tonejs/midi writes the MIDI file, wavefile handles the WAV encoding, and fflate zips the stems.

Data & privacy

  • There's no account, and patterns and custom samples stay in the browser.
  • The only thing stored server-side is the waitlist — email and an optional note — kept in InstantDB, with Resend for email. Nothing about your patterns is logged.

Technical Challenges Overcome

Making the model's output predictable

The hard part of a "describe it" interface is that the model has to produce something the app can use every time, not prose. Having Gemini return operations against a typed schema — and backing that with an evals harness that scores prompts against a dataset — turns "does this prompt work" from a guess into a number.
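
A sketch of what "operations against a typed schema" can look like in TypeScript follows. The op names, lane ids, and validation rules are all assumptions for illustration, not the app's actual schema. The point is that each item from the model is checked against a discriminated union before it touches the pattern, so prose or a made-up lane never reaches the grid.

```typescript
// Hypothetical schema: lane names, op names, and ranges are illustrative assumptions.
const LANES = ["kick", "snare", "clap", "rim", "closedHat", "openHat", "tom", "ride"] as const;
type Lane = (typeof LANES)[number];
const STEP_COUNT = 32; // two bars of sixteenth notes

type PatternOp =
  | { op: "setStep"; lane: Lane; step: number; velocity: number }
  | { op: "clearLane"; lane: Lane }
  | { op: "setTempo"; bpm: number };

interface DrumPattern {
  bpm: number;
  lanes: Record<Lane, number[]>; // velocity per step, 0 means no hit
}

function isLane(v: unknown): v is Lane {
  return typeof v === "string" && (LANES as readonly string[]).indexOf(v) !== -1;
}

function isIntInRange(v: unknown, min: number, max: number): v is number {
  return typeof v === "number" && Math.floor(v) === v && v >= min && v <= max;
}

// Validate one untrusted value from the model; anything malformed becomes null.
function parseOp(raw: unknown): PatternOp | null {
  if (typeof raw !== "object" || raw === null) return null;
  const { op, lane, step, velocity, bpm } = raw as Record<string, unknown>;
  if (op === "setStep" && isLane(lane) && isIntInRange(step, 0, STEP_COUNT - 1) && isIntInRange(velocity, 0, 127)) {
    return { op: "setStep", lane, step, velocity };
  }
  if (op === "clearLane" && isLane(lane)) return { op: "clearLane", lane };
  if (op === "setTempo" && isIntInRange(bpm, 60, 200)) return { op: "setTempo", bpm };
  return null;
}

function emptyLane(): number[] {
  const lane: number[] = [];
  for (let i = 0; i < STEP_COUNT; i++) lane.push(0);
  return lane;
}

function emptyPattern(bpm: number): DrumPattern {
  return {
    bpm,
    lanes: {
      kick: emptyLane(), snare: emptyLane(), clap: emptyLane(), rim: emptyLane(),
      closedHat: emptyLane(), openHat: emptyLane(), tom: emptyLane(), ride: emptyLane(),
    },
  };
}

// Apply ops to a copy; lanes an op does not mention are carried over untouched.
function applyOps(pattern: DrumPattern, ops: PatternOp[]): DrumPattern {
  const next: DrumPattern = { bpm: pattern.bpm, lanes: { ...pattern.lanes } };
  for (const op of ops) {
    switch (op.op) {
      case "setStep": {
        const lane = next.lanes[op.lane].slice();
        lane[op.step] = op.velocity;
        next.lanes[op.lane] = lane;
        break;
      }
      case "clearLane":
        next.lanes[op.lane] = emptyLane();
        break;
      case "setTempo":
        next.bpm = op.bpm;
        break;
    }
  }
  return next;
}

// Example: two valid ops survive, one unknown lane and one line of prose are dropped.
const modelOutput: unknown[] = [
  { op: "setStep", lane: "tom", step: 30, velocity: 100 },
  { op: "setStep", lane: "cowbell", step: 0, velocity: 100 },
  { op: "setTempo", bpm: 124 },
  "here is a nice house beat",
];
const validOps = modelOutput.map(parseOp).filter((o): o is PatternOp => o !== null);
const basePattern = emptyPattern(120);
const editedPattern = applyOps(basePattern, validOps);
```

This sketch silently drops invalid items; rejecting the whole response and retrying would be an equally reasonable policy. Because `applyOps` only changes the lanes an op names, an "add a tom fill" edit leaves the kick and hats as they were.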

Knowing when not to call the model

A lot of what users type doesn't need an LLM. Routing the unambiguous commands to a local fast path makes them instant and deterministic, and keeps the model for the requests that actually benefit from it.
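
One way such a fast path could work is a small set of exact-match patterns tried before any network call, as in the sketch below. The command set, the accepted phrasings, and the lane ids are assumptions rather than the app's real router. Anything that does not match exactly returns `null` and would go to the model.

```typescript
// Hypothetical sketch: the command set, phrasings, and names are assumptions.
type LocalCommand =
  | { kind: "play" }
  | { kind: "stop" }
  | { kind: "setTempo"; bpm: number }
  | { kind: "mute"; lane: string };

// Words a user might type for a lane, mapped to hypothetical lane ids.
const LANE_WORDS: Record<string, string> = {
  kick: "kick", snare: "snare", clap: "clap", rim: "rim",
  hats: "closedHat", hat: "closedHat", "open hat": "openHat", tom: "tom", ride: "ride",
};

// Returns a command when the text matches exactly, or null to fall through to the model.
function matchLocalCommand(input: string): LocalCommand | null {
  const text = input.trim().toLowerCase().replace(/[.!]+$/, "");
  if (/^(play|play it|start)$/.test(text)) return { kind: "play" };
  if (/^(stop|pause)$/.test(text)) return { kind: "stop" };

  const tempo = /^(?:set )?(?:the )?tempo (?:to )?(\d{2,3})(?: bpm)?$/.exec(text);
  if (tempo) {
    const bpm = Number(tempo[1]);
    // Out-of-range tempos are left for the model (or an error path) to handle.
    return bpm >= 60 && bpm <= 200 ? { kind: "setTempo", bpm } : null;
  }

  const mute = /^mute (?:the )?([a-z ]+)$/.exec(text);
  if (mute) {
    const word = mute[1] ?? "";
    // Own-property check so words like "constructor" never match by accident.
    if (Object.prototype.hasOwnProperty.call(LANE_WORDS, word)) {
      return { kind: "mute", lane: LANE_WORDS[word] ?? word };
    }
  }
  return null;
}
```

With these patterns, "play it", "set tempo to 128", and "mute the hats" resolve locally, while "house beat at 124 with a tom fill at the end" returns `null`. Anchoring every pattern with `^` and `$` is the design choice that keeps the path deterministic: a sentence that merely contains the word "tempo" is never hijacked.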

Keeping the export honest

The exported file has to match what you heard, including swing and per-step timing. Rendering offline from the same pattern data — rather than re-recording playback — keeps the preview and the file in agreement, and the MIDI maps to General MIDI percussion so it lands on the right drums in any DAW.
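
The General MIDI side of this can be sketched as a lookup table plus a step-to-tick conversion. The key numbers below are the standard General MIDI percussion assignments. The lane ids, the choice of Low Tom for the single tom lane, the 480 ticks-per-quarter resolution, and the event shape are assumptions for illustration.

```typescript
// General MIDI percussion key numbers (standard). Lane ids are hypothetical, and
// choosing Low Tom (45) for the single tom lane is an assumption.
const GM_DRUM_NOTES = {
  kick: 36, // Bass Drum 1
  snare: 38, // Acoustic Snare
  clap: 39, // Hand Clap
  rim: 37, // Side Stick
  closedHat: 42, // Closed Hi-Hat
  openHat: 46, // Open Hi-Hat
  tom: 45, // Low Tom
  ride: 51, // Ride Cymbal 1
} as const;
type GmLane = keyof typeof GM_DRUM_NOTES;

const GM_DRUM_CHANNEL = 9; // zero-based index of channel 10
const PPQ = 480; // ticks per quarter note; an assumed resolution
const STEPS_PER_QUARTER = 4; // sixteenth-note grid
const TICKS_PER_STEP = PPQ / STEPS_PER_QUARTER;

interface DrumNoteEvent {
  note: number;
  channel: number;
  ticks: number;
  durationTicks: number;
  velocity: number;
}

// Convert one grid hit to a note event; step may be fractional for off-grid notes.
function stepToNoteEvent(
  lane: GmLane,
  step: number,
  velocity: number,
  lengthSteps: number = 1
): DrumNoteEvent {
  return {
    note: GM_DRUM_NOTES[lane],
    channel: GM_DRUM_CHANNEL,
    ticks: Math.round(step * TICKS_PER_STEP),
    durationTicks: Math.round(lengthSteps * TICKS_PER_STEP),
    velocity: Math.max(0, Math.min(127, Math.round(velocity))),
  };
}
```

For example, a kick on step 4 becomes note 36 at tick 480, and a free note halfway between steps 4 and 5 lands at tick 540. Deriving both the audio render and these ticks from the same step positions is what keeps the two exports in agreement.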

Custom samples without a backend

Letting people bring their own sounds usually means uploads and storage. Here the samples are read, pitched, and played entirely in the browser, so there's nothing to upload and nothing to store.

Tech Stack Breakdown

App

  • Next.js 15 (App Router, Turbopack), React 19, TypeScript, Tailwind CSS v4
  • Web Audio API for playback and an offline render for export
  • SoundTouchJS for pitch, @tonejs/midi for MIDI, wavefile for WAV, fflate for the stems zip

Generation

  • Gemini 3.5 Flash behind a server-side API route, returning operations against a typed pattern schema
  • A local fast path for unambiguous commands
  • Vitest plus an evals harness that scores prompts against a dataset and compares models

Backend

  • InstantDB for the waitlist, Resend for email — and nothing else server-side

Impact

bootsncats turns "click out every hit" into "say the beat, then fix what's off." It's a normal step sequencer once the pattern is there — the AI just gets you to a starting point faster. Most of the work went into the two things that make that trustworthy: the model returning something the app can always use, and the export matching what you heard.
