Refactoring 3,000 Lines of CSS Without Breaking the App: Tokens, Variables, and a Same-Day Dark Mode

Three thousand lines of CSS, twelve different shades of gray hard-coded as literal hex codes scattered across files, no single source of truth for spacing, and someone wants dark mode added by Friday. If that sounds like your stylesheet, you already know the feeling — every change takes longer than it should, and the deadline just made the cost concrete.

Real example: that was the state of the stylesheet for VORA, my browser-based meeting app, on the morning a dark-mode deadline forced the refactor below.

This post covers the unglamorous part of a redesign: the work that stops the cascade from eating an afternoon every time you change a button. It also includes a self-contained equalizer animation you can lift. I'll keep the "the app looks nicer now" portion brief because it's the least transferable thing here.

How the 3,000-line monolith accumulated

Nobody writes a 3,000-line CSS file on day one. It accretes. Looking back at the git history, the pattern was depressingly clear:

Component lands, styles go into the global file. "We'll modularize later." We never modularized later.
A specificity fight starts. A new rule doesn't apply, so someone adds a parent selector to win the cascade. .container .panel .button beats .button. Now .button is no longer a primitive — it's a selector that only works in three places.
The third copy of #1e293b shows up. Someone needs the same panel background somewhere new. They don't grep; they eyeball it. A week later it's #1e2a3b because someone eyeballed wrong.
A bug fix gets !important. Now every future fix in that subtree also needs !important, and you've created a private cascade with its own rules.
Two developers edit the file in parallel. One real commit in our log: "restore corrupted CSS file and correctly apply AI correction styles." The merge resolved cleanly at the syntax level but moved a selector boundary, so the AI-correction sparkle (✨) appeared on every transcript line for half a day. The file passed the linter. It just lied about which selector owned which block.

The forces that produce this are not laziness. They're (a) a global namespace, (b) no tokens, and (c) the cascade used as an API. Each new rule can interact with every previous rule, so maintenance cost grows super-linearly with line count.
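A back-of-the-envelope way to see the super-linear part, under the worst-case assumption that any rule can collide with any rule written before it: the k-th rule has k − 1 potential partners, so n rules have a triangular number of potential pairwise interactions.

```latex
\sum_{k=1}^{n} (k-1) \;=\; \frac{n(n-1)}{2} \;=\; O(n^{2})
```

That is an upper bound rather than a measurement of this stylesheet, but it shows the shape: doubling n roughly quadruples it.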

Before the refactor, the codebase had several signs of accumulated technical debt: hex codes scattered across files, repeated !important overrides, deeply nested selectors that required cascade archaeology to modify, and a noticeable lag between "I want to change this color" and "the change is in." I didn't measure these systematically — the diagnosis was the standard "every change takes longer than it should" feeling, plus a dark mode request with a deadline that made the cost concrete.

The file wasn't "long" in any meaningful sense — it was unsafe to change.

Step 1: Custom properties as the single source of truth

The first move was the cheapest and the highest leverage: define every design value once, in :root, and reference it everywhere else. No preprocessor. No build step. Just CSS custom properties.

We split tokens into four categories. Categories matter — flat token soup is barely better than hex codes.

:root {
  /* ---- Color: surfaces (semantic, not literal) ---- */
  --surface-0:        #0f172a;   /* page background */
  --surface-1:        #1e293b;   /* panel */
  --surface-2:        rgba(255, 255, 255, 0.04);  /* card lift */
  --surface-overlay:  rgba(15, 23, 42, 0.72);     /* glass */
 
  /* ---- Color: text ---- */
  --text-primary:    #e2e8f0;
  --text-secondary:  #94a3b8;
  --text-muted:      #64748b;
 
  /* ---- Color: accent ---- */
  --accent:          #6366f1;
  --accent-hover:    #818cf8;
  --accent-ring:     rgba(99, 102, 241, 0.35);
 
  /* ---- Spacing (4-px base, t-shirt sizes) ---- */
  --sp-1: 4px;  --sp-2: 8px;  --sp-3: 12px;
  --sp-4: 16px; --sp-5: 24px; --sp-6: 32px; --sp-7: 48px;
 
  /* ---- Typography ---- */
  --font-sans: "Noto Sans KR", system-ui, -apple-system, sans-serif;
  --fs-xs: 12px; --fs-sm: 13px; --fs-md: 15px; --fs-lg: 18px; --fs-xl: 22px;
  --fw-regular: 400; --fw-medium: 500; --fw-bold: 700; --fw-black: 900;
  --lh-tight: 1.25; --lh-body: 1.55;
 
  /* ---- Radius / motion ---- */
  --radius-sm: 6px; --radius-md: 10px; --radius-lg: 16px;
  --motion-fast: 120ms; --motion-base: 200ms;
  --ease-out: cubic-bezier(0.2, 0.8, 0.2, 1);
}

Two rules we enforced:

No raw hex below this block. A grep for #[0-9a-fA-F]{6} outside :root should return zero (widen it to {3,8} if you also write shorthand like #fff). We added a CI check.
Tokens are semantic, not literal. --surface-1 not --gray-800. The whole point is that we can swap the value without renaming the variable. --gray-800 would be a lie the moment we add a light theme.

Naming convention: --<category>-<role>[-state]. --text-primary, --accent-hover, --surface-overlay. Predictable enough that you don't need to open :root to guess a name.
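The CI check itself isn't shown here. A minimal sketch of one, assuming Node and assuming the only legitimate home for hex is a rule whose selectors are all :root variants (plain :root, :root[data-theme=...], :root:not(...)); findRawHex is a hypothetical name:

```javascript
// Hypothetical CI check: report hex colors that appear outside :root token
// blocks. A sketch, not a CSS parser: it assumes no braces or semicolons
// inside strings or url(), and treats a rule as a token block only when
// every selector in its list is :root plus attribute/:not() qualifiers.
const HEX = /#(?:[0-9a-fA-F]{8}|[0-9a-fA-F]{6}|[0-9a-fA-F]{3,4})\b/g;
const TOKEN_SELECTOR = /^:root(?:\[[^\]]*\]|:not\([^)]*\))*$/;

const isTokenBlock = (prelude) =>
  prelude.split(",").every((s) => TOKEN_SELECTOR.test(s.trim()));

function findRawHex(css) {
  // Blank out comments but keep their newlines so line numbers stay right.
  const src = css.replace(/\/\*[\s\S]*?\*\//g, (c) => c.replace(/[^\n]/g, " "));
  const stack = []; // one entry per open block: true if inside a token block
  const hits = [];
  let start = 0;    // where the current prelude or declaration began
  for (let i = 0; i < src.length; i++) {
    const ch = src[i];
    if (ch !== "{" && ch !== "}" && ch !== ";") continue;
    const chunk = src.slice(start, i);
    const inToken = stack.length > 0 && stack[stack.length - 1];
    if (ch === "{") {
      // chunk is a selector list or at-rule prelude; token-ness is inherited.
      stack.push(inToken || isTokenBlock(chunk));
    } else {
      // chunk is a declaration (possibly empty): scan it unless we're in tokens.
      if (stack.length > 0 && !inToken) {
        for (const m of chunk.matchAll(HEX)) {
          const line = src.slice(0, start + m.index).split("\n").length;
          hits.push({ line, hex: m[0] });
        }
      }
      if (ch === "}") stack.pop();
    }
    start = i + 1;
  }
  return hits;
}
```

In CI you would run it over each stylesheet's text (for example from fs.readFileSync) and fail the build if any file returns hits. Unlike the one-line grep, it also catches shorthand like #fff and ignores hex inside comments.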

Step 2: Kill specificity fights with a thin BEM

The second move was the unsexiest: rename selectors so the cascade stops being load-bearing.

We didn't adopt full BEM dogma. We adopted exactly the part that solves the problem — flat, single-class selectors with a predictable shape:

.block__element--modifier

Concretely, before and after for a single component (the recording button):

/* BEFORE — cascade as API */
.app .toolbar .controls button.recording { ... }
.app .toolbar .controls button.recording:hover { ... }
.app .toolbar .controls button.recording.disabled { ... }
.app .toolbar .controls button.recording svg { ... }
 
/* AFTER — flat selectors, modifiers as suffixes */
.rec-btn { ... }
.rec-btn:hover { ... }
.rec-btn--disabled { ... }
.rec-btn__icon { ... }

Specificity goes from (0,4,1) to (0,1,0). Anyone writing a new rule against .rec-btn doesn't need to know it's nested inside .app .toolbar .controls. They don't need to win a parent-selector arms race. They don't reach for !important.
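To check numbers like these without counting by hand, here is a rough specificity calculator. It is a hypothetical helper that handles only simple selectors (no :is(), :not(), :where(), or :has()), which is enough for both versions of the recording button:

```javascript
// Rough specificity calculator: returns [ids, classes/attributes/pseudo-classes, types].
// Hypothetical helper for simple selectors only: no :is(), :not(), :where(),
// :has(), and no quoted attribute values containing "]".
function specificity(selector) {
  let rest = selector;
  // Count matches of `re`, then blank them out so later passes can't re-count them.
  const take = (re) => {
    const n = (rest.match(re) || []).length;
    rest = rest.replace(re, " ");
    return n;
  };
  const attrs = take(/\[[^\]]*\]/g);                  // [data-theme="dark"]
  const pseudoElements = take(/::[\w-]+/g);           // ::before
  const ids = take(/#[\w-]+/g);                       // #main
  const classes = take(/\.[\w-]+/g);                  // .rec-btn--disabled
  const pseudoClasses = take(/:[\w-]+(\([^)]*\))?/g); // :hover, :nth-child(2n)
  const types = take(/[a-zA-Z][\w-]*/g);              // button, svg: whatever is left
  return [ids, classes + attrs + pseudoClasses, types + pseudoElements];
}

specificity(".app .toolbar .controls button.recording"); // [0, 4, 1]
specificity(".rec-btn");                                 // [0, 1, 0]
```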

We did not rename everything in one PR. The realistic flow:

Rename one component at a time, leaving the old class in place as a deprecated alias for one release.
New code uses the new name. Old code keeps working.
Once the deprecated alias has zero references, delete it.
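"Zero references" is easy to check with a search, but class names contain hyphens, so a naive \b word boundary would count rec-btn inside rec-btn--disabled. A hypothetical helper that counts whole-class-name matches across file contents, sketched under that assumption:

```javascript
// Hypothetical helper: count whole-class-name references across files.
// `files` maps a path to its text (e.g. read with fs.readFileSync in CI).
// \b is not a safe boundary for hyphenated names: it would find "rec-btn"
// inside "rec-btn--disabled". Explicit [\w-] lookarounds avoid that.
function countClassRefs(files, className) {
  const escaped = className.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
  const re = new RegExp(`(?<![\\w-])${escaped}(?![\\w-])`, "g");
  const counts = {};
  for (const [path, text] of Object.entries(files)) {
    const n = (text.match(re) || []).length;
    if (n > 0) counts[path] = n; // only files that still reference the name
  }
  return counts;
}
```

It deliberately overcounts (a class name that also appears in a comment or string still counts), which is the safe direction: delete the alias only once the result is empty.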

Each component took a focused chunk to rename and verify — the kind of work you slot in alongside other tasks, not a sprint-blocking rewrite.

Step 3: Dark mode as a one-line theme swap

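This is the payoff. Once tokens exist, "add dark mode" is not a project. It's re-declaring the color tokens on :root once per theme; the spacing, typography, radius, and motion tokens from Step 1 stay in the plain :root block and don't change.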

:root,
:root[data-theme="light"] {
  --surface-0: #f8fafc;
  --surface-1: #ffffff;
  --surface-2: rgba(15, 23, 42, 0.04);
  --text-primary: #0f172a;
  --text-secondary: #475569;
  /* ...accent stays the same; that's the point of semantic tokens */
}
 
:root[data-theme="dark"] {
  --surface-0: #0f172a;
  --surface-1: #1e293b;
  --surface-2: rgba(255, 255, 255, 0.04);
  --text-primary: #e2e8f0;
  --text-secondary: #94a3b8;
}
 
@media (prefers-color-scheme: dark) {
  :root:not([data-theme]) {
    --surface-0: #0f172a;
    --surface-1: #1e293b;
    --surface-2: rgba(255, 255, 255, 0.04);
    --text-primary: #e2e8f0;
    --text-secondary: #94a3b8;
  }
}

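The toggle itself is two lines of JS: set data-theme on the root element and persist the choice in localStorage. The saved choice is re-applied by an inline script in <head>, before first paint, to avoid a flash of the wrong theme:

// in <head>, inline, before first render
(function () {
  try {
    const saved = localStorage.getItem("theme");
    // Only accept known values; anything else falls back to prefers-color-scheme.
    if (saved === "light" || saved === "dark") {
      document.documentElement.setAttribute("data-theme", saved);
    }
  } catch (e) { /* storage blocked: fall back to prefers-color-scheme */ }
})();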
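A minimal sketch of the toggle side, assuming the same "theme" localStorage key and data-theme attribute the inline script reads. toggleTheme is a hypothetical helper; the root element and storage are passed in (document.documentElement and localStorage in a browser) so the logic can also run outside one:

```javascript
// Hypothetical helper: flip between "light" and "dark" and persist the choice.
// `root` is document.documentElement, `storage` is localStorage in a browser.
function toggleTheme(root, storage, prefersDark) {
  // No data-theme attribute yet means the page is following prefers-color-scheme.
  const current = root.getAttribute("data-theme") ?? (prefersDark ? "dark" : "light");
  const next = current === "dark" ? "light" : "dark";
  root.setAttribute("data-theme", next); // the only thing the CSS keys off
  try {
    storage.setItem("theme", next);      // re-applied before paint on the next load
  } catch {
    // Storage blocked or full: the toggle still works for this page view.
  }
  return next;
}

// Browser usage (`button` is hypothetical):
// button.addEventListener("click", () =>
//   toggleTheme(document.documentElement, localStorage,
//     matchMedia("(prefers-color-scheme: dark)").matches));
```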

Component CSS does not change at all. .panel { background: var(--surface-1); color: var(--text-primary); } is correct in both themes because the tokens changed, not the components.

This is the test of whether your token layer is real: if adding a theme requires editing component CSS, the tokens aren't tokens, they're aliases.
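That test can also be checked mechanically. A hypothetical lint sketch: flag any selector that mentions data-theme outside a pure :root token block, since a rule like [data-theme="dark"] .panel means a component is being themed directly. It only inspects selector preludes with regular expressions, so treat it as a tripwire rather than a parser:

```javascript
// Hypothetical lint for the "tokens, not aliases" test: flag selectors that
// mention data-theme anywhere except a pure :root token block. Regex-based
// sketch: it skips @-rule preludes and does not catch component rules
// nested inside @media (prefers-color-scheme) blocks.
const TOKEN_ROOT = /^:root(?:\[[^\]]*\]|:not\([^)]*\))*$/;

function themedComponentSelectors(css) {
  const src = css.replace(/\/\*[\s\S]*?\*\//g, ""); // drop comments
  const offenders = [];
  // A prelude is the run of text before a "{" that contains no {, } or ;.
  for (const m of src.matchAll(/([^{};]+)\{/g)) {
    const prelude = m[1].trim();
    if (prelude.startsWith("@")) continue; // @media, @supports, ...
    for (const part of prelude.split(",").map((s) => s.trim())) {
      if (part.includes("data-theme") && !TOKEN_ROOT.test(part)) {
        offenders.push(part);
      }
    }
  }
  return offenders;
}
```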

Step 4: The equalizer (a self-contained piece you can lift)

The voice visualizer was the one piece of the redesign worth documenting in full, because it's small, replicable, and the trade-offs are real. Here is the complete implementation.

// equalizer.js
//
// 32 bars driven by FFT magnitudes from a MediaStream, with a
// per-bar sine offset so idle audio doesn't look frozen.
 
const BAR_COUNT       = 32;
const FFT_SIZE        = 256;     // -> 128 frequency bins
const SMOOTHING       = 0.75;    // analyser-side smoothing
const MIN_HEIGHT_PX   = 8;
const MAX_HEIGHT_PX   = 40;
const IDLE_WAVE_AMP   = 10;      // px contributed by the sine offset
const SPEECH_AMP      = 25;      // px contributed by audio energy
const FREQ_LOW_HZ     = 80;      // ignore subsonic rumble
const FREQ_HIGH_HZ    = 6000;    // ignore hiss above speech band
 
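export async function startEqualizer(container, mediaStream) {
  const ctx = new AudioContext();
  // A context created outside a user gesture may start "suspended", and the
  // analyser then receives no audio. Call startEqualizer from a user gesture
  // (such as the record button's click handler) so resume() can succeed.
  if (ctx.state === "suspended") await ctx.resume();
  const source = ctx.createMediaStreamSource(mediaStream);
  const analyser = ctx.createAnalyser();
  analyser.fftSize = FFT_SIZE;
  analyser.smoothingTimeConstant = SMOOTHING;
  source.connect(analyser);
 
  // Build the bars once. `container` is a plain element (.eq), not a <canvas>.
  const bars = Array.from({ length: BAR_COUNT }, () => {
    const el = document.createElement("span");
    el.className = "eq__bar";
    container.appendChild(el);
    return el;
  });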
 
  // Map BAR_COUNT bars onto a log-spaced subset of the FFT bins
  // restricted to the speech band. Linear binning would put roughly
  // three quarters of the bars above 6 kHz at 44.1/48 kHz sample rates.
  const nyquist = ctx.sampleRate / 2;
  const binHz = nyquist / analyser.frequencyBinCount;
  const lowBin  = Math.max(1, Math.floor(FREQ_LOW_HZ  / binHz));
  const highBin = Math.min(analyser.frequencyBinCount - 1,
                           Math.ceil(FREQ_HIGH_HZ / binHz));
  const logLow  = Math.log(lowBin);
  const logHigh = Math.log(highBin);
  const binFor = (i) => {
    const t = i / (BAR_COUNT - 1);
    return Math.round(Math.exp(logLow + (logHigh - logLow) * t));
  };
  const binIndex = Array.from({ length: BAR_COUNT }, (_, i) => binFor(i));
 
  const freqData = new Uint8Array(analyser.frequencyBinCount);
  let raf = 0;
  const t0 = performance.now();
 
  function frame(now) {
    analyser.getByteFrequencyData(freqData);
    const time = (now - t0) / 1000;
 
    for (let i = 0; i < BAR_COUNT; i++) {
      const audioValue = freqData[binIndex[i]] / 255;          // 0..1
      const wave = Math.sin(time * 10 + i * 0.8) * 0.5 + 0.5;  // 0..1
      const dynamic = audioValue * SPEECH_AMP + wave * IDLE_WAVE_AMP;
      const h = Math.min(MAX_HEIGHT_PX,
                Math.max(MIN_HEIGHT_PX, MIN_HEIGHT_PX + dynamic));
 
      const bar = bars[i];
      bar.style.height  = `${h}px`;
      bar.style.opacity = (0.5 + audioValue * 0.5).toFixed(3);
    }
 
    raf = requestAnimationFrame(frame);
  }
  raf = requestAnimationFrame(frame);
 
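  return () => {
    cancelAnimationFrame(raf);
    source.disconnect();
    bars.forEach((bar) => bar.remove()); // a restart won't stack a second set
    return ctx.close();                  // resolves once the context is closed
  };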
}

.eq         { display: flex; gap: var(--sp-1); align-items: flex-end;
              height: 48px; }
.eq__bar    { width: 4px; min-height: 8px; border-radius: 2px;
              background: var(--accent);
              transition: opacity var(--motion-fast) linear; }

The trade-offs worth knowing:

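Log binning, not linear. Spread 32 bars linearly across the analyser's full range (0 to 24 kHz at a 48 kHz sample rate) and only about a quarter of them land below 6 kHz; the rest show frequencies that carry little speech energy. Restricting the bars to 80 Hz–6 kHz and spacing them logarithmically gives the lower part of the speech band proportionally more bars; everything outside that band is decoration.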
smoothingTimeConstant: 0.75. Lower values look twitchy on consonants. Higher values look like a lava lamp. 0.7–0.8 is the band where it reads as "responsive" without flickering.
One requestAnimationFrame loop, not 32 transitions. We tried CSS transitions on height per bar. At 60 Hz with 32 elements, layout thrash dominated. Driving heights manually inside one rAF is cheaper and visibly smoother.
Sine offset on top of real data. Pure FFT looks dead at idle (everything bottoms out near zero). Pure sine looks fake (it doesn't track speech). The sum is what reads as "alive but honest."
No canvas. 32 spans are fine. Reach for canvas at 200+ bars or per-pixel effects, not before.
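The log mapping is easier to judge with concrete numbers. A sketch that pulls the binIndex computation from equalizer.js out into a pure function (logBinIndices is a hypothetical name; the math is the same), evaluated for a 48 kHz context:

```javascript
// Same math as binIndex in equalizer.js, as a pure function.
// sampleRate is whatever the AudioContext reports (commonly 44100 or 48000).
function logBinIndices(sampleRate, fftSize, barCount, lowHz, highHz) {
  const binCount = fftSize / 2;            // analyser.frequencyBinCount
  const binHz = sampleRate / 2 / binCount; // width of one bin in Hz
  const lowBin = Math.max(1, Math.floor(lowHz / binHz));
  const highBin = Math.min(binCount - 1, Math.ceil(highHz / binHz));
  const logLow = Math.log(lowBin);
  const logHigh = Math.log(highBin);
  return Array.from({ length: barCount }, (_, i) => {
    const t = i / (barCount - 1);
    return Math.round(Math.exp(logLow + (logHigh - logLow) * t));
  });
}

// 48 kHz, FFT_SIZE 256: bins are 187.5 Hz wide, so 80 Hz floors to bin 0
// (clamped to 1) and 6 kHz lands exactly on bin 32.
const bins = logBinIndices(48000, 256, 32, 80, 6000);
// bins.slice(0, 6) is [1, 1, 1, 1, 2, 2]; 20 of the 32 values are distinct.
```

So at FFT_SIZE 256 the four lowest bars all read bin 1 and move in lockstep. If that matters for your design, a larger fftSize narrows the bins (each is sampleRate / fftSize Hz wide) at the cost of a longer analysis window per frame.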

Layout: 2-column vs 3-column, with the comparison we actually used

We moved from a 2-column layout to a 3-column layout. I want to put the comparison up because "we chose 3-column" without trade-offs is the kind of decision narrative that's useless to other people.

2-COLUMN (before)
┌──────────────────────┬──────────────────────┐
│                      │   Q&A                │
│   Transcript         │   ───────────        │
│   (live)             │   Summary            │
│                      │   (collapsed)        │
└──────────────────────┴──────────────────────┘
- Right column has to time-share Q&A and Summary.
- Users scroll inside the right pane to switch context.
- Works at >=1024px. Below that, Summary is hidden.
 
3-COLUMN (after)
┌────────────┬────────────┬────────────┐
│            │            │            │
│ Transcript │   Q&A      │  Summary   │
│  (live)    │            │            │
│            │            │            │
└────────────┴────────────┴────────────┘
- Three parallel tracks visible at once.
- Needs more horizontal room; we set the cutoff at 1280px.
- Below 1280px we collapse Summary into a tab.

Dimension	2-column	3-column
Min viewport for full layout	1024 px	1280 px
Context switches to read summary while talking	Yes (scroll/toggle)	No
Whitespace on a wide monitor	More empty space on the sides	Columns fill the canvas
Mobile fallback complexity	Simple (stack)	Same (stack)
Below 1280 px on desktop	Full layout (down to 1024 px)	Summary collapses to a tab

The 3-column layout wins on the use case where someone is actively in a meeting and needs all three streams visible. It loses on smaller laptops, which we handled by collapsing the third column to a tab below 1280 px. If your product isn't usually viewed full-screen on a 1440-plus monitor, 2-column is probably the better default — this isn't a universal call.

What the refactor bought us

What the refactor changed, qualitatively: changing a token now ripples through everywhere it's used, instead of requiring a find-and-replace on hex literals. Adding dark mode became a one-line theme switch instead of a parallel stylesheet. Cascade-depth disputes — where two selectors fight over which one wins — became rare enough that I noticed when one happened. Bundle size and Lighthouse scores moved in the right direction during the migration, but I can't attribute that cleanly to tokens versus dead-code removal versus other concurrent cleanup, so I'm not going to quote numbers I can't defend.

We don't have a controlled study on user perception either. The qualitative shift was that "Is this finished?" stopped showing up in feedback; we are not claiming a usability-test result.

When this is worth doing — and when it isn't

Adopt this pattern when:

You have a single-page app with a global stylesheet and no CSS-in-JS / module isolation.
You're seeing the same color/spacing values copy-pasted across files.
Cascade specificity has started showing up in code review comments.
You want a second theme (dark mode, high-contrast, brand variants) — even speculatively. Token discipline is the prerequisite, not a separate project.

Skip or defer when:

You're using CSS Modules, vanilla-extract, Tailwind, or Panda CSS: they enforce or replace most of this at the tooling layer. Don't double-architect.
The codebase has under ~500 lines of CSS and a single developer. You don't have the problem yet; introducing a token layer is overhead.
You're three weeks from launch and the styles work. Ship, then refactor on the first post-launch sprint with real usage data.

Where this goes next

Once the token layer exists, it unlocks per-tenant theming — the same dark/light toggle mechanism, but driven by a customer-supplied accent and surface set (runtime token injection without invalidating the static CSS bundle). The point for now: get tokens + flat selectors in place first, and "a second theme" stops being a project.
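One way that could look, as a sketch under loudly stated assumptions: tenants may override only --accent* and --surface-* tokens, values must be hex or comma-separated rgb()/rgba(), and the generated rule keys off a hypothetical data-tenant attribute on the root element. Validation is the important part, because a customer-supplied value such as red; } body { display: none would otherwise break out of its declaration.

```javascript
// Hypothetical per-tenant token injection. Assumptions: only --accent* and
// --surface-* may be overridden, values must be hex or comma-separated
// rgb()/rgba(), and the rule keys off a made-up data-tenant attribute.
const ALLOWED_NAME = /^--(accent|surface)(-[a-z0-9]+)*$/;
const SAFE_COLOR = /^(#[0-9a-fA-F]{3,8}|rgba?\(\s*[\d.\s,%]+\))$/;

function tenantThemeCss(tenantId, tokens) {
  // The id ends up inside an attribute selector, so keep it boring.
  if (!/^[a-z0-9-]+$/.test(tenantId)) throw new Error(`bad tenant id: ${tenantId}`);
  const decls = [];
  for (const [name, raw] of Object.entries(tokens)) {
    const value = String(raw).trim();
    // Drop anything outside the allowed names or the color grammar; this is
    // what stops a value like "red; } body { display: none" escaping.
    if (ALLOWED_NAME.test(name) && SAFE_COLOR.test(value)) {
      decls.push(`  ${name}: ${value};`);
    }
  }
  return `:root[data-tenant="${tenantId}"] {\n${decls.join("\n")}\n}`;
}
```

In a browser the returned string would go into a <style> element appended after the main stylesheet, with data-tenant set on document.documentElement; the selector has the same specificity as :root[data-theme="dark"], so coming later in the document is what lets it win.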

If you're naming the building blocks as you go, the layout vocabulary cheat sheet pairs well with this.

2026.02.17