
I Let My AI Run 72 Backtests While I Watched. It Picked the Winner.

Claude Code controlled my browser, ran 72 strategy backtests on my own site, debated the results, and upgraded my AI trader — all in one session. A non-developer's story.

by Jay · 13 min read · VIBE.LOG
Disclaimer: Educational analysis only. Not financial advice. Past performance does not guarantee future results.

Last Tuesday, I watched an AI open a browser, navigate to my website, type JavaScript into a code editor, click "Run Backtest," wait for results, record the numbers, and repeat — 72 times.

I did not touch the keyboard once.

This is the story of how Claude Code controlled a real browser, tested 72 trading strategies on bt.vibed-lab.com, picked the best one through a structured debate, and then upgraded my AI stock trader to use it. All in one sitting. All without me writing a single line of code myself.

If you think AI assistants can only answer questions and generate text — keep reading.


🤖 The Setup: Claude Code Can Control Your Browser

Most people know Claude as a chatbot. Type a question, get an answer.

Claude Code is different. It is an AI that lives in your terminal and can do things. Read files. Write code. Run commands. And — the part that matters for this story — control a web browser.

This works through something called Playwright MCP (Model Context Protocol). In plain terms:

Claude Code can open a browser, click buttons, type text, read what is on screen, take screenshots, and navigate pages — exactly like a human would, but faster and without getting bored.

I had already built CryptoBacktest, a backtesting tool where you can write or describe trading strategies and test them against historical Bitcoin data. It has a Strategy Lab at bt.vibed-lab.com/lab — a code editor where you type in a strategy function, pick a date range, and hit "Run."

The question was: could I get Claude Code to use that tool, the way a human would?

The answer turned out to be yes. But getting there was an adventure.


🧪 The Experiment: 72 Strategies, Zero Human Keystrokes

Here was my goal: test as many creative trading strategies as possible across different time periods, then pick the best one.

The Strategy List

I asked Claude to come up with 12 diverse strategies — not just the textbook ones everyone tests, but creative combinations that might catch patterns others miss:

| # | Strategy | Core Logic |
|---|----------|------------|
| 1 | Golden Cross | SMA 50 crosses above SMA 200 → buy |
| 2 | RSI Bounce | RSI drops below 30 then recovers → buy |
| 3 | MACD Momentum | MACD crosses above signal line → buy |
| 4 | Bollinger Squeeze | Price breaks above upper band after narrow squeeze → buy |
| 5 | Triple Confirm | RSI + MACD + Stochastic all agree → buy |
| 6 | Volume Breakout | Price breaks resistance on 2x average volume → buy |
| 7 | Mean Reversion | Price drops 2+ standard deviations below SMA → buy |
| 8 | Dual Momentum | Both absolute and relative momentum positive → buy |
| 9 | ATR Channel | Price breaks above ATR-based channel → buy |
| 10 | Stoch + MACD | Stochastic oversold + MACD cross → buy |
| 11 | Ichimoku Cloud | Price breaks above the Ichimoku cloud → buy |
| 12 | Adaptive RSI | RSI threshold adjusts based on recent volatility → buy |

The Time Periods

Each strategy was tested across 6 different market conditions:

| Period | Market Character |
|--------|------------------|
| 2020–2021 | COVID crash → massive bull run |
| 2021–2022 | Bull market peak → bear |
| 2022–2023 | Deep bear → early recovery |
| 2023–2024 | Recovery → new ATH |
| 2024–2025 | Recent consolidation |
| 2020–2025 | Full 5-year cycle |

12 strategies × 6 periods = 72 backtests.

Each one required: opening the Strategy Lab, pasting in a strategy function, setting the date range, clicking Run, waiting for results, recording the output.

Doing this manually would have taken hours. Claude Code did it in about 40 minutes.
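The run itself is just a nested loop over the two lists. A minimal sketch of the test matrix, with names and period labels taken from the tables above:

```javascript
// The 12 × 6 test matrix. Each entry becomes one Strategy Lab run.
const strategies = [
  'Golden Cross', 'RSI Bounce', 'MACD Momentum', 'Bollinger Squeeze',
  'Triple Confirm', 'Volume Breakout', 'Mean Reversion', 'Dual Momentum',
  'ATR Channel', 'Stoch + MACD', 'Ichimoku Cloud', 'Adaptive RSI',
];
const periods = [
  '2020–2021', '2021–2022', '2022–2023',
  '2023–2024', '2024–2025', '2020–2025',
];
const runs = strategies.flatMap((s) =>
  periods.map((p) => ({ strategy: s, period: p }))
);
console.log(runs.length); // 72
```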


🎬 How It Actually Worked (The Technical Part, Made Simple)

Here is what happened under the hood, explained for humans:

Step 1: Claude Opens the Browser

Claude Code used Playwright to launch a real Chromium browser and navigate to bt.vibed-lab.com/lab. Just like you double-clicking Chrome and typing a URL.

Step 2: Claude Types Code Into the Editor

The Strategy Lab has a code editor (a <textarea> element with the id #strategy-code). Claude located it, focused on it, and typed in the strategy code — character by character, the same way your keyboard would.

Here is what a strategy function looks like in the Lab:

function strategy(data, params) {
  const buy = [], sell = [];
  const close = data.map(d => d.close);
  const rsi = Ind.rsi(close, 14);
  const macd = Ind.macd(close, 12, 26, 9);
  const stoch = Ind.stochastic(
    data.map(d => d.high),
    data.map(d => d.low),
    close, 14, 3
  );

  let position = false;
  for (let i = 1; i < data.length; i++) {
    const rsiBuy = rsi[i] < 42;                        // RSI in the lower range
    const macdCross = macd.macd[i-1] < macd.signal[i-1]
                   && macd.macd[i] >= macd.signal[i];  // MACD just crossed up
    const stochBuy = stoch.k[i] < 42
                  && stoch.k[i] > stoch.d[i];          // %K turning up from a low reading

    // Enter when RSI is low AND at least one other indicator confirms
    if (!position && rsiBuy && (macdCross || stochBuy)) {
      buy.push(i);
      position = true;
    // Exit when RSI is overbought AND MACD has rolled over
    } else if (position && rsi[i] > 70
               && macd.macd[i] < macd.signal[i]) {
      sell.push(i);
      position = false;
    }
  }
  return { buy, sell };
}

That is the Triple Confirm strategy — the one that eventually won. It buys when RSI says "oversold" and at least one of MACD or Stochastic confirms the recovery. It sells when two indicators say "overbought and weakening."
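The function leans on the Lab's built-in `Ind` helpers. If you want to poke at it outside the Lab, a minimal stand-in is enough to make it run. These are textbook formulas (simple-average RSI, EMA-based MACD); the Lab's own implementations may differ in smoothing details:

```javascript
// Minimal stand-in for the Lab's built-in `Ind` helpers, so the strategy
// function can run outside the browser. Textbook formulas; the Lab's
// internals may differ.
const Ind = {
  sma(xs, n) {
    return xs.map((_, i) =>
      i + 1 < n ? NaN : xs.slice(i + 1 - n, i + 1).reduce((a, b) => a + b, 0) / n);
  },
  ema(xs, n) {
    const k = 2 / (n + 1);
    const out = [];
    xs.forEach((x, i) => out.push(i === 0 ? x : x * k + out[i - 1] * (1 - k)));
    return out;
  },
  rsi(close, n) {
    // Simple-average RSI over the last n price changes.
    return close.map((_, i) => {
      if (i < n) return NaN;
      let gain = 0, loss = 0;
      for (let j = i - n + 1; j <= i; j++) {
        const d = close[j] - close[j - 1];
        if (d > 0) gain += d; else loss -= d;
      }
      return loss === 0 ? 100 : 100 - 100 / (1 + gain / loss);
    });
  },
  macd(close, fast, slow, signalN) {
    const f = Ind.ema(close, fast), s = Ind.ema(close, slow);
    const line = f.map((x, i) => x - s[i]);
    return { macd: line, signal: Ind.ema(line, signalN) };
  },
  stochastic(high, low, close, n, dN) {
    const k = close.map((c, i) => {
      if (i + 1 < n) return NaN;
      const hh = Math.max(...high.slice(i + 1 - n, i + 1));
      const ll = Math.min(...low.slice(i + 1 - n, i + 1));
      return hh === ll ? 50 : ((c - ll) / (hh - ll)) * 100;
    });
    return { k, d: Ind.sma(k, dN) };
  },
};
```

With this shim and an array of OHLC bars, calling `strategy(data, {})` returns the buy/sell index arrays.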

Step 3: Set Date Range and Run

Claude changed the start and end date fields, clicked the "Run Backtest" button, and waited for the chart and results panel to populate.
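Put together, one iteration of steps 1 through 3 looks roughly like this Playwright-style helper. Only `#strategy-code` and `#ai-prompt` are named in this post, so the date-field selectors and the `#results` panel here are guesses:

```javascript
// One backtest iteration, sketched as a Playwright helper.
// Selector names other than #strategy-code are assumptions.
async function runBacktest(page, { code, start, end }) {
  await page.goto('https://bt.vibed-lab.com/lab');
  await page.fill('#strategy-code', code); // the real run needed execCommand
                                           // instead (see "The Part That
                                           // Was Not Smooth")
  await page.fill('#start-date', start);
  await page.fill('#end-date', end);
  await page.click('text=Run Backtest');
  await page.waitForSelector('#results');
  return page.textContent('#results');
}
```

In a real session, `page` comes from `chromium.launch()` and `browser.newPage()`; here it is just a parameter, so the helper can also be exercised with a stub.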

Step 4: Record Results and Repeat

After each run, Claude read the results (total return, max drawdown, number of trades, win rate) from the page, stored them, then moved to the next strategy-period combination.

72 times.
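The "read the results" part of step 4 is plain text scraping. A hypothetical parser, assuming labels like "Total Return: +312%" (the Lab's actual panel wording may differ):

```javascript
// Hypothetical parser for the results panel text. The label strings are
// assumptions; the Lab's real panel may word things differently.
function parseResults(text) {
  const grab = (label) => {
    const m = text.match(new RegExp(label + '[^-+\\d]*([-+]?[\\d.]+)'));
    return m ? parseFloat(m[1]) : null;
  };
  return {
    totalReturn: grab('Total Return'), // percent
    maxDrawdown: grab('Max Drawdown'), // percent, negative in a loss
    trades: grab('Trades'),
    winRate: grab('Win Rate'),         // percent
  };
}
```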

The Part That Was Not Smooth

I would love to tell you this worked perfectly on the first try. It did not.

Problem 1: Wrong text field. The Lab page has two text areas — one for natural language prompts (#ai-prompt) and one for actual code (#strategy-code). Claude initially typed into the wrong one. Every strategy ran the same default code while only the dates changed. The results all looked suspiciously identical. It took us a few minutes to realize the mistake.

Problem 2: Wrong API format. The Strategy Lab expects a specific function signature: function strategy(data, params) { return { buy, sell }; }. Claude initially generated strategies using an older API style with await buy() and await sell() calls. Every test crashed with "await is only valid in async functions." We had to read the Lab's template code to discover the correct format.

Problem 3: React does not like programmatic typing. The code editor is a React-controlled component. Simply setting textarea.value = code does not trigger React's state update. Claude had to use document.execCommand('insertText') — a browser API that mimics actual keyboard input — to make React recognize the change.
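The trick from Problem 3, sketched as a function Claude could run inside the page via `page.evaluate`. Passing `document` in as a parameter is my addition, so the function can be exercised with a stub:

```javascript
// Insert code into a React-controlled textarea so React sees the change.
// execCommand fires real input events, unlike setting .value directly.
function insertCode(doc, selector, code) {
  const el = doc.querySelector(selector);
  el.focus();
  el.select();                                // select existing content...
  doc.execCommand('insertText', false, code); // ...and type over it
}
```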

Each problem took a few minutes to diagnose and fix. But the point is: Claude Code debugged its own browser automation in real time. It hit an error, read the error message, figured out the cause, and fixed the approach. I just watched.


📊 The Results: 72 Tests, One Clear Winner

After all 72 tests completed, Claude organized the results into a comparison. Here is a simplified version of what the top strategies looked like across the full 5-year period (2020–2025):

| Strategy | Total Return | Max Drawdown | Trades | Win Rate |
|----------|--------------|--------------|--------|----------|
| Triple Confirm | +312% | -18% | 23 | 78% |
| MACD Momentum | +245% | -34% | 67 | 52% |
| RSI Bounce | +198% | -28% | 45 | 58% |
| Golden Cross | +187% | -22% | 12 | 67% |
| Bollinger Squeeze | +156% | -41% | 38 | 47% |
| Volume Breakout | +134% | -36% | 29 | 51% |

(Numbers are from backtests on BTC historical data. Past performance does not predict future results.)

Triple Confirm stood out in a specific way: fewest trades, highest win rate, lowest drawdown.

It did not trade often. But when it did, it was right most of the time. And when it was wrong, the losses were contained.

Some strategies like ATR Channel and Stoch+MACD returned 0% in certain periods — literally zero trades executed because their conditions were too strict. Others like MACD Momentum traded frequently but gave back gains through whipsaws.

Triple Confirm hit the sweet spot: strict enough to avoid bad trades, flexible enough to catch real moves.


🧠 The Debate: How Claude Picked the Winner

I did not just look at the numbers and pick the highest return. I asked Claude to debate it.

"Run a brainstorming session. Argue which strategy we should adopt for the AI trader. Consider not just returns, but consistency, risk, and adaptability."

Claude structured a multi-perspective analysis:

The Case for Triple Confirm:

  • Highest win rate (78%) means the strategy is reliable, not lucky
  • Lowest drawdown (-18%) means it preserves capital during crashes
  • Works across all market conditions (bull, bear, sideways)
  • Few trades = lower transaction costs and less emotional stress
  • Three independent indicators must agree = low false-positive rate

The Case Against:

  • Few trades means it misses some profitable moves
  • In a strong bull market, a simpler strategy like Golden Cross captures more upside

The Counter-Argument:

  • If you want more trades with Triple Confirm, you trade more symbols. Instead of running it on one stock, run it on ten. The strategy stays reliable; you just widen the net.

That last point sealed it. A strategy with a 78% win rate applied across many stocks is more valuable than a 52% strategy that trades one stock frequently.


🔧 The Upgrade: From Backtest to Production

Once we picked Triple Confirm, the next step was embedding it directly into my AI stock trader.

The AI trader already had all the raw indicators — RSI, MACD, Stochastic oscillator — computed for every stock it analyzed. What it lacked was the composite signal: the specific combination that Triple Confirm uses.

Claude made two changes to two files:

Change 1: Add Triple Confirm Signals to the Analysis Engine

In the technical analyzer, three new boolean fields were added to the analysis report:

  • triple_confirm_buy — True when RSI < 42 AND MACD just crossed bullish AND Stochastic is recovering from oversold
  • triple_confirm_sell — True when RSI > 70 AND MACD is below its signal line
  • macd_bullish_cross — True when MACD just crossed above its signal line (a weaker signal on its own)

These are computed from the indicators that were already being calculated. No new data sources. No new API calls. Just a smarter combination of existing data.
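For illustration, the three fields could be computed like this (JavaScript for consistency with the rest of this post; the thresholds mirror the backtested script, and the trader's actual code may differ):

```javascript
// Sketch of the three composite fields added to the analysis report.
// Thresholds (42 / 70) come from the backtested Triple Confirm script.
function compositeSignals({ rsi, macd, macdSignal, stochK, stochD }, i) {
  const macdBullishCross =
    macd[i - 1] < macdSignal[i - 1] && macd[i] >= macdSignal[i];
  const stochRecovering = stochK[i] < 42 && stochK[i] > stochD[i];
  return {
    macd_bullish_cross: macdBullishCross,
    triple_confirm_buy: rsi[i] < 42 && macdBullishCross && stochRecovering,
    triple_confirm_sell: rsi[i] > 70 && macd[i] < macdSignal[i],
  };
}
```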

Change 2: Tell the AI to Prioritize Triple Confirm

The AI trader uses Google Gemini to make final trading decisions. It receives all the technical data and returns a buy/hold/sell decision with reasoning.

We updated the system prompt — the instructions Gemini follows — to include:

"When triple_confirm_buy=true appears in the technical analysis, this is a high-confidence signal validated by backtesting. Strongly consider buying if other conditions allow. When triple_confirm_sell=true appears, strongly consider taking profits or reducing position."

The AI still has full discretion. It can override the signal if fundamental analysis or market sentiment says otherwise. But now it knows that Triple Confirm is the highest-conviction technical signal in its toolkit.

Two files. About 40 lines of code total. The AI trader went from "look at a bunch of indicators and figure it out" to "here is a specific, backtested pattern — pay extra attention when you see it."


💡 What This Actually Means (For Non-Developers)

Let me zoom out, because this is not really a story about trading strategies.

This is a story about what AI tools can actually do right now — things that sound futuristic but are already working.

AI Can Use Your Software

Claude Code did not analyze data in a vacuum. It opened a real website, used a real code editor, clicked real buttons, and read real results. It used software the way a human intern would — except it worked for 40 minutes straight without checking Instagram.

AI Can Run Experiments at Scale

72 backtests is tedious for a human. For Claude, it was a loop. The same patience that makes testing boring for us makes it trivial for AI.

AI Can Synthesize and Debate

After collecting 72 data points, Claude did not just sort by "highest return." It considered drawdown, consistency, adaptability, transaction costs, and how the strategy could scale across multiple stocks. It argued both sides before recommending.

The Human Still Decides

I picked Triple Confirm. Claude recommended it, but I could have chosen differently. I could have said "I want more trades" and picked MACD Momentum. The AI presents the evidence and the reasoning. You make the call.

That is the workflow I have been refining across all of Vibed Lab's projects: AI does the heavy lifting, human keeps the steering wheel.


🎯 Try It Yourself

If you want to test trading strategies without writing code:

Strategy Lab at bt.vibed-lab.com/lab →

Describe your strategy idea in plain language, or write a strategy function directly. The Lab runs it against real historical data and shows you exactly how it would have performed.

If you want Claude Code to run the backtests for you — the same setup I used — you need:

  1. Claude Code installed (docs.anthropic.com)
  2. Playwright MCP enabled (comes bundled — just enable it in settings)
  3. A target URL (your own tool, or any web app you want the AI to interact with)

That is it. Claude handles the rest.


The most surprising part of this experiment was not the results. It was watching an AI use my own product better than I could have used it manually.

72 strategies. 6 time periods. One winner. Zero human keystrokes.

I built the Lab so that anyone could test strategies without coding. I did not expect the first power user to be my own AI.

2026.03.09

Written by

Jay

Licensed Pharmacist · Senior Researcher

Building production-grade AI tools across medicine, finance, and productivity — without a CS degree. Domain expertise first, code second.
