I Combined Two Open-Source Repos Into an AI That Plans, Builds, and Reviews Its Own Code
I merged the best parts of two GitHub repos — a forced review loop and a smart routing engine — into a custom Claude Code plugin called FRIDAY. It plans with Codex audit, routes implementation between Claude and Codex by size, enforces mandatory review, and does TDD. All in plain markdown files.
Series: VIBE.LOG
- 1. The Layout Vocabulary Cheat Sheet: What to Call That Thing on Your Screen
- 2. I Spent 3 Hours Trying to Proxy a Blog Subdomain. Here's My Descent Into Madness.
- 3. The Complete SEO Guide: How to Make Google Actually Notice Your Website
- 4. Why Your Next.js Favicon Isn't Showing (And the Three Ways to Actually Fix It)
- 5. GitHub Keeps Telling Me My Branch Is Fine. And Also Not Fine. At the Same Time.
- 6. Mobile-First Playground: Making an Astrology Grid Actually Work on a Phone (And Go Viral While Doing It)
- 7. Playground Is Live: The Destiny Grid, Real Astrology, and Why I'm Shipping a Toy Every Month
- 8. The Interactive Component Cheat Sheet: What to Call That Clickable Thing
- 9. Google Rejected My Site for 'Low-Value Content.' Here's What I Actually Fixed.
- 10. I Actually Fixed Everything. Here's What That Looked Like.
- 11. I Hired 131 AI Employees Today. Here's How.
- 12. I Let My AI Run 72 Backtests While I Watched. It Picked the Winner.
- 13. I Taught My AI to Stop Asking Questions. It Took Five Rewrites.
- 14. Obsidian Turned My Scattered Notes Into a Second Brain. Here's How to Set It Up.
- 15. The Destiny Grid Gets Its East Wing: I Rebuilt Saju (四柱八字) in TypeScript
- 16. Molecule Me: Your Personality, Encoded in Chemistry
- 17. OpenAI Just Built a Plugin for Their Competitor's Tool. I Installed It.
- 18. I Combined Two Open-Source Repos Into an AI That Plans, Builds, and Reviews Its Own Code ← you are here
- 19. I Built a Weekly Directory for Claude Code Agents (Because My Brain Couldn't Keep Up)
After installing OpenAI's Codex plugin for Claude Code (that's the previous post), I wanted more. The official plugin gives you review and task delegation. But the workflow — when to review, when to delegate, how to route, what to enforce — that's on you.
So I went looking for someone who had already figured this out. Found two repos. Liked different parts of each. Combined them. Built FRIDAY.
This is the story of a three-hour Frankenstein session that produced an autonomous development agent from 19 markdown files.
🔍 The Two Repos I Found
1. claude-review-loop (640 stars)
By hamelsmu. The killer feature: a stop hook that physically prevents Claude from exiting until a Codex review exists.
Here's how it works. Claude Code has a "Stop" hook — a script that runs when the agent tries to finish. This plugin hijacks that hook. When Claude says "I'm done," the hook checks: is there a review file on disk? No? Then you're not done. The hook returns {"decision": "block"} and Claude is forced to run the review first.
It's the AI equivalent of a bouncer checking your stamp before you leave. You're not going anywhere until the review is done.
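The check itself is small enough to sketch in a few lines of bash. This is not the plugin's actual script, just the shape of the decision; the directory and file names are assumptions for illustration:

```shell
# Sketch of the stop-hook decision: approve exit only if a review file
# exists on disk; otherwise block and tell Claude what to do next.
# Directory and glob are illustrative, not the plugin's real paths.
check_review() {
  local review_dir="$1"
  if ls "$review_dir"/review-*.md >/dev/null 2>&1; then
    printf '{"decision":"approve"}\n'
  else
    printf '{"decision":"block","reason":"Run the Codex review first."}\n'
  fi
}
```

The only state it needs is the filesystem: no database, no session tracking, just "does the file exist yet."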
The hook also spawns up to 4 parallel Codex agents — diff review, holistic review, framework-specific review, UX review — and consolidates their findings into a single file. Aggressive. I liked it.
What I didn't like: it's review-only. No planning phase. No smart routing. No TDD.
2. claude-codex (7 stars)
By ching-kuo. Almost nobody had found this one yet, but the architecture was surprisingly complete. It has:
- Smart routing: changes ≤2 files and ≤30 lines → Claude implements directly. Anything bigger → Codex implements via MCP.
- Plan → Implement → Review cycle: Codex audits the plan before you build. Then reviews the implementation after.
- TDD workflow: write failing tests (RED), Codex audits test quality, implement (GREEN), refactor, Codex reviews.
- Critical Evaluation Protocol: don't blindly accept all Codex feedback. Challenge incorrect findings. Discussion doesn't count against the iteration limit.
What I didn't like: no enforcement mechanism. The review is "mandatory" in the instructions, but there's nothing stopping Claude from skipping it. It's a rule, not a lock.
The Obvious Move
Take the enforcement from repo 1. Take the workflow from repo 2. Combine them into something that does everything.
🏗️ What I Built: The dx Plugin
I created a Claude Code plugin called dx (for "dual-cross") with these commands:
| Command | What It Does |
|---|---|
| /dx:plan | Claude creates a plan → Codex audits it (max 3 rounds) → saves to .claude/plan/ |
| /dx:build | Smart routing implementation → mandatory Codex review (stop hook enforced) |
| /dx:review | Standalone Codex review with --adversarial option |
| /dx:tdd | RED → Codex audit → GREEN → Refactor → Codex review |
| /dx:friday | Autonomous agent that chains all of the above |
The whole thing is 19 files. Zero lines of executable code except one bash script (the stop hook). Everything else is markdown instructions.
plugins/dx/
├── .mcp.json ← Codex MCP server auto-start
├── AGENTS.md ← Agent guidelines
├── agents/friday.md ← The autonomous orchestrator
├── commands/ ← 6 slash commands
│ ├── plan.md
│ ├── build.md
│ ├── review.md
│ ├── tdd.md
│ ├── friday.md
│ └── activate.md
├── hooks/
│ ├── hooks.json ← Stop hook registration
│ └── stop-hook.sh ← The bouncer
├── prompts/ ← Codex role injection
│ ├── analyzer.md ← Plan auditor persona
│ ├── architect.md ← Implementation persona
│ └── reviewer.md ← Code reviewer persona
└── skills/ ← Auto-loaded knowledge
├── codex-mcp/SKILL.md ← How to talk to Codex
├── smart-routing/SKILL.md ← When Claude vs. Codex
    └── critical-eval/SKILL.md ← How to evaluate feedback

Let me walk through the interesting design decisions.
🧠 Smart Routing: Who Implements What?
This was the idea I liked most from the claude-codex repo. Not everything needs Codex. If you're adding a CSS class to one file, then spinning up an MCP server, making an API call, and waiting for Codex to write the file is overkill. Claude can do that faster with a direct Edit tool call.
The rule is simple:
| Condition | Route |
|---|---|
| ≤ 2 files AND ≤ 30 lines AND no new patterns | Route A: Claude implements directly |
| > 2 files OR > 30 lines OR new architecture | Route B: Codex implements via MCP |
In TDD mode, test files are excluded from the count. Only production code matters for routing. And Codex is never allowed to modify test files — only Claude writes tests. This prevents the two AIs from fighting over the same files.
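The routing gate above reduces to a single threshold check. A minimal sketch, with function and argument names of my own invention (the repo expresses this as markdown instructions, not code):

```shell
# Illustrative routing check using the thresholds from the table above.
# Inputs: changed production files, changed lines, "yes"/"no" for new patterns.
pick_route() {
  local files="$1" lines="$2" new_patterns="$3"
  if [ "$files" -le 2 ] && [ "$lines" -le 30 ] && [ "$new_patterns" = "no" ]; then
    echo "A"   # Claude implements directly
  else
    echo "B"   # Codex implements via MCP
  fi
}
```

Note the AND/OR asymmetry: Route A requires all three conditions, so failing any one of them tips the change to Route B.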
Route B works through the Codex MCP server. Claude reads the architect.md prompt, injects it as developer-instructions, tells Codex "here's the plan, you have write access, go." Claude's job in Route B is context retrieval and review. It doesn't touch Edit or Write. Clear separation of concerns.
🔒 The Stop Hook: You Cannot Skip Review
This is the part I'm most proud of stealing — I mean, being inspired by.
The stop hook creates a two-phase lock:
Phase 1 (task): Claude finishes implementation. Tries to exit. Hook fires. Checks: is dx-workflow-state.md active? Yes? Then block exit. Tell Claude: "Run the Codex review. Save results to reviews/dx-review-<id>.md. Then you may leave."
Phase 2 (addressing): Claude runs the review, tries to exit again. Hook fires again. Checks: does the review file exist on disk? Yes → approve exit. No → block again (max 2 retries, then fail-open).
The fail-open part is critical. If the hook script crashes — bad JSON, missing jq, whatever — it always defaults to {"decision": "approve"}. Never trap the user. A broken hook that locks you in a loop is worse than no hook at all.
# ERR trap: always fail-open
trap 'printf "{\"decision\":\"approve\"}\n"; exit 0' ERR

One line. Worth everything.
I also added path traversal prevention on the review ID. The ID format is YYYYMMDD-HHMMSS-hexhex and gets validated with a regex before it's used in any file path. Paranoid? Maybe. But the review ID comes from a shell command, and shell commands plus file paths are where injection bugs live.
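The validation itself can be one grep. A sketch under stated assumptions: the post only says the format is YYYYMMDD-HHMMSS-hexhex, so the hex length here is a guess and the pattern is illustrative:

```shell
# Sketch: accept only IDs shaped like YYYYMMDD-HHMMSS-<hex>, so the ID can
# never smuggle "../" or "/" into a file path. Hex length is an assumption.
valid_review_id() {
  printf '%s' "$1" | grep -Eq '^[0-9]{8}-[0-9]{6}-[0-9a-f]+$'
}
```

Because the whitelist only allows digits, lowercase hex, and hyphens, every path-traversal payload fails the match before it ever reaches a file operation.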
🤖 FRIDAY: The Autonomous Orchestrator
Here's where things get fun.
Each command — plan, build, review, tdd — works independently. But the real workflow chains them:
/dx:plan → user approves → /dx:build → tests → review → commit

Doing this manually means typing four commands and making decisions at each step. So I built an agent that does it automatically. I named it FRIDAY, because JARVIS was taken (by my own previous post, ironically).
FRIDAY's pipeline:
INTAKE → PLAN → GATE → BUILD → VERIFY → REVIEW → DELIVER

The agent auto-detects task type (feature/bugfix/refactor), picks the right mode (standard vs. TDD — it uses TDD if you mention tests or if the project already has >60% test coverage), runs the whole pipeline, and only stops twice to ask you anything:
- GATE — "Here's the plan. Proceed?" (You must approve before implementation starts.)
- DELIVER — "Here are the results. Commit?" (You must approve before anything is committed.)
Everything between those two gates is autonomous. FRIDAY picks the route, implements, runs tests, fixes failures, triggers Codex review, evaluates findings, fixes valid issues, challenges invalid ones, saves the review file — all without asking.
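The mode auto-detection described above is one condition. A hypothetical sketch (the function name and inputs are mine; FRIDAY expresses this rule in markdown):

```shell
# Illustrative mode selection: TDD if the request mentions tests, or if
# existing coverage is above 60%. Inputs are assumptions for the sketch.
pick_mode() {
  local mentions_tests="$1" coverage_pct="$2"
  if [ "$mentions_tests" = "yes" ] || [ "$coverage_pct" -gt 60 ]; then
    echo "tdd"
  else
    echo "standard"
  fi
}
```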
The decision framework is explicit:
| FRIDAY decides alone | FRIDAY asks you |
|---|---|
| Code style (matches existing) | Deleting files |
| Internal implementation details | Changing public APIs |
| Trivial Codex findings (LOW) | Adding dependencies |
| Test organization | Security decisions |
| Borderline finding, fix is <5 lines | BLOCKED after 3 iterations |
It's the balance I've been chasing since Jarvis Mode v1. Full autonomy where it's safe. Human judgment where it matters.
🧬 Critical Evaluation: Don't Trust Everything Codex Says
This is the thing most people miss about dual-AI workflows. Codex is a reviewer, not an oracle. It makes mistakes. It flags things that aren't real problems. It misses context that Claude has from reading your project.
The Critical Evaluation Protocol is a skill file that loads automatically:
For each Codex finding:
- Is it technically correct?
- Does Codex have enough context to judge?
- Does fixing it actually improve the code?
If incorrect: challenge via codex-reply with evidence. Do NOT fix. If ambiguous: ask the user. If correct: fix it.
The key insight: challenging a finding doesn't count as a review iteration. Only fix-and-resubmit cycles count toward the 3-iteration limit. This prevents the situation where Codex says something wrong, Claude wastes an iteration "fixing" a non-problem, and now you've burned 1 of 3 chances on nothing.
It also prevents the opposite failure: Claude blindly accepting every Codex suggestion, including the bad ones, and the code getting worse through over-compliance. Claude is the architect. Codex is the reviewer. The architect listens to reviews but doesn't obey them unconditionally.
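The accounting rule is easy to get wrong, so here is a sketch of how it counts (entirely illustrative; the real protocol is prose instructions, not code):

```shell
# Sketch of the iteration accounting: only fix-and-resubmit rounds count
# toward the 3-iteration limit; challenges and discussion are free.
count_iterations() {
  local limit=3 used=0 action
  for action in "$@"; do          # actions: fix | challenge | discuss
    [ "$action" = "fix" ] && used=$((used + 1))
  done
  if [ "$used" -ge "$limit" ]; then echo "BLOCKED"; else echo "$used/$limit used"; fi
}
```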
📦 How to Install It
The plugin lives at D:\coding\claude-codex-workflow. To use it in your own setup:

- Clone or copy the directory somewhere permanent.
- Register it in ~/.claude/plugins/known_marketplaces.json:

  { "dx-workflow": { "source": { "source": "local", "path": "/path/to/claude-codex-workflow" }, "installLocation": "/path/to/claude-codex-workflow" } }

- Enable it in ~/.claude/settings.json:

  { "enabledPlugins": { "dx@dx-workflow": true } }

- Restart Claude Code or run /reload-plugins.
- Make sure the Codex CLI is installed and authenticated:

  npm install -g @openai/codex
  codex login status
The .mcp.json file in the plugin auto-starts the Codex MCP server when the plugin loads. No extra claude mcp add needed.
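A plugin-level .mcp.json follows Claude Code's standard mcpServers shape. Something like the following, where the exact command and args depend on your Codex CLI version and are assumptions here, not a copy of the plugin's file:

```json
{
  "mcpServers": {
    "codex": {
      "command": "codex",
      "args": ["mcp"]
    }
  }
}
```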
📊 What I Took From Where
| Feature | Source Repo | What I Changed |
|---|---|---|
| Stop hook enforcement | claude-review-loop | Added Windows/Git Bash compat, jq-optional fallback |
| Fail-open on errors | claude-review-loop | Same principle, new implementation |
| Smart routing | claude-codex | Extracted as standalone skill, shared across commands |
| Plan → Build → Review | claude-codex | Split into independent commands + combined in FRIDAY |
| TDD flow | claude-codex | Added stop hook enforcement |
| Critical Evaluation | claude-codex | Extracted as shared skill |
| Adversarial review | codex-plugin-cc (OpenAI) | Added as flag on /dx:review |
| Autonomous orchestrator | My own Jarvis Mode v5 | Rebuilt as FRIDAY with dual-model gates |
Nothing here is original. Every piece came from someone else's repo. I just stitched them together in a way that made sense to me and saved myself from typing four commands every time I want to ship something.
🧠 The Realization
The most powerful thing about Claude Code's plugin system isn't that it lets you code new features. It's that it lets you describe workflows in plain text and the AI follows them.
FRIDAY is 19 files. One bash script. The rest is markdown. There are more words in this blog post than in the entire plugin. And yet it orchestrates a pipeline that:
- Plans with codebase awareness
- Gets audited by a different AI model
- Routes implementation by change size
- Enforces mandatory cross-model review with a filesystem lock
- Runs TDD with test ownership rules
- Makes 90% of decisions autonomously
- Escalates the right 10% to the human
All of that. In markdown files. In folders.
I keep saying this, but it keeps being true: the most complex AI systems I build contain zero lines of real code. Just very specific instructions about what to do, what not to do, and what to prove before claiming you're done.
🔗 Resources
- openai/codex-plugin-cc — OpenAI's official Codex plugin for Claude Code
- hamelsmu/claude-review-loop — Stop hook forced review (640 stars, the enforcement inspiration)
- ching-kuo/claude-codex — Smart routing + TDD workflow (7 stars, the workflow inspiration)
- My Jarvis Mode post — The previous version of this idea, single-model
- Claude Code Plugin Docs — Official documentation
If you build something better than FRIDAY using these pieces, tell me. I'll steal that too.
2026.04.10
Written by
Jay
Licensed Pharmacist · Senior Researcher
Building production-grade AI tools across medicine, finance, and productivity — without a CS degree. Domain expertise first, code second.