中文版 README | English

Let Claude Code do research while you sleep. Wake up to find your paper scored, weaknesses identified, experiments run, and narrative rewritten, all autonomously.

Radically lightweight: zero dependencies, zero lock-in. The entire system is plain Markdown files. No framework to learn, no database to maintain, no Docker to configure, no daemon to babysit. Every skill is a single SKILL.md readable by any LLM. Swap Claude Code for Codex CLI, OpenClaw, Cursor, Trae, Antigravity, Windsurf, or your own agent and the workflows still work. Fork it, rewrite it, adapt it to your stack.

ARIS is a methodology, not a platform. What matters is the research workflow; take it wherever you go.
Join Community
Custom Claude Code skills for autonomous ML research workflows. These skills orchestrate cross-model collaboration: Claude Code drives the research while an external LLM (via Codex MCP) acts as a critical reviewer.

- Also supports alternative model combinations (Kimi, LongCat, DeepSeek, etc.); no Claude or OpenAI API required. For example, MiniMax-M2.7 + GLM-5 or GLM-5 + MiniMax-M2.7.
- Codex CLI native: the full skill set is also available for OpenAI Codex.
- Cursor: works in Cursor too.
- Trae: ByteDance's AI IDE.
- Antigravity: Google's agent-first IDE.
- Free tier via ModelScope: zero cost, zero lock-in.
Why not self-play with a single model? Using Claude Code subagents or agent teams for both execution and review is technically possible, but it tends to fall into local minima: the same model reviewing its own patterns creates blind spots.

Think of it like adversarial vs. stochastic bandits: a single model self-reviewing is the stochastic case (predictable reward noise), while cross-model review is adversarial (the reviewer actively probes weaknesses the executor didn't anticipate), and adversarial bandits are fundamentally harder to game.

Why two models, not more? Two is the minimum needed to break self-play blind spots, and 2-player games converge to Nash equilibrium far more efficiently than n-player ones. Adding more reviewers increases API cost and coordination overhead with diminishing returns; the biggest gain comes from going 1→2, not 2→4.

Claude Code's strength is fast, fluid execution; Codex (GPT-5.4 xhigh) is slower but more deliberate and rigorous in critique. These complementary styles, speed × rigor, produce better outcomes than either model talking to itself.
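For intuition only (these are standard multi-armed-bandit results, not something ARIS computes): with K arms over T rounds, the achievable regret separates the two settings.

```latex
\underbrace{R_T \;=\; O\!\Big(\textstyle\sum_{i \neq i^*} \tfrac{\log T}{\Delta_i}\Big)}_{\text{stochastic, e.g.\ UCB}}
\qquad\text{vs.}\qquad
\underbrace{R_T \;=\; \Theta\!\big(\sqrt{KT}\big)}_{\text{adversarial, e.g.\ Exp3}}
```

Logarithmic vs. square-root growth: an adversary forces sustained exploration that can never be fully amortized, which is exactly the property the analogy leans on.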
These are full pipelines, but you can also use each workflow independently. Already have an idea? Skip to Workflow 1.5. Have results? Jump to Workflow 3. Got reviews? Jump to Workflow 4. See Quick Start for all commands and Workflows for the full breakdown.
Basic mode: give ARIS a research direction, and it handles everything:
/research-pipeline "factorized gap in discrete diffusion LMs"
Targeted mode: got a paper you want to improve? Give ARIS the paper plus the code:

/research-pipeline "improve method X" -- ref paper: https://arxiv.org/abs/2406.04329, base repo: https://github.com/org/project

ARIS reads the paper → finds its weaknesses → clones the codebase → generates ideas that specifically fix those weaknesses with that code → runs experiments → writes your paper. Like telling a research assistant: "read this paper, use this repo, find what's missing, and fix it."
Mix and match: ref paper only = "what can be improved?"; base repo only = "what can I build with this code?"; both = "improve this paper using this code."
Rebuttal mode: reviews just dropped? Don't panic. ARIS reads every concern, builds a strategy, and drafts a rebuttal that's grounded, structured, and under the character limit:

/rebuttal "paper/ + reviews" -- venue: ICML, character limit: 5000
| Parameter | Default | What it does |
|---|---|---|
| venue | ICML | Target venue (ICML/NeurIPS/ICLR/CVPR/ACL/AAAI/ACM) |
| character limit | – | Required. Hard character limit for rebuttal text |
| quick mode | false | Stop after parsing + strategy (Phase 0-3). See what reviewers want before drafting |
| auto experiment | false | Auto-run supplementary experiments via /experiment-bridge when reviewers ask for new evidence |
| max stress test rounds | 1 | How many times GPT-5.4 xhigh stress-tests the draft |
| max followup rounds | 3 | Per-reviewer follow-up round limit |
Three safety gates: the rebuttal will NOT finalize if any of them fails.
Two outputs: PASTE_READY.txt (exact character count, ready to paste into the venue form) + REBUTTAL_DRAFT_rich.md (extended version for manual editing).
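The hard character limit can be enforced with a plain `wc -m` check. A minimal sketch (the file name and the 5000-character limit mirror the example above; this is not the skill's actual implementation):

```shell
# Stand-in for PASTE_READY.txt (the real file is produced by /rebuttal)
printf 'We thank the reviewers for their careful reading...' > PASTE_READY.txt

LIMIT=5000
# wc -m counts characters (multibyte-aware under a UTF-8 locale)
COUNT=$(wc -m < PASTE_READY.txt | tr -d ' ')

if [ "$COUNT" -le "$LIMIT" ]; then
  echo "OK: $COUNT/$LIMIT characters"
else
  echo "OVER LIMIT: $COUNT/$LIMIT characters"
fi
```

Note that `wc -m` counts every character including newlines, which matches how venue forms typically count.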
After acceptance: your paper is in, now prepare the presentation:
/paper-slides "paper/"   # → Beamer PDF + PPTX + speaker notes + Q&A prep
/paper-poster "paper/"   # → A0/A1 poster PDF + editable PPTX + SVG
From idea to paper to podium: one toolchain.
| Paper | Score | Venue | Author | Stack |
|---|---|---|---|---|
| CS Paper | 8/10 "clear accept" | CS Conference | @DefanXue & @Monglitay | Claude Code + GPT-5.4 |
| AAAI Paper | 7/10 "good paper, accept" | AAAI 2026 Main Technical | @xinbo820-web | Pure Codex CLI |
Built entirely with ARIS, from idea to acceptance. Full details + reviewer screenshots →
/rebuttal: post-submission rebuttal pipeline. Parse reviews → atomize → strategy → draft → safety check → GPT-5.4 stress test → finalize (strict + rich versions) → follow-up rounds. 3 safety gates (no fabrication, no overpromise, full coverage). quick mode for analysis only. auto experiment for supplementary experiments. Designed from 5 successful rebuttal case studies + 3 rounds of GPT-5.4 xhigh design review.
/training-check, /result-to-claim, /ablation-planner. compact mode: generate lean summary files for short-context models and session recovery (-- compact: true). research-refine checkpoint: auto-resume after interruption. Community contributions by @JingxuanKang & @couragec
2026-03-18 – paper-slides + Codex+Claude bridge + Cursor guide + Codex CLI skills + grant-proposal + paper-illustration (Gemini) + CitationClaw
2026-03-17 – Git code sync + ModelScope guide + parameter pass-through
2026-03-16 – research-refine + experiment-plan: turn vague ideas into problem-anchored proposals with claim-driven experiment roadmaps. Now integrated into Workflow 1 (/idea-discovery). Community contribution by @zjYao36
2026-03-16 – Alibaba Coding Plan guide: one API key, 4 models (Kimi-K2.5 + Qwen3.5+ + GLM-5 + MiniMax-M2.5), dual-endpoint setup. Community contribution by @tianhao909
2026-03-15 – Bring your own model! Any OpenAI-compatible API now works as reviewer via MCP server. GLM, MiniMax, Kimi, LongCat, DeepSeek all tested →
# 1. Install skills
git clone https://github.com/wanshuiyin/Auto-claude-code-research-in-sleep.git
cp -r Auto-claude-code-research-in-sleep/skills/* ~/.claude/skills/
# 2. Set up Codex MCP (for review skills)
npm install -g @openai/codex
codex setup # set model to gpt-5.4 when prompted
claude mcp add codex -s user -- codex mcp-server
# 3. Use in Claude Code
claude
> /idea-discovery "your research direction"     # Workflow 1: be specific! Not "NLP" but "factorized gap in discrete diffusion LMs"
> /experiment-bridge                            # Workflow 1.5: have a plan? Implement + deploy + collect results
> /auto-review-loop "your paper topic or scope" # Workflow 2: review → fix → re-review overnight
> /paper-writing "NARRATIVE_REPORT.md"          # Workflow 3: narrative → polished PDF
> /rebuttal "paper/ + reviews" -- venue: ICML   # Workflow 4: parse reviews → draft rebuttal → follow-up
> /research-pipeline "your research direction"  # Full pipeline: Workflow 1 → 1.5 → 2 → 3 end-to-end
Templates available! See templates/ for ready-to-use input templates for every workflow: research brief (Workflow 1), experiment plan (Workflow 1.5), narrative report (Workflow 3), paper plan (Workflow 3).

Tip: All pipeline behaviors are configurable via inline overrides; append `-- key: value` to any command:
| Parameter | Default | What it does |
|---|---|---|
| AUTO_PROCEED | true | Auto-continue at the idea selection gate. Set false to manually pick which idea to pursue before committing GPU time |
| human checkpoint | false | Pause after each review round so you can read the score, give custom modification instructions, skip specific fixes, or stop early |
| sources | all | Which literature sources to search: zotero, obsidian, local, web, or all (comma-separated) |
| arxiv download | false | Download top relevant arXiv PDFs during the literature survey. When false, only fetches metadata (title, abstract, authors) |
| DBLP_BIBTEX | true | Fetch real BibTeX from DBLP/CrossRef instead of LLM-generated entries. Eliminates hallucinated citations. Zero install |
Important: Codex MCP uses the model from ~/.codex/config.toml, not from skill files. Make sure it says model = "gpt-5.4" (recommended). Other options: gpt-5.3-codex, gpt-5.2-codex, o3. Run codex setup or edit the file directly.
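A quick way to confirm which model the MCP server will use is to read that line directly. A sketch against a stand-in file (the real file lives at ~/.codex/config.toml; the one-line TOML here is an assumption for demonstration):

```shell
# Stand-in for ~/.codex/config.toml
cat > config.toml <<'EOF'
model = "gpt-5.4"
EOF

# Extract the configured model name from the model = "..." line
MODEL=$(sed -n 's/^model *= *"\(.*\)"/\1/p' config.toml)
echo "Codex MCP will use: $MODEL"
```

If the echoed model is not the one you expect, edit the file before launching a long overnight run.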
Want Codex to execute but Claude Code to review? See docs/CODEX_CLAUDE_REVIEW_GUIDE.md. That path installs the base skills/skills-codex/*, then overlays skills/skills-codex-claude-review/*, and routes review-heavy skills through the local claude-review MCP bridge.

Want Codex to execute but Gemini to review locally? See docs/CODEX_GEMINI_REVIEW_GUIDE.md and the CN version. That path installs the base skills/skills-codex/*, then overlays skills/skills-codex-gemini-review/*, and routes the reviewer-aware predefined skills through the local gemini-review MCP bridge, using the direct Gemini API by default.
See full setup guide for details and alternative model combinations if you don't have Claude/OpenAI API.
- 31 composable skills: mix and match, or chain into full pipelines (/idea-discovery, /auto-review-loop, /paper-writing, /research-pipeline)
- Literature & novelty: multi-source paper search (Zotero + Obsidian + local PDFs + arXiv/Scholar) + cross-model novelty verification
- Idea discovery: literature survey → brainstorm 8-12 ideas → novelty check → GPU pilot experiments → ranked report
- Auto review loop: 4-round autonomous review, 5/10 → 7.5/10 overnight with 20+ GPU experiments
- Paper writing: narrative → outline → figures → LaTeX → PDF → auto-review (4/10 → 8.5/10), one command. Anti-hallucination citations via DBLP/CrossRef
- Cross-model collaboration: Claude Code executes, GPT-5.4 xhigh reviews. Adversarial, not self-play
- Peer review: review others' papers as a conference reviewer, with structured scoring and meta-review
- Review-driven experiments: when GPT-5.4 says "run an ablation", Claude Code automatically writes the script, rsyncs to your GPU server, launches in screen, collects results, and folds them back into the paper. Just configure your server in CLAUDE.md (setup guide)
A real overnight 4-round run on an ML research project, from borderline reject to submission-ready:
| Round | Score | What Happened |
|---|---|---|
| Initial | 5.0/10 | Borderline reject |
| Round 1 | 6.5/10 | Added standard metrics, discovered metric decoupling |
| Round 2 | 6.8/10 | Key claim failed to reproduce, pivoted narrative |
| Round 3 | 7.0/10 | Large seed study killed main improvement claim |
| Round 4 | 7.5/10 ✓ | Diagnostic evidence solidified, submission ready |
The loop autonomously ran 20+ GPU experiments, rewrote the paper's narrative framing, and killed claims that didn't hold up, all without human intervention.
Real projects where the ARIS pipeline was used end-to-end. If you've used ARIS to complete a paper, we'd love to feature it here โ open an issue or PR!
| Paper | Rating | Venue | Built by | Notes |
|---|---|---|---|---|
| CS Paper | 8/10 ✓ "Top 50% of accepted papers, clear accept" | CS Conference | @DefanXue & @Monglitay | Full ARIS pipeline: idea → experiments → auto-review → paper writing. Reviewer: "empirical findings are stark, well-supported, and expose a fundamental flaw" |
| AAAI 2026 Paper | 7/10 ✓ "Good paper, accept" | AAAI 2026 Main Technical | @xinbo820-web | Pure Codex CLI (ARIS-Codex skills). Accepted at AAAI 2026 |

Papers built entirely with ARIS, from idea to acceptance. Know more? Let us know!
Domain-specific skills and external projects contributed by the community. PRs welcome: just add a skills/your-skill/SKILL.md and open a PR!
How to use: Community skills are not auto-wired into core workflows. To use one, ask your executor (Claude Code / OpenClaw / etc.) to read the skill's SKILL.md, then plug it into the appropriate workflow stage based on the description below.

Community Skills (12): research-refine · experiment-plan · grant-proposal · paper-poster · paper-slides · mermaid-diagram · proof-writer · comm-lit-review · dse-loop · idea-discovery-robot · formula-derivation ·

External Projects & Docs (9): open-source-hardening-skills · CitationClaw · auto-hparam-tuning · Antigravity Adaptation Guide · OpenClaw Adaptation Guide · Cursor Adaptation Guide · Codex+Claude Review Bridge · Trae Adaptation Guide · paper-illustration

Thanks to every contributor! We fold the tables below to keep the README readable, but every skill and project here is equally valued. PRs always welcome!
| Name | Domain | Description | Codex MCP? |
|---|---|---|---|
| research-refine | General | Turn a vague idea into a problem-anchored, implementation-oriented method proposal. Best inserted between /idea-discovery and /auto-review-loop | Yes |
| experiment-plan | General | Turn a refined proposal into a claim-driven experiment roadmap with ablations, budgets, and run order | No |
| research-refine-pipeline | General | One-shot chain: /research-refine → /experiment-plan for method refinement plus experiment planning | Yes |
| grant-proposal | General | Grant proposal drafting (KAKENHI/NSF/NSFC/ERC/DFG/SNSF/ARC/NWO). Chains /research-lit → /novelty-check → … | Yes |
| Name | Domain | Description |
|---|---|---|
| open-source-hardening-skills | DevOps / OSS | 10-skill pipeline to harden research code into production-ready open-source projects: audit, refactor, test, CI, docs, review |
| CitationClaw | General | Citation impact analysis: input a paper title → citation crawling, scholar identification, tiered analysis, HTML dashboard |
| Antigravity Adaptation Guide | General | Use ARIS skills in Google Antigravity: native SKILL.md support, dual model (Claude Opus 4.6 / Gemini 3.1 Pro), MCP setup, EN + CN guides |
| OpenClaw Adaptation Guide | General | Use the ARIS workflow methodology in OpenClaw: skill-to-stage mapping, file-based orchestration, no Claude Code CLI needed |
| Cursor Adaptation Guide | General | Use ARIS skills in Cursor |
These skills compose into a full research lifecycle. The four workflows can be used independently or chained together:
- Workflow 1: /idea-discovery
- Workflow 1.5: /experiment-bridge
- Workflow 2: /auto-review-loop
- Workflow 3: /paper-writing (or step by step: /paper-plan → /paper-figure → /paper-write → /paper-compile → /auto-paper-improvement-loop)
- Workflow 4: /rebuttal (parse reviews, draft a safe rebuttal, follow-up rounds)
- Full pipeline: /research-pipeline + /rebuttal, from idea to acceptance

Important: These tools accelerate research, but they don't replace your own critical thinking. Always review generated ideas with your domain expertise, question the assumptions, and make the final call yourself. The best research comes from human insight + AI execution, not full autopilot.
/research-lit → /idea-creator → /novelty-check → /research-refine → /experiment-bridge → /auto-review-loop → /paper-writing → submit → /rebuttal → accept!
  (survey)      (brainstorm)    (verify novel)   (refine method)   (implement+deploy)   (review & fix)     (write paper)   (send)   (reply to reviewers)
├──────────────── Workflow 1: Idea Discovery ───────────────┤ ├ Workflow 1.5 ┤ ├── Workflow 2 ──┤ ├── Workflow 3 ──┤ ├── Workflow 4 ──┤
Blog post (in Chinese): "Open-sourcing the full research-while-you-sleep pipeline"
"What's the state of the art? Where are the gaps? How do we solve it?"
Don't have a concrete idea yet? Just give a research direction; /idea-discovery handles the rest:
The output is a ranked IDEA_REPORT.md plus a refined proposal (refine-logs/FINAL_PROPOSAL.md) and experiment plan (refine-logs/EXPERIMENT_PLAN.md) for the top idea. Dead-end ideas are documented too, saving future exploration.
┌───────────────────────────────────────────────────────────────┐
│              Idea Discovery & Method Refinement               │
│                                                               │
│  /research-lit      /idea-creator       /novelty-check        │
│  (find papers)      (brainstorm)        (verify novelty)      │
│       │                  │                   │                │
│       ▼                  ▼                   ▼                │
│  ┌──────────┐      ┌──────────┐        ┌──────────┐           │
│  │ Scan     │─────▶│ Generate │───────▶│ Check if │           │
│  │ local    │      │ 8-12     │        │ idea is  │           │
│  │ papers + │      │ ideas    │        │ novel    │           │
│  │ search   │      │ + rank   │        │          │           │
│  └──────────┘      └──────────┘        └──────────┘           │
│                         │                   │                 │
│                         ▼                   ▼                 │
│                    ┌──────────┐        ┌──────────┐           │
│                    │ Filter   │───────▶│ External │           │
│                    │ by cost, │        │ LLM      │           │
│                    │ novelty  │        │ evaluates│           │
│                    └──────────┘        └──────────┘           │
│                                             │                 │
│  /research-refine                           ▼                 │
│  (refine method)                       ┌──────────┐           │
│       │                                │ Freeze   │           │
│       ▼                                │ problem  │           │
│  ┌──────────┐                          │ anchor + │           │
│  │ Iterate  │─────────────────────────▶│ refine   │           │
│  │ until    │                          │ method   │           │
│  │ score≥9  │                          └──────────┘           │
│  └──────────┘                               │                 │
│       │                                     ▼                 │
│  /experiment-plan                      ┌───────────┐          │
│       │                                │ Claim-    │          │
│       ▼                                │ driven    │          │
│  ┌──────────┐                          │ experiment│          │
│  │ Plan     │─────────────────────────▶│ roadmap   │          │
│  │ runs     │                          └───────────┘          │
│  └──────────┘                                                 │
│                                                               │
│  Typical flow:                                                │
│    1. /research-lit "discrete diffusion models"               │
│    2. /idea-creator "DLLMs post training"                     │
│    3. Review ranked ideas, pick top 2-3                       │
│    4. /novelty-check "top idea" (deep verification)           │
│    5. /research-review "top idea" (critical feedback)         │
│    6. /research-refine "top idea" (problem anchor + method)   │
│    7. /experiment-plan (claim-driven roadmap)                 │
│    8. /run-experiment → /auto-review-loop                     │
└───────────────────────────────────────────────────────────────┘
Skills involved: research-lit + idea-creator + novelty-check + research-review + research-refine-pipeline

One-command shortcut: /idea-discovery "your research direction" runs this entire workflow automatically.

Human-in-the-loop: Each phase presents results and waits for your feedback. Not happy? Tell it what's missing; it refines the prompt and regenerates. Trust the defaults? It auto-proceeds with the top-ranked option. You decide how hands-on to be.

Pilot experiment budgets (max hours, timeout, GPU budget) are configurable; see Customization.
Blog post (in Chinese) on NeurIPS submissions with Claude Code
"I have a plan. Now implement it, deploy it, and get me initial results."
Already have an experiment plan (from Workflow 1 or your own)? /experiment-bridge turns it into running code:
- Reads your experiment plan (default: refine-logs/EXPERIMENT_PLAN.md)
- Cross-model review of the implementation (code review: true by default)
- Deploys and monitors runs via /run-experiment

┌───────────────────────────────────────────────────────────────┐
│                 Workflow 1.5: Experiment Bridge               │
│                                                               │
│  EXPERIMENT_PLAN.md                                           │
│       │                                                       │
│       ▼                                                       │
│  ┌──────────┐      ┌──────────┐        ┌──────────┐           │
│  │ Claude   │─────▶│ GPT-5.4  │───────▶│ Sanity   │           │
│  │ Code     │      │ xhigh    │        │ Check    │           │
│  │ writes   │      │ reviews  │        │ (1 GPU)  │           │
│  │ code     │      │ code     │        │          │           │
│  └──────────┘      └──────────┘        └──────────┘           │
│                                             │                 │
│                                             ▼                 │
│  ┌──────────┐      ┌──────────┐        ┌──────────┐           │
│  │ Collect  │◀─────│ Monitor  │◀───────│ Deploy   │           │
│  │ results  │      │ progress │        │ to GPUs  │           │
│  │          │      │ (+ W&B)  │        │          │           │
│  └──────────┘      └──────────┘        └──────────┘           │
│       │                                                       │
│       ▼                                                       │
│  Ready for /auto-review-loop                                  │
└───────────────────────────────────────────────────────────────┘
Skills involved: experiment-bridge + run-experiment + monitor-experiment
One-command shortcut: /experiment-bridge reads refine-logs/EXPERIMENT_PLAN.md automatically. Or point it to any plan: /experiment-bridge "my_plan.md".

CODE_REVIEW, AUTO_DEPLOY, SANITY_FIRST, MAX_PARALLEL_RUNS are configurable; see Customization.
"Review my paper, fix what's wrong, repeat until it's good."
GPT-5.4 reviews → identifies weaknesses → suggests experiments → Claude Code writes scripts, deploys to GPU, monitors results, rewrites the paper: all while you sleep. Just add your GPU server config to CLAUDE.md.
┌──────────────────────────────────────────────────────────────┐
│                       Auto Review Loop                       │
│                                                              │
│   /research-review            /auto-review-loop              │
│   (single deep review)        (autonomous loop)              │
│        │                           │                         │
│        ▼                           ▼                         │
│   ┌──────────┐   ┌────────────┐   ┌──────────┐               │
│   │ External │──▶│ Implement  │──▶│ Monitor  │──▶ repeat     │
│   │ LLM      │   │ fixes      │   │ results  │    until      │
│   │ reviews  │   │ & run      │   │          │    score ≥ 6  │
│   └──────────┘   │ experiments│   └──────────┘               │
│                  └────────────┘                              │
│                                                              │
│   When the reviewer suggests a new method direction:         │
│     /novelty-check → verify the idea isn't already published │
│                                                              │
│   Supporting skills:                                         │
│     /run-experiment     → deploy to local/remote GPU         │
│     /analyze-results    → interpret experiment outputs       │
│     /monitor-experiment → check progress, collect results    │
└──────────────────────────────────────────────────────────────┘
Skills involved: auto-review-loop + research-review + novelty-check + run-experiment + analyze-results + monitor-experiment
One-command shortcut: /auto-review-loop "your paper topic" runs this entire workflow automatically.

What to pass as the argument? A short topic or scope is enough; the skill automatically reads your project's narrative docs (NARRATIVE_REPORT.md), memory files, experiment results, and prior reviews to build the full context for GPT-5.4. Examples:

- /auto-review-loop "factorized gap in discrete diffusion LMs" – broad topic, the skill finds everything
- /auto-review-loop "focus on Section 3-5, our CRF results are weak" – targeted scope with hints
- /auto-review-loop – also works: the skill reads project files and infers the topic
Key safety features:

- Checkpointing: the loop writes its state (REVIEW_STATE.json) after each round. If the context window fills up and auto-compacts mid-loop, the workflow reads the state file and resumes from where it left off; no human intervention needed

MAX_ROUNDS, the score threshold, and GPU limits are configurable; see Customization.
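The resume logic amounts to "read the last completed round from the state file and continue from there." A minimal sketch with a made-up state file (REVIEW_STATE.json's real schema is defined by the skill; the `round` field and values here are assumptions):

```shell
# Stand-in state file, as the loop might leave it after round 2
cat > REVIEW_STATE.json <<'EOF'
{"round": 2, "score": 6.8, "status": "in_progress"}
EOF

# Extract the last completed round without jq (pure sed)
ROUND=$(sed -n 's/.*"round": *\([0-9]*\).*/\1/p' REVIEW_STATE.json)
NEXT=$((ROUND + 1))
echo "Resuming at round $NEXT"
```

Because the state lives in a plain file rather than the model's context, a crash or auto-compact loses nothing but the in-flight round.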
Blog post (in Chinese): "Open-source: Claude runs experiments and revises the draft while you sleep"
"Turn my research narrative into a submission-ready PDF." Requires a local LaTeX environment; see Prerequisites.
┌──────────────────────────────────────────────────────────────┐
│                    Paper Writing Pipeline                    │
│                                                              │
│  /paper-plan        /paper-figure         /paper-write       │
│  (outline)          (plots & tables)      (LaTeX draft)      │
│       │                  │                     │             │
│       ▼                  ▼                     ▼             │
│  ┌──────────┐      ┌──────────┐          ┌──────────┐        │
│  │ Claims-  │─────▶│ Generate │─────────▶│ Section  │──┐     │
│  │ Evidence │      │ figures, │          │ by       │  │     │
│  │ Matrix + │      │ tables,  │          │ section  │  │     │
│  │ Section  │      │ LaTeX    │          │ LaTeX    │  │     │
│  │ Plan     │      │ includes │          │ draft    │  │     │
│  └──────────┘      └──────────┘          └──────────┘  │     │
│       │                                                │     │
│       │                /paper-compile                  │     │
│       │                (build PDF)                     │     │
│       │                     │                          │     │
│       ▼                     ▼                          ▼     │
│  ┌──────────────────────────────────────────────────────┐    │
│  │ NARRATIVE_REPORT.md ──▶ PAPER_PLAN.md ──▶ paper/     │    │
│  │      (input)              (outline)      (LaTeX+PDF) │    │
│  └──────────────────────────────────────────────────────┘    │
│                                                              │
│  Typical flow:                                               │
│    1. Write NARRATIVE_REPORT.md (from Workflow 2 results)    │
│    2. /paper-plan    (claims-evidence matrix + section plan) │
│    3. /paper-figure  (comparison tables, training curves)    │
│    4. /paper-write   (section-by-section LaTeX generation)   │
│    5. /paper-compile (build PDF, fix errors, page check)     │
│    6. /auto-paper-improvement-loop (review ×2 + format check)│
└──────────────────────────────────────────────────────────────┘
Skills involved: paper-plan + paper-figure + paper-write + paper-compile + auto-paper-improvement-loop + (post-acceptance) paper-poster + paper-slides
One-command shortcut: /paper-writing "NARRATIVE_REPORT.md" runs this entire workflow automatically.

Input: a NARRATIVE_REPORT.md describing the research: claims, experiments, results, figures. The more detailed the narrative (especially figure descriptions and quantitative results), the better the output. See templates/NARRATIVE_REPORT_TEMPLATE.md for a complete example.

Output: a submission-ready paper/ directory with LaTeX source, a clean .bib (only cited entries), and a compiled PDF.
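The "only cited entries" property can be spot-checked mechanically: collect every key appearing in \cite{...} and compare against the keys defined in the .bib. A rough sketch on toy files (not the skill's actual implementation; real projects also use \citep/\citet variants, which this simple grep would miss):

```shell
# Toy paper + bibliography
cat > main.tex <<'EOF'
We follow \cite{smith2024} and extend \cite{lee2023,smith2024}.
EOF
cat > refs.bib <<'EOF'
@article{smith2024, title={A}}
@article{lee2023, title={B}}
@article{unused99, title={C}}
EOF

# Keys actually cited in the .tex (split comma lists, dedupe)
grep -o '\\cite{[^}]*}' main.tex | sed 's/\\cite{//;s/}//' | tr ',' '\n' | sort -u > cited.txt
# Keys defined in the .bib
sed -n 's/@[a-z]*{\([^,]*\),.*/\1/p' refs.bib | sort -u > defined.txt

# Entries present in the .bib but never cited
comm -13 cited.txt defined.txt
```

Here the check reports `unused99` as dead weight that a clean .bib would drop.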
Key features:

- pdftotext-based precise check that the main body fits the page limit

Figure generation scope: /paper-figure auto-generates data-driven plots (training curves, bar charts, heatmaps) and comparison tables from JSON/CSV. For architecture diagrams and method figures: illustration: gemini (default) uses Claude → Gemini → Nano Banana Pro for publication-quality diagrams; illustration: mermaid generates Mermaid diagrams for free; illustration: false skips AI figures entirely.

Gemini API setup (for illustration: gemini): get your API key at Google AI Studio, then set it as an environment variable: export GEMINI_API_KEY="your-key". Or add it to your shell profile (~/.zshrc / ~/.bashrc). No other dependencies needed.
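A page-limit check of this flavor can be built on the poppler tools from Prerequisites. Since real output depends on an actual PDF, here is a self-contained sketch that parses a captured pdfinfo-style report instead (the report text is a stand-in; the `Pages:` field format follows pdfinfo's output):

```shell
# Captured pdfinfo output for an imaginary paper.pdf
REPORT='Title:    Example Paper
Pages:    11
Encrypted: no'

LIMIT=9   # e.g. a main-body page limit
PAGES=$(printf '%s\n' "$REPORT" | awk '/^Pages:/ {print $2}')

if [ "$PAGES" -le "$LIMIT" ]; then
  echo "Page check passed ($PAGES/$LIMIT)"
else
  echo "Over the page limit ($PAGES/$LIMIT): trim before submission"
fi
```

The skill's actual check is finer-grained (it uses pdftotext to find where the main body ends, so appendices don't count against the limit), but the pass/fail gate is the same idea.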
Tested end-to-end: generated a 9-page ICLR 2026 theory paper (7 sections, 29 citations, 4 figures, 2 comparison tables) from a single NARRATIVE_REPORT.md, with zero compilation errors and zero undefined references.
After Workflow 3 generates the paper, /auto-paper-improvement-loop runs 2 rounds of GPT-5.4 xhigh content review → fix → recompile, plus a final format compliance check, autonomously polishing the paper from rough draft to submission-ready.
Score progression (real test: ICLR 2026 theory paper):
| Round | Score | Key Changes |
|---|---|---|
| Round 0 | 4/10 (content) | Baseline |
| Round 1 | 6/10 (content) | Fixed assumptions, softened claims, renamed notation |
| Round 2 | 7/10 (content) | Added synthetic validation, stronger limitations |
| Round 3 | 5→8.5/10 (format) | Removed hero fig + appendix, compressed conclusion, fixed float spacing |
Final: 8 pages main body (ICLR limit: 9), 0 overfull hbox, ICLR-compliant. +4.5 points across 3 rounds. (Format fixes touched \resizebox, \captionsetup, \textfloatsep, and itemize environments.)

"Reviews are in. Help me draft a safe, grounded rebuttal."
Got reviews back? /rebuttal parses them, builds a strategy, and drafts a venue-compliant response:
- With auto experiment: true, it auto-runs supplementary experiments via /experiment-bridge
- Outputs: PASTE_READY.txt (exact character count) + REBUTTAL_DRAFT_rich.md (extended version for manual editing)

┌───────────────────────────────────────────────────────────────┐
│                     Workflow 4: Rebuttal                      │
│                                                               │
│  Reviews arrive                                               │
│       │                                                       │
│       ▼                                                       │
│  ┌──────────┐      ┌──────────┐        ┌───────────┐          │
│  │ Parse &  │─────▶│ Strategy │───────▶│ Evidence  │          │
│  │ atomize  │      │ plan     │        │ sprint    │          │
│  │ reviews  │      │          │        │ (optional)│          │
│  └──────────┘      └──────────┘        └───────────┘          │
│                                             │                 │
│                                             ▼                 │
│  ┌───────────┐     ┌──────────┐        ┌──────────┐           │
│  │ Finalize  │◀────│ GPT-5.4  │◀───────│ Draft    │           │
│  │ 2 versions│     │ stress   │        │ rebuttal │           │
│  │           │     │ test     │        │          │           │
│  └───────────┘     └──────────┘        └──────────┘           │
│       │                                                       │
│       ▼                                                       │
│  PASTE_READY.txt (strict) + RICH.md (extended)                │
│       │                                                       │
│       ▼                                                       │
│  Follow-up rounds (delta replies, per-reviewer threads)       │
└───────────────────────────────────────────────────────────────┘
Skills involved: rebuttal
Quick mode: /rebuttal -- quick mode: true stops after parsing + strategy (Phase 0-3). See what reviewers want before committing to a full draft.

VENUE, AUTO_EXPERIMENT, QUICK_MODE, MAX_STRESS_TEST_ROUNDS are configurable; see Customization.

Three safety gates (no fabrication, no overpromise, full coverage): the rebuttal will NOT finalize if any fails.
| Skill | Description | Codex MCP? |
|---|---|---|
| research-pipeline | End-to-end: Workflow 1 → 1.5 → 2 → 3, from research direction to submission | Yes |

| Skill | Description | Codex MCP? |
|---|---|---|
| idea-discovery | Pipeline orchestrator: runs all skills below in sequence | Yes |
| ├ research-lit | Multi-source literature search (Zotero + Obsidian + local PDFs + arXiv API + web) | No |
| ├ idea-creator | Brainstorm 8-12 ideas, filter by feasibility, pilot on GPU, rank by signal | Yes |
| ├ novelty-check | Verify idea novelty against recent literature (multi-source + GPT-5.4 cross-check) | Yes |
| └ research-review | Deep review from an external LLM | Yes |

| Skill | Description | Codex MCP? |
|---|---|---|
| experiment-bridge | Read experiment plan → implement code → sanity check → deploy to GPU → collect initial results | No |
| ├ run-experiment | Deploy experiments to local (MPS/CUDA) or remote GPU servers | No |
| └ monitor-experiment | Monitor running experiments, check progress, collect results | No |

| Skill | Description | Codex MCP? |
|---|---|---|
| auto-review-loop | Pipeline orchestrator: autonomous review→fix→re-review (max 4 rounds) | Yes |
| ├ research-review | Deep review from external LLM (shared with Workflow 1) | Yes |
| ├ novelty-check | Verify novelty when reviewer suggests new directions | Yes |
| ├ run-experiment | Deploy experiments to local (MPS/CUDA) or remote GPU servers | No |
| ├ analyze-results | Analyze experiment results, compute statistics, generate insights | No |
| └ monitor-experiment | Monitor running experiments, check progress, collect results | No |

| Skill | Description | Codex MCP? |
|---|---|---|
| paper-writing | Pipeline orchestrator: runs all skills below in sequence | Yes |
| ├ paper-plan | Claims-evidence matrix, section structure, figure plan, citation scaffolding | Yes |
| ├ paper-figure | Publication-quality matplotlib/seaborn plots + LaTeX comparison tables | Optional |
| ├ paper-illustration | AI-generated architecture diagrams and method figures via Gemini (when illustration: true) | No (needs Gemini API) |
| └ paper-write | Section-by-section LaTeX generation (ICLR/NeurIPS/ICML). Anti-hallucination BibTeX via DBLP/CrossRef | |

| Skill | Description | Codex MCP? |
|---|---|---|
| rebuttal | Parse reviews → atomize → strategy → draft → safety check → stress test → finalize (2 versions) → follow-up | Yes |

| Skill | Description | Codex MCP? |
|---|---|---|
| arxiv | Search, download, and summarize arXiv papers. Standalone or /research-lit supplement | No |
| pixel-art | Generate pixel art SVG illustrations for READMEs, docs, or slides | No |
| feishu-notify | Feishu/Lark push (webhook) or interactive (bidirectional). Off by default | No |
npm install -g @openai/codex
claude mcp add codex -s user -- codex mcp-server
For Workflow 3 (paper writing), install latexmk and pdfinfo:
# macOS
brew install --cask mactex # or: brew install basictex
brew install poppler # provides pdfinfo
# Ubuntu/Debian
sudo apt install texlive-full latexmk poppler-utils
# Verify
latexmk --version && pdfinfo -v
If you only need Workflow 1 & 2 (idea discovery + auto review), LaTeX is not required.
git clone https://github.com/wanshuiyin/Auto-claude-code-research-in-sleep.git
cd Auto-claude-code-research-in-sleep
# Install all skills globally
cp -r skills/* ~/.claude/skills/
# Or install specific skills
cp -r skills/auto-review-loop ~/.claude/skills/
cp -r skills/research-lit ~/.claude/skills/
cd Auto-claude-code-research-in-sleep
git pull
# Option A: Full update (overwrites all skills with latest version)
cp -r skills/* ~/.claude/skills/
# Option B: Safe update (only add NEW skills, keep your customizations)
cp -rn skills/* ~/.claude/skills/
# Option C: Update specific skills only
cp -r skills/experiment-bridge ~/.claude/skills/
Which option? Use A if you haven't customized any skills. Use B if you've modified skills locally (new skills get added and your changes are preserved, but you'll miss upstream bug fixes in modified files). Use C to selectively update.
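The difference between Option A and Option B is just cp's -n (no-clobber) flag. A self-contained demo in a scratch directory (the paths are throwaway stand-ins; the real source and target are skills/* and ~/.claude/skills/):

```shell
# Fake "repo" checkout and "installed" skills directory
mkdir -p repo/myskill installed/myskill repo/newskill
echo "upstream v2"     > repo/myskill/SKILL.md       # upstream update
echo "my local edits"  > installed/myskill/SKILL.md  # your customization
echo "brand new skill" > repo/newskill/SKILL.md      # new upstream skill

# Option B: -n (no-clobber) never overwrites existing files
# (|| true: newer GNU cp returns nonzero when it skips existing files)
cp -rn repo/. installed/ || true

cat installed/myskill/SKILL.md   # customization kept
cat installed/newskill/SKILL.md  # new skill added
```

Dropping the `-n` reproduces Option A: the existing SKILL.md would be replaced by "upstream v2".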
# Workflow 1: Idea Discovery
> /idea-discovery "your research direction" # full pipeline
> /research-lit "topic" # just literature survey (all sources)
> /research-lit "topic" -- sources: zotero, web   # mix and match sources
> /research-lit "topic" -- arxiv download: true   # also download top arXiv PDFs
> /arxiv "discrete diffusion" -- download         # standalone arXiv search + download
> /idea-creator "topic" # just brainstorm
# Workflow 2: Auto Research Loop
> /auto-review-loop "your paper topic"            # review → fix → repeat
> /research-review "your paper" # single deep review
# Workflow 3: Paper Writing
> /paper-writing "NARRATIVE_REPORT.md" # full pipeline
> /paper-plan "NARRATIVE_REPORT.md" # just outline
> /paper-compile "paper/" # just compile
# Full Pipeline
> /research-pipeline "your research direction" # Workflow 1 โ 2 โ 3 end-to-end
# Supporting Skills
> /run-experiment train.py --lr 1e-4 --epochs 100
> /analyze-results figures/*.json
> /monitor-experiment server5
To run the auto-review loop without clicking permission prompts, add to .claude/settings.local.json:
{
"permissions": {
"allow": [
"mcp__codex__codex",
"mcp__codex__codex-reply",
"Write",
"Edit",
"Skill(auto-review-loop)"
]
}
}
When GPT-5.4 says "run an ablation study" or "add a baseline comparison", Claude Code automatically writes the experiment script and deploys it to your GPU server. For this to work, Claude Code needs to know your server environment.
Add your server info to your project's CLAUDE.md:
## Remote Server
- SSH: `ssh my-gpu-server` (key-based auth, no password)
- GPU: 4x A100
- Conda env: `research` (Python 3.10 + PyTorch)
- Activate: `eval "$(/opt/conda/bin/conda shell.bash hook)" && conda activate research`
- Code directory: `/home/user/experiments/`
- Use `screen` for background jobs: `screen -dmS exp0 bash -c '...'`
Claude Code reads this and knows how to SSH in, activate the environment, and launch experiments. GPT-5.4 (the reviewer) only decides what experiments to run — Claude Code figures out how based on your CLAUDE.md.
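Concretely, the kind of command assembled from that CLAUDE.md looks like the sketch below. It is a dry run: the echo prints the command instead of executing it, and the host, env, and paths are the placeholders from the example above, not real infrastructure:

```shell
# Compose (but do not execute) a remote experiment launch.
HOST="my-gpu-server"
SESSION="exp0"
TRAIN="python train.py --lr 1e-4 --epochs 100"
# Activate conda, cd to the code directory, and run under screen so the
# job survives the SSH session ending
REMOTE="eval \"\$(/opt/conda/bin/conda shell.bash hook)\" && conda activate research && cd /home/user/experiments && $TRAIN"
echo ssh "$HOST" "screen -dmS $SESSION bash -c '$REMOTE'"
```

Remove the echo (and double-check quoting against your own shell setup) to actually launch.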
If you are already on the GPU server, you can add the following to your CLAUDE.md:
## GPU Environment
- This machine has direct GPU access (no SSH needed)
- GPU: 4x A100 80GB
- Experiment environment: `YOUR_CONDA_ENV` (Python 3.x + PyTorch)
- Activate before any Python command: `The command to activate your experiment environment` (uv, conda, etc.)
- Code directory: `/home/YOUR_USERNAME/YOUR_CODE_DIRECTORY/`
No server? The review and rewriting skills still work without GPU access. Only experiment-related fixes will be skipped (flagged for manual follow-up).
If you use Zotero to manage your paper library, /research-lit can search your collections, read your annotations/highlights, and export BibTeX — all before searching the web.
Recommended: zotero-mcp (1.8k⭐, semantic search, PDF annotations, BibTeX export)
# Install
uv tool install zotero-mcp-server # or: pip install zotero-mcp-server
# Add to Claude Code (Local API โ requires Zotero desktop running)
claude mcp add zotero -s user -- zotero-mcp -e ZOTERO_LOCAL=true
# Or use Web API (works without Zotero running)
claude mcp add zotero -s user -- zotero-mcp \
-e ZOTERO_API_KEY=your_key -e ZOTERO_USER_ID=your_id
Get your API key at https://www.zotero.org/settings/keys
What it enables in /research-lit:
Not using Zotero? No problem — /research-lit automatically skips Zotero and uses local PDFs + web search instead.
If you use Obsidian for research notes, /research-lit can search your vault for paper summaries, tagged references, and your own insights.
Recommended: mcpvault (760⭐, no Obsidian app needed, 14 tools, BM25 search)
# Add to Claude Code (point to your vault path)
claude mcp add obsidian-vault -s user -- npx @bitbonsai/mcpvault@latest /path/to/your/vault
Optional complement: obsidian-skills (13.6k⭐, by Obsidian CEO) — teaches Claude to understand Obsidian-specific Markdown (wikilinks, callouts, properties). Copy to your vault:
git clone https://github.com/kepano/obsidian-skills.git
cp -r obsidian-skills/.claude /path/to/your/vault/
What it enables in /research-lit:
Search your notes by tag (e.g., #paper-review, #diffusion-models).
Not using Obsidian? No problem — /research-lit automatically skips Obsidian and works as before.
💡 Zotero + Obsidian together: Many researchers use Zotero for paper storage and Obsidian for notes. Both integrations work simultaneously — /research-lit checks Zotero first (raw papers + annotations), then Obsidian (your processed notes), then local PDFs, then web search.
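That priority order can be sketched as a tiny helper (illustrative only — the actual skill implements this in its SKILL.md instructions, not in shell; the function name is ours):

```shell
# Resolve the SOURCES setting into an ordered search list.
resolve_sources() {
  # "all" expands to the documented priority: Zotero -> Obsidian -> local PDFs -> web
  if [ "$1" = "all" ]; then
    echo "zotero obsidian local web"
  else
    # comma-separated selection, e.g. "zotero, web", keeps the user's order
    echo "$1" | tr ',' ' ' | tr -s ' '
  fi
}
resolve_sources all
resolve_sources "zotero, web"
```

Unavailable sources (no Zotero MCP, no vault) are simply skipped at search time.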
/research-lit automatically queries the arXiv API for structured metadata (title, abstract, full author list, categories) — richer than web search snippets. No setup required.
By default, only metadata is fetched (no files downloaded). To also download the most relevant PDFs:
/research-lit "topic" — arxiv download: true # download top 5 PDFs
/research-lit "topic" — arxiv download: true, max download: 10 # download up to 10
For standalone arXiv access, use the dedicated /arxiv skill:
/arxiv "attention mechanism" # search
/arxiv "2301.07041" — download # download specific paper
Get mobile notifications when experiments finish, reviews score, or checkpoints need your input โ without sitting in front of the terminal.
| Push Only (group cards) | Interactive (private chat) |
|---|---|
Three modes โ you choose per-project:
| Mode | What happens | You need |
|---|---|---|
| Off (default) | Nothing. Pure CLI, no Feishu | Nothing |
| Push only | Webhook notifications at key events. Mobile push, no reply | Feishu bot webhook URL |
| Interactive | Full bidirectional. Approve/reject ideas, reply to checkpoints from Feishu | feishu-claude-code running |
Group notifications with rich cards โ experiment done, review scored, pipeline complete. Mobile push, no reply needed.
Step 1: Create a Feishu group bot
Name the bot (e.g., ARIS Notifications) and copy the Webhook URL. For the optional keyword filter, use ARIS (all notifications include this word), or leave it unrestricted.
Step 2: Create config file
cat > ~/.claude/feishu.json << 'EOF'
{
"mode": "push",
"webhook_url": "https://open.feishu.cn/open-apis/bot/v2/hook/YOUR_WEBHOOK_ID"
}
EOF
Step 3: Test it
curl -s -X POST "YOUR_WEBHOOK_URL" \
-H "Content-Type: application/json" \
-d '{
"msg_type": "interactive",
"card": {
"header": {"title": {"tag": "plain_text", "content": "๐งช ARIS Test"}, "template": "blue"},
"elements": [{"tag": "markdown", "content": "Push mode working! ๐"}]
}
}'
You should see a blue card in your group. Skills will now automatically send rich cards at key events:
| Event | Card color | Content |
|---|---|---|
| Review scored โฅ 6 | ๐ข Green | Score, verdict, top weaknesses |
| Review scored < 6 | ๐ Orange | Score, verdict, action items |
| Experiment complete | ๐ข Green | Results table, delta vs baseline |
| Checkpoint waiting | ๐ก Yellow | Question, options, context |
| Error | ๐ด Red | Error message, suggested fix |
| Pipeline done | ๐ฃ Purple | Score progression, deliverables |
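For custom notifications, every card in the table above shares one payload shape (the same one the test curl uses). A minimal builder — a sketch, not part of the skills; printf escaping is deliberately naive, so avoid quotes in the arguments:

```shell
# Build a Feishu interactive-card payload; $2 is the header color template.
make_card() {
  # $1 = title, $2 = template (blue/green/orange/yellow/red/purple), $3 = markdown body
  printf '{"msg_type":"interactive","card":{"header":{"title":{"tag":"plain_text","content":"%s"},"template":"%s"},"elements":[{"tag":"markdown","content":"%s"}]}}' \
    "$1" "$2" "$3"
}
make_card "Experiment complete" green "acc 92.1 vs baseline 90.8"
# Send it with:
# curl -s -X POST "$WEBHOOK_URL" -H 'Content-Type: application/json' -d "$(make_card "title" green "body")"
```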
Everything Push mode does, plus bidirectional private chat with Claude Code via Feishu. Approve/reject ideas, reply to checkpoints, give custom instructions โ all from your phone.
How it works: Push cards go to the group (everyone sees status). Interactive conversations happen in private chat with the bot (you reply, Claude Code acts on it).
Step 1: Complete Push setup above first (you'll keep both)
Step 2: Create a Feishu app on open.feishu.cn
Name it (e.g., ARIS Claude Bot) and create it. Then grant these permissions:
| Permission | Scope | Why |
|---|---|---|
im:message | Send & receive messages | Core messaging |
im:message:send_as_bot | Send as bot | Bot replies |
im:message.group_at_msg:readonly | Receive group @mentions | Group messages |
im:message.p2p_msg:readonly | Receive private messages | โ ๏ธ Easy to miss! Without this, the bot connects but never receives your messages |
im:resource | Access attachments | Images/files |
Subscribe to the im.message.receive_v1 event and save.
⚠️ Important: The "Long Connection" page may show "no application connection detected" — this is normal. Start the bridge first (Step 3), then come back and save.
For personal/test Feishu organizations, approval is usually instant.
Step 3: Deploy the bridge
git clone https://github.com/joewongjc/feishu-claude-code.git
cd feishu-claude-code
python3 -m venv .venv && source .venv/bin/activate
pip install -r requirements.txt
# Configure
cp .env.example .env
Edit .env:
FEISHU_APP_ID=cli_your_app_id # From app credentials page
FEISHU_APP_SECRET=your_app_secret # From app credentials page
DEFAULT_MODEL=claude-opus-4-6 # โ ๏ธ Default is sonnet โ change to opus for best results
DEFAULT_CWD=/path/to/your/project # Working directory for Claude Code
PERMISSION_MODE=bypassPermissions # Or "default" for safer mode
⚠️ Model matters: The default claude-sonnet-4-6 works but may struggle with complex project context. claude-opus-4-6 correctly identified 18 ARIS skills on first try where sonnet could not.
Start the bridge:
python main.py
# Expected output:
# ✅ Connected to Feishu WebSocket long connection (auto-reconnect)...
# [Lark] connected to wss://msg-frontier.feishu.cn/ws/v2?...
For long-running use, put it in a screen session:
screen -dmS feishu-bridge bash -c 'cd /path/to/feishu-claude-code && source .venv/bin/activate && python main.py'
Step 4: Save event config — go back to Feishu Open Platform → Events & Callbacks; the long connection should now show "connection detected" → Save
If you published the app version before the bridge was running, you may need to create a new version (e.g., 1.0.1) and re-publish after saving event config.
Step 5: Test private chat
Send the bot a private message (e.g., "hello"). If the bot doesn't reply: send /new to reset the session, then try again. Common issues:
| Symptom | Cause | Fix |
|---|---|---|
| Bot connects but never receives messages | Missing im:message.p2p_msg:readonly permission | Add permission โ create new version โ publish |
| Bot replies but doesn't know your project | DEFAULT_CWD points to wrong directory | Edit .env โ restart bridge |
| Bot replies but seems less capable | Using claude-sonnet-4-6 | Change to claude-opus-4-6 in .env โ restart |
| Old session has stale context | Session cached from before config change | Send /new in chat to start fresh session |
| "ๆชๆฃๆตๅฐๅบ็จ่ฟๆฅไฟกๆฏ" when saving events | Bridge not running yet | Start bridge first, then save event config |
Step 6: Update ARIS config
cat > ~/.claude/feishu.json << 'EOF'
{
"mode": "interactive",
"webhook_url": "https://open.feishu.cn/open-apis/bot/v2/hook/YOUR_WEBHOOK_ID",
"interactive": {
"bridge_url": "http://localhost:5000",
"timeout_seconds": 300
}
}
EOF
Now skills will:
| Skill | Events | Push | Interactive |
|---|---|---|---|
/auto-review-loop | Review scored (each round), loop complete | Score + verdict | + wait for continue/stop |
/auto-paper-improvement-loop | Review scored, all rounds done | Score progression | Score progression |
/run-experiment | Experiments deployed | GPU assignment + ETA | GPU assignment + ETA |
/monitor-experiment | Results collected | Results table | Results table |
/idea-discovery | Phase transitions, final report | Summary at each phase | + approve/reject at checkpoints |
/research-pipeline | Stage transitions, pipeline done | Stage summary | + approve/reject |
Not using Feishu? No problem — without ~/.claude/feishu.json, all skills behave exactly as before. Zero overhead, zero side effects.
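The opt-in check is simple enough to sketch. Only the ~/.claude/feishu.json path comes from the docs; how a skill actually reads it is an assumption, shown here as a minimal shell helper:

```shell
# Report the active Feishu mode; "off" when the config file is absent.
CONFIG="${FEISHU_CONFIG:-$HOME/.claude/feishu.json}"
feishu_mode() {
  # No config file -> "off": skills behave exactly as without Feishu
  [ -f "$CONFIG" ] || { echo off; return; }
  # Crude extraction of the "mode" field; fine for this small flat config
  sed -n 's/.*"mode"[[:space:]]*:[[:space:]]*"\([^"]*\)".*/\1/p' "$CONFIG"
}
feishu_mode
```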
๐ก Alternative IM platforms: The push-only webhook pattern works with any service that accepts incoming webhooks (Slack, Discord, DingTalk, WeChat Work). Just change the
webhook_url and card format in feishu-notify/SKILL.md. For bidirectional support, see cc-connect (multi-platform bridge) or clawdbot-feishu.
Skills are plain Markdown files. Fork and customize:
๐ก Parameter pass-through: Parameters flow down the call chain automatically. For example,
/research-pipeline "topic" — sources: zotero, arxiv download: true passes sources and arxiv download through idea-discovery all the way down to research-lit. You can set any downstream parameter at any level — just add — key: value to your command.

Call chain:
research-pipeline
├── idea-discovery
│   ├── research-lit
│   ├── idea-creator
│   ├── novelty-check
│   └── research-review
├── experiment-bridge
│   └── run-experiment
└── auto-review-loop
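As a toy illustration of that suffix grammar (the skills parse parameters inside the model, not in shell; this helper only makes the shape concrete):

```shell
# Split a "— key: value, key: value" suffix into one parameter per line.
parse_params() {
  rest=${1#*— }                              # drop everything up to "— "
  printf '%s\n' "$rest" | tr ',' '\n' | sed 's/^ *//'
}
parse_params '/research-pipeline "topic" — sources: zotero, arxiv download: true'
```

Note the limitation: a comma both separates parameters and can appear inside a value like sources: zotero, web — which is why real parsing is left to the model.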
research-pipeline
| Constant | Default | Description | Pass-through |
|---|---|---|---|
AUTO_PROCEED | true | Auto-continue with top-ranked option if user doesn't respond | → idea-discovery
ARXIV_DOWNLOAD | false | Download top arXiv PDFs after literature search | → idea-discovery → research-lit
HUMAN_CHECKPOINT | false | When true, pause after each review round for approval | → auto-review-loop
WANDB | false | Auto-add W&B logging to experiments | → experiment-bridge → run-experiment
CODE_REVIEW | true | GPT-5.4 reviews experiment code before deployment | → experiment-bridge
BASE_REPO | false | GitHub repo URL to clone as base codebase for experiments | → experiment-bridge
COMPACT | false | Generate compact summary files for short-context models and session recovery |
Override inline: /research-pipeline "topic" — auto proceed: false, illustration: mermaid
auto-review-loop
| Constant | Default | Description |
|---|---|---|
MAX_ROUNDS | 4 | Maximum reviewโfixโre-review iterations |
POSITIVE_THRESHOLD | 6/10 | Score at which the loop stops (submission-ready) |
> 4 GPU-hour skip | 4h | Experiments exceeding this are flagged for manual follow-up |
idea-discovery / idea-creator
| Constant | Default | Description | Pass-through |
|---|---|---|---|
PILOT_MAX_HOURS | 2h | Skip any pilot estimated to take longer per GPU | โ |
PILOT_TIMEOUT_HOURS | 3h | Hard timeout โ kill runaway pilots, collect partial results | โ |
MAX_PILOT_IDEAS | 3 | Maximum number of ideas to pilot in parallel | โ |
MAX_TOTAL_GPU_HOURS | 8h | Total GPU budget across all pilots | โ |
AUTO_PROCEED | true | Auto-continue with top-ranked option if user doesn't respond | โ |
ARXIV_DOWNLOAD | false | Download top arXiv PDFs after literature search | → research-lit
Override inline: /idea-discovery "topic" — pilot budget: 4h per idea, sources: zotero, arxiv download: true
experiment-bridge
| Constant | Default | Description |
|---|---|---|
CODE_REVIEW | true | GPT-5.4 xhigh reviews code before deployment. Catches logic bugs before wasting GPU hours |
AUTO_DEPLOY | true | Automatically deploy experiments after implementation + review. Set false to manually inspect |
SANITY_FIRST | true | Run smallest experiment first to catch setup bugs before full deployment |
MAX_PARALLEL_RUNS | 4 | Maximum experiments to deploy in parallel (limited by available GPUs) |
WANDB | false | Auto-add W&B logging. Requires wandb_project in CLAUDE.md |
BASE_REPO | false | GitHub repo URL to clone as base codebase for experiments |
Override inline: /experiment-bridge — base repo: https://github.com/org/project
research-lit
| Constant | Default | Description |
|---|---|---|
PAPER_LIBRARY | papers/, literature/ | Local directories to scan for PDFs before searching online |
MAX_LOCAL_PAPERS | 20 | Max local PDFs to scan (first 3 pages each) |
SOURCES | all | Which sources to search: zotero, obsidian, local, web, or all (comma-separated) |
ARXIV_DOWNLOAD | false | When true, download top relevant arXiv PDFs to PAPER_LIBRARY after search |
ARXIV_MAX_DOWNLOAD | 5 | Maximum number of PDFs to download when ARXIV_DOWNLOAD = true |
Override inline: /research-lit "topic" — sources: zotero, web or /research-lit "topic" — arxiv download: true, max download: 10
paper-write
| Constant | Default | Description |
|---|---|---|
DBLP_BIBTEX | true | Fetch real BibTeX from DBLP/CrossRef instead of LLM-generated entries |
TARGET_VENUE | ICLR | Target venue: ICLR, NeurIPS, ICML, CVPR, ACL, AAAI, ACM |
ANONYMOUS | true | Use anonymous author block for blind review |
MAX_PAGES | 9 | Main body page limit (excluding references) |
ILLUSTRATION | gemini | AI illustration mode: gemini (default, needs GEMINI_API_KEY), mermaid (free), or false (skip) |
Override inline: /paper-write — target venue: NeurIPS, illustration: mermaid
| Constant | Default | Description |
|---|---|---|
REVIEWER_MODEL | gpt-5.4 | OpenAI model used via Codex MCP. Also available: gpt-5.3-codex, gpt-5.2-codex, o3. See supported models for full list. |
allowed-tools — restrict or expand what each skill can do
Don't have Claude / OpenAI API access? You can swap in other models — same cross-model architecture, different providers.
⭐ We strongly recommend Claude + GPT-5.4 (default setup). It's the most tested and reliable combination. Alternative setups work but may require prompt tuning.
| Executor | Reviewer | Need Claude API? | Need OpenAI API? | Guide | |
|---|---|---|---|---|---|
| Default ⭐ | Claude Opus/Sonnet | GPT-5.4 (Codex MCP) | Yes | Yes | Quick Start |
| Alt A | GLM-5 (Z.ai) | GPT-5.4 (Codex MCP) | No | Yes | Setup below |
| Alt B | GLM-5 (Z.ai) | MiniMax-M2.5 | No | No | MINIMAX_MCP_GUIDE |
| Alt C | Any CC-compatible | Any OpenAI-compatible | No | No | LLM_API_MIX_MATCH_GUIDE |
| Alt D | Kimi-K2.5 / Qwen3.5+ | GLM-5 / MiniMax-M2.5 | No | No | ALI_CODING_PLAN_GUIDE |
| Alt E ๐ | DeepSeek-V3.1 / Qwen3-Coder | DeepSeek-R1 / Qwen3-235B | No | No | MODELSCOPE_GUIDE |
Alt C supports tested providers: GLM (Z.ai), Kimi (Moonshot), LongCat (Meituan) as executors; DeepSeek, MiniMax as reviewers. Any OpenAI-compatible API should also work via the generic llm-chat MCP server.
Alt D uses Alibaba Coding Plan: one API key for both executor and reviewer, 4 models included (Kimi, Qwen, GLM, MiniMax).
Alt E uses ModelScope: free (2000 calls/day), one key, no automation restrictions.
Alt G keeps Codex as executor but swaps the reviewer to Claude Code CLI via the local claude-review MCP bridge, with async polling for long paper/review prompts.
Alt H uses Google Antigravity as the executor with native SKILL.md support; choose Claude Opus 4.6 (Thinking) or Gemini 3.1 Pro (high) as the execution model.
Alt I keeps Codex as executor, adds only a thin skills-codex-gemini-review overlay, and routes the reviewer-aware predefined skills through the local gemini-review MCP bridge with direct Gemini API by default. It is the closest Gemini analogue to the existing Codex+Claude review path, minimizes skill changes, and now also covers poster PNG review via the same bridge. Free-tier availability, rate limits, and data-use terms remain subject to Google's current policy.
* Alt G normally relies on local Codex CLI and Claude Code CLI logins. Direct API keys are optional, not required.
Only replace the executor (Claude → GLM), keep GPT-5.4 as reviewer via Codex MCP.
npm install -g @anthropic-ai/claude-code
npm install -g @openai/codex
codex setup # set model to gpt-5.4
Configure ~/.claude/settings.json:
{
"env": {
"ANTHROPIC_AUTH_TOKEN": "your_zai_api_key",
"ANTHROPIC_BASE_URL": "https://api.z.ai/api/anthropic",
"API_TIMEOUT_MS": "3000000",
"ANTHROPIC_DEFAULT_HAIKU_MODEL": "glm-4.5-air",
"ANTHROPIC_DEFAULT_SONNET_MODEL": "glm-4.7",
"ANTHROPIC_DEFAULT_OPUS_MODEL": "glm-5"
},
"mcpServers": {
"codex": {
"command": "/opt/homebrew/bin/codex",
"args": ["mcp-server"]
}
}
}
Codex CLI uses your existing OPENAI_API_KEY (from ~/.codex/config.toml or environment) โ no extra config needed for the reviewer side.
No Claude or OpenAI API needed. Uses a custom MiniMax MCP server instead of Codex (because MiniMax doesn't support OpenAI's Responses API). Full guide: docs/MINIMAX_MCP_GUIDE.md.
Mix and match freely using the generic llm-chat MCP server. Supports any OpenAI-compatible API as reviewer. Full guide: docs/LLM_API_MIX_MATCH_GUIDE.md.
Example combinations: GLM + DeepSeek, Kimi + MiniMax, Claude + DeepSeek, LongCat + GLM, etc.
git clone https://github.com/wanshuiyin/Auto-claude-code-research-in-sleep.git
cd Auto-claude-code-research-in-sleep
cp -r skills/* ~/.claude/skills/
claude
โ ๏ธ For non-Claude executors (GLM, Kimi, etc.): Let the model read through the project once to ensure skills are correctly parsed. This is especially important if you've rewritten skills to use a different reviewer MCP (e.g.,
mcp__llm-chat__chat instead of mcp__codex__codex) — the new executor needs to understand the changed tool call patterns:

Read through this project and verify all skills are working: /idea-creator, /research-review, /auto-review-loop, /novelty-check, /idea-discovery, /research-pipeline, /research-lit, /run-experiment, /analyze-results, /monitor-experiment, /pixel-art
โ ๏ธ Note: Alternative models may behave differently from Claude and GPT-5.4. You may need to tune prompt templates for best results. The core cross-model architecture remains the same.
- AUTO_PROCEED (default: auto-continue; set false to always wait)
- /paper-plan → /paper-figure → /paper-write → /paper-compile. ICLR/NeurIPS/ICML templates, claims-evidence matrix, publication-quality figures, latexmk auto-fix. Inspired by claude-scholar, Research-Paper-Writing-Skills, baoyu-skills
- gpt-5.4 (also works with gpt-5.3-codex, gpt-5.2-codex, o3, etc.)
- /research-lit scans local papers/ and literature/ directories before external search, leveraging papers you've already read
- /idea-discovery orchestrates research-lit → idea-creator → novelty-check → research-review in one command, with pilot experiments on GPU
- /research-pipeline chains Workflow 1 (idea discovery) → implementation → Workflow 2 (auto-review-loop) end-to-end
- /peer-review for reviewing others' papers as a conference reviewer, with GPT-5.4 meta-review (planned; currently use /research-review with a paper PDF)
- ~/.claude/feishu.json: push-only needs just a webhook URL; interactive uses feishu-claude-code. Off by default, zero impact on existing workflows
- See launchd/systemd for true unattended operation. Currently the orchestration layer requires an active CLI session; state files (REVIEW_STATE.json, AUTO_REVIEW.md) support resuming across sessions, but relaunch is manual (#11)
- paper-illustration
- WORKFLOW_REPORT.md for progress tracking, team reporting, and supervisor updates
- A research brief document (RESEARCH_BRIEF.md) as input to /research-pipeline or /idea-discovery instead of a one-line prompt. Many research directions need nuanced context (prior results, constraints, domain knowledge) that can't fit in a single sentence. The document would be parsed for problem definition, constraints, existing results, and specific requirements

Domain-specific skills welcome! The core skills cover general research workflows, but every field has its own tools and patterns. We welcome PRs that add new skills for your domain: EDA, bioinformatics, robotics, HPC, or anything else. Just add a skills/your-skill/SKILL.md and open a PR.
See dse-loop for an example.
Join the WeChat group for discussion on Claude Code + AI-driven research workflows:
If you use ARIS in your research, please cite:
@misc{yang2026aris,
author = {Yang, Ruofeng and Li, Yongcan and Li, Shuai},
title = {ARIS: Fully Autonomous Research via Adversarial Multi-Agent Collaboration},
year = {2026},
organization = {GitHub},
url = {https://github.com/wanshuiyin/Auto-claude-code-research-in-sleep},
}
ARIS is inspired by:
This project builds on and integrates with many excellent open-source projects:
Core Infrastructure
Zotero Integration (setup guide)
Obsidian Integration (setup guide)
Paper Writing Inspiration
Feishu/Lark Integration (setup guide)
Community
Special Thanks — Platform Adaptation
ARIS wouldn't run on so many platforms without these contributors:
spawn_agent
Special Thanks — Architecture & Vision
MIT
- base repo — base repo: https://github.com/org/project
- gemini-review MCP bridge. CN SKILL.md support, dual model, MCP config, EN + CN. Community contribution by @PeppaPigw
- formula-derivation — research formula development and verification. Community contribution by @Falling-Flower
- paper-poster — Conference poster (tcbposter → A0/A1 PDF + PPTX + SVG). Venue colors, visual review, Codex review. Community contribution by @dengzhe-hou
- /experiment-bridge now includes GPT-5.4 cross-model code review before GPU deployment (code review: true by default). W&B fix — real wandb.Api() calls
- llm-chat
2026-03-15 — OpenClaw adaptation guide — use ARIS research workflows in OpenClaw without Claude Code slash skills
2026-03-15 — proof-writer: community skill for rigorous theorem proof drafting. Anti-hallucination citations: /paper-write now fetches real BibTeX from DBLP/CrossRef instead of LLM-generated entries (on by default, zero install)
2026-03-14 — Feishu/Lark integration: three modes (off/push/interactive), mobile notifications for experiments, reviews, and checkpoints
2026-03-13 — Human-in-the-loop: configurable AUTO_PROCEED checkpoints across all workflows. Full autopilot or step-by-step approval
2026-03-12 — Zotero + Obsidian + local PDFs + arXiv/Scholar: multi-source literature search with cross-model novelty verification
2026-03-12 — Three end-to-end workflows complete: one prompt → top-venue-style paper. /research-pipeline chains idea discovery → auto review → paper writing autonomously
2026-03-12 — /paper-writing workflow: narrative report → structured outline → figures → LaTeX → compiled PDF → 2-round auto-improvement (4/10 → 8.5/10)
code review | true | GPT-5.4 xhigh reviews experiment code before GPU deployment. Set false to skip |
wandb | false | Auto-add W&B logging to experiment scripts. Set true + configure wandb_project in CLAUDE.md. /monitor-experiment pulls training curves from W&B |
illustration | gemini | AI illustration in Workflow 3: gemini (default, needs GEMINI_API_KEY), mermaid (free), or false (skip) |
venue | ICLR | Target venue: ICLR, NeurIPS, ICML, CVPR, ACL, AAAI, ACM. Determines LaTeX style file and page limit |
base repo | false | GitHub repo URL to clone as base codebase (e.g., — base repo: https://github.com/org/project). No code? Build on top of an open-source project |
compact | false | Generate compact summary files (IDEA_CANDIDATES.md, findings.md, EXPERIMENT_LOG.md) for short-context models and session recovery |
ref paper | false | Reference paper to build on (PDF path or arXiv URL). Summarized first, then ideas extend/improve it. Combine with base repo for paper+code workflows |
/research-pipeline "your topic" — AUTO_PROCEED: false # pause at idea selection gate
/research-pipeline "your topic" — human checkpoint: true # pause after each review round to give feedback
/research-pipeline "your topic" — sources: zotero, web # only search Zotero + web (skip local PDFs)
/research-pipeline "your topic" — arxiv download: true # download top arXiv PDFs during literature survey
/research-pipeline "your topic" — AUTO_PROCEED: false, human checkpoint: true # combine options
Flexible models — default Claude × GPT-5.4, also supports GLM, MiniMax, Kimi, LongCat, DeepSeek, etc. No Claude or OpenAI API required
Human-in-the-loop — configurable checkpoints at key decisions. AUTO_PROCEED=true for full autopilot, false to approve each step
Feishu/Lark notifications — three modes: off (default, strongly recommended for most users), push-only (webhook, mobile alerts), interactive (approve/reject from Feishu). Zero impact when unconfigured
Push Only — group chat cards (experiment done, checkpoint, error, pipeline complete):
Interactive — private chat with Claude Code (approve/reject, custom instructions):
Extensible — domain-specific skills welcome! Add a SKILL.md and open a PR. See community skills like dse-loop (architecture/EDA)
/research-review/paper-illustration| Yes |
๐ค paper-slides | General | Conference talk slides (beamer โ PDF + PPTX) with speaker notes, full talk script + Q&A prep. Auto slide count from talk type | Yes |
๐ผ๏ธ paper-poster | General | Conference poster (article + tcbposter โ A0/A1 PDF + component PPTX + SVG). Venue-specific colors, visual review loop, Codex MCP review | Yes |
๐ proof-writer | ML Theory | Rigorous theorem/lemma proof drafting โ feasibility triage, dependency maps, honest blockage reports | No |
๐ก comm-lit-review | Communications / Wireless | Domain-specific literature review โ IEEE/ACM/ScienceDirect priority, venue tiering, PHY/MAC/transport/NTN taxonomy | No |
๐๏ธ dse-loop | Architecture / EDA | Autonomous design space exploration โ iteratively run, analyze, and tune parameters (gem5, Yosys, etc.) | No |
๐ค idea-discovery-robot | Robotics / Embodied AI | Workflow 1 adaptation โ grounds idea discovery in embodiment, benchmark, sim2real path, and real-robot safety constraints | Yes |
๐ mermaid-diagram | General | Mermaid diagrams (20+ types) โ free alternative to paper-illustration, no API key needed | No |
๐ข formula-derivation | General | Research formula development โ derivation, verification, and LaTeX formatting | No |
| Cursor Adaptation Guide | General | Use ARIS skills in Cursor — @-reference skills, MCP setup, workflow mapping, state file recovery across sessions |
| ๐ฅ๏ธ Trae Adaptation Guide | General | Use ARIS skills in Trae (ByteDance AI IDE) โ EN + CN guides |
๐จ paper-illustration | General | AI-generated architecture diagrams via Gemini. Built on PaperBanana. Integrated into Workflow 3 |
๐ค skills-codex | General | Codex CLI sync pack for the main research skills, now including training-check, result-to-claim, ablation-planner, rebuttal, plus the shared-references/ support directory |
| ๐๏ธ auto-hparam-tuning | General | Automatic hyperparameter tuning โ AI agent reads project, plans strategy, runs experiments, analyzes TensorBoard, learns from results. Hydra-based |
| ๐ Codex+Claude Review Bridge | General | Codex executes + Claude reviews via local claude-review MCP bridge with async polling |
| research-review | Single-round deep review from external LLM (xhigh reasoning) | Yes |
└ research-refine-pipeline | Refine method + plan experiments in one chain | Yes |
　└ research-refine | Problem anchor → iterative method refinement (up to 5 rounds, score ≥ 9) | Yes |
　└ experiment-plan | Claim-driven experiment roadmap with ablations, budgets, and run order | No |
| monitor-experiment | Monitor running experiments, check progress, collect results | No |
๐ auto-review-loop-llm | Same as above, but uses any OpenAI-compatible API via llm-chat MCP server | No |
| Yes |
└ paper-compile | Compile LaTeX to PDF, auto-fix errors, submission readiness checks | No |
└ auto-paper-improvement-loop | 2-round content review + format check (4/10 → 8.5/10) | Yes |
| โ all workflows |
REF_PAPER | false | Reference paper (PDF path or URL) to base ideas on. Summarized first, then used as context | → idea-discovery
ILLUSTRATION | gemini | AI illustration: gemini (default), mermaid (free), or false (skip) | → paper-writing
| Alt F | Codex CLI (GPT-5.4) | Codex spawn_agent (GPT-5.4) | No | Yes | skills-codex/ |
| Alt G ๐ | Codex CLI | Claude Code CLI (claude-review MCP) | No* | No* | CODEX_CLAUDE_REVIEW_GUIDE |
| Alt H ๐ | Antigravity (Claude Opus 4.6 / Gemini 3.1 Pro) | GPT-5.4 (Codex MCP) or any via llm-chat | No | Optional | ANTIGRAVITY_ADAPTATION |
| Alt I ๐ | Codex CLI | Gemini direct API (gemini-review MCP) | No | No | CODEX_GEMINI_REVIEW_GUIDE |
- /research-lit searches Zotero collections, reads annotations/highlights, exports BibTeX. Recommended: zotero-mcp (1.8k⭐). See setup guide
- /research-lit searches Obsidian vault for research notes, tagged references, wikilinks. Recommended: mcpvault (760⭐) + obsidian-skills (13.6k⭐). See setup guide
- llm-chat MCP server. GLM, MiniMax, Kimi, LongCat, DeepSeek all tested; no Claude or OpenAI API required
- /run-experiment supports code_sync: git (git push → ssh "git pull")
- wandb.init() + wandb.log() when wandb: true. /monitor-experiment pulls training curves
- /experiment-bridge
- /auto-review-loop