中文版 README | English

Let Claude Code do research while you sleep. Wake up to find your paper scored, weaknesses identified, experiments run, and narrative rewritten, all autonomously.

Radically lightweight: zero dependencies, zero lock-in. The entire system is plain Markdown files. No framework to learn, no database to maintain, no Docker to configure, no daemon to babysit. Every skill is a single SKILL.md readable by any LLM. Swap Claude Code for Codex CLI, OpenClaw, Cursor, Trae, Antigravity, Windsurf, or your own agent and the workflows still work. Fork it, rewrite it, adapt it to your stack.

ARIS is a methodology, not a platform. What matters is the research workflow; take it wherever you go.
Join Community
Custom Claude Code skills for autonomous ML research workflows. These skills orchestrate cross-model collaboration: Claude Code drives the research while an external LLM (via Codex MCP) acts as a critical reviewer.

- Also supports alternative model combinations (Kimi, LongCat, DeepSeek, etc.); no Claude or OpenAI API required. For example, MiniMax-M2.7 + GLM-5 or GLM-5 + MiniMax-M2.7.
- Codex CLI native: the full skill set is also available for OpenAI Codex.
- Cursor: works in Cursor too.
- Trae: ByteDance's AI IDE.
- Antigravity: Google's agent-first IDE.
- Free tier via ModelScope: zero cost, zero lock-in.
Why not self-play with a single model? Using Claude Code subagents or agent teams for both execution and review is technically possible, but it tends to fall into local minima: the same model reviewing its own patterns creates blind spots.

Think of it like adversarial vs. stochastic bandits: a single model self-reviewing is the stochastic case (predictable reward noise), while cross-model review is adversarial (the reviewer actively probes weaknesses the executor didn't anticipate), and adversarial bandits are fundamentally harder to game.

Why two models, not more? Two is the minimum needed to break self-play blind spots, and 2-player games converge to Nash equilibrium far more efficiently than n-player ones. Adding more reviewers increases API cost and coordination overhead with diminishing returns; the biggest gain comes from going 1→2, not 2→4.

Claude Code's strength is fast, fluid execution; Codex (GPT-5.4 xhigh) is slower but more deliberate and rigorous in critique. These complementary styles, speed × rigor, produce better outcomes than either model talking to itself.
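For intuition only (these are standard multi-armed-bandit results, not something ARIS computes): with K arms over T rounds, the achievable regret separates the two settings.

```latex
\underbrace{R_T \;=\; O\!\Big(\textstyle\sum_{i \neq i^*} \tfrac{\log T}{\Delta_i}\Big)}_{\text{stochastic, e.g.\ UCB}}
\qquad\text{vs.}\qquad
\underbrace{R_T \;=\; \Theta\!\big(\sqrt{KT}\big)}_{\text{adversarial, e.g.\ Exp3}}
```

Logarithmic vs. square-root growth: an adversary forces sustained exploration that can never be fully amortized, which is exactly the property the analogy leans on.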
These are full pipelines, but you can also use each workflow independently. Already have an idea? Skip to Workflow 1.5. Have results? Jump to Workflow 3. Got reviews? Jump to Workflow 4. See Quick Start for all commands and Workflows for the full breakdown.
Basic mode: give ARIS a research direction, and it handles everything:
/research-pipeline "factorized gap in discrete diffusion LMs"
Targeted mode: got a paper you want to improve? Give ARIS the paper plus the code:

/research-pipeline "improve method X" -- ref paper: https://arxiv.org/abs/2406.04329, base repo: https://github.com/org/project

ARIS reads the paper → finds its weaknesses → clones the codebase → generates ideas that specifically fix those weaknesses with that code → runs experiments → writes your paper. Like telling a research assistant: "read this paper, use this repo, find what's missing, and fix it."
Mix and match: ref paper only = "what can be improved?"; base repo only = "what can I build with this code?"; both = "improve this paper using this code."
Rebuttal mode: reviews just dropped? Don't panic. ARIS reads every concern, builds a strategy, and drafts a rebuttal that's grounded, structured, and under the character limit:

/rebuttal "paper/ + reviews" -- venue: ICML, character limit: 5000
| Parameter | Default | What it does |
|---|---|---|
| venue | ICML | Target venue (ICML/NeurIPS/ICLR/CVPR/ACL/AAAI/ACM) |
| character limit | – | Required. Hard character limit for rebuttal text |
| quick mode | false | Stop after parsing + strategy (Phase 0-3). See what reviewers want before drafting |
| auto experiment | false | Auto-run supplementary experiments via /experiment-bridge when reviewers ask for new evidence |
| max stress test rounds | 1 | How many times GPT-5.4 xhigh stress-tests the draft |
| max followup rounds | 3 | Per-reviewer follow-up round limit |
Three safety gates: the rebuttal will NOT finalize if any of them fails.
Two outputs: PASTE_READY.txt (exact character count, ready to paste into the venue form) + REBUTTAL_DRAFT_rich.md (extended version for manual editing).
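The hard character limit can be enforced with a plain `wc -m` check. A minimal sketch (the file name and the 5000-character limit mirror the example above; this is not the skill's actual implementation):

```shell
# Stand-in for PASTE_READY.txt (the real file is produced by /rebuttal)
printf 'We thank the reviewers for their careful reading...' > PASTE_READY.txt

LIMIT=5000
# wc -m counts characters (multibyte-aware under a UTF-8 locale)
COUNT=$(wc -m < PASTE_READY.txt | tr -d ' ')

if [ "$COUNT" -le "$LIMIT" ]; then
  echo "OK: $COUNT/$LIMIT characters"
else
  echo "OVER LIMIT: $COUNT/$LIMIT characters"
fi
```

Note that `wc -m` counts every character including newlines, which matches how venue forms typically count.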
After acceptance: your paper is in, now prepare the presentation:
/paper-slides "paper/"   # → Beamer PDF + PPTX + speaker notes + Q&A prep
/paper-poster "paper/"   # → A0/A1 poster PDF + editable PPTX + SVG
From idea to paper to podium: one toolchain.
| Paper | Score | Venue | Author | Stack |
|---|---|---|---|---|
| CS Paper | 8/10 "clear accept" | CS Conference | @DefanXue & @Monglitay | Claude Code + GPT-5.4 |
| AAAI Paper | 7/10 "good paper, accept" | AAAI 2026 Main Technical | @xinbo820-web | Pure Codex CLI |
Built entirely with ARIS, from idea to acceptance. Full details + reviewer screenshots →
/rebuttal: post-submission rebuttal pipeline. Parse reviews → atomize → strategy → draft → safety check → GPT-5.4 stress test → finalize (strict + rich versions) → follow-up rounds. 3 safety gates (no fabrication, no overpromise, full coverage). quick mode for analysis only. auto experiment for supplementary experiments. Designed from 5 successful rebuttal case studies + 3 rounds of GPT-5.4 xhigh design review.
/training-check, /result-to-claim, /ablation-planner. compact mode: generate lean summary files for short-context models and session recovery (-- compact: true). research-refine checkpoint: auto-resume after interruption. Community contributions by @JingxuanKang & @couragec
2026-03-18 – paper-slides + Codex+Claude bridge + Cursor guide + Codex CLI skills + grant-proposal + paper-illustration (Gemini) + CitationClaw
2026-03-17 – Git code sync + ModelScope guide + parameter pass-through
2026-03-16 – research-refine + experiment-plan: turn vague ideas into problem-anchored proposals with claim-driven experiment roadmaps. Now integrated into Workflow 1 (/idea-discovery). Community contribution by @zjYao36
2026-03-16 – Alibaba Coding Plan guide: one API key, 4 models (Kimi-K2.5 + Qwen3.5+ + GLM-5 + MiniMax-M2.5), dual-endpoint setup. Community contribution by @tianhao909
2026-03-15 – Bring your own model! Any OpenAI-compatible API now works as reviewer via MCP server. GLM, MiniMax, Kimi, LongCat, DeepSeek all tested →
# 1. Install skills
git clone https://github.com/wanshuiyin/Auto-claude-code-research-in-sleep.git
cp -r Auto-claude-code-research-in-sleep/skills/* ~/.claude/skills/
# 2. Set up Codex MCP (for review skills)
npm install -g @openai/codex
codex setup # set model to gpt-5.4 when prompted
claude mcp add codex -s user -- codex mcp-server
# 3. Use in Claude Code
claude
> /idea-discovery "your research direction"     # Workflow 1: be specific! Not "NLP" but "factorized gap in discrete diffusion LMs"
> /experiment-bridge                            # Workflow 1.5: have a plan? Implement + deploy + collect results
> /auto-review-loop "your paper topic or scope" # Workflow 2: review → fix → re-review overnight
> /paper-writing "NARRATIVE_REPORT.md"          # Workflow 3: narrative → polished PDF
> /rebuttal "paper/ + reviews" -- venue: ICML   # Workflow 4: parse reviews → draft rebuttal → follow-up
> /research-pipeline "your research direction"  # Full pipeline: Workflow 1 → 1.5 → 2 → 3 end-to-end
Templates available! See templates/ for ready-to-use input templates for every workflow: research brief (Workflow 1), experiment plan (Workflow 1.5), narrative report (Workflow 3), paper plan (Workflow 3).

Tip: All pipeline behaviors are configurable via inline overrides; append `-- key: value` to any command:
| Parameter | Default | What it does |
|---|---|---|
| AUTO_PROCEED | true | Auto-continue at the idea selection gate. Set false to manually pick which idea to pursue before committing GPU time |
| human checkpoint | false | Pause after each review round so you can read the score, give custom modification instructions, skip specific fixes, or stop early |
| sources | all | Which literature sources to search: zotero, obsidian, local, web, or all (comma-separated) |
| arxiv download | false | Download top relevant arXiv PDFs during the literature survey. When false, only fetches metadata (title, abstract, authors) |
| DBLP_BIBTEX | true | Fetch real BibTeX from DBLP/CrossRef instead of LLM-generated entries. Eliminates hallucinated citations. Zero install |
Important: Codex MCP uses the model from ~/.codex/config.toml, not from skill files. Make sure it says model = "gpt-5.4" (recommended). Other options: gpt-5.3-codex, gpt-5.2-codex, o3. Run codex setup or edit the file directly.
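A quick way to confirm which model the MCP server will use is to read that line directly. A sketch against a stand-in file (the real file lives at ~/.codex/config.toml; the one-line TOML here is an assumption for demonstration):

```shell
# Stand-in for ~/.codex/config.toml
cat > config.toml <<'EOF'
model = "gpt-5.4"
EOF

# Extract the configured model name from the model = "..." line
MODEL=$(sed -n 's/^model *= *"\(.*\)"/\1/p' config.toml)
echo "Codex MCP will use: $MODEL"
```

If the echoed model is not the one you expect, edit the file before launching a long overnight run.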
Want Codex to execute but Claude Code to review? See docs/CODEX_CLAUDE_REVIEW_GUIDE.md. That path installs the base skills/skills-codex/*, then overlays skills/skills-codex-claude-review/*, and routes review-heavy skills through the local claude-review MCP bridge.

Want Codex to execute but Gemini to review locally? See docs/CODEX_GEMINI_REVIEW_GUIDE.md and the CN version. That path installs the base skills/skills-codex/*, then overlays skills/skills-codex-gemini-review/*, and routes the reviewer-aware predefined skills through the local gemini-review MCP bridge, using the direct Gemini API by default.
See full setup guide for details and alternative model combinations if you don't have Claude/OpenAI API.
- 31 composable skills: mix and match, or chain into full pipelines (/idea-discovery, /auto-review-loop, /paper-writing, /research-pipeline)
- Literature & novelty: multi-source paper search (Zotero + Obsidian + local PDFs + arXiv/Scholar) + cross-model novelty verification
- Idea discovery: literature survey → brainstorm 8-12 ideas → novelty check → GPU pilot experiments → ranked report
- Auto review loop: 4-round autonomous review, 5/10 → 7.5/10 overnight with 20+ GPU experiments
- Paper writing: narrative → outline → figures → LaTeX → PDF → auto-review (4/10 → 8.5/10), one command. Anti-hallucination citations via DBLP/CrossRef
- Cross-model collaboration: Claude Code executes, GPT-5.4 xhigh reviews. Adversarial, not self-play
- Peer review: review others' papers as a conference reviewer, with structured scoring and meta-review
- Review-driven experiments: when GPT-5.4 says "run an ablation", Claude Code automatically writes the script, rsyncs to your GPU server, launches in screen, collects results, and folds them back into the paper. Just configure your server in CLAUDE.md (setup guide)
A real overnight 4-round run on an ML research project, from borderline reject to submission-ready:
| Round | Score | What Happened |
|---|---|---|
| Initial | 5.0/10 | Borderline reject |
| Round 1 | 6.5/10 | Added standard metrics, discovered metric decoupling |
| Round 2 | 6.8/10 | Key claim failed to reproduce, pivoted narrative |
| Round 3 | 7.0/10 | Large seed study killed main improvement claim |
| Round 4 | 7.5/10 ✓ | Diagnostic evidence solidified, submission ready |
The loop autonomously ran 20+ GPU experiments, rewrote the paper's narrative framing, and killed claims that didn't hold up, all without human intervention.
Real projects where the ARIS pipeline was used end-to-end. If you've used ARIS to complete a paper, we'd love to feature it here โ open an issue or PR!
| Paper | Rating | Venue | Built by | Notes |
|---|---|---|---|---|
| CS Paper | 8/10 ✓ "Top 50% of accepted papers, clear accept" | CS Conference | @DefanXue & @Monglitay | Full ARIS pipeline: idea → experiments → auto-review → paper writing. Reviewer: "empirical findings are stark, well-supported, and expose a fundamental flaw" |
| AAAI 2026 Paper | 7/10 ✓ "Good paper, accept" | AAAI 2026 Main Technical | @xinbo820-web | Pure Codex CLI (ARIS-Codex skills). Accepted at AAAI 2026 |

Papers built entirely with ARIS, from idea to acceptance. Know more? Let us know!
Domain-specific skills and external projects contributed by the community. PRs welcome: just add a skills/your-skill/SKILL.md and open a PR!
How to use: Community skills are not auto-wired into core workflows. To use one, ask your executor (Claude Code / OpenClaw / etc.) to read the skill's SKILL.md, then plug it into the appropriate workflow stage based on the description below.

Community Skills (12): research-refine · experiment-plan · grant-proposal · paper-poster · paper-slides · mermaid-diagram · proof-writer · comm-lit-review · dse-loop · idea-discovery-robot · formula-derivation ·

External Projects & Docs (9): open-source-hardening-skills · CitationClaw · auto-hparam-tuning · Antigravity Adaptation Guide · OpenClaw Adaptation Guide · Cursor Adaptation Guide · Codex+Claude Review Bridge · Trae Adaptation Guide · paper-illustration

Thanks to every contributor! We fold the tables below to keep the README readable, but every skill and project here is equally valued. PRs always welcome!
| Name | Domain | Description | Codex MCP? |
|---|---|---|---|
| research-refine | General | Turn a vague idea into a problem-anchored, implementation-oriented method proposal. Best inserted between /idea-discovery and /auto-review-loop | Yes |
| experiment-plan | General | Turn a refined proposal into a claim-driven experiment roadmap with ablations, budgets, and run order | No |
| research-refine-pipeline | General | One-shot chain: /research-refine → /experiment-plan for method refinement plus experiment planning | Yes |
| grant-proposal | General | Grant proposal drafting (KAKENHI/NSF/NSFC/ERC/DFG/SNSF/ARC/NWO). Chains /research-lit → /novelty-check → … | Yes |
| Name | Domain | Description |
|---|---|---|
| open-source-hardening-skills | DevOps / OSS | 10-skill pipeline to harden research code into production-ready open-source projects: audit, refactor, test, CI, docs, review |
| CitationClaw | General | Citation impact analysis: input a paper title → citation crawling, scholar identification, tiered analysis, HTML dashboard |
| Antigravity Adaptation Guide | General | Use ARIS skills in Google Antigravity: native SKILL.md support, dual model (Claude Opus 4.6 / Gemini 3.1 Pro), MCP setup, EN + CN guides |
| OpenClaw Adaptation Guide | General | Use the ARIS workflow methodology in OpenClaw: skill-to-stage mapping, file-based orchestration, no Claude Code CLI needed |
| Cursor Adaptation Guide | General | Use ARIS skills in Cursor |
These skills compose into a full research lifecycle. The four workflows can be used independently or chained together:
- Workflow 1: /idea-discovery
- Workflow 1.5: /experiment-bridge
- Workflow 2: /auto-review-loop
- Workflow 3: /paper-writing (or step by step: /paper-plan → /paper-figure → /paper-write → /paper-compile → /auto-paper-improvement-loop)
- Workflow 4: /rebuttal (parse reviews, draft a safe rebuttal, follow-up rounds)
- Full pipeline: /research-pipeline + /rebuttal, from idea to acceptance

Important: These tools accelerate research, but they don't replace your own critical thinking. Always review generated ideas with your domain expertise, question the assumptions, and make the final call yourself. The best research comes from human insight + AI execution, not full autopilot.
/research-lit → /idea-creator → /novelty-check → /research-refine → /experiment-bridge → /auto-review-loop → /paper-writing → submit → /rebuttal → accept!
  (survey)      (brainstorm)    (verify novel)   (refine method)   (implement+deploy)   (review & fix)     (write paper)   (send)   (reply to reviewers)
├──────────────── Workflow 1: Idea Discovery ───────────────┤ ├ Workflow 1.5 ┤ ├── Workflow 2 ──┤ ├── Workflow 3 ──┤ ├── Workflow 4 ──┤
Blog post (in Chinese): "Open-sourcing the full research-while-you-sleep pipeline"
"What's the state of the art? Where are the gaps? How do we solve it?"
Don't have a concrete idea yet? Just give a research direction; /idea-discovery handles the rest:
The output is a ranked IDEA_REPORT.md plus a refined proposal (refine-logs/FINAL_PROPOSAL.md) and experiment plan (refine-logs/EXPERIMENT_PLAN.md) for the top idea. Dead-end ideas are documented too, saving future exploration.
┌───────────────────────────────────────────────────────────────┐
│              Idea Discovery & Method Refinement               │
│                                                               │
│  /research-lit      /idea-creator       /novelty-check        │
│  (find papers)      (brainstorm)        (verify novelty)      │
│       │                  │                   │                │
│       ▼                  ▼                   ▼                │
│  ┌──────────┐      ┌──────────┐        ┌──────────┐           │
│  │ Scan     │─────▶│ Generate │───────▶│ Check if │           │
│  │ local    │      │ 8-12     │        │ idea is  │           │
│  │ papers + │      │ ideas    │        │ novel    │           │
│  │ search   │      │ + rank   │        │          │           │
│  └──────────┘      └──────────┘        └──────────┘           │
│                         │                   │                 │
│                         ▼                   ▼                 │
│                    ┌──────────┐        ┌──────────┐           │
│                    │ Filter   │───────▶│ External │           │
│                    │ by cost, │        │ LLM      │           │
│                    │ novelty  │        │ evaluates│           │
│                    └──────────┘        └──────────┘           │
│                                             │                 │
│  /research-refine                           ▼                 │
│  (refine method)                       ┌──────────┐           │
│       │                                │ Freeze   │           │
│       ▼                                │ problem  │           │
│  ┌──────────┐                          │ anchor + │           │
│  │ Iterate  │─────────────────────────▶│ refine   │           │
│  │ until    │                          │ method   │           │
│  │ score≥9  │                          └──────────┘           │
│  └──────────┘                               │                 │
│       │                                     ▼                 │
│  /experiment-plan                      ┌───────────┐          │
│       │                                │ Claim-    │          │
│       ▼                                │ driven    │          │
│  ┌──────────┐                          │ experiment│          │
│  │ Plan     │─────────────────────────▶│ roadmap   │          │
│  │ runs     │                          └───────────┘          │
│  └──────────┘                                                 │
│                                                               │
│  Typical flow:                                                │
│    1. /research-lit "discrete diffusion models"               │
│    2. /idea-creator "DLLMs post training"                     │
│    3. Review ranked ideas, pick top 2-3                       │
│    4. /novelty-check "top idea" (deep verification)           │
│    5. /research-review "top idea" (critical feedback)         │
│    6. /research-refine "top idea" (problem anchor + method)   │
│    7. /experiment-plan (claim-driven roadmap)                 │
│    8. /run-experiment → /auto-review-loop                     │
└───────────────────────────────────────────────────────────────┘
Skills involved: research-lit + idea-creator + novelty-check + research-review + research-refine-pipeline

One-command shortcut: /idea-discovery "your research direction" runs this entire workflow automatically.

Human-in-the-loop: Each phase presents results and waits for your feedback. Not happy? Tell it what's missing; it refines the prompt and regenerates. Trust the defaults? It auto-proceeds with the top-ranked option. You decide how hands-on to be.

Pilot experiment budgets (max hours, timeout, GPU budget) are configurable; see Customization.
Blog post (in Chinese) on NeurIPS submissions with Claude Code
"I have a plan. Now implement it, deploy it, and get me initial results."
Already have an experiment plan (from Workflow 1 or your own)? /experiment-bridge turns it into running code:
- Reads your experiment plan (default: refine-logs/EXPERIMENT_PLAN.md)
- Cross-model review of the implementation (code review: true by default)
- Deploys and monitors runs via /run-experiment

┌───────────────────────────────────────────────────────────────┐
│                 Workflow 1.5: Experiment Bridge               │
│                                                               │
│  EXPERIMENT_PLAN.md                                           │
│       │                                                       │
│       ▼                                                       │
│  ┌──────────┐      ┌──────────┐        ┌──────────┐           │
│  │ Claude   │─────▶│ GPT-5.4  │───────▶│ Sanity   │           │
│  │ Code     │      │ xhigh    │        │ Check    │           │
│  │ writes   │      │ reviews  │        │ (1 GPU)  │           │
│  │ code     │      │ code     │        │          │           │
│  └──────────┘      └──────────┘        └──────────┘           │
│                                             │                 │
│                                             ▼                 │
│  ┌──────────┐      ┌──────────┐        ┌──────────┐           │
│  │ Collect  │◀─────│ Monitor  │◀───────│ Deploy   │           │
│  │ results  │      │ progress │        │ to GPUs  │           │
│  │          │      │ (+ W&B)  │        │          │           │
│  └──────────┘      └──────────┘        └──────────┘           │
│       │                                                       │
│       ▼                                                       │
│  Ready for /auto-review-loop                                  │
└───────────────────────────────────────────────────────────────┘
Skills involved: experiment-bridge + run-experiment + monitor-experiment
One-command shortcut: /experiment-bridge reads refine-logs/EXPERIMENT_PLAN.md automatically. Or point it to any plan: /experiment-bridge "my_plan.md".

CODE_REVIEW, AUTO_DEPLOY, SANITY_FIRST, MAX_PARALLEL_RUNS are configurable; see Customization.
"Review my paper, fix what's wrong, repeat until it's good."
GPT-5.4 reviews → identifies weaknesses → suggests experiments → Claude Code writes scripts, deploys to GPU, monitors results, rewrites the paper: all while you sleep. Just add your GPU server config to CLAUDE.md.
┌──────────────────────────────────────────────────────────────┐
│                       Auto Review Loop                       │
│                                                              │
│   /research-review            /auto-review-loop              │
│   (single deep review)        (autonomous loop)              │
│        │                           │                         │
│        ▼                           ▼                         │
│   ┌──────────┐   ┌────────────┐   ┌──────────┐               │
│   │ External │──▶│ Implement  │──▶│ Monitor  │──▶ repeat     │
│   │ LLM      │   │ fixes      │   │ results  │    until      │
│   │ reviews  │   │ & run      │   │          │    score ≥ 6  │
│   └──────────┘   │ experiments│   └──────────┘               │
│                  └────────────┘                              │
│                                                              │
│   When the reviewer suggests a new method direction:         │
│     /novelty-check → verify the idea isn't already published │
│                                                              │
│   Supporting skills:                                         │
│     /run-experiment     → deploy to local/remote GPU         │
│     /analyze-results    → interpret experiment outputs       │
│     /monitor-experiment → check progress, collect results    │
└──────────────────────────────────────────────────────────────┘
Skills involved: auto-review-loop + research-review + novelty-check + run-experiment + analyze-results + monitor-experiment
One-command shortcut: /auto-review-loop "your paper topic" runs this entire workflow automatically.

What to pass as the argument? A short topic or scope is enough; the skill automatically reads your project's narrative docs (NARRATIVE_REPORT.md), memory files, experiment results, and prior reviews to build the full context for GPT-5.4. Examples:

- /auto-review-loop "factorized gap in discrete diffusion LMs" – broad topic, the skill finds everything
- /auto-review-loop "focus on Section 3-5, our CRF results are weak" – targeted scope with hints
- /auto-review-loop – also works: the skill reads project files and infers the topic
Key safety features:

- Checkpointing: the loop writes its state (REVIEW_STATE.json) after each round. If the context window fills up and auto-compacts mid-loop, the workflow reads the state file and resumes from where it left off; no human intervention needed

MAX_ROUNDS, the score threshold, and GPU limits are configurable; see Customization.
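The resume logic amounts to "read the last completed round from the state file and continue from there." A minimal sketch with a made-up state file (REVIEW_STATE.json's real schema is defined by the skill; the `round` field and values here are assumptions):

```shell
# Stand-in state file, as the loop might leave it after round 2
cat > REVIEW_STATE.json <<'EOF'
{"round": 2, "score": 6.8, "status": "in_progress"}
EOF

# Extract the last completed round without jq (pure sed)
ROUND=$(sed -n 's/.*"round": *\([0-9]*\).*/\1/p' REVIEW_STATE.json)
NEXT=$((ROUND + 1))
echo "Resuming at round $NEXT"
```

Because the state lives in a plain file rather than the model's context, a crash or auto-compact loses nothing but the in-flight round.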
Blog post (in Chinese): "Open-source: Claude runs experiments and revises the draft while you sleep"
"Turn my research narrative into a submission-ready PDF." Requires a local LaTeX environment; see Prerequisites.
┌──────────────────────────────────────────────────────────────┐
│                    Paper Writing Pipeline                    │
│                                                              │
│  /paper-plan        /paper-figure         /paper-write       │
│  (outline)          (plots & tables)      (LaTeX draft)      │
│       │                  │                     │             │
│       ▼                  ▼                     ▼             │
│  ┌──────────┐      ┌──────────┐          ┌──────────┐        │
│  │ Claims-  │─────▶│ Generate │─────────▶│ Section  │──┐     │
│  │ Evidence │      │ figures, │          │ by       │  │     │
│  │ Matrix + │      │ tables,  │          │ section  │  │     │
│  │ Section  │      │ LaTeX    │          │ LaTeX    │  │     │
│  │ Plan     │      │ includes │          │ draft    │  │     │
│  └──────────┘      └──────────┘          └──────────┘  │     │
│       │                                                │     │
│       │                /paper-compile                  │     │
│       │                (build PDF)                     │     │
│       │                     │                          │     │
│       ▼                     ▼                          ▼     │
│  ┌──────────────────────────────────────────────────────┐    │
│  │ NARRATIVE_REPORT.md ──▶ PAPER_PLAN.md ──▶ paper/     │    │
│  │      (input)              (outline)      (LaTeX+PDF) │    │
│  └──────────────────────────────────────────────────────┘    │
│                                                              │
│  Typical flow:                                               │
│    1. Write NARRATIVE_REPORT.md (from Workflow 2 results)    │
│    2. /paper-plan    (claims-evidence matrix + section plan) │
│    3. /paper-figure  (comparison tables, training curves)    │
│    4. /paper-write   (section-by-section LaTeX generation)   │
│    5. /paper-compile (build PDF, fix errors, page check)     │
│    6. /auto-paper-improvement-loop (review ×2 + format check)│
└──────────────────────────────────────────────────────────────┘
Skills involved: paper-plan + paper-figure + paper-write + paper-compile + auto-paper-improvement-loop + (post-acceptance) paper-poster + paper-slides
One-command shortcut: /paper-writing "NARRATIVE_REPORT.md" runs this entire workflow automatically.

Input: a NARRATIVE_REPORT.md describing the research: claims, experiments, results, figures. The more detailed the narrative (especially figure descriptions and quantitative results), the better the output. See templates/NARRATIVE_REPORT_TEMPLATE.md for a complete example.

Output: a submission-ready paper/ directory with LaTeX source, a clean .bib (only cited entries), and a compiled PDF.
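The "only cited entries" property can be spot-checked mechanically: collect every key appearing in \cite{...} and compare against the keys defined in the .bib. A rough sketch on toy files (not the skill's actual implementation; real projects also use \citep/\citet variants, which this simple grep would miss):

```shell
# Toy paper + bibliography
cat > main.tex <<'EOF'
We follow \cite{smith2024} and extend \cite{lee2023,smith2024}.
EOF
cat > refs.bib <<'EOF'
@article{smith2024, title={A}}
@article{lee2023, title={B}}
@article{unused99, title={C}}
EOF

# Keys actually cited in the .tex (split comma lists, dedupe)
grep -o '\\cite{[^}]*}' main.tex | sed 's/\\cite{//;s/}//' | tr ',' '\n' | sort -u > cited.txt
# Keys defined in the .bib
sed -n 's/@[a-z]*{\([^,]*\),.*/\1/p' refs.bib | sort -u > defined.txt

# Entries present in the .bib but never cited
comm -13 cited.txt defined.txt
```

Here the check reports `unused99` as dead weight that a clean .bib would drop.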
Key features:

- pdftotext-based precise check that the main body fits the page limit

Figure generation scope: /paper-figure auto-generates data-driven plots (training curves, bar charts, heatmaps) and comparison tables from JSON/CSV. For architecture diagrams and method figures: illustration: gemini (default) uses Claude → Gemini → Nano Banana Pro for publication-quality diagrams; illustration: mermaid generates Mermaid diagrams for free; illustration: false skips AI figures entirely.

Gemini API setup (for illustration: gemini): get your API key at Google AI Studio, then set it as an environment variable: export GEMINI_API_KEY="your-key". Or add it to your shell profile (~/.zshrc / ~/.bashrc). No other dependencies needed.
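A page-limit check of this flavor can be built on the poppler tools from Prerequisites. Since real output depends on an actual PDF, here is a self-contained sketch that parses a captured pdfinfo-style report instead (the report text is a stand-in; the `Pages:` field format follows pdfinfo's output):

```shell
# Captured pdfinfo output for an imaginary paper.pdf
REPORT='Title:    Example Paper
Pages:    11
Encrypted: no'

LIMIT=9   # e.g. a main-body page limit
PAGES=$(printf '%s\n' "$REPORT" | awk '/^Pages:/ {print $2}')

if [ "$PAGES" -le "$LIMIT" ]; then
  echo "Page check passed ($PAGES/$LIMIT)"
else
  echo "Over the page limit ($PAGES/$LIMIT): trim before submission"
fi
```

The skill's actual check is finer-grained (it uses pdftotext to find where the main body ends, so appendices don't count against the limit), but the pass/fail gate is the same idea.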
Tested end-to-end: generated a 9-page ICLR 2026 theory paper (7 sections, 29 citations, 4 figures, 2 comparison tables) from a single NARRATIVE_REPORT.md, with zero compilation errors and zero undefined references.
After Workflow 3 generates the paper, /auto-paper-improvement-loop runs 2 rounds of GPT-5.4 xhigh content review → fix → recompile, plus a final format compliance check, autonomously polishing the paper from rough draft to submission-ready.
Score progression (real test: ICLR 2026 theory paper):
| Round | Score | Key Changes |
|---|---|---|
| Round 0 | 4/10 (content) | Baseline |
| Round 1 | 6/10 (content) | Fixed assumptions, softened claims, renamed notation |
| Round 2 | 7/10 (content) | Added synthetic validation, stronger limitations |
| Round 3 | 5→8.5/10 (format) | Removed hero fig + appendix, compressed conclusion, fixed float spacing |
Final: 8 pages main body (ICLR limit: 9), 0 overfull hbox, ICLR-compliant. +4.5 points across 3 rounds. (Format fixes touched \resizebox, \captionsetup, \textfloatsep, and itemize environments.)

"Reviews are in. Help me draft a safe, grounded rebuttal."
Got reviews back? /rebuttal parses them, builds a strategy, and drafts a venue-compliant response:
- With auto experiment: true, it auto-runs supplementary experiments via /experiment-bridge
- Outputs: PASTE_READY.txt (exact character count) + REBUTTAL_DRAFT_rich.md (extended version for manual editing)

┌───────────────────────────────────────────────────────────────┐
│                     Workflow 4: Rebuttal                      │
│                                                               │
│  Reviews arrive                                               │
│       │                                                       │
│       ▼                                                       │
│  ┌──────────┐      ┌──────────┐        ┌───────────┐          │
│  │ Parse &  │─────▶│ Strategy │───────▶│ Evidence  │          │
│  │ atomize  │      │ plan     │        │ sprint    │          │
│  │ reviews  │      │          │        │ (optional)│          │
│  └──────────┘      └──────────┘        └───────────┘          │
│                                             │                 │
│                                             ▼                 │
│  ┌───────────┐     ┌──────────┐        ┌──────────┐           │
│  │ Finalize  │◀────│ GPT-5.4  │◀───────│ Draft    │           │
│  │ 2 versions│     │ stress   │        │ rebuttal │           │
│  │           │     │ test     │        │          │           │
│  └───────────┘     └──────────┘        └──────────┘           │
│       │                                                       │
│       ▼                                                       │
│  PASTE_READY.txt (strict) + RICH.md (extended)                │
│       │                                                       │
│       ▼                                                       │
│  Follow-up rounds (delta replies, per-reviewer threads)       │
└───────────────────────────────────────────────────────────────┘
Skills involved: rebuttal
Quick mode: /rebuttal -- quick mode: true stops after parsing + strategy (Phase 0-3). See what reviewers want before committing to a full draft.

VENUE, AUTO_EXPERIMENT, QUICK_MODE, MAX_STRESS_TEST_ROUNDS are configurable; see Customization.

Three safety gates (no fabrication, no overpromise, full coverage): the rebuttal will NOT finalize if any fails.
| Skill | Description | Codex MCP? |
|---|---|---|
| research-pipeline | End-to-end: Workflow 1 → 1.5 → 2 → 3, from research direction to submission | Yes |

| Skill | Description | Codex MCP? |
|---|---|---|
| idea-discovery | Pipeline orchestrator: runs all skills below in sequence | Yes |
| ├ research-lit | Multi-source literature search (Zotero + Obsidian + local PDFs + arXiv API + web) | No |
| ├ idea-creator | Brainstorm 8-12 ideas, filter by feasibility, pilot on GPU, rank by signal | Yes |
| ├ novelty-check | Verify idea novelty against recent literature (multi-source + GPT-5.4 cross-check) | Yes |
| └ research-review | Deep review from an external LLM | Yes |

| Skill | Description | Codex MCP? |
|---|---|---|
| experiment-bridge | Read experiment plan → implement code → sanity check → deploy to GPU → collect initial results | No |
| ├ run-experiment | Deploy experiments to local (MPS/CUDA) or remote GPU servers | No |
| └ monitor-experiment | Monitor running experiments, check progress, collect results | No |

| Skill | Description | Codex MCP? |
|---|---|---|
| auto-review-loop | Pipeline orchestrator: autonomous review→fix→re-review (max 4 rounds) | Yes |
| ├ research-review | Deep review from external LLM (shared with Workflow 1) | Yes |
| ├ novelty-check | Verify novelty when reviewer suggests new directions | Yes |
| ├ run-experiment | Deploy experiments to local (MPS/CUDA) or remote GPU servers | No |
| ├ analyze-results | Analyze experiment results, compute statistics, generate insights | No |
| └ monitor-experiment | Monitor running experiments, check progress, collect results | No |

| Skill | Description | Codex MCP? |
|---|---|---|
| paper-writing | Pipeline orchestrator: runs all skills below in sequence | Yes |
| ├ paper-plan | Claims-evidence matrix, section structure, figure plan, citation scaffolding | Yes |
| ├ paper-figure | Publication-quality matplotlib/seaborn plots + LaTeX comparison tables | Optional |
| ├ paper-illustration | AI-generated architecture diagrams and method figures via Gemini (when illustration: true) | No (needs Gemini API) |
| └ paper-write | Section-by-section LaTeX generation (ICLR/NeurIPS/ICML). Anti-hallucination BibTeX via DBLP/CrossRef | |

| Skill | Description | Codex MCP? |
|---|---|---|
| rebuttal | Parse reviews → atomize → strategy → draft → safety check → stress test → finalize (2 versions) → follow-up | Yes |

| Skill | Description | Codex MCP? |
|---|---|---|
| arxiv | Search, download, and summarize arXiv papers. Standalone or /research-lit supplement | No |
| pixel-art | Generate pixel art SVG illustrations for READMEs, docs, or slides | No |
| feishu-notify | Feishu/Lark push (webhook) or interactive (bidirectional). Off by default | No |
npm install -g @openai/codex
claude mcp add codex -s user -- codex mcp-server
For Workflow 3 (paper writing), install latexmk and pdfinfo:
# macOS
brew install --cask mactex # or: brew install basictex
brew install poppler # provides pdfinfo
# Ubuntu/Debian
sudo apt install texlive-full latexmk poppler-utils
# Verify
latexmk --version && pdfinfo -v
If you only need Workflow 1 & 2 (idea discovery + auto review), LaTeX is not required.
git clone https://github.com/wanshuiyin/Auto-claude-code-research-in-sleep.git
cd Auto-claude-code-research-in-sleep
# Install all skills globally
cp -r skills/* ~/.claude/skills/
# Or install specific skills
cp -r skills/auto-review-loop ~/.claude/skills/
cp -r skills/research-lit ~/.claude/skills/
cd Auto-claude-code-research-in-sleep
git pull
# Option A: Full update (overwrites all skills with latest version)
cp -r skills/* ~/.claude/skills/
# Option B: Safe update (only add NEW skills, keep your customizations)
cp -rn skills/* ~/.claude/skills/
# Option C: Update specific skills only
cp -r skills/experiment-bridge ~/.claude/skills/
Which option? Use A if you haven't customized any skills. Use B if you've modified skills locally (new skills get added and your changes are preserved, but you'll miss upstream bug fixes in modified files). Use C to selectively update.
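The difference between Option A and Option B is just cp's -n (no-clobber) flag. A self-contained demo in a scratch directory (the paths are throwaway stand-ins; the real source and target are skills/* and ~/.claude/skills/):

```shell
# Fake "repo" checkout and "installed" skills directory
mkdir -p repo/myskill installed/myskill repo/newskill
echo "upstream v2"     > repo/myskill/SKILL.md       # upstream update
echo "my local edits"  > installed/myskill/SKILL.md  # your customization
echo "brand new skill" > repo/newskill/SKILL.md      # new upstream skill

# Option B: -n (no-clobber) never overwrites existing files
# (|| true: newer GNU cp returns nonzero when it skips existing files)
cp -rn repo/. installed/ || true

cat installed/myskill/SKILL.md   # customization kept
cat installed/newskill/SKILL.md  # new skill added
```

Dropping the `-n` reproduces Option A: the existing SKILL.md would be replaced by "upstream v2".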
# Workflow 1: Idea Discovery
> /idea-discovery "your research direction" # full pipeline
> /research-lit "topic" # just literature survey (all sources)
> /research-lit "topic" -- sources: zotero, web   # mix and match sources
> /research-lit "topic" -- arxiv download: true   # also download top arXiv PDFs
> /arxiv "discrete diffusion" -- download         # standalone arXiv search + download
> /idea-creator "topic" # just brainstorm
# Workflow 2: Auto Research Loop
> /auto-review-loop "your paper topic"            # review → fix → repeat
> /research-review "your paper" # single deep review
# Workflow 3: Paper Writing
> /paper-writing "NARRATIVE_REPORT.md" # full pipeline
> /paper-plan "NARRATIVE_REPORT.md" # just outline
> /paper-compile "paper/" # just compile
# Full Pipeline
> /research-pipeline "your research direction" # Workflow 1 โ 2 โ 3 end-to-end
# Supporting Skills
> /run-experiment train.py --lr 1e-4 --epochs 100
> /analyze-results figures/*.json
> /monitor-experiment server5
To run the auto-review loop without clicking permission prompts, add to .claude/settings.local.json:
{
"permissions": {
"allow": [
"mcp__codex__codex",
"mcp__codex__codex-reply",
"Write",
"Edit",
"Skill(auto-review-loop)"
]
}
}
When GPT-5.4 says "run an ablation study" or "add a baseline comparison", Claude Code automatically writes the experiment script and deploys it to your GPU server. For this to work, Claude Code needs to know your server environment.
Add your server info to your project's CLAUDE.md:
## Remote Server
- SSH: `ssh my-gpu-server` (key-based auth, no password)
- GPU: 4x A100
- Conda env: `research` (Python 3.10 + PyTorch)
- Activate: `eval "$(/opt/conda/bin/conda shell.bash hook)" && conda activate research`
- Code directory: `/home/user/experiments/`
- Use `screen` for background jobs: `screen -dmS exp0 bash -c '...'`
Claude Code reads this and knows how to SSH in, activate the environment, and launch experiments. GPT-5.4 (the reviewer) only decides what experiments to run — Claude Code figures out how based on your CLAUDE.md.
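Concretely, the kind of command assembled from that CLAUDE.md looks like the sketch below. It is a dry run: the echo prints the command instead of executing it, and the host, env, and paths are the placeholders from the example above, not real infrastructure:

```shell
# Compose (but do not execute) a remote experiment launch.
HOST="my-gpu-server"
SESSION="exp0"
TRAIN="python train.py --lr 1e-4 --epochs 100"
# Activate conda, cd to the code directory, and run under screen so the
# job survives the SSH session ending
REMOTE="eval \"\$(/opt/conda/bin/conda shell.bash hook)\" && conda activate research && cd /home/user/experiments && $TRAIN"
echo ssh "$HOST" "screen -dmS $SESSION bash -c '$REMOTE'"
```

Remove the echo (and double-check quoting against your own shell setup) to actually launch.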
If you are already on the GPU server, you can add the following to your CLAUDE.md:
## GPU Environment
- This machine has direct GPU access (no SSH needed)
- GPU: 4x A100 80GB
- Experiment environment: `YOUR_CONDA_ENV` (Python 3.x + PyTorch)
- Activate before any Python command: `The command to activate your experiment environment` (uv, conda, etc.)
- Code directory: `/home/YOUR_USERNAME/YOUR_CODE_DIRECTORY/`
No server? The review and rewriting skills still work without GPU access. Only experiment-related fixes will be skipped (flagged for manual follow-up).
If you use Zotero to manage your paper library, /research-lit can search your collections, read your annotations/highlights, and export BibTeX — all before searching the web.
Recommended: zotero-mcp (1.8k⭐, semantic search, PDF annotations, BibTeX export)
# Install
uv tool install zotero-mcp-server # or: pip install zotero-mcp-server
# Add to Claude Code (Local API โ requires Zotero desktop running)
claude mcp add zotero -s user -- zotero-mcp -e ZOTERO_LOCAL=true
# Or use Web API (works without Zotero running)
claude mcp add zotero -s user -- zotero-mcp \
-e ZOTERO_API_KEY=your_key -e ZOTERO_USER_ID=your_id
Get your API key at https://www.zotero.org/settings/keys
What it enables in /research-lit:
Not using Zotero? No problem — /research-lit automatically skips Zotero and uses local PDFs + web search instead.
If you use Obsidian for research notes, /research-lit can search your vault for paper summaries, tagged references, and your own insights.
Recommended: mcpvault (760⭐, no Obsidian app needed, 14 tools, BM25 search)
# Add to Claude Code (point to your vault path)
claude mcp add obsidian-vault -s user -- npx @bitbonsai/mcpvault@latest /path/to/your/vault
Optional complement: obsidian-skills (13.6k⭐, by Obsidian CEO) — teaches Claude to understand Obsidian-specific Markdown (wikilinks, callouts, properties). Copy to your vault:
git clone https://github.com/kepano/obsidian-skills.git
cp -r obsidian-skills/.claude /path/to/your/vault/
What it enables in /research-lit:
Search your notes by tag (e.g., #paper-review, #diffusion-models).
Not using Obsidian? No problem — /research-lit automatically skips Obsidian and works as before.
💡 Zotero + Obsidian together: Many researchers use Zotero for paper storage and Obsidian for notes. Both integrations work simultaneously — /research-lit checks Zotero first (raw papers + annotations), then Obsidian (your processed notes), then local PDFs, then web search.
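That priority order can be sketched as a tiny helper (illustrative only — the actual skill implements this in its SKILL.md instructions, not in shell; the function name is ours):

```shell
# Resolve the SOURCES setting into an ordered search list.
resolve_sources() {
  # "all" expands to the documented priority: Zotero -> Obsidian -> local PDFs -> web
  if [ "$1" = "all" ]; then
    echo "zotero obsidian local web"
  else
    # comma-separated selection, e.g. "zotero, web", keeps the user's order
    echo "$1" | tr ',' ' ' | tr -s ' '
  fi
}
resolve_sources all
resolve_sources "zotero, web"
```

Unavailable sources (no Zotero MCP, no vault) are simply skipped at search time.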
/research-lit automatically queries the arXiv API for structured metadata (title, abstract, full author list, categories) — richer than web search snippets. No setup required.
By default, only metadata is fetched (no files downloaded). To also download the most relevant PDFs:
/research-lit "topic" — arxiv download: true # download top 5 PDFs
/research-lit "topic" — arxiv download: true, max download: 10 # download up to 10
For standalone arXiv access, use the dedicated /arxiv skill:
/arxiv "attention mechanism" # search
/arxiv "2301.07041" — download # download specific paper
Get mobile notifications when experiments finish, reviews score, or checkpoints need your input โ without sitting in front of the terminal.
| Push Only (group cards) | Interactive (private chat) |
|---|---|
Three modes โ you choose per-project:
| Mode | What happens | You need |
|---|---|---|
| Off (default) | Nothing. Pure CLI, no Feishu | Nothing |
| Push only | Webhook notifications at key events. Mobile push, no reply | Feishu bot webhook URL |
| Interactive | Full bidirectional. Approve/reject ideas, reply to checkpoints from Feishu | feishu-claude-code running |
Group notifications with rich cards โ experiment done, review scored, pipeline complete. Mobile push, no reply needed.
Step 1: Create a Feishu group bot
Name the bot (e.g., ARIS Notifications) and copy the Webhook URL. For the optional keyword filter, use ARIS (all notifications include this word), or leave it unrestricted.
Step 2: Create config file
cat > ~/.claude/feishu.json << 'EOF'
{
"mode": "push",
"webhook_url": "https://open.feishu.cn/open-apis/bot/v2/hook/YOUR_WEBHOOK_ID"
}
EOF
Step 3: Test it
curl -s -X POST "YOUR_WEBHOOK_URL" \
-H "Content-Type: application/json" \
-d '{
"msg_type": "interactive",
"card": {
"header": {"title": {"tag": "plain_text", "content": "๐งช ARIS Test"}, "template": "blue"},
"elements": [{"tag": "markdown", "content": "Push mode working! ๐"}]
}
}'
You should see a blue card in your group. Skills will now automatically send rich cards at key events:
| Event | Card color | Content |
|---|---|---|
| Review scored โฅ 6 | ๐ข Green | Score, verdict, top weaknesses |
| Review scored < 6 | ๐ Orange | Score, verdict, action items |
| Experiment complete | ๐ข Green | Results table, delta vs baseline |
| Checkpoint waiting | ๐ก Yellow | Question, options, context |
| Error | ๐ด Red | Error message, suggested fix |
| Pipeline done | ๐ฃ Purple | Score progression, deliverables |
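For custom notifications, every card in the table above shares one payload shape (the same one the test curl uses). A minimal builder — a sketch, not part of the skills; printf escaping is deliberately naive, so avoid quotes in the arguments:

```shell
# Build a Feishu interactive-card payload; $2 is the header color template.
make_card() {
  # $1 = title, $2 = template (blue/green/orange/yellow/red/purple), $3 = markdown body
  printf '{"msg_type":"interactive","card":{"header":{"title":{"tag":"plain_text","content":"%s"},"template":"%s"},"elements":[{"tag":"markdown","content":"%s"}]}}' \
    "$1" "$2" "$3"
}
make_card "Experiment complete" green "acc 92.1 vs baseline 90.8"
# Send it with:
# curl -s -X POST "$WEBHOOK_URL" -H 'Content-Type: application/json' -d "$(make_card "title" green "body")"
```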
Everything Push mode does, plus bidirectional private chat with Claude Code via Feishu. Approve/reject ideas, reply to checkpoints, give custom instructions โ all from your phone.
How it works: Push cards go to the group (everyone sees status). Interactive conversations happen in private chat with the bot (you reply, Claude Code acts on it).
Step 1: Complete Push setup above first (you'll keep both)
Step 2: Create a Feishu app on open.feishu.cn
Name it (e.g., ARIS Claude Bot) and create it. Then grant these permissions:
| Permission | Scope | Why |
|---|---|---|
im:message | Send & receive messages | Core messaging |
im:message:send_as_bot | Send as bot | Bot replies |
im:message.group_at_msg:readonly | Receive group @mentions | Group messages |
im:message.p2p_msg:readonly | Receive private messages | โ ๏ธ Easy to miss! Without this, the bot connects but never receives your messages |
im:resource | Access attachments | Images/files |
Subscribe to the im.message.receive_v1 event and save.
⚠️ Important: The "Long Connection" page may show "no application connection detected" — this is normal. Start the bridge first (Step 3), then come back and save.
For personal/test Feishu organizations, approval is usually instant.
Step 3: Deploy the bridge
git clone https://github.com/joewongjc/feishu-claude-code.git
cd feishu-claude-code
python3 -m venv .venv && source .venv/bin/activate
pip install -r requirements.txt
# Configure
cp .env.example .env
Edit .env:
FEISHU_APP_ID=cli_your_app_id # From app credentials page
FEISHU_APP_SECRET=your_app_secret # From app credentials page
DEFAULT_MODEL=claude-opus-4-6 # โ ๏ธ Default is sonnet โ change to opus for best results
DEFAULT_CWD=/path/to/your/project # Working directory for Claude Code
PERMISSION_MODE=bypassPermissions # Or "default" for safer mode
⚠️ Model matters: The default claude-sonnet-4-6 works but may struggle with complex project context. claude-opus-4-6 correctly identified 18 ARIS skills on first try where sonnet could not.
Start the bridge:
python main.py
# Expected output:
# ✅ Connected to Feishu WebSocket long connection (auto-reconnect)...
# [Lark] connected to wss://msg-frontier.feishu.cn/ws/v2?...
For long-running use, put it in a screen session:
screen -dmS feishu-bridge bash -c 'cd /path/to/feishu-claude-code && source .venv/bin/activate && python main.py'
Step 4: Save event config — go back to Feishu Open Platform → Events & Callbacks; the long connection should now show "connection detected" → Save
If you published the app version before the bridge was running, you may need to create a new version (e.g., 1.0.1) and re-publish after saving event config.
Step 5: Test private chat
Send the bot a private message (e.g., "hello"). If the bot doesn't reply: send /new to reset the session, then try again. Common issues:
| Symptom | Cause | Fix |
|---|---|---|
| Bot connects but never receives messages | Missing im:message.p2p_msg:readonly permission | Add permission โ create new version โ publish |
| Bot replies but doesn't know your project | DEFAULT_CWD points to wrong directory | Edit .env โ restart bridge |
| Bot replies but seems less capable | Using claude-sonnet-4-6 | Change to claude-opus-4-6 in .env โ restart |
| Old session has stale context | Session cached from before config change | Send /new in chat to start fresh session |
| "ๆชๆฃๆตๅฐๅบ็จ่ฟๆฅไฟกๆฏ" when saving events | Bridge not running yet | Start bridge first, then save event config |
Step 6: Update ARIS config
cat > ~/.claude/feishu.json << 'EOF'
{
"mode": "interactive",
"webhook_url": "https://open.feishu.cn/open-apis/bot/v2/hook/YOUR_WEBHOOK_ID",
"interactive": {
"bridge_url": "http://localhost:5000",
"timeout_seconds": 300
}
}
EOF
Now skills will:
| Skill | Events | Push | Interactive |
|---|---|---|---|
/auto-review-loop | Review scored (each round), loop complete | Score + verdict | + wait for continue/stop |
/auto-paper-improvement-loop | Review scored, all rounds done | Score progression | Score progression |
/run-experiment | Experiments deployed | GPU assignment + ETA | GPU assignment + ETA |
/monitor-experiment | Results collected | Results table | Results table |
/idea-discovery | Phase transitions, final report | Summary at each phase | + approve/reject at checkpoints |
/research-pipeline | Stage transitions, pipeline done | Stage summary | + approve/reject |
Not using Feishu? No problem — without ~/.claude/feishu.json, all skills behave exactly as before. Zero overhead, zero side effects.
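The opt-in check is simple enough to sketch. Only the ~/.claude/feishu.json path comes from the docs; how a skill actually reads it is an assumption, shown here as a minimal shell helper:

```shell
# Report the active Feishu mode; "off" when the config file is absent.
CONFIG="${FEISHU_CONFIG:-$HOME/.claude/feishu.json}"
feishu_mode() {
  # No config file -> "off": skills behave exactly as without Feishu
  [ -f "$CONFIG" ] || { echo off; return; }
  # Crude extraction of the "mode" field; fine for this small flat config
  sed -n 's/.*"mode"[[:space:]]*:[[:space:]]*"\([^"]*\)".*/\1/p' "$CONFIG"
}
feishu_mode
```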
๐ก Alternative IM platforms: The push-only webhook pattern works with any service that accepts incoming webhooks (Slack, Discord, DingTalk, WeChat Work). Just change the
webhook_url and card format in feishu-notify/SKILL.md. For bidirectional support, see cc-connect (multi-platform bridge) or clawdbot-feishu.
Skills are plain Markdown files. Fork and customize:
๐ก Parameter pass-through: Parameters flow down the call chain automatically. For example,
/research-pipeline "topic" — sources: zotero, arxiv download: true passes sources and arxiv download through idea-discovery all the way down to research-lit. You can set any downstream parameter at any level — just add — key: value to your command.

Call chain:
research-pipeline
├── idea-discovery
│   ├── research-lit
│   ├── idea-creator
│   ├── novelty-check
│   └── research-review
├── experiment-bridge
│   └── run-experiment
└── auto-review-loop
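As a toy illustration of that suffix grammar (the skills parse parameters inside the model, not in shell; this helper only makes the shape concrete):

```shell
# Split a "— key: value, key: value" suffix into one parameter per line.
parse_params() {
  rest=${1#*— }                              # drop everything up to "— "
  printf '%s\n' "$rest" | tr ',' '\n' | sed 's/^ *//'
}
parse_params '/research-pipeline "topic" — sources: zotero, arxiv download: true'
```

Note the limitation: a comma both separates parameters and can appear inside a value like sources: zotero, web — which is why real parsing is left to the model.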
research-pipeline
| Constant | Default | Description | Pass-through |
|---|---|---|---|
AUTO_PROCEED | true | Auto-continue with top-ranked option if user doesn't respond | → idea-discovery
ARXIV_DOWNLOAD | false | Download top arXiv PDFs after literature search | → idea-discovery → research-lit
HUMAN_CHECKPOINT | false | When true, pause after each review round for approval | → auto-review-loop
WANDB | false | Auto-add W&B logging to experiments | → experiment-bridge → run-experiment
CODE_REVIEW | true | GPT-5.4 reviews experiment code before deployment | → experiment-bridge
BASE_REPO | false | GitHub repo URL to clone as base codebase for experiments | → experiment-bridge
COMPACT | false | Generate compact summary files for short-context models and session recovery |
Override inline: /research-pipeline "topic" — auto proceed: false, illustration: mermaid
auto-review-loop
| Constant | Default | Description |
|---|---|---|
MAX_ROUNDS | 4 | Maximum reviewโfixโre-review iterations |
POSITIVE_THRESHOLD | 6/10 | Score at which the loop stops (submission-ready) |
> 4 GPU-hour skip | 4h | Experiments exceeding this are flagged for manual follow-up |
idea-discovery / idea-creator
| Constant | Default | Description | Pass-through |
|---|---|---|---|
PILOT_MAX_HOURS | 2h | Skip any pilot estimated to take longer per GPU | โ |
PILOT_TIMEOUT_HOURS | 3h | Hard timeout โ kill runaway pilots, collect partial results | โ |
MAX_PILOT_IDEAS | 3 | Maximum number of ideas to pilot in parallel | โ |
MAX_TOTAL_GPU_HOURS | 8h | Total GPU budget across all pilots | โ |
AUTO_PROCEED | true | Auto-continue with top-ranked option if user doesn't respond | โ |
ARXIV_DOWNLOAD | false | Download top arXiv PDFs after literature search | → research-lit
Override inline: /idea-discovery "topic" — pilot budget: 4h per idea, sources: zotero, arxiv download: true
experiment-bridge
| Constant | Default | Description |
|---|---|---|
CODE_REVIEW | true | GPT-5.4 xhigh reviews code before deployment. Catches logic bugs before wasting GPU hours |
AUTO_DEPLOY | true | Automatically deploy experiments after implementation + review. Set false to manually inspect |
SANITY_FIRST | true | Run smallest experiment first to catch setup bugs before full deployment |
MAX_PARALLEL_RUNS | 4 | Maximum experiments to deploy in parallel (limited by available GPUs) |
WANDB | false | Auto-add W&B logging. Requires wandb_project in CLAUDE.md |
BASE_REPO | false | GitHub repo URL to clone as base codebase for experiments |
Override inline: /experiment-bridge — base repo: https://github.com/org/project
research-lit
| Constant | Default | Description |
|---|---|---|
PAPER_LIBRARY | papers/, literature/ | Local directories to scan for PDFs before searching online |
MAX_LOCAL_PAPERS | 20 | Max local PDFs to scan (first 3 pages each) |
SOURCES | all | Which sources to search: zotero, obsidian, local, web, or all (comma-separated) |
ARXIV_DOWNLOAD | false | When true, download top relevant arXiv PDFs to PAPER_LIBRARY after search |
ARXIV_MAX_DOWNLOAD | 5 | Maximum number of PDFs to download when ARXIV_DOWNLOAD = true |
Override inline: /research-lit "topic" — sources: zotero, web or /research-lit "topic" — arxiv download: true, max download: 10
paper-write
| Constant | Default | Description |
|---|---|---|
DBLP_BIBTEX | true | Fetch real BibTeX from DBLP/CrossRef instead of LLM-generated entries |
TARGET_VENUE | ICLR | Target venue: ICLR, NeurIPS, ICML, CVPR, ACL, AAAI, ACM |
ANONYMOUS | true | Use anonymous author block for blind review |
MAX_PAGES | 9 | Main body page limit (excluding references) |
ILLUSTRATION | gemini | AI illustration mode: gemini (default, needs GEMINI_API_KEY), mermaid (free), or false (skip) |
Override inline: /paper-write — target venue: NeurIPS, illustration: mermaid
| Constant | Default | Description |
|---|---|---|
REVIEWER_MODEL | gpt-5.4 | OpenAI model used via Codex MCP. Also available: gpt-5.3-codex, gpt-5.2-codex, o3. See supported models for full list. |
allowed-tools — restrict or expand what each skill can do
Don't have Claude / OpenAI API access? You can swap in other models — same cross-model architecture, different providers.
⭐ We strongly recommend Claude + GPT-5.4 (default setup). It's the most tested and reliable combination. Alternative setups work but may require prompt tuning.
| Executor | Reviewer | Need Claude API? | Need OpenAI API? | Guide | |
|---|---|---|---|---|---|
| Default ⭐ | Claude Opus/Sonnet | GPT-5.4 (Codex MCP) | Yes | Yes | Quick Start |
| Alt A | GLM-5 (Z.ai) | GPT-5.4 (Codex MCP) | No | Yes | Setup below |
| Alt B | GLM-5 (Z.ai) | MiniMax-M2.5 | No | No | MINIMAX_MCP_GUIDE |
| Alt C | Any CC-compatible | Any OpenAI-compatible | No | No | LLM_API_MIX_MATCH_GUIDE |
| Alt D | Kimi-K2.5 / Qwen3.5+ | GLM-5 / MiniMax-M2.5 | No | No | ALI_CODING_PLAN_GUIDE |
| Alt E ๐ | DeepSeek-V3.1 / Qwen3-Coder | DeepSeek-R1 / Qwen3-235B | No | No | MODELSCOPE_GUIDE |
Alt C supports tested providers: GLM (Z.ai), Kimi (Moonshot), LongCat (Meituan) as executors; DeepSeek, MiniMax as reviewers. Any OpenAI-compatible API should also work via the generic llm-chat MCP server.
Alt D uses Alibaba Coding Plan: one API key for both executor and reviewer, 4 models included (Kimi, Qwen, GLM, MiniMax).
Alt E uses ModelScope: free (2000 calls/day), one key, no automation restrictions.
Alt G keeps Codex as executor but swaps the reviewer to Claude Code CLI via the local claude-review MCP bridge, with async polling for long paper/review prompts.
Alt H uses Google Antigravity as the executor with native SKILL.md support; choose Claude Opus 4.6 (Thinking) or Gemini 3.1 Pro (high) as the execution model.
Alt I keeps Codex as executor, adds only a thin skills-codex-gemini-review overlay, and routes the reviewer-aware predefined skills through the local gemini-review MCP bridge with direct Gemini API by default. It is the closest Gemini analogue to the existing Codex+Claude review path, minimizes skill changes, and now also covers poster PNG review via the same bridge. Free-tier availability, rate limits, and data-use terms remain subject to Google's current policy.
* Alt G normally relies on local Codex CLI and Claude Code CLI logins. Direct API keys are optional, not required.
Only replace the executor (Claude → GLM), keep GPT-5.4 as reviewer via Codex MCP.
npm install -g @anthropic-ai/claude-code
npm install -g @openai/codex
codex setup # set model to gpt-5.4
Configure ~/.claude/settings.json:
{
"env": {
"ANTHROPIC_AUTH_TOKEN": "your_zai_api_key",
"ANTHROPIC_BASE_URL": "https://api.z.ai/api/anthropic",
"API_TIMEOUT_MS": "3000000",
"ANTHROPIC_DEFAULT_HAIKU_MODEL": "glm-4.5-air",
"ANTHROPIC_DEFAULT_SONNET_MODEL": "glm-4.7",
"ANTHROPIC_DEFAULT_OPUS_MODEL": "glm-5"
},
"mcpServers": {
"codex": {
"command": "/opt/homebrew/bin/codex",
"args": ["mcp-server"]
}
}
}
Codex CLI uses your existing OPENAI_API_KEY (from ~/.codex/config.toml or environment) โ no extra config needed for the reviewer side.
No Claude or OpenAI API needed. Uses a custom MiniMax MCP server instead of Codex (because MiniMax doesn't support OpenAI's Responses API). Full guide: docs/MINIMAX_MCP_GUIDE.md.
Mix and match freely using the generic llm-chat MCP server. Supports any OpenAI-compatible API as reviewer. Full guide: docs/LLM_API_MIX_MATCH_GUIDE.md.
Example combinations: GLM + DeepSeek, Kimi + MiniMax, Claude + DeepSeek, LongCat + GLM, etc.
git clone https://github.com/wanshuiyin/Auto-claude-code-research-in-sleep.git
cd Auto-claude-code-research-in-sleep
cp -r skills/* ~/.claude/skills/
claude
โ ๏ธ For non-Claude executors (GLM, Kimi, etc.): Let the model read through the project once to ensure skills are correctly parsed. This is especially important if you've rewritten skills to use a different reviewer MCP (e.g.,
mcp__llm-chat__chat instead of mcp__codex__codex) — the new executor needs to understand the changed tool call patterns:

Read through this project and verify all skills are working: /idea-creator, /research-review, /auto-review-loop, /novelty-check, /idea-discovery, /research-pipeline, /research-lit, /run-experiment, /analyze-results, /monitor-experiment, /pixel-art
โ ๏ธ Note: Alternative models may behave differently from Claude and GPT-5.4. You may need to tune prompt templates for best results. The core cross-model architecture remains the same.
- AUTO_PROCEED (default: auto-continue; set false to always wait)
- /paper-plan → /paper-figure → /paper-write → /paper-compile. ICLR/NeurIPS/ICML templates, claims-evidence matrix, publication-quality figures, latexmk auto-fix. Inspired by claude-scholar, Research-Paper-Writing-Skills, baoyu-skills
- gpt-5.4 (also works with gpt-5.3-codex, gpt-5.2-codex, o3, etc.)
- /research-lit scans local papers/ and literature/ directories before external search, leveraging papers you've already read
- /idea-discovery orchestrates research-lit → idea-creator → novelty-check → research-review in one command, with pilot experiments on GPU
- /research-pipeline chains Workflow 1 (idea discovery) → implementation → Workflow 2 (auto-review-loop) end-to-end
- /peer-review for reviewing others' papers as a conference reviewer, with GPT-5.4 meta-review (planned; currently use /research-review with a paper PDF)
- ~/.claude/feishu.json: push-only needs just a webhook URL; interactive uses feishu-claude-code. Off by default, zero impact on existing workflows
- See launchd/systemd for true unattended operation. Currently the orchestration layer requires an active CLI session; state files (REVIEW_STATE.json, AUTO_REVIEW.md) support resuming across sessions, but relaunch is manual (#11)
- paper-illustration
- WORKFLOW_REPORT.md for progress tracking, team reporting, and supervisor updates
- A research brief document (RESEARCH_BRIEF.md) as input to /research-pipeline or /idea-discovery instead of a one-line prompt. Many research directions need nuanced context (prior results, constraints, domain knowledge) that can't fit in a single sentence. The document would be parsed for problem definition, constraints, existing results, and specific requirements

Domain-specific skills welcome! The core skills cover general research workflows, but every field has its own tools and patterns. We welcome PRs that add new skills for your domain: EDA, bioinformatics, robotics, HPC, or anything else. Just add a skills/your-skill/SKILL.md and open a PR.
See dse-loop for an example.
Join the WeChat group for discussion on Claude Code + AI-driven research workflows:
If you use ARIS in your research, please cite:
@misc{yang2026aris,
author = {Yang, Ruofeng and Li, Yongcan and Li, Shuai},
title = {ARIS: Fully Autonomous Research via Adversarial Multi-Agent Collaboration},
year = {2026},
organization = {GitHub},
url = {https://github.com/wanshuiyin/Auto-claude-code-research-in-sleep},
}
ARIS is inspired by:
This project builds on and integrates with many excellent open-source projects:
Core Infrastructure
Zotero Integration (setup guide)
Obsidian Integration (setup guide)
Paper Writing Inspiration
Feishu/Lark Integration (setup guide)
Community
Special Thanks — Platform Adaptation
ARIS wouldn't run on so many platforms without these contributors:
spawn_agent
Special Thanks — Architecture & Vision
MIT
- base repo — base repo: https://github.com/org/project
- gemini-review MCP bridge. CN SKILL.md support, dual model, MCP config, EN + CN. Community contribution by @PeppaPigw
- formula-derivation — research formula development and verification. Community contribution by @Falling-Flower
- paper-poster — Conference poster (tcbposter → A0/A1 PDF + PPTX + SVG). Venue colors, visual review, Codex review. Community contribution by @dengzhe-hou
- /experiment-bridge now includes GPT-5.4 cross-model code review before GPU deployment (code review: true by default). W&B fix — real wandb.Api() calls
- llm-chat
2026-03-15 — OpenClaw adaptation guide — use ARIS research workflows in OpenClaw without Claude Code slash skills
2026-03-15 — proof-writer: community skill for rigorous theorem proof drafting. Anti-hallucination citations: /paper-write now fetches real BibTeX from DBLP/CrossRef instead of LLM-generated entries (on by default, zero install)
2026-03-14 — Feishu/Lark integration: three modes (off/push/interactive), mobile notifications for experiments, reviews, and checkpoints
2026-03-13 — Human-in-the-loop: configurable AUTO_PROCEED checkpoints across all workflows. Full autopilot or step-by-step approval
2026-03-12 — Zotero + Obsidian + local PDFs + arXiv/Scholar: multi-source literature search with cross-model novelty verification
2026-03-12 — Three end-to-end workflows complete: one prompt → top-venue-style paper. /research-pipeline chains idea discovery → auto review → paper writing autonomously
2026-03-12 — /paper-writing workflow: narrative report → structured outline → figures → LaTeX → compiled PDF → 2-round auto-improvement (4/10 → 8.5/10)
code review | true | GPT-5.4 xhigh reviews experiment code before GPU deployment. Set false to skip |
wandb | false | Auto-add W&B logging to experiment scripts. Set true + configure wandb_project in CLAUDE.md. /monitor-experiment pulls training curves from W&B |
illustration | gemini | AI illustration in Workflow 3: gemini (default, needs GEMINI_API_KEY), mermaid (free), or false (skip) |
venue | ICLR | Target venue: ICLR, NeurIPS, ICML, CVPR, ACL, AAAI, ACM. Determines LaTeX style file and page limit |
base repo | false | GitHub repo URL to clone as base codebase (e.g., — base repo: https://github.com/org/project). No code? Build on top of an open-source project |
compact | false | Generate compact summary files (IDEA_CANDIDATES.md, findings.md, EXPERIMENT_LOG.md) for short-context models and session recovery |
ref paper | false | Reference paper to build on (PDF path or arXiv URL). Summarized first, then ideas extend/improve it. Combine with base repo for paper+code workflows |
/research-pipeline "your topic" — AUTO_PROCEED: false # pause at idea selection gate
/research-pipeline "your topic" — human checkpoint: true # pause after each review round to give feedback
/research-pipeline "your topic" — sources: zotero, web # only search Zotero + web (skip local PDFs)
/research-pipeline "your topic" — arxiv download: true # download top arXiv PDFs during literature survey
/research-pipeline "your topic" — AUTO_PROCEED: false, human checkpoint: true # combine options
Flexible models — default Claude × GPT-5.4, also supports GLM, MiniMax, Kimi, LongCat, DeepSeek, etc. No Claude or OpenAI API required
Human-in-the-loop — configurable checkpoints at key decisions. AUTO_PROCEED=true for full autopilot, false to approve each step
Feishu/Lark notifications — three modes: off (default, strongly recommended for most users), push-only (webhook, mobile alerts), interactive (approve/reject from Feishu). Zero impact when unconfigured
Push Only — group chat cards (experiment done, checkpoint, error, pipeline complete):
Interactive — private chat with Claude Code (approve/reject, custom instructions):
Extensible — domain-specific skills welcome! Add a SKILL.md and open a PR. See community skills like dse-loop (architecture/EDA)
/research-review/paper-illustration| Yes |
๐ค paper-slides | General | Conference talk slides (beamer โ PDF + PPTX) with speaker notes, full talk script + Q&A prep. Auto slide count from talk type | Yes |
๐ผ๏ธ paper-poster | General | Conference poster (article + tcbposter โ A0/A1 PDF + component PPTX + SVG). Venue-specific colors, visual review loop, Codex MCP review | Yes |
๐ proof-writer | ML Theory | Rigorous theorem/lemma proof drafting โ feasibility triage, dependency maps, honest blockage reports | No |
๐ก comm-lit-review | Communications / Wireless | Domain-specific literature review โ IEEE/ACM/ScienceDirect priority, venue tiering, PHY/MAC/transport/NTN taxonomy | No |
๐๏ธ dse-loop | Architecture / EDA | Autonomous design space exploration โ iteratively run, analyze, and tune parameters (gem5, Yosys, etc.) | No |
๐ค idea-discovery-robot | Robotics / Embodied AI | Workflow 1 adaptation โ grounds idea discovery in embodiment, benchmark, sim2real path, and real-robot safety constraints | Yes |
๐ mermaid-diagram | General | Mermaid diagrams (20+ types) โ free alternative to paper-illustration, no API key needed | No |
๐ข formula-derivation | General | Research formula development โ derivation, verification, and LaTeX formatting | No |
| Cursor Adaptation Guide | General | Use ARIS skills in Cursor — @-reference skills, MCP setup, workflow mapping, state file recovery across sessions |
| ๐ฅ๏ธ Trae Adaptation Guide | General | Use ARIS skills in Trae (ByteDance AI IDE) โ EN + CN guides |
๐จ paper-illustration | General | AI-generated architecture diagrams via Gemini. Built on PaperBanana. Integrated into Workflow 3 |
๐ค skills-codex | General | Codex CLI sync pack for the main research skills, now including training-check, result-to-claim, ablation-planner, rebuttal, plus the shared-references/ support directory |
| ๐๏ธ auto-hparam-tuning | General | Automatic hyperparameter tuning โ AI agent reads project, plans strategy, runs experiments, analyzes TensorBoard, learns from results. Hydra-based |
| ๐ Codex+Claude Review Bridge | General | Codex executes + Claude reviews via local claude-review MCP bridge with async polling |
| research-review | Single-round deep review from external LLM (xhigh reasoning) | Yes |
└ research-refine-pipeline | Refine method + plan experiments in one chain | Yes |
　└ research-refine | Problem anchor → iterative method refinement (up to 5 rounds, score ≥ 9) | Yes |
　└ experiment-plan | Claim-driven experiment roadmap with ablations, budgets, and run order | No |
| monitor-experiment | Monitor running experiments, check progress, collect results | No |
๐ auto-review-loop-llm | Same as above, but uses any OpenAI-compatible API via llm-chat MCP server | No |
| Yes |
└ paper-compile | Compile LaTeX to PDF, auto-fix errors, submission readiness checks | No |
└ auto-paper-improvement-loop | 2-round content review + format check (4/10 → 8.5/10) | Yes |
| โ all workflows |
REF_PAPER | false | Reference paper (PDF path or URL) to base ideas on. Summarized first, then used as context | → idea-discovery
ILLUSTRATION | gemini | AI illustration: gemini (default), mermaid (free), or false (skip) | → paper-writing
| Alt F | Codex CLI (GPT-5.4) | Codex spawn_agent (GPT-5.4) | No | Yes | skills-codex/ |
| Alt G ๐ | Codex CLI | Claude Code CLI (claude-review MCP) | No* | No* | CODEX_CLAUDE_REVIEW_GUIDE |
| Alt H ๐ | Antigravity (Claude Opus 4.6 / Gemini 3.1 Pro) | GPT-5.4 (Codex MCP) or any via llm-chat | No | Optional | ANTIGRAVITY_ADAPTATION |
| Alt I ๐ | Codex CLI | Gemini direct API (gemini-review MCP) | No | No | CODEX_GEMINI_REVIEW_GUIDE |
- /research-lit searches Zotero collections, reads annotations/highlights, exports BibTeX. Recommended: zotero-mcp (1.8k⭐). See setup guide
- /research-lit searches Obsidian vault for research notes, tagged references, wikilinks. Recommended: mcpvault (760⭐) + obsidian-skills (13.6k⭐). See setup guide
- llm-chat MCP server. GLM, MiniMax, Kimi, LongCat, DeepSeek all tested; no Claude or OpenAI API required
- /run-experiment supports code_sync: git (git push → ssh "git pull")
- wandb.init() + wandb.log() when wandb: true. /monitor-experiment pulls training curves
- /experiment-bridge
- /auto-review-loop