Bring back CLAUDE.md: revert commit 0fe3d88c #5625
|
I will only approve this PR if you commit to keeping this file up to date when the rest of the code changes. |
|
A single big CLAUDE.md file like this does not seem useful. In fact, it looks actively harmful: the tool even displays a warning about the file being too large. It is also unlikely that most people working with Claude will use your provided scripts to split the file. I tried the approach of splitting it up too, and I did not notice any difference in the quality of the content it produces. FWIW it seems to look at README.md files too, so maybe it's more helpful to write good and useful README files and tell Claude to look at @docs.feldera.com, @openapi.json and the README files. I suggest we just put all higher-level CLAUDE.md files in .gitignore so everyone can have whatever CLAUDE.md files they want in higher-level directories. You can also provide it as a commit that can be applied to any branch someone works in if you like, but I really don't see value in having this committed to main. I think this is snake oil at this point, without any objective measure of whether it helps with anything or not. |
|
I have been using coding agents while working on the NATS connector and exploring the Feldera code base. When pulling a fresh Feldera main, I delete the CLAUDE.md file from the Feldera repo. It is too big: it makes prompts expensive, and the agent gets unfocused with too much info. But that's just my experience. Anthropic recommends keeping it short and focused (https://code.claude.com/docs/en/best-practices#write-an-effective-claude-md). You can also create CLAUDE.md files in subfolders, which the agent will read if it visits the folder. And for some tasks and domain knowledge, skills can be interesting. I am playing with a NATS output connector built with an output connector skill that the agent created before implementing the connector. |
mythical-fred
left a comment
The concern that PR #5600 should have gone through a discussion rather than a bare commit is fair. But the solution here goes too far in the other direction.
A 10,387-line CLAUDE.md is operationally counterproductive:
Claude Code reads CLAUDE.md on every session start. A file this size consumes a large fraction of the context window with content that is:
- General architecture documentation Claude can derive from reading the source
- Highly likely to drift from reality as the codebase evolves, with no mechanism to catch it
- Not targeted — every session gets all 10K lines regardless of what the developer is actually working on
The right model is per-directory CLAUDE.md files, which is what the unpack script was trying to achieve. Those files load only when Claude Code is working in that directory. The composite-single-file approach defeats that benefit — you end up with one giant file instead of small, focused, per-directory ones.
What would actually help:
- A concise root (< 50 lines) covering: repo overview, how to build, how to run tests, key architectural decisions worth knowing upfront
- Per-directory files for complex subsystems (e.g., , ) with subsystem-specific guidance
- Both should be maintained manually, not auto-generated from a composite
If the concern is that someone reverted via commit instead of PR — yes, that should be raised with them. But the fix isn't to restore 10K lines of auto-generated documentation.
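The size guidance above could be checked mechanically. The following is a minimal sketch, not part of the PR: `oversized_context_files` is a hypothetical helper, and it applies the "< 50 lines" figure as a budget to every CLAUDE.md it finds, whereas the comment proposes that number only for the root file.

```python
from pathlib import Path


def oversized_context_files(repo_root, max_lines=50):
    """Return (relative_path, line_count) for each CLAUDE.md at or over the budget.

    Hypothetical helper: the default of 50 mirrors the "< 50 lines"
    suggestion for the root file; it is not an official limit.
    """
    root = Path(repo_root)
    offenders = []
    for path in sorted(root.rglob("CLAUDE.md")):
        # Count lines without loading the whole file into memory.
        with path.open(encoding="utf-8") as handle:
            line_count = sum(1 for _ in handle)
        if line_count >= max_lines:
            offenders.append((path.relative_to(root).as_posix(), line_count))
    return offenders
```

A check like this could run in a pre-commit hook or CI job to flag context files that have grown past the budget; the right threshold for per-directory files is a judgment call.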
|
FWIW, I just stumbled upon something interesting on LinkedIn (for once). Here is some pretty strong evidence that this PR is "doing the wrong thing": https://arxiv.org/pdf/2602.11988 From the paper:
|
2ee4944 to
9f99aab
|
After discussion with Gerd, I implemented the following solution: whoever needs the context files can run scripts/claude.sh to pull them as unstaged changes without affecting existing local staged and unstaged changes (an existing CLAUDE.md would be overwritten). |
mythical-fred
left a comment
This is the right fix. CLAUDE.md on main is now 11 lines of actionable guidance — exactly what it should be. The composite context lives opt-in on claude-context, accessible via scripts/claude.sh for those who want it.
The separation is clean: main stays lean, heavy context is not forced on CI or anyone who doesn't ask for it. Gerd's paper reference is apt — large context files are empirically harmful, not just wasteful.
Based on my experience, I do not believe it is necessary or particularly important to keep this file up to date with all the latest changes. Nor do I have the expertise to update context files across the entire codebase. I could set up a way to have an LLM update the context files, but this is not what these context files are designed for. I maintain that a practical and meaningful approach to maintaining context files is lazy, iterative updates: whenever you interact with Claude, make use of context files, and notice a discrepancy within a certain domain (e.g. SQL compiler sources, or Python integration tests), you update the context file. Occasionally (once every month or two) you would look over the context file for a sub-directory you're knowledgeable in and revise it. If you are the one who introduces a change but you don't use Claude actively and the context files are "hidden out of sight" (in a composite file, a separate branch, etc.), I don't expect these files to be updated in sync with code changes; the "lazy" approach addresses that. |
9f99aab to
42b0a37
Signed-off-by: Karakatiza666 <bulakh.96@gmail.com>
42b0a37 to
2f376c0
|
Doesn't this added line: ... mess with the review bot? |
PR #5600 deleted the root composite CLAUDE.md file, which, via a script that "unpacks" it, covers most of the repository with a knowledge base of per-directory CLAUDE.md files.
The commit message (there is not even a PR description) describes the CLAUDE.md as excessive, but it is in fact necessary for using Claude Code effectively.
If the composite CLAUDE.md somehow breaks some workflow that cannot use the file-splitting script (e.g. Claude GitHub Reviews?), let's discuss and address the issue properly.
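For readers unfamiliar with the unpacking step mentioned above, the idea can be sketched roughly as follows. This is not the repository's actual script: the `<!-- FILE: ... -->` marker format, the `SECTION_MARKER` pattern, and the `split_composite` function are all assumptions made purely for illustration, and the real composite file may be structured differently.

```python
import re

# Hypothetical section marker; the real composite file may use another format.
SECTION_MARKER = re.compile(r"^<!-- FILE: (?P<path>\S+) -->$")


def split_composite(text):
    """Split composite text into {relative_path: content} by marker lines."""
    sections = {}
    current_path = None
    for line in text.splitlines(keepends=True):
        match = SECTION_MARKER.match(line.rstrip("\r\n"))
        if match:
            # A marker line starts a new per-directory section.
            current_path = match.group("path")
            sections[current_path] = ""
        elif current_path is not None:
            sections[current_path] += line
        # Lines before the first marker are ignored in this sketch.
    return sections
```

Writing each returned entry to its path would produce the per-directory files; keeping the split as a pure function makes it easy to test without touching the working tree.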