Hi everyone,
I'm Daksh, a Checkstyle contributor since Hacktoberfest 2025. I've been following the XPath Generator project with a lot of interest and wanted to share where my head is at before writing my proposal.
I spent time going through the existing PoC, and I think I understand why a complete redesign makes sense: it's missing a validation layer, and the prompt structure is pretty basic. My thinking for the redesign is to use a RAG-based (retrieval-augmented generation) approach: we build a knowledge base of known-correct XPath suppressions and AST patterns, so the LLM gets relevant examples as context rather than generating blindly. I expect this would noticeably improve precision.
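To make the retrieval idea a bit more concrete, here is a rough sketch of what I have in mind. It is not based on the PoC code: the class name, the method names, and the sample XPath strings are all hypothetical, and plain token overlap (Jaccard similarity) stands in for whatever embedding-based search a real version would use.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;
import java.util.stream.Collectors;

// Hypothetical sketch: retrieve the most similar known-good examples
// and place them in the prompt before asking the LLM for a new XPath.
class XpathExampleRetriever {

    /** One knowledge-base entry: a violation description and a known-good XPath. */
    static final class Example {
        final String description;
        final String xpath;

        Example(String description, String xpath) {
            this.description = description;
            this.xpath = xpath;
        }
    }

    private final List<Example> knowledgeBase = new ArrayList<>();

    void add(String description, String xpath) {
        knowledgeBase.add(new Example(description, xpath));
    }

    /** Lower-cases the text and splits it on non-alphanumeric characters. */
    static Set<String> tokens(String text) {
        Set<String> result = new HashSet<>(
            Arrays.asList(text.toLowerCase(Locale.ROOT).split("[^a-z0-9]+")));
        result.remove("");
        return result;
    }

    /** Jaccard similarity of the two token sets (a stand-in for embeddings). */
    static double similarity(String a, String b) {
        Set<String> union = new HashSet<>(tokens(a));
        union.addAll(tokens(b));
        if (union.isEmpty()) {
            return 0.0;
        }
        Set<String> common = new HashSet<>(tokens(a));
        common.retainAll(tokens(b));
        return (double) common.size() / union.size();
    }

    /** Returns the k entries whose descriptions are most similar to the query. */
    List<Example> topK(String query, int k) {
        return knowledgeBase.stream()
            .sorted(Comparator.comparingDouble(
                (Example e) -> similarity(query, e.description)).reversed())
            .limit(k)
            .collect(Collectors.toList());
    }

    /** Builds a prompt that lists the retrieved examples before the request. */
    String buildPrompt(String query, int k) {
        StringBuilder prompt = new StringBuilder("Known-good suppressions:\n");
        for (Example e : topK(query, k)) {
            prompt.append("- ").append(e.description)
                  .append(" => ").append(e.xpath).append('\n');
        }
        prompt.append("Generate an XPath suppression for: ").append(query);
        return prompt.toString();
    }

    public static void main(String[] args) {
        XpathExampleRetriever retriever = new XpathExampleRetriever();
        // Illustrative entries only, not taken from a real knowledge base.
        retriever.add("missing javadoc on method bar in class Foo",
            "/COMPILATION_UNIT/CLASS_DEF[./IDENT[@text='Foo']]"
                + "/OBJBLOCK/METHOD_DEF[./IDENT[@text='bar']]");
        retriever.add("unused local variable tmp in method run",
            "//METHOD_DEF[./IDENT[@text='run']]//VARIABLE_DEF/IDENT[@text='tmp']");
        System.out.println(
            retriever.buildPrompt("missing javadoc on method baz in class Qux", 1));
    }
}
```

The validation layer would then sit after the LLM call, checking the generated XPath before it is returned.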
Two things I wanted to get clarity on before I go too deep:
1. Should the solution be Java-native, or is a Python LLM service with a Java wrapper acceptable?
2. Does the RAG direction make sense to you, or do you have a different accuracy-improvement strategy in mind?
I'd appreciate any feedback as soon as you're able.
Thanks,
Daksh R Jain
GitHub: github.com/DakshRJain737