Consolidation of Git(Hub) Scanning?


Gerd Aschemann

May 23, 2025, 4:08:22 AM
to jQAssistant
Hi,

recently I have been working a lot with scanning Git data, using both the jQAssistant GitHub Plugin and the Kontext-E Git Plugin.
I wonder whether the two could be unified. The truth of Git should be the very same regardless of the source: a local repository with one or many remotes, or a remote repository like GitHub. From a performance point of view it might even be more efficient to read from a local repository than over the network; reading from GitHub should in most cases only be necessary for additional information (issues, PRs, discussions, etc.).

In jQA I would just like to reason about Git contents. Unfortunately, none of the existing implementations covers all aspects, e.g.,
  • multiple remotes are missing in the GitHub Plugin,
  • the Git Plugin lacks typed authors.
What do other users, and in particular the two authoring teams, think? I would even be willing to contribute, or to unify both in a new/separate repository (disclaimer: not before July).

Regards
  Gerd

Gerd Aschemann

Oct 6, 2025, 3:26:36 PM
to jqass...@googlegroups.com
Hi jQA maintainers (and contributors from Kontext-E),

I have found the time to look into this proposal (bit by bit over the last five months).

Meanwhile, I have come up with a (large) refactoring of the Buschmais jQA GitHub Plugin. It is now based on the Kontext-E Git Plugin, re-using the existing Neo4j model.
This makes it possible to first perform a local scan of a clone of the repository, which is much faster than retrieving commits from GitHub (and relieves the rate limiting).
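The local-first lookup described above can be sketched as follows. This is a hypothetical illustration, not the plugin's actual code: the class `CommitResolver` and all names in it are made up, with a map standing in for the scanned local clone and a function standing in for a (rate-limited) GitHub API call.

```java
import java.util.Map;
import java.util.function.Function;

// Hypothetical sketch of a local-first commit lookup: serve a commit
// from the local clone if present, otherwise fall back to a remote
// GitHub call. All names are illustrative.
public class CommitResolver {
    private final Map<String, String> localCommits;    // sha -> message (stand-in for a scanned clone)
    private final Function<String, String> remoteFetch; // stand-in for a GitHub API call
    private int remoteCalls = 0;

    public CommitResolver(Map<String, String> localCommits,
                          Function<String, String> remoteFetch) {
        this.localCommits = localCommits;
        this.remoteFetch = remoteFetch;
    }

    public String resolve(String sha) {
        String local = localCommits.get(sha);
        if (local != null) {
            return local; // fast path: no network call, no rate-limit cost
        }
        remoteCalls++; // e.g., a commit only reachable from a PR or a fork
        return remoteFetch.apply(sha);
    }

    public int remoteCalls() {
        return remoteCalls;
    }

    public static void main(String[] args) {
        Map<String, String> clone = Map.of("abc123", "Initial commit");
        CommitResolver r = new CommitResolver(clone, sha -> "remote:" + sha);
        System.out.println(r.resolve("abc123")); // served from the local clone
        System.out.println(r.resolve("def456")); // falls back to the remote fetch
        System.out.println(r.remoteCalls());     // only one remote call was needed
    }
}
```

The point of the sketch is only the ordering: consult the cheap local source first and spend remote (rate-limited) calls exclusively on commits the clone cannot provide.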
Additionally, I have implemented several changes and improvements:
  • Scan for all (remote) branches, not just the default (or configured) branch
  • Retrieve Git commits from GitHub on demand if necessary (i.e., when not provided by the local repository). This should enable GitHub-only scans (without prior scanning of a local clone), while the refactored implementation stays compatible with the existing one. Remote commit retrieval is necessary anyway, as some commits may not exist in a local clone, e.g., when they come from PRs or additional forks of the GitHub repository.
  • Pre-load all remote commits from GitHub for branches, tags, and pull requests. Until now, those linked lists of commits (linked via the parent relationship) were retrieved one by one in a recursive manner. This was not only time-consuming (one remote call per commit) but also led to stack overflows for large collections. The implementation now uses the underlying framework's bulk retrieval calls for such collections.
  • More robust error handling for missing remote entities: previously, if the (forked) remote repository of a pull request was no longer available (e.g., deleted in the meantime), the import aborted with a RuntimeException.
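The switch from recursive per-commit retrieval to an iterative walk over pre-loaded data can be sketched like this. Again a hypothetical illustration, not the plugin's code: the `parents` map stands in for a commit graph obtained from a single bulk listing call, and the explicit deque replaces the recursion that overflowed the stack on long first-parent chains.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Illustrative sketch: collect all ancestors of a commit iteratively,
// so that even a history of 100,000 commits cannot overflow the stack.
public class CommitWalk {

    /** Walk the ancestor graph of 'head' with an explicit work deque
     *  instead of recursion; 'parents' maps sha -> parent shas. */
    public static List<String> ancestors(String head, Map<String, List<String>> parents) {
        List<String> order = new ArrayList<>();
        Set<String> seen = new HashSet<>();
        Deque<String> todo = new ArrayDeque<>();
        todo.push(head);
        while (!todo.isEmpty()) {
            String sha = todo.pop();
            if (!seen.add(sha)) {
                continue; // already visited (merge commits share ancestors)
            }
            order.add(sha);
            for (String p : parents.getOrDefault(sha, List.of())) {
                todo.push(p);
            }
        }
        return order;
    }

    public static void main(String[] args) {
        // Build a long linear history that would break a naive recursion.
        Map<String, List<String>> parents = new HashMap<>();
        for (int i = 100_000; i > 0; i--) {
            parents.put("c" + i, List.of("c" + (i - 1)));
        }
        System.out.println(ancestors("c100000", parents).size()); // 100001
    }
}
```

Combined with fetching the commit list in bulk up front, each commit then costs one map lookup instead of one remote call and one stack frame.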

However, there are still open points, and my work requires thorough review. I just wanted to get in touch with you before investing even more effort.

With my implementation I was able to import all (> 100) Apache Maven projects into a jQA instance, which was never possible with the existing implementation (quite apart from missing features such as the import of all branches).

I'd be happy if you would incorporate my changes into the existing repository (or even merge it further with the Git Plugin). Otherwise I would create (and try to maintain) a fork, as I see many benefits in my improvements.

Cheers
  Gerd


--
Gerd Aschemann --- To publish is to change (Carmen Thomas)
+49/173/3264070 -- ge...@aschemann.net -- http://www.aschemann.net
