I'm considering building an index of Biblical Hebrew chapters or verses based on word frequency. I want to produce lists that fit a description like "all verses composed exclusively of roots that occur 500 or more times" or "all chapters in which at least 90% of the roots occur 400 or more times".
I'm a software developer, so I have no trouble writing the algorithm, but I need data. I'm looking at
https://github.com/openscriptures/morphhb/tree/master/wlc which seems to have all the information I need, unless I'm mistaken. For instance, lemma 1121 appears to be "ben" (son), covering all its different inflections, which is exactly what I want.
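To make the idea concrete, here is a rough Python sketch of the filtering step. It assumes the text has already been parsed into a mapping from verse (or chapter) reference to a list of lemma identifiers. The toy data and the names `lemma_counts`, `coverage`, and `select` are made up for illustration, and the thresholds are scaled down to suit the toy data.

```python
from collections import Counter

# Hypothetical toy data: reference -> lemma IDs of the words in that unit.
# Real data would come from parsing the WLC files; these IDs are placeholders.
verses = {
    "v1": ["1121", "1121", "430"],
    "v2": ["1121", "7225"],
    "v3": ["430", "430", "1121", "9999"],
}

def lemma_counts(units):
    """Count how often each lemma occurs across the whole corpus."""
    return Counter(lemma for lemmas in units.values() for lemma in lemmas)

def coverage(lemmas, counts, min_freq):
    """Fraction of word tokens whose lemma occurs at least min_freq times."""
    if not lemmas:
        return 0.0
    common = sum(1 for lemma in lemmas if counts[lemma] >= min_freq)
    return common / len(lemmas)

def select(units, min_freq, min_coverage=1.0):
    """References of units whose coverage meets min_coverage (1.0 = exclusively)."""
    counts = lemma_counts(units)
    return [
        ref
        for ref, lemmas in units.items()
        if coverage(lemmas, counts, min_freq) >= min_coverage
    ]

exclusive = select(verses, min_freq=3)                     # "only common roots"
mostly = select(verses, min_freq=3, min_coverage=0.75)     # "at least 75% common"
```

This sketch measures coverage over word tokens, so a repeated word counts each time it appears. Measuring over distinct roots instead would give different lists, and I'm not sure which is more useful.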
I have a few questions:
1. Has this been done before, so that I would be duplicating existing work?
2. Is that directory of that git repository a good source of data, or does a better source exist? I see that the latest commit was a few years ago, and I'm not sure what that means.
3. Does my proposed approach, as far as I have described it, sound correct?
Thank you for your time!
Yours,
Aaron Laws