Hi all,
GDELT, which I had mentioned earlier, is a Google-supported platform that indexes news articles. It is very much alive, and Cursor was able to use it to get a list of probable news reports of human-wildlife conflicts, deploy LLMs to determine more precisely whether each report was indeed a conflict event resulting in deaths or injuries of humans or animals, and use LLMs to extract the most precise location possible. Check out
https://act4d.iitd.ac.in/gdelt-wildlife/. As before, users can suggest edits and moderators can review the edits, which get committed to git.
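For anyone who wants to picture the flow, here is a minimal sketch in Python. It is not the code Cursor wrote. It assumes the public GDELT DOC 2.0 article-list endpoint, and the two stub functions are hypothetical stand-ins for the LLM calls (a keyword check and a naive regex), used only so that the sketch runs without a model.

```python
import re
from dataclasses import dataclass
from typing import Callable, Optional
from urllib.parse import urlencode

# Public GDELT DOC 2.0 endpoint; the search keywords used below are illustrative.
GDELT_DOC_API = "https://api.gdeltproject.org/api/v2/doc/doc"


def build_gdelt_query_url(keywords: list[str], max_records: int = 50) -> str:
    """Build an article-list query URL. No network call is made here."""
    query = " OR ".join(f'"{k}"' for k in keywords)
    if len(keywords) > 1:
        query = f"({query})"  # GDELT expects OR'd terms inside parentheses
    params = {"query": query, "mode": "artlist",
              "format": "json", "maxrecords": max_records}
    return f"{GDELT_DOC_API}?{urlencode(params)}"


@dataclass
class ConflictEvent:
    url: str
    title: str
    location: Optional[str]


def extract_events(
    articles: list[dict],
    is_conflict: Callable[[str], bool],
    extract_location: Callable[[str], Optional[str]],
) -> list[ConflictEvent]:
    """Keep only articles judged to be conflict events and attach a location."""
    events = []
    for article in articles:
        text = article["title"]
        if not is_conflict(text):
            continue
        events.append(
            ConflictEvent(article["url"], article["title"], extract_location(text))
        )
    return events


def stub_is_conflict(text: str) -> bool:
    """Hypothetical placeholder for the LLM classifier: a crude keyword check."""
    lowered = text.lower()
    return any(word in lowered for word in ("killed", "injured", "attack"))


def stub_extract_location(text: str) -> Optional[str]:
    """Hypothetical placeholder for LLM extraction: capitalised words after 'in'."""
    match = re.search(r"\bin ([A-Z][\w-]*(?: [A-Z][\w-]*)*)", text)
    return match.group(1) if match else None
```

In a real run the two stubs would be replaced by calls to a local LLM, and the article list would come from fetching the URL returned by build_gdelt_query_url.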
And this is a generic setup. Anyone can write their own meta JSON to build similar maps for other kinds of events. I'm hoping to build one next for crop damage events, such as those due to droughts, floods, waterlogging, high temperatures, pests, etc. Claude even gave a meta prompt to build the meta JSON! Check out the README here:
https://github.com/aaditeshwar/gdelt-wildlife/tree/main/meta
Overall learning: A reasonably capable GPU workstation running local LLMs is able to parse news articles, documents, etc. to get reasonably well-structured outputs. Some of our students are trying to use this to parse research papers and extract variables and relationships to build knowledge graphs that can provide an explainable and disciplined reasoning framework for questions asked by users.
Slippery slope: The speed at which all this can be done is amazing and exciting, but verification and fine-tuning seem slow and hard work in comparison. I didn't bother checking, for example, whether Claude produced a good list of categories, and I didn't check beyond half a dozen cases how correctly the locations were being extracted or conflicts were being identified. In the agromet-advisory, I had used the LLMs in reverse to cross-check the outputs, and similarly, maybe one way is to use a different LLM to cross-check the outputs. But overall, I feel the only way is for us to learn patience and discipline. Principles like supplying provenance, allowing humans in the loop, explaining the setup, etc. can easily be carried out as mere compliance measures, and I think they will not add much value that way.
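On the cross-checking idea, a rough sketch of what I have in mind (an assumption about how one might do it, not something already built): run two different LLMs over the same articles, compare their conflict/no-conflict labels, and send only the disagreements to a human. The function name and the label dictionaries below are hypothetical.

```python
def cross_check(
    labels_a: dict[str, bool], labels_b: dict[str, bool]
) -> tuple[float, list[str]]:
    """Compare two models' labels over the articles both of them labelled.

    Returns the agreement rate and the sorted article IDs where they disagree,
    which are the cases worth a manual look.
    """
    shared = sorted(labels_a.keys() & labels_b.keys())
    if not shared:
        return 0.0, []
    disagreements = [key for key in shared if labels_a[key] != labels_b[key]]
    agreement_rate = 1 - len(disagreements) / len(shared)
    return agreement_rate, disagreements


# Made-up labels from two hypothetical models for four articles.
model_a = {"a1": True, "a2": False, "a3": True, "a4": True}
model_b = {"a1": True, "a2": True, "a3": True, "a4": True}
rate, to_review = cross_check(model_a, model_b)
print(rate, to_review)
```

High agreement does not prove correctness, since both models can make the same mistake, so a small random sample of the agreed cases still deserves a manual check.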
Adi
-- Aaditeshwar Seth
Microsoft Chair Professor, Computer Science and Engineering, IIT Delhi
Co-founder, Gram Vaani; Co-founder, CoRE Stack