Hi everyone,
It’s been about a week since Vibhatha, Megan, and I joined LDF, so we wanted to give everyone a quick update on the technical progress we’ve made so far. This week has largely been about learning, understanding our objectives, and assessing the tools and resources we can use.
We put together a high-level system architecture that outlines how we can achieve the functionality we’re aiming for, and we compared it to the existing GIG architecture to see how well it aligns.
We’ve taken a deep dive into defining what exactly we want the system to do and identifying the components needed to make it happen. This included studying the GIG repository to understand its components and evaluating the pros and cons of using GIG for the long term. However, we’re still in the process of learning more about GIG’s full capabilities.
At the same time, we’ve been researching existing graph systems that could effectively represent our data. The idea is that building on an established graph system would let us focus on our main goals of extracting, processing, and presenting data, rather than on the complex task of building a graph database from scratch. We’re considering options for integrating such a system with elements of the GIG architecture.
In line with this, we’ve been looking into Neo4j, a graph database that makes it easy to build knowledge graphs. Neo4j also integrates well with LLMs, which could let us build graphs directly from unstructured sources like PDFs, websites, or videos. We built a few PoCs to explore Neo4j’s graph-building and LLM integration features and even managed to create a (very) rough organizational chart from a gazette using these tools.
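For reference, here is a minimal sketch of the kind of PoC described above, assuming LangChain's experimental graph transformer together with an OpenAI chat model; the connection details, model name, and gazette snippet are placeholders rather than our actual setup.

# Sketch: build a small knowledge graph in Neo4j from gazette text via an LLM.
# Assumes langchain-openai, langchain-community, langchain-experimental and the
# neo4j driver are installed, and that OPENAI_API_KEY is set in the environment.
from langchain_openai import ChatOpenAI
from langchain_core.documents import Document
from langchain_community.graphs import Neo4jGraph
from langchain_experimental.graph_transformers import LLMGraphTransformer

graph = Neo4jGraph(url="bolt://localhost:7687", username="neo4j", password="password")
llm = ChatOpenAI(model="gpt-4o-mini", temperature=0)

# Illustrative snippet of gazette text (not a real excerpt).
gazette_text = (
    "The Ministry of Technology is assigned the Department of Government Printing "
    "and the Information and Communication Technology Agency."
)

# Let the LLM propose nodes and relationships, then write them into Neo4j.
transformer = LLMGraphTransformer(llm=llm)
graph_documents = transformer.convert_to_graph_documents([Document(page_content=gazette_text)])
graph.add_graph_documents(graph_documents)

print(graph.query("MATCH (n) RETURN labels(n) AS labels, n.id AS id LIMIT 10"))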
Finally, we’re exploring the use of LLMs for direct data extraction from various data sources, structured or unstructured. To test this, we’re attempting to build an organizational chart from the latest Sri Lankan gazette using an LLM to extract the data, with LangChain to facilitate the process.
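As a rough illustration of that approach, the sketch below uses LangChain's structured-output support to pull ministries, ministers, and departments out of gazette text; the schema fields and the excerpt are invented for the example and are not the exact prompts or models we are using.

# Sketch: extract an organizational structure from gazette text with an LLM.
# Field names and the sample excerpt are illustrative only.
from typing import List
from pydantic import BaseModel, Field
from langchain_openai import ChatOpenAI

class Ministry(BaseModel):
    name: str = Field(description="Name of the ministry")
    minister: str = Field(description="Minister in charge")
    departments: List[str] = Field(description="Departments under the ministry")

class OrgChart(BaseModel):
    ministries: List[Ministry]

llm = ChatOpenAI(model="gpt-4o-mini", temperature=0)
extractor = llm.with_structured_output(OrgChart)

gazette_excerpt = (
    "Minister of Health: Hon. A. B. Perera. "
    "Departments: Department of Health Services; National Medicines Regulatory Authority."
)

org_chart = extractor.invoke(
    "Extract ministries, ministers, and departments from this gazette text:\n" + gazette_excerpt
)
for ministry in org_chart.ministries:
    print(ministry.name, "->", ministry.departments)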
We’ll keep sharing regular updates. Until next time!





This week, we primarily focused on brainstorming ideas for GIG 2.0 from multiple angles. GIG 2.0 will be an iterative improvement over GIG 1.0. We are reviewing the database design, as well as the compute and UI functionality, for the new version. While exploring the gazettes, we have gathered a significant amount of metadata, providing us with more data to work with. The objective is to define entities and relationships in detail to enable complex querying.
In parallel, we have verified a large number of gazettes, with more still to be processed. We are also going back to the drawing board to refine our existing infrastructure and ensure it meets the requirements of GIG 2.0.
Until next time, Merry Christmas to you all and Happy Holidays!




For OrgChart, we worked on fixing a few UI bugs. On the backend, we developed GraphQL API endpoints and added Swagger documentation along with OpenAPI specifications.
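To illustrate how a GraphQL endpoint can sit alongside Swagger/OpenAPI documentation, here is a minimal sketch using FastAPI and Strawberry; this is one hypothetical way to wire the pieces together, not the OrgChart backend's actual stack or schema.

# Sketch only: a GraphQL endpoint plus auto-generated OpenAPI/Swagger docs.
# FastAPI serves Swagger UI at /docs and the OpenAPI spec at /openapi.json.
import strawberry
from fastapi import FastAPI
from strawberry.fastapi import GraphQLRouter

@strawberry.type
class Ministry:
    name: str
    minister: str

@strawberry.type
class Query:
    @strawberry.field
    def ministries(self) -> list[Ministry]:
        # Placeholder data; a real resolver would read from the database.
        return [Ministry(name="Ministry of Finance", minister="Hon. Example Minister")]

schema = strawberry.Schema(query=Query)
app = FastAPI(title="OrgChart API")
app.include_router(GraphQLRouter(schema), prefix="/graphql")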
We have also begun development on MyLocal, an application that provides various kinds of data for geographical locations (e.g. provinces, districts) in Sri Lanka. An initial version of this application was created by the previous team, and we have started migrating its data, which was originally stored in CSV files, to our polyglot database. A simple UI has been created to visualize this data. Both GraphQL and REST APIs have been added, along with internal APIs for data ingestion. Additionally, we have integrated PostGIS-formatted geospatial data from the MyLocal data dump to enhance spatial data handling.
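To give a flavour of that migration, the sketch below shows one way CSV attributes and geospatial boundaries could be loaded into Postgres/PostGIS with pandas and GeoPandas; the file names, table names, and connection string are assumptions for illustration, not the actual MyLocal schema.

# Sketch: move CSV-based location data and boundary geometries into Postgres/PostGIS.
# Requires pandas, geopandas, SQLAlchemy, and GeoAlchemy2; names below are illustrative.
import geopandas as gpd
import pandas as pd
from sqlalchemy import create_engine

engine = create_engine("postgresql://user:password@localhost:5432/mylocal")

# Tabular attributes originally kept in CSV files.
districts = pd.read_csv("districts.csv")  # e.g. district_id, name, province, population
districts.to_sql("district", engine, if_exists="replace", index=False)

# Boundary polygons written to a PostGIS-enabled table.
boundaries = gpd.read_file("district_boundaries.geojson")
boundaries.to_postgis("district_boundary", engine, if_exists="replace")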
This sprint, we focused on data integration for OrgChart 2.0, structuring and storing data for MyLocal, and designing a solid software architecture for LDF.
For OrgChart 2.0, we have been updating the new app with gazette data and refining our data migration process. We are currently focusing on integrating data from the most recent two presidencies. Additionally, we have been gathering supplementary data to enhance OrgChart 2.0 and support its integration with MyLocal, with many records collected so far.
For MyLocal, we have successfully stored geospatial data in PostGIS format in a Postgres DB, including topological details. We’ve updated our models to align with the MyLocal API and added test coverage, though we are continuing to refine it further.
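As an example of what the PostGIS storage enables, the sketch below runs a point-in-polygon lookup to find which district boundary contains a given coordinate; the table and column names are assumptions for illustration rather than the real MyLocal models.

# Sketch: which district polygon contains a given longitude/latitude?
# Assumes a district_boundary table with name and geometry (SRID 4326) columns.
import psycopg2

conn = psycopg2.connect("dbname=mylocal user=user password=password host=localhost")
with conn, conn.cursor() as cur:
    cur.execute(
        """
        SELECT name
        FROM district_boundary
        WHERE ST_Contains(geometry, ST_SetSRID(ST_MakePoint(%s, %s), 4326))
        """,
        (79.8612, 6.9271),  # longitude, latitude of Colombo
    )
    print(cur.fetchone())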
On the architecture side, we have been redesigning the software architecture so that it can be used across LDF to capture and represent data, with the flexibility to accommodate data from any country. The discussions are ongoing, and we are working towards a well-designed foundation that can seamlessly support all our needs without constant adjustments.
To track gazettes and their relationships, we used our Neo4j backend to store them as a graph. The source code can be found in doctracer, and the core library is used to develop the sample applications found in doctracer_examples. We also received support from a student volunteer who contributed to developing Gazette Tracer; the code for this project can be found in rdwaynedehoedt's fork.
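For a sense of how that tracking works, here is a minimal sketch (not doctracer's actual code) that records gazettes and their amendment links in Neo4j using the official Python driver; the node label, relationship type, and gazette numbers are illustrative.

# Sketch: record gazettes and AMENDS relationships so amendment chains can be traversed.
# Assumes the neo4j 5.x Python driver and a local Neo4j instance.
from neo4j import GraphDatabase

driver = GraphDatabase.driver("bolt://localhost:7687", auth=("neo4j", "password"))

def link_amendment(tx, child_no, parent_no):
    tx.run(
        """
        MERGE (c:Gazette {number: $child})
        MERGE (p:Gazette {number: $parent})
        MERGE (c)-[:AMENDS]->(p)
        """,
        child=child_no, parent=parent_no,
    )

with driver.session() as session:
    session.execute_write(link_amendment, "2382/15", "2289/43")  # illustrative gazette numbers
driver.close()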
Automating Gazette Data Extraction with AI
Extracting structured data from gazettes is a challenging, labor-intensive task due to their complex formatting and references to previous documents. To streamline this, we implemented an LLM-based approach that extracts metadata in tabular form from gazettes containing full cabinet releases. While the initial implementation is complete, there’s room for further refinement to improve accuracy and performance. This initiative has now been handed off to a group of open-source developers as part of a final-year project.
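As a rough sketch of the tabular-extraction idea (not the implementation that was handed off), the example below asks an LLM for gazette metadata as JSON rows and loads the result into a DataFrame; the column names and input file are assumptions.

# Sketch: extract gazette metadata as JSON rows and load them into a table.
# The prompt, keys, and file name are illustrative only.
import pandas as pd
from langchain_openai import ChatOpenAI
from langchain_core.prompts import ChatPromptTemplate
from langchain_core.output_parsers import JsonOutputParser

prompt = ChatPromptTemplate.from_messages([
    ("system",
     "Extract one JSON object per ministry with keys: gazette_no, date, ministry, minister. "
     "Return a JSON list only."),
    ("human", "{gazette_text}"),
])
chain = prompt | ChatOpenAI(model="gpt-4o-mini", temperature=0) | JsonOutputParser()

with open("gazette_2382.txt") as f:  # hypothetical plain-text dump of a gazette
    rows = chain.invoke({"gazette_text": f.read()})
print(pd.DataFrame(rows))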
Additional Data Collection & OrgChart 1.0 Redeployment
In parallel, we redeployed OrgChart 1.0 with the latest gazette data. Additionally, we’ve been gathering personnel data for OrgChart, which will serve as a valuable dataset for connecting MyLocal and OrgChart. This data collection effort is ongoing.
Core Data Processing Platform
Another major development is the core data processing platform, which began as a discussion towards the end of Q1. This platform serves as the foundation for all our applications, providing a multilayered architecture that enables seamless reading and writing to multiple databases. Designed as a polyglot database system, it can interact with different types of databases, dynamically selecting the appropriate database based on the nature of the data.
Developers can access and manage this data through a series of secure API endpoints, ensuring consistency, reliability, and efficiency across all applications. The fundamental components have been implemented, though further refinements are planned. The primary objective of this initiative is to establish a minimal but functional architecture that can support OrgChart 2.0.
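To make the routing idea concrete, here is a purely illustrative sketch of a thin repository layer that sends graph-shaped records to Neo4j and tabular records to Postgres; the class, method, and schema names are hypothetical and not the platform's actual API.

# Hypothetical sketch of the polyglot routing idea; not the platform's real interface.
class PolyglotStore:
    def __init__(self, neo4j_driver, pg_conn):
        self.neo4j_driver = neo4j_driver  # e.g. neo4j.GraphDatabase.driver(...)
        self.pg_conn = pg_conn            # e.g. psycopg2.connect(...)

    def save(self, record: dict) -> None:
        if record.get("kind") == "relationship":
            # Graph-shaped data: entities and edges go to Neo4j.
            with self.neo4j_driver.session() as session:
                session.run(
                    "MERGE (a:Entity {id: $src}) MERGE (b:Entity {id: $dst}) "
                    "MERGE (a)-[:RELATES {type: $type}]->(b)",
                    src=record["source"], dst=record["target"], type=record["type"],
                )
        else:
            # Tabular or geospatial data: rows go to Postgres/PostGIS.
            with self.pg_conn, self.pg_conn.cursor() as cur:
                cur.execute(
                    "INSERT INTO records (id, payload) VALUES (%s, %s) "
                    "ON CONFLICT (id) DO UPDATE SET payload = EXCLUDED.payload",
                    (record["id"], record.get("payload")),
                )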
Looking Ahead
Q1 has been a period of growth and learning, filled with valuable insights and foundational progress as we continue building up LDF. As we move into Q2, our focus will be on two key objectives: platform building and the release of OrgChart 2.0. With these priorities in mind, we aim to refine our systems, finalize development, and prepare for a successful launch.
None of this progress would have been possible without the incredible support and contributions of several individuals. We’d like to extend our sincere gratitude to:
Visal Vitharana for his immense help in guiding us through Choreo and the previous OrgChart Choreo deployment. From multiple phone calls to in-person walkthroughs on the 6th floor, his assistance was invaluable.
Umayanga Gunawardhana, who, all the way from Germany, helped us understand and get up to speed on how GIG operates.
Nuwan Senaratna for his valuable insights into how MyLocal functions, how its data is stored and utilized, and for his thought-provoking discussions and ideas on other aspects of our work.
Sameera Jayasoma for his support and contributions during our brainstorming sessions on the software architecture design.
Kasun Amarasinghe and Jaminda Batuwangala for their early guidance and engaging discussions on government gazettes and how to construct an effective OrgChart to track government changes. Their ideas led to the creation of Gazette Tracer, providing an easy reference point for gazettes.
Dwayne Dehoedt, who dedicated significant time and effort to building the Gazette Tracer application while managing both his internship and university coursework.
Ramith Jayasinghe for conducting a Choreo knowledge transfer session for LDF-related deployments.
Vajira Jayasrimal and Tishan Dahanayakage for their help in safely migrating our deployments to a new Choreo data plane.
And Umayangana Senarath for her creative contributions in designing multiple potential LDF logos for us.
We truly appreciate all the time, effort, and knowledge that all of you have contributed to our journey. Thank you!
Onward to Q2—bigger and better things ahead!
Best Regards,
LDF Team
Lastly, we're working on an initial version of a chatbot to help users interact with our data. For the MVP, it will focus on supporting graph-based queries only. The chatbot will generate GraphQL queries based on what users ask.
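As an early sketch of how this could work, the example below hands an LLM a GraphQL schema and asks it to translate a user question into a query; the schema, model, and prompt are placeholders rather than the chatbot's actual design, and any generated query would still be validated before execution.

# Sketch: translate a natural-language question into a GraphQL query for a given schema.
from langchain_openai import ChatOpenAI
from langchain_core.prompts import ChatPromptTemplate

schema_sdl = """
type Ministry { name: String, minister: String, departments: [String] }
type Query { ministries(name: String): [Ministry] }
"""  # illustrative schema, not the real one

prompt = ChatPromptTemplate.from_messages([
    ("system",
     "You translate user questions into GraphQL queries for this schema. "
     "Return only the query.\n{schema}"),
    ("human", "{question}"),
])
chain = prompt | ChatOpenAI(model="gpt-4o-mini", temperature=0)

result = chain.invoke({
    "schema": schema_sdl,
    "question": "Which departments are under the Ministry of Health?",
})
print(result.content)  # the generated GraphQL query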
Best Regards,
Zaeema Nashath