LDF Weekly Technical Update


Zaeema Nashath

Nov 9, 2024, 2:35:26 AM
to LDF Dev

Hi everyone,

It’s been about a week since Vibhatha, Megan, and I joined LDF, so we wanted to give everyone a quick update on the technical progress we’ve made so far. This week has largely been about learning, understanding our objectives, and assessing the tools and resources we can use.

We put together a high-level system architecture that outlines how we can achieve the functionality we’re aiming for, and we compared it to the existing GIG architecture to see how well it aligns.

We’ve taken a deep dive into defining what exactly we want the system to do and identifying the components needed to make it happen. This included studying the GIG repository to understand its components and evaluating the pros and cons of using GIG for the long term. However, we’re still in the process of learning more about GIG’s full capabilities.

At the same time, we’ve been researching various existing graph systems that could effectively represent our data. The idea behind this is that using an established graph system could let us focus more on our main goals of extracting, processing, and presenting data, rather than on the complex task of building a graph database from scratch. We’re considering options for integrating such a system with elements of the GIG architecture.

In line with this, we’ve been looking into Neo4j, an interesting graph database that allows us to create knowledge graphs easily. Neo4j also has capabilities for easy integration with LLMs, allowing us to potentially build graphs directly from unstructured sources like PDFs, websites, or videos. We built a few PoCs to explore Neo4j’s graph-building and LLM integration features and even managed to create a (very) rough organizational chart from a gazette using these tools.

Finally, we’re exploring the use of LLMs for direct data extraction from various data sources, structured or unstructured. To test this, we’re attempting to build an organizational chart from the latest Sri Lankan gazette using an LLM to extract the data, with LangChain to facilitate the process.
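To make the extraction step concrete, here is a minimal sketch of how the pipeline might look, assuming the model is prompted to emit one pipe-delimited "Minister | Department" row per line (the prompt format and field names are our own illustration, not the actual implementation; the LangChain model call itself is elided, and only the parsing step runs here):

```python
# Hypothetical post-processing for the gazette-extraction flow: the LLM (called
# via LangChain, elided here) returns "Minister | Department" lines, which we
# parse into rows suitable for building an organizational chart.

def parse_org_lines(raw: str) -> list[tuple[str, str]]:
    """Parse 'Minister | Department' lines into (minister, department) pairs."""
    rows = []
    for line in raw.splitlines():
        if "|" not in line:
            continue  # skip headings or commentary the model may emit
        minister, department = (part.strip() for part in line.split("|", 1))
        if minister and department:
            rows.append((minister, department))
    return rows

# Example LLM output for a gazette table (illustrative, not real data):
sample = """Minister of Finance | Department of Treasury
Minister of Finance | Department of Inland Revenue
Minister of Health | Department of Health Services"""

print(parse_org_lines(sample))
```

Keeping the parsing separate from the model call makes the fragile part (the LLM output format) easy to validate before anything touches the chart data.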

We’ll keep sharing regular updates. Until next time!


Best regards,
Zaeema Nashath

Zaeema Nashath

Nov 18, 2024, 7:14:06 AM
to LDF Dev
Hi everyone,

This week we’ve been working on understanding the workflow of the currently deployed org chart and exploring ways to utilize LLMs to extract the missing data (from 2023 onwards) from the gazettes. Our main goal is to get the org chart up to date and ready for use.

We've held several meetings focused on understanding the Choreo deployment and how new data can be integrated into the org chart. Additionally, we’ve been working on setting up the repositories locally and brainstorming ways to efficiently update the missing data in the org chart.

For the LLM approach, we tried directly passing the PDF to the model, but found that once the document was flattened to text, the model lacked the spatial awareness to identify departments in the second column of tables. To address this, we switched to an image-based approach: we converted the PDF pages to images and used OpenAI’s vision capabilities to process them. This provided better spatial understanding, allowing the LLM to locate the correct column and accurately extract the relevant data for each ministry, along with most other information in the tables, aside from a few edge cases.
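For reference, sending a page image to a vision-capable chat model means embedding it as a base64 data URL in the message content. The sketch below builds that payload only; the PDF-to-image conversion (e.g. via a library like pdf2image) and the actual API call are assumed and elided, and the prompt text is a placeholder:

```python
# Build the user-message payload for a vision-capable OpenAI chat model:
# the page image travels inline as a base64-encoded data URL.
import base64

def vision_message(image_bytes: bytes, prompt: str, mime: str = "image/png") -> dict:
    """Wrap a page image and an extraction prompt in the chat message format
    accepted by OpenAI's vision-capable models."""
    b64 = base64.b64encode(image_bytes).decode("ascii")
    return {
        "role": "user",
        "content": [
            {"type": "text", "text": prompt},
            {"type": "image_url", "image_url": {"url": f"data:{mime};base64,{b64}"}},
        ],
    }

# Illustrative bytes standing in for a rendered gazette page:
msg = vision_message(b"\x89PNG...", "Extract the departments listed in the second column of each table.")
print(msg["content"][1]["image_url"]["url"][:22])  # data:image/png;base64,
```

The message would then be passed to the chat completions endpoint along with the model name; because the model sees the rendered page rather than flattened text, column positions are preserved.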

Moving forward on updating the org chart, our next steps are to locate the missing older gazettes that are not currently reflected, as the old link in the codebase no longer provides access to them. Once we have these, we can create the necessary formatted CSV files for the missing gazettes—using either the LLM approach or an alternative—and push the data to the MongoDB database. We should then be able to reflect the new data on the org chart.

Best regards,
Zaeema Nashath

Zaeema Nashath

Nov 22, 2024, 10:51:40 PM
to LDF Dev
Hi everyone,

This week has been focused on knowledge transfer (KT) sessions and setting up GIG and the orgchart locally so we can work on reflecting the new gazette data in the orgchart. 

On the LLM front, we successfully generated a CSV file for the latest gazette using the LLM approach, which can now be uploaded to the orgchart using the GIG scripts.  

We also walked through the GIG codebase to understand the workflow from extracting data from PDFs to updating the orgchart.  

Through multiple KT sessions, we set up a sandbox environment with GIG Service, the orgchart, and a free hosted MongoDB version to experiment with GIG scripts both locally and on Choreo. A massive shoutout to Visal for his incredible support, spending hours walking us through everything, helping with the setup, and answering all our questions about GIG, the orgchart, and Choreo :)

Once the local setup was complete, we tested the scripts to understand how data insertion works and how updates reflect on the frontend. We ran into issues with data duplication and inaccuracies, but eventually identified that the asynchronous nature of the scripts was causing these problems. We’ve brainstormed some solutions to address these issues and will work on implementing them next week.
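One common guard against duplication from concurrently running insert scripts is to derive a deterministic key per logical record and upsert instead of blindly inserting. The sketch below is our own illustration of that idea, with an in-memory dict standing in for MongoDB (with pymongo, the equivalent would be roughly `update_one({"_id": key}, {"$set": doc}, upsert=True)`); it is not what the GIG scripts currently do:

```python
# Idempotent insertion sketch: the same logical record always maps to the same
# key, so re-running a script (or two scripts racing) cannot create duplicates.
import hashlib

def record_key(gazette_no: str, minister: str, department: str) -> str:
    """Deterministic id derived from the fields that identify a record."""
    raw = f"{gazette_no}|{minister}|{department}"
    return hashlib.sha1(raw.encode()).hexdigest()

store: dict[str, dict] = {}  # stands in for a MongoDB collection

def upsert(doc: dict) -> None:
    key = record_key(doc["gazette"], doc["minister"], doc["department"])
    store[key] = doc  # overwrite rather than append

doc = {"gazette": "2345/16", "minister": "Finance", "department": "Treasury"}
upsert(doc)
upsert(doc)  # a re-run or a racing script: still one record
print(len(store))  # 1
```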

Thank you, and hope everyone has a great weekend!


Best Regards,
Zaeema Nashath

Zaeema Nashath

Nov 29, 2024, 8:02:39 AM
to LDF Dev
Hi everyone,

This week has been focused on locating the missing gazettes, updating their data into the orgchart, and verifying the information.

On locating the missing gazettes, we were able to retrieve them from the Department of Government Printing website through a largely manual process. This required identifying the gazettes visually, demanding work that also means we will need to validate our findings to ensure nothing was missed. However, so far everything seems to be on track!

Once the missing gazettes were located, we created the necessary CSV files to upload to GIG. For gazettes in the appropriate format, we leveraged the LLM approach discussed earlier, while others had to be manually processed.  

Once the CSV files were ready, we pushed the data to a staging database, using GIG to generate an updated orgchart with the new data reflected.

Finally, we’re now deep into the most time-consuming part: the validation phase. Due to the nature of the GIG scripts and the gazettes themselves, inconsistencies can arise in the generated orgchart data. We’re working to verify and validate all the orgchart data against the gazettes, splitting the workload among the three of us. Once we’ve identified all the inconsistencies, we will work on correcting them. This validation work will continue into next week.

Thank you all, have a great weekend!


Best Regards,
Zaeema Nashath

Zaeema Nashath

Dec 2, 2024, 1:23:58 AM
to LDF Dev
Hi everyone,

Just to add on to last week's update: if you would like to view the updated orgchart, you can do so at https://orgchart.choreoapps.dev/. Do note that this is still in staging; if you find any inconsistencies or anything that could be improved, do let us know :)

Best Regards,
Zaeema Nashath

Malith Jayasinghe

Dec 5, 2024, 11:05:13 PM
to d...@datafoundation.lk
This is great! How many new gazettes did we use to update the org chart?


Zaeema Nashath

Dec 5, 2024, 11:46:48 PM
to d...@datafoundation.lk
Hi Malith,

We used 7 new gazettes.

Best Regards,
Zaeema Nashath

Zaeema Nashath

Dec 6, 2024, 5:09:05 AM
to d...@datafoundation.lk
Hi everyone,

This week's update is pretty short, as we have been solely focused on verifying the data in the OrgChart. There are a total of 74 timestamps in the OrgChart, each corresponding to a gazette. These timestamps have been divided among the three of us, and each of us is validating the data in the OrgChart against the gazettes for our respective timestamps. We have also been working on understanding some issues we faced with GIG regarding data duplication, as well as evaluating our current setup and costs.

That's all for this week. Next week we will continue with the data validation. Have a great weekend!

Best Regards,
Zaeema Nashath

Zaeema Nashath

Dec 13, 2024, 7:17:52 AM
to d...@datafoundation.lk
Hi everyone,

This week we continued the manual verification of gazettes, a time-intensive task that will extend into the coming weeks. 

Additionally, we have been working on hosting a service monitoring system in Choreo for the OrgChart. Through a brainstorming session, we came up with several ideas for enhancing the OrgChart and leveraging gazettes, which are already being put into action. Work has also begun on the second version of GIG for the OrgChart, with progress in mapping entity relationships and pre-processing gazette data for Neo4j. We are also exploring the inclusion of personnel data for ministries and departments in this version.  

Thank you all, have a great weekend!


Best Regards,
Zaeema Nashath

Vibhatha Abeykoon

Dec 20, 2024, 10:18:35 PM
to d...@datafoundation.lk
Hello Everyone, 

This week, we primarily focused on brainstorming ideas for GIG 2.0 from multiple angles. GIG 2.0 will be an iterative improvement over GIG 1.0. We are reviewing the database design, as well as the compute and UI functionality, for the new version. While exploring the gazettes, we have gathered a significant amount of metadata, providing us with more data to work with. The objective is to define entities and relationships in detail to enable complex querying.

In parallel, we have verified a large number of gazettes, with more still to be processed. We are revisiting the drawing board to refine our existing infrastructure and ensure it meets the requirements of GIG 2.0.

Until next time, Merry Christmas to you all and Happy Holidays!

We will be back in 2025. 


Best Regards,
Vibhatha Abeykoon

Zaeema Nashath

Jan 11, 2025, 8:38:32 AM
to LDF Dev
Hi everyone,

I hope everyone had a great holiday. Moving forward, these technical updates will be provided biweekly to align with our sprint cycle :)

For this sprint, our primary focus was on GIG 2.0, specifically finding an effective way to represent the complex and ever-changing relationships in the orgchart, along with preparing for the integration of all other data in Sri Lanka that LDF will eventually collect.

We first spent some time designing an ideal method for representing the raw data to be fed into the database. We also worked on designing a system that would enable non-technical people to easily update the orgchart when a new gazette is released. This effort was part of our task to use GIG 2.0 to build the orgchart using a graph database (currently experimenting with Neo4j), ensuring the system is easily extendable for other use cases.

Once we finalized how the raw data should be represented, we focused on handling the various relationship and entity changes in the orgchart. A comprehensive set of rules was written to address every possible change scenario. We tested this approach using a series of gazettes and amendments, adding data to the graph and dynamically modifying it in real time to reflect how the orgchart will be updated. This rule-based approach has proven to be the most efficient and reliable method so far.
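The rule-based idea can be sketched in a few lines: treat the org state as a mapping of ministers to their departments, and apply each gazette change as a small operation. The operation names and shapes below are our own illustration, not the actual rule set:

```python
# Minimal sketch of rule-based org-chart updates: the state is
# minister -> set of departments, and amendments are applied in order.

def apply(state: dict[str, set], op: tuple) -> dict[str, set]:
    kind = op[0]
    if kind == "ADD":            # ("ADD", minister, department)
        state.setdefault(op[1], set()).add(op[2])
    elif kind == "TERMINATE":    # ("TERMINATE", minister, department)
        state[op[1]].discard(op[2])
    elif kind == "MOVE":         # ("MOVE", department, src_minister, dst_minister)
        state[op[2]].discard(op[1])
        state.setdefault(op[3], set()).add(op[1])
    elif kind == "RENAME":       # ("RENAME", old_minister, new_minister)
        state[op[2]] = state.pop(op[1])
    return state

state: dict[str, set] = {}
for op in [
    ("ADD", "Minister of Finance", "Treasury"),
    ("ADD", "Minister of Finance", "Inland Revenue"),
    ("ADD", "Minister of Health", "Health Services"),
    ("MOVE", "Inland Revenue", "Minister of Finance", "Minister of Health"),
]:
    state = apply(state, op)

print(sorted(state["Minister of Health"]))  # ['Health Services', 'Inland Revenue']
```

The appeal of this formulation is that every "change scenario" in a gazette amendment reduces to a short, auditable sequence of such operations.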

Currently, we are testing the initial PoC to ensure it covers all use cases of the orgchart. Additionally, we have begun exploring ways to integrate supplementary data, such as personnel information, into the orgchart.

Thank you all and enjoy your long weekend!

Best Regards,
Zaeema Nashath

Zaeema Nashath

Jan 26, 2025, 9:14:40 AM
to LDF Dev
Hi everyone,

This sprint focused on developing the foundational systems for tourism data ingestion and completing the OrgChart 2.0 Proof of Concept.

Firstly, we worked on analyzing available tourism data and setting up a data ingestion system. A Django-based backend has been developed, incorporating data models derived from the raw tourism dataset. The tables were normalized to a certain degree, and we plan iterative improvements as the dataset grows.

The OrgChart 2.0 Neo4j PoC has been successfully completed. This PoC supports dynamic organizational changes such as renaming, merging, moving, adding, and terminating entities. The system uses rule-based transformations defined via CSV files corresponding to these operations, and it supports ingesting data both from gazettes with full minister-department tables and from amendment-only gazettes. The graph supports time-based queries for historical views, and a simple UI has been created to visualize the organizational structure for each gazette. Moving forward, we will thoroughly test the system to ensure it meets all the orgchart's requirements while also enhancing the orgchart UI.
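The time-based query reduces to: find the latest gazette state at or before a given date. A sketch of that lookup over sorted snapshots is below (in the actual system this is a query over dated relationships in Neo4j; the snapshot structure here is illustrative):

```python
# Historical-view lookup sketch: given snapshots sorted by gazette date,
# return the one in effect on a given date.
import bisect
from datetime import date

snapshots = [  # illustrative, kept sorted by date
    (date(2023, 1, 10), {"gazette": "A"}),
    (date(2023, 6, 2),  {"gazette": "B"}),
    (date(2024, 3, 15), {"gazette": "C"}),
]

def state_at(d: date):
    dates = [s[0] for s in snapshots]
    i = bisect.bisect_right(dates, d)  # count of snapshots dated <= d
    return snapshots[i - 1][1] if i else None

print(state_at(date(2023, 7, 1))["gazette"])  # B
```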

Thank you!


Best regards,
Zaeema Nashath

Zaeema Nashath

Feb 9, 2025, 12:34:25 AM
to LDF Dev
Hi everyone,

This sprint focused on developing the OrgChart 2.0 UI and studying and evaluating additional data sources LDF can use.

We’ve developed the UI for OrgChart 2.0, which currently replicates the functionality of OrgChart 1.0, with plans to extend it further in the coming sprints. It features a timeline of gazette dates at the top and an organizational chart displaying ministers and their departments, now dynamically extracted from Neo4j via a simple query.
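For a sense of what "a simple query" against the graph might look like, here is a hedged Cypher sketch; the node labels, relationship name, and date properties are our guesses, not the actual schema:

```python
# Illustrative Cypher for fetching each minister with its departments as of a
# given date. Labels, relationship types, and properties are assumptions.
CYPHER = """
MATCH (m:Minister)-[r:OVERSEES]->(d:Department)
WHERE r.start_date <= $as_of AND ($as_of < r.end_date OR r.end_date IS NULL)
RETURN m.name AS minister, collect(d.name) AS departments
ORDER BY minister
"""

# With the official Python driver this would run roughly as:
#   from neo4j import GraphDatabase
#   with GraphDatabase.driver(uri, auth=(user, pwd)).session() as s:
#       rows = s.run(CYPHER, as_of="2024-11-01").data()
print("$as_of" in CYPHER)  # True
```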

Additionally, we checked out DeepSeek for gazette data extraction, similar to our OpenAI approach, and developed a small proof of concept to demonstrate its capabilities.

We are also actively finding and evaluating other data sources that can support our mission of enhancing transparency for the people of Sri Lanka.

Thank you!


Best regards,
Zaeema Nashath


Zaeema Nashath

Feb 24, 2025, 6:39:05 AM
to LDF Dev
Hi everyone,

This sprint focused on the development of OrgChart and MyLocal.

For OrgChart, we worked on fixing a few UI bugs. On the backend, we developed GraphQL API endpoints and added Swagger documentation along with OpenAPI specifications.

We have also begun development on MyLocal, an application that provides various kinds of data for geographical locations (e.g. provinces, districts) in Sri Lanka. An initial version of this application was created by the previous team, and we have started migrating its data, which was originally stored in CSV files, to our polyglot database. A simple UI has been created to visualize this data. Both GraphQL and REST APIs have been added, along with internal APIs for data ingestion. Additionally, we have integrated PostGIS-formatted geospatial data from the MyLocal data dump to enhance spatial data handling.

Best regards,
Zaeema Nashath

Sanjiva Weerawarana

Feb 24, 2025, 8:54:16 PM
to d...@datafoundation.lk
Zaeema, we are not going to rewrite the MyLocal UI that Nuwan did - we are just changing the backend. It doesn't make sense to rewrite that at all! We will just update the code to use the data from the DB instead.

Please discuss with Vibhatha too because I discussed this with him.

Sanjiva.



--
Sanjiva Weerawarana

Zaeema Nashath

Feb 25, 2025, 12:01:58 AM
to d...@datafoundation.lk
Hi Sanjiva,

Noted with thanks. The UI we created was just a quick mock to help visualize the data during development. Moving forward, we’ll be using the existing UI that Nuwan built and will focus on updating the backend to pull data from the database instead.

Best regards,
Zaeema Nashath

Zaeema Nashath

Mar 11, 2025, 1:04:55 PM
to d...@datafoundation.lk
Hi everyone,

This sprint, we focused on data integration for OrgChart 2.0, structuring and storing data for MyLocal, and designing a solid software architecture for LDF.

For OrgChart 2.0, we have been updating the new app with gazette data and refining our data migration process. We are currently focusing on integrating data from the most recent two presidencies. Additionally, we have been gathering supplementary data to enhance OrgChart 2.0 and support its integration with MyLocal, with many records collected so far.

For MyLocal, we have successfully stored geospatial data in PostGIS format in a Postgres DB, including topological details. We’ve updated our models to align with the MyLocal API and added test coverage, though we are continuing to refine it further.

On the architecture side, we have been working on re-designing a solid software architecture that can be used across LDF to capture and represent data with the flexibility to accommodate data from any country. The discussions are ongoing, and we are working towards finalizing a well-designed foundation that can seamlessly support all our needs without constant adjustments.

Best regards,
Zaeema Nashath

Zaeema Nashath

Apr 2, 2025, 3:01:31 AM
to d...@datafoundation.lk
Hi everyone, 

Time flies, and just like that, we've reached the end of the first quarter of 2025. Over the past few months, we’ve made significant strides in platform development, application enhancements, and data collection efforts. While some of our original Q1 goals evolved along the way, we’ve adapted our roadmap to ensure everything aligns moving forward into Q2. Here’s a recap of our progress so far:

Platform Maintenance and Upgrades
We carried out essential maintenance for OrgChart 1.0 and MyLocal 1.0, keeping them operational in the Choreo environment and ensuring continuous access to users.

We also rebuilt the UI for OrgChart 2.0. The UI in OrgChart 1.0 had numerous dependency issues due to unmaintained packages, prompting us to rebuild it with up-to-date, well-supported libraries. During this transition, we explored different ways to store organizational data and settled on Neo4j, a graph database. With this, we developed a new OrgChart 2.0 backend capable of efficiently storing data while tracking historical changes and amendments. Similarly, a backend upgrade was also implemented for MyLocal 2.0.

Gazette Tracer
Through our research on how OrgChart data is structured—specifically, the relationship between ministers, departments, and departmental transfers—we learnt that this information is sourced from government-released gazettes. These official documents are interlinked, with each new gazette often referring back to previous ones. This complexity made tracking these relationships challenging, which led us to develop Gazette Tracer—a visualization tool designed to map how gazettes relate to each other in the context of government restructuring.

To achieve this, we used our Neo4j backend to track and store gazettes and their relationships. The source code can be found in doctracer; the core is used to develop the sample applications, which can be found in doctracer_examples. We also received support from a student volunteer who contributed to developing Gazette Tracer. The code for this project can be found in rdwaynedehoedt's fork.

Automating Gazette Data Extraction with AI
Extracting structured data from gazettes is a challenging, labor-intensive task due to their complex formatting and references to previous documents. To streamline this, we implemented an LLM-based approach to extract metadata from gazettes in tabular format with full cabinet releases. While the initial implementation is complete, there’s room for further refinement to enhance accuracy and performance. This initiative has now been handed off to a group of open-source developers as part of a final-year project.

Additional Data Collection & OrgChart 1.0 Redeployment
In parallel, we redeployed OrgChart 1.0 with the latest gazette data. Additionally, we’ve been gathering personnel data for the orgchart, which will serve as a valuable dataset for connecting MyLocal and OrgChart. This data collection effort is ongoing.

Core Data Processing Platform
Another major development is the core data processing platform, which began as a discussion towards the end of Q1. This platform serves as the foundation for all our applications, providing a multilayered architecture that enables seamless reading and writing to multiple databases. Designed as a polyglot database system, it can interact with different types of databases, dynamically selecting the appropriate database based on the nature of the data.

Developers can access and manage this data through a series of secure API endpoints, ensuring consistency, reliability, and efficiency across all applications. The fundamental components have been implemented, though further refinements are planned. The primary objective of this initiative is to establish a minimal but functional architecture that can support OrgChart 2.0.

Looking Ahead
Q1 has been a period of growth and learning, filled with valuable insights and foundational progress as we continue building up LDF. As we move into Q2, our focus will be on two key objectives: platform building and the release of OrgChart 2.0. With these priorities in mind, we aim to refine our systems, finalize development, and prepare for a successful launch.

None of this progress would have been possible without the incredible support and contributions of several individuals. We’d like to extend our sincere gratitude to:

  • Visal Vitharana for his immense help in guiding us through Choreo and the previous OrgChart Choreo deployment. From multiple phone calls to in-person walkthroughs on the 6th floor, his assistance was invaluable.

  • Umayanga Gunawardhana, who, all the way from Germany, helped us understand and get up to speed on how GIG operates.

  • Nuwan Senaratna for his valuable insights into how MyLocal functions, how its data is stored and utilized, and for his thought-provoking discussions and ideas on other aspects of our work.

  • Sameera Jayasoma for his support and contributions during our brainstorming sessions on the software architecture design.

  • Kasun Amarasinghe and Jaminda Batuwangala for their early guidance and engaging discussions on government gazettes and how to construct an effective OrgChart to track government changes. Their ideas led to the creation of Gazette Tracer, providing an easy reference point for gazettes.

  • Dwayne Dehoedt, who dedicated significant time and effort to building the Gazette Tracer application while managing both his internship and university coursework.

  • Ramith Jayasinghe for conducting a Choreo knowledge transfer session for LDF-related deployments.

  • Vajira Jayasrimal and Tishan Dahanayakage for their help in safely migrating our deployments to a new Choreo data plane.

  • And Umayangana Senarath for her creative contributions in designing multiple potential LDF logos for us.

We truly appreciate all the time, effort, and knowledge that all of you have contributed to our journey. Thank you!

Onward to Q2—bigger and better things ahead!

Best Regards,
LDF Team

Zaeema Nashath

Jun 4, 2025, 10:36:17 AM
to d...@datafoundation.lk
Hi everyone,

It’s been a while since our last update in April, but we’ve been holding off until we could share a more complete picture of what we've been working on.

Over the past couple of months, our main focus has been on building Nexoan (a tentative name for now), the foundational architecture that will power all our future applications. Nexoan is a system designed for interlinking data and intelligence. It provides a unified way to access and work with different types of data, from structured to unstructured to graph-based, through a common interface. It handles both storage and retrieval, and supports evolving data formats as they come in.

Another big milestone that happened was expanding our team. After a careful and extensive hiring process, we’re excited to welcome four new software engineering interns to the team: Yasandu Imanjith, Sehansi Perera, Isuru Rangana and Chanuka Ranathunga. They’ve hit the ground running and are already contributing to some of our most important projects.

Alongside this, we’ve been making progress on the LDF website. This will act as the central hub for the work we’re doing, serving as the entry point to our tools and datasets. We’re working through a few technical hurdles on the domain side but expect to have it fully live soon.

And of course, we’ve been continuing our work on OrgChart, our interactive tool for exploring government structures. The next version of OrgChart (2.0) is currently being built directly on top of Nexoan. We’re focusing on making it more dynamic and data-rich, and we’ll be sharing more about that in the coming weeks.

Thanks for sticking with us, we’re excited about what’s ahead and look forward to showing you more soon.

Best regards,
The LDF Team

Zaeema Nashath

Jun 27, 2025, 9:35:09 AM
to d...@datafoundation.lk
Hello everyone,

Since the last update, we now have working first versions of both Nexoan, our foundational architecture, and OrgChart 2.0.

On the OrgChart side, we've been able to integrate and capture new types of data and have built a new, easy-to-understand interface for users. We’re still refining the UI and expect to integrate additional government-related data over the coming weeks.

As for Nexoan, the first version is up and running with support for storing graph data in Neo4j and metadata in MongoDB. We’re currently working on a number of optimizations to improve performance and scalability. In parallel, we’re building the attribute persistence layer, which will allow us to store and manage tabular data. Another piece in progress is our polyglot database query engine, which will make it possible to run unified queries and retrieve aggregated responses from multiple underlying databases.
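The polyglot idea, as we understand it, boils down to routing each write to the store suited to its shape. The dispatch sketch below is illustrative only (the field names and routing rules are our own, not Nexoan's actual logic), with the store names mirroring the setup above: Neo4j for graph edges, MongoDB for metadata, and the attribute layer for tabular data:

```python
# Illustrative write-router for a polyglot persistence layer.
def route(record: dict) -> str:
    if "source" in record and "target" in record:
        return "neo4j"     # a relationship between entities -> graph store
    if "rows" in record:
        return "tabular"   # tabular/attribute data -> attribute persistence layer
    return "mongodb"       # free-form metadata -> document store

print(route({"source": "Minister of Finance", "target": "Treasury"}))   # neo4j
print(route({"entity": "gazette-2345/16", "published": "2024-11-08"}))  # mongodb
```

A unified query engine then does the reverse on reads: fan a query out to the relevant stores and aggregate the responses.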

For OrgChart data collection, we’ve also started exploring automation and AI. We're experimenting with using large language models to automate the download and archiving of targeted gazettes from the documents.gov.lk website, focusing specifically on those related to changes in government structure. We're also working on extracting structured data from these gazettes, which will feed directly into the OrgChart application.

Best regards,
Zaeema Nashath

Zaeema Nashath

Jul 14, 2025, 10:34:45 AM
to d...@datafoundation.lk
Hello everyone,

This sprint, we focused on improvements to Nexoan, continued automation of gazette processing, and made significant progress on OrgChart intelligence and chatbot development.

On the Nexoan side, we're continuing to build validation systems for both table and graph data, while also strengthening the reliability of the system through expanded testing. The Nexoan system itself has gone through a round of refactoring. We've added new features, such as the ability to filter relationships by attribute and direction, and cleaned up existing ones. Additionally, all endpoints are now documented with a detailed, interactive interface to make them easier to understand and use.

In our automation work, our system for downloading and archiving government gazettes has successfully passed initial testing. It can now retrieve and archive documents into the correct folder structure, though the categorization can be unreliable at times. We are working on improving the accuracy of this system.

We’ve also made solid progress on automating the extraction of government structure changes from gazettes for use in the OrgChart. The system now handles both non-tabular amendments and those that are entirely table-based. We're currently working on refining it to also handle more complex cases where changes appear in both tables and paragraphs, which requires fine-tuning the prompts.

We’ve also made progress on the OrgChart classifier system, which helps us track structural changes in the government. It can now accurately detect atomic transactions like terminations and infer compound transactions like moves by analyzing sequences of add and terminate operations from LLM outputs. It includes APIs for exposing the latest state and detected transactions for human validation, while also saving state snapshots.
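The compound-transaction inference described above can be sketched simply: a TERMINATE of a department under one minister followed by an ADD of the same department under another collapses into a single MOVE. The operation shapes below are illustrative, not the classifier's actual data model:

```python
# Infer compound "move" transactions from atomic add/terminate operations
# (e.g. as emitted by an LLM reading a gazette amendment).

def infer(ops: list[tuple[str, str, str]]) -> list[tuple]:
    """ops: (op, department, minister). Pairs a TERMINATE with a later ADD of
    the same department and replaces them with one MOVE."""
    terminated: dict[str, tuple[int, str]] = {}  # dept -> (index in out, old minister)
    out: list = []
    for op, dept, minister in ops:
        if op == "TERMINATE":
            terminated[dept] = (len(out), minister)
            out.append(("TERMINATE", dept, minister))
        elif op == "ADD" and dept in terminated:
            idx, src = terminated.pop(dept)
            out[idx] = ("MOVE", dept, src, minister)  # collapse the pair
        else:
            out.append(("ADD", dept, minister))
    return out

ops = [
    ("TERMINATE", "Inland Revenue", "Minister of Finance"),
    ("ADD", "Inland Revenue", "Minister of Health"),
    ("ADD", "New Dept", "Minister of Finance"),
]
print(infer(ops))
```

Keeping the inferred transactions alongside the raw operations also gives human validators something compact to review.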

Lastly, we're working on an initial version of a chatbot to help users interact with our data. For the MVP, it will focus on supporting graph-based queries only. The chatbot will generate GraphQL queries based on what users ask.
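At its simplest, the MVP could map a recognized user intent to a GraphQL query template and fill in variables. The sketch below is a placeholder for that flow; in a real version an LLM would pick the intent and extract the variables, and the template and field names here are assumptions, not the actual schema:

```python
# Intent-to-GraphQL sketch for a graph-query chatbot MVP.
TEMPLATES = {
    "departments_of": """query($minister: String!) {
  minister(name: $minister) { departments { name } }
}""",
}

def build_query(intent: str, **variables) -> dict:
    """Return a GraphQL request body for a recognized intent."""
    if intent not in TEMPLATES:
        raise ValueError(f"unsupported intent: {intent}")
    return {"query": TEMPLATES[intent], "variables": variables}

req = build_query("departments_of", minister="Minister of Finance")
print(req["variables"])  # {'minister': 'Minister of Finance'}
```

Constraining the bot to vetted templates (rather than free-form query generation) keeps the MVP safe against malformed or overly expensive queries.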

Best Regards,
Zaeema Nashath
