StratML Sitemap & Catalog Updates


Owen Ambur

Dec 11, 2022, 12:31:30 PM
to Naval Sarda, pradee...@ictect.com, aboutthe...@googlegroups.com, William Glascoe III, Gayanthika Udeshani, Jeff Maynard
Naval, it took three iterations but I think all of the issues with my updated sitemap have now been fixed.  It points to the updated versions of the files that you converted to reference the most recent stylesheet and which I've placed in my new stratml.us/docs folder.

Last night I ran Joe Carmel's Perl script cataloguer again and posted the updated catalog at https://stratml.us/docs/catalogsitemap.html

There are now 5,578 files available in the stratml.us/docs folder for indexing and direct referencing via the query service.

I posted links to the catalog as well as to my manually maintained hypertext listing at https://aboutthem.info/  

One of the things I find pretty valuable about my hypertext listing is the capability to refer people to their own plans in the context of the collection, like this.  They are generally impressed not only to see their own plan in StratML format but also to see the large number of files in the collection.  It also allows them to see when PDFs are available.

I may not want to give up that capability anytime soon, which highlights the importance of making the process of indexing new files in the query service as quick and easy as possible, bearing in mind that it is an index of metadata and not the authoritative source of the files themselves.

It also prompts me to think about posting a project on Freelancer.com to explore prospects for re-developing Joe's Perl script to improve the presentation and utility of the catalog, to see if it might eventually replace my manually maintained listing.

There are features of both the catalog and my listing that either may no longer be worth maintaining or, alternatively, could be enhanced.  For example, I'm not sure sorting the files by web domain adds much value to the catalog, other than perhaps the .gov domain, which cannot be directly referenced in the catalog.  Likewise, while it may be useful to be able to selectively list charitable orgs, I'm not confident in my categorization of them as non-profit, public service organizations in my hypertext listing.

It also occurs to me that static browse listings could be generated from the query service, perhaps each time a new file is added to the index.  If we were to do that, we could also give users the opportunity to categorize their plans/reports to facilitate not only queries but also inclusion in selective browse listings (beyond the elements and attributes already available in the StratML schema itself, e.g., Stakeholder type).

For example, .gov plans can be automatically classified as such based upon their source URLs.  Likewise, U.S. non-profits could be identified as such if they cite "501(c)(3)" in their Organization Descriptions.  However, we can't count on them to do that consistently.  For example, whereas my browse listing includes more than 700 nonprofits, the query service currently discovers only two that have cited "501(c)(3)" in their Stakeholder Descriptions and none based on a broader "501" query ... which is a problem that should be fixed in the logic of the query service.
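The automatic classification described above could be sketched roughly as follows. This is only an illustration under stated assumptions: the function name `classify_plan`, the label strings, and the idea of passing the description in as plain text are all hypothetical and do not describe how the query service actually works.

```python
import re
from urllib.parse import urlparse

# Matches common spellings such as "501(c)(3)", "501c3" and "501 (c) (3)".
NONPROFIT_PATTERN = re.compile(
    r"501\s*\(?\s*c\s*\)?\s*\(?\s*3\s*\)?", re.IGNORECASE
)


def classify_plan(source_url: str, description: str) -> set[str]:
    """Return hypothetical category labels for one plan.

    source_url  -- the URL the StratML file was retrieved from
    description -- free text taken from the plan (assumed to be plain text)
    """
    labels = set()

    # A plan is treated as governmental when its host name ends in ".gov".
    host = urlparse(source_url).hostname or ""
    if host == "gov" or host.endswith(".gov"):
        labels.add("gov")

    # A plan is treated as a U.S. nonprofit when it cites 501(c)(3).
    if NONPROFIT_PATTERN.search(description):
        labels.add("us-nonprofit")

    return labels
```

As noted above, organizations do not cite "501(c)(3)" consistently, so a text match like this would miss many nonprofits; user-supplied categories would still be needed to fill the gap.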

While I don't want to complicate delivery of the basic query service right now, please keep prospects for categorizations in mind for later consideration.

Naval Sarda

Dec 13, 2022, 9:35:07 AM
to Owen Ambur, pradee...@ictect.com, aboutthe...@googlegroups.com, William Glascoe III, Gayanthika Udeshani, Jeff Maynard

Hi Owen,

You can now test the query application with single word searches to verify search results and navigation by identifiers.

URL to test

http://198.38.86.242/

We have observed that a few XML files do not have identifiers, while their corresponding styled XML files have identifiers like 1, 2, 3, etc. Those will not traverse. There are very few such records.

We have fixed the description logic to some extent as per your instructions.

We have yet to fix the search feature for multiple keywords.

Known issues:

Search works only for a single complete keyword.

Full-text search also works only for a single complete keyword.

Full-text search does not navigate to the identifier at the moment.

We will share the estimates for a simpler import option soon.

We will fix the known issues soon.

Naval Sarda

EpiComm Technologies

Owen Ambur

Dec 13, 2022, 12:05:58 PM
to Naval Sarda, Jeff Maynard, aboutthe...@googlegroups.com
Thanks for the update, Naval.

My first test was to query "501" on the Stakeholder field.  It delivered no results.  So that query logic still needs some work.
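One possible explanation for the empty result, offered purely as an assumption and not as a description of the actual query service, is that matching is done on whole whitespace-delimited keywords, so "501" never equals the token "501(c)(3)". The sketch below contrasts that behavior with tokenizing on letters and digits; the function names are hypothetical.

```python
import re


def whitespace_tokens(text):
    # Splits on whitespace only, so "501(c)(3)" stays one token.
    return text.lower().split()


def alnum_tokens(text):
    # Splits on anything that is not a letter or digit,
    # so "501(c)(3)" becomes "501", "c", "3".
    return re.findall(r"[a-z0-9]+", text.lower())


def matches(keyword, text, tokenizer):
    # True when the keyword equals one complete token of the text.
    return keyword.lower() in tokenizer(text)


description = "A 501(c)(3) nonprofit organization"
print(matches("501", description, whitespace_tokens))  # False
print(matches("501", description, alnum_tokens))       # True
```

If something like this is the cause, splitting tokens on punctuation (or falling back to substring matching) in the Stakeholder field would let the broader "501" query find those records.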

My second was a full-text query for "501".  It took a while to run but delivered 23 hits.

However, a Google site-specific query turns up 91.  Bing doesn't provide a number but favors the PDFs.  

Note also that those queries can be directly referenced via their URLs.  It would be nice if the StratML-enabled queries could also be.
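As a rough sketch of what directly referenceable queries might look like, the field and search term could be carried in the URL's query string. The base address is the test URL from this thread; the parameter names `field` and `q` are hypothetical, and nothing here reflects the service's actual interface.

```python
from urllib.parse import parse_qs, urlencode, urlparse

BASE_URL = "http://198.38.86.242/"  # test URL cited earlier in this thread


def build_query_url(field, term):
    # "field" and "q" are assumed parameter names, for illustration only.
    return BASE_URL + "?" + urlencode({"field": field, "q": term})


def parse_query_url(url):
    # Recover the field and term so the same search can be re-run from a link.
    params = parse_qs(urlparse(url).query)
    return params["field"][0], params["q"][0]


url = build_query_url("Stakeholder", "501(c)(3)")
print(url)
print(parse_query_url(url))
```

With a scheme along these lines, a query result could be bookmarked, e-mailed, or linked from a browse listing in the same way as a Google or Bing results page.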

BTW, yesterday I voted to approve ISO/DIS 16684-4, Graphic technology — Extensible metadata platform (XMP) specification — Part 4: Use of XMP for semantic units.  I'm looking forward to learning if it enables direct external referencing of elements within PDF documents, as the StratML stylesheet does.  If so, we can have the best of both human- and machine-readable documents.


