Queries for Values & Goals/Objectives + Values


Owen Ambur

Dec 13, 2022, 11:54:32 PM
to Naval Sarda, aboutthe...@googlegroups.com
Naval, as shown in the first screen clip below, the Values query feature seems to be working well.

The Values Descriptions are being displayed in the search results list, and if they are blank, that's because the values are only named and not described in the documents ... in which case it is still good to know which Organizations have explicitly documented the referenced value.

However, that result shows why the column heading should probably just be "Description" rather than "Goal/Objective Description".

Also, since the values don't have identifiers, the links cannot point directly to them.  However, when only the Value element is queried, it would be good (but not essential) if the link could point to the Values section of the document, like this for the first hit in the list:  https://stratml.us/docs/Arca.xml#values_

The second screen clip shows that the Goal/Objective Descriptions are appropriately displayed when the query combines a search of that field with a search of the Values field.  The links appear to work properly too, pointing to the relevant goals/objectives.

The third screen clip shows the results when the full text is also queried, in this case for the term "civic" in combination with "education" goals/objectives and "honesty" as a value.

Lookin' good.

Naval Sarda

Dec 16, 2022, 10:21:19 AM
to Owen Ambur, aboutthe...@googlegroups.com

Hi Owen,

We have implemented new logic for searching. Now it will work as per your expectations.

For full-text search, identifiers are not implemented, as that would slow down the search feature immensely and is hard to implement.

Organisation-based sorting is pending.

All the rest of the search-related functionality is complete.

Naval

Owen Ambur

Dec 16, 2022, 5:08:34 PM
to Naval Sarda, aboutthe...@googlegroups.com, ari.knau...@strategi-consulting.com, Andre Cusson, Gayanthika Udeshani, pradee...@ictect.com, Jorge Sanchez, Jeff Maynard
Naval, based upon the testing I've done just now, yes, the fielded queries now seem to be working properly with respect to the results generated for partial words.  For example, a query on the Stakeholder field for the term "student" turns up more hits (408) than "students" does (375).  A full-text query on "student" retrieved 1,042.

I'm providing the URL for the query service here in the event those I am copying may wish to check it out and give us feedback:  http://198.38.86.242/

Out of curiosity, I conducted queries of the Goal/Objective field to check the relative number of hits for the top 10 revealed in Ari Knausenberger's word cloud quite a few years ago.  Here's what I found (number of hits now/Ari's word cloud ranking):
  1. Education ~ 889/1
  2. Research ~ 869/10
  3. Communication ~ 574/9
  4. Collaboration ~ 535/6
  5. Partnership ~ 522/7
  6. Leadership ~ 461/3
  7. Infrastructure ~ 404/5
  8. Advocacy ~ 222/2
  9. Membership ~ 116/9
  10. Representation ~ 82/8
Education remained number 1.  Research moved up from 10th to 2nd.  Advocacy dropped from 2nd to 8th.

I'm not surprised that indexing the identifiers for full-text querying is resource intensive.  While that's not necessarily a big issue, at some point we may wish to consider periodically generating static indices of the identifiers, like Andre Cusson used to do for:


Likewise, it is not essential to re-create statistics for the StratML collection like Andre formerly did at http://stratml.hyperbase.com/statistics.html.  However, at some point we may wish to consider word clouds or ordered listings of the names of goals/objectives, stakeholders, and values based upon their frequency of usage in the collection.

Finally, for now, I'd like to know how the query results listings are being ordered.  I'm not sure how they should be ranked but it seems like it might be a good idea to enable users to sort the organization names alphabetically.  Submission date seems like another possibility, i.e., to present the most recent submissions first (even though those dates are not being displayed).  Another possibility might be to consider the size of the file, on the assumption that the larger ones may be more substantive.  Just some thoughts.  No conclusions about what might make the most sense.  However, I am curious to know what the default is now.



Owen Ambur

Dec 17, 2022, 11:08:07 AM
to Chris Fox, Naval Sarda, aboutthe...@googlegroups.com
Thanks for checking, Chris.  

Here are the results of three related queries that appear to indicate the search logic is working properly on the Stakeholder element:

Chris Fox = 2 hits
Chris F = 11 hits
Fox = 50 hits

A combined full-text query on "Strategy Development" and stakeholders named "Fox" turns up 3 hits, but I doubt the others are related to you personally.

Here are the results for a Goal/Objective query:

strat = 873 (including results for words like "administrative")
strate = 633
strateg = 633
strategi = 467
strategic = 334
strategy = 267
strategies = 172
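
The shrinking hit counts above are what plain substring matching would produce: a shorter term matches everything a longer term does, plus words such as "administrative" that merely contain it. The following is a minimal sketch of that behavior, assuming case-insensitive substring matching; the thread does not confirm how the query service actually matches terms, and the goal names are invented for illustration.

```python
# Hypothetical goal/objective names, invented for illustration only.
GOALS = [
    "Strategic Planning",
    "Administrative Services",
    "Strategy Development",
    "Strategies for Growth",
    "Education",
]


def count_hits(term, texts):
    """Count the texts that contain `term` as a case-insensitive substring."""
    needle = term.lower()
    return sum(1 for text in texts if needle in text.lower())


for term in ["strat", "strateg", "strategi", "strategic", "strategy", "strategies"]:
    print(term, count_hits(term, GOALS))
```

With this sample, "strat" matches four names (including "Administrative Services"), while "strateg" matches only three.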

BTW, since I created most of them, most of the plans that are public in your app are probably already indexed in the query service.  However, at some point we should explore prospects for: 

a) including those that others have created, and 
b) directly referencing them in your app.  

I suspect there may be complications in doing so.  However, the points I want to demonstrate are that StratML files should be posted on the organizations' own websites (or intermediary sites they choose to use) and the query service should index and reference them wherever they reside on the Web.



On Saturday, December 17, 2022 at 03:05:08 AM EST, Chris Fox <ch...@chriscfox.com> wrote:


I did the obvious thing and put my name in the stakeholder field... :-)

It seems to be working well now from that perspective!

C



--
Chris Fox
Chris C Fox Consulting Limited
            
Have you tried https://www.StratNavApp.com, the online collaborative tool for strategy development and execution?

Chris C Fox Consulting Limited is registered in England and Wales as a Private Limited Company: Company Number 6939359. Registered Office: Optionis House, 840 Ibis Court Centre Park Warrington Cheshire WA1 1RL

Chris Fox

Dec 19, 2022, 3:17:46 AM
to Owen Ambur, Naval Sarda, aboutthe...@googlegroups.com
Hi Owen,

On Sat, 17 Dec 2022 at 16:08, Owen Ambur <owen....@verizon.net> wrote:
BTW, since I created most of them, most of the plans that are public in your app are probably already indexed in the query service.  However, at some point we should explore prospects for: 

a) including those that others have created, and 
b) directly referencing them in your app.  

I suspect there may be complications in doing so.  However, the points I want to demonstrate are that StratML files should be posted on the organizations' own websites (or intermediary sites they choose to use) and the query service should index and reference them wherever they reside on the Web.

I think we talked about this before.

There would be no problems if you pointed the indexer at https://www.stratnavapp.com/PublicProjects and hoovered up whatever you found. There is also a sitemap for those files at https://www.stratnavapp.com/PublicProjects/sitemap.xml

Both are automatically updated, so you could check them on a scheduled basis.
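
A scheduled check of a sitemap could look roughly like the sketch below. It is illustrative only: it assumes the sitemap follows the standard sitemaps.org format and includes `<lastmod>` dates, which this thread does not confirm, and the sample document and its example.org URLs are made up. Fetching the sitemap over HTTP is left out.

```python
import xml.etree.ElementTree as ET

# Namespace defined by the sitemaps.org protocol.
SITEMAP_NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}

# Made-up sample; the contents of the real sitemap are not shown in this thread.
SAMPLE_SITEMAP = """<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url><loc>https://example.org/plans/a.xml</loc><lastmod>2022-12-01</lastmod></url>
  <url><loc>https://example.org/plans/b.xml</loc><lastmod>2022-12-18</lastmod></url>
</urlset>"""


def changed_since(sitemap_xml, last_checked):
    """Return the URLs whose <lastmod> is later than `last_checked` (ISO date string)."""
    root = ET.fromstring(sitemap_xml)
    urls = []
    for url in root.findall("sm:url", SITEMAP_NS):
        loc = url.findtext("sm:loc", namespaces=SITEMAP_NS)
        lastmod = url.findtext("sm:lastmod", default="", namespaces=SITEMAP_NS)
        # ISO dates compare correctly as strings; entries without a date are always rechecked.
        if not lastmod or lastmod > last_checked:
            urls.append(loc)
    return urls


print(changed_since(SAMPLE_SITEMAP, "2022-12-10"))
```

Only the entry modified after the last check is returned, so the indexer would re-fetch one file rather than the whole collection.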

If your search app had some sort of API or webhook, it would be easy enough to have StratNavApp.com inform it every time a new plan is published and/or updated on StratNavApp.com in StratML format. Let me know if that is something you would like to explore.

I don't foresee any complications in doing so.

Kind Regards,
Chris

Owen Ambur

Dec 20, 2022, 9:24:41 PM
to Chris Fox, Naval Sarda, aboutthe...@googlegroups.com
Chris, I want to make sure the basic query service is working properly before taking on additional complexity.  So I'm not aiming to automate the ingestion of files from other sites at this point, e.g., via an API.

However, one of the essential requirements is for me, as administrator, to be able to import documents into the database both one at a time as well as in groups (e.g., via FTP or from a sitemap).  So if we can identify some that are available in your app that are either not already indexed in the query service or in better form on your site, importing them would be a good test of that capability.

For example, while the copy of your plan on the stratml.us site is already indexed in the query service, I see that a more recent update is available at https://www.stratnavapp.com/StratML/Part1/5509a8d6-32f5-440d-a1bd-48f33bc9a252/Styled

Naval, perhaps some of the programming logic enabling me to import files could be reused and extended to enable other, non-authenticated users to submit files for: 

a) validation against the schema, 
b) confirmation of the submission by the submitter, e.g., via a link sent to the submitter's E-mail address; and 
c) review/approval for indexing by administrators.  

While I would not want to hold up development of the initial service in order to accommodate those additional complexities, neither would I want to miss the opportunity to facilitate the extension of the service to include such capabilities in the future if possible without requiring much additional time or effort right now.
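
The three steps above (validation, confirmation by the submitter, review by an administrator) can be pictured as a small state machine. This is a sketch under stated assumptions, not the service's design: the status names and functions are hypothetical, and `is_well_formed` is only a stand-in for step (a), since the Python standard library can check well-formedness but cannot validate against the StratML schema.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass
from enum import Enum


class Status(Enum):
    REJECTED = "rejected"
    AWAITING_CONFIRMATION = "awaiting confirmation"
    AWAITING_REVIEW = "awaiting review"
    INDEXED = "indexed"


@dataclass
class Submission:
    xml_text: str
    submitter_email: str
    status: Status = Status.REJECTED


def is_well_formed(xml_text):
    """Stand-in for step (a): checks well-formedness only, not the StratML schema."""
    try:
        ET.fromstring(xml_text)
        return True
    except ET.ParseError:
        return False


def submit(xml_text, email):
    """Step (a): a file that fails the check stays REJECTED."""
    sub = Submission(xml_text, email)
    if is_well_formed(xml_text):
        sub.status = Status.AWAITING_CONFIRMATION
    return sub


def confirm(sub):
    """Step (b): the submitter follows the link sent by e-mail."""
    if sub.status is Status.AWAITING_CONFIRMATION:
        sub.status = Status.AWAITING_REVIEW
    return sub


def approve(sub):
    """Step (c): an administrator approves the file for indexing."""
    if sub.status is Status.AWAITING_REVIEW:
        sub.status = Status.INDEXED
    return sub


print(approve(confirm(submit("<Plan/>", "someone@example.org"))).status)
```

Because each step only advances a submission from the state immediately before it, a file cannot be indexed without passing through all three.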



Owen Ambur

Dec 21, 2022, 1:17:54 PM
to Chris Fox, Naval Sarda, aboutthe...@googlegroups.com
Chris, being inundated with submissions for review prior to indexing is a problem I'd love to have.  

Hopefully, the privilege won't be abused, at least not until StratML is widely enough used for hackers to care about it; we can rely upon the document hosts to ensure the documents don't contain inappropriate words, and perhaps the review process can be eliminated.  However, at this point we don't have Google's algorithms and other resources to police the content and I'd rather be safe than sorry.

As soon as Naval is able to provide the capability for me to import files, I'll see about replacing my copy of your plan with the more recent and authoritative source on your site, as my first test of that capability.

BTW, it has long seemed to me that, like the social media, Google is ripe for creative destruction by query services that take advantage of the semantics and structure of valid XML documents, which it seems Google ignores.  When time permits, I plan to check to see what, if anything, ChatGPT does with them.



On Wednesday, December 21, 2022 at 03:31:22 AM EST, Chris Fox <ch...@chriscfox.com> wrote:




On Wed, 21 Dec 2022 at 02:24, Owen Ambur <owen....@verizon.net> wrote:
Chris, I want to make sure the basic query service is working properly before taking on additional complexity.  So I'm not aiming to automate the ingestion of files from other sites at this point, e.g., via an API.

I will be ready and waiting whenever you're ready.
 
For example, while the copy of your plan on the stratml.us site is already indexed in the query service, I see that a more recent update is available at https://www.stratnavapp.com/StratML/Part1/5509a8d6-32f5-440d-a1bd-48f33bc9a252/Styled

You've said on previous occasions that you think it would be better for people to host their own StratML on their own sites. Here is an example of where that is happening. So really, you should remove the copy on stratml.us as it is out of date. I can understand why you might not want to do that until you can index the correct and up to date copy though.

Naval, perhaps some of the programming logic enabling me to import files could be reused and extended to enable other, non-authenticated users to submit files for: 

a) validation against the schema, 
b) confirmation of the submission by the submitter, e.g., via a link sent to the submitter's E-mail address; and 
c) review/approval for indexing by administrators.  

Validation against the schema should be a given. If it does not validate, it is not StratML and it should not be indexed.

I am not sure why you want to review/approve each and every document submitted. You could be creating a rod for your own back. What would be the basis for your review?

And don't forget, StratML documents should be dynamic. They should change from time to time. So your indexer needs to check back periodically. Are you going to review them every time?

Google has made an art out of indexing documents for search. I would encourage you to follow their lead as much as possible.

Kind Regards,
Chris

Naval Sarda

Dec 26, 2022, 10:20:52 AM
to Owen Ambur, Chris Fox, aboutthe...@googlegroups.com

Hi Owen,

We have resolved the issue with searching for keywords that contain a dot (.). Searching now works with exact phrases.
Sorting is not yet working properly, but we are working on it.
Regarding the basic feature of importing XML files, as mentioned earlier, we have two options.



1. Importing through FTP: Multiple files can be imported through this option. You upload files to one particular folder, from which our program will pick them up one by one. Another program will then validate the format of the file content and remove unwanted tags and attributes to make the file suitable for indexing. The file will then be saved at the desired location. This process will run periodically, every 30 minutes or so. This will take 30 hours to implement.

 

 

2. Importing through the web UI: Only one file can be imported at a time through this option. This will take an extra 12 hours for the authentication feature and the file-upload functionality. The total will be 30 + 12 = 42 hours, which covers bulk importing through FTP as well.


Also note that the file name needs to stay the same if you are updating an existing record. If the name is changed, the file will be treated as a new record.
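
The pickup-folder process and the file-name rule described above might be sketched as follows. This is an illustration under assumptions, not the actual implementation: the function name, the folder arguments, and the `style` attribute treated as "unwanted" are all hypothetical, the well-formedness check stands in for real schema validation, and the import into BaseX is replaced by writing to a plain folder.

```python
import xml.etree.ElementTree as ET
from pathlib import Path


def import_folder(incoming, store, unwanted_attrs=("style",)):
    """Process each .xml file in `incoming`.

    Returns {file name: "new" | "updated" | "rejected"}. A file whose name
    already exists in `store` replaces the existing record.
    """
    results = {}
    store.mkdir(parents=True, exist_ok=True)
    for path in sorted(incoming.glob("*.xml")):
        try:
            tree = ET.parse(str(path))
        except ET.ParseError:
            # Files that fail the check are left in the pickup folder.
            results[path.name] = "rejected"
            continue
        # Strip the attributes assumed to be unwanted from every element.
        for elem in tree.iter():
            for attr in unwanted_attrs:
                elem.attrib.pop(attr, None)
        target = store / path.name
        results[path.name] = "updated" if target.exists() else "new"
        tree.write(str(target), encoding="utf-8", xml_declaration=True)
        path.unlink()  # remove the file from the pickup folder once imported
    return results
```

A scheduler (for example, a cron job every 30 minutes) would call `import_folder` with the FTP upload folder as `incoming`. Uploading `plan.xml` twice yields "new" and then "updated", whereas a renamed copy is reported as "new".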

Regarding ingestion through an API, we need to decide how to identify existing records.


So estimate 1 covers the base, and all other modes of insertion build on it; estimate 2 includes the work done in estimate 1. With a few additional hours, we can add more modes of insertion. You have mentioned an approval process for anonymous submissions. If you need an admin UI for approval, it will cost more. Instead, you could receive an email for each submission and click a link to approve it after reviewing the file externally.


Naval

Owen Ambur

Dec 26, 2022, 9:45:00 PM
to Naval Sarda, aboutthe...@googlegroups.com
Naval, a full-text query on ".gov" turns up 2,253 hits but some of them apparently are false positives.  I'll do more testing as time permits.

Regarding the import capabilities, for now FTP should be sufficient and we could defer the Web import feature and combine it with the anonymous submission capability.

However, before we make that decision, I'd like to know whether cPanel could be used to support the import feature.  I prefer FTP and don't use cPanel to manage the content of my sites much, if at all.  However, I have used it occasionally for other purposes, like granting developers access to my sites and checking my website usage statistics.  I suspect that I may want to regain access to it for use with the VPS, particularly if it makes sense and I am able to transfer my other domains to it along with the aboutthem.info domain.

While I had assumed that import capabilities were included in the project, I'm not averse to paying $540 for FTP import capability and another $216 for the Web import feature.  However, I want to carefully consider the alternatives as well as the sequencing of logically separable capabilities.  For example, since the existing collection has already been imported in bulk, there may be no need for bulk importing again anytime soon, e.g., via FTP.  

So if it might be possible to enable just me to use a Web import feature with less development effort, that might make sense, particularly if: 

a) the confirmation process for other submitters can be added on later, and/or 

b) I do eventually gain cPanel access to the VPS and can use it for bulk imports.

Finally, for now, can you please confirm my recollection that the baseline cost on which we agreed was around $3,800 -- in which case these two features plus the $432 I have already paid for the file conversion, which was clearly a separate project, would bring the cost to just under $5,000 (plus the hosting cost of the VPS)?
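
As a quick check of the figures quoted in this message (the 30- and 12-hour figures come from Naval's earlier estimate; the variable names are just labels):

```python
baseline = 3800        # baseline project cost, as recalled above
ftp_import = 540       # quoted for the FTP import capability
web_import = 216       # quoted for the Web import feature
file_conversion = 432  # already paid for the separate file-conversion project

total = baseline + ftp_import + web_import + file_conversion
print(total)  # 4988

# Both quotes work out to the same hourly rate for the 30- and 12-hour estimates.
print(ftp_import / 30, web_import / 12)  # 18.0 18.0
```

The total of $4,988 is consistent with "just under $5,000", excluding the hosting cost of the VPS.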



Naval Sarda

Dec 26, 2022, 11:39:04 PM
to Owen Ambur, aboutthe...@googlegroups.com

Hi Owen,

The price currently quoted is for validating files against the XSD, removing characters that BaseX does not accept, and importing into BaseX automatically. It does not matter whether the files are uploaded through cPanel or FTP. cPanel does provide an interface, File Manager, for uploading files. It is an alternative to FTP transfer.

Also, bulk upload via FTP actually just means uploading files to a folder, with a scheduler picking up whichever files have been posted in that folder. So it really does not matter whether a single file or more than one file is uploaded at one go.

So the cost will remain the same: $540 for FTP / cPanel import (single or bulk).

Yes, the initial cost was $3,800.

And you have paid an additional $432 for the separate project.

Let me know if we should start implementing the FTP import feature.

Naval

Owen Ambur

Dec 27, 2022, 10:40:54 AM
to Naval Sarda, aboutthe...@googlegroups.com
Yes, Naval, if the functionalities are needed regardless of how the files are uploaded, the sooner they are available, the better.  However, I'd just like to confirm that I will be able to use my existing FTP client to do that.

Also, I believe I've mentioned this before, but when full-text queries are conducted alone, without using any of the other three query fields, I believe the hit list should show the organizations' Mission statements (rather than the Plan Description, which may not be populated and is usually less concise, standardized, and precise than the Mission statement, which is almost always populated).

Whenever the Goal/Objective field is queried, regardless of which other field(s) may be queried, the Description of the Goal/Objective should be presented in the hit list and the link should point to the Goal/Objective.

It would also be good when either the Stakeholder or Value fields are queried alone to display their Descriptions and link directly to them.  However, since they don't have Identifiers, that may be difficult to do.  Since Stakeholder queries may generally be combined with Goal/Objective queries, it may be OK to display and link to the Goal/Objective Descriptions with which the Stakeholders are associated even when the Goal/Objective field is not directly queried.

However, when the Values field is queried alone, it might be best to link to the Mission statement, on the assumption that it is the simplest and most consistent way of describing what an organization aims to do in support of its values.
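
The display rules requested in this message can be summarized as a small decision function. This is only a sketch of the rules as stated here: the field keys and return labels are hypothetical, the link target for full-text-only queries is not specified in the message (so it is left as `None`), and the handling of Stakeholder and Value queried together without Goal/Objective is an assumption.

```python
def hit_presentation(fields):
    """Map the set of queried fields to (text shown in the hit list, link target)."""
    fields = set(fields)
    if "goal_objective" in fields:
        # Applies regardless of which other fields are also queried.
        return ("Goal/Objective Description", "Goal/Objective")
    if "stakeholder" in fields:
        # Stakeholders have no identifiers, so fall back to the associated goal/objective.
        return ("Goal/Objective Description", "Goal/Objective")
    if "value" in fields:
        # Values have no identifiers either; link to the Mission statement instead.
        return ("Value Description", "Mission")
    if fields == {"full_text"}:
        # Link target is not specified in the message.
        return ("Mission", None)
    raise ValueError("no query fields given")


print(hit_presentation({"full_text"}))
print(hit_presentation({"value", "goal_objective"}))
```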



Naval Sarda

Dec 27, 2022, 12:07:02 PM
to Owen Ambur, aboutthe...@googlegroups.com

Hi Owen,

We analyzed the false-positive records for the full-text search on ".gov". We use a built-in function to compare the search keyword against the whole text of the XML file. Internally, this function takes the text of every tag in the file and concatenates it, without adding any space between two texts. As a result, if the text of one tag ends with a full stop (.) and the next tag starts with a word like "Government", the concatenated text becomes ".Government".
So a search for ".gov" returns these records.

To resolve this, we would have to stop using the built-in function, which would reduce search performance. Such scenarios are probably rare in general. We advise ignoring this scenario in the interest of full-text search speed.
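
The effect described above can be reproduced in a few lines. This sketch uses Python's standard library rather than the built-in BaseX function the service relies on, so it mirrors the described behavior, not the actual internals; the XML fragment is invented.

```python
import xml.etree.ElementTree as ET

# Invented fragment: one element ends with a full stop, the next starts with "Government".
DOC = "<Plan><Name>Open data.</Name><Stakeholder>Government agencies</Stakeholder></Plan>"

root = ET.fromstring(DOC)
texts = list(root.itertext())

joined_without_space = "".join(texts)  # "Open data.Government agencies"
joined_with_space = " ".join(texts)    # "Open data. Government agencies"

print(".gov" in joined_without_space.lower())  # True: the false positive
print(".gov" in joined_with_space.lower())     # False
```

Joining the element texts with a separator avoids the spurious match in this example; whether an equivalent approach is practical in the query service, given the performance trade-off mentioned above, is not something this sketch evaluates.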


Naval

Naval Sarda

Dec 28, 2022, 8:07:17 AM
to Owen Ambur, aboutthe...@googlegroups.com

Hi Owen,

The full-text search issue mentioned below has been addressed. Please review. Sorting by organisation is also done. Since all the files were imported on the same date, sorting by date is not feasible to implement.

Naval

On 27/12/22 9:10 pm, Owen Ambur wrote:

Owen Ambur

Dec 28, 2022, 12:20:30 PM
to Naval Sarda, aboutthe...@googlegroups.com
Naval, I defer to you on technical issues like this and will keep an open mind about how best to deal with the trade-offs going forward.


