Re: StratML Import file feature through FTP is ready


Owen Ambur

Jan 6, 2023, 5:34:29 PM
to Naval Sarda, aboutthe...@googlegroups.com
Naval, there is a typo in the upload file pathname:  "startml" should  be "stratml".

Also, if I understand Sudarshana's assumption correctly, it is false.  Files may be uploaded to the query service from anywhere on the Web and not just from the stratml.us site.  Indeed, the entire collection on the stratml.us site is prototypical in nature.  Hopefully, someday all of the authoritative sources of the files will reside on the organizations' own websites and those on the stratml.us site can be deprecated.  However, that may be fairly far in the future.

When the pathname is corrected, I'll need the UID & password to upload files.  I'll also need to know which were the most recent files imported in batch so that I can figure out where to pick up on uploading those that have been created since then.  If you can tell me where that break occurs in my sitemap listing, that would be helpful.  https://stratml.us/docs/sitemap.xml



On Friday, January 6, 2023 at 01:24:33 PM EST, Naval Sarda <nsa...@epicomm.net> wrote:


Please see below



-------- Forwarded Message --------
Subject: StratML Import file feature through FTP is ready
Date: Fri, 6 Jan 2023 23:46:24 +0530
From: Sudarshana <sudar...@epicomm.net>
To: Naval Sarda <nsa...@epicomm.net>


Hi Owen

We have deployed the Import File feature through FTP.

If you upload files to the /home/startml/Uploaded_xml_Files/ location on the server, those files will be indexed after processing.

We are assuming that those files are also being uploaded externally to the https://stratml.us/docs/ location.


--
Thanks & Regards
Sudarshana

Owen Ambur

Jan 9, 2023, 5:39:11 PM
to Naval Sarda, aboutthe...@googlegroups.com
Naval, I was able to log on with the credentials you sent but unable to upload a file.  See error message in screen shot below.

Also, overwriting files with the same name will almost certainly become a problem when we open the service to submissions by others.  In the meantime, I've been dealing with filename collisions by adding numbers to files with the same name in my hyperlinked index.  See, for example, https://stratml.us/drybridge/index.htm#OGP-USNAP3

I'm not sure what you mean by saying that the files must be uploaded to https://stratml.us/docs in order to be indexed in the query service.  If that's true, that's a problem that needs to be fixed.  The query service links should point to the files wherever they reside on the Web.


Inline image



On Monday, January 9, 2023 at 05:01:14 PM EST, Naval Sarda <nsa...@epicomm.net> wrote:


Please see below



-------- Forwarded Message --------
Subject: Re: Fwd: StratML Import file feature through FTP is ready
Date: Mon, 9 Jan 2023 23:23:57 +0530
From: Sudarshana <sudar...@epicomm.net>
To: Naval Sarda <nsa...@epicomm.net>, Balasaheb Pandarkar <balas...@epicomm.net>, kom...@epicomm.net, Jitendra Shende <jite...@epicomm.net>


Owen,

We have changed the startml folder name to stratml.

In a separate email we are sending you the FTP credentials. They open a single folder, /home/stratml/Uploaded_xml_Files, where you have to upload the files.

Our program runs every 30 minutes and picks up files from this folder. A second program then processes those files and, if they are fit for indexing, adds them to the indexing folder. The files are then deleted from the FTP folder.

After 30 minutes, when we search in the web application, those files should be visible in the search results.

If the names are identical, the existing file is overwritten; if the filename is different, a new file is added.
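
The import pass described above can be sketched roughly as follows. This is only an illustration, not the deployed program: the function names, the use of well-formedness as the "fit for indexing" test, and the choice to leave rejected files in the upload folder are all assumptions, since the thread does not say how those cases are handled. A scheduler such as cron would call `run_import_pass` every 30 minutes.

```python
import shutil
import xml.etree.ElementTree as ET
from pathlib import Path


def is_indexable(path: Path) -> bool:
    """Hypothetical check: treat any well-formed XML file as fit for indexing."""
    try:
        ET.parse(path)
        return True
    except ET.ParseError:
        return False


def run_import_pass(upload_dir: Path, index_dir: Path) -> list[str]:
    """Process each .xml file waiting in the upload folder once.

    Returns the names of the files that were copied to the indexing folder.
    """
    index_dir.mkdir(parents=True, exist_ok=True)
    indexed = []
    for src in sorted(upload_dir.glob("*.xml")):
        if not is_indexable(src):
            # Assumption: rejected files are left in place for inspection.
            continue
        # An identical name overwrites the earlier copy; a new name adds a file.
        shutil.copyfile(src, index_dir / src.name)
        # Accepted files are removed from the FTP folder after processing.
        src.unlink()
        indexed.append(src.name)
    return indexed
```

Because the file name is the only key here, two different plans that share a name cannot coexist, which is the collision concern raised elsewhere in this thread.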

I will explain this import feature with an example.

I took one file, 1MMMM.xml, renamed it to Test10.xml, and imported it with this feature.

When I searched by Stakeholder name "Hal Burrows", I got 2 records in the results. When I click the view link of the first record, it redirects to the "https://stratml.us/docs/1MMMM.xml" URL. As this file is present on https://stratml.us, it opens successfully. When I click the second link, it redirects to the "https://stratml.us/docs/Test10.xml" URL. As this file is not present on https://stratml.us, it can't open and gives a "Resource Not found" error.

For this to work, we have to upload that file to https://stratml.us as well, because that location is external to us and is outside the server where the program is running.

So, when uploading a file via FTP, you also have to upload it to the https://stratml.us/docs location. Please refer to the images below.





-Sudarshana
On 1/7/2023 11:08 AM, Naval Sarda wrote:


See below

-------- Forwarded Message --------
Subject: Re: StratML Import file feature through FTP is ready
Date: Fri, 6 Jan 2023 22:34:10 +0000 (UTC)
From: Owen Ambur <owen....@verizon.net>
Reply-To: Owen Ambur <owen....@verizon.net>
To: Naval Sarda <nsa...@epicomm.net>
CC: aboutthe...@googlegroups.com <aboutthe...@googlegroups.com>

Naval Sarda

Jan 9, 2023, 5:55:01 PM
to Owen Ambur, aboutthe...@googlegroups.com

Hi Owen,

We will check on the FTP issue you have mentioned.

Third party upload will be phase 2. That will involve an approval process, as we will also need to figure out how to give submitters the option to update existing listings through the web interface. At that time we can add third party links.

Naval

Owen Ambur

Jan 9, 2023, 9:32:42 PM
to Naval Sarda, aboutthe...@googlegroups.com
Naval, I'm not sure what you mean by "third party links."  A link (URL) is a link (URL).  

Is there something about the links in the query service that make the links something other than that?

If so, it seems to me that's a problem that needs to be fixed.

Please explain.



Owen Ambur

Jan 11, 2023, 11:06:56 AM
to Naval Sarda, aboutthe...@googlegroups.com
Naval, capturing the URLs and using them for the links provided by the query service has always been an essential requirement.  

However, I see why uploading via FTP does not capture them -- unless the <Source> element contains the URL for the StratML document itself, which is what should occur in the long run but is not yet the case (in the prototypical StratML collection).  

So that is a point we missed in our previous exchanges.  While I don't necessarily want to bear the cost of all such oversights and misunderstandings, I am willing to pay another $216 to resolve this one.

I need to know more about what you mean by "auto refreshing".  If you mean automatically checking to see if the files still exist at their URLs and removing them from the index if they don't, that is not necessarily a priority for me.  If the URLs generate 404 errors, I'm happy to let query service users know that the files formerly existed and perhaps to point them to copies in the Internet Archive.  Again, however, that's more than I'd want to take on right now.



On Wednesday, January 11, 2023 at 10:30:46 AM EST, Naval Sarda <nsa...@epicomm.net> wrote:


Hi Owen,

We had discussed an FTP feature for uploading StratML files. Now you have suggested that you want the URLs as well.

We are suggesting a feature in which the file you upload via FTP contains only the URLs. The program will download the content of each StratML file from the web and either insert a new record in the database or update the existing record if that URL is already present. This implementation will take an additional 16 hours.

This will solve most of the issues, such as overwriting, and will even open the gateway to implementing auto refreshing of the database in future, as we will have URLs that we can use to download and reindex the database on a periodic basis. Auto refreshing is a separate task and is not included in the current estimates.
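
The insert-or-update step keyed on the URL could look something like the sketch below. It is a minimal illustration under stated assumptions: SQLite stands in for whatever database the query service actually uses, the `plans` table and both function names are hypothetical, and the download function is passed in so the logic can be exercised without network access.

```python
import sqlite3
import urllib.request
from typing import Callable, Iterable


def fetch_over_http(url: str) -> str:
    """Download a document and return its text (assumes UTF-8 content)."""
    with urllib.request.urlopen(url) as resp:
        return resp.read().decode("utf-8")


def import_urls(
    conn: sqlite3.Connection,
    urls: Iterable[str],
    fetch: Callable[[str], str] = fetch_over_http,
) -> dict:
    """Insert a record for each new URL, or update the record if the URL is known."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS plans (url TEXT PRIMARY KEY, content TEXT NOT NULL)"
    )
    counts = {"inserted": 0, "updated": 0}
    for url in urls:
        content = fetch(url)
        known = conn.execute("SELECT 1 FROM plans WHERE url = ?", (url,)).fetchone()
        if known:
            conn.execute("UPDATE plans SET content = ? WHERE url = ?", (content, url))
            counts["updated"] += 1
        else:
            conn.execute("INSERT INTO plans (url, content) VALUES (?, ?)", (url, content))
            counts["inserted"] += 1
    conn.commit()
    return counts
```

With the URL as the key, two files that share a file name but live at different URLs remain separate records, and re-running the import over the same URLs would act as a simple refresh.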

Naval


On 11/01/23 1:09 am, Owen Ambur wrote:
Naval, I'm getting concerned about the URL issue.  It sounds like the existing programming logic may be faulty.  I'd rather not pay more for the functionality that should have been built in from the start.

The URLs for the authoritative versions of the files are not "assigned" when they are indexed in the query service; the URLs already exist and point to where the files reside on the Web.  Those URLs should automatically be captured in the database when each of the files is indexed.

It also sounds like we'll need to give more thought to the file name collision issue.  Presumably, if the URLs were properly treated in the database, that would not be an issue.

For example, when file name collisions occur in my collection on the StratML.us site, I manually add a sequential number to the file name.  It is not a "random" number.  Again, however, that is only necessary because I am maintaining files for so many different organizations on my site.  If they were maintaining them on their own sites, at their own URLs, that would not be an issue.  It would be up to them to decide whether to over-write the previous versions or maintain them for historical purposes (which is my preference).  The desired end state is for each organization to maintain its own plans and reports on its own website, for indexing not only in my query service but many others as well.
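
For illustration only, the sequential-number approach to file name collisions might be automated along these lines. The function name is hypothetical, and starting the numbering at 2 is an assumption; the thread does not say which number the first duplicate receives.

```python
from pathlib import PurePosixPath


def next_free_name(filename: str, taken: set[str]) -> str:
    """Return filename unchanged if unused, otherwise append the lowest
    unused sequential number to its stem (numbering assumed to start at 2)."""
    if filename not in taken:
        return filename
    path = PurePosixPath(filename)
    n = 2
    while f"{path.stem}{n}{path.suffix}" in taken:
        n += 1
    return f"{path.stem}{n}{path.suffix}"
```

For example, with `USNAP.xml` and `USNAP2.xml` already present, a third file named `USNAP.xml` would be stored as `USNAP3.xml`, so earlier versions are kept rather than overwritten.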

I hope we can get these issues resolved.



On Tuesday, January 10, 2023 at 10:57:46 AM EST, Naval Sarda <nsa...@epicomm.net> wrote:


Please see below



-------- Forwarded Message --------
Subject: Re: Fwd: StratML Import file feature through FTP is ready
Date: Tue, 10 Jan 2023 20:59:10 +0530
From: Sudarshana <sudar...@epicomm.net>
To: Naval Sarda <nsa...@epicomm.net>, Balasaheb Pandarkar <balas...@epicomm.net>, kom...@epicomm.net, Jitendra Shende <jite...@epicomm.net>


Owen,

Please use port number 21 to upload files. The permission denied issue is now resolved.

Currently we have implemented basic import functionality with FTP, on the understanding that it will be used by you for your internal purposes. As we are doing this through FTP, to update existing files we have to overwrite them when the file names are the same. If we assign a random number to each file, it won't update the existing record and there will be a lot of duplicates.

If, along with the file, you want to assign the URL where it should open from the View hyperlink, we can provide an input control in the web UI for that URL. We will then save that URL along with the uploaded file. It will take another 16 hours of effort, in addition to the initial 16 hours for the web UI to upload the files. That web UI will be visible only after login. This login will be only for you, not for the public. You can decide whether or not to implement it this way. Direct FTP won't map the file to the URL, hence the web UI is needed.

Regarding public submissions of StratML files, we can assign random numbers to the files being uploaded and save the unique filename so generated in the database, so that submitters can update the records they have uploaded using the web UI.  This is essentially the next phase and is not part of the scope of the current phase.

-Sudarshana

Naval Sarda

Jan 11, 2023, 11:27:11 AM
to Owen Ambur, aboutthe...@googlegroups.com

Hi Owen,

The query service currently imports the data once, on submission. If people update the StratML file on their server and do not plan to resubmit it, it will not be updated in our database, as this is one-time import functionality.

This is where refreshing the query search database is needed. It is similar in concept to how Google indexes our website periodically by crawling it occasionally. Whatever changes we make to our website may get indexed by Google after a few days, not immediately. We have not estimated the auto refreshing functionality for the database.

Naval

Owen Ambur

Jan 11, 2023, 11:44:21 AM
to Naval Sarda, aboutthe...@googlegroups.com
Thanks for the clarification, Naval.  

That's beyond the scope of what I want to do right now and may not be a priority anytime in the near future either.

However, if others decide to build StratML-enabled services, exchanging data with them will become a priority.

In the meantime, if individuals and organizations start posting StratML files on their own sites and then change the file names or the URLs, I'm comfortable with leaving it up to them to submit the new URLs for indexing.  If they simply change the files at the same URLs, the query service would still point to them.



Owen Ambur

Mar 9, 2023, 11:05:09 AM
to Naval Sarda, aboutthe...@googlegroups.com
No, Naval, I did not approve the method of creating a new .txt file containing the URLs for uploading via FTP.  Doing it that way adds more work for me, and as I've said before, I don't want to get into the business of maintaining UIDs and passwords for others.  Nor, as a matter of usability, do I expect others to take the time and trouble to create FTP accounts in order to submit files for indexing.

From my perspective, this problem stems back to your misunderstanding of the purpose of the query service as a metadata repository and not a repository for the files themselves, which are to be accessed via their URLs.  So capturing the URLs has always been an essential requirement.  As long as I am the only one submitting the files and they are all in the same stratml.us/docs directory, their URLs can sometimes be inferred from the acronym of the organization in the file itself.  However, that is not always the case.  So, again, capturing the URLs is essential.

If we were to settle on a method for me to upload the URLs via FTP, I'd prefer to use the sitemap listing that I'm already maintaining rather than to be required to take the extra step of creating a .txt file each time I want to submit another file for indexing.  It still seems to me that it would be preferable to have a relatively simple form on the site itself for submitters to provide the URLs, along with their E-mail addresses for confirmation of their submissions.  For now, I'd be the only one to use it but the functionality could be relatively seamlessly extended to others if and, hopefully, when we're ready to do so.

However, the bottom line is that I am willing to try this new method for now and see how it works just for me, as an alternative to the previous method of uploading the files themselves, with the caveat that I want you and your developer to think about how best to use the sitemap listing instead of a .txt file.  Thus far, uploading a .txt file doesn't seem to have worked for me but I'll try it again soon and let you know if I have better luck.

In the meantime, the two "Hypha" files that I mentioned in a separate message still aren't showing up in queries despite the fact I've tried indexing them using both methods multiple times.  So we need to figure out why they're apparently not being indexed.  



On Thursday, March 9, 2023 at 08:15:13 AM EST, Naval Sarda <nsa...@epicomm.net> wrote:


Hi Owen,

See the email below, where you approved the flow of the URL import feature. The two emails are back to back in the chain below: the one where I described the flow and your reply approving it.

Naval

Owen Ambur

Mar 16, 2023, 3:10:01 PM
to Naval Sarda, aboutthe...@googlegroups.com
Success.  It appears that the new files I included in a sitemap listing were all indexed, including the two "Hypha" files that I repeatedly tried to upload previously.  Looks like we're getting close to an official release.

I want to document the status of the project in my plan/report at https://aboutthem.info/SQS.xml

We'll need to figure out how best to reference the information at https://aboutthem.info/ when we make http://198.38.86.242/ the default (index) page for the AboutThem.info site.  I'm thinking that perhaps the best way to do so is to include an "About Us" link at the bottom of the search screen page.  

I'm also thinking about how best to document and share known issues, like the false positives in full-text queries.  (I've requested to join the BaseX listserv to inquire about prospects for addressing that issue but have not been approved to rejoin the listserv yet.)

As you know from my message to Jim Saia, I'd also like to have the code posted on GitHub &/or another open source code repository(ies).

And I want to make sure the query service is sustainable, i.e., that it can be maintained (by me &/or others without software programming skills).  Or, stated conversely, that we recognize any foreseeable events that might cause it to "break" (as my XForms have on occasion).  I'll especially want to be able to edit the narrative text (and images) on the site without breaking the programmatic logic.

Now that the MVP is nearly ready for release, I also want to think about and begin documenting what we might want to do next.  For example, we may wish to add Organization Names to the query fields and/or to begin including dates in the query results lists.

If we were to do the latter, we'd need to consider which date(s) to display, taking into account the fact that only the Submission date can be relied upon to be populated but it is not as relevant as the Start Dates & End Dates for the plans/reports and the Performance Indicators.  So it's not a simple matter.

I'd also like to consider prospects for periodically running queries and saving the results in static hyperlinked listings similar to Joe Carmel's catalogue and/or my hypertext listing of the collection.  I wouldn't necessarily want to duplicate either of those.  However, I do find it very useful to be able to point others directly to files within the full listing of them.

Also, it seems to me there would be value in providing hyperlinked indices of the names of goals/objectives and values in descending order of frequency.  Ideally, clicking on a link on each name would execute a search and reveal the plans in which it occurs.  To get a better sense of what I'm talking about, see my article on values and the frequency-/rank-ordered listing of 369 values that I manually documented.  A variation on that theme might be to include pull-down menus on the search fields with the most frequent names appearing at the top.
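
As a rough sketch of how such a frequency-ordered listing might be generated, the code below counts the text of `Name` elements that sit directly under a given parent element. It assumes the names of interest appear as `<Value><Name>…</Name></Value>` (and likewise for `Goal` or `Objective`); it matches on local element names so it does not depend on any particular namespace, and the function names are hypothetical.

```python
import xml.etree.ElementTree as ET
from collections import Counter


def local_name(tag: str) -> str:
    """Strip any '{namespace}' prefix from an ElementTree tag."""
    return tag.rsplit("}", 1)[-1]


def count_names(documents, parent="Value"):
    """Count Name texts found directly under <parent> elements across documents.

    Returns (name, count) pairs in descending order of frequency.
    """
    counts = Counter()
    for xml_text in documents:
        root = ET.fromstring(xml_text)
        for elem in root.iter():
            if local_name(elem.tag) != parent:
                continue
            for child in elem:
                if local_name(child.tag) == "Name" and child.text and child.text.strip():
                    counts[child.text.strip()] += 1
    return counts.most_common()
```

Matching here is exact and case-sensitive, so "Integrity" and "integrity" would be counted separately; whether to normalize names first is a design decision left open.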

I'll want to include a mailto comment link, probably on the About Us page, for input and feedback on bugs and enhancement suggestions.  Initially, it should probably be my E-mail address, but at some point, we might want to use the abouttheminfoplan Google discussion group or set up a new discussion forum.

Again, however, now that the MVP seems to be in good working order, the next things I'd like to do are get it ready for activation at AboutThem.info and document the performance indicators in my plan/report at https://aboutthem.info/SQS.xml



On Thursday, March 16, 2023 at 05:44:14 AM EDT, Naval Sarda <nsa...@epicomm.net> wrote:


Hi Owen,

We have fixed the permission denied issue. Please test again.

Naval

On 14/03/23 11:54 pm, Owen Ambur wrote:
Tried that.  See error message below.
On Tuesday, March 14, 2023 at 02:14:20 PM EDT, Naval Sarda <nsa...@epicomm.net> wrote:


Hi Owen, 
This is the old FTP location. Can you try using the older FTP client that was working for you?

Naval
 

From: Owen Ambur <owen....@verizon.net>
Sent: Tuesday, March 14, 2023 10:23 PM
To: Naval Sarda <nsa...@epicomm.net>

Subject: Re: StratML Import file feature through FTP is ready
 
Naval, I received both files and followed the instructions but received this error message:

Inline image




On Tuesday, March 14, 2023 at 10:12:46 AM EDT, Naval Sarda <nsa...@epicomm.net> wrote:


Hi Owen,

I have shared one email regarding FTP connection with two attachments. If it gets blocked and does not reach you, let me know as one of the files has .xml extension.

Naval

On 14/03/23 6:25 am, Owen Ambur wrote:
Naval, User Directory - /xml is not showing up in my FTP view using the log-on credentials in the most recent messages you've sent me.

See the first screen shot below.

It does show up with the previous log-on credentials.  However, as shown in the second screen shot, permission is denied to upload files to it.


On Monday, March 13, 2023 at 08:33:08 PM EDT, Naval Sarda <nsa...@epicomm.net> wrote:


Yes, it will import all.

Naval

On 14/03/23 5:58 am, Owen Ambur wrote:
Naval, does this mean the new feature will import ALL of the files listed in the sitemap each time the program runs?

If so, I will maintain two different versions, one with all of the files and the other with just the new one(s) to be indexed in the query service.
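
One way to avoid maintaining the second listing by hand would be to derive it from the full sitemap, roughly as sketched below. This assumes the sitemap follows the standard sitemaps.org format (`urlset` / `url` / `loc`) and that a record of already-indexed URLs is available; the function names are hypothetical.

```python
import xml.etree.ElementTree as ET

SITEMAP_URI = "http://www.sitemaps.org/schemas/sitemap/0.9"
SITEMAP_NS = {"sm": SITEMAP_URI}


def sitemap_urls(xml_text: str) -> list[str]:
    """Return the <loc> values of a sitemap, in document order."""
    root = ET.fromstring(xml_text)
    return [loc.text.strip() for loc in root.findall("sm:url/sm:loc", SITEMAP_NS) if loc.text]


def new_entries(full_sitemap: str, already_indexed: set[str]) -> list[str]:
    """URLs listed in the full sitemap that have not been indexed yet."""
    return [url for url in sitemap_urls(full_sitemap) if url not in already_indexed]


def build_sitemap(urls) -> str:
    """Serialize a list of URLs as a minimal sitemap document."""
    ET.register_namespace("", SITEMAP_URI)
    urlset = ET.Element(f"{{{SITEMAP_URI}}}urlset")
    for url in urls:
        entry = ET.SubElement(urlset, f"{{{SITEMAP_URI}}}url")
        ET.SubElement(entry, f"{{{SITEMAP_URI}}}loc").text = url
    return ET.tostring(urlset, encoding="unicode")
```

Calling `build_sitemap(new_entries(full, indexed))` would then produce the smaller listing containing only the files still to be indexed.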




