Hi Owen,
We did not find a manual way to compare with Joe Carmel's StratML sitemap catalog.
You can upload all the files you have via FTP and the service will process
all of them. If a file is already there, it will be replaced.
Naval
Please see below
Subject: Re: Fwd: Missing Files?
Date: Wed, 18 Jan 2023 21:50:16 +0530
From: Sudarshana <sudar...@epicomm.net>
To: Naval Sarda <nsa...@epicomm.net>, Jitendra Shende <jite...@epicomm.net>, kom...@epicomm.net, Balasaheb Pandarkar <balas...@epicomm.net>
Owen,
Herewith we are sending you the list of all files that
are indexed. See attached.
See below
-------- Forwarded Message --------
Subject: Re: Missing Files?
Date: Tue, 17 Jan 2023 17:10:23 +0000 (UTC)
From: Owen Ambur <owen....@verizon.net>
Reply-To: Owen Ambur <owen....@verizon.net>
To: Naval Sarda <nsa...@epicomm.net>
Thanks, Naval, but I'm having a hard time understanding why Joe Carmel's cataloguer, which runs off the sitemap listing:
a) finds 36 more files than have been imported into the query service, and
b) if these 12 files were not at the URLs listed in the sitemap, why his cataloguer did not identify them as missing, as it has done for missing files in the past (even if they were not really missing but were missed due to network issues).
I'm also having a hard time finding these 12 files in my local archives. However, these two entries in a previous version of the sitemap may provide a clue regarding CDIR_2:
<url><loc>https://stratml.us/carmel/iso/part2/CDIR.xml</loc></url>
<url><loc>https://stratml.us/carmel/iso/CDIR.xml</loc></url>
There were both Part 1 & Part 2 versions of that plan. The Part 2 version has been indexed in the query service but the Part 1 version apparently has not.
I don't think that would be the case for as many as 36 files and I haven't yet found the other 11 in my local archives. However, I also did a bit of sleuthing in the Internet Archive and was able to discover these typos in the URLs:
M4GA stands for Mayors for Guaranteed Income and should be M4GI:
GBERGB should be GBERGP: https://web.archive.org/web/20220515233145/https://stratml.us/carmel/iso/GBERGPwStyle.xml
I'm also led to believe that OSBP is DODOSBP and I was able to find USCC and LOC 2019
I FTP'ed those five files for indexing in the query service. The other six are still a bit of a mystery.
I'm not going to worry too much about this and I don't know if I'll be able to make sense of a complete listing of >5.5K files in the query service in comparison to either my sitemap listing, Joe Carmel's catalog, or my hyperlinked listing. However, I'll look forward to learning if there might be a way to reconcile the discrepancy without taking too much time or effort.
Owen Ambur
On Tuesday, January 17, 2023 at 08:05:38 AM EST, Naval Sarda <nsa...@epicomm.net> wrote:
Hi Owen,
The following files were listed in sitemap.xml but were not at the locations the sitemap pointed to when we scraped them in December.
EPON.xml
ASSC.xml
SECAPP2023.xml
CNDDGR.xml
isoIVPA.xml
OSBP.xml
USCC.xml
CDIR_2.xml
M4GA.xml
ECOLISE.xml
GBERGB.xml
LOC2019.xml
We will share the entire list of files on the query server soon so that you can compare what is missing.
Naval
On 16/01/23 9:52 pm, Owen Ambur wrote:
Naval, that enabled me to identify three files saved on 12/8 that appear in my sitemap listing above the CCA.xml file and were apparently not included in the batch import into the query service:
However, that still leaves 36 unaccounted for.
If you can tell me the date that the files were downloaded from the stratml.us site for transformation to conform to the latest version of the schema, I can determine which ones may have been created after that but before I started FTP'ing others into the query service.
Owen Ambur
On Saturday, January 14, 2023 at 10:11:38 PM EST, Naval Sarda <nsa...@epicomm.net> wrote:
Hi Owen,
This was the topmost file in the sitemap we last downloaded:
https://stratml.us/carmel/iso/CCA.xml
Naval
On 15/01/23 8:00 am, Owen Ambur wrote:
Naval, I realized I could probably tell where the cut-off was for files that were imported in batch into the query service, based on the date they were all converted and copied to my stratml/docs folder, i.e., December 9.
I just re-ran Joe Carmel's cataloguer and see that there are now 31 more files in the collection (5,609) than there were when I last ran the cataloguer on December 10 (5,578). https://stratml.us/docs/catalogsitemap.html
It appears that 5,570 files have now been indexed in the query service, meaning that 39 of them may be missing, but I'm not sure how to determine which ones they might be.
Any suggestions?
Owen Ambur
Hi Owen,
The developer working on your project is occupied with a time-sensitive
project. Once she is free, she will complete the URL import feature.
Most likely she will work on your project next week. We did not
review Joe's entire cataloguer, though. It is hard to figure out
the difference manually. It can be verified programmatically, but
that involves engaging a programmer to do so.
Naval
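
The programmatic check discussed in this thread could look roughly like the sketch below. It is only an illustration under assumptions that the thread does not confirm: that the indexed-file list is available as plain file names (one per entry, e.g. CCA.xml), and that files are matched by file name, ignoring case. The function names (sitemap_urls, file_name, reconcile) are hypothetical and are not part of the query service or of Joe Carmel's cataloguer.

```python
import xml.etree.ElementTree as ET
from pathlib import PurePosixPath
from urllib.parse import urlparse

# Namespace used by standard sitemap.xml files; entries without it are also accepted.
SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def sitemap_urls(sitemap_xml: str) -> list[str]:
    """Return every <loc> URL found in a sitemap document."""
    root = ET.fromstring(sitemap_xml)
    urls = []
    for el in root.iter():
        if el.tag in ("loc", f"{{{SITEMAP_NS}}}loc") and el.text:
            urls.append(el.text.strip())
    return urls


def file_name(url: str) -> str:
    """Last path segment of a URL, e.g. 'CDIR.xml'."""
    return PurePosixPath(urlparse(url).path).name


def reconcile(sitemap_xml: str, indexed_names: list[str]):
    """Compare sitemap entries with an indexed-file list (assumed format: file names).

    Returns (missing, extra, collisions):
      missing    - sitemap URLs whose file name is not in the indexed list
      extra      - indexed names that no sitemap URL points to
      collisions - file names that occur at more than one sitemap URL
    """
    indexed = {n.strip().lower() for n in indexed_names if n.strip()}
    by_name: dict[str, list[str]] = {}
    for url in sitemap_urls(sitemap_xml):
        by_name.setdefault(file_name(url).lower(), []).append(url)

    missing = sorted(
        u for name, urls in by_name.items() if name not in indexed for u in urls
    )
    extra = sorted(indexed - set(by_name))
    collisions = {n: urls for n, urls in by_name.items() if len(urls) > 1}
    return missing, extra, collisions
```

The collisions result is there because of cases like the two CDIR.xml entries (iso/ and iso/part2/), where the same file name appears at two URLs and a name-only comparison cannot tell which copy was indexed. The extra result would surface names such as M4GA.xml or GBERGB.xml that do not correspond to any current sitemap URL.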