cacheFromUrl with Hyrax servers failed to retrieve subdirectory files


jessy.b...@hakai.org

Aug 31, 2023, 3:54:38 PM
to ERDDAP
Hi, 
We're hoping to generate a number of ERDDAP datasets by pulling data that is already available via a Hyrax server.

As suggested by the ERDDAP docs, we're using the <cacheFromUrl> feature to maintain a local copy of the files served by ERDDAP.

ERDDAP seems to fail to detect any files present in subdirectories. For example, with the following cacheFromUrl, ERDDAP fails to detect any of the files in the subdirectories: http://mistwatch.uwaterloo.ca/opendap/hyrax/data/Amundsen_Science_and_CIOOS/521/

From log:

*** constructing EDDTableFromFiles amundsen521
dir/file table doesn't exist: /erddapData/dataset/21/amundsen521/dirTable.nc
dir/file table doesn't exist: /erddapData/dataset/21/amundsen521/fileTable.nc
creating new dirTable and fileTable (dirTable=null?true fileTable=null?true badFileMap=null?false)
doQuickRestart=false
* amundsen521 EDDTableFromNcFiles.makeCopyFileTasks  pathRegex=.*  fileNameRegex=.*
from http://mistwatch.uwaterloo.ca/opendap/hyrax/data/Amundsen_Science_and_CIOOS/521/CCIN521_20110705_CASES_0402_CTD_Franklin_04_07_2011/
to   /datasets/amundsen/521/
  EDDTableFromNcFiles.makeCopyFileTasks: lastFinishedTask=-1 < datasetLastAssignedTask(amundsen521)=null? pendingTasks=false
  EDDTableFromNcFiles.makeCopyFileTasks: no matching source files.

*** An error occurred while trying to load amundsen521:
java.lang.RuntimeException: datasets.xml error on or before line #27657: 0 files found in /datasets/amundsen/521/


However, pointing cacheFromUrl directly at a subdirectory where files are actually present (example) successfully downloads a local copy of those files.

I believe we successfully got ERDDAP to retrieve files from Hyrax subdirectories before. However, that was with an older version of ERDDAP (we're now using 2.23) and a different Hyrax instance. I'm wondering if this could be related.

This is the beginning of the dataset XML (see the attached file for the full dataset XML):
<dataset type="EDDTableFromNcFiles" datasetID="amundsen521" active="true">
    <reloadEveryNMinutes>10080</reloadEveryNMinutes>
    <updateEveryNMillis>10000</updateEveryNMillis>
    <fileDir>/datasets/amundsen/521/</fileDir>
    <fileNameRegex>.*</fileNameRegex>
    <recursive>true</recursive>
    <pathRegex>.*</pathRegex>
    <metadataFrom>last</metadataFrom>
    <standardizeWhat>0</standardizeWhat>
    <sortedColumnSourceName></sortedColumnSourceName>
    <sortFilesBySourceNames></sortFilesBySourceNames>
    <fileTableInMemory>true</fileTableInMemory>
    <cacheFromUrl>http://mistwatch.uwaterloo.ca/opendap/hyrax/data/Amundsen_Science_and_CIOOS/521/</cacheFromUrl>
    <accessibleViaFiles>true</accessibleViaFiles>
    <!-- sourceAttributes>

...

Is there something I'm doing wrong?

Thanks for all the help,
Jessy Barrette
Hakai Institute - CIOOS

amundsen521.xml

bobsimons2.00

Sep 1, 2023, 6:55:16 AM
to ERDDAP
It is likely that you are doing everything correctly and that this is a problem in ERDDAP due to a new version of Hyrax having changed how it lists directories and files. Basically, ERDDAP is screen scraping the directory and file information from Hyrax's web pages. If a new Hyrax version makes a change in how it stores the information on the web pages, then there needs to be a corresponding change to ERDDAP to add support for this new variant of Hyrax. Screen scraping is prone to not working when the remote system changes.

Chris John, can you please look into this? If it is the problem described above: There are test cases for all of the types of remote sources which are supported. You will need to add another test case for this version of Hyrax. 

Jessy, until there is a new version of ERDDAP which fixes this problem, I suggest you write a script which creates and maintains the local copy of the Hyrax dataset's directories and files. I'm sorry you have to do that. Thanks for notifying us of the problem. It will be a good thing to fix in ERDDAP.
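
For illustration only, here is a minimal Python sketch of what such a mirroring script could look like. It is not ERDDAP's implementation and not an official tool. It assumes that each Hyrax listing page links to subdirectories with hrefs ending in "/" and to downloadable files with hrefs ending in the file name (matching ".nc" here); the actual HTML depends on the Hyrax version, so that rule may need adjusting. The names LinkCollector, classify_links, and mirror are hypothetical helpers made up for this example.

```python
import re
from html.parser import HTMLParser
from pathlib import Path
from urllib.parse import urljoin
from urllib.request import urlopen


class LinkCollector(HTMLParser):
    """Collect the href of every <a> tag on a page."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.hrefs.append(href)


def classify_links(html, base_url, file_regex=r".*\.nc"):
    """Split the links on a listing page into (subdirectory URLs, file URLs).

    ASSUMPTION: subdirectory links end in "/" and file links end in the
    file name. Adapt this to the HTML your Hyrax version produces.
    base_url must end in "/".
    """
    parser = LinkCollector()
    parser.feed(html)
    dirs, files = set(), set()
    for href in parser.hrefs:
        url = urljoin(base_url, href)
        if not url.startswith(base_url):
            continue  # parent directory or external link
        rel = url[len(base_url):]
        name = rel.rstrip("/")
        # Keep direct children only; skip the page itself and query/fragment links.
        if not name or "/" in name or "?" in name or "#" in name:
            continue
        if rel.endswith("/"):
            dirs.add(url)
        elif re.fullmatch(file_regex, name):
            files.add(url)
    return sorted(dirs), sorted(files)


def mirror(base_url, local_dir, fetch=lambda url: urlopen(url).read()):
    """Recursively copy files below base_url into local_dir.

    Files that already exist locally are not downloaded again.
    """
    local_dir = Path(local_dir)
    local_dir.mkdir(parents=True, exist_ok=True)
    page = fetch(base_url).decode("utf-8", "replace")
    dirs, files = classify_links(page, base_url)
    for url in files:
        target = local_dir / url.rsplit("/", 1)[-1]
        if not target.exists():
            target.write_bytes(fetch(url))
    for url in dirs:
        mirror(url, local_dir / url.rstrip("/").rsplit("/", 1)[-1], fetch)


# Example call, using the URL and <fileDir> from the question (run e.g. from cron):
# mirror(
#     "http://mistwatch.uwaterloo.ca/opendap/hyrax/data/Amundsen_Science_and_CIOOS/521/",
#     "/datasets/amundsen/521/",
# )
```

With the files mirrored into the dataset's <fileDir> this way, the <cacheFromUrl> tag would presumably be removed from the dataset definition so that ERDDAP only reads the local copy. The sketch never deletes local files that have disappeared from the server and does not check for changed files; a real script may need both.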

Jessy Barrette

Sep 1, 2023, 7:46:46 AM
to bobsimons2.00, ERDDAP
Awesome, thanks Bob. We will use a temporary fix in the meantime.



--
Jessy Barrette M.Sc.
Marine Data and Instrumentation Specialist