Hi there,
In our POC setup at work, we were using an IP address to access HDFS, and consequently core-site.xml was set up like this as well:
<property>
  <name>fs.defaultFS</name>
  <value>hdfs://192.168.1.28:9000</value>
</property>
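(For comparison, a hostname-based setup would decouple the config from the concrete IP; `namenode.example.com` below is just an illustrative name, not our actual host:)

```xml
<property>
  <name>fs.defaultFS</name>
  <!-- hostname instead of a hard-coded IP; resolved via DNS/hosts -->
  <value>hdfs://namenode.example.com:9000</value>
</property>
```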
Parquet files were accessed with the complete path including the IP address, for example hdfs://192.168.1.28:9000/blah/blah.parquet. This is also what went into the Nessie catalog as the location; we could check it via the web URL.
As luck would have it, the IP address changed and Nessie could no longer access those files, failing with a "no route to host" exception (from the HDFS libraries). We had to delete the catalog (persisted in Postgres) and re-run the migration to get those test tables back.
Would it be possible to provide the hostname/IP (or an initial base path) from outside, so that Nessie only stores paths relative to it? What happened here can always happen in prod. Also, storing the exact/complete URL doesn't seem like great practice to me, since it binds the catalog too tightly to deployment specifics.
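To illustrate what I mean, here is a rough sketch of the idea in Python. The function name and the base URI are hypothetical, not an actual Nessie API: the catalog would store only the relative location, and the base would come from configuration at runtime.

```python
def resolve(base: str, relative: str) -> str:
    """Join a configured warehouse base URI (supplied from outside,
    e.g. deployment config) with a catalog-stored relative location.
    Purely illustrative -- not an actual Nessie function."""
    return base.rstrip("/") + "/" + relative.lstrip("/")

# Example: if the namenode address changes, only the configured base
# changes; the catalog entry "blah/blah.parquet" stays valid.
print(resolve("hdfs://namenode.example.com:9000/warehouse",
              "blah/blah.parquet"))
# → hdfs://namenode.example.com:9000/warehouse/blah/blah.parquet
```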
best regards,
Vikram