Replace metadata location


Vikram Roopchand

May 5, 2023, 2:43:07 AM
to projectnessie
Hello There,

Hope you are doing well.

Is there a way to update the location of a table's current metadata in Nessie without actually loading the table from its present location?

We have recently moved off our existing Hadoop store and have been able to migrate the actual Hadoop files to the new store. However, Nessie (JDBC-backed store) still refers to the old location of the current metadata file. Is there a way to switch to the new location without actually loading the table from its old location? Something like "getTableMetadata(boolean getStoreOnly) >> Properties" and "updateTableMetadata(Properties)", but with the operation working only on the persistent store instead of the file system.

I tried "registerTable"; however, it first checks for the table at the location it already has stored, which is exactly what we are trying to update. I tried several other approaches (and learnt a boatload in the process :D) but to no avail.

Would be grateful for any help.

best regards,
Vikram

Robert Stupp

May 5, 2023, 9:15:30 AM
to Vikram Roopchand, projectnessie

Hi Vikram,

There is no way to do that "on the fly". It's actually more of an Iceberg issue; a couple of users are hitting that exact same problem.

From Iceberg itself it's definitely not possible, because Iceberg reads the metadata pointer before you have a chance to change it, as you correctly noticed.

The only way I can think of would be to have a tool do that. It would roughly do this - just using the nessie-client artifact:

  1. Create a Nessie client instance
  2. Walk all named references
  3. If the reference is a branch, then:
    1. Fetch all entries from the branch's HEAD
    2. Filter on content type "ICEBERG_TABLE"
    3. Fetch the content objects for all these content keys
    4. Update the metadataPointer attribute in the IcebergContent object accordingly
    5. Commit the updated content objects (one commit can contain multiple "Put" operations though)
  4. If the reference is a tag, then:
    1. Create a temporary branch from the tag's HEAD
    2. (same as for a branch)
    3. Assign the tag to the new HEAD of the temporary branch
    4. Delete the temporary branch
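
The steps above can be sketched as a small planning routine. This is only an illustration under stated assumptions, not the nessie-client API: `IcebergContent`, `Reference`, `relocate`, `plan_puts` and `migrate` are hypothetical names, the references are modelled in memory, and the URIs in the usage note are made up. A real tool would replace the in-memory model with calls to a Nessie client and execute the planned actions.

```python
from dataclasses import dataclass, replace
from typing import Dict, List, Optional, Tuple


@dataclass(frozen=True)
class IcebergContent:
    """Hypothetical stand-in for a Nessie Iceberg table content object."""
    content_id: str
    metadata_pointer: str


@dataclass
class Reference:
    """Hypothetical stand-in for a named reference and its HEAD contents."""
    name: str
    kind: str  # "BRANCH" or "TAG"
    # content key -> content at HEAD, already filtered to ICEBERG_TABLE
    head: Dict[str, IcebergContent]


def relocate(pointer: str, old_prefix: str, new_prefix: str) -> str:
    """Swap the storage prefix; leave pointers outside the old store alone."""
    if pointer.startswith(old_prefix):
        return new_prefix + pointer[len(old_prefix):]
    return pointer


def plan_puts(head: Dict[str, IcebergContent],
              old_prefix: str, new_prefix: str) -> Dict[str, IcebergContent]:
    """Collect one 'Put' per table whose metadata pointer actually changes."""
    puts = {}
    for key, content in head.items():
        new_pointer = relocate(content.metadata_pointer, old_prefix, new_prefix)
        if new_pointer != content.metadata_pointer:
            puts[key] = replace(content, metadata_pointer=new_pointer)
    return puts


Action = Tuple[str, str, Optional[object]]


def migrate(references: List[Reference],
            old_prefix: str, new_prefix: str) -> List[Action]:
    """Plan the actions per reference; nothing is executed here."""
    actions: List[Action] = []
    for ref in references:
        puts = plan_puts(ref.head, old_prefix, new_prefix)
        if not puts:
            continue  # nothing to update on this reference
        if ref.kind == "BRANCH":
            # One commit can carry all Put operations for the branch.
            actions.append(("commit", ref.name, puts))
        elif ref.kind == "TAG":
            # Tags cannot be committed to: go through a temporary branch.
            tmp = f"tmp-relocate-{ref.name}"
            actions.append(("create-branch", tmp, ref.name))
            actions.append(("commit", tmp, puts))
            actions.append(("assign-tag", ref.name, tmp))
            actions.append(("delete-branch", tmp, None))
    return actions
```

For example, `migrate([Reference("main", "BRANCH", {"db.t": IcebergContent("id1", "hdfs://old/wh/db/t/metadata/v1.metadata.json")})], "hdfs://old/wh", "s3://new/wh")` plans a single commit on `main` that points `db.t` at the `s3://new/wh/...` copy of the same file. Planning first and executing afterwards makes it easy to review the changes as a dry run before anything is committed.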

The downside is obviously that you effectively lose access to the Iceberg metadata referenced in older Nessie commits, because it's not possible to change an existing commit (and won't be possible). However, this might be acceptable?

Do you want to take a stab at writing such a tool maybe? We already have some "infra" in place in Nessie in the "content generator tool": https://github.com/projectnessie/nessie/tree/main/tools/content-generator

Robert

PS: Feel free to join our chat https://project-nessie.zulipchat.com/

-- 
Robert Stupp
@snazy

Robert Stupp

May 6, 2023, 5:10:41 AM
to Vikram Roopchand, projectnessie
Thinking a bit more about this, it's likely not enough to just update the metadata pointer. We'd also have to update all Iceberg manifest lists and manifest files, which also have the "old" URI.
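
To give a feel for where the old URI shows up, here is a minimal sketch that rewrites the location-bearing fields of an Iceberg table metadata JSON document that has already been parsed into a dict. It assumes the field names from the Iceberg table spec (`location`, `snapshots[].manifest-list`, `metadata-log[].metadata-file`); `relocate` and `rewrite_table_metadata` are hypothetical helper names. It covers only the JSON metadata file: manifest lists and manifest files are Avro files and would need an Avro reader and writer, which is not shown.

```python
import copy


def relocate(uri: str, old_prefix: str, new_prefix: str) -> str:
    """Swap the storage prefix; leave URIs outside the old store alone."""
    if uri.startswith(old_prefix):
        return new_prefix + uri[len(old_prefix):]
    return uri


def rewrite_table_metadata(metadata: dict, old_prefix: str, new_prefix: str) -> dict:
    """Return a copy of the parsed metadata with old-store URIs rewritten."""
    out = copy.deepcopy(metadata)  # never mutate the caller's document

    # Base location of the table.
    if "location" in out:
        out["location"] = relocate(out["location"], old_prefix, new_prefix)

    # Each snapshot points at its manifest list file.
    for snapshot in out.get("snapshots", []):
        if "manifest-list" in snapshot:
            snapshot["manifest-list"] = relocate(
                snapshot["manifest-list"], old_prefix, new_prefix)

    # The metadata log records earlier metadata files.
    for entry in out.get("metadata-log", []):
        if "metadata-file" in entry:
            entry["metadata-file"] = relocate(
                entry["metadata-file"], old_prefix, new_prefix)

    return out
```

The rewritten document would have to be written out as a new metadata file in the new store, and the Nessie metadata pointer would then refer to that new file rather than to a byte-for-byte copy of the old one.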

--
Robert Stupp
@snazy






Vikram Roopchand

May 6, 2023, 7:21:22 AM
to Robert Stupp, projectnessie
Dear Robert,

Thank you for replying.

At the moment this is exactly what I am doing. If it succeeds, I will move on to your earlier suggestions.

Thanks again, your help is much appreciated.

Will keep you posted.
Best regards,
Vikram