Hey there! I've been struggling to get Nessie GC to work when the FileIO should be GCSFileIO (rather than S3FileIO, which is supported out of the box).
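For context, the invocation looks roughly like this — the URI, bucket, and jar name are placeholders, and the Hadoop property keys are my best guess from the GCS connector docs, so treat the exact shape as an assumption:

```shell
# Rough shape of my invocation. Placeholders: nessie-gc.jar, the Nessie URI.
# The two -H properties point Hadoop at the GCS connector's file system classes.
java -jar nessie-gc.jar gc \
  --uri http://nessie:19120/api/v2 \
  -H fs.gs.impl=com.google.cloud.hadoop.fs.gcs.GoogleHadoopFileSystem \
  -H fs.AbstractFileSystem.gs.impl=com.google.cloud.hadoop.fs.gcs.GoogleHadoopFS
```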
From what I can tell, passing the `-H` params is necessary when not using S3FileIO, since Nessie GC otherwise falls back to HadoopFileIO. However, when I run this, I see the output below, which seems to indicate that neither Iceberg's GCSFileIO nor the Google Hadoop file system classes can be found:
```
2023-11-10 10:56:32,938 [ForkJoinPool-1-worker-1] INFO o.p.gc.identify.IdentifyLiveContents - live-set#63965210-652d-48de-9e28-534bbd861bdf: Start walking the commit log of Branch{name=main, metadata=null, hash=56d92b26e2db94f1ca8ff22c6c6c104ba83c02f0496b76c8356e1b665f9bbdb1} using no cutoff (keep everything).
2023-11-10 10:56:33,328 [ForkJoinPool-1-worker-1] INFO o.p.gc.identify.IdentifyLiveContents - live-set#63965210-652d-48de-9e28-534bbd861bdf: Finished walking the commit log of Branch{name=main, metadata=null, hash=56d92b26e2db94f1ca8ff22c6c6c104ba83c02f0496b76c8356e1b665f9bbdb1} using no cutoff (keep everything) after 323 commits, no more commits.
2023-11-10 10:56:33,328 [ForkJoinPool-1-worker-1] INFO o.p.gc.identify.IdentifyLiveContents - live-set#63965210-652d-48de-9e28-534bbd861bdf: Finished walking all named references, took PT0.764379S: numReferences=1, numCommits=323, numContents=316, shortCircuits=0.
Finished Nessie-GC identify phase finished with status IDENTIFY_SUCCESS after PT0.764443S, live-content-set ID is 63965210-652d-48de-9e28-534bbd861bdf.
2023-11-10 10:56:33,362 [main] INFO o.p.g.e.local.DefaultLocalExpire - live-set#63965210-652d-48de-9e28-534bbd861bdf: Starting expiry.
2023-11-10 10:56:33,369 [ForkJoinPool-3-worker-3] INFO org.apache.iceberg.CatalogUtil - Loading custom FileIO implementation: org.apache.iceberg.gcp.gcs.GCSFileIO
2023-11-10 10:56:33,370 [ForkJoinPool-3-worker-3] WARN o.apache.iceberg.io.ResolvingFileIO - Failed to load FileIO implementation: org.apache.iceberg.gcp.gcs.GCSFileIO, falling back to org.apache.iceberg.hadoop.HadoopFileIO
java.lang.IllegalArgumentException: Cannot initialize FileIO, missing no-arg constructor: org.apache.iceberg.gcp.gcs.GCSFileIO
    at org.apache.iceberg.CatalogUtil.loadFileIO(CatalogUtil.java:312)
    at org.apache.iceberg.io.ResolvingFileIO.lambda$io$1(ResolvingFileIO.java:185)
    at java.base/java.util.concurrent.ConcurrentHashMap.computeIfAbsent(ConcurrentHashMap.java:1708)
    at org.apache.iceberg.io.ResolvingFileIO.io(ResolvingFileIO.java:174)
    at org.apache.iceberg.io.ResolvingFileIO.newInputFile(ResolvingFileIO.java:82)
    at org.apache.iceberg.TableMetadataParser.read(TableMetadataParser.java:266)
    at org.projectnessie.gc.iceberg.IcebergContentToFiles.extractFiles(IcebergContentToFiles.java:78)
    at org.projectnessie.gc.expire.PerContentDeleteExpired.lambda$identifyLiveFiles$2(PerContentDeleteExpired.java:125)
    at java.base/java.util.stream.ReferencePipeline$7$1.accept(ReferencePipeline.java:273)
    at java.base/java.util.HashMap$KeySpliterator.forEachRemaining(HashMap.java:1707)
    at java.base/java.util.stream.AbstractPipeline.copyInto(AbstractPipeline.java:509)
    at java.base/java.util.stream.AbstractPipeline.wrapAndCopyInto(AbstractPipeline.java:499)
    at java.base/java.util.stream.ReduceOps$ReduceOp.evaluateSequential(ReduceOps.java:921)
    at java.base/java.util.stream.ReduceOps$5.evaluateSequential(ReduceOps.java:258)
    at java.base/java.util.stream.ReduceOps$5.evaluateSequential(ReduceOps.java:248)
    at java.base/java.util.stream.AbstractPipeline.evaluate(AbstractPipeline.java:234)
    at java.base/java.util.stream.ReferencePipeline.count(ReferencePipeline.java:709)
    at org.projectnessie.gc.expire.PerContentDeleteExpired.identifyLiveFiles(PerContentDeleteExpired.java:133)
    at org.projectnessie.gc.expire.PerContentDeleteExpired.expire(PerContentDeleteExpired.java:73)
    at org.projectnessie.gc.expire.local.DefaultLocalExpire.expireSingleContent(DefaultLocalExpire.java:104)
    at java.base/java.util.stream.ReferencePipeline$3$1.accept(ReferencePipeline.java:197)
    at java.base/java.util.concurrent.ConcurrentHashMap$KeySpliterator.forEachRemaining(ConcurrentHashMap.java:3573)
    at java.base/java.util.stream.AbstractPipeline.copyInto(AbstractPipeline.java:509)
    at java.base/java.util.stream.AbstractPipeline.wrapAndCopyInto(AbstractPipeline.java:499)
    at java.base/java.util.stream.ReduceOps$ReduceTask.doLeaf(ReduceOps.java:960)
    at java.base/java.util.stream.ReduceOps$ReduceTask.doLeaf(ReduceOps.java:934)
    at java.base/java.util.stream.AbstractTask.compute(AbstractTask.java:327)
    at java.base/java.util.concurrent.CountedCompleter.exec(CountedCompleter.java:754)
    at java.base/java.util.concurrent.ForkJoinTask.doExec(ForkJoinTask.java:373)
    at java.base/java.util.concurrent.ForkJoinPool$WorkQueue.topLevelExec(ForkJoinPool.java:1182)
    at java.base/java.util.concurrent.ForkJoinPool.scan(ForkJoinPool.java:1655)
    at java.base/java.util.concurrent.ForkJoinPool.runWorker(ForkJoinPool.java:1622)
    at java.base/java.util.concurrent.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:165)
Caused by: java.lang.NoSuchMethodException: Cannot find constructor for interface org.apache.iceberg.io.FileIO
Missing org.apache.iceberg.gcp.gcs.GCSFileIO [java.lang.ClassNotFoundException: org.apache.iceberg.gcp.gcs.GCSFileIO]
    at org.apache.iceberg.common.DynConstructors.buildCheckedException(DynConstructors.java:250)
    at org.apache.iceberg.common.DynConstructors.access$200(DynConstructors.java:32)
    at org.apache.iceberg.common.DynConstructors$Builder.buildChecked(DynConstructors.java:220)
    at org.apache.iceberg.CatalogUtil.loadFileIO(CatalogUtil.java:309)
    ... 32 common frames omitted
    Suppressed: java.lang.ClassNotFoundException: org.apache.iceberg.gcp.gcs.GCSFileIO
        at java.base/jdk.internal.loader.BuiltinClassLoader.loadClass(BuiltinClassLoader.java:641)
        ...
...
```
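Given the `ClassNotFoundException`, my suspicion is that `org.apache.iceberg.gcp.gcs.GCSFileIO` simply isn't on the tool's classpath at all. A quick way to check whether the class is bundled in the jar (jar name is a placeholder for whatever your download is called):

```shell
# List the jar's contents and look for the GCSFileIO class file.
# No output would mean the iceberg-gcp classes are not bundled.
unzip -l nessie-gc.jar | grep -i 'gcp/gcs/GCSFileIO'
```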
I'm wondering if anyone can either spot what's wrong here and offer some help, or share their experience running Nessie GC against data in GCS.