[ec2-user@ip-10-1-101-13 bin]$ ./tachyon loadufs tachyon://10.1.101.13:19998/ami s3n://mybucket/
java.io.IOException: tachyon.exception.TachyonException: Path /files/staging/data/test/ABC Data Readings.xml is invalid.
at tachyon.client.TachyonFS.createFile(TachyonFS.java:322)
at tachyon.client.AbstractTachyonFS.createFile(AbstractTachyonFS.java:83)
at tachyon.client.TachyonFS.createFile(TachyonFS.java:66)
at tachyon.client.UfsUtils.loadUfs(UfsUtils.java:165)
at tachyon.client.UfsUtils.loadUfs(UfsUtils.java:79)
at tachyon.client.UfsUtils.main(UfsUtils.java:222)
Caused by: tachyon.exception.TachyonException: Path /files/staging/data/test/ABC Data Readings.xml is invalid.
at tachyon.client.FileSystemMasterClient.loadMetadata(FileSystemMasterClient.java:444)
at tachyon.client.TachyonFS.createFile(TachyonFS.java:319)
... 5 more
Usage: java -cp target/tachyon-0.8.2-jar-with-dependencies.jar tachyon.client.UfsUtils <TachyonPath> <UfsPath> [<Optional ExcludePathPrefix, separated by ;>]
Example: java -cp target/tachyon-0.8.2-jar-with-dependencies.jar tachyon.client.UfsUtils tachyon://127.0.0.1:19998/a hdfs://localhost:9000/b c
Example: java -cp target/tachyon-0.8.2-jar-with-dependencies.jar tachyon.client.UfsUtils tachyon://127.0.0.1:19998/a file:///b c
Example: java -cp target/tachyon-0.8.2-jar-with-dependencies.jar tachyon.client.UfsUtils tachyon://127.0.0.1:19998/a /b c
In the TFS, all files under local FS /b will be registered under /a, except for those with prefix c
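The usage text describes the optional third argument as a list of path prefixes separated by ';'. Suppose the space in 'ABC Data Readings.xml' is what the master rejects. That is an assumption, since the error only says the path is invalid. In that case, excluding that prefix is one possible workaround. The sketch below is a hypothetical re-implementation of the documented prefix semantics, not UfsUtils' own code. It assumes prefixes are matched against paths relative to the UFS root, as the 'c' example suggests:

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Hypothetical model of the exclude-prefix argument described in the usage text:
 * a ';'-separated list of prefixes; any path (assumed relative to the UFS root)
 * that starts with one of them is skipped. Not Tachyon's actual implementation.
 */
final class ExcludePrefixes {
  private final List<String> prefixes = new ArrayList<>();

  ExcludePrefixes(String arg) {
    if (arg != null) {
      for (String p : arg.split(";")) {
        // Skip empty entries, e.g. from "a;;b".
        if (!p.isEmpty()) {
          prefixes.add(p);
        }
      }
    }
  }

  /** Returns true if the relative path starts with any configured prefix. */
  boolean isExcluded(String relativePath) {
    for (String p : prefixes) {
      if (relativePath.startsWith(p)) {
        return true;
      }
    }
    return false;
  }

  public static void main(String[] args) {
    ExcludePrefixes ex = new ExcludePrefixes("files/staging/data/test/ABC Data;tmp");
    System.out.println(ex.isExcluded("files/staging/data/test/ABC Data Readings.xml")); // true
    System.out.println(ex.isExcluded("files/staging/data/other.csv")); // false
  }
}
```

On the command line, a prefix containing a space would need to be quoted so the shell passes it as a single argument.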
Hi Gene,
Thanks very much for the quick reply, super helpful!
(I will raise a JIRA issue for point 4.)
Regarding point 2:
We have observed Drill requesting the file's location from Tachyon and then reading the file directly from S3. However, the file (a single CSV file in this case) is never loaded into memory on any of the workers.
We have tried various settings on the Tachyon cluster:
-Dtachyon.user.file.understoragetype.default=SYNC_PERSIST
-Dtachyon.user.file.tachyonstoragetype.default=STORE
(from reading https://groups.google.com/forum/#!msg/tachyon-users/P54fHo3a9YQ/qKJqB5mEAAAJ)
and also:
-Dtachyon.user.file.readtype.default=CACHE_PROMOTE
-Dtachyon.user.file.writetype.default=CACHE_THROUGH
As I understand it, if the file does not yet exist in TFS and the above properties are set, a replica of the file should be loaded onto one of the worker nodes. Is that right?
My assumption is that Tachyon itself, not the client, loads the file into memory when readType is CACHE*. Is that correct, or is the caching performed by the client?
Could we be missing some configuration for the Tachyon client within Drill?
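The tachyon.user.* keys look like client-side settings. The debugger listing later in this thread reads configuration through ClientContext.getConf(). If so, -D flags passed only to the Tachyon daemons may never reach the client embedded in Drill. A minimal stdlib-only check prints what a given JVM actually sees. It assumes the Tachyon client picks these keys up from JVM system properties, and it does not cover any properties file the client may also load:

```java
import java.util.LinkedHashMap;
import java.util.Map;

/**
 * Diagnostic sketch: report which of the tachyon.user.* keys tried in this thread
 * are visible as system properties in the current JVM. The same values would need
 * to be checked inside the Drill JVM, not on the Tachyon daemons.
 */
final class ClientConfCheck {
  // Keys copied from the settings listed above.
  static final String[] KEYS = {
    "tachyon.user.file.understoragetype.default",
    "tachyon.user.file.tachyonstoragetype.default",
    "tachyon.user.file.readtype.default",
    "tachyon.user.file.writetype.default",
  };

  /** Maps each key to its system-property value, or null when it is unset. */
  static Map<String, String> visibleSettings() {
    Map<String, String> result = new LinkedHashMap<>();
    for (String key : KEYS) {
      result.put(key, System.getProperty(key));
    }
    return result;
  }

  public static void main(String[] args) {
    for (Map.Entry<String, String> e : visibleSettings().entrySet()) {
      String value = e.getValue() == null ? "<unset>" : e.getValue();
      System.out.println(e.getKey() + " = " + value);
    }
  }
}
```

If the check shows the keys unset in the Drill process, passing the same flags through Drill's own Java options (for example in drill-env.sh) would be the corresponding thing to try.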
0: jdbc:drill:zk=local> show files;
+----------------------+--------------+---------+-----------+--------+--------+--------------+--------------------------+--------------------------+
| name | isDirectory | isFile | length | owner | group | permissions | accessTime | modificationTime |
+----------------------+--------------+---------+-----------+--------+--------+--------------+--------------------------+--------------------------+
| somecsv.csv | false | true | 96763174 | | | rw-rw-rw- | 2016-01-19 02:34:26.651 | 2016-01-19 02:34:26.651 |
+----------------------+--------------+---------+-----------+--------+--------+--------------+--------------------------+--------------------------+
9 rows selected (0.243 seconds)
--
0: jdbc:drill:zk=local> select * from `somecsv.csv`;
<..snip>
738,267 rows selected (221.565 seconds)
I've been snooping on the API calls between the nodes and have not seen any calls other than:
- getFileInfo
- getFileId
- getBlockInfo
Nothing of note in any of the log files (debug=true).
Your help is greatly appreciated!
Thanks
Peter
295e9569-227a-8f8a-3500-18f69506882a:frag:0:0[1] list
224 long currentBlockId = getCurrentBlockId();
225 if (mCurrentBlockInStream == null || mCurrentBlockInStream.remaining() == 0) {
226 closeCacheStream();
227 updateBlockInStream(currentBlockId);
228 => if (mShouldCacheCurrentBlock) {
229 try {
230 // TODO(calvin): Specify the location to be local.
231 mCurrentCacheStream =
232 mContext.getTachyonBlockStore().getOutStream(currentBlockId, -1,
233 NetworkAddressUtils.getLocalHostName(ClientContext.getConf()));
...snip...
Step completed: "thread=295e9569-227a-8f8a-3500-18f69506882a:frag:0:0", tachyon.client.file.FileInStream.checkAndAdvanceBlockInStream(), line=235 bci=68
235 LOG.warn("Failed to get TachyonStore stream, the block " + currentBlockId
295e9569-227a-8f8a-3500-18f69506882a:frag:0:0[1] list
231 mCurrentCacheStream =
232 mContext.getTachyonBlockStore().getOutStream(currentBlockId, -1,
233 NetworkAddressUtils.getLocalHostName(ClientContext.getConf()));
234 } catch (IOException ioe) {
235 => LOG.warn("Failed to get TachyonStore stream, the block " + currentBlockId
236 + " will not be in TachyonStorage. Exception:" + ioe.getMessage());
237 mShouldCacheCurrentBlock = false;
238 }
239 }
240 }
.. snip...
295e9569-227a-8f8a-3500-18f69506882a:frag:0:0[1] eval ioe
ioe = "java.io.IOException: TachyonTException(type:BLOCK_ALREADY_EXISTS, message:Temp blockId 693,201,010,688 is not available, because it already exists)"
This exception is thrown when Drill and Tachyon are deployed on the same node.
When they were not on the same node, I think the exception was something like being unable to find a local worker.
Your thoughts would be greatly appreciated.
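The listing at lines 228-238 suggests that caching is attempted by the client (FileInStream) rather than by the worker. The client asks the local worker for an out stream. If that fails, it logs the warning, sets mShouldCacheCurrentBlock to false, and keeps reading. That would explain why the query returns its rows while nothing is cached. Below is a simplified, stdlib-only model of that fallback branch. The class and method names are hypothetical, and the real stream works block by block and handles more cases:

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;

/**
 * Simplified model of cache-on-read: bytes always come from the source stream and
 * are copied to a cache stream only if one could be opened. A failure to open the
 * cache disables caching but does not fail the read.
 */
final class CacheOnReadStream extends InputStream {
  /** Opens the cache destination; may fail, like getOutStream(...) in the listing. */
  interface CacheOpener {
    OutputStream open() throws IOException;
  }

  private final InputStream source;
  private OutputStream cache;         // null when caching is disabled
  private boolean shouldCache = true; // plays the role of mShouldCacheCurrentBlock

  CacheOnReadStream(InputStream source, CacheOpener opener) {
    this.source = source;
    try {
      cache = opener.open();
    } catch (IOException ioe) {
      // Counterpart of the LOG.warn branch: give up on caching, keep reading.
      System.err.println("Caching disabled: " + ioe.getMessage());
      shouldCache = false;
    }
  }

  boolean isCaching() {
    return shouldCache;
  }

  @Override
  public int read() throws IOException {
    int b = source.read();
    if (b != -1 && shouldCache) {
      cache.write(b); // tee the byte into the cache
    }
    return b;
  }

  @Override
  public void close() throws IOException {
    source.close();
    if (cache != null) {
      cache.close();
    }
  }

  public static void main(String[] args) throws IOException {
    byte[] data = "hello".getBytes();
    ByteArrayOutputStream sink = new ByteArrayOutputStream();
    try (CacheOnReadStream ok = new CacheOnReadStream(new ByteArrayInputStream(data), () -> sink)) {
      while (ok.read() != -1) { }
    }
    System.out.println("cached bytes: " + sink.size()); // 5
    try (CacheOnReadStream failing = new CacheOnReadStream(new ByteArrayInputStream(data),
        () -> { throw new IOException("BLOCK_ALREADY_EXISTS"); })) {
      int n = 0;
      while (failing.read() != -1) {
        n++;
      }
      System.out.println("bytes read without caching: " + n); // 5
    }
  }
}
```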
tachyon-start.sh all SudoMount
0: jdbc:drill:zk=local> select * from `ctest`;
Error: SYSTEM ERROR: ThriftIOException: Failed to delete /mnt/ramdisk/tachyonworker/3252252999903914303/693201010688
Storage Usage Summary
Total Capacity / Used 24.00GB / 1024.00KB
MEM Capacity / Used 4096.00MB / 1024.00KB
SSD Capacity / Used 20.00GB / 0.00B
Tiered Storage Details
Alias  Path                         Capacity   Space Used  Space Usage
MEM    /mnt/ramdisk/tachyonworker   4096.00MB  1024.00KB   100% Free
SSD    /var/tmp/tachyonworker       20.00GB    0.00B       100% Free

File Path  In-MEM  In-SSD  In-HDD  Size  Creation Time  Modification Time
$ pwd
/mnt/ramdisk/tachyonworker/3252252999903914303
$ ls -l
total 0
-rw-r--r-- 1 root root 0 Jan 24 22:25 693201010688
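The listing shows a zero-byte file named after block 693201010688 left in a session folder. The log below shows each open() creating a new session folder for the same block ID. One hedged reading of BLOCK_ALREADY_EXISTS is that the worker reserves temporary blocks by block ID alone. Under that reading, while an earlier session's temp block is neither committed nor aborted, a later session cannot create it again. Below is a toy model of that reading. The names and session IDs are hypothetical, and this is not Tachyon's worker code:

```java
import java.util.HashMap;
import java.util.Map;

/**
 * Toy model of a worker tracking temporary (uncommitted) blocks by block ID.
 * A second reservation of an ID that is still reserved is rejected, mirroring
 * the BLOCK_ALREADY_EXISTS message.
 */
final class TempBlockRegistry {
  private final Map<Long, Long> tempBlockOwner = new HashMap<>(); // blockId -> sessionId

  /** Reserves a temp block for a session, like creating <sessionDir>/<blockId>. */
  synchronized void createTempBlock(long sessionId, long blockId) {
    Long owner = tempBlockOwner.get(blockId);
    if (owner != null) {
      throw new IllegalStateException("Temp blockId " + blockId
          + " is not available, because it already exists (held by session " + owner + ")");
    }
    tempBlockOwner.put(blockId, sessionId);
  }

  /** In this model, commit and abort both end the session's temp reservation. */
  synchronized void release(long sessionId, long blockId) {
    tempBlockOwner.remove(blockId, sessionId);
  }

  public static void main(String[] args) {
    TempBlockRegistry worker = new TempBlockRegistry();
    long block = 693201010688L;
    worker.createTempBlock(1L, block); // first open(): ok
    worker.release(1L, block);         // stream closed, temp block aborted
    worker.createTempBlock(2L, block); // second open(): ok, but never released
    try {
      worker.createTempBlock(3L, block); // later open(): rejected
    } catch (IllegalStateException e) {
      System.out.println(e.getMessage());
    }
  }
}
```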
[29516b49-c6ed-66af-bbad-c8b51bebd5d3:foreman] INFO org.apache.drill.exec.store.parquet.Metadata - Took 932 ms to read file metadata
[29516b49-c6ed-66af-bbad-c8b51bebd5d3:frag:0:0] INFO - getFileStatus(/ctest/0_0_0.parquet): HDFS Path: s3n://<mybucketname>/ctest/0_0_0.parquet TPath: tachyon://10.1.101.13:19998/ctest/0_0_0.parquet
[29516b49-c6ed-66af-bbad-c8b51bebd5d3:frag:0:0] INFO - open(tachyon://10.1.101.13:19998/ctest/0_0_0.parquet, 4096)
[29516b49-c6ed-66af-bbad-c8b51bebd5d3:frag:0:0] INFO - Folder /mnt/ramdisk/tachyonworker/6103841725690598561 was created!
[29516b49-c6ed-66af-bbad-c8b51bebd5d3:frag:0:0] INFO - LocalBlockOutStream created new file block, block path: /mnt/ramdisk/tachyonworker/6103841725690598561/693201010688
[29516b49-c6ed-66af-bbad-c8b51bebd5d3:frag:0:0] INFO - open(/ctest/0_0_0.parquet, 4096)
[29516b49-c6ed-66af-bbad-c8b51bebd5d3:frag:0:0] INFO - Folder /mnt/ramdisk/tachyonworker/257469779071470002 was created!
[29516b49-c6ed-66af-bbad-c8b51bebd5d3:frag:0:0] INFO - LocalBlockOutStream created new file block, block path: /mnt/ramdisk/tachyonworker/257469779071470002/693201010688
[29516b49-c6ed-66af-bbad-c8b51bebd5d3:frag:0:0] INFO - open(/ctest/0_0_0.parquet, 4096)
[29516b49-c6ed-66af-bbad-c8b51bebd5d3:frag:0:0] INFO - Connecting local worker @ ip-10-1-101-10.<aws-region>.compute.internal/10.1.101.10:29998
[29516b49-c6ed-66af-bbad-c8b51bebd5d3:frag:0:0] WARN - Failed to get TachyonStore stream, the block 693201010688 will not be in TachyonStorage. Exception:TachyonTException(type:BLOCK_ALREADY_EXISTS, message:Temp blockId 693,201,010,688 is not available, because it already exists)
[29516b49-c6ed-66af-bbad-c8b51bebd5d3:frag:0:0] INFO - open(/ctest/0_0_0.parquet, 4096)
[29516b49-c6ed-66af-bbad-c8b51bebd5d3:frag:0:0] WARN - Failed to get TachyonStore stream, the block 693201010688 will not be in TachyonStorage. Exception:TachyonTException(type:BLOCK_ALREADY_EXISTS, message:Temp blockId 693,201,010,688 is not available, because it already exists)
[29516b49-c6ed-66af-bbad-c8b51bebd5d3:frag:0:0] INFO - open(/ctest/0_0_0.parquet, 4096)
[29516b49-c6ed-66af-bbad-c8b51bebd5d3:frag:0:0] WARN - Failed to get TachyonStore stream, the block 693201010688 will not be in TachyonStorage. Exception:TachyonTException(type:BLOCK_ALREADY_EXISTS, message:Temp blockId 693,201,010,688 is not available, because it already exists)
[29516b49-c6ed-66af-bbad-c8b51bebd5d3:frag:0:0] INFO - open(/ctest/0_0_0.parquet, 4096)
[29516b49-c6ed-66af-bbad-c8b51bebd5d3:frag:0:0] WARN - Failed to get TachyonStore stream, the block 693201010688 will not be in TachyonStorage. Exception:TachyonTException(type:BLOCK_ALREADY_EXISTS, message:Temp blockId 693,201,010,688 is not available, because it already exists)
[29516b49-c6ed-66af-bbad-c8b51bebd5d3:frag:0:0] INFO - open(/ctest/0_0_0.parquet, 4096)
[29516b49-c6ed-66af-bbad-c8b51bebd5d3:frag:0:0] WARN - Failed to get TachyonStore stream, the block 693201010688 will not be in TachyonStorage. Exception:TachyonTException(type:BLOCK_ALREADY_EXISTS, message:Temp blockId 693,201,010,688 is not available, because it already exists)
[29516b49-c6ed-66af-bbad-c8b51bebd5d3:frag:0:0] INFO - open(/ctest/0_0_0.parquet, 4096)
[29516b49-c6ed-66af-bbad-c8b51bebd5d3:frag:0:0] WARN - Failed to get TachyonStore stream, the block 693201010688 will not be in TachyonStorage. Exception:TachyonTException(type:BLOCK_ALREADY_EXISTS, message:Temp blockId 693,201,010,688 is not available, because it already exists)
[29516b49-c6ed-66af-bbad-c8b51bebd5d3:frag:0:0] INFO - open(/ctest/0_0_0.parquet, 4096)
[29516b49-c6ed-66af-bbad-c8b51bebd5d3:frag:0:0] WARN - Failed to get TachyonStore stream, the block 693201010688 will not be in TachyonStorage. Exception:TachyonTException(type:BLOCK_ALREADY_EXISTS, message:Temp blockId 693,201,010,688 is not available, because it already exists)
[29516b49-c6ed-66af-bbad-c8b51bebd5d3:frag:0:0] INFO - open(/ctest/0_0_0.parquet, 4096)
It emits about 16 of these BLOCK_ALREADY_EXISTS exceptions, and the file does not get cached.
The file size is 8.5 KB.
When we run a similar command to read a different file we don't see any exceptions (and the file gets cached as expected):
[29516878-a93d-9f6d-47f7-9d7e0549c305:foreman] INFO - listStatus(tachyon://10.1.101.13:19998/): HDFS Path: s3n://<mybucketname>/
[29516878-a93d-9f6d-47f7-9d7e0549c305:foreman] INFO - getFileStatus(tachyon://10.1.101.13:19998/MiscTX_craig.csv): HDFS Path: s3n://<mybucketname>/MiscTX_craig.csv TPath: tachyon://10.1.101.13:19998/MiscTX_craig.csv
[29516878-a93d-9f6d-47f7-9d7e0549c305:foreman] INFO org.apache.drill.exec.store.schedule.BlockMapBuilder - Get block maps: Executed 1 out of 1 using 1 threads. Time: 19ms total, 19.989193ms avg, 19ms max.
[29516878-a93d-9f6d-47f7-9d7e0549c305:foreman] INFO org.apache.drill.exec.store.schedule.BlockMapBuilder - Get block maps: Executed 1 out of 1 using 1 threads. Earliest start: 1.190000 μs, Latest start: 1.190000 μs, Average start: 1.190000 μs .
[29516878-a93d-9f6d-47f7-9d7e0549c305:frag:0:0] INFO - getWorkingDirectory: /
[29516878-a93d-9f6d-47f7-9d7e0549c305:frag:0:0] INFO - open(tachyon://10.1.101.13:19998/MiscTX_craig.csv, 4096)
[29516878-a93d-9f6d-47f7-9d7e0549c305:frag:0:0] INFO - Folder /mnt/ramdisk/tachyonworker/4822792725232426273 was created!
[29516878-a93d-9f6d-47f7-9d7e0549c305:frag:0:0] INFO - LocalBlockOutStream created new file block, block path: /mnt/ramdisk/tachyonworker/4822792725232426273/693217787904
[29516878-a93d-9f6d-47f7-9d7e0549c305:frag:0:0] INFO - open(tachyon://10.1.101.13:19998/MiscTX_craig.csv, 4096)
[29516878-a93d-9f6d-47f7-9d7e0549c305:frag:0:0] INFO - Folder /mnt/ramdisk/tachyonworker/8468772189556308610 was created!
[29516878-a93d-9f6d-47f7-9d7e0549c305:frag:0:0] INFO - LocalBlockOutStream created new file block, block path: /mnt/ramdisk/tachyonworker/8468772189556308610/693217787904
[29516878-a93d-9f6d-47f7-9d7e0549c305:frag:0:0] INFO org.apache.drill.exec.work.fragment.FragmentExecutor - 29516878-a93d-9f6d-47f7-9d7e0549c305:0:0: State change requested AWAITING_ALLOCATION --> RUNNING
No exceptions seem to be generated on this file.
The size of this file is 92.3 MB.
I'm not sure whether this is caused by Drill not reading the whole file. I am running 'select * from ctest;' against the first file, so I expect it to read the whole thing.
The tachyon worker is running as root.
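Assume block IDs use the high 40 bits for a container ID and the low 24 bits for a sequence number. That is an assumption about Tachyon 0.8's block ID scheme, not something stated in this thread. Under it, the two block IDs in these logs decode cleanly to sequence number 0. This suggests each file is a single block, as an 8.5 KB file would be. It also suggests that all of the warnings refer to the one block of ctest/0_0_0.parquet:

```java
/**
 * Splits a block ID into container ID and sequence number, assuming
 * 40 high bits of container ID and 24 low bits of sequence number.
 */
final class BlockIdDecode {
  static final int SEQUENCE_NUMBER_BITS = 24;
  static final long SEQUENCE_NUMBER_MASK = (1L << SEQUENCE_NUMBER_BITS) - 1;

  static long containerId(long blockId) {
    return blockId >>> SEQUENCE_NUMBER_BITS;
  }

  static long sequenceNumber(long blockId) {
    return blockId & SEQUENCE_NUMBER_MASK;
  }

  public static void main(String[] args) {
    // Block IDs taken from the two log excerpts above.
    long[] ids = {693201010688L, 693217787904L}; // ctest/0_0_0.parquet, MiscTX_craig.csv
    for (long id : ids) {
      System.out.println(id + " -> container " + containerId(id)
          + ", sequence " + sequenceNumber(id));
    }
  }
}
```

Running main prints container 41318, sequence 0 for 693201010688 and container 41319, sequence 0 for 693217787904.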
Thanks
Peter
--
You received this message because you are subscribed to the Google Groups "Alluxio Users" group.