Initializing Tachyon


Peter M

17 Jan 2016, 19:55:22
to Tachyon Users
Hi 

I am trying to understand how I initialise Tachyon so that I can see and access the files in the underfs and get them loaded into memory.

Relevant config is as follows:

I am using 0.8.2
underfs is S3

tachyon.user.file.readtype.default CACHE_PROMOTE
tachyon.user.file.writetype.default  CACHE_THROUGH

I have 1 master and 2 worker nodes with tiered storage configured:
4GB RAM each
20GB SSD each

So total storage space is 48GB

I want to access the files from Drill and Spark.

All my files are sitting in a bucket in S3 and I would like to be able to access all of these files from Drill (to start).

What is the best / correct way to initialise the Tachyon file system?

When I start tachyon - there is nothing in the tachyon file system.

I mount the S3 bucket as follows:
./tachyon tfs mount /files s3n://mybucket/
-- this creates the mount dir but I still can't see any of the files in the s3 Bucket

Then I run  loadufs - it loads the folder structure and file info/metadata into Tachyon
-- so now I can see and query the files in the bucket
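
For reference, the full sequence looks roughly like this (bucket and path names are placeholders):

./tachyon tfs mount /files s3n://mybucket/
./tachyon loadufs tachyon://<master>:19998/files s3n://mybucket/
./tachyon tfs ls /files
./tachyon tfs load /files/some/file.csv    (run per file -- so far the only way I've found to actually get a file into memory)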

But there are a couple of things that I am not clear about:

1. Is this the right approach? I want to pre-load the metadata for the files in the bucket into Tachyon so they can be accessed from Drill by the end users. How do people generally do this?

2. Although I can access the files through Tachyon, they never seem to be loaded into Tachyon memory unless I manually run tfs load on each file.
Drill can access and read the file, but it never gets loaded into memory until I run tfs load.
I thought that CACHE_PROMOTE should mean that when Drill reads the file, it gets copied into Tachyon memory as well as being read from the underFS.

3. Do you normally deploy Tachyon onto the same nodes as the clients, e.g. Spark and Drill?

One other thing I have noticed is that loadufs seems to fail when the file name has spaces in it. 
For example:

[ec2-user@ip-10-1-101-13 bin]$ ./tachyon loadufs tachyon://10.1.101.13:19998/ami s3n://mybucket/

java.io.IOException: tachyon.exception.TachyonException: Path /files/staging/data/test/ABC Data Readings.xml is invalid.
       at tachyon.client.TachyonFS.createFile(TachyonFS.java:322)
       at tachyon.client.AbstractTachyonFS.createFile(AbstractTachyonFS.java:83)
       at tachyon.client.TachyonFS.createFile(TachyonFS.java:66)
       at tachyon.client.UfsUtils.loadUfs(UfsUtils.java:165)
       at tachyon.client.UfsUtils.loadUfs(UfsUtils.java:79)
       at tachyon.client.UfsUtils.main(UfsUtils.java:222)
Caused by: tachyon.exception.TachyonException: Path /files/staging/data/test/ABC Data Readings.xml is invalid.
       at tachyon.client.FileSystemMasterClient.loadMetadata(FileSystemMasterClient.java:444)
       at tachyon.client.TachyonFS.createFile(TachyonFS.java:319)
       ... 5 more

Usage: java -cp target/tachyon-0.8.2-jar-with-dependencies.jar tachyon.client.UfsUtils <TachyonPath> <UfsPath> [<Optional ExcludePathPrefix, separated by ;>]
Example: java -cp target/tachyon-0.8.2-jar-with-dependencies.jar tachyon.client.UfsUtils tachyon://127.0.0.1:19998/a hdfs://localhost:9000/b c
Example: java -cp target/tachyon-0.8.2-jar-with-dependencies.jar tachyon.client.UfsUtils tachyon://127.0.0.1:19998/a file:///b c
Example: java -cp target/tachyon-0.8.2-jar-with-dependencies.jar tachyon.client.UfsUtils tachyon://127.0.0.1:19998/a /b c
In the TFS, all files under local FS /b will be registered under /a, except for those with prefix c



Thanks
Peter

Gene Pang

18 Jan 2016, 10:05:49
to Tachyon Users
Hi Peter,

Thanks for your detailed questions!

1) How you described your process is generally how it is done in Tachyon. The metadata is loaded lazily, or on demand, so everything doesn't have to be loaded all at once. However, even if all of the metadata has not been loaded, applications and clients should still be able to access files that have not been loaded into Tachyon yet. If a client asks for a file that is not in Tachyon yet but is under a mount point, Tachyon will load the metadata on demand.
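
For example, something along these lines should work without running loadufs over the entire bucket first (paths are just placeholders):

./tachyon tfs mount /files s3n://mybucket/
./tachyon tfs ls /files/some/dir        (metadata for that directory is pulled in on demand)

That is the idea, at least -- the exact on-demand behavior depends on the version you are running.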

2) Hrmmm, that is how CACHE should work. Could you try debugging this scenario further? When the client reads the file, what is printed in the client log and the worker logs?

3) For deployment, it really depends on the use case and environment, but usually, it is good to co-locate the Tachyon workers with the computation framework cluster. So, in your case, it will probably be better to co-locate Tachyon workers with Spark nodes.

4) This could be a bug in how s3 names are used. Please create a JIRA issue here: https://tachyon.atlassian.net/projects/TACHYON/issues . Also, you are always welcome and encouraged to contribute fixes back to the code base!

Thanks,
Gene

Peter M

19 Jan 2016, 23:53:57
to Tachyon Users

Hi Gene, 


Thanks very much for the quick reply, super helpful! 

(I will raise a JIRA issue for point 4)


Regarding point 2; 


We have observed Drill requesting the location of the file from Tachyon. Drill then reads the file directly from S3. However, the file read (a single CSV file in this case) is never observed being loaded into memory on any of the workers.



We have tried various settings (on the tachyon cluster) 


  -Dtachyon.user.file.understoragetype.default=SYNC_PERSIST

  -Dtachyon.user.file.tachyonstoragetype.default=STORE

  ( from reading https://groups.google.com/forum/#!msg/tachyon-users/P54fHo3a9YQ/qKJqB5mEAAAJ )


 and also 


 -Dtachyon.user.file.readtype.default=CACHE_PROMOTE

 -Dtachyon.user.file.writetype.default=CACHE_THROUGH
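
(For reference, we are passing these as JVM options to the Tachyon processes; I assume the config-file equivalent would be appending them to TACHYON_JAVA_OPTS in conf/tachyon-env.sh, something like:

export TACHYON_JAVA_OPTS+=" -Dtachyon.user.file.readtype.default=CACHE_PROMOTE -Dtachyon.user.file.writetype.default=CACHE_THROUGH"

-- but please correct me if that is not the right place for user/client defaults.)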


As I understand it, if the file doesn't exist in TFS and the above are set, a replica of the file should be loaded into one of the worker nodes?


My assumption is that it is Tachyon (and not the client) that loads the file into memory when readType = CACHE*; is that correct? Or does this action get performed by the client?


Is there possibly some configuration we are missing when configuring the Tachyon client within Drill?
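
(The only client-side wiring I'm aware of is the usual Hadoop-style FileSystem config for Drill -- i.e. the Tachyon client jar on Drill's classpath plus something like the following in core-site.xml, assuming that is still the standard way to do it:

  fs.tachyon.impl = tachyon.hadoop.TFS

-- so let me know if there are other client properties we should be setting.)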


0: jdbc:drill:zk=local> show files;
+----------------------+--------------+---------+-----------+--------+--------+--------------+--------------------------+--------------------------+
|         name         | isDirectory  | isFile  |  length   | owner  | group  | permissions  |        accessTime        |     modificationTime     |
+----------------------+--------------+---------+-----------+--------+--------+--------------+--------------------------+--------------------------+
| somecsv.csv           | false        | true    | 96763174  |        |        | rw-rw-rw-    | 2016-01-19 02:34:26.651  | 2016-01-19 02:34:26.651  |
+----------------------+--------------+---------+-----------+--------+--------+--------------+--------------------------+--------------------------+
9 rows selected (0.243 seconds)


--

0: jdbc:drill:zk=local> select * from `somecsv.csv`;
<..snip>
738,267 rows selected (221.565 seconds)


I've been snooping API calls between the nodes and have not seen any API calls made other than:

  - getFileInfo
  - getFileId
  - getBlockInfo


Nothing of note in any of the log files (debug=true).


Your help is greatly appreciated! 


Thanks 

Peter

Gene Pang

20 Jan 2016, 20:30:19
to Tachyon Users
Hi Peter,

This seems strange. Are you sure Tachyon is reading the entire file? Tachyon caches blocks when they are read in full.

What do the client/application logs say? There might be some information in those logs.

Thanks,
Gene

Peter M

22 Jan 2016, 01:28:09
to Tachyon Users
Hi Gene,

Tachyon does seem to be reading the whole file... we have tried files ranging in size from 1KB to 100MB and they can be successfully read.

We haven't been able to see anything in the logs but have managed to look at the client in jdb.
The TachyonStorageType is set to STORE and the client attempts to cache the file; however, we think the failure occurs in the following bit of code in the client -- actually the exception gets thrown in getOutStream() -- it's not clear why:

295e9569-227a-8f8a-3500-18f69506882a:frag:0:0[1] list
224        long currentBlockId = getCurrentBlockId();
225        if (mCurrentBlockInStream == null || mCurrentBlockInStream.remaining() == 0) {
226          closeCacheStream();
227          updateBlockInStream(currentBlockId);
228 =>       if (mShouldCacheCurrentBlock) {
229            try {
230              // TODO(calvin): Specify the location to be local.
231              mCurrentCacheStream =
232                  mContext.getTachyonBlockStore().getOutStream(currentBlockId, -1,
233                         NetworkAddressUtils.getLocalHostName(ClientContext.getConf()));
...snip...

Step completed: "thread=295e9569-227a-8f8a-3500-18f69506882a:frag:0:0", tachyon.client.file.FileInStream.checkAndAdvanceBlockInStream(), line=235 bci=68
235              LOG.warn("Failed to get TachyonStore stream, the block " + currentBlockId

295e9569-227a-8f8a-3500-18f69506882a:frag:0:0[1] list
231              mCurrentCacheStream =
232                  mContext.getTachyonBlockStore().getOutStream(currentBlockId, -1,
233                         NetworkAddressUtils.getLocalHostName(ClientContext.getConf()));
234            } catch (IOException ioe) {
235 =>           LOG.warn("Failed to get TachyonStore stream, the block " + currentBlockId
236                  + " will not be in TachyonStorage. Exception:" + ioe.getMessage());
237              mShouldCacheCurrentBlock = false;
238            }
239          }
240        }
.. snip...

295e9569-227a-8f8a-3500-18f69506882a:frag:0:0[1] eval ioe
 ioe = "java.io.IOException: TachyonTException(type:BLOCK_ALREADY_EXISTS, message:Temp blockId 693,201,010,688 is not available, because it already exists)"

This exception is thrown when we have drill and tachyon deployed on the same node.


I think that the exception when we did not have them on the same node was something like not being able to find a local worker.


Your thoughts would be greatly appreciated.


Thanks
Peter

Gene Pang

24 Jan 2016, 14:41:09
to Tachyon Users
Hi Peter,

I have a few questions about your situation. In the Tachyon UI, where does Tachyon think the file is? Is it in memory?

Are your clients co-located with the Tachyon workers?

Thanks,
Gene

Peter M

24 Jan 2016, 18:21:22
to Tachyon Users
Hi Gene 

Here is the information that I have been able to gather:

1. I am starting the tachyon cluster using 
tachyon-start.sh all SudoMount

2. There is a worker node co-located with the client -- when I start the cluster it lists space-used as 0 Kb in the co-located worker node.

3. Once I perform a read of a file from tachyon that is in the underFS, I see two things happen.

a. The first time I do a read I get the following error:

0: jdbc:drill:zk=local> select * from `ctest`;


Error: SYSTEM ERROR: ThriftIOException: Failed to delete /mnt/ramdisk/tachyonworker/3252252999903914303/693201010688


b. If I do the same read again, it works, i.e. it can read the files and do the select above.

4. After the read has succeeded, the file is not listed as In Memory either via the UI or via tfs ls; however, the co-located worker reports that it has used 1024KB.
It doesn't show any files in memory and doesn't list any blocks on the worker web UI (a copy-paste from Lynx of the screens is below):

Storage Usage Summary
   Total Capacity / Used   24.00GB / 1024.00KB
   MEM Capacity / Used     4096.00MB / 1024.00KB
   SSD Capacity / Used     20.00GB / 0.00B

Tiered Storage Details
   Alias  Path                         Capacity   Space Used  Space Usage
   MEM    /mnt/ramdisk/tachyonworker   4096.00MB  1024.00KB   100% Free
   SSD    /var/tmp/tachyonworker       20.00GB    0.00B       100% Free


BlockInfo on the worker UI page is empty, as follows:

   File Path   In-MEM   In-SSD   In-HDD   Size   Creation Time   Modification Time




The block exists as a file in the ramdisk but it is zero size:

$ pwd
/mnt/ramdisk/tachyonworker/3252252999903914303
$ ls -l
total 0
-rw-r--r-- 1 root root 0 Jan 24 22:25 693201010688



This file is small -- only 8.4KB.

If I repeat this process with a bigger file (~90MB) I see the same behaviour, i.e. the delete-block exception on first access, another 1MB of MEM shows as used in the worker, and the second read works -- but no files/blocks show up and there is a zero-byte file with that block ID in the ramdisk.

The 1MB is consumed after the "Failed to delete" exception -- nothing additional seems to happen after the successful read.

Let me know if you need any further information.


Thanks
Peter

Peter M

24 Jan 2016, 23:37:44
to Tachyon Users
Hi Gene 

OK, so we have tried running the Tachyon worker as root, and that seems to resolve the "Failed to delete" exception; it now seems to work as expected for some files, i.e. it will cache the files as they are read.

There do seem to be some conditions under which it will cache and others where it won't:
-- the 8.45KB test file never seems to be cached -- is this expected?
-- our 90MB test file does get cached when it is read serially in a single operation
-- some other large parquet files don't seem to be cached -- my guess is this may be due to the client not reading the whole file? In that case it doesn't look like a complete block is read and therefore doesn't get cached -- but this is just a guess.

We are digging into why this is the case, as our understanding was that it doesn't need to be run as root; it may be something to do with our local config.
But the good news is it kinda sometimes works now -- which is a step forward.

Thanks
Peter

Gene Pang

27 Jan 2016, 00:41:33
to Tachyon Users
Hi Peter,

Tachyon will cache blocks which are fully read. If only some of the block is read, it will not cache the block. Therefore, it sounds like the files are not being completely read by the client? Do you know if Drill is doing something special and not reading the entire file?
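
(To put rough numbers on it: assuming the default 512MB block size, each of your test files fits in a single block, so that block -- effectively the whole file -- would have to be read end to end before it gets cached.)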

Thanks,
Gene

Peter M

31 Jan 2016, 19:11:03
to Tachyon Users
Hi Gene,

Here are the logs from Drill for a parquet file access (which does NOT get cached in memory) and a CSV file access (which DOES get cached in memory):

[29516b49-c6ed-66af-bbad-c8b51bebd5d3:foreman] INFO org.apache.drill.exec.store.parquet.Metadata - Took 932 ms to read file metadata
[29516b49-c6ed-66af-bbad-c8b51bebd5d3:frag:0:0] INFO  - getFileStatus(/ctest/0_0_0.parquet): HDFS Path: s3n://<mybucketname>/ctest/0_0_0.parquet TPath: tachyon://10.1.101.13:19998/ctest/0_0_0.parquet
[29516b49-c6ed-66af-bbad-c8b51bebd5d3:frag:0:0] INFO  - open(tachyon://10.1.101.13:19998/ctest/0_0_0.parquet, 4096)
[29516b49-c6ed-66af-bbad-c8b51bebd5d3:frag:0:0] INFO  - Folder /mnt/ramdisk/tachyonworker/6103841725690598561 was created!
[29516b49-c6ed-66af-bbad-c8b51bebd5d3:frag:0:0] INFO  - LocalBlockOutStream created new file block, block path: /mnt/ramdisk/tachyonworker/6103841725690598561/693201010688
[29516b49-c6ed-66af-bbad-c8b51bebd5d3:frag:0:0] INFO  - open(/ctest/0_0_0.parquet, 4096)
[29516b49-c6ed-66af-bbad-c8b51bebd5d3:frag:0:0] INFO  - Folder /mnt/ramdisk/tachyonworker/257469779071470002 was created!
[29516b49-c6ed-66af-bbad-c8b51bebd5d3:frag:0:0] INFO  - LocalBlockOutStream created new file block, block path: /mnt/ramdisk/tachyonworker/257469779071470002/693201010688
[29516b49-c6ed-66af-bbad-c8b51bebd5d3:frag:0:0] INFO  - open(/ctest/0_0_0.parquet, 4096)
[29516b49-c6ed-66af-bbad-c8b51bebd5d3:frag:0:0] INFO  - Connecting local worker @ ip-10-1-101-10.<aws-region>.compute.internal/10.1.101.10:29998
[29516b49-c6ed-66af-bbad-c8b51bebd5d3:frag:0:0] WARN  - Failed to get TachyonStore stream, the block 693201010688 will not be in TachyonStorage. Exception:TachyonTException(type:BLOCK_ALREADY_EXISTS, message:Temp blockId 693,201,010,688 is not available, because it already exists)
[29516b49-c6ed-66af-bbad-c8b51bebd5d3:frag:0:0] INFO  - open(/ctest/0_0_0.parquet, 4096)
[29516b49-c6ed-66af-bbad-c8b51bebd5d3:frag:0:0] WARN  - Failed to get TachyonStore stream, the block 693201010688 will not be in TachyonStorage. Exception:TachyonTException(type:BLOCK_ALREADY_EXISTS, message:Temp blockId 693,201,010,688 is not available, because it already exists)
[29516b49-c6ed-66af-bbad-c8b51bebd5d3:frag:0:0] INFO  - open(/ctest/0_0_0.parquet, 4096)
[29516b49-c6ed-66af-bbad-c8b51bebd5d3:frag:0:0] WARN  - Failed to get TachyonStore stream, the block 693201010688 will not be in TachyonStorage. Exception:TachyonTException(type:BLOCK_ALREADY_EXISTS, message:Temp blockId 693,201,010,688 is not available, because it already exists)
[29516b49-c6ed-66af-bbad-c8b51bebd5d3:frag:0:0] INFO  - open(/ctest/0_0_0.parquet, 4096)
[29516b49-c6ed-66af-bbad-c8b51bebd5d3:frag:0:0] WARN  - Failed to get TachyonStore stream, the block 693201010688 will not be in TachyonStorage. Exception:TachyonTException(type:BLOCK_ALREADY_EXISTS, message:Temp blockId 693,201,010,688 is not available, because it already exists)
[29516b49-c6ed-66af-bbad-c8b51bebd5d3:frag:0:0] INFO  - open(/ctest/0_0_0.parquet, 4096)
[29516b49-c6ed-66af-bbad-c8b51bebd5d3:frag:0:0] WARN  - Failed to get TachyonStore stream, the block 693201010688 will not be in TachyonStorage. Exception:TachyonTException(type:BLOCK_ALREADY_EXISTS, message:Temp blockId 693,201,010,688 is not available, because it already exists)
[29516b49-c6ed-66af-bbad-c8b51bebd5d3:frag:0:0] INFO  - open(/ctest/0_0_0.parquet, 4096)
[29516b49-c6ed-66af-bbad-c8b51bebd5d3:frag:0:0] WARN  - Failed to get TachyonStore stream, the block 693201010688 will not be in TachyonStorage. Exception:TachyonTException(type:BLOCK_ALREADY_EXISTS, message:Temp blockId 693,201,010,688 is not available, because it already exists)
[29516b49-c6ed-66af-bbad-c8b51bebd5d3:frag:0:0] INFO  - open(/ctest/0_0_0.parquet, 4096)
[29516b49-c6ed-66af-bbad-c8b51bebd5d3:frag:0:0] WARN  - Failed to get TachyonStore stream, the block 693201010688 will not be in TachyonStorage. Exception:TachyonTException(type:BLOCK_ALREADY_EXISTS, message:Temp blockId 693,201,010,688 is not available, because it already exists)
[29516b49-c6ed-66af-bbad-c8b51bebd5d3:frag:0:0] INFO  - open(/ctest/0_0_0.parquet, 4096)


It emits about 16 of the BLOCK_ALREADY_EXISTS exceptions -- and the file doesn't get cached.

File size is 8.5Kb


When we run a similar command to read a different file we don't see any exceptions (and the file gets cached as expected):


[29516878-a93d-9f6d-47f7-9d7e0549c305:foreman] INFO  - listStatus(tachyon://10.1.101.13:19998/): HDFS Path: s3n://<mybucketname>/
[29516878-a93d-9f6d-47f7-9d7e0549c305:foreman] INFO  - getFileStatus(tachyon://10.1.101.13:19998/MiscTX_craig.csv): HDFS Path: s3n://<mybucketname>/MiscTX_craig.csv TPath: tachyon://10.1.101.13:19998/MiscTX_craig.csv
[29516878-a93d-9f6d-47f7-9d7e0549c305:foreman] INFO org.apache.drill.exec.store.schedule.BlockMapBuilder - Get block maps: Executed 1 out of 1 using 1 threads. Time: 19ms total, 19.989193ms avg, 19ms max.
[29516878-a93d-9f6d-47f7-9d7e0549c305:foreman] INFO org.apache.drill.exec.store.schedule.BlockMapBuilder - Get block maps: Executed 1 out of 1 using 1 threads. Earliest start: 1.190000 μs, Latest start: 1.190000 μs, Average start: 1.190000 μs .
[29516878-a93d-9f6d-47f7-9d7e0549c305:frag:0:0] INFO  - getWorkingDirectory: /
[29516878-a93d-9f6d-47f7-9d7e0549c305:frag:0:0] INFO  - open(tachyon://10.1.101.13:19998/MiscTX_craig.csv, 4096)
[29516878-a93d-9f6d-47f7-9d7e0549c305:frag:0:0] INFO  - Folder /mnt/ramdisk/tachyonworker/4822792725232426273 was created!
[29516878-a93d-9f6d-47f7-9d7e0549c305:frag:0:0] INFO  - LocalBlockOutStream created new file block, block path: /mnt/ramdisk/tachyonworker/4822792725232426273/693217787904
[29516878-a93d-9f6d-47f7-9d7e0549c305:frag:0:0] INFO  - open(tachyon://10.1.101.13:19998/MiscTX_craig.csv, 4096)
[29516878-a93d-9f6d-47f7-9d7e0549c305:frag:0:0] INFO  - Folder /mnt/ramdisk/tachyonworker/8468772189556308610 was created!
[29516878-a93d-9f6d-47f7-9d7e0549c305:frag:0:0] INFO  - LocalBlockOutStream created new file block, block path: /mnt/ramdisk/tachyonworker/8468772189556308610/693217787904
[29516878-a93d-9f6d-47f7-9d7e0549c305:frag:0:0] INFO org.apache.drill.exec.work.fragment.FragmentExecutor - 29516878-a93d-9f6d-47f7-9d7e0549c305:0:0: State change requested AWAITING_ALLOCATION --> RUNNING


No exceptions seem to be generated on this file.

File size of this file is 92.3Mb.


I'm not sure if this is caused by Drill not reading the whole file -- I am doing a 'select * from ctest;' (the first file), so I expect that it should read it all.


The tachyon worker is running as root.


Thanks

Peter

Calvin Jia

1 Feb 2016, 01:42:49
to Tachyon Users
Hi Peter,

Is the parquet file opened from multiple streams or just one? From the logging output it seems like it is being accessed multiple times separately.

Thanks,
Calvin

Peter M

1 Feb 2016, 20:42:05
to Tachyon Users

Here is some summarised general info about how Drill deals with parquet files (and it might be applicable to anything that reads parquet files):
* it definitely doesn't read the files in order
* parquet files can have multiple footers, and only the last one is read
* if columns aren't read/required, data will be skipped
* the first four bytes of the file are magic bytes; not sure these will be read in all cases

So it seems that the way parquet files are typically accessed (from Drill, or potentially any client) means they are unlikely to be cached given the current restriction that Tachyon only caches fully read blocks (or files, in the case of S3).

I have spent enough time on this now, so we will move to manually loading the parquet files if we need to.
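
(For anyone else hitting this, the manual fallback is just the shell load on each file, e.g. something like:

./tachyon tfs load /ctest/0_0_0.parquet

scripted over whatever files need to be in memory.)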

Maybe there is a new feature / cache mode here where the worker, rather than the client, would load the full file from the underFS/S3 on any access of the file, to get around this issue.

cheers
Peter

Gene Pang

1 Feb 2016, 23:12:29
to Tachyon Users
Hi Peter,

Thanks for this detailed explanation on how parquet files are read. It seems like the nature of parquet files and the full block caching restriction prevent s3 parquet files from being cached in Tachyon. Therefore, a work-around is to manually load the file into Tachyon.

I do agree that this scenario points to some feature requests for Tachyon.

Thanks for the investigation!

-Gene

Gene Pang

24 Jun 2016, 10:51:58
to Alluxio Users
Hi Peter,

I just wanted to let you know that Alluxio 1.1.0 was recently released. In this release, you no longer have to manually cache data in Alluxio (formerly Tachyon) for parquet files. You can read about some of the improvements here: http://www.alluxio.com/2016/06/whats-new-in-alluxio-1-1-release/

Hope that helps,
Gene

na...@cloudability.com

24 Jun 2016, 12:26:41
to Alluxio Users
Gene,

That's great. I've been trying to get a configuration with Parquet files in an S3 underFS working. I've run into the same issue reported here using the 1.1.0 release. Is there a setting/configuration that needs to be tweaked to get Parquet files to cache properly?

Thanks,
Nate

Pei Sun

25 Jun 2016, 14:07:26
to na...@cloudability.com, Alluxio Users
Hi Nate,
     Are you able to read non-parquet files with S3 as the UFS? Do you have the Alluxio log and the Spark client log? And can you send the Alluxio configuration values you have set?

Thank you
Pei




--
Pei Sun

na...@cloudability.com

27 Jun 2016, 14:38:48
to Alluxio Users, na...@cloudability.com
I can read non-parquet files without issue. Also, if I use the load command to ensure the file is loaded from the underFS into memory, then I don't have issues.
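
(In 1.1.0 terms that is just the shell load, e.g. something like:

./bin/alluxio fs load /path/to/file.parquet

run before querying.)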

Unfortunately the cluster I was using for testing has been decommissioned and I didn't grab the logs from it. The exception I was seeing was a BlockAlreadyExistsException. 

I'll try and reproduce the issue again and upload the relevant logs. 

Pei Sun

27 Jun 2016, 15:32:06
to na...@cloudability.com, Alluxio Users
Hi  Nate,
   Let us know when you have the log and config. I have tried it myself and it worked. I used 1.1.0.

Pei

Pei Sun

8 Jul 2016, 17:29:45
to na...@cloudability.com, Alluxio Users
Hi Nate,
    Just checking whether you got a chance to try Alluxio on Parquet files again. Let us know if you have the logs.  We can help to address your problem.


Thanks
Pei
--
Pei Sun