Testing the Tiered Storage In Tachyon


Aakanksha

Sep 9, 2015, 10:31:02 PM
to Tachyon Users
Hi,

I want to see how tiered storage in Tachyon works. To check whether the SSD and HDD levels were being accessed, I reduced the level0 (MEM) directory quota to a minimum and ran the simple BasicCheckpoint test. I always get an OutOfSpaceException like this:

2015-09-09 17:03:50,199 ERROR  (FileOutStream.java:write) - OutOfSpaceException(message:Failed to allocate 8388608 for user 14)
java.io.IOException: OutOfSpaceException(message:Failed to allocate 8388608 for user 14)
at tachyon.worker.WorkerClient.requestBlockLocation(WorkerClient.java:401)
at tachyon.client.TachyonFS.getLocalBlockTemporaryPath(TachyonFS.java:644)
at tachyon.client.LocalBlockOutStream.<init>(LocalBlockOutStream.java:103)
at tachyon.client.BlockOutStream.get(BlockOutStream.java:67)
at tachyon.client.BlockOutStream.get(BlockOutStream.java:45)
at tachyon.client.FileOutStream.getNextBlock(FileOutStream.java:160)
at tachyon.client.FileOutStream.write(FileOutStream.java:185)
at tachyon.client.FileOutStream.write(FileOutStream.java:166)
at tachyon.examples.BasicCheckpoint.writeFile(BasicCheckpoint.java:112)
at tachyon.examples.BasicCheckpoint.call(BasicCheckpoint.java:60)
at tachyon.examples.BasicCheckpoint.call(BasicCheckpoint.java:43)
at tachyon.examples.Utils.runExample(Utils.java:102)
at tachyon.examples.BasicCheckpoint.main(BasicCheckpoint.java:125)
Caused by: OutOfSpaceException(message:Failed to allocate 8388608 for user 14)
at tachyon.thrift.WorkerService$requestBlockLocation_result$requestBlockLocation_resultStandardScheme.read(WorkerService.java:9342)
at tachyon.thrift.WorkerService$requestBlockLocation_result$requestBlockLocation_resultStandardScheme.read(WorkerService.java:9320)
at tachyon.thrift.WorkerService$requestBlockLocation_result.read(WorkerService.java:9254)
at org.apache.thrift.TServiceClient.receiveBase(TServiceClient.java:78)
at tachyon.thrift.WorkerService$Client.recv_requestBlockLocation(WorkerService.java:402)
at tachyon.thrift.WorkerService$Client.requestBlockLocation(WorkerService.java:387)
at tachyon.worker.WorkerClient.requestBlockLocation(WorkerClient.java:399)
... 12 more
2015-09-09 17:03:50,200 ERROR  (Utils.java:runExample) - Exception running test: tachyon.examples.BasicCheckpoint@3a0a8d45
java.io.IOException: Fail to cache: ASYNC_THROUGH, message: OutOfSpaceException(message:Failed to allocate 8388608 for user 14)
at tachyon.client.FileOutStream.write(FileOutStream.java:202)
at tachyon.client.FileOutStream.write(FileOutStream.java:166)
at tachyon.examples.BasicCheckpoint.writeFile(BasicCheckpoint.java:112)
at tachyon.examples.BasicCheckpoint.call(BasicCheckpoint.java:60)
at tachyon.examples.BasicCheckpoint.call(BasicCheckpoint.java:43)
at tachyon.examples.Utils.runExample(Utils.java:102)
at tachyon.examples.BasicCheckpoint.main(BasicCheckpoint.java:125)
Caused by: java.io.IOException: OutOfSpaceException(message:Failed to allocate 8388608 for user 14)
at tachyon.worker.WorkerClient.requestBlockLocation(WorkerClient.java:401)
at tachyon.client.TachyonFS.getLocalBlockTemporaryPath(TachyonFS.java:644)
at tachyon.client.LocalBlockOutStream.<init>(LocalBlockOutStream.java:103)
at tachyon.client.BlockOutStream.get(BlockOutStream.java:67)
at tachyon.client.BlockOutStream.get(BlockOutStream.java:45)
at tachyon.client.FileOutStream.getNextBlock(FileOutStream.java:160)
at tachyon.client.FileOutStream.write(FileOutStream.java:185)
... 6 more
Caused by: OutOfSpaceException(message:Failed to allocate 8388608 for user 14)
at tachyon.thrift.WorkerService$requestBlockLocation_result$requestBlockLocation_resultStandardScheme.read(WorkerService.java:9342)
at tachyon.thrift.WorkerService$requestBlockLocation_result$requestBlockLocation_resultStandardScheme.read(WorkerService.java:9320)
at tachyon.thrift.WorkerService$requestBlockLocation_result.read(WorkerService.java:9254)
at org.apache.thrift.TServiceClient.receiveBase(TServiceClient.java:78)
at tachyon.thrift.WorkerService$Client.recv_requestBlockLocation(WorkerService.java:402)
at tachyon.thrift.WorkerService$Client.requestBlockLocation(WorkerService.java:387)
at tachyon.worker.WorkerClient.requestBlockLocation(WorkerClient.java:399)
... 12 more
Failed the test!
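For reference, my tiered-storage setup follows the pattern from the 0.7.x tiered-storage docs. The property names below are taken from that documentation (verify them against your version), and the paths are placeholders rather than my exact values:

```properties
# Tiered storage configuration (tachyon-site.properties style).
# Property names per the Tachyon 0.7.x tiered-storage docs; paths are
# placeholders. Quotas match the tier sizes described in this thread.
tachyon.worker.tieredstore.level.max=3
tachyon.worker.tieredstore.level0.alias=MEM
tachyon.worker.tieredstore.level0.dirs.path=/mnt/ramdisk
tachyon.worker.tieredstore.level0.dirs.quota=1GB
tachyon.worker.tieredstore.level1.alias=SSD
tachyon.worker.tieredstore.level1.dirs.path=/mnt/ssd
tachyon.worker.tieredstore.level1.dirs.quota=5GB
tachyon.worker.tieredstore.level2.alias=HDD
tachyon.worker.tieredstore.level2.dirs.path=/mnt/hdd
tachyon.worker.tieredstore.level2.dirs.quota=10GB
```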

I also tried making it work with MapReduce, but I get this error:

15/09/01 20:10:35 WARN mapred.LocalJobRunner: job_local1659117566_0001
java.lang.Exception: java.util.ServiceConfigurationError: tachyon.underfs.UnderFileSystemFactory: Provider tachyon.underfs.hdfs.HdfsUnderFileSystemFactory not a subtype
at org.apache.hadoop.mapred.LocalJobRunner$Job.runTasks(LocalJobRunner.java:462)
at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:522)
Caused by: java.util.ServiceConfigurationError: tachyon.underfs.UnderFileSystemFactory: Provider tachyon.underfs.hdfs.HdfsUnderFileSystemFactory not a subtype
at java.util.ServiceLoader.fail(ServiceLoader.java:231)
at java.util.ServiceLoader.access$300(ServiceLoader.java:181)
at java.util.ServiceLoader$LazyIterator.next(ServiceLoader.java:369)
at java.util.ServiceLoader$1.next(ServiceLoader.java:445)
at tachyon.underfs.UnderFileSystemRegistry.init(UnderFileSystemRegistry.java:190)
at tachyon.underfs.UnderFileSystemRegistry.<clinit>(UnderFileSystemRegistry.java:83)
at tachyon.underfs.UnderFileSystem.get(UnderFileSystem.java:99)
at tachyon.client.TachyonFS.createAndGetUserUfsTempFolder(TachyonFS.java:300)
at tachyon.client.FileOutStream.<init>(FileOutStream.java:70)
at tachyon.client.TachyonFile.getOutStream(TachyonFile.java:241)
at tachyon.hadoop.AbstractTFS.create(AbstractTFS.java:138)
at tachyon.hadoop.TFS.create(TFS.java:26)
at org.apache.hadoop.fs.FileSystem.create(FileSystem.java:909)
at org.apache.hadoop.fs.FileSystem.create(FileSystem.java:890)
at org.apache.hadoop.fs.FileSystem.create(FileSystem.java:787)
at org.apache.hadoop.fs.FileSystem.create(FileSystem.java:776)
at org.apache.hadoop.examples.terasort.TeraOutputFormat.getRecordWriter(TeraOutputFormat.java:124)
at org.apache.hadoop.mapred.MapTask$NewDirectOutputCollector.<init>(MapTask.java:647)
at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:767)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:341)
at org.apache.hadoop.mapred.LocalJobRunner$Job$MapTaskRunnable.run(LocalJobRunner.java:243)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
at java.util.concurrent.FutureTask.run(FutureTask.java:262)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:745)
15/09/01 20:10:36 INFO mapreduce.Job: Job job_local1659117566_0001 running in uber mode : false

I can see in the web interface that the different tiers are configured correctly, but I am not able to exercise the tier 1 and tier 2 storage. Could you tell me what is wrong with my setup, and suggest a good way to get started with tiered storage?

Thanks a lot for your help!

Aakanksha

Gene Pang

Sep 11, 2015, 12:23:43 PM
to Tachyon Users
Hi Aakanksha,

How big is the file you are writing, and how big is your MEM size?

Thanks,
Gene

Aakanksha

Sep 11, 2015, 8:39:10 PM
to Tachyon Users
Hi Gene,

Thanks for your reply. The file size is 10GB. My tier sizes are as follows:

Level0: 1GB
Level1: 5GB
Level2: 10GB

I also get the same OutOfSpaceException when I try to load a file from HDFS into Tachyon using Spark:

$ ./spark-shell
scala> val s = sc.textFile("tachyon://stanbyHost:19998/X")
scala> s.count()
scala> s.saveAsTextFile("tachyon://activeHost:19998/Y")

Thanks,
Aakanksha

Gene Pang

Sep 14, 2015, 11:01:24 AM
to Tachyon Users
Hi Aakanksha,

I think what is happening is that the single file is larger than the top tier, so the write cannot complete. This ticket might be related:


If this ticket is relevant, maybe you could add more information about your environment and setup?
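To illustrate the hypothesis with a toy model: if every block of an in-progress file must stay in the tier it was written to until the file is committed, then a single stream larger than the top tier can never finish there, regardless of how much space the lower tiers have. This is just the arithmetic, not Tachyon's actual allocator:

```python
# Toy model of the hypothesis above: in-flight blocks of a single stream
# cannot be evicted to a lower tier, so the whole file must fit in the
# tier it is being written to. NOT Tachyon's real allocation logic.

MB = 1024 * 1024
BLOCK = 8 * MB          # 8388608 bytes, the size in the exception
TIER0 = 1024 * MB       # the 1 GB MEM tier from this thread

def write_stream(file_bytes, tier_capacity, block_size=BLOCK):
    """Return True if every block of the stream fits in the tier,
    assuming in-flight blocks cannot be evicted elsewhere."""
    used = 0
    remaining = file_bytes
    while remaining > 0:
        if used + block_size > tier_capacity:
            return False          # the OutOfSpaceException case
        used += block_size
        remaining -= block_size
    return True

print(write_stream(900 * MB, TIER0))        # True: a file < 1 GB succeeds
print(write_stream(10 * 1024 * MB, TIER0))  # False: the 10 GB file fails
```

This matches the observed behavior: files under the 1 GB MEM quota work, while the 10 GB file fails on a block allocation even though the SSD and HDD tiers have room.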

Thanks,
Gene

Aakanksha

Sep 16, 2015, 8:01:00 PM
to Tachyon Users
Thanks, Gene. I think it is relevant. I checked with smaller files (< 1GB) and everything works fine; with a file larger than 1GB it always fails with the exception I mentioned earlier. I am running Tachyon 0.7.1 on a single-node cluster, with Hadoop 2.7.1 and Spark 1.5.0 on the same machine. Please let me know if you need further information about my setup.

Thanks,
Aakanksha