exception on addInput to ComputeTSet

Ahmet Uyar

Aug 4, 2020, 11:47:56 AM
to Twister2
Hi guys,

I tested membership finding with the newly fixed Twister2. 
I am still getting "No more frames available in this partition" from BufferedCollectionPartition when reading the input data added to the TSet. 

I attached the logs. 

Ahmet
auyar-membership-finding-n3atp4n.log

Ahmet Uyar

Aug 4, 2020, 11:54:31 AM
to Twister2
Hi guys,

Actually, three of the four workers threw NegativeArraySizeException from BufferedCollectionPartition. 
It seems there is an array size mismatch in that class after partitioning and persisting. 

Ahmet

Chathura Widanage

Aug 4, 2020, 12:43:27 PM
to Ahmet Uyar, Twister2
Hi Ahmet,

I have added some logs. Could you please rerun with the class below replaced?


Regards,
Chathura


Ahmet Uyar

Aug 4, 2020, 1:59:13 PM
to Chathura Widanage, Twister2
Hi Chathura,

I ran it with your class, but I printed a log message only once every 50,000 buffers; otherwise there were too many log messages. 
I modified the flush method as below and added a similar counter to the next method.

Ahmet

  public void flush() {
    Path filePath = new Path(this.rootPath, (this.fileCounter++) + EXTENSION);
    try (DataOutputStream outputStream = new DataOutputStream(this.fileSystem.create(filePath))) {
      LOG.info("Writing to file : " + filePath + ", no of buffers : " + this.buffers.size());
      outputStream.writeLong(this.buffers.size());
      Iterator<byte[]> bufferIt = this.buffers.iterator();
      long count = 0;
      while (bufferIt.hasNext()) {
        byte[] next = bufferIt.next();
        outputStream.writeInt(next.length);
        if (count++ % 50000 == 0) {
          LOG.info(count + ": Writing to file : " + filePath + ", buffer size : " + next.length);
        }
        outputStream.write(next);
      }
    } catch (IOException e) {
      throw new Twister2RuntimeException("Couldn't flush partitions to the disk", e);
    }
    this.filesList.add(filePath);
    this.buffers.clear();
    this.bufferedBytes = 0;
  }


auyar-membership-finding-n3fm0d2.log
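
For reference, the layout that flush() writes above is a long frame count followed, for each buffer, by an int length and the raw bytes. Below is a minimal sketch of that layout with a matching reader. It assumes in-memory streams rather than the Twister2 file system, and the class and method names are hypothetical, not Twister2 code.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

class FrameFormatSketch {

  // Writes the same layout as flush(): a long frame count, then for each
  // frame an int length followed by the raw bytes.
  static byte[] writeFrames(List<byte[]> frames) throws IOException {
    ByteArrayOutputStream bytes = new ByteArrayOutputStream();
    try (DataOutputStream out = new DataOutputStream(bytes)) {
      out.writeLong(frames.size());
      for (byte[] frame : frames) {
        out.writeInt(frame.length);
        out.write(frame);
      }
    }
    return bytes.toByteArray();
  }

  // Reads the layout back. readFully does not return until the whole frame
  // has been read, so the stream stays aligned with the next length prefix.
  static List<byte[]> readFrames(byte[] file) throws IOException {
    List<byte[]> frames = new ArrayList<>();
    try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(file))) {
      long noOfFrames = in.readLong();
      for (long i = 0; i < noOfFrames; i++) {
        byte[] data = new byte[in.readInt()];
        in.readFully(data);
        frames.add(data);
      }
    }
    return frames;
  }

  public static void main(String[] args) throws IOException {
    List<byte[]> frames = new ArrayList<>();
    frames.add("tweet-1".getBytes(StandardCharsets.UTF_8));
    frames.add("tweet-22".getBytes(StandardCharsets.UTF_8));
    List<byte[]> back = readFrames(writeFrames(frames));
    System.out.println(back.size() + " frames read back");
  }
}
```

Because every frame is located only by the length prefix before it, a reader that consumes even one byte too few or too many will parse the following prefix from the wrong position.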

Chathura Widanage

Aug 5, 2020, 11:40:17 AM
to Ahmet Uyar, Twister2
Ahmet,

Could you please send me some sample input files? 

Regards,
Chathura

Ahmet Uyar

Aug 5, 2020, 2:07:58 PM
to Chathura Widanage, Twister2
Hi Chathura,

Could you get them from the login node of victor? 
There are 4 tweet and 4 delete files under: 
/scratch_hdd/auyar/sample-files/tweet/input-0
/scratch_hdd/auyar/sample-files/tweet/input-1
/scratch_hdd/auyar/sample-files/tweet/input-2
/scratch_hdd/auyar/sample-files/tweet/input-3

/scratch_hdd/auyar/sample-files/delete/input-0
/scratch_hdd/auyar/sample-files/delete/input-1
/scratch_hdd/auyar/sample-files/delete/input-2
/scratch_hdd/auyar/sample-files/delete/input-3

Ahmet

Chathura Widanage

Aug 5, 2020, 2:09:24 PM
to Ahmet Uyar, Twister2
Sure, I will do that. Thanks.

Regards,
Chathura

Chathura Widanage

Aug 5, 2020, 5:15:55 PM
to Ahmet Uyar, Twister2
Hi Ahmet,

I figured out the reason for this issue. It was an edge case where the data size becomes an exact multiple of "maxBufferedBytes". Please try the fix below.


Regards,
Chathura

Ahmet Uyar

Aug 6, 2020, 5:17:15 AM
to Chathura Widanage, Twister2
Hi Chathura,

I tried the new fix. Unfortunately, the result is the same: 3 out of 4 workers are still getting NegativeArraySizeException. The fourth one throws an exception with the reason "No more frames available in this partition". 

The logs are attached. 

thanks,

Ahmet

auyar-membership-finding-n5rru30.log

Chathura Widanage

Aug 7, 2020, 2:17:36 PM
to Ahmet Uyar, Twister2
Hi Ahmet,

I am getting a different exception when running on HDFS.


But it seems they are related. It seems that when reading from HDFS, Twister2 is not reading the exact content of the file. When I manually checked the file with a simple Java program, it could read the data without any problem.


Could you run the program once without HDFS to confirm that this occurs only on HDFS?

Regards,
Chathura

Ahmet Uyar

Aug 8, 2020, 1:27:29 PM
to Chathura Widanage, Twister2
Hi Chathura,

I ran with local storage for persistence and it works. So the problem seems to be with HDFS. 

thanks,

Ahmet

Ahmet Uyar

Aug 13, 2020, 8:41:06 AM
to Chathura Widanage, Twister2
Hi Chathura,

I installed hdfs-3.2.1 on victor and tested again. Twister2 uses the hadoop-3.2.1 libraries. 
I tested with 4 workers each processing 10M tweetID-date pairs. 
3 of the 4 workers are throwing java.lang.NegativeArraySizeException from BufferedCollectionPartition.java.

I attached the logs. 

Ahmet


auyar-membership-finding-nfz5imi.log

Chathura Widanage

Aug 13, 2020, 10:24:38 AM
to Ahmet Uyar, Supun Kamburugamuve, Twister2
Hi Ahmet,

It seems HadoopDataInputStream has some commented-out methods. That could cause this issue, since we are working with a large dataset that exceeds MIN_SKIP_BYTES.


Could you please retry after replacing the content of the seek() method as below?


I think twister2 is trying to do some kind of optimization by not directly calling org.apache.hadoop.fs.FSDataInputStream.seek(). @Supun Kamburugamuve do you know what that optimization is? I can't figure out what we are trying to do there.

Regards,
Chathura

Ahmet Uyar

Aug 13, 2020, 10:39:52 AM
to Chathura Widanage, Supun Kamburugamuve, Twister2
Hi Chathura,

Unfortunately, it is the same. I attached the logs. 
Also, this exception occurs when reading the checkpointed delete tweetIDs. There are not many of those: each worker processes only 1000 tweetIDs. 

In addition, I printed the index in the loop where the negative array size occurs. 
In all three workers, the exception occurs at index 117. I am not sure whether this is meaningful. 

[2020-08-13 10:32:30 -0400] [INFO] [worker-3] [Twister2MPIWorker-3] edu.iu.dsc.tws.dataset.partition.BufferedCollectionPartition: Negative Size: -1049794737, index: 117, noOfFrames: 1000, file: hdfs:/twister2/tsetdata/__partition1_3/0.pbck  
[2020-08-13 10:32:30 -0400] [INFO] [worker-2] [Twister2MPIWorker-2] edu.iu.dsc.tws.dataset.partition.BufferedCollectionPartition: Negative Size: -1574692106, index: 117, noOfFrames: 2000, file: hdfs:/twister2/tsetdata/__partition1_2/0.pbck  
[2020-08-13 10:32:31 -0400] [INFO] [worker-1] [Twister2MPIWorker-1] edu.iu.dsc.tws.dataset.partition.BufferedCollectionPartition: Negative Size: -524897368, index: 117, noOfFrames: 1000, file: hdfs:/twister2/tsetdata/__partition1_1/0.pbck  

Ahmet

auyar-membership-finding-ng3cw0r.log

Chathura Widanage

Aug 13, 2020, 10:51:04 AM
to Ahmet Uyar, Supun Kamburugamuve, Twister2
Could you please add a log statement inside the seek method that prints the seekPosition, just to make sure your program uses that method? Since HDFS writes are working without any issue, I still think this class is the culprit. I am not sure whether the previous code we replaced was trying to address this issue.

Regards,
Chathura

Ahmet Uyar

Aug 13, 2020, 11:12:45 AM
to Chathura Widanage, Supun Kamburugamuve, Twister2
Hi Chathura,

I modified the seek method in HadoopDataInputStream as follows:
public void seek(long seekPosition) throws IOException {
    fosInputStream.seek(seekPosition);
    LOG.info("seekPos: " + seekPosition);
}

seekPos is not printed in the logs. 
Attached the logs.

Ahmet

auyar-membership-finding-ng4o59b.log

Chathura Widanage

Aug 13, 2020, 11:26:01 AM
to Ahmet Uyar, Supun Kamburugamuve, Twister2
Supun pointed out another place where things can go wrong. Could you please add the lines below at line 221 of BufferedCollectionPartition?

int readSize = reader.read(data);
if (size != readSize) {
  LOG.info("Read Lesser than expected. " + size + "," + readSize);
}

Regards,
Chathura

Ahmet Uyar

Aug 13, 2020, 11:57:45 AM
to Chathura Widanage, Supun Kamburugamuve, Twister2
Hi Chathura,

It reads less data than expected in one iteration and then gets the negative size in the next iteration. 
I also added the loop index to the log message. 
The relevant log messages are: 
[2020-08-13 11:53:08 -0400] [INFO] [worker-1] [Twister2MPIWorker-1] edu.iu.dsc.tws.dataset.partition.BufferedCollectionPartition: Read Lesser than expected. 31,24, index: 116  
[2020-08-13 11:53:08 -0400] [INFO] [worker-2] [Twister2MPIWorker-2] edu.iu.dsc.tws.dataset.partition.BufferedCollectionPartition: Read Lesser than expected. 31,24, index: 116  
[2020-08-13 11:53:08 -0400] [INFO] [worker-3] [Twister2MPIWorker-3] edu.iu.dsc.tws.dataset.partition.BufferedCollectionPartition: Read Lesser than expected. 31,24, index: 116  
[2020-08-13 11:53:08 -0400] [INFO] [worker-2] [Twister2MPIWorker-2] edu.iu.dsc.tws.dataset.partition.BufferedCollectionPartition: Negative Size: -1574692106, index: 117, noOfFrames: 2000, file: hdfs://172.29.200.200:9009/twister2/tsetdata/__partition1_2/0.pbck  
[2020-08-13 11:53:08 -0400] [INFO] [worker-1] [Twister2MPIWorker-1] edu.iu.dsc.tws.dataset.partition.BufferedCollectionPartition: Negative Size: -524897368, index: 117, noOfFrames: 1000, file: hdfs://172.29.200.200:9009/twister2/tsetdata/__partition1_1/0.pbck  
[2020-08-13 11:53:08 -0400] [INFO] [worker-3] [Twister2MPIWorker-3] edu.iu.dsc.tws.dataset.partition.BufferedCollectionPartition: Negative Size: -1049794737, index: 117, noOfFrames: 1000, file: hdfs://172.29.200.200:9009/twister2/tsetdata/__partition1_3/0.pbck  

Ahmet



auyar-membership-finding-ng69nsq.log
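
The pattern in these logs (a short read at index 116, then a negative size at index 117) is what one would expect when a single read(byte[]) call returns fewer bytes than requested and the leftover payload bytes are then parsed as the next length prefix. The sketch below reproduces that effect on an in-memory stream. It is only an illustration under assumptions: ChunkedInputStream is a hypothetical stand-in for a stream that returns partial reads, and the byte values are made up.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.FilterInputStream;
import java.io.IOException;
import java.io.InputStream;

class ShortReadSketch {

  // Hypothetical stand-in for a remote stream: never returns more than
  // maxChunk bytes per bulk read, which the InputStream contract permits.
  static class ChunkedInputStream extends FilterInputStream {
    private final int maxChunk;

    ChunkedInputStream(InputStream in, int maxChunk) {
      super(in);
      this.maxChunk = maxChunk;
    }

    @Override
    public int read(byte[] b, int off, int len) throws IOException {
      return super.read(b, off, Math.min(len, maxChunk));
    }
  }

  // Writes two length-prefixed frames, reads the first with one read(byte[])
  // call, and returns the "size" that is then parsed for the second frame.
  static int sizeOfSecondFrameAfterShortRead() throws IOException {
    byte[] first = {1, 2, 3, 4, 5, (byte) 0xF0, 0x10, 0x20};
    byte[] second = {9, 9, 9, 9, 9, 9, 9, 9};

    ByteArrayOutputStream bytes = new ByteArrayOutputStream();
    try (DataOutputStream out = new DataOutputStream(bytes)) {
      out.writeInt(first.length);
      out.write(first);
      out.writeInt(second.length);
      out.write(second);
    }

    try (DataInputStream in = new DataInputStream(
        new ChunkedInputStream(new ByteArrayInputStream(bytes.toByteArray()), 5))) {
      byte[] data = new byte[in.readInt()];
      int readSize = in.read(data); // returns 5 of the 8 requested bytes
      System.out.println("expected " + data.length + ", read " + readSize);
      // readInt now consumes the 3 leftover payload bytes plus the first byte
      // of the real length prefix, so the parsed size is garbage.
      return in.readInt();
    }
  }

  public static void main(String[] args) throws IOException {
    int size = sizeOfSecondFrameAfterShortRead();
    System.out.println("size parsed for next frame: " + size);
    // Allocating new byte[size] with this value would throw
    // NegativeArraySizeException.
  }
}
```

Whether the garbage size comes out negative depends on the leftover bytes; here the first leftover byte has its high bit set, so it does.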

Chathura Widanage

Aug 13, 2020, 11:58:40 AM
to Ahmet Uyar, Supun Kamburugamuve, Twister2
Great! So this is the issue. I'll send you a fix for that.

Regards,
Chathura

Chathura Widanage

Aug 13, 2020, 12:25:13 PM
to Ahmet Uyar, Supun Kamburugamuve, Twister2
Hi Ahmet,

Please try the fix below.


Regards,
Chathura

Ahmet Uyar

Aug 13, 2020, 12:30:15 PM
to Chathura Widanage, Supun Kamburugamuve, Twister2
Hi Chathura,

Thanks for the fix. Now it does not produce that error message. However, it throws another exception :((

All 4 workers throw the exception below. 

Ahmet

[2020-08-13 12:24:23 -0400] [SEVERE] [worker-0] [Twister2MPIWorker-0] edu.iu.dsc.tws.rsched.schedulers.standalone.MPIWorkerStarter: Uncaught exception in thread Thread[Twister2MPIWorker-0,5,main]. Finalizing this worker...
java.lang.IndexOutOfBoundsException: Requested more bytes than destination buffer size: request length=79, with offset =17; buffer capacity =62
at org.apache.hadoop.fs.FSInputStream.validatePositionedReadArgs(FSInputStream.java:107)
at org.apache.hadoop.hdfs.DFSInputStream.read(DFSInputStream.java:823)
at java.io.DataInputStream.read(DataInputStream.java:149)
at edu.iu.dsc.tws.data.hdfs.HadoopDataInputStream.read(HadoopDataInputStream.java:61)
at java.io.DataInputStream.read(DataInputStream.java:149)
at edu.iu.dsc.tws.dataset.partition.BufferedCollectionPartition$1.next(BufferedCollectionPartition.java:223)
at edu.iu.dsc.tws.tset.sources.DataPartitionSourceFunc.next(DataPartitionSourceFunc.java:36)
at edu.iu.dsc.tws.tset.ops.KeyedSourceOp.execute(KeyedSourceOp.java:41)
at edu.iu.dsc.tws.executor.core.batch.SourceBatchInstance.execute(SourceBatchInstance.java:196)
at edu.iu.dsc.tws.executor.threading.BatchSharingExecutor2$BatchWorker.runExecution(BatchSharingExecutor2.java:401)
at edu.iu.dsc.tws.executor.threading.BatchSharingExecutor2$BatchWorker.access$000(BatchSharingExecutor2.java:364)
at edu.iu.dsc.tws.executor.threading.BatchSharingExecutor2.runExecution(BatchSharingExecutor2.java:183)
at edu.iu.dsc.tws.executor.threading.BatchSharingExecutor2.execute(BatchSharingExecutor2.java:119)
at edu.iu.dsc.tws.task.impl.TaskExecutor.execute(TaskExecutor.java:195)
at edu.iu.dsc.tws.task.impl.TaskExecutor.execute(TaskExecutor.java:208)
at edu.iu.dsc.tws.tset.env.TSetEnvironment.executeBuildContext(TSetEnvironment.java:356)
at edu.iu.dsc.tws.tset.env.BatchEnvironment.run(BatchEnvironment.java:210)
at edu.iu.dsc.tws.tset.links.batch.BatchTLinkImpl.cache(BatchTLinkImpl.java:97)
at edu.iu.dsc.tws.tset.links.batch.PipeTLink.cache(PipeTLink.java:112)
at iu.iuni.deletion.MembershipFinder4.execute(MembershipFinder4.java:74)
at edu.iu.dsc.tws.rsched.worker.Twister2WorkerStarter.execute(Twister2WorkerStarter.java:63)
at edu.iu.dsc.tws.rsched.worker.MPIWorkerManager.execute(MPIWorkerManager.java:66)
at edu.iu.dsc.tws.rsched.schedulers.standalone.MPIWorkerStarter.startWorker(MPIWorkerStarter.java:310)
at edu.iu.dsc.tws.rsched.schedulers.standalone.MPIWorkerStarter.startWorkerWithJM(MPIWorkerStarter.java:253)
at edu.iu.dsc.tws.rsched.schedulers.standalone.MPIWorkerStarter.<init>(MPIWorkerStarter.java:161)
at edu.iu.dsc.tws.rsched.schedulers.standalone.MPIWorkerStarter.main(MPIWorkerStarter.java:120)

auyar-membership-finding-ng7ducp.log

Chathura Widanage

Aug 13, 2020, 12:32:53 PM
to Ahmet Uyar, Supun Kamburugamuve, Twister2
The line should be:

int readSize = reader.read(data, readSoFar, data.length - readSoFar);

Regards,
Chathura

Chathura Widanage

Aug 13, 2020, 12:34:16 PM
to Ahmet Uyar, Supun Kamburugamuve, Twister2
*Line 223 should be:

int readSize = reader.read(data, readSoFar, data.length - readSoFar);

Regards,
Chathura
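
The corrected line only makes sense inside a loop that keeps reading until the frame is complete. The full fix was sent as an attachment and is not shown in this thread, so the following is only a sketch of how such a loop is commonly written. It reuses the names reader, data, readSoFar and readSize from the line above; cappedStream is a hypothetical helper that imitates a stream returning at most 24 bytes per call, as in the "31,24" log message.

```java
import java.io.ByteArrayInputStream;
import java.io.EOFException;
import java.io.IOException;
import java.io.InputStream;
import java.util.Arrays;

class ReadLoopSketch {

  // Hypothetical test stream: returns at most cap bytes per bulk read.
  static InputStream cappedStream(byte[] source, int cap) {
    return new ByteArrayInputStream(source) {
      @Override
      public synchronized int read(byte[] b, int off, int len) {
        return super.read(b, off, Math.min(len, cap));
      }
    };
  }

  // Fills data completely. Each call asks only for the bytes still missing
  // (data.length - readSoFar), so offset + length never exceeds the buffer.
  static void readFrame(InputStream reader, byte[] data) throws IOException {
    int readSoFar = 0;
    while (readSoFar < data.length) {
      int readSize = reader.read(data, readSoFar, data.length - readSoFar);
      if (readSize < 0) {
        throw new EOFException(
            "Stream ended after " + readSoFar + " of " + data.length + " bytes");
      }
      readSoFar += readSize;
    }
  }

  public static void main(String[] args) throws IOException {
    byte[] source = new byte[31];
    Arrays.fill(source, (byte) 7);
    byte[] data = new byte[source.length];
    readFrame(cappedStream(source, 24), data); // needs two reads: 24 + 7
    System.out.println("frame complete: " + Arrays.equals(source, data));
  }
}
```

When the reader is a java.io.DataInputStream, readFully(byte[]) provides the same read-until-full behavior and throws EOFException if the stream ends early.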

Ahmet Uyar

Aug 13, 2020, 12:38:12 PM
to Chathura Widanage, Supun Kamburugamuve, Twister2
Now it seems to be working. I will test it more and let you know. 

Many thanks,

Ahmet

Chathura Widanage

Aug 13, 2020, 12:38:44 PM
to Ahmet Uyar, Supun Kamburugamuve, Twister2
Great, thanks!
Regards,
Chathura
