jobID in hdfs path

Ahmet Uyar

Aug 4, 2020, 11:51:33 AM
to Twister2
Hi Chathura,

The persisted data is saved in HDFS under the directory given in the config files. However, I think we should add the jobID to this directory. Otherwise, consecutive or concurrent runs may interfere with each other.

Currently, persisted data is saved in the following directories in HDFS:
/twister2/persistent/data/__kgather4_0
/twister2/persistent/data/__kgather4_1
/twister2/persistent/data/__kgather4_2
/twister2/persistent/data/__kgather4_3
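
To illustrate the proposal, here is a minimal sketch of how a jobID segment could be placed between the configured directory and the per-partition name. The class `PersistPaths`, the method `partitionPath`, and the job ID `job-a` are hypothetical names used only for illustration; they are not taken from the Twister2 code base.

```java
public final class PersistPaths {
    private PersistPaths() { }

    // Hypothetical helper: builds <root>/<jobId>/<reference>.
    // root      - the persistent directory from the config files
    // jobId     - the ID of the running job (assumed to be available)
    // reference - the per-partition name, e.g. "__kgather4_0"
    public static String partitionPath(String root, String jobId, String reference) {
        // Drop a trailing slash so the result never contains "//".
        String base = root.endsWith("/") ? root.substring(0, root.length() - 1) : root;
        return base + "/" + jobId + "/" + reference;
    }

    public static void main(String[] args) {
        // Prints one path per partition, each scoped to the job.
        for (int i = 0; i < 4; i++) {
            System.out.println(
                partitionPath("/twister2/persistent/data", "job-a", "__kgather4_" + i));
        }
    }
}
```

With a layout like this, two jobs that persist the same tset would write to different directories, so their data would not collide.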

Ahmet

Ahmet Uyar

Aug 4, 2020, 12:07:33 PM
to Twister2
Hi Chathura,

CheckpointManager seems to save the checkpoints in HDFS at:
/twister2/persistent/twister2-checkpoints/<jobID>/

Maybe the tset data could also be persisted at the same location, under the data directory. Just a suggestion.

I also checked whether there is anything in the NFS persistent directory. There is a logs directory, and it is empty. As I remember, logs were previously saved there. I am not sure whether logs should also go to HDFS.

thanks,
Ahmet

Chathura Widanage

Aug 4, 2020, 12:42:07 PM
to Ahmet Uyar, Twister2
Hi Ahmet,

I am looking into your previous issue. Will get back to you soon.

Regards,
Chathura


Chathura Widanage

Aug 4, 2020, 12:48:06 PM
to Ahmet Uyar, Niranda Perera, Twister2
@Niranda Perera Would it break anything if we change the referencePrefix variable to include the job ID?

Regards,
Chathura

Niranda Perera

Aug 4, 2020, 6:19:28 PM
to Chathura Widanage, Ahmet Uyar, Twister2
No, it would not. Any string is fine there as long as it is deterministic and the same in both the sink and the source (for the disk-persistent sink/source).
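
As a rough sketch of that constraint: if the prefix is derived only from values that both sides already know, the sink and the source arrive at the same reference independently. The class and method names below, the `<jobId>/__<tsetId>` layout, and the `_<partition>` suffix are assumptions made for illustration, not the actual Twister2 implementation.

```java
public final class ReferencePrefixSketch {
    private ReferencePrefixSketch() { }

    // Hypothetical: the prefix depends only on the job ID and the tset ID,
    // so it is deterministic. A timestamp or random value here would break
    // the source, which could not recompute what the sink wrote.
    static String referencePrefix(String jobId, String tsetId) {
        return jobId + "/__" + tsetId;
    }

    // Hypothetical: one reference per partition index.
    static String reference(String prefix, int partition) {
        return prefix + "_" + partition;
    }

    public static void main(String[] args) {
        // The sink side computes the reference when writing partition 2.
        String written = reference(referencePrefix("job-a", "kgather4"), 2);
        // The source side recomputes it later from the same inputs.
        String read = reference(referencePrefix("job-a", "kgather4"), 2);
        System.out.println(written.equals(read));
    }
}
```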
