[JIRA] (JENKINS-60167) AtomicFileWriter performance issue on CephFS in case of Empty File creation

18 views
Skip to first unread message

bulanovk@gmail.com (JIRA)

unread,
Nov 14, 2019, 3:51:02 AM11/14/19
to jenkinsc...@googlegroups.com
Konstantin Bulanov created an issue
 
Jenkins / Bug JENKINS-60167
AtomicFileWriter performance issue on CephFS in case of Empty File creation
Issue Type: Bug Bug
Assignee: Unassigned
Components: core
Created: 2019-11-14 08:50
Environment: Jenkins 2.168-2.204
Priority: Critical Critical
Reporter: Konstantin Bulanov

Hello, during migration for NFS to CephFS file storage we faced with performance degradation of Server startup due to RunIdMigrator.

 

After trace analysis we figure out following thing:

AtomicFileWriter create FileChannelWriterwith with only one OpenOption - StandardOpenOption.WRITE.

For a newly created File, in case when AtomicFileWriter used to create new Empty file (ex: jenkins.model.RunIdMigrator#save) it leads to full fs sync instead of fsync on dirty inodes

As a result this operation took up to 5 sec on CephFS.

Ceph logs for not Empty file:
[Wed Nov 13 16:05:43 2019] ceph: alloc_inode 000000001442a671
[Wed Nov 13 16:05:43 2019] ceph: inode 000000001442a671 now !dirty
[Wed Nov 13 16:05:43 2019] ceph: fsync 000000001442a671
[Wed Nov 13 16:05:43 2019] ceph: fsync dirty caps are Fw
[Wed Nov 13 16:05:43 2019] ceph: inode 000000001442a671 now !flushing
[Wed Nov 13 16:05:43 2019] ceph: inode 000000001442a671 now clean
[Wed Nov 13 16:05:43 2019] ceph: fsync 000000001442a671 result=0

Ceph logs for Empty file:
[Wed Nov 13 16:17:26 2019] ceph: alloc_inode 000000000aeb2b5f
[Wed Nov 13 16:17:26 2019] ceph: fsync 000000000aeb2b5f
[Wed Nov 13 16:17:26 2019] ceph: fsync dirty caps are -
[Wed Nov 13 16:17:30 2019] ceph: fsync 000000000aeb2b5f result=0

 

As a fix we add StandardOpenOption.CREATE OpenOption.

Server startup with 2k job required to be migrated before fix startup took ~30min

after startup 2 min

Add Comment Add Comment
 
This message was sent by Atlassian Jira (v7.13.6#713006-sha1:cc4451f)
Atlassian logo

bulanovk@gmail.com (JIRA)

unread,
Nov 14, 2019, 3:55:03 AM11/14/19
to jenkinsc...@googlegroups.com
Konstantin Bulanov updated an issue
Change By: Konstantin Bulanov
Hello, during migration for NFS to CephFS file storage we faced with performance degradation of Server startup due to  * RunIdMigrator * .


 

After trace analysis we figure out following thing:

* AtomicFileWriter * create * FileChannelWriterwith * with only one OpenOption - * StandardOpenOption.WRITE * .

For a newly created File, in case when 
* AtomicFileWriter * used to create new Empty file (ex: jenkins.model.RunIdMigrator#save) it leads to full fs sync instead of fsync on dirty inodes


As a result this operation took up to 5 sec on CephFS.

As a fix we add *StandardOpenOption.CREATE* OpenOption.

 

Ceph logs for not Empty file Before Fix :
{code:java}
[Wed Nov 13 16: 05 17 : 43 26 2019] ceph: alloc_inode 000000001442a671 000000000aeb2b5f
[Wed Nov 13 16:
05 17 : 43 26 2019] ceph: inode 000000001442a671 now !dirty fsync 000000000aeb2b5f
[Wed Nov 13 16:
05 17 : 43 26 2019] ceph: fsync 000000001442a671 dirty caps are -
[Wed Nov 13 16:
05 17 : 43 30 2019] ceph: fsync dirty caps are Fw 000000000aeb2b5f result=0{code}
 

Ceph logs After Fix:
{code:java}
[Wed Nov 13 16:05:43 2019] ceph: inode alloc_inode 000000001442a671 now !flushing
[Wed Nov 13 16:05:43 2019] ceph: inode 000000001442a671 now
clean !dirty
[Wed Nov 13 16:05:43 2019] ceph: fsync 000000001442a671
result=0

Ceph logs for Empty file:
[Wed Nov 13 16: 17 05 : 26 43 2019] ceph: alloc_inode 000000000aeb2b5f fsync dirty caps are Fw
[Wed Nov 13 16:
17 05 : 26 43 2019] ceph: fsync 000000000aeb2b5f inode 000000001442a671 now !flushing
[Wed Nov 13 16:
17 05 : 26 43 2019] ceph: fsync dirty caps are - inode 000000001442a671 now clean
[Wed Nov 13 16:
17 05 : 30 43 2019] ceph: fsync 000000000aeb2b5f 000000001442a671 result=0 {code}

 

As a fix we add StandardOpenOption.CREATE OpenOption.

Server startup with 2k job required to be migrated :
*
before fix startup took ~30min

* after startup 2 min

bulanovk@gmail.com (JIRA)

unread,
Nov 14, 2019, 4:14:03 AM11/14/19
to jenkinsc...@googlegroups.com
Konstantin Bulanov updated an issue
Hello, during migration for NFS to CephFS file storage we faced with performance degradation of Server startup due to *RunIdMigrator*.

 

After trace analysis we figure out following thing:

*AtomicFileWriter* create *FileChannelWriterwith* with only one OpenOption - *StandardOpenOption.WRITE*.

For a newly created File, in case when *AtomicFileWriter* used to create new Empty file (ex: jenkins.model.RunIdMigrator#save) it leads to full fs sync instead of fsync on dirty inodes


As a result this operation took up to 5 sec on CephFS.

As a fix we add *StandardOpenOption.CREATE* OpenOption. MR - [https://github.com/jenkinsci/jenkins/pull/4357]

 

Ceph logs Before Fix:
{code:java}
[Wed Nov 13 16:17:26 2019] ceph: alloc_inode 000000000aeb2b5f
[Wed Nov 13 16:17:26 2019] ceph: fsync 000000000aeb2b5f
[Wed Nov 13 16:17:26 2019] ceph: fsync dirty caps are -
[Wed Nov 13 16:17:30 2019] ceph: fsync 000000000aeb2b5f result=0{code}

 

Ceph logs After Fix:
{code:java}
[Wed Nov 13 16:05:43 2019] ceph: alloc_inode 000000001442a671
[Wed Nov 13 16:05:43 2019] ceph: inode 000000001442a671 now !dirty

[Wed Nov 13 16:05:43 2019] ceph: fsync 000000001442a671
[Wed Nov 13 16:05:43 2019] ceph: fsync dirty caps are Fw
[Wed Nov 13 16:05:43 2019] ceph: inode 000000001442a671 now !flushing

[Wed Nov 13 16:05:43 2019] ceph: inode 000000001442a671 now clean
[Wed Nov 13 16:05:43 2019] ceph: fsync 000000001442a671 result=0{code}

Server startup with 2k job required to be migrated:
* before fix startup took ~30min
* after startup 2 min

adrien.lecharpentier@gmail.com (JIRA)

unread,
Nov 14, 2019, 4:45:02 AM11/14/19
to jenkinsc...@googlegroups.com

adrien.lecharpentier@gmail.com (JIRA)

unread,
Nov 14, 2019, 4:45:03 AM11/14/19
to jenkinsc...@googlegroups.com

adrien.lecharpentier@gmail.com (JIRA)

unread,
Nov 14, 2019, 4:45:03 AM11/14/19
to jenkinsc...@googlegroups.com
Adrien Lecharpentier started work on Bug JENKINS-60167
 
Change By: Adrien Lecharpentier
Status: Open In Progress

bulanovk@gmail.com (JIRA)

unread,
Nov 14, 2019, 5:03:03 AM11/14/19
to jenkinsc...@googlegroups.com
Konstantin Bulanov updated an issue
Change By: Konstantin Bulanov
Hello, during migration for from NFS to CephFS file storage we faced with performance degradation of Server startup due to *RunIdMigrator*.


 

After trace analysis we figure out following thing:

*AtomicFileWriter* create *FileChannelWriterwith* with only one OpenOption - *StandardOpenOption.WRITE*.

For a newly created File, in case when *AtomicFileWriter* used to create new Empty file (ex: jenkins.model.RunIdMigrator#save) it leads to full fs sync instead of fsync on dirty inodes

As a result this operation took up to 5 sec on CephFS.

As a fix we add *StandardOpenOption.CREATE* OpenOption. MR - [https://github.com/jenkinsci/jenkins/pull/4357]

 

Ceph logs Before Fix:
{code:java}
[Wed Nov 13 16:17:26 2019] ceph: alloc_inode 000000000aeb2b5f
[Wed Nov 13 16:17:26 2019] ceph: fsync 000000000aeb2b5f
[Wed Nov 13 16:17:26 2019] ceph: fsync dirty caps are -
[Wed Nov 13 16:17:30 2019] ceph: fsync 000000000aeb2b5f result=0{code}
 

Ceph logs After Fix:
{code:java}
[Wed Nov 13 16:05:43 2019] ceph: alloc_inode 000000001442a671
[Wed Nov 13 16:05:43 2019] ceph: inode 000000001442a671 now !dirty
[Wed Nov 13 16:05:43 2019] ceph: fsync 000000001442a671
[Wed Nov 13 16:05:43 2019] ceph: fsync dirty caps are Fw
[Wed Nov 13 16:05:43 2019] ceph: inode 000000001442a671 now !flushing
[Wed Nov 13 16:05:43 2019] ceph: inode 000000001442a671 now clean
[Wed Nov 13 16:05:43 2019] ceph: fsync 000000001442a671 result=0{code}
Server startup with 2k job required to be migrated:
* before fix startup took ~30min
* after startup 2 min

o.v.nenashev@gmail.com (JIRA)

unread,
Nov 25, 2019, 4:26:02 PM11/25/19
to jenkinsc...@googlegroups.com
Oleg Nenashev updated Bug JENKINS-60167
 

Released in Jenkins 2.206. It might be a good LTS candideate, but I am not sure about 2.204.1. Maybe it needs more soak testing though the change itself is definitely valid

Change By: Oleg Nenashev
Status: In Review Resolved
Resolution: Fixed
Released As: Jenkins 2.206

o.v.nenashev@gmail.com (JIRA)

unread,
Nov 25, 2019, 4:26:03 PM11/25/19
to jenkinsc...@googlegroups.com

ogondza@gmail.com (JIRA)

unread,
Nov 26, 2019, 4:13:02 AM11/26/19
to jenkinsc...@googlegroups.com

ogondza@gmail.com (JIRA)

unread,
Jan 13, 2020, 1:35:03 PM1/13/20
to jenkinsc...@googlegroups.com
Oliver Gondža updated an issue
Change By: Oliver Gondža
Labels: 2.204.1-rejected lts 2.204.2 - candidate fixed
Reply all
Reply to author
Forward
0 new messages