[PATCH] ocfs2: fix orphan inode disk leak in ocfs2_dio_end_io() on I/O error

Marco Elver

Jun 11, 2026, 11:03:55 AM
to el...@google.com, Mark Fasheh, Joel Becker, Joseph Qi, ocfs2...@lists.linux.dev, linux-...@vger.kernel.org, kasa...@googlegroups.com
When an extending direct I/O write or a direct I/O write racing with an
unlink is initiated, ocfs2_direct_IO() places the user inode into the
system orphan directory and sets the OCFS2_DIO_ORPHANED_FL flag to
ensure defined behavior and crash consistency.

However, if the direct I/O request fails or is cancelled asynchronously
(bytes <= 0), the completion hook ocfs2_dio_end_io() skips
ocfs2_dio_end_io_write() entirely and calls only
ocfs2_dio_free_write_ctx(). The orphan entry is never torn down, so the
user inode stays in the orphan directory with the on-disk
OCFS2_DIO_ORPHANED_FL flag still set.

Because the OCFS2_DIO_ORPHANED_FL flag remains set, the final inode
eviction (ocfs2_delete_inode()) sees the flag, assumes a direct I/O
write is still in progress, and refuses to wipe the inode. The result is
a disk space and resource leak that is reclaimed only if the cluster
unmounts or crashes.

Fix this by having ocfs2_dio_end_io() check dw_orphaned even when an I/O
error occurs, and call ocfs2_del_inode_from_orphan() to remove the inode
from the orphan directory before freeing the in-memory write context.

Fixes: 5040f8df56fb ("ocfs2: free up write context when direct IO failed")
Assisted-by: Antigravity:Gemini
Signed-off-by: Marco Elver <el...@google.com>
---
fs/ocfs2/aops.c | 17 +++++++++++++++--
1 file changed, 15 insertions(+), 2 deletions(-)

diff --git a/fs/ocfs2/aops.c b/fs/ocfs2/aops.c
index 4acdbb70882c..ad3f2057e26e 100644
--- a/fs/ocfs2/aops.c
+++ b/fs/ocfs2/aops.c
@@ -2419,11 +2419,24 @@ static int ocfs2_dio_end_io(struct kiocb *iocb,
 		mlog_ratelimited(ML_ERROR, "Direct IO failed, bytes = %lld",
 				 (long long)bytes);
 	if (private) {
-		if (bytes > 0)
+		if (bytes > 0) {
 			ret = ocfs2_dio_end_io_write(inode, private, offset,
 						     bytes);
-		else
+		} else {
+			struct ocfs2_dio_write_ctxt *dwc = private;
+
+			if (dwc->dw_orphaned) {
+				struct buffer_head *di_bh = NULL;
+
+				if (ocfs2_inode_lock(inode, &di_bh, 1) == 0) {
+					ocfs2_del_inode_from_orphan(OCFS2_SB(inode->i_sb),
+								    inode, di_bh, 0, 0);
+					ocfs2_inode_unlock(inode, 1);
+					brelse(di_bh);
+				}
+			}
 			ocfs2_dio_free_write_ctx(inode, private);
+		}
 	}

 	ocfs2_iocb_clear_rw_locked(iocb);
--
2.54.0.1099.g489fc7bff1-goog
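
For readers following the thread, the control flow described in the commit message can be modelled in user space. The sketch below is not kernel code: struct dio_ctx_model, enum orphan_outcome and end_io_model() are hypothetical names invented for illustration, and the only facts taken from the patch are that the success path goes through ocfs2_dio_end_io_write(), that the error path (bytes <= 0) previously went straight to ocfs2_dio_free_write_ctx(), and that the patched error path removes the orphan entry only when ocfs2_inode_lock() succeeds.

```c
#include <stdbool.h>

/* Hypothetical stand-in for struct ocfs2_dio_write_ctxt; only the one
 * field the patch inspects is modelled. */
struct dio_ctx_model {
	bool dw_orphaned;	/* inode was placed in the orphan dir for this write */
};

/* What happened to the orphan entry when the I/O completed. */
enum orphan_outcome {
	ORPHAN_NOT_USED,	/* the write never orphaned the inode */
	ORPHAN_REMOVED,		/* entry removed during completion */
	ORPHAN_LEFT_BEHIND,	/* entry remains for a later recovery pass */
};

/*
 * Simplified model of the decision made in ocfs2_dio_end_io().
 * 'patched' selects the behaviour with or without this patch applied;
 * 'lock_ok' models whether ocfs2_inode_lock() returned 0.
 */
static enum orphan_outcome end_io_model(const struct dio_ctx_model *ctx,
					long long bytes, bool patched,
					bool lock_ok)
{
	if (!ctx->dw_orphaned)
		return ORPHAN_NOT_USED;

	/* Success path: ocfs2_dio_end_io_write() tears the entry down. */
	if (bytes > 0)
		return ORPHAN_REMOVED;

	/* Error path: only the patched code attempts removal, and only
	 * when the inode lock could be taken. */
	if (patched && lock_ok)
		return ORPHAN_REMOVED;

	return ORPHAN_LEFT_BEHIND;
}
```

In this model, a failed write on an orphaned inode yields ORPHAN_LEFT_BEHIND without the patch, and also with the patch whenever the inode lock cannot be taken, since the patch silently skips the removal in that case.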

Heming Zhao

Jun 11, 2026, 9:28:01 PM
to Marco Elver, Mark Fasheh, Joel Becker, Joseph Qi, ocfs2...@lists.linux.dev, linux-...@vger.kernel.org, kasa...@googlegroups.com
Calling only ocfs2_del_inode_from_orphan() without ocfs2_truncate_file() will
leave stale blocks beyond the EOF.

I think the existing OCFS2 code already handles error/crash cases for orphaned
inodes, and this "leaking" behavior is by design.
please refer to ocfs2_recover_orphans() and ocfs2_add_inode_to_orphan().

Thanks,
Heming

Marco Elver

Jun 12, 2026, 8:58:50 AM
to Heming Zhao, Mark Fasheh, Joel Becker, Joseph Qi, ocfs2...@lists.linux.dev, linux-...@vger.kernel.org, kasa...@googlegroups.com
Right.

> I think the existing OCFS2 code already handles error/crash cases for orphaned
> inodes, and this "leaking" behavior is by design.
> please refer to ocfs2_recover_orphans() and ocfs2_add_inode_to_orphan().

Periodic scans skip direct I/O entries to avoid racing with active
direct I/O on live nodes:

In fs/ocfs2/journal.c:ocfs2_orphan_filldir():

    /* do not include dio entry in case of orphan scan */
    if ((p->orphan_reco_type == ORPHAN_NO_NEED_TRUNCATE) &&
        (!strncmp(name, OCFS2_DIO_ORPHAN_PREFIX,
                  OCFS2_DIO_ORPHAN_PREFIX_LEN)))
            return true;

Is something else recovering them?

Heming Zhao

Jun 15, 2026, 11:09:38 AM
to Marco Elver, Mark Fasheh, Joel Becker, Joseph Qi, ocfs2...@lists.linux.dev, linux-...@vger.kernel.org, kasa...@googlegroups.com
Search for ORPHAN_NEED_TRUNCATE, for example:
- ocfs2_recover_node() calls ocfs2_queue_recovery_completion() with the
  ORPHAN_NEED_TRUNCATE flag.
- ocfs2_complete_mount_recovery() calls subroutines with the
  ORPHAN_NEED_TRUNCATE flag.

ocfs2 is a cluster fs. When one node crashes, another node handles the recovery.

- Heming
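
The two recovery types discussed in this thread differ only in how the orphan directory filter treats direct I/O entries. A minimal user-space sketch of that filter follows, mirroring the condition quoted from ocfs2_orphan_filldir(). The function name orphan_entry_is_skipped() is hypothetical, the prefix value "dio-" with length 4 is an assumption about OCFS2_DIO_ORPHAN_PREFIX that should be checked against the kernel headers, and the entry names used with it are made up.

```c
#include <stdbool.h>
#include <string.h>

/* Assumed value of OCFS2_DIO_ORPHAN_PREFIX; verify against the kernel tree. */
#define DIO_ORPHAN_PREFIX	"dio-"
#define DIO_ORPHAN_PREFIX_LEN	4

/* Names mirror the flags mentioned in the thread. */
enum orphan_reco_type {
	ORPHAN_NO_NEED_TRUNCATE = 0,	/* used by the periodic orphan scan */
	ORPHAN_NEED_TRUNCATE,		/* used by node and mount recovery */
};

/*
 * Returns true when a scan of the given type passes over the entry.
 * Only the periodic scan skips entries carrying the dio prefix; a
 * recovery scan considers every entry.
 */
static bool orphan_entry_is_skipped(enum orphan_reco_type type,
				    const char *name)
{
	return type == ORPHAN_NO_NEED_TRUNCATE &&
	       strncmp(name, DIO_ORPHAN_PREFIX, DIO_ORPHAN_PREFIX_LEN) == 0;
}
```

Under these assumptions, an entry named "dio-0000000000001234" is skipped by an ORPHAN_NO_NEED_TRUNCATE scan but is processed by an ORPHAN_NEED_TRUNCATE scan, which matches the point that dio orphans are handled on node or mount recovery rather than by the periodic scan.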