Difficulty having capnp read/write from a kj::[Readable]File, misc. discussion

nyanpasu64

Aug 15, 2021, 6:58:48 AM
to Cap'n Proto
I've built an (unfinished) application around Cap'n Proto and the kj filesystem library (which has the nice property of handling valid and invalid Unicode paths properly on both Windows and Linux, using WTF-8 or raw bytes), but sadly the filesystem library has no online documentation, only doc comments, from which I struggled to grasp the "big picture".

With that in mind, I'm surprised that Cap'n Proto interoperates so poorly with kj, in that capnp can't write to a kj::File or read from a kj::ReadableFile without a shim I wrote. And newFileAppender() is a *bad* idea since it calls stat() on every single file write to query the current length, instead of storing the current write position in the object.

Writes: https://gitlab.com/exotracker/exotracker-cpp/-/blob/5109bd9411c9baaca837ae349358da3f5c3742bc/src/serialize.cpp#L421

Reads: https://gitlab.com/exotracker/exotracker-cpp/-/blob/5109bd9411c9baaca837ae349358da3f5c3742bc/src/serialize.cpp#L1330

It's also unfortunate that it's so much work to convert between kj::ArrayPtr and gsl/std::span, between kj::StringPtr (null-terminated, not sliceable) and std::string_view (not always null-terminated, sliceable), and to juggle ConstData, Data::Reader, char, and unsigned char/uint8_t, etc. I wish it could be done better, maybe with more conversion operators, but it's understandable considering kj was designed to avoid the C++ standard library.

Kenton Varda

Aug 15, 2021, 12:10:56 PM
to nyanpasu64, Cap'n Proto
To be honest I'm not sure if I've ever used kj::File to load Cap'n-Proto-format data. The file API was sort of built for other things I was doing that happened to be built on KJ.

But if I were using it, I think I'd do:
- Reads always using mmap(), not streams. You'll need a reinterpret_cast to get an ArrayPtr<capnp::word>, but no shims needed here.
- Writes probably using Directory::appendFile() to open the file for appending. This gives you an OutputStream so you can directly use capnp::writeMessage(), no shims needed.

> And newFileAppender() is a *bad* idea since it calls stat() on every single file write to query the current length

I don't understand what you mean here. kj::newFileAppender() is explicitly documented as not doing this, with the caveat that it won't work correctly if there are multiple concurrent appenders. On the other hand, if you use Directory::appendFile() to open the file, then it uses the O_APPEND flag which makes it the operating system's job to ensure all bytes are appended -- no need for stat().

-Kenton

--
You received this message because you are subscribed to the Google Groups "Cap'n Proto" group.
To unsubscribe from this group and stop receiving emails from it, send an email to capnproto+...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/capnproto/97117ffc-3567-40eb-acb6-9007c0a2ad73n%40googlegroups.com.

nyanpasu64

Aug 15, 2021, 8:23:42 PM
to Cap'n Proto
> To be honest I'm not sure if I've ever used kj::File to load Cap'n-Proto-format data. The file API was sort of built for other things I was doing that happened to be built on KJ.

That's the impression I got too.

> But if I were using it, I think I'd do:
> - Reads always using mmap(), not streams. You'll need a reinterpret_cast to get an ArrayPtr<capnp::word>, but no shims needed here.

FsNode::mmap()? I'll try it if I ever need to rewrite the file saving code.

> - Writes probably using Directory::appendFile() to open the file for appending. This gives you an OutputStream so you can directly use capnp::writeMessage(), no shims needed.

I looked into this, but decided to use Directory::replaceFile() because I wanted atomic saving. However, Directory::replaceFile() doesn't support returning an AppendableFile.

>> And newFileAppender() is a *bad* idea since it calls stat() on every single file write to query the current length
>
> I don't understand what you mean here. kj::newFileAppender() is explicitly documented as not doing this, with the caveat that it won't work correctly if there are multiple concurrent appenders. On the other hand, if you use Directory::appendFile() to open the file, then it uses the O_APPEND flag which makes it the operating system's job to ensure all bytes are appended -- no need for stat().

In the context of Cap'n Proto, concurrent writes result in a corrupted file no matter what behavior is used. And capnp's write methods take a kj::[Buffered]OutputStream&; OutputStream is inherited by AppendableFile but not by File, and the easiest way to get an AppendableFile from a File (newFileAppender()) calls stat() on every write. (The hard way is to write a subclass of OutputStream... I just found out I only need to subclass OutputStream, not AppendableFile.) I suppose each piece of the puzzle is understandable, but the end result is unfortunate. Maybe AppendableFile guarantees that writes interleaved within or between processes land one after another and never overwrite each other (which would explain newFileAppender()'s design), but Cap'n Proto instead expects that no interleaved writes happen at all while it is writing.

If I remember correctly, I decided against writing to fd integers because they seemed less supported on Windows than Linux, and required different APIs per OS as well, and supporting arbitrary paths on Windows requires 16-bit characters while Linux requires 8-bit characters (whereas kj::Filesystem handles this using transparent WTF-8 conversion). I could've instead used open() and _wopen() (or _wsopen_s() to satisfy Microsoft's API deprecations) rather than kj filesystem operations, and on Windows converted WTF-8 to UTF-16 like kj does. (Of course it would be easier to use narrow characters on Windows, but depending on the codepage it might fail to support unpaired UTF-16 surrogates or even Unicode altogether.) I'd still need to build a cross-platform atomic-save library, perhaps with C++'s std::filesystem, but its char/wchar handling is scary (though perhaps std::filesystem::u8path or path(char8_t) would work).

Kenton Varda

Aug 15, 2021, 8:37:46 PM
to nyanpasu64, Cap'n Proto
Ah, I see, yeah if you're using the atomic creation stuff then you don't get to use append I guess.

Hmmm, and now that I look at it, the doc comment for kj::newFileAppender() says that it assumes it is the only writer... but the implementation does indeed stat() on every write. That's weird; to me the doc comment suggests that it wouldn't do that, but would instead just remember the file position from the previous write. Probably the doc comment needs updating.

Anyway, I suppose we really ought to provide a FileOutputStream class that tracks the current offset as it writes...

> FsNode::mmap()? I'll try it if I ever need to rewrite the file saving code.

It's ReadableFile::mmap(), but yes.

-Kenton
