--
You received this message because you are subscribed to the Google Groups "Cap'n Proto" group.
To unsubscribe from this group and stop receiving emails from it, send an email to capnproto+...@googlegroups.com.
Visit this group at http://groups.google.com/group/capnproto.
I've never done it, but I think something similar should be possible with some combination of splice, vmsplice, and sendfile.
OK, thanks for the info; good to know those options are there!
That would be great, but how would you allocate it initially when there are lists?
On Mon, Jun 22, 2015 at 9:10 PM, Kamal Marhubi <ka...@marhubi.com> wrote:
> Kenton, would this fit in the core Cap'n Proto library?

Maybe, I'll have to see how you do it. :)
The main problem is segment framing. The current standard format, with an upfront segment table, won't really work for mmap, for a couple reasons:

- When you need to grow the table, you'd have to push back all the data by a word, which is obviously unacceptable.
- Each segment will end up with some unused space at the end, since it's unlikely that the last object allocated in the segment will just happen to end at exactly the end of the segment. In the serialized format, we can easily just not serialize these unused bytes, but in an mmap() format that space is already on-disk. Currently you cannot ask Cap'n Proto to return those bytes to you; once allocated, those bytes are permanently part of the segment and Cap'n Proto may choose to go back and use them even after new segments have been allocated.
So you need some alternative framing approach.

While you're at it, you may want to aim for segments to be page-aligned.
Maybe you could define a format in which all segment sizes are hard-coded, so you don't need any table. E.g. have the segment sizes be:

1: N pages
2: N pages
3: 2*N pages
4: 4*N pages
5: 8*N pages
...

Thus the file size is always N pages times a power of two (itself a power of two if N is), and you can easily determine the count, size, and locations of all segments from the file size. (Note that segments have a hard limit of 4GB, so once you hit that, subsequent segments will have to be 4GB each and file sizes will no longer be power-of-two...)
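To make the arithmetic concrete, here is a minimal C sketch of that layout, using a 0-based segment index. The helper names are hypothetical and the 4GB cap is ignored. With segment sizes N, N, 2N, 4N, ..., the first k segments span N * 2^(k-1) pages, so every segment after the first starts at an offset equal to its own size.

```c
#include <stdint.h>

/* Hypothetical helpers for the fixed-size layout: segment i (0-based)
   is N, N, 2N, 4N, 8N, ... pages. The 4GB per-segment cap is ignored. */

/* Size of segment i, in pages. */
uint64_t segment_pages(unsigned i, uint64_t n) {
    return i == 0 ? n : n << (i - 1);
}

/* Offset of segment i from the start of the file, in pages.
   Segments 0..i-1 together span n << (i - 1) pages for i >= 1. */
uint64_t segment_offset_pages(unsigned i, uint64_t n) {
    return i == 0 ? 0 : n << (i - 1);
}

/* Number of segments in a file of file_pages pages, or -1 if file_pages
   is not n times a power of two (i.e. not a valid layout). */
int segment_count(uint64_t file_pages, uint64_t n) {
    if (n == 0 || file_pages % n != 0) return -1;
    if (file_pages == 0) return 0;
    uint64_t ratio = file_pages / n;
    if ((ratio & (ratio - 1)) != 0) return -1;  /* must be a power of two */
    int count = 1;
    while (ratio > 1) {
        ratio >>= 1;
        count++;
    }
    return count;
}
```

For example, with N = 16 pages, a 128-page file holds 4 segments starting at page offsets 0, 16, 32, and 64.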
Another possibility is to store each segment as a separate file.
Or yet another, crazy-but-awesome strategy would be to have all segments be 4GB, but initially allocated only with ftruncate(), which adds a "hole" to the end of the file for which pages aren't actually allocated until needed. This way each page will be allocated when you first touch it. Since Cap'n Proto fills the segment sequentially, this actually works really well, at least in theory. The downside is that files with "holes" tend to behave badly with standard tools, e.g. `ls` will show the file as being 4GB even when it only has a few bytes of data, and `cp` will turn the hole into zeros, losing the optimization. There may also be performance problems with the TLB cache being flushed every time a new page is allocated; I'm not sure.
One downside of this: Some filesystems do not support sparse files at all.
You're unlikely to see one very often on *Linux*, but one in particular
might be a big problem elsewhere: HFS+.
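A minimal C sketch of the sparse-file idea from Kenton's message, assuming a POSIX system and a filesystem that supports holes. Per Paul's caveat, on HFS+ the ftruncate() call may instead write out the whole range. The helper name `map_sparse_segment` is hypothetical; it only sizes and maps the file, leaving segment bookkeeping to the caller.

```c
#define _POSIX_C_SOURCE 200809L
#include <stddef.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: extend fd to segment_bytes with ftruncate() (on
   filesystems with sparse-file support this leaves a hole rather than
   writing zeros) and map the whole range shared, so disk blocks are only
   allocated as pages are first written. Returns NULL on failure. */
void *map_sparse_segment(int fd, size_t segment_bytes) {
    if (ftruncate(fd, (off_t)segment_bytes) != 0) return NULL;
    void *p = mmap(NULL, segment_bytes, PROT_READ | PROT_WRITE,
                   MAP_SHARED, fd, 0);
    return p == MAP_FAILED ? NULL : p;
}
```

On a sparse-capable filesystem, comparing `st_blocks` (512-byte units actually allocated) with `st_size` from fstat() after writing a few pages should show far less allocated space than the logical size.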
Kamal Marhubi wrote:
> This is a great point. I don't know how multiplatform all these things
> need to be, and to what degree it matters that the output of an mmap based
> MessageBuilder can be read by the MessageReaders in serialize.h. Kenton?
Oh, it'd be readable. I don't mean it'd error out or anything; I mean that
on HFS+ if you seek forward 4GB or call ftruncate() or whatever, you're
going to be waiting on 4GB of writes.
Kamal Marhubi wrote:
> This is another idea that could work and is pretty attractive. In fact,
> this could be a workaround for the TLB cache problem: map in the whole
> file, but mprotect(PROT_NONE) most of it. In a SIGSEGV handler, unprotect
> the pages and use madvise(MADV_WILLNEED), or alternatively only partially
> map the file and use mmap(MAP_POPULATE) to prefault. The effect is to
> amortize the cost of new pages being allocated using a sort of subsegment
> system implemented through the memory manager. (Let me know if this is too
> crazy and / or cannot work!)
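Kamal's quoted suggestion, mapping the whole file with no access and opening it up from a SIGSEGV handler, might look roughly like this in C. This is a sketch under several assumptions: Linux (other systems, macOS in particular, may deliver SIGBUS instead of SIGSEGV), a single guarded region (the handler can only reach globals), and a hypothetical `chunk` size that sets how many bytes each fault unprotects, which is the amortization Kamal describes. POSIX does not list mprotect() or madvise() as async-signal-safe, even though calling them from a fault handler is common practice.

```c
#define _GNU_SOURCE
#include <signal.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

/* Hypothetical global state for one guarded region. */
static char *guarded_base;
static size_t guarded_size;
static size_t guarded_chunk;              /* bytes unprotected per fault */
static volatile sig_atomic_t guarded_faults;

static void on_segv(int sig, siginfo_t *info, void *ucontext) {
    (void)ucontext;
    uintptr_t addr = (uintptr_t)info->si_addr;
    uintptr_t base = (uintptr_t)guarded_base;
    if (guarded_base == NULL || addr < base || addr - base >= guarded_size) {
        /* Not our region: restore the default action; the faulting
           instruction re-executes and the process crashes as usual. */
        signal(sig, SIG_DFL);
        return;
    }
    size_t start = (size_t)(addr - base);
    start -= start % guarded_chunk;       /* round down to a chunk */
    size_t len = guarded_chunk;
    if (start + len > guarded_size) len = guarded_size - start;
    if (mprotect(guarded_base + start, len, PROT_READ | PROT_WRITE) != 0) {
        signal(sig, SIG_DFL);
        return;
    }
    (void)madvise(guarded_base + start, len, MADV_WILLNEED);  /* hint only */
    guarded_faults++;
}

/* Maps size bytes of fd (already sized, e.g. via ftruncate()) with no
   access and installs a handler that opens chunk bytes per first touch.
   chunk must be a multiple of the page size. Returns NULL on failure. */
char *map_guarded(int fd, size_t size, size_t chunk) {
    void *p = mmap(NULL, size, PROT_NONE, MAP_SHARED, fd, 0);
    if (p == MAP_FAILED) return NULL;
    guarded_base = p;
    guarded_size = size;
    guarded_chunk = chunk;
    struct sigaction sa;
    memset(&sa, 0, sizeof sa);
    sa.sa_sigaction = on_segv;
    sa.sa_flags = SA_SIGINFO;
    sigemptyset(&sa.sa_mask);
    if (sigaction(SIGSEGV, &sa, NULL) != 0) {
        munmap(p, size);
        guarded_base = NULL;
        return NULL;
    }
    return p;
}
```

Touching every page of the first three chunks should trigger exactly three faults, one per chunk, since each fault opens a whole chunk at once.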
I think the following strategy has equivalent behavior, but avoids excessive allocations on filesystems without sparse file support:

1) mmap() a very large region using a dummy device (maybe /dev/null?), just to reserve address space
2) use ftruncate() to set a smallish file size
3) mmap() with MAP_FIXED to remap the first part of region (1) so it points to the file
4) mprotect() the rest of the region, as above
5) SIGSEGV handler uses ftruncate() to extend the file, and *also* uses mmap() with MAP_FIXED to extend the mapping
Paul
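Paul's five steps could be sketched roughly as follows in C. There are two deliberate deviations, both assumptions. First, the address space is reserved with an anonymous PROT_NONE mapping rather than /dev/null (Kamal reports later in the thread that mapping /dev/null doesn't work on Linux), which also takes care of step 4. Second, step 5 is a plain function rather than SIGSEGV handler code, since mmap() is not async-signal-safe. The `growable_map` type and function names are hypothetical, and all sizes are assumed page-aligned.

```c
#define _GNU_SOURCE
#include <stddef.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical handle for a file-backed region with reserved headroom. */
struct growable_map {
    int fd;
    char *base;        /* start of the reserved address range */
    size_t reserved;   /* total reserved bytes (step 1) */
    size_t mapped;     /* bytes currently backed by the file */
};

/* Steps 1-4: reserve address space with no access, size the file, and
   map the file over the start of the reservation with MAP_FIXED. */
int growable_open(struct growable_map *m, int fd, size_t reserved,
                  size_t initial) {
    void *base = mmap(NULL, reserved, PROT_NONE,
                      MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (base == MAP_FAILED) return -1;
    if (ftruncate(fd, (off_t)initial) != 0 ||
        mmap(base, initial, PROT_READ | PROT_WRITE, MAP_SHARED | MAP_FIXED,
             fd, 0) == MAP_FAILED) {
        munmap(base, reserved);
        return -1;
    }
    m->fd = fd;
    m->base = base;
    m->reserved = reserved;
    m->mapped = initial;
    return 0;
}

/* Step 5, done explicitly: extend the file, then map only the new tail
   in place, replacing that part of the PROT_NONE reservation. */
int growable_extend(struct growable_map *m, size_t new_size) {
    if (new_size <= m->mapped) return 0;
    if (new_size > m->reserved) return -1;
    if (ftruncate(m->fd, (off_t)new_size) != 0) return -1;
    void *tail = mmap(m->base + m->mapped, new_size - m->mapped,
                      PROT_READ | PROT_WRITE, MAP_SHARED | MAP_FIXED,
                      m->fd, (off_t)m->mapped);
    if (tail == MAP_FAILED) return -1;
    m->mapped = new_size;
    return 0;
}
```

Because the base address never moves, pointers into earlier segments stay valid as the mapping grows. A SIGSEGV handler could call `growable_extend()` when a fault lands just past `mapped`, as in step 5, if the async-signal-safety caveat is acceptable; otherwise the builder could call it directly whenever it needs more space.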
On Tue, Jun 23, 2015 at 1:16 AM Kenton Varda <ken...@sandstorm.io> wrote:
> Maybe, I'll have to see how you do it. :)

Fair enough! Is a Linux-only implementation OK for inclusion, assuming everything else is fine? It should be possible to make it use only POSIX, but I have to start somewhere.
Possible mitigation that preserves compatibility with the recommended byte stream framing: always have the segment table be a multiple of the page size, with a bunch of zero-length segments to pad it out.
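That mitigation can be sketched in C as a function that writes a standard segment table (a 32-bit little-endian "segment count minus one", then one 32-bit size in words per segment) padded out to exactly one page with zero-length segments. The function name is hypothetical, and the sketch assumes readers accept zero-length segments; stock readers may also cap how many segments they will accept, so that is worth checking before relying on a 1023-entry table. Real segments stay at the front because far pointers refer to segments by index, so growing the message just overwrites the next zero entry in place.

```c
#include <stddef.h>
#include <stdint.h>

/* Stores v as a 4-byte little-endian integer. */
static void put_le32(unsigned char *p, uint32_t v) {
    p[0] = (unsigned char)(v & 0xff);
    p[1] = (unsigned char)((v >> 8) & 0xff);
    p[2] = (unsigned char)((v >> 16) & 0xff);
    p[3] = (unsigned char)((v >> 24) & 0xff);
}

/* Hypothetical helper: fills exactly table_bytes bytes of out with a
   segment table whose first real_count entries are the real segment
   sizes (in words) and whose remaining entries are zero-length padding.
   table_bytes must be a multiple of 8, which makes the total segment
   count odd, so no extra 4-byte alignment padding is needed. Returns the
   total segment count, or 0 if the arguments don't fit. */
size_t write_padded_segment_table(unsigned char *out, size_t table_bytes,
                                  const uint32_t *sizes_in_words,
                                  size_t real_count) {
    if (table_bytes < 8 || table_bytes % 8 != 0) return 0;
    size_t total = table_bytes / 4 - 1;  /* 4-byte count + 4 per segment */
    if (real_count == 0 || real_count > total) return 0;
    put_le32(out, (uint32_t)(total - 1));  /* count field is count - 1 */
    for (size_t i = 0; i < total; i++)
        put_le32(out + 4 + 4 * i, i < real_count ? sizes_in_words[i] : 0);
    return total;
}
```

With a 4096-byte page the table always holds 1023 entries, and segment data can start on the next page boundary.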
An alternative that does not preserve compatibility: move the segment table to the end, delimited by some sort of magic sequence. This requires me to look more carefully at the encoding spec to pick some magic.
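One possible shape for that trailer, sketched in C: segment sizes stored after the data, followed by the segment count and an 8-byte magic value, so a reader finds everything from the last 12 bytes of the file. The layout, the placeholder magic bytes, and `parse_trailer` are all hypothetical; this sketch checks the magic at a fixed position at the end of the file rather than scanning for it.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Placeholder magic; a real one would need to be chosen with the encoding
   spec in mind. */
static const unsigned char TRAILER_MAGIC[8] = {'C', 'P', 'N', 'P',
                                               'T', 'R', 'L', 'R'};

static uint32_t get_le32(const unsigned char *p) {
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
           ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

/* Hypothetical layout, read backwards from the end of the buffer:
     [segment data][size_0 .. size_{n-1}: LE32 words][n: LE32][magic: 8]
   On success returns n and points *sizes_out at the first size entry;
   returns -1 if the buffer doesn't end with a well-formed trailer. */
long parse_trailer(const unsigned char *buf, size_t len,
                   const unsigned char **sizes_out) {
    if (len < 12 || memcmp(buf + len - 8, TRAILER_MAGIC, 8) != 0) return -1;
    uint32_t n = get_le32(buf + len - 12);
    size_t before_count = len - 12;  /* bytes preceding the count field */
    if ((uint64_t)n * 4 > before_count) return -1;
    const unsigned char *sizes = buf + before_count - (size_t)n * 4;
    uint64_t data_bytes = 0;
    for (uint32_t i = 0; i < n; i++)  /* words to bytes */
        data_bytes += (uint64_t)get_le32(sizes + 4 * (size_t)i) * 8;
    if (data_bytes > (uint64_t)(sizes - buf)) return -1;
    *sizes_out = sizes;
    return (long)n;
}
```

Appending a segment then means rewriting only the trailer at the new end of the file, never moving existing data.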
> Each segment will end up with some unused space at the end [...]

fallocate with FALLOC_FL_COLLAPSE_RANGE could be a way to limit the waste to FS block padding in each segment. I'll have to think about it a bit more. This would be Linux-only, and 3.15+ at that.
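A minimal C sketch of the arithmetic involved, assuming Linux 3.15+ and a filesystem that implements FALLOC_FL_COLLAPSE_RANGE (for example ext4 or XFS). The collapse offset and length must be multiples of the filesystem block size, so the used part of a segment is rounded up to a block boundary. `collapsible_tail` and `collapse_tail` are hypothetical helpers, and segment boundaries are assumed block-aligned.

```c
#define _GNU_SOURCE
#include <fcntl.h>
#include <linux/falloc.h>
#include <stdint.h>
#include <sys/types.h>

/* Given one segment occupying [seg_start, seg_start + seg_len) in the
   file, of which `used` bytes hold data, compute the block-aligned tail
   that could be cut out. Up to block - 1 bytes of waste remain. Returns
   the length to collapse (storing its offset in *offset), or 0. */
uint64_t collapsible_tail(uint64_t seg_start, uint64_t seg_len,
                          uint64_t used, uint64_t block, uint64_t *offset) {
    uint64_t keep = (used + block - 1) / block * block;  /* round up */
    if (keep >= seg_len) return 0;
    *offset = seg_start + keep;
    return seg_len - keep;
}

/* Removes [offset, offset + len) from the file and shifts later data
   down. The range must not reach end of file. Returns 0, or -1 with
   errno set (e.g. if the filesystem doesn't support the operation). */
int collapse_tail(int fd, uint64_t offset, uint64_t len) {
    return fallocate(fd, FALLOC_FL_COLLAPSE_RANGE, (off_t)offset, (off_t)len);
}
```

Collapsing shifts every later byte of the file down, so later segments' file offsets (and any mappings of them) change; this seems better suited to a finalization step once building is done. For the last segment, where the range would reach end of file, ftruncate() does the job instead.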
Writing this has taken—on and off—most of the day, so I'll end it here. One last question: is there anything like an RFC process for Cap'n Proto related stuff?
On Wed, Jun 24, 2015 at 2:01 AM Paul Pelzl <pel...@gmail.com> wrote:
> I think the following strategy has equivalent behavior, but avoids excessive allocations on filesystems without sparse file support: [...]

I like it! I came across MAP_FIXED last night, and wheels were whirring towards something like this. At least on Linux, it seems mapping /dev/null is a no-go. I'll see if I can borrow someone's Mac to take a look on there.