On 25.04.2023 at 17:56, Richard wrote:
> [Please do not mail me a copy of your followup]
>
> I'm wondering if there is a C++ library that might fit this use case.
>
> I have large amounts of text I need to scan through in a read-only
> manner. It needs to be carved up into individual chunks for various
> processing purposes, so it can't be easily handled as just a big char
> array.
>
> Handling individual chunks with a string_view is fine, but then there
> is the manual management of the dynamically allocated storage (or
> mmap-style file mapping) that holds the actual text. (Since
> string_view is a non-owning representation.) I could use std::string,
> but I think I'd get better memory behavior by allocating large,
> fixed-size buffers (probably matched to the virtual memory page size)
> on demand and carving them up for the various strings.
>
> I was cooking up my own "rope" class (essentially a vector<string_view>)
> and realized that it would be more ideal if each individual string_view
> could reference count the associated buffer so that buffers would
> naturally return themselves to a storage pool when no longer referenced
> by any string_view.
>
> In my use case I have lots and lots of read-only strings that I obtain
> from either files or a network socket. I have the occasional user
> entered string for which std::string is sufficient.
>
> I'm not aware of any such library, but I thought I'd ping out to see
> if anything rings a bell.

I have something related to this topic. Reading a text line by line
is quite slow with the current implementations of the C++ standard
library. So I wrote a function called linify() that takes a generic
callback and passes each line to it as a pair of begin and end
iterators. A line ends at "\n", "\r" or "\r\n"; the terminator is not
part of the range handed to the callback. Here is the code:

#pragma once
#include <concepts>     // std::convertible_to
#include <iterator>
#include <type_traits>

// General form: isEnd( it ) tells whether "it" is at the end of the text.
// forward_iterator rather than input_iterator, because lineBegin is kept
// and handed to the consumer after scn has already been advanced.
template<std::forward_iterator InputIt, typename IsEnd, typename Consumer>
    requires std::is_integral_v<std::iter_value_t<InputIt>>
        && requires( IsEnd isEnd, InputIt it ) { { isEnd( it ) } -> std::convertible_to<bool>; }
        && requires( Consumer consumer, InputIt inputIt ) { consumer( inputIt, inputIt ); }
void linify( InputIt begin, IsEnd isEnd, Consumer consumer )
{
    if( isEnd( begin ) ) [[unlikely]]
        return;
    for( InputIt scn = begin; ; )
        for( InputIt lineBegin = scn; ; )
            if( *scn == '\n' ) [[unlikely]]
            {
                // line terminated by "\n"
                consumer( lineBegin, scn );
                if( isEnd( ++scn ) )
                    return;
                break;
            }
            else if( *scn == '\r' ) [[unlikely]]
            {
                // line terminated by "\r" or "\r\n"
                consumer( lineBegin, scn );
                if( isEnd( ++scn ) || (*scn == '\n' && isEnd( ++scn )) )
                    return;
                break;
            }
            else if( isEnd( ++scn ) ) [[unlikely]]
            {
                // last line without a terminator
                consumer( lineBegin, scn );
                return;
            }
}

// Convenience overload for a [begin, end) iterator pair.
template<std::forward_iterator InputIt, typename Consumer>
    requires std::is_integral_v<std::iter_value_t<InputIt>>
        && requires( Consumer consumer, InputIt inputIt ) { consumer( inputIt, inputIt ); }
inline void linify( InputIt begin, InputIt end, Consumer consumer )
{
    linify( begin, [&]( InputIt it ) -> bool { return it == end; }, consumer );
}

// Convenience overload for zero-terminated text.
template<std::forward_iterator InputIt, typename Consumer>
    requires std::is_integral_v<std::iter_value_t<InputIt>>
        && requires( Consumer consumer, InputIt inputIt ) { consumer( inputIt, inputIt ); }
inline void linify( InputIt begin, Consumer consumer )
{
    linify( begin, []( InputIt it ) -> bool { return !*it; }, consumer );
}

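To show how this ties in with the string_view approach from your
question, here is a minimal usage sketch. So that it compiles on its
own, it repeats a condensed, pointer-only variant of the two-iterator
overload under the made-up name linify_sketch(); split_lines() is a
hypothetical helper too and not part of my code above. It collects
non-owning views into a buffer that the caller has to keep alive.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <string_view>
#include <vector>

// Condensed, pointer-only restatement of the two-iterator linify(),
// repeated here only so that the sketch is self-contained.
template<typename Consumer>
void linify_sketch( char const *scn, char const *end, Consumer consumer )
{
    while( scn != end )
    {
        char const *lineBegin = scn;
        // advance to the next line terminator or to the end of the text
        while( scn != end && *scn != '\n' && *scn != '\r' )
            ++scn;
        consumer( lineBegin, scn );
        if( scn == end )
            return;
        // skip "\n", "\r" or "\r\n"
        if( *scn++ == '\r' && scn != end && *scn == '\n' )
            ++scn;
    }
}

// Hypothetical helper: one non-owning view per line. The views are
// valid only while "text" is alive and unmodified.
inline std::vector<std::string_view> split_lines( std::string const &text )
{
    std::vector<std::string_view> lines;
    linify_sketch( text.data(), text.data() + text.size(),
        [&]( char const *b, char const *e )
        {
            lines.emplace_back( b, static_cast<std::size_t>( e - b ) );
        } );
    return lines;
}
```

With the text "alpha\nbeta\r\ngamma\rdelta" this should yield the four
views "alpha", "beta", "gamma" and "delta". Ownership of the buffer is
deliberately left out; a reference-counted buffer as you describe
would have to be layered on top.
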
Every lambda expression has its own unique closure type, so a consumer
lambda that is written in only one place in the application leads to
exactly one instantiation of linify(). The compiler can therefore
usually inline both linify() and the lambda at the point where
linify() is called.
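
The point about unique lambda types can be checked with a small
sketch. apply_twice() is a made-up stand-in for linify(), i.e. any
template that takes its callback by type. Whether the compiler
actually inlines such a call depends on the optimizer; the language
only guarantees that the types differ.

```cpp
#include <type_traits>

// Hypothetical stand-in for linify(): a template parameterized on the
// type of its callback.
template<typename Consumer>
int apply_twice( int x, Consumer consumer )
{
    return consumer( consumer( x ) );
}

inline bool closure_types_differ()
{
    auto addOne      = []( int v ) { return v + 1; };
    auto addOneAgain = []( int v ) { return v + 1; };
    // Two lambda expressions never share a type, even with identical
    // bodies, so apply_twice<decltype( addOne )> and
    // apply_twice<decltype( addOneAgain )> are separate instantiations,
    // each with exactly one possible callee.
    return !std::is_same_v<decltype( addOne ), decltype( addOneAgain )>;
}
```

Since each instantiation knows its callee statically, there is no
indirect call through a function pointer that would stand in the way
of inlining.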