
My two string classes could be "templatized" into one, but for one problem...


DSF

Dec 9, 2013, 12:52:07 AM
Hello group!

Under Windows, I have two string classes: FStringA and FStringW, as
per Windows' naming convention. I have done a lot of work recently on
the Wide version and decided to update the ANSI version. As I sat
there with two windows synchronized, looking for updates from W to
apply to A, it occurred to me that these lengthy classes are identical
save for two differences, and would be ideal as a template. That way
only one code set to update!

The first difference is simple: FStringA uses char whilst FStringW
uses wchar_t. If that were the only difference, it would be a piece
of cake. But it's never that simple, is it?

The second difference is the plethora of C string calls these classes
use. I've rewritten the parts of the string RTL that I use, both for
a speed increase and to handle wide strings. I preface a 'd' to the
front since 'str' is reserved for the RTL. This leaves me with many
'dstringsomethingA' and 'dstringsomethingW' calls.

But how do I get the compiler to determine which call to make when
building the template into code? I could put preprocessor if/else
around all the d...A/Ws. That is actually built-in to them (based on
the definition of UNICODE), but the implementation seems sloppy. I
would have to preface each creation with preprocessor code such as:

// This is off the top of my head and I'm not even sure it would work,
// but it's clumsy anyway.
// FStr being the template class
typedef FStr<char> FStringA;
typedef FStr<wchar_t> FStringW;

#ifdef UNICODE
#define UCFLAG
#undef UNICODE
#endif
FStringA title;
#ifdef UCFLAG
#undef UCFLAG
#define UNICODE
#endif

#ifndef UNICODE
#define UCFLAG
#define UNICODE
#endif
FStringW name;
#ifdef UCFLAG
#undef UCFLAG
#undef UNICODE
#endif

Ugh!

So it's the old question of determining the type at compile time,
perhaps simplified because there are only two choices. Also, it would
be nice to throw a compilation error if the type is not one of the
two.

Any ideas?

TIA!
"'Later' is the beginning of what's not to be."
D.S. Fiscus

Alf P. Steinbach

Dec 9, 2013, 1:28:39 AM
On 09.12.2013 06:52, DSF wrote:
> Hello group!
>
> Under Windows, I have two string classes: FStringA and FStringW, as
> per Windows' naming convention. I have done a lot of work recently on
> the Wide version and decided to update the ANSI version. As I sat
> there with two windows synchronized, looking for updates from W to
> apply to A, it occurred to me that these lengthy classes are identical
> save for two differences, and would be ideal as a template. That way
> only one code set to update!
>
> The first difference is simple: FStringA uses char whilst FStringW
> uses wchar_t. If that were the only difference, it would be a piece
> of cake. But it's never that simple, is it?
>
> The second difference is the plethora of C string calls these classes
> use. I've rewritten the parts of the string RTL that I use, both for
> a speed increase and to handle wide strings. I preface a 'd' to the
> front since 'str' is reserved for the RTL. This leaves me with many
> 'dstringsomethingA' and 'dstringsomethingW' calls.
>
> But how do I get the compiler to determine which call to make when
> building the template into code? I could put preprocessor if/else
> around all the d...A/Ws. That is actually built-in to them (based on
> the definition of UNICODE), but the implementation seems sloppy. I
> would have to preface each creation with preprocessor code such as:
>

The implementation strategy used by std::basic_string is to make the C
string function calls indirectly via an instance of std::char_traits.

You can do the same. :-)

Documentation at <url: http://en.cppreference.com/w/cpp/string/char_traits>
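
A minimal sketch of that strategy, assuming a hypothetical class named
FStrSketch (it is not the original FString, and std::vector is used only
to keep memory management out of the way). Every character-level
operation goes through the Traits parameter, so one body serves both
char and wchar_t:

```cpp
#include <cassert>
#include <cstddef>
#include <string>   // std::char_traits
#include <vector>

// Hypothetical stand-in for the FString template discussed in the thread.
template <typename Ch, typename Traits = std::char_traits<Ch> >
class FStrSketch {
    std::vector<Ch> buf_;  // characters followed by a terminating zero
public:
    // Traits::length plays the role of strlen / wcslen.
    explicit FStrSketch(const Ch* s) : buf_(s, s + Traits::length(s) + 1) {}

    std::size_t length() const { return buf_.size() - 1; }
    const Ch* c_str() const { return &buf_[0]; }

    // Three-way comparison; Traits::compare plays the role of
    // strncmp / wcsncmp over the common prefix.
    int compare(const FStrSketch& other) const {
        const std::size_t n =
            length() < other.length() ? length() : other.length();
        const int r = Traits::compare(c_str(), other.c_str(), n);
        if (r != 0) return r;
        if (length() == other.length()) return 0;
        return length() < other.length() ? -1 : 1;
    }
};

typedef FStrSketch<char>    FStrSketchA;
typedef FStrSketch<wchar_t> FStrSketchW;

inline void usage_example() {
    FStrSketchA a("abc");
    FStrSketchW w(L"abcd");
    assert(a.length() == 3);
    assert(w.length() == 4);
    assert(a.compare(FStrSketchA("abd")) < 0);
}
```

A custom traits class with the same static member names could forward
to hand-written routines instead of the library ones.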


Cheers & hth.

- Alf

Robert Wessel

Dec 9, 2013, 1:52:10 AM
On Mon, 09 Dec 2013 00:52:07 -0500, DSF <nota...@address.here>
wrote:
Normal polymorphism??? Code both functions as dstringdosomething()
and give one char parameters, and the other wchar_t parameters.

Paavo Helde

Dec 9, 2013, 2:14:04 AM
DSF <nota...@address.here> wrote in
news:t2kaa99judr3b7o96...@4ax.com:

> Hello group!
>
> Under Windows, I have two string classes: FStringA and FStringW, as
> per Windows' naming convention. I have done a lot of work recently on
> the Wide version and decided to update the ANSI version.
[...]
> #ifdef UNICODE
> #define UCFLAG
> #undef UNICODE
> #endif
> FStringA title;
> #ifdef UCFLAG
> #undef UCFLAG
> #define UNICODE
> #endif

Sorry, it's probably my fault, but I did not understand your problem or
example. Why is it necessary to undef UNICODE when declaring an instance
of your class? Why cannot you just use some more typedefs?

Actually I have another unrelated question: what use has the ANSI (in the
sense Windows is abusing this term) version nowadays anyway? I could
understand an ASCII version of strings (for speed) or UTF-8 (for
portability), but ANSI? What possible benefit does it have to support a
random and unknown small subset of non-ASCII characters on the user
computer? These ANSI interfaces in Windows are just an almost 20-year
legacy, all the Windows internals work in Unicode anyway, so why don't
you just have a couple of conversion functions for your class from-to
ANSI to interact with any old legacy client code expecting ANSI strings?

Regards
Paavo

Tobias Müller

Dec 9, 2013, 2:54:32 AM
Basically there are two possibilities,

Overloading (this is C++, not C. There is no need for W/A suffixes!):
inline size_t StringLength(const wchar_t* string) { return wcslen(string); }
inline size_t StringLength(const char* string) { return strlen(string); }

Template specialization (Trait):
template <typename T>
class StringTrait
{
public:
    // Declared in terms of T so that each specialization below matches.
    static size_t Length(const T* string);
    //...
};

template <>
inline size_t StringTrait<wchar_t>::Length(const wchar_t* string) { return wcslen(string); }
template <>
inline size_t StringTrait<char>::Length(const char* string) { return strlen(string); }

The latter is a bit more complex but also more flexible.
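
One variation on the trait idea also covers the original request for a
compilation error on unsupported types: declare the primary template but
never define it. The sketch below assumes whole-class specializations
and a hypothetical FStr template that only stores a pointer; none of it
is the original poster's code.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <cwchar>

// Declared but deliberately not defined: StringTrait<T> is an incomplete
// type for every T except the two specializations below.
template <typename T>
struct StringTrait;

template <>
struct StringTrait<char> {
    static std::size_t Length(const char* s) { return std::strlen(s); }
};

template <>
struct StringTrait<wchar_t> {
    static std::size_t Length(const wchar_t* s) { return std::wcslen(s); }
};

// Hypothetical string template that selects the trait from its parameter.
template <typename T>
class FStr {
    const T* data_;  // not owned; enough for the illustration
public:
    explicit FStr(const T* s) : data_(s) {}
    std::size_t Length() const { return StringTrait<T>::Length(data_); }
};

inline void usage_example() {
    assert(FStr<char>("hello").Length() == 5);
    assert(FStr<wchar_t>(L"hi").Length() == 2);
    // int zero = 0;
    // FStr<int>(&zero).Length();  // rejected: StringTrait<int> is incomplete
}
```

With a C++11 compiler, a static_assert on std::is_same inside the class
would give a friendlier message than the incomplete-type error.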

Tobi

DSF

Dec 9, 2013, 9:49:55 PM
On Mon, 09 Dec 2013 01:14:04 -0600, Paavo Helde
<myfir...@osa.pri.ee> wrote:

>DSF <nota...@address.here> wrote in
>news:t2kaa99judr3b7o96...@4ax.com:
>
>> Hello group!
>>
>> Under Windows, I have two string classes: FStringA and FStringW, as
>> per Windows' naming convention. I have done a lot of work recently on
>> the Wide version and decided to update the ANSI version.
>[...]
>> #ifdef UNICODE
>> #define UCFLAG
>> #undef UNICODE
>> #endif
>> FStringA title;
>> #ifdef UCFLAG
>> #undef UCFLAG
>> #define UNICODE
>> #endif
>
>Sorry, it's probably my fault, but I did not understand your problem or
>example. Why is it necessary to undef UNICODE when declaring an instance
>of your class? Why cannot you just use some more typedefs?

All of my string functions are set up a-la M$:

size_t dstrlenA(const char * str);
size_t dstrlenW(const wchar_t *str);
#ifdef UNICODE
#define dstrlen dstrlenW
#else
#define dstrlen dstrlenA
#endif

So saving the UNICODE status then turning it off would cause all of
the string functions to use the "A" version, regardless of the state
of the UNICODE status, which is then restored to its original state.
This depends on how preprocessor and template construction interact.
I believe the preprocessor is first, producing proper calling of the
string functions.


>Actually I have another unrelated question: what use has the ANSI (in the
>sense Windows is abusing this term) version nowadays anyway? I could
>understand an ASCII version of strings (for speed) or UTF-8 (for
>portability), but ANSI? What possible benefit does it have to support a
>random and unknown small subset of non-ASCII characters on the user
>computer? These ANSI interfaces in Windows are just an almost 20-year
>legacy, all the Windows internals work in Unicode anyway, so why don't
>you just have a couple of conversion functions for your class from-to
>ANSI to interact with any old legacy client code expecting ANSI strings?

You answered your question within itself: "(in the sense Windows is
abusing this term)". Their documentation defines A as ANSI and W as
wide or UTF. As far as I'm concerned, A means 8-bit characters and W
means 16-bit Unicode characters, as that's what M$ has decided are
Windows' two character types. They could have started saying "A is
for ASCII," but from their point of view, why bother to change every
reference in the tons of documentation?

To answer your last question, when you consider ANSI is a left-over
term that really means any 8-bit text, its use becomes clearer. Why
have every string take up twice the space as may be needed?

>Regards
>Paavo

Robert Wessel

Dec 9, 2013, 10:51:33 PM
On Mon, 09 Dec 2013 21:49:55 -0500, DSF <nota...@address.here>
wrote:
That's a technique with some justification if you're exporting those
functions to a C program. But you're writing a C++ program, just use
overloading. Unless you're actually exporting dstrlen(x) to other
users.

And why does this not work anyway? Assuming your code is actually
using dstrlen (and not dstrlenA or ...W), it should compile as
appropriate based on whether or not UNICODE is defined.

And yes, preprocessing happens first.
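
A small sketch of what "first" means for a template, using stand-in
bodies for dstrlenA/dstrlenW and a hypothetical Holder template. The
macro is replaced where the template body is written, so redefining it
later, for example around a variable declaration, does not change which
function the template calls:

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <cwchar>

// Stand-ins for the library functions named in the thread.
inline std::size_t dstrlenA(const char* s) { return std::strlen(s); }
inline std::size_t dstrlenW(const wchar_t* s) { return std::wcslen(s); }

#define dstrlen dstrlenA  // state of the macro where the template is written

template <typename Ch>
struct Holder {  // hypothetical minimal template
    const Ch* p;
    // By the time the compiler proper sees this line it reads dstrlenA(p).
    std::size_t size() const { return dstrlen(p); }
};

#undef dstrlen
#define dstrlen dstrlenW  // only affects text that follows this line

inline std::size_t narrow_size() {
    Holder<char> h = { "abc" };
    return h.size();  // still dstrlenA, despite the redefinition above
}

inline std::size_t wide_via_macro() {
    return dstrlen(L"ab");  // expands to dstrlenW at this point
}

// Holder<wchar_t> w = { L"ab" };
// w.size();  // would not compile: dstrlenA does not accept const wchar_t*

inline void usage_example() {
    assert(narrow_size() == 3);
    assert(wide_via_macro() == 2);
}
```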

Paavo Helde

Dec 10, 2013, 2:21:50 AM
DSF <nota...@address.here> wrote in
news:ielca9pnklp757v96...@4ax.com:

> All of my string functions are set up a-la M$:
>
> size_t dstrlenA(const char * str);
> size_t dstrlenW(const wchar_t *str);
> #ifdef UNICODE
> #define dstrlen dstrlenW
> #else
> #define dstrlen dstrlenA
> #endif

That's where all your problems begin. There is no need to use such C
hacks in C++ code. Basically you want to have the symbol dstrlen mean
different things in different places of the program. As others have told
you, in C++ you don't need preprocessor for that as C++ function
overloading and templates already cover the same in a much safer manner.
Mixing the preprocessor into the soup just creates a lot of confusion and
unreadable/unmaintainable code.

For example:

size_t dstrlenA(const char * str);
size_t dstrlenW(const wchar_t *str);

inline size_t dstrlen(const char* str) {return dstrlenA(str);}
inline size_t dstrlen(const wchar_t* str) {return dstrlenW(str);}

template<typename T>
class MyString {
    T* data_;
public:
    size_t Length() const {return dstrlen(data_);}
    // ...
};

Voila, not a sign of preprocessor anywhere! Note that one-liner inline
functions will be optimized away by any decent C++ compiler, so there is
no performance argument to be made for the preprocessor (a proper C++
string class would probably cache the length in-class and reduce calls to
strlen() to a minimum, but that's another topic).
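
The length-caching remark can be sketched roughly as follows. The class
name CachedString and the Append member are assumptions made for the
illustration, and the dstrlen overloads simply forward to the standard
library here:

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <cwchar>
#include <vector>

// Stand-in overloads; the real ones would forward to dstrlenA/dstrlenW.
inline std::size_t dstrlen(const char* s) { return std::strlen(s); }
inline std::size_t dstrlen(const wchar_t* s) { return std::wcslen(s); }

// Hypothetical string class that measures its input once and remembers it.
template <typename T>
class CachedString {
    std::vector<T> data_;  // characters plus the terminating zero
    std::size_t len_;      // cached length, excluding the terminator
public:
    explicit CachedString(const T* s) : data_(), len_(dstrlen(s)) {
        data_.assign(s, s + len_ + 1);
    }

    std::size_t Length() const { return len_; }  // no scan needed
    const T* c_str() const { return &data_[0]; }

    // Appends s; only the new piece is measured.
    // s must not point into this object's own buffer.
    void Append(const T* s) {
        const std::size_t extra = dstrlen(s);
        data_.pop_back();                             // drop old terminator
        data_.insert(data_.end(), s, s + extra + 1);  // copy with new one
        len_ += extra;
    }
};

inline void usage_example() {
    CachedString<char> s("abc");
    s.Append("de");
    assert(s.Length() == 5);
    assert(std::strcmp(s.c_str(), "abcde") == 0);
}
```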

[...]
> To answer your last question, when you consider ANSI is a left-over
> term that really means any 8-bit text, its use becomes clearer. Why
> have every string take up twice the space as may be needed?

OK, so as I understand your string class is just a container for holding
either char or wchar_t type of elements. In that case, what makes it
different from std::basic_string or from MS ATL::CStringT template
classes? Or from std::vector, for that matter? What's the reason of
reimplementing the wheel? If it is for learning, then you could start
with studying the implementation of std::basic_string for example.

Cheers
Paavo

DSF

Dec 15, 2013, 3:54:32 PM
On Mon, 09 Dec 2013 00:52:07 -0500, DSF <nota...@address.here>
wrote:
Hello group! A group answer for the group of responders.

Sorry I'm late responding. It's due to a few sick days and the fact
that converting the FStringX classes into a template wasn't as simple
as I thought and I had to go do some more research on templates. The
FStringX classes contained code I'd never tried to put in a template.
Also, I wanted to be sure it actually worked before responding.

First, I'd like to thank Paavo Helde, Robert Wessel and Tobias
Müller for what turned out to be the best answer. Overload the string
functions.

To my credit, I would like to say I created the code below before
reading Paavo's post on the 10th. I dropped the d and replaced it
with a capital F to visually fit in better with the FString class.
Also, and this will cover Robert Wessel's comment about exporting the
dstr* series to a C program, the dstr* series and other string
manipulators are in a C library I've written. Useable for both
languages. About 80% of the library is written in x86 assembly, so
some kind of wrapper was necessary. (I suppose one could figure out
the name mangling and export C++ names in assembly code, but wrappers
are easier, cost nothing, and are more likely to be portable between
compilers on the same platform.)

This means I couldn't use Paavo's code from the tenth, because it
would break the UNICODE-defined A and W in the string library header.
Come to think of it, I could just #undef dstrlen and any others after
the header is included, then Paavo's method would work. But I think
I'll leave it as it is. It separates the C nomenclature from the C++.

Under the d'oh! category: I don't know if I just forgot it a long
time ago or whatever, but it never occurred to me that non-member
functions could be overloaded, too. It hasn't come up in my
programming. But now...

inline size_t Fstrlen(const char *str){return ::dstrlenA(str);}
inline size_t Fstrlen(const wchar_t *str){return ::dstrlenW(str);}
etc.

... I get the correct version for free! (No overhead!) The compiler
picks the correct Fstrlen by parameter type and, with inlining,
Fstrlen(some_string); assembles to a direct call to the proper
dstrlen, A or W.

Problem solved. But to be thorough, answers to other questions.

As to Paavo Helde's question regarding:

// FStr being the template class
typedef FStr<char> FStringA;
typedef FStr<wchar_t> FStringW;

#ifdef UNICODE
#define UCFLAG
#undef UNICODE
#endif
FStringA title;
#ifdef UCFLAG
#undef UCFLAG
#define UNICODE
#endif

Paavo: Sorry, it's probably my fault, but I did not understand your
problem or example. Why is it necessary to undef UNICODE when
declaring an instance of your class? Why cannot you just use some more
typedefs?

And Robert Wessel's similar:

And why does this not work anyway? Assuming your code is actually
using dstrlen (and not dstrlenA or ...W), it should compile as
appropriate based on whether or not UNICODE is defined.

I am using both Fstring types in the same program, therefore, I
cannot just let the UNICODE definition handle it. The above
preprocessor code was a way to force "A" or "W" while preserving
UNICODE for the rest of the program.

As to Alf P. Steinbach's suggestion regarding std::char_traits, and
Paavo's to study std::basic_string. I've browsed some STL code and I
understand why it's called the ST*!L!*. It's almost a Language unto
itself! Template built upon template built upon....ARRRGH! From what
I've seen myself and read in Web articles about learning templates
from studying the STL: Learning template design by studying the STL
is like learning basic physics by studying NASA shuttle design
documents! :o)

One more:

Paavo said:

OK, so as I understand your string class is just a container for
holding either char or wchar_t type of elements. In that case, what
makes it different from std::basic_string or from MS ATL::CStringT
template classes? Or from std::vector, for that matter? What's the
reason of reimplementing the wheel? If it is for learning, then you
could start with studying the implementation of std::basic_string for
example.

Yes, I know you said "reimplementing," but I've been waiting for
someone to use the phrase for quite a while, so I could write the
paragraph below. This is probably as close as I'm going to get.

To agree with you, reinventing the wheel is a waste of time if you
are building a car, but useful if you are learning to build a wheel.
The term "reinventing the wheel" isn't really valid as an alternate
phrase for wasting one's time. The first wheel was probably a slice
of tree trunk with a hole bored through the center and a branch for an
axle. I imagine those wore out fast. Then somebody probably used
something like animal fat to reduce the friction...etc. The modern
wheel, say on a car, is quite a bit more durable and efficient, thanks
to centuries of reinvention.

Anyway, I did finish the transformation, and after a few basic
tests, I pulled FStringA and W.cpp from a project which parses
hundreds of HTML files and produces a large directory tree of files.
(It's almost ALL string manipulation.) FString.h contains the
template and is already used wherever the FString class would be. I
compiled and ran it, comparing the results to the output of the
FStringA/W version and tweaked the template until they matched.

One question. To save a lot of rewriting, I've typedef'd the
strings as so:

typedef FString<char> FStringA;
typedef FString<wchar_t> FStringW;

I don't see any problem there, but it's followed-up with this to
handle the generic FString UNICODE switching:

#ifdef UNICODE
#define FString FStringW
#else
#define FString FStringA
#endif

I'm concerned about using the same name (FString) in the macro as it
is the template class name. It works, but am I missing any pitfalls?

Thanks!

DSF

Dec 15, 2013, 4:03:11 PM
On Mon, 09 Dec 2013 21:49:55 -0500, DSF <nota...@address.here>
wrote:

>On Mon, 09 Dec 2013 01:14:04 -0600, Paavo Helde
><myfir...@osa.pri.ee> wrote:
>
>>DSF <nota...@address.here> wrote in
>>news:t2kaa99judr3b7o96...@4ax.com:
>>

>>Actually I have another unrelated question: what use has the ANSI (in the
>>sense Windows is abusing this term) version nowadays anyway? I could
>>understand an ASCII version of strings (for speed) or UTF-8 (for
>>portability), but ANSI? What possible benefit does it have to support a
>>random and unknown small subset of non-ASCII characters on the user
>>computer? These ANSI interfaces in Windows are just an almost 20-year
>>legacy, all the Windows internals work in Unicode anyway, so why don't
>>you just have a couple of conversion functions for your class from-to
>>ANSI to interact with any old legacy client code expecting ANSI strings?

My answer (below) to this was WRONG, WRONG, WRONG! The "A" versions
use ANSI code pages. One exception is that they have included the use
of CP_UTF7 and CP_UTF8 as pseudo code pages, allowing 8-bit Unicode
use. There's also OEM charsets, but things are complicated enough
already!

> You answered your question within itself: "(in the sense Windows is
>abusing this term)". Their documentation defines A as ANSI and W as
>wide or UTF. As far as I'm concerned, A means 8-bit characters and W
>means 16-bit Unicode characters, as that's what M$ has decided are
>Windows' two character types. They could have started saying "A is
>for ASCII," but from their point of view, why bother to change every
>reference in the tons of documentation?
>

Alf P. Steinbach

Dec 15, 2013, 4:41:39 PM
On 15.12.2013 21:54, DSF wrote:
>
> inline size_t Fstrlen(const char *str){return ::dstrlenA(str);}
> inline size_t Fstrlen(const wchar_t *str){return ::dstrlenW(str);}
> etc.
>
> ... I get the correct version for free! (No overhead!) The compiler
> picks the correct Fstrlen by parameter type and, with inlining,
> Fstrlen(some_string); assembles to a direct call to the proper
> dstrlen, A or W.
>
> Problem solved.

Consider the definition

#include <string.h>   // strlen
#include <wchar.h>    // wcslen

inline size_t FStrlen( char const* str ) { return strlen( str ); }
inline size_t FStrlen( wchar_t const* str ) { return wcslen( str ); }

Here's roughly the same expressed using std::char_traits:

#include <string>

template< class Char >
size_t FStrlen( Char const* str ) { return std::char_traits<Char>::length( str ); }

And if FStrlen only serves as an implementation helper for a templated
string class, then even this wrapping isn't necessary -- because with
known Char type one can just call char_traits<Char>::length directly.


Cheers,

- Alf

Paavo Helde

Dec 15, 2013, 4:55:35 PM
DSF <nota...@address.here> wrote in
news:83vra99i3qbc6lqjv...@4ax.com:
> One question. To save a lot of rewriting, I've typedef'd the
> strings as so:
>
> typedef FString<char> FStringA;
> typedef FString<wchar_t> FStringW;
>
> I don't see any problem there, but it's followed-up with this to
> handle the generic FString UNICODE switching:
>
> #ifdef UNICODE
> #define FString FStringW
> #else
> #define FString FStringA
> #endif
>
> I'm concerned about using the same name (FString) in the macro as it
> is the template class name. It works, but am I missing any pitfalls?

Preprocessing is done before any other stuff, so this indeed works as
long as the macro definition does not precede the template (or its
members') definition. So it is just confusing. Why not rename the
original template FString to basic_FString or whatever? There
is no reason for them to have the same name.
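
A sketch of that rename, assuming a placeholder class body. Once the
template has its own name, the UNICODE switch can be a typedef rather
than a macro, so FString becomes an ordinary type name that obeys scope
rules:

```cpp
#include <cassert>
#include <cstddef>
#include <string>   // std::char_traits

// The template gets its own name; the body is only a placeholder.
template <typename Ch>
class basic_FString {
    const Ch* data_;  // not owned; enough for the illustration
public:
    explicit basic_FString(const Ch* s) : data_(s) {}
    std::size_t Length() const {
        return std::char_traits<Ch>::length(data_);
    }
};

typedef basic_FString<char>    FStringA;
typedef basic_FString<wchar_t> FStringW;

// The generic name is selected once, without the preprocessor rewriting
// every later occurrence of the identifier FString.
#ifdef UNICODE
typedef FStringW FString;
#else
typedef FStringA FString;
#endif

inline void usage_example() {
    assert(FStringA("abc").Length() == 3);
    assert(FStringW(L"abcd").Length() == 4);
}
```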

p


DSF

Dec 17, 2013, 7:11:49 PM
If Fstrlen is removed, then we're left with just STL, right?

I'm still getting used to templates and I find a lot of the STL
confusing to use. Plus there's what's exhibited below...

Should the code snippet above work "as is" with a few details
added, as:

#include <string>

template< class char > size_t FStrlen( Char const* str ) { return
std::char_traits<Char>::length( str ); }

int main()
{
char *test = "ABCDEFG";
static int sz = FStrlen(test);
return ERROR_SUCCESS;
}
Should the above compile/work?

In my case, it blows up with 1 warning and 5 errors in string.h. I
was going to tell you that this wouldn't work for me until I have the
time to set up and learn a newer compiler. I was going to do that by
replacing char with wchar_t, at which point it would no longer even
compile. (I tried this with string and achieved those results.)

The STL was written by a different company than the compiler and
makes calls to functions that are not declared and do not exist. Even
though the above was strictly 8-bit char usage, here's the error
report:
Info :Making...
Info :Compiling E:\test\sttest.cpp
Warn : STRING.h(616,3):Possibly incorrect assignment
Error: STRING.h(625,3):Call to undefined function 'iswspace'
Error: STRING.h(629,3):Call to undefined function 'wcsncmp'
Error: STRING.h(634,3):Call to undefined function 'wcsncpy'
Error: STRING.h(638,3):Call to undefined function 'wcsncpy'
Error: sttest.cpp(10,13):{ expected

Note that all the undefined functions have a 'w' in them. The STL
supports wide chars, but the RTL does not. That's why I made all the
'dstrlen's and 'dstrcpy's, so I could have wide versions available.
(That and they are faster than the RTL versions.)

As a side topic, why remove the extension .h in C++? It would make
associations impossible. I say "would make" because there is only one
string header file with my system, and it's string.h. All RTL and STL
definitions are in the same file. Is that normal? Is the dropped
extension added by the compiler, or are there typically separate
string.h and string files?

Thanks again!

DSF

Dec 17, 2013, 7:14:14 PM
True. The code isn't engraved in stone. A few minutes with search
and replace can take care of the whole template.

Alf P. Steinbach

Dec 17, 2013, 7:52:16 PM
On 18.12.2013 01:11, DSF wrote:
>
> If Fstrlen is removed, then we're left with just STL, right?

Possibly. :)


> I'm still getting used to templates and I find a lot of the STL
> confusing to use. Plus there's what's exhibited below...
>
> Should the code snippet above work "as is" with a few details
> added, as:
>
> #include <string>
>
> template< class char > size_t FStrlen( Char const* str ) { return
> std::char_traits<Char>::length( str ); }

"class char" won't work since "char" is a keyword. Better make that
"class Char".

>
> int main()
> {
> char *test = "ABCDEFG";

Here you need a "const" for C++11 conformance. It's anyway a good idea
regardless of compiler, to avoid warnings and to avoid bugs. I.e.,

char const* test = "ABCDEFG";


> static int sz = FStrlen(test);
> return ERROR_SUCCESS;

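ERROR_SUCCESS is a Windows API macro (from <winerror.h>, normally
pulled in via <windows.h>), so it will not compile without that header.
The portable spelling is EXIT_SUCCESS from <stdlib.h> (or <cstdlib>),
or simply 0.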


> }
> Should the above compile/work?

See the above comments. After applying the suggested fixes you can just
try it with your compiler. This code doesn't use any fancy new features
so (with corrections) should compile with just about any C++ compiler.

Disclaimer: I didn't put it through a compiler.
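
For reference, a sketch with the corrections applied. It is written as
a helper function rather than main() so that it can be called from a
test; the name literal_length is an assumption, not something from the
thread.

```cpp
#include <cassert>
#include <cstddef>
#include <string>   // std::char_traits

// "class Char" instead of "class char": the parameter needs a name that
// is not a keyword.
template <class Char>
std::size_t FStrlen(Char const* str) {
    return std::char_traits<Char>::length(str);
}

// Hypothetical helper standing in for the body of main().
inline int literal_length() {
    char const* test = "ABCDEFG";  // const added, as C++11 requires
    static int sz = static_cast<int>(FStrlen(test));
    return sz;
}

inline void usage_example() {
    assert(literal_length() == 7);
    assert(FStrlen(L"AB") == 2);  // Char is deduced as wchar_t
}
```

In a real program, main() would hold the two statements from
literal_length and return 0 or EXIT_SUCCESS.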

[snip]
>
> As a side topic, why remove the extension .h in C++? It would make
> associations impossible.

Right. I used to joke that that /was/ the reason. :-)

I think the idea was to do something very distinct from C.

IMHO needless complication.


> I say "would make" because there is only one
> string header file with my system, and it's string.h. All RTL and STL
> definitions are in the same file. Is that normal? Is the dropped
> extension added by the compiler, or are there typically separate
> string.h and string files?

For an implementation with headers-as-files (and that includes all C++
implementations that I have used) there are three files:

<string.h> is the original C library header.

<cstring> is a C++ variant which is guaranteed to place the non-macro
symbols in namespace `std` (may also use global ns).

<string> is an unrelated C++ header that defines a class
template called std::basic_string, and two
instantiations called std::string and std::wstring.
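
A short sketch showing the three headers side by side. The function
names are made up for the illustration; each one obtains a length
through a different header's facilities:

```cpp
#include <string.h>  // C header: strlen is available in the global namespace
#include <cstring>   // C++ variant: std::strlen
#include <string>    // unrelated: std::basic_string, std::string, std::wstring
#include <cassert>
#include <cstddef>

inline std::size_t via_c_header(const char* s)   { return ::strlen(s); }
inline std::size_t via_cxx_header(const char* s) { return std::strlen(s); }

// std::string and std::wstring are std::basic_string<char> and
// std::basic_string<wchar_t>, both provided by <string>.
inline std::size_t via_string_class(const char* s) {
    return std::string(s).size();
}
inline std::size_t via_wstring_class(const wchar_t* s) {
    return std::wstring(s).size();
}

inline void usage_example() {
    assert(via_c_header("abc") == 3);
    assert(via_cxx_header("abc") == 3);
    assert(via_string_class("abc") == 3);
    assert(via_wstring_class(L"abcd") == 4);
}
```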


However, a standard library "header" need not be a file, and is nowhere
described as a file.

I have not encountered the scheme that you describe with everything in
one big file, but it is therefore permitted by the standard.

A standard library header can be implemented by way of a database query,
or even hardcoded in the compiler -- whatever.


Cheers & hth.,

- Alf
