set.push_back("a");
set.push_back("b");
...
...
I'm doing this manually. This is error prone and time consuming. Does
something in the STL have these already? I'm rather new to C++ so
forgive me if this is blatantly obvious. Google did not help much. The
goal is to get all typable ascii chars into the vector.
Thanks
'set' is not the best name for a vector. There is 'std::set', you know...
> ...
> ...
>
> I'm doing this manually. This is error prone and time consuming. Does
> something in the STL have these already? I'm rather new to C++ so
> forgive me if this is blatantly obvious. Google did not help much. The
> goal is to get all typable ascii chars into the vector.
I am not sure what "a typable ascii" is, to be honest; perhaps you need
to filter those with 'isprint'. But look into 'std::generate' or 'std::fill'.
There is also a constructor of 'std::string' that takes a count and a
value; use it in a loop. Something like
for (char c = ' '; c <= '~'; ++c)
    set.push_back(std::string(1, c));
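If you'd rather avoid writing the loop yourself, 'std::generate' can do
the filling. A minimal sketch, assuming a C++11 compiler (the lambda is
only for brevity):

#include <algorithm>
#include <string>
#include <vector>

int main() {
    // 95 printable characters, ' ' (32) through '~' (126)
    std::vector<std::string> set('~' - ' ' + 1);
    char c = ' ';
    std::generate(set.begin(), set.end(),
                  [&c] { return std::string(1, c++); });
}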
V
--
Please remove capital 'A's when replying by e-mail
I do not respond to top-posted replies, please don't ask
> This is a very dumb question... I have a std::vector<std::string> that
> I'm pushing ascii characters into like so:
>
> set.push_back("a");
> set.push_back("b");
> ...
>
> I'm doing this manually. This is error prone and time consuming. Does
> something in the STL have these already? I'm rather new to C++ so
> forgive me if this is blatantly obvious. Google did not help much. The
> goal is to get all typable ascii chars into the vector.
Better browse directly a C++ Library Reference such as
http://www.cplusplus.com/reference/
std::string printableCharactersInASCII(
    " !\"#$%&'()*+,-./0123456789:;<=>?@ABCDEFGHIJKLMNOPQRSTUVWXYZ"
    "[\\]^_`abcdefghijklmnopqrstuvwxyz{|}~");

for (std::string::const_iterator it = printableCharactersInASCII.begin();
     it != printableCharactersInASCII.end();
     ++it) {
    std::string character(it, it + 1);
    set.push_back(character);
}
Notice that this program works even on systems that don't use the
ASCII _code_, as long as they can represent all the printable
characters of the ASCII _character_ _set_ (those in that
printableCharactersInASCII string).
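As an aside, the same fill can be written with 'std::transform' instead
of the explicit loop; a sketch (the 'make_set' and 'to_string1' names
are only illustrative):

#include <algorithm>
#include <iterator>
#include <string>
#include <vector>

// Build a one-character string from a char.
std::string to_string1(char c) { return std::string(1, c); }

std::vector<std::string> make_set(const std::string& chars) {
    std::vector<std::string> result;
    std::transform(chars.begin(), chars.end(),
                   std::back_inserter(result), to_string1);
    return result;
}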
--
__Pascal Bourguignon__
> for (char c = ' '; c <= '~'; ++c)
> set.push_back(std::string(1, c));
This doesn't work on my EBCDIC based system. Broken program!
--
__Pascal Bourguignon__
Didn't the OP say "ASCII"?...
> Pascal J. Bourguignon wrote:
>> Victor Bazarov <v.Aba...@comAcast.net> writes:
>>
>>> for (char c = ' '; c <= '~'; ++c)
>>> set.push_back(std::string(1, c));
>>
>> This doesn't work on my EBCDIC based system. Broken program!
>
> Didn't the OP say "ASCII"?...
No, he said "the ASCII characters".
Whatever the encoding used by the programming language and/or the
host, you can handle them as long as their character set includes the
ASCII printable characters, which are the space and
!"#$%&'()*+,-./0123456789:;<=>?
@ABCDEFGHIJKLMNOPQRSTUVWXYZ[\]^_
`abcdefghijklmnopqrstuvwxyz{|}~
Unfortunately, when you use the char type in C or C++, with literals
such as ' ' or '~', you are not working with characters anymore, but
with their codes (char is a subtype of int, not of character).
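A two-line illustration of that point (the 193 is from the EBCDIC code
chart):

#include <iostream>
int main() { std::cout << (int)'A' << "\n"; }  // 65 on ASCII, 193 on EBCDIC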
We may assume that if the OP is interested in ASCII characters, he may
have to encode and decode them, and that the printableCharactersInASCII
string will be used to do that:
#include <limits.h>
#include <stdexcept>
#include <string>

#if CHARBITS<7
typedef int ASCII_Code;
#else
typedef unsigned char ASCII_Code;
#endif

ASCII_Code character_to_ASCII_Code(std::string character) {
    if (character == "\n") {
        return 10;
    } else {
        std::string::size_type pos = printableCharactersInASCII.find(character);
        if (pos == std::string::npos) {
            // std::exception has no string constructor; runtime_error does
            throw std::runtime_error("Not an ASCII character");
        } else {
            return 32 + pos;  // the printable range starts at code 32 (space)
        }
    }
}

std::string ASCII_Code_to_character(ASCII_Code code) {
    if (code == 10) {
        return std::string("\n");
    } else if ((32 <= code) and (code <= 126)) {
        // string(count, char): build a one-character string
        return std::string(1, printableCharactersInASCII[code - 32]);
    } else {
        throw std::runtime_error("Not an ASCII printable character");
    }
}
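A round trip through these two functions might look like this (a
sketch; it assumes the printableCharactersInASCII string and the two
functions above are in scope):

#include <iostream>

int main() {
    ASCII_Code code = character_to_ASCII_Code("A");
    std::cout << (int)code << "\n";                      // 65 on any host
    std::cout << ASCII_Code_to_character(code) << "\n";  // A
}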
--
__Pascal Bourguignon__
> Victor Bazarov <v.Aba...@comAcast.net> writes:
>
> > Pascal J. Bourguignon wrote:
> >> Victor Bazarov <v.Aba...@comAcast.net> writes:
> >>
> >>> for (char c = ' '; c <= '~'; ++c)
> >>> set.push_back(std::string(1, c));
> >>
> >> This doesn't work on my EBCDIC based system. Broken program!
> >
> > Didn't the OP say "ASCII"?...
>
> No, he said "the ASCII characters".
OK, it is possible to have different interpretations of what the OP
really wants, based on his post.
> Whatever the encoding used by the programming language and/or the
> host, you can handle them as long as their character set includes the
> ASCII printable characters, which are the space and
> !"#$%&'()*+,-./0123456789:;<=>?
> @ABCDEFGHIJKLMNOPQRSTUVWXYZ[\]^_
> `abcdefghijklmnopqrstuvwxyz{|}~
>
>
> Unfortunately, when you use the char type in C or C++, with literals
> such as ' ' or '~', you are not working with characters anymore, but
> with their codes (char is a subtype of int, not of character).
>
>
> We may assume that if the OP is interested in ASCII characters, he may
> have to encode and decode them, and that the printableCharactersInASCII
> string will be used to do that:
>
> #include <limits.h>
> #if CHARBITS<7
I assume, of course, that you mean CHAR_BIT, which is an actual macro
defined in <limits.h> or <climits>. If CHAR_BIT is less than 8, you
don't have a C++ or even a C compiler.
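If you want that assumption checked rather than assumed, a one-line
compile-time guard does it (a sketch; static_assert needs C++11, and a
preprocessor #error would do the same job on older compilers):

#include <climits>

static_assert(CHAR_BIT >= 8,
              "a conforming implementation has at least 8 bits per char");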
> typedef int ASCII_Code;
Why not always define it as an int?
[snip]
--
Jack Klein
Home: http://JK-Technology.Com
FAQs for
comp.lang.c http://c-faq.com/
comp.lang.c++ http://www.parashift.com/c++-faq-lite/
alt.comp.lang.learn.c-c++
http://www.club.cc.cmu.edu/~ajo/docs/FAQ-acllc.html
>> typedef int ASCII_Code;
>
> Why not always define it as an int?
Because ASCII codes have only 7 bits. If you can store these 7 bits
in an unsigned char, then vectors of ASCII codes will take less
memory space.
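The difference is easy to demonstrate (a sketch; the exact totals are
platform-dependent, and the vectors' own overhead is ignored):

#include <iostream>
#include <vector>

int main() {
    std::vector<unsigned char> narrow(1000000);
    std::vector<int> wide(1000000);
    // element storage only: typically 1 byte vs 4 bytes per code
    std::cout << narrow.size() * sizeof(unsigned char) << " vs "
              << wide.size() * sizeof(int) << " bytes\n";
}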
--
__Pascal Bourguignon__
My first question would be exactly what you're trying to accomplish with
this. You mention ASCII, but that's a fairly obsolete encoding -- you're
much more likely to encounter something like ISO 8859 nowadays. Are you
sure you really need ASCII per se, or do you want whatever character set
is being used on the target computer?
Assuming what you really want is the information for the current locale
(or at least some known locale) you can retrieve it from that locale
instead of trying to generate it on your own. For the global locale, it
can look like this:
// needs <cctype> for isprint and <climits> for CHAR_MAX
for (int i = 0; i <= CHAR_MAX; i++)
    if (isprint(i))
        set.push_back(std::string(1, static_cast<char>(i)));
For another locale, things get a bit uglier, but not horribly so:
std::locale loc("");
const std::ctype<char>& ct = std::use_facet<std::ctype<char> >(loc);
for (int i = 0; i <= CHAR_MAX; i++)
    if (ct.is(std::ctype_base::print, static_cast<char>(i)))
        set.push_back(std::string(1, static_cast<char>(i)));
For the moment, I've assumed a char-based encoding -- if you want a
wchar_t-based encoding instead, things could get a little hairy. For
UCS-4 encoded ISO 10646, your string would occupy something like 16
gigabytes (32-bit encoding, so each character is 32 bits, and you have
2^32-1 of them possible), and a large number of those are printable. For
such a situation, I'd think hard about finding some fundamentally
different method of doing what you need (possibly using the locale's
functions instead of storing their results).
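For example, a classification call per character avoids storing
anything at all. A sketch (assuming the question really is "is this
character printable in this locale?"):

#include <locale>

// True if 'c' is printable in the given locale -- no table of
// printable characters is ever built.
bool is_printable(wchar_t c, const std::locale& loc) {
    return std::use_facet<std::ctype<wchar_t> >(loc)
               .is(std::ctype_base::print, c);
}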
--
Later,
Jerry.
The universe is a figment of its own imagination.