ABC
PQR
XYZ
for that I have put the above-mentioned string as ABC \nPQR \nXYZ
But instead of displaying the next string on the next line, when I print
the data, it prints as ABC \nPQR \nXYZ.
Any help would be highly appreciated.
Thanks
SRK
Input file (tst.txt):
ABC PQR XYZ
Example code (tst):
#include <iostream>
#include <string>

int main() {
    std::string word;
    // the loop condition already handles EOF; an extra eof() check inside
    // the loop would drop the last word when the file lacks a trailing newline
    while (std::cin >> word) {
        std::cout << word << std::endl;
    }
}
Call as:
$ ./tst < tst.txt
Let me mention that I am using a FILE pointer for reading from the file,
and fgets for reading the entire line. I don't have to print the strings
one by one; rather, I want to make something like a menu out of them and
send it over a socket.
thanks
SRK
> Hi folks,
> I am trying to read some data from a config file and want that data to
> be printed in a formatted way. For example, if I have this string in the
> config file - ABC PQR XYZ - and want to display it like
>
> ABC
> PQR
> XYZ
>
> for that I have put the above mentioned string as ABC \nPQR \nXYZ
>
> But instead of displaying the next string on next line, when I print
> the data, it prints as ABC \nPQR \nXYZ.
You need to convert two symbols (backslash and n) to a single symbol
(line-feed '\n').
If this is not very performance-critical, you can do it in-place:
std::string s = "ABC\\nPQR\\nXYZ\\n"; // contains backslash and n

std::string::size_type k = 0;
while ((k = s.find("\\n", k)) != s.npos) {
    s.replace(k, 2, "\n");
}
// s now contains linefeed characters instead.
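
For the OP's concrete case, a minimal compilable sketch (tst.txt is the
file name used elsewhere in the thread; the rest is illustrative):

#include <fstream>
#include <iostream>
#include <string>

int main() {
    std::ifstream in("tst.txt");   // config file, e.g. containing: ABC \nPQR \nXYZ
    std::string s;
    std::getline(in, s);

    // replace each two-character "\n" sequence with a real linefeed
    std::string::size_type k = 0;
    while ((k = s.find("\\n", k)) != s.npos)
        s.replace(k, 2, "\n");

    std::cout << s << std::endl;   // ABC, PQR, XYZ now print on separate lines
}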
hth
Paavo
or, maybe the good old char-pointers strategy:
char tb[256]; // or whatever is the maximum sane length
char *s, *t;

s = input; t = tb;
while (*s)
{
    if ((*s == '\\') && (*(s+1) == 'n'))
        { *t++ = '\n'; s += 2; continue; }
    *t++ = *s++;
}
*t++ = 0;
granted, one can debate whether or not this is good style in C++... but it
should work OK.
>
> "Paavo Helde" <myfir...@osa.pri.ee> wrote in message
> news:Xns9CEBD2680...@216.196.109.131...
>> SRK <kum...@gmail.com> wrote in news:f45f3f70-d140-4b48-af0a-
>> 10f52e...@15g2000prz.googlegroups.com:
>>
>>> Hi folks,
>>> I am trying to read some data from a config file and want that data
>>> to be printed in a formatted way. For example, if I have this string
>>> in the config file - ABC PQR XYZ - and want to display it like
>>>
>>> ABC
>>> PQR
>>> XYZ
>>>
>>> for that I have put the above mentioned string as ABC \nPQR \nXYZ
>>>
>>> But instead of displaying the next string on next line, when I print
>>> the data, it prints as ABC \nPQR \nXYZ.
>>
>> You need to convert two symbols (backslash and n) to a single symbol
>> (line-feed '\n').
>>
>> If this is not very performance-critical, you can do it in-place:
>>
>> std::string s = "ABC\\nPQR\\nXYZ\\n"; // contains backslash and n
>>
>> std::string::size_type k = 0;
>> while ((k = s.find("\\n", k)) != s.npos) {
>>     s.replace(k, 2, "\n");
>> }
>>
>> // s now contains linefeed characters instead.
>>
>
> or, maybe the good old char-pointers strategy:
> char tb[256]; //or whatever is the maximum sane length
This is a buffer overrun waiting to happen (or to be exploited). At least
one should check the length, or allocate a buffer that is long enough. In
this case that should be easy.
> char *s, *t;
>
> s = input; t = tb;
> while (*s)
> {
>     if ((*s == '\\') && (*(s+1) == 'n'))
>         { *t++ = '\n'; s += 2; continue; }
>     *t++ = *s++;
> }
> *t++ = 0;
>
>
> granted, one can debate whether or not this is good style in C++, ...
> but, should work ok.
This is a C solution, and assumes C strings (zero-terminated, no embedded
zeroes). Probably this is a harmless assumption, but nevertheless it is
slightly different, and makes the coding a bit more convenient (no extra
care needed when checking *(s+1)).
The same algorithm can be made to work with std::string as well of
course, by extracting the C-style string pointer via the c_str() member
function:
std::string input = ...;
std::string output(input.length(), '\0');

if (!input.empty()) {
    const char* s = input.c_str();
    char* t = &output[0];

    // C-style algorithm here...

    output.resize(t - &output[0]);
}
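
Filled in, the whole thing might look like this (a sketch only; the name
'unescape' is illustrative, not anything standard):

#include <string>

// Replace each backslash-n pair in 'input' with a real linefeed,
// using the C-style scan from above on std::string storage.
std::string unescape(const std::string& input)
{
    std::string output(input.length(), '\0');
    if (!input.empty()) {
        const char* s = input.c_str();
        char* t = &output[0];
        while (*s) {
            if (*s == '\\' && *(s + 1) == 'n') { *t++ = '\n'; s += 2; continue; }
            *t++ = *s++;
        }
        output.resize(t - &output[0]);   // trim to the characters written
    }
    return output;
}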
Paavo
but, it is worth noting that new/malloc and delete/free are not free
either...
so, the possibility of a buffer overflow may sometimes be justifiable in the
name of performance...
there is also a reason for a value like 256 rather than, say, 82.
we could declare, "well, no valid text file has > 80 characters per line",
and use 82 (allowing for a newline and a nul), but 256 adds a little more
padding.
granted, a 256-char line will still overflow this buffer...
granted:

    char *buf;
    buf = (char *)malloc(strlen(input) + 1);

is easy enough...
in practice, I usually use an alternative strategy I call a "rotating
allocator", where the potential cost of a buffer overflow is usually fairly
low, and a rotating allocator is not readily exploitable (there is little
to say where the string will be in memory), ...
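
something like the following gives the rough idea (a sketch only; sizes
and names here are illustrative, not the actual code):

#include <cstddef>

// Sketch of a "rotating allocator": a fixed ring of storage handed out
// in slices. nothing is ever freed; old strings are simply overwritten
// once the ring wraps around, so short-lived strings need no free().
static char rot_buf[1 << 16];        // 64kB ring; size is arbitrary
static std::size_t rot_pos = 0;

char* rot_alloc(std::size_t n)
{
    if (n > sizeof(rot_buf)) return 0;               // can never satisfy this
    if (rot_pos + n > sizeof(rot_buf)) rot_pos = 0;  // wrap to the start
    char* p = rot_buf + rot_pos;
    rot_pos += n;
    return p;
}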
>
>> char *s, *t;
>>
>> s = input; t = tb;
>> while (*s)
>> {
>>     if ((*s == '\\') && (*(s+1) == 'n'))
>>         { *t++ = '\n'; s += 2; continue; }
>>     *t++ = *s++;
>> }
>> *t++ = 0;
>>
>>
>> granted, one can debate whether or not this is good style in C++, ...
>> but, should work ok.
>
> This is a C solution, and assumes C strings (zero-terminated, no embedded
> zeroes). Probably this is a harmless assumption, but nevertheless it is
> slightly different, and makes the coding a bit more convenient (no extra
> care needed when checking *(s+1)).
>
this is partly why I said "good old char pointers"...
> The same algorithm can be made to work with std::string as well of
> course, by extracting the C-style string pointer via the c_str() member
> function:
>
>
> std::string input = ...
> std::string output(input.length(), '\0');
>
> if (!input.empty()) {
>     const char* s = input.c_str();
>     char* t = &output[0];
>
>     // C-style algorithm here...
>
>     output.resize(t - &output[0]);
> }
>
granted, but I guess the question then is whether or not someone is using
std::string...
admittedly, yes, I am more of a C coder than a C++ one (I use C++
sometimes, but the majority of my code is C), and I tend to prefer
strategies which work fairly well in both cases...
(not wanting to start a debate here, but there are reasons to choose one or
the other in different contexts, many not particularly relevant to the
language as seen/written by humans, and many not related to the "majority"
of projects).
> Paavo
>
#include <fstream>
#include <iostream>
#include <sstream>
#include <string>
int main() {
    std::ifstream ifs("tst.txt"); // input file from upthread
    std::string tmp;
    while (std::getline(ifs, tmp)) {
        std::istringstream iss(tmp);
        while (iss >> tmp) std::cout << tmp << '\n';
    }
}
with the headers and main added, this compiles; it is just an example of
one of the ways you can do it....
Greets
> so, the possibility of a buffer overflow may sometimes be justifiable
> in the name of performance...
I hope you are joking!
Now, seriously: unexpected things like the size of the input data come
from outside the program, by definition. Input/output is typically slow
enough that a check of the input data size would cost next to nothing. I
see *no* justification for skipping it! Note that I do not advocate dynamic
allocation, just a simple check and an error return.
> (not wanting to make debate here, but there are reasons to choose one
> or another in different contexts, many not particularly relevant to
> the language as seen/written by humans, and many not related to the
> "majority" of projects).
I cannot see any reason to knowingly leave a potential UB bug in the
program. God knows there are many of them already left unknowingly; no
reason to add one!
Paavo
Hm, Microsoft had a practice of allowing writes to deallocated memory
in order for some important applications to keep working on Windows.
I don't see how this is a problem. Besides that, it is an excellent
idea to provide a COM interface to download binary code into such an
environment from the internet..
Greets
there are many cases where string-based processing may need to be done
purely in memory, and in a performance-critical location.
consider, for example, a program that drives many parts of its logic via
in-memory command strings.
in such cases, allocating or freeing memory, or sometimes even basic sanity
checking (such as checking that a passed-in pointer is not NULL, or that a
string is not too long and does not contain invalid characters, ...), can
risk notably slowing down the app.
for example, I have an x86 interpreter (an interpreter for 32-bit x86
machine code) where, of all things, the main opcode decoder is based
primarily on string-based logic (although it is optimized some via
dynamically-built dispatch tables and hashing).
another case of string-based logic is in many auto-codegen functions (which
use ASCII command strings to generate machine code to further drive the app
logic, or to build parts of the app's logic code at runtime).
similarly, this kind of thing may allow many aspects of the app's logic to
be "human readable" (or at least as much as a big mass of ASCII characters
can be...), which is much nicer for debugging than having to sort through
binary data (for example, in the form of hex dumps or base64 dumps, ...).
similarly, both my object system and my XML DOM code use lots of
string-handling code, and could also risk slowing things down (one may even
end up going so far as to pre-compute hash keys, after noting that a
notable amount of time was going into simply recalculating the hash value
during lookups).
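
the hash-precomputation trick might look roughly like this (a sketch; all
names are illustrative):

#include <cstddef>

// Sketch: store the hash alongside the name once, at creation time,
// so lookups can compare cached hashes instead of rehashing the string.
struct Symbol {
    const char* name;
    std::size_t hash;
};

static std::size_t hash_str(const char* s)
{
    std::size_t h = 5381;                       // djb2-style string hash
    while (*s) h = h * 33 + (unsigned char)*s++;
    return h;
}

Symbol make_symbol(const char* name)
{
    Symbol sym = { name, hash_str(name) };      // hash computed once, up front
    return sym;
}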
granted, in the OP's case, performance is probably not all that important...
>> (not wanting to make debate here, but there are reasons to choose one
>> or another in different contexts, many not particularly relevant to
>> the language as seen/written by humans, and many not related to the
>> "majority" of projects).
>
> I cannot see any reason to knowingly leave a potential UB bug in the
> program. God knows there are many of them already left unknowingly, no
> reason to add one!
>
these can be "boundary conditions", and are normally weeded out elsewhere.
but, alas, there may be a lot of consideration as to whether it is better
to leave a possible bug, or to fix it so that it doesn't risk crashing or
posing a possible security hole.
one can then use the debugger and test cases to determine how generally
reliable the code is (AKA: how many bits of bad data can escape through the
proper "safety nets", as well as how well everything actually works), and
profilers to determine where optimization is needed.
> Paavo
> > I cannot see any reason to knowingly leave a potential UB
> > bug in the program. God knows there are many of them already
> > left unknowingly, no reason to add one!
> Hm, Microsoft had a practice of allowing writes to deallocated
> memory in order for some important applications to keep working
> on Windows.
In the earliest versions of C (pre-standard), the rule was that
the pointer to realloc had to be the last pointer that was
freed. In those days, it was considered acceptable to use freed
memory up until the next call to malloc.
In those days, of course, there was no multithreading, and
programs weren't connected to the internet.
> I don't see how this is a problem.
Using a dangling pointer is a serious security hole.
--
James Kanze
as well as a serious crash hazard, IMO...
granted, I am a little less concerned about buffer overflows; granted, they
may be a bit more of a worry if the app actually matters as far as security
goes (connected to the internet, getting input from "untrusted" sources,
...).
even then, it is not often "as bad" in practice. for example, for calls
like 'fgets()' one supplies the maximum string length anyway (typically a
few chars less than the buffer size), so this much is self-limiting. one
can know the call will not return an oversize string, since it will be cut
off and the rest returned on the next read.
in many other cases, one knows the code that both produces and accepts the
strings, and so can know that code further up the chain will not exceed the
limit.
as well, 256 is an "accepted" maximum string length (a tradition from long
past: something is seriously wrong if a string is longer than this).
much like how something is wrong if a line in a text file is longer than 80
chars, and it is usually best to limit output to 76 chars just to be safe...
(except in certain file formats, where longer lines tend to pop up a
lot...).
this does allow "some" safety with fixed-size char arrays, which is good
since these are one of the fastest ways I know of to implement certain
string operations.
> --
> James Kanze
Why "few chars less"? Because you are not sure in the documentation?
Or in yourself?
> self-limiting. one can know the call will not return an oversize
> string, since it will be cut off and returned on the next line.
>
> in many other cases, one knows the code that both produces and accepts
> the strings, and so can know that code further up the chain will not
> exceed the limit.
>
> as well, 256 is an "accepted" maximum string length (a tradition since
> long past that something is seriously wrong if a string is longer than
> this).
Accepted by who? I'm serving 10MB HTTP packets through std::string, so I'm
sorry, I have never heard of this convention. (There was a 256-char string
limit in Turbo Pascal 3.3, but fortunately that is about 15 years in the
past ;-)
>
> much like how something is wrong if a line in a text file is longer
> than 80 chars, and it is usually best to limit output to 76 chars just
> to be safe... (except in certain file formats, where longer lines tend
> to pop up a lot...).
It seems you are confusing the human interface with the program
interface.
>
>
> this does allow "some" safety with fixed-size char arrays, which is
> good since these are one of the fastest ways I know of to implement
> certain string operations.
Using fixed-size arrays does not mean you may skip the check that the data
fits in there. Actually, if the input is not verified and comes from
outside the program, then it is ridiculous not to check its size. The cost
of doing so is zero, compared to the time it takes to get the data from
outside into the program.
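
Concretely, the check meant here is only a couple of lines (a sketch; tb
and input are the names from the snippet upthread, the function name is
invented):

#include <cstring>

// Reject oversized input with an error return instead of overrunning tb.
bool copy_checked(char* tb, std::size_t tb_size, const char* input)
{
    if (std::strlen(input) >= tb_size) return false;  // too long: error, not UB
    std::strcpy(tb, input);
    return true;
}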
I guess many viruses are in debt to guys like you, when the "internally
safe" code somehow gets re-used and exploited in the wild.
Paavo
had to go check the documentation...
actually, I had thought the N was the max number of chars to read,
excluding the '\n' and the 0.
apparently N includes the terminating 0, which fgets accounts for
automatically...
oh well...
doesn't matter too much if there is an occasional fgets around with an N of
254...
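
for reference, fgets(s, N, stream) reads at most N-1 characters and always
nul-terminates, so passing the whole buffer size is safe; a minimal sketch
(tst.txt again from upthread):

#include <cstdio>

int main()
{
    char line[256];
    std::FILE* fp = std::fopen("tst.txt", "r");  // file name from upthread
    if (!fp) return 1;
    // sizeof line is safe here: fgets writes at most 255 chars plus the 0
    while (std::fgets(line, sizeof line, fp))
        std::fputs(line, stdout);
    std::fclose(fp);
}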
>
>> self-limiting. one can know the call will not return an oversize
>> string, since it will be cut off and returned on the next line.
>>
>> in many other cases, one knows the code that both produces and accepts
>> the strings, and so can know that code further up the chain will not
>> exceed the limit.
>>
>> as well, 256 is an "accepted" maximum string length (a tradition since
>> long past that something is seriously wrong if a string is longer than
>> this).
>
> Accepted by who? I'm serving 10MB HTTP packets through std::string so I'm
> sorry I have never heard of this convention. (There was a 256-char string
> limitation in Turbo Pascal 3.3, but fortunately this is about 15 years in
> history ;-)
>
"accepted" by traditional practice.
typically, constants like PATH_MAX, ... are 256.
it doesn't take long (for example, if one digs around in system headers),
before a string-length limit of 256 becomes a recurring pattern (even if
there are variations, for example, UNIX_MAX is 108, but I think this is
because of a general rule that (sizeof(sockaddr_storage)==128) or so, ...).
there are many other examples of this particular limit being in use.
it is an accepted limit, much like how i, j, and k, are accepted names for
integer variables, ...
granted, sometimes one wants a bigger limit though, and sometimes a bigger
limit is used (as there is no real technical reason for this particular
limit apart from convention), ...
I once wrote an HTTP server, though, and requests with longer strings kept
coming in from nowhere (mostly a string of repeating characters with some
garbage at the end), so in that case I made the limit 1024 and also put in
a limit check. (it can be noted that I think a lot of them were like 256
A's followed by the garbage...)
luckily though, any buffer-overflow exploit intended for one server is
likely to do little more than crash another...
>>
>> much like how something is wrong if a line in a text file is longer
>> than 80 chars, and it is usually best to limit output to 76 chars just
>> to be safe... (except in certain file formats, where longer lines tend
>> to pop up a lot...).
>
> It seems you are confusing the human interface with the program
> interface.
>
either way, this limit is established, as a sort of rule of convention for
most well-formed text files.
it is much like how, by convention, a programmer should not write code with
lines longer than this limit.
>>
>>
>> this does allow "some" safety with fixed-size char arrays, which is
>> good since these are one of the fastest ways I know of to implement
>> certain string operations.
>
> Using fixed-size arrays does not mean you may skip the check that the
> data fits in there. Actually, if the input is not verified and comes from
> outside the program, then it is ridiculous not to check its size. The
> cost of doing so is zero, compared to the time it takes to get the data
> from outside into the program.
>
granted, external disk IO is usually measurable at around 20 MB/s, IME.
however, there is a lot which often happens "within" the program, say, when
one's app is divided up into lots of DLLs which do lots of their internal
communication via data serialized as strings, ...
one component will produce streams of text as its output, and another
component will parse them and follow embedded commands. many tasks may
involve many stages of processing of this sort (in addition to the use of
binary APIs, ...).
never mind that, in many of these cases, ANY unsafe input would be a
security risk, even if it does fit nicely into the buffers. the reason here
being that many of these facilities actually have access to features which
are either Turing-complete in their own right (yeah, this property tends to
pop up a lot...), or have access to code-generation machinery.
consider, for example, that one has a text-stream "eval" mechanism. outside
access to eval is dangerous even if the text itself is well-formed, since
eval will generally allow whatever code hits it to muck around with the app
(unless of course the eval is sandboxed, but I am assuming here it is
not...).
similar goes if several components are connected via a stream in a
PostScript-like format and, say, some input goes over which fouls up the
command interpreter, creates an infinite loop, or worse.
trivial example: "/foo {foo} def foo"
granted, this trivial case could be handled by detecting a stack overflow,
but in the general case it would be difficult to secure even with input
validation...
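
as a toy illustration of the stack-overflow detection just mentioned: an
interpreter can cut off runaway recursion like "/foo {foo} def foo" with a
depth counter (a sketch; the limit is arbitrary):

#include <iostream>

const int kMaxDepth = 1000;               // arbitrary recursion limit

// stand-in for executing the self-calling procedure "foo"
bool run_foo(int depth)
{
    if (depth > kMaxDepth) return false;  // detected: runaway recursion
    return run_foo(depth + 1);            // "foo" calls itself
}

int main()
{
    std::cout << (run_foo(0) ? "finished" : "recursion limit hit") << '\n';
}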
> I guess many viruses are in debt to guys like you, when the "internally
> safe" code somehow gets re-used and exploited in the wild.
>
or it could be just like expecting to check that pointers always point to
valid addressable memory (say, if one is using a garbage collector with the
ability to validate that a pointer is a heap pointer).
often, it would be too expensive, and too much of a hassle, to check these
things as a general matter of practice.
so, a tradeoff is made:
we assume that the caller is passing valid data, and typically check either
in code which is not likely to be a bottleneck, or where the "safety" of
the other end is not ensured.
typically, validity checking will be done when performing file IO, dealing
with a network connection, or implementing or consuming a public API.
if none of these is being done (for example, if all of this is going on
purely internal to the app, which could easily happen), then there may not
be a need to validate.
You're young and your history seems to stem from the micro world.
Limits date from a time when really long punch cards were hard to deal
with and they didn't fit the reader. :)
>> Accepted by who? I'm serving 10MB HTTP packets through std::string so I'm
>> sorry I have never heard of this convention. (There was a 256-char string
>> limitation in Turbo Pascal 3.3, but fortunately this is about 15 years in
>> history ;-)
>
> "accepted" by traditional practice.
>
> typically, constants like PATH_MAX, ... are 256.
POSIX and most Linuxes disagree; check linux/limits.h.
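
easy enough to check (a minimal sketch; on a typical Linux this prints
4096):

#include <limits.h>   // on Linux this pulls in linux/limits.h
#include <stdio.h>

int main(void)
{
#ifdef PATH_MAX
    printf("PATH_MAX = %d\n", PATH_MAX);   /* 4096 on a typical Linux */
#else
    printf("PATH_MAX is not defined here\n");
#endif
    return 0;
}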
> it doesn't take long (for example, if one digs around in system headers),
> before a string-length limit of 256 becomes a recurring pattern (even if
> there are variations; for example, UNIX_PATH_MAX is 108, but I think this is
> because of a general rule that (sizeof(sockaddr_storage)==128) or so, ...).
>
> there are many other examples of this particular limit being in use.
Calling this "accepted" or "common" is a stretch. Basic, Pascal, and
environments that represented strings with an embedded size were limited,
but even in research Unix the constant was often larger, even with real
core-memory constraints. It's hard to find or imagine a modern
Unix/Linux/GNU app with a 256 limit.
Windows has some real ideas about accepted string limits scattered around
randomly, but I can't agree that either the concept of limits or even a
common 256 default exists in C, and it's nonsense for C++.
> it is an accepted limit, much like how i, j, and k, are accepted names for
> integer variables, ...
Mostly habit from Fortran, where it was part of the language and not just
a convention, passed on through the generations.
> granted, sometimes one wants a bigger limit though, and sometimes a bigger
> limit is used (as there is no real technical reason for this particular
> limit apart from convention), ...
You've mentioned games many times, so maybe in that domain this is a
convention. In many other fields this doesn't wash, and it seems you may
be crossing up C and C++ to boot.
possibly, but not that young anymore...
but, yeah, admittedly AFAIK punch cards went out of style decades before I
was born (which was, I guess, during the high point of 5.25" floppies and
the IBM PC, which were themselves a rapidly dying technology by the time I
was really old enough to do much, in the days of dying DOS and the rise of
Windows...).
but now much time has passed, and age is setting in...
>>> Accepted by who? I'm serving 10MB HTTP packets through std::string,
>>> so I'm sorry, I have never heard of this convention. (There was a
>>> 256-char string limit in Turbo Pascal 3.3, but fortunately that is
>>> about 15 years in the past ;-)
>>
>> "accepted" by traditional practice.
>>
>> typically, constants like PATH_MAX, ... are 256.
>
> POSIX and most linux disagree, check linux/limits.h.
>
odd, I had seen PATH_MAX as 256, but then again, I am in Windows-land...
>> it doesn't take long (for example, if one digs around in system headers),
>> before a string-length limit of 256 becomes a recurring pattern (even if
>> there are variations; for example, UNIX_PATH_MAX is 108, but I think this is
>> because of a general rule that (sizeof(sockaddr_storage)==128) or so,
>> ...).
>>
>> there are many other examples of this particular limit being in use.
>
> Calling this "accepted" or "common" is a stretch. Basic, Pascal, and
> environments that represented strings with embedded size were limited
> but even in research unix the constant was often larger even with real
> core memory constraints. It's hard to find or imagine a modern
> unix/linux/Gnu app with a 256 limit.
>
ok.
> Windows has some real ideas about accepted string limits scattered
> around randomly but I can't agree that either the concept of limits or
> even a common 256 default exists in c and it's nonsense for c++.
>
granted.
it depends on usage, I guess, since it is worth noting that a longer limit
usually means using up more space on the stack, and stack space is not
exactly free...
likewise, the heap isn't exactly free either, and allocating/freeing memory
can hurt performance if done poorly (such as in a function which is called
in a loop).
as others have noted, it may not be a big issue if one is processing input
which comes from disk, but I guess it is a question of what and how much
comes from disk, and how much is being shoved around intra-app (say, for
inter-component communication, ...).
>> it is an accepted limit, much like how i, j, and k, are accepted names
>> for
>> integer variables, ...
>
> Mostly habit from Fortran, where it was part of the language and not
> just a convention, that was passed on through generations.
>
yep.
granted, it is not good to defy traditions though, since usually things are
some particular way for a good reason...
>> granted, sometimes one wants a bigger limit though, and sometimes a
>> bigger
>> limit is used (as there is no real technical reason for this particular
>> limit apart from convention), ...
>
> You've mentioned games many times, so maybe in that domain this is a
> convention. In many other fields this doesn't wash, and it seems you
> may be crossing up c and c++ to boot.
I use both C and C++, though generally more C than C++.
note that, for example, in Quake 2, most string limits are shorter than
this; for example, 16- and 64-character string limits are common
(MAX_QPATH, for example, is defined as 64, ...).
I also deal a lot with VM-type stuff:
interpreters, JIT compilers, ... where basically the compiler may be
working (say, compiling code fragments, ...) at the same time as other
parts of the application are doing other tasks, ...
interpreting code, and wasting time in an interpreter, can easily kill
performance. an interpreter, for example, often has to cut a lot of corners
in an attempt to keep speed up (and even then, interpreters still tend to
be rather slow, hence the usage of JIT in many cases, but then one needs
relatively fast compiler machinery, ...).
I typically use larger limits, but usually the 256-char limit is for a
single token (in parsing).
for buffers which may deal with globs of text, I usually use either larger
limits, expanding buffers, or size limit checks.
I usually use a limit of around 1024 or so for name-mangled tokens (say,
when the function name and signature are mangled together for
linker-related purposes, ...).
I think by convention, though, PE/COFF informally has a limit of 256 here
(for valid function names), and the C standard has a limit of around 32 (as
a minimum allowed implementation limit), where a longer name is not
required to be valid in a conforming compiler (the usual idea being that
the identifier would be truncated).
granted, this should not be a problem so long as one is not using the
MyFunctionNameIsDamnNearAWholeParagraph naming scheme...
granted, it is worth noting though that some bulk is usually added to names
as a result of using a naming convention which tends to add library and
subsystem prefixes to exported names (except for public API functions,
which tend to have a much shorter prefix).
this is common in C and in mixed C & C++ codebases, given the
non-availability of namespaces, ...
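
for illustration, the kind of naming meant here (every name below is
invented for the example):

// internal function: library + subsystem prefix stands in for a namespace
int FooLib_Gfx_DrawLine(int x0, int y0, int x1, int y1);

// exported public API function: much shorter prefix
int foo_draw_line(int x0, int y0, int x1, int y1);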
it is also common practice (in C) not to wrap strings in any sort of
container, since this typically makes things more awkward and generally
hurts performance (say, due to added pointer indirections, function calls,
...).
or such...