How to handle newline character(s) in a TCP server

InterFan

Mar 29, 2001, 1:34:06 AM

I want to design a TCP server. It just receives a line from client and
handles it. Then, it sends the response back to the client. The
problem is that a line of message may end with ASCII character 13, 10
or 13+10. I use blocking I/O functions. How can I handle these
conditions smoothly?

Thanks a lot.

Best Regards,
Chuan He

Michel Bardiaux

Mar 29, 2001, 4:26:20 AM

If '\r' ends a line, then the sequence '\r''\n' is ambiguous: is it one
line or two, the second one being empty?

--
Michel Bardiaux
Peaktime Belgium S.A. Rue Margot, 37 B-1457 Nil St Vincent
Tel : +32 10 65.44.15 Fax : +32 10 65.44.10

Kasper Dupont

Mar 29, 2001, 7:52:40 AM
Michel Bardiaux wrote:
>
> InterFan wrote:
> >
> > I want to design a TCP server. It just receives a line from client and
> > handles it. Then, it sends the response back to the client. The
> > problem is that a line of message may end with ASCII character 13, 10
> > or 13+10. I use blocking I/O functions. How can I handle these
> > conditions smoothly?
> >
> > Thanks a lot.
> >
> > Best Regards,
> > Chuan He
>
> If '\r' ends a line, then the sequence '\r''\n' is ambiguous: is it one
> line or two, the second one being empty?
>

I have not seen any programs using ASCII 13 as a line break.
But even that could be handled; I think you can assume that
a single client always uses the same sequence.

I think this approach would work for almost anything:

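#include <stdio.h>

/* Like getchar(), but returns a single 10 (LF) for any of the
   line endings CR, LF, CR LF or LF CR. */
int mygetchar(void)
{
    static int state = 0;   /* last line-break byte seen, or 0 */

    while (1) {
        int c = getchar();
        switch (c) {
        case 10:
            if (state == 13) state = 0;     /* LF completing CR LF: skip */
            else { state = 10; return 10; }
            break;
        case 13:
            if (state == 10) state = 0;     /* CR completing LF CR: skip */
            else { state = 13; return 10; }
            break;
        default:                            /* ordinary byte or EOF */
            state = 0;
            return c;
        }
    }
}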

The next problem is how to respond to the client:
you can either respond with the same linebreak as
the client, or you can choose a fixed style.

--
Kasper Dupont

Terran Melconian

Mar 29, 2001, 9:35:41 AM
In article <3AC33018...@daimi.au.dk>,

Kasper Dupont <kas...@daimi.au.dk> wrote:
>I have not seen any programs using ASCII 13 as a line break.

I believe Macintosh programs do that, though it's been a long time and
I could be remembering wrong.

10 (\n): UNIX
13,10 (\r\n): Microsoft
13 (\r): Macintosh

Lew Pitcher

Mar 29, 2001, 10:14:59 PM
InterFan wrote:
>
> I want to design a TCP server. It just receives a line from client and
> handles it. Then, it sends the response back to the client. The
> problem is that a line of message may end with ASCII character 13, 10
> or 13+10. I use blocking I/O functions. How can I handle these
> conditions smoothly?

Well, why don't you
a) treat both <cr> and <lf> as end-of-line characters.
   That way lines that end with <cr> only will be
   handled correctly, as will lines that end with <lf>
   only.

b) as a special case, if <lf> was preceded by <cr>,
   don't count the <lf> as a line, but silently discard it.
   That way, <cr><lf> is treated as a special <cr>
   terminated line, and not as a <cr> terminated line
   followed by an empty <lf> terminated line.

Of course, this will take a bit of state programming.
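
The state programming mentioned above can be quite small. What follows
is only a minimal sketch of rules (a) and (b), not code from this thread:
the names line_reader and line_reader_feed and the 512-byte limit are
invented for illustration, and the caller is assumed to pass in bytes
obtained from its own blocking read() or recv() loop.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical per-connection state (names are illustrative only). */
struct line_reader {
    char   buf[512];   /* current line, NUL-terminated once complete */
    size_t len;        /* bytes collected so far for the current line */
    int    prev_cr;    /* nonzero if the previous byte was <cr> */
};

/* Feed one received byte.  Returns 1 when a complete line is available
   in r->buf, 0 otherwise.  <cr>, <lf> and <cr><lf> each end one line. */
int line_reader_feed(struct line_reader *r, unsigned char c)
{
    assert(r != NULL);

    if (c == '\n' && r->prev_cr) {
        /* Rule (b): <lf> right after <cr> is silently discarded. */
        r->prev_cr = 0;
        return 0;
    }
    r->prev_cr = (c == '\r');

    if (c == '\r' || c == '\n') {
        /* Rule (a): either character terminates the line. */
        r->buf[r->len] = '\0';
        r->len = 0;            /* the next byte starts a fresh line */
        return 1;
    }

    /* Ordinary byte: store it; overlong lines are truncated. */
    if (r->len < sizeof r->buf - 1)
        r->buf[r->len++] = (char)c;
    return 0;
}
```

Because the line is reported as soon as the <cr> arrives, the server never
has to block waiting to see whether an <lf> follows; the flag is kept in
the struct so a <cr><lf> pair split across two reads is still one line.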

--
Lew Pitcher

Master Codewright and JOAT-in-training
Registered Linux User #112576

Andrew Gierth

Mar 30, 2001, 6:54:40 AM
>>>>> "InterFan" == InterFan <hechu...@sina.com> writes:

InterFan> I want to design a TCP server. It just receives a line from
InterFan> client and handles it. Then, it sends the response back to
InterFan> the client. The problem is that a line of message may end
InterFan> with ASCII character 13, 10 or 13+10. I use blocking I/O
InterFan> functions. How can I handle these conditions smoothly?

the best solution is to define your protocol as using a specific line
terminator (the standard for Internet protocols is CR+LF) and require
all clients to send in that format.
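
With a single mandated terminator, the receiving side reduces to a search
for that one sequence. The following is a rough sketch under that
assumption; crlf_line_length is a made-up helper name, and the buffer is
assumed to hold whatever bytes the server has accumulated so far.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: returns the length of the first line in
   buf[0..len), not counting its CR LF terminator, or -1 if no complete
   CR LF-terminated line has arrived yet.  A lone CR or a lone LF is
   not accepted as a terminator. */
long crlf_line_length(const char *buf, size_t len)
{
    assert(buf != NULL);

    for (size_t i = 0; i + 1 < len; i++) {
        if (buf[i] == '\r' && buf[i + 1] == '\n')
            return (long)i;
    }
    return -1;
}
```

A return value of -1 simply means "read more data and try again"; the
server needs no extra state beyond the accumulated buffer.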

--
Andrew.

comp.unix.programmer FAQ: see <URL: http://www.erlenstar.demon.co.uk/unix/>
or <URL: http://www.whitefang.com/unix/>

Kasper Dupont

Apr 1, 2001, 5:40:47 AM
Andrew Gierth wrote:
>
> >>>>> "InterFan" == InterFan <hechu...@sina.com> writes:
>
> InterFan> I want to design a TCP server. It just receives a line from
> InterFan> client and handles it. Then, it sends the response back to
> InterFan> the client. The problem is that a line of message may end
> InterFan> with ASCII character 13, 10 or 13+10. I use blocking I/O
> InterFan> functions. How can I handle these conditions smoothly?
>
> the best solution is to define your protocol as using a specific line
> terminator (the standard for Internet protocols is CR+LF) and require
> all clients to send in that format.
>

If you are designing a new protocol it would be best to
specify one sequence that is the only way a linebreak is
ever sent. In that case I would prefer using a single
character instead of two, the character 10 is the standard
line break on Linux/Unix systems.

OTOH if you are implementing an existing protocol you
might have to talk to all kinds of existing lousy
implementations. If you want to be able to talk to
everybody, you must be able to handle all kinds of strange
input. Of course, none of your smart tricks for talking to a
lousy implementation may cause your program to fail when
talking to someone who obeys the standard.

--
Kasper Dupont

David Schwartz

Apr 1, 2001, 3:45:00 PM

Kasper Dupont wrote:

> If you are designing a new protocol it would be best to
> specify one sequence that is the only way a linebreak is
> ever sent. In that case I would prefer using a single
> character instead of two, the character 10 is the standard
> line break on Linux/Unix systems.

Sadly, the 'Internet standard line ending' is a carriage return
followed by a newline. So you probably should use it even if designing a
new protocol.

DS

anthony stuckey

Apr 1, 2001, 4:46:07 PM

Why is that sad? It's the correct way to do this.
--
Anthony Stuckey stu...@uiuc.edu
System Administrator, students.uiuc.edu

David Schwartz

Apr 1, 2001, 9:24:37 PM

anthony stuckey wrote:

> > Sadly, the 'Internet standard line ending' is a carriage return
> >followed by a newline. So you probably should use it even if designing a
> >new protocol.
>
> Why is that sad? It's the correct way to do this.

It's sad for three reasons:

1) It wastes a byte.

2) It complicates the process of locating and processing line endings.

3) It adds an extra character that can't legally be part of a line.

DS

Andrew Gierth

Apr 1, 2001, 9:43:45 PM
>>>>> "David" == David Schwartz <dav...@webmaster.com> writes:

>>> Sadly, the 'Internet standard line ending' is a carriage return
>>> followed by a newline. So you probably should use it even if
>>> designing a new protocol.

>> Why is that sad? It's the correct way to do this.

David> It's sad for three reasons:
David> 1) It wastes a byte.
David> 2) It complicates the process of locating and processing line endings.
David> 3) It adds an extra character that can't legally be part of a line.

4) it causes interminable arguments about what to do about broken
clients that only send LF without CR.

Joe Pfeiffer

Apr 1, 2001, 11:04:40 PM
David Schwartz <dav...@webmaster.com> writes:

4) It confuses the logical notion of ``start a new line'' with sending
commands to a device rendering the text.
--
Joseph J. Pfeiffer, Jr., Ph.D. Phone -- (505) 646-1605
Department of Computer Science FAX -- (505) 646-1002
New Mexico State University http://www.cs.nmsu.edu/~pfeiffer
SWNMRSEF: http://www.nmsu.edu/~scifair

Kasper Dupont

Apr 2, 2001, 3:29:34 AM
Joe Pfeiffer wrote:
>
> David Schwartz <dav...@webmaster.com> writes:
>
> > anthony stuckey wrote:
> >
> > > > Sadly, the 'Internet standard line ending' is a carriage return
> > > >followed by a newline. So you probably should use it even if designing a
> > > >new protocol.
> > >
> > > Why is that sad? It's the correct way to do this.
> >
> > It's sad for three reasons:
> >
> > 1) It wastes a byte.
> >
> > 2) It complicates the process of locating and processing line endings.
> >
> > 3) It adds an extra character that can't legally be part of a line.
>
> 4) It confuses the logical notion of ``start a new line'' with sending
> commands to a device rendering the text.

These are exactly the reasons why I don't think you should
use two character line breaks in new protocols.

TCP is a binary stream protocol, you are free to implement
whatever text stream you want on top of that. Two char
line breaks should only be used if you need to be
compatible with existing software.

The two char line breaks are not used everywhere on the
internet. When I fetch a page from my webserver it sends
only 10 as line break.

--
Kasper Dupont

Villy Kruse

Apr 2, 2001, 6:57:01 AM
On Mon, 02 Apr 2001 07:29:34 +0000, Kasper Dupont <kas...@daimi.au.dk> wrote:

>
>The two char line breaks are not used everywhere on the
>internet. When I fetch a page from my webserver it sends
>only 10 as line break.
>

Actually it IS used everywhere on the internet.

HTTP, as well as FTP, SMTP, and Telnet, mandates CRLF line terminators
in the protocol. For ASCII mode FTP that also includes the data files,
which is why the binary/ascii setting is important for an FTP transfer.
The web contents are usually HTML, where any CR and LF is stripped before
display, and image files, which are transferred as is in binary form.
Any mail message is translated to CRLF format before being transferred
via the SMTP protocol, and it is then up to the receiving system to
translate that to whatever format is suitable for this system.
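
The translation to CRLF format described above might look roughly like
this on the sending side. This is an illustrative sketch only, covering
just the line-ending conversion; lf_to_crlf is an invented name and the
caller is assumed to supply a large enough output buffer.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: copies src[0..n) to dst, inserting a CR before
   every LF that is not already preceded by one.  Returns the number of
   bytes written, or (size_t)-1 if dst (capacity cap) is too small. */
size_t lf_to_crlf(const char *src, size_t n, char *dst, size_t cap)
{
    size_t out = 0;

    assert(src != NULL && dst != NULL);

    for (size_t i = 0; i < n; i++) {
        /* A "bare" LF is one not preceded by CR in the input. */
        int bare_lf = src[i] == '\n' && (i == 0 || src[i - 1] != '\r');
        size_t need = bare_lf ? 2 : 1;

        if (out + need > cap)
            return (size_t)-1;
        if (bare_lf)
            dst[out++] = '\r';
        dst[out++] = src[i];
    }
    return out;
}
```

In the worst case every input byte is a bare LF, so an output buffer of
twice the input length is always sufficient.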


Villy

Andrew Gierth

Apr 2, 2001, 7:54:13 AM
>>>>> "Kasper" == Kasper Dupont <kas...@daimi.au.dk> writes:

Kasper> The two char line breaks are not used everywhere on the
Kasper> internet. When I fetch a page from my webserver it sends only
Kasper> 10 as line break.

HTTP relaxes the CRLF requirements for the contents of text entity
bodies only; protocol elements such as header lines must still use
CR+LF.

David Schwartz

Apr 2, 2001, 3:02:17 PM

Kasper Dupont wrote:

> These are exactly the reasons why I don't think you should
> use two character line breaks in new protocols.
>
> TCP is a binary stream protocol, you are free to implement
> whatever text stream you want on top of that. Two char
> line breaks should only be used if you need to be
> compatible with existing software.
>
> The two char line breaks are not used everywhere on the
> internet. When I fetch a page from my webserver it sends
> only 10 as line break.

The relevant standard says, "HTTP/1.1 defines the sequence CR LF as the
end-of-line marker for all protocol elements except the entity-body (see
appendix 19.3 for tolerant applications). The end-of-line marker within
an entity-body is defined by its associated media type, as described in
section 3.7." So you MUST use a CRLF every place the standard mandates
an end-of-line sequence, such as after each header and between the
header and the entity-body.
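
As a rough illustration of that rule, a server could build its response
head as in the sketch below. The function name format_response_head and
the particular status line and header fields are chosen for the example
only; the point is the CR LF after each header line and the empty CR LF
line that separates the header from the entity-body.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical helper: writes a minimal HTTP/1.1 response head into
   dst (capacity cap).  Returns its length, or -1 if it does not fit. */
int format_response_head(char *dst, size_t cap, size_t body_len)
{
    assert(dst != NULL);

    int n = snprintf(dst, cap,
                     "HTTP/1.1 200 OK\r\n"           /* status line  */
                     "Content-Type: text/plain\r\n"  /* header line  */
                     "Content-Length: %zu\r\n"       /* header line  */
                     "\r\n",                         /* end of head  */
                     body_len);

    /* snprintf reports the length it wanted; treat truncation as error. */
    return (n < 0 || (size_t)n >= cap) ? -1 : n;
}
```

The entity-body that follows is sent as is; its own line endings are
governed by the media type, not by this rule.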

DS
