WSAAsyncSelect( sock, hDlg, WM_MY_SOCKET_SELECT, FD_READ | FD_CLOSE | FD_WRITE );
According to the MSDN documentation, FD_CLOSE should *only* be
triggered if there is any other data waiting to be read. My log data
shows that not to be the case. I added an additional "read-until-EOF"
loop inside the FD_CLOSE handler, but now that "trailing data
transfer" is no longer async, since I'm inside a single FD_CLOSE
event at that point. I have log triggers on each event handler, and
during a file transfer of approx 27K, I receive the following sequence:
13:38:10:675 FD_READ Read 1400 bytes
13:38:10:675 FD_READ Read 4096 bytes
13:38:10:675 FD_READ Read 4096 bytes
13:38:10:675 FD_CLOSE close received with 9592 bytes received
13:38:10:675 FD_CLOSE read of 4096 bytes
13:38:10:675 FD_CLOSE read of 4096 bytes
13:38:10:675 FD_CLOSE read of 4096 bytes
13:38:10:675 FD_CLOSE read of 4096 bytes
13:38:10:675 FD_CLOSE read of 1598 bytes
13:38:10:675 FD_CLOSE final read of 0 bytes, closing up...
13:38:10:675 FD_READ Read 0 bytes
13:38:10:675 FD_READ Read 0 bytes
13:38:10:675 FD_READ Read 0 bytes
13:38:10:675 FD_READ Read 0 bytes
13:38:10:675 FD_READ Read 0 bytes
(note that the FD_READ log entries are each separate FD_READ events,
whereas the FD_CLOSE entries come from a single event with multiple
log entries)
So three FD_READs are recv()'ed, then an FD_CLOSE is received after
only 9500 bytes have been recv()'ed. To work around FD_CLOSE being
triggered "early", in that same FD_CLOSE I then enter a recv() loop to
read the rest of the data, until EOF is received. The trailing
FD_READ triggers then appear to be caused by the recv()s being called
in the FD_CLOSE handler, which re-triggers them as there is still data
left in the queue to be read - of course, by the time they actually
run there is no data left to be read, so they just see EOF.
This works, but since there's a loop of recv()s being done inside a
single FD_CLOSE handler, the last "bit" of the transfer is no longer
async. I tried to rework the logic to instead set a "pending" close
flag on FD_CLOSE, but without those ending recv()s in the FD_CLOSE
handler, no additional FD_READs are triggered.
I understand that Winsock 1 had a different model for all this, but it
looks like MS fixed it for Winsock 2 and it *should* work as described
in the documentation.
My initial implementation had no recv() inside FD_CLOSE, as described
in the documentation, because I assumed that FD_CLOSE wouldn't be
asserted while there was still pending data to be read. Perhaps
"pending data" depends on where it's pending inside the network stack
buffers....
I also noted that if I increased the size of my recv() buffer,
FD_CLOSE *would* sometimes be called only after all data had been read
by FD_READ, but it wasn't deterministic.
I've read various newsgroup posts, but not much is brought to light
about possible issues here. One "winsock2 buglist" post did list
FD_CLOSE being triggered before all the data had been read as a known
issue, but not much else was mentioned about it.
Anyone have any experience with this issue?
> I have a socket which has been setup to use Async handlers through a
> dialog's procedure::
>
> WSAAsyncSelect( sock, hDlg, WM_MY_SOCKET_SELECT, FD_READ | FD_CLOSE |
> FD_WRITE );
>
> According to the MSDN documentation, FD_CLOSE should *only* be
> triggered if there is any other data waiting to be read. [...]
Did you mean to write "if there ISN'T any other data..."?
That would be more consistent with the documentation, as well as what your
concern appears to be.
And, I would agree that the behavior you describe looks wrong to me. You
should only get FD_CLOSE after you've gotten an FD_READ with 0 bytes
read. But, you didn't post any code, so there's no way to know for sure
whether the bug is in your code or Windows.
You should post a concise-but-complete code sample that reliably
demonstrates the problem. There's no way to comment on the behavior in a
useful way if we have to guess what your code looks like.
Pete
> You should post a concise-but-complete code sample that reliably
> demonstrates the problem. There's no way to comment on the behavior in a
> useful way if we have to guess what your code looks like.
Here's some code snippets:
// Read from socket, write to file
static int do_sockread_filewrite(SOCKET insock, HANDLE outfile, HWND hDlg) {
    BYTE abIn[4096];    // Buffer to read data into
    int uLength;

    uLength = recv( insock, (LPSTR) abIn, sizeof(abIn), 0 );
    if (uLength > 0) {
        DWORD nWritten;
        if ( WriteFile(hDataFile, abIn, uLength, &nWritten, NULL) &&
             nWritten != (DWORD) uLength ) {
            // failed, log failure.. etc
            uLength = 0;
        } else {
            // Success
            lReadCount += (LONG)uLength;
        }
    }
    return(uLength);
}
// MY_SOCKET_SELECT handler:
static void OnSocketSelect(HWND hDlg, SOCKET sock, SOCKERR serr, SOCKEVENT sevent) {
    int uLength;

    if (serr) {
        // Log error, and close everything out
        PostMessage(hDlg, ....);
        return;
    }

    // Handle the event
    switch (sevent) {
    case FD_READ:
        uLength = do_sockread_filewrite(sock, hDataFile, hDlg);
        Log("FD_READ", "Read %ld bytes", uLength);
        if (uLength == 0) {
            // EOF
        } else if (uLength == SOCKET_ERROR) {
            // handle error
        }
        break;
    case FD_CLOSE:
        Log("FD_CLOSE", "close received with %d bytes received", lReadCount);
        // Read any remaining data on the socket
        while (1) {
            int bytes = do_sockread_filewrite(sock, hDataFile, hDlg);
            if (bytes > 0) {
                Log("FD_CLOSE", " read of %d bytes", bytes);
            } else if (bytes == 0) {
                Log("FD_CLOSE", "final read of 0 bytes, closing up...");
                break;
            } else {
                // error
                Log("FD_CLOSE", "error value %d, winsock error=%d", bytes, ...);
                break;
            }
        }
        // Socket should now be closed; post back to our window loop
        // to complete the dialog
        PostMessage(hDlg, ....);
        break;
    }
}
> [...]
>> You should post a concise-but-complete code sample that reliably
>> demonstrates the problem. There's no way to comment on the behavior in
>> a
>> useful way if we have to guess what your code looks like.
>
> Here's some code snippets:
And here are some of my favorite web pages:
http://www.yoda.arachsys.com/csharp/complete.html
http://www.yoda.arachsys.com/csharp/incomplete.html
http://sscce.org/
The only obvious bug I found in the code you posted is here:
> if ( WriteFile(hDataFile, abIn, uLength, &nWritten, NULL) &&
> nWritten != (DWORD) uLength )
I think you meant:
> if ( !WriteFile(hDataFile, abIn, uLength, &nWritten, NULL) ||
> nWritten != (DWORD) uLength )
Otherwise you'll return a successful read even if you fail to write to the
file.
Of course, IMHO it's a very bad idea to be spoofing a network i/o error
(by setting the receive count to 0) just because you had a file system i/o
error. Your network code relies on the return value from this function to
mean a very specific thing, but when you do that, it doesn't mean that
very specific thing.
But at the very least, it seems like if an incorrect-length write
justifies returning 0 bytes received, then so too should an outright
failure of WriteFile().
Keep in mind that when dealing with networking code, a
concise-but-complete code sample requires code for both endpoints.
Pete
Suffice it to say, though, that in testing I'm not getting WriteFile
I/O errors, so that's not the problem with the FD_CLOSE issues. My
real code has additional logging in it to point out if there was such
an error from WriteFile.
> Keep in mind that when dealing with networking code, a
> concise-but-complete code sample requires code for both endpoints.
Getting that together will take some more time. The sender side is a
different application altogether and is "straight" sockets in C on
Unix. It's just a simple listen()/accept()/write() loop/close() type
program.
Thanks for responding, I guess I'll just plunk around with it some
more. I would like to build a standalone working test case to show
the issue, but I'm not sure I've got the time.
> [...]
> Thanks for responding, I guess I'll just plunk around with it some
> more. I would like to build a standalone working test case to show
> the issue, but I'm not sure I've got the time.
That's your prerogative. Just keep in mind that the quality of answers
depends a lot on the quality of questions. If you're happy hoping that
someone's seen similar behavior and that their experience applies in your
case, and that they happen to be one of the people reading your question,
then doing without a proper code sample is just fine.
My last useful input that I can think of at the moment is simply to point
out that I've written code with WSAAsyncSelect() and did not run into the
issue you're describing. I have run that code on all versions of Windows
from 95 through Vista, and it's worked fine. So, it seems to me that
you're either dealing with a bug in your own code, or you've got some
third-party thing installed on your computer that is somehow interfering
with the normal behavior.
Pete
I'm building a template test case now, at least for the receiving
side. Not sure what to do about the sender, although it might be
possible to show my issue with a simple HTTP fetch using my same
logic.
I never had an excuse to have a "template" Win32 skeleton app sitting
around; now I do. So that's something I'm getting done even if I
can't figure it out.
>
> My last useful input that I can think of at the moment is simply to point
> out that I've written code with WSAAsyncSelect() and did not run into the
> issue you're describing. I have run that code on all versions of Windows
> from 95 through Vista, and it's worked fine. So, it seems to me that
> you're either dealing with a bug in your own code, or you've got some
> third-party thing installed on your computer that is somehow interfering
> with the normal behavior.
Actually, that is helpful to know, and it shows that the
documentation should be trusted and works correctly for others.
Perhaps it's that I've got VMware Workstation installed on this
development machine as well, and it's doing something funky to the
network stack. More likely it's a bug in my code, so building a test
case will help point fingers or at least rule things out.
Have you had to deal with *any* data being unread when FD_CLOSE is
triggered? Or in your experience FD_CLOSE never triggers (unless an
error condition) if there is still data to be read?
> [...]
> Have you had to deal with *any* data being unread when FD_CLOSE is
> triggered? Or in your experience FD_CLOSE never triggers (unless an
> error condition) if there is still data to be read?
I have only ever seen FD_CLOSE happen when the connection was reset or
shutdown gracefully. Now, I also wasn't using WSAAsyncSelect() for
transmissions where a large amount of data was being sent from one end all
at once...the application was more of a transactional thing. So there are
some functional differences between the application I wrote and what
you're apparently doing.
But even so, I still think it's more likely there's a bug or config issue
here. I think that if Winsock was _normally_ doing what you're
describing, all sorts of applications wouldn't work right (even
acknowledging that most Winsock applications probably don't use
WSAAsyncSelect()).
Pete
My sender is not waiting for any sort of ACK after all the data is
received, it just does:

listen()
accept()
while data {
    write(data)
}
close()
So when you say you've only seen FD_CLOSE on a reset or a graceful
shutdown, I'm still reading the documentation to say that FD_CLOSE
would be delayed until all data was recv()'ed. My logging indicates
that FD_CLOSE asserts while data is still in the queue waiting to be
read. But as you said, if that were the case, a lot of
WSAAsyncSelect()-based apps would be borked.
I should have a test program tomorrow. I'm interested in testing it.
> My sender is not waiting for any sort of ACK after all the data is
> received, it just does:
>
> listen()
> accept()
> while data {
> write(data)
> }
> close()
No "shutdown()"? No "recv()" to block until the client also calls
"shutdown()"?
I think at the very least, the sender ought to be calling "shutdown()".
If that's what the sender really looks like, then it may be the behavior
you're seeing is by design. In particular, the docs are somewhat vague,
but I read this phrase -- "when all the received data has been read if
this is a graceful close" -- to mean that FD_CLOSE is guaranteed to be
delayed until all received data has been read only in the graceful close
scenario.
Note that the docs also say "an application should check for remaining
data upon receipt of FD_CLOSE to avoid any possibility of losing data".
So while the docs say that you should get FD_CLOSE last in a graceful
close scenario, you should also code defensively, in case of other
scenarios (in case you want to get all the data...if you don't care about
that -- and in a connection reset scenario, maybe you don't -- then you
can ignore that advice).
Pete
> I think at the very least, the sender ought to be calling "shutdown()".
I had shutdown(...,2) after all write()s on the sender, but I wanted
to see if there would be any difference on the client side with
regard to FD_CLOSE sequencing, so I took it out. No change.
> Note that the docs also say "an application should check for remaining
> data upon receipt of FD_CLOSE to avoid any possibility of losing data".
> So while the docs say that you should get FD_CLOSE last in a graceful
> close scenario, you should also code defensively, in case of other
> scenarios (in case you want to get all the data...if you don't care about
> that -- and in a connection reset scenario, maybe you don't -- then you
> can ignore that advice).
Right, I agree it should be coded a bit more defensively, just for
this test I was marking it up quick and dirty.
I have a set of sample code ready. It shows the same behavior I was
seeing in my own app, but it does it connecting to any webserver.
Using HTTP/1.0 (non-persistent connections), you see the same
behavior: a couple of FD_READs, followed by an FD_CLOSE, followed by
a bunch more FD_READs. It's not the cleanest code, but it shows
what's going on.
I have a copy of it in a .ZIP file if you'd like to take a look.
http://www.stup.net/corey/sample.zip Source/VC6 project/executable.
The source is also pasted at: http://rafb.net/p/0lOUGy69.html (expires
in 24 hours)
Run as follows:
sample.exe <hostname> <port>
ie:
sample.exe www.yoda.arachsys.com 80
If you don't pass arguments it segfaults. Ugh, sorry about that.
> [...]
> I have a set of sample code ready. It shows the same behavior I was
> seeing in my own app, but it does it connecting to any webserver.
> Using HTTP/1.0 (non-persistent connections), you see the same
> behavior: a couple of FD_READs, followed by an FD_CLOSE, followed by
> a bunch more FD_READs. It's not the cleanest code, but it shows
> what's going on.
Okay...well, unfortunately (I think?) I don't see any obvious bug in the
code you posted. I tested it on my current Windows installation, which is
Windows 7, and it does exactly what you said it would.
I've got installations of 98, 2K, and XP that I can try it on later, which
I'll do when I have time just to see. But it looks like a bug to me.
What OS are you using?
I think this statement in the documentation is reasonably definitive: "Be
aware that, if data has been received and is waiting to be read when the
remote system initiates a graceful close, the FD_CLOSE is not delivered
until all pending data has been read".
I don't see any way to read that other than that you should _not_ be
getting FD_CLOSE until you reach the end of the stream, and I know that
the last time I did any Winsock stuff (admittedly, several years ago now),
it was behaving as I expected (but as I mentioned, my use case wasn't
exactly the same as yours).
All that said, I'd feel a lot more comfortable with a TRUE
concise-but-complete code sample. I think that what you've got is a
reasonable proof-of-concept, but it still has this unknown variable in it
(the third-party server to which you connect).
Anyway, that's all I've got for now. There are others who read this
newsgroup who know a lot more about the particulars of TCP/IP and TCP
specifically, and they may be able to offer some insight (and perhaps even
explain why the docs are wrong and your observed behavior is right).
Pete
The documentation explicitly allows for `FD_CLOSE' to be received
before all remaining data has been read by the application:
`The FD_CLOSE message is posted when a close indication is received for the
virtual circuit corresponding to the socket. In TCP terms, this means that
the FD_CLOSE is posted when the connection goes into the TIME WAIT or CLOSE
WAIT states. This results from the remote end performing a shutdown on the
send side or a closesocket. FD_CLOSE should only be posted after all data is
read from a socket, but an application should check for remaining data upon
receipt of FD_CLOSE to avoid any possibility of losing data.'
This tells me that if the `FD_CLOSE' was in response to a graceful close,
then the application should still continue to execute `recv' until a zero
byte receive has occurred.
Anyway, I have not ever used `WSAAsyncSelect'; however, the following
hack to your code (e.g., the `OnSocketSelect' procedure) "seems" to
"work" for me:
// Async socket code begins here
VOID OnSocketSelect(HWND hWnd, SOCKET sock, SOCKERR serr, SOCKEVENT sevent)
{
    char s[255];
    char buf[256];
    int rc;
    static char status[10] = "active";

    if (serr) {
        AddReportEntry(hWnd, TEXT("error in async handler"));
    }
    switch (sevent) {
    case FD_CONNECT:
        AddReportEntry(hWnd, TEXT("FD_CONNECT"));
        break;
    case FD_READ:
        AddReportEntry(hWnd, TEXT("FD_READ"));
        // Receive a single buffer
        rc = recv(sock, buf, sizeof(buf)-1, 0);
        if (rc > 0) {
            // nul-terminate the string so we can log it
            buf[rc] = '\0';
            AddReportEntry(hWnd, TEXT("FD_READ(%s): read \"%s\" (%d bytes)"), status, buf, rc);
        } else if (rc == 0) {
            AddReportEntry(hWnd, TEXT("FD_READ(%s): EOF read"), status);
            shutdown(sock, SD_RECEIVE);
            PostMessage(hWnd, WM_DO_ACTION, ACT_DISCONNECT, 0);
        } else {
            AddReportEntry(hWnd, TEXT("FD_READ(%s): error %d"), status, WSAGetLastError());
            shutdown(sock, SD_RECEIVE);
            PostMessage(hWnd, WM_DO_ACTION, ACT_DISCONNECT, 0);
        }
        break;
    case FD_WRITE:
        AddReportEntry(hWnd, TEXT("FD_WRITE"));
        // Send an HTTP request. By forcing an HTTP 1.0 request, the HTTP
        // server will hang up on us immediately after sending all the
        // data, duplicating our server's behavior
        sprintf(s, "GET / HTTP/1.0\r\nHost: %s\r\n\r\n", hostname);
        if ((rc = send(sock, s, strlen(s), 0)) == SOCKET_ERROR) {
            AddReportEntry(hWnd, TEXT("Error %d sending request %s"), rc, s);
        } else {
            AddReportEntry(hWnd, TEXT("Sent \"%s\" (%d bytes)"), s, strlen(s));
        }
        shutdown(sock, SD_SEND);
        break;
    case FD_CLOSE:
        AddReportEntry(hWnd, TEXT("FD_CLOSE"));
        // Trigger a shutdown of the socket
        // PostMessage(hWnd, WM_DO_ACTION, ACT_DISCONNECT, 0);
        strcpy(status, "inactive");
        // Receive a single buffer
        rc = recv(sock, buf, sizeof(buf)-1, 0);
        if (rc > 0) {
            // nul-terminate the string so we can log it
            buf[rc] = '\0';
            AddReportEntry(hWnd, TEXT("FD_READ / FD_CLOSE(%s): read \"%s\" (%d bytes)"), status, buf, rc);
        } else if (rc == 0) {
            AddReportEntry(hWnd, TEXT("FD_READ / FD_CLOSE(%s): EOF read"), status);
            shutdown(sock, SD_RECEIVE);
            PostMessage(hWnd, WM_DO_ACTION, ACT_DISCONNECT, 0);
        } else {
            AddReportEntry(hWnd, TEXT("FD_READ / FD_CLOSE(%s): error %d"), status, WSAGetLastError());
            shutdown(sock, SD_RECEIVE);
            PostMessage(hWnd, WM_DO_ACTION, ACT_DISCONNECT, 0);
        }
        break;
    }
}
I am always getting a single `FD_READ / FD_CLOSE(inactive): EOF' or
`FD_READ(inactive): EOF' message, and a single corresponding `Closing socket
N' message. If I comment out the single call to recv within the `FD_CLOSE'
handler, I never see any EOF or Closing socket messages.
BTW, is there any particular reason why you are using `WSAAsyncSelect'?
Also, you should not really issue any `send()' calls in response to
`FD_CONNECT'. You will automatically receive a `FD_WRITE' message which
corresponds to the `FD_CONNECT'. Only then should you send data. Think of it
as if the `FD_CONNECT' message automatically calls `send()' and gets a
return value of `WSAEWOULDBLOCK'...
> I am always getting a single `FD_READ / FD_CLOSE(inactive): EOF' or
> `FD_READ(inactive): EOF' message, and a single corresponding `Closing socket
> N' message. If I comment out the single call to recv within the `FD_CLOSE'
> handler, I never see any EOF or Closing socket messages.
Thanks for replying Chris.
What I was trying to do was make FD_READ events *read* and FD_CLOSE
events *close*, but that doesn't seem to be the case. I follow what
you're pointing out in the documentation, but it still seems to imply
that "remaining data" after an FD_CLOSE wouldn't be the norm. The
wording "FD_CLOSE should only be posted after all data is read from a
socket, but..." is a bit of a red flag, I guess.
With your changes, it seems that what's happening is that when
FD_CLOSE is asserted "early", that FD_CLOSE recv() basically "primes
the pump" for another FD_READ event, if there is any more data past
the FD_CLOSE recv() that you performed. So you go back to FD_READ
events - then later, when a couple more FD_READs are received, you're
now handling the EOF in FD_READ, vs. strictly handling it in
FD_CLOSE. If you look at your changes, your FD_READ and FD_CLOSE
handlers are basically the same thing. Both must recv(), and both
must handle EOF or error as well. As you said, this should work;
it's just different than I expected from the way it's described in
the documentation.
> BTW, is there any particular reason why you are using `WSAAsyncSelect'?
It seemed like the cleanest approach to a single thread based async
socket i/o, tied through a window/dialog's message pump. I needed my
code to work on Win9x and up as well.
Thanks again!
>> BTW, is there any particular reason why you are using `WSAAsyncSelect'?
> It seemed like the cleanest approach to a single thread based async
> socket i/o, tied through a window/dialog's message pump. I needed my
> code to work on Win9x and up as well.
Just my $0.02 here... You are asking for the 2.2 interface in your
code example, and there is MsgWaitForMultipleObjects():
http://msdn.microsoft.com/en-us/library/ms684242(VS.85).aspx
MS says the minimum is Win2K, but that isn't true; it's available on
all 32-bit versions, including Win32s. That would make a nicer
message pump for you, so you can use overlapped sockets with event
notification.
MsgWaitForMultipleObjects() documentation doesn't indicate that it can
wait on a socket directly, so would one associate an event with
WSAEventSelect() and then use MsgWaitForMultipleObjects() on that
event? Or am I missing something?
Actually, I'm doing something similar to that in a WinCE version of
my application. Since WSAAsyncSelect() doesn't exist on CE, I
created my own version of it. I create a thread that creates an
event and WSAEventSelect()'s that socket. The thread then
WaitForMultipleObjects() on a "time to die" event and the socket
event. For any of the FD_XXX events received on the socket event,
the thread PostMessage()'s back to the window/dialog that created
it. It uses the same wparam/lparam protocol that WSAAsyncSelect()
uses. The dispatched routine through the window/dialog's pump is
unchanged; it looks just like WSAAsyncSelect()'s events.
No, you got it: WSAEventSelect()
> Actually, I'm doing something similar to that in a WinCE version of
> my application. Since WSAAsyncSelect() doesn't exist on CE, I
> created my own version of it. I create a thread that creates an
> event and WSAEventSelect()'s that socket. The thread then
> WaitForMultipleObjects() on a "time to die" event and the socket
> event. For any of the FD_XXX events received on the socket event,
> the thread PostMessage()'s back to the window/dialog that created
> it. It uses the same wparam/lparam protocol that WSAAsyncSelect()
> uses. The dispatched routine through the window/dialog's pump is
> unchanged; it looks just like WSAAsyncSelect()'s events.
If this were me, I wouldn't go that way. Your pump gets a bit
different, and you handle the socket events right there. But that's
just me.
> On Mon, 26 Jan 2009 17:13:32 -0800, Corey Stup <Core...@gmail.com>
> wrote:
>
>> [...]
>> I have a set of sample code ready. It shows the same behavior I was
>> seeing in my own app, but it does it connecting to any webserver.
>> Using HTTP/1.0 (non-persistent connections), you see the same
>> behavior: a couple of FD_READs, followed by an FD_CLOSE, followed by
>> a bunch more FD_READs. It's not the cleanest code, but it shows
>> what's going on.
>
> Okay...well, unfortunately (I think?) I don't see any obvious bug in the
> code you posted. I tested it on my current Windows installation, which
> is Windows 7, and it does exactly what you said it would.
>
> I've got installations of 98, 2K, and XP that I can try it on later,
> which I'll do when I have time just to see. But it looks like a bug to
> me. What OS are you using?
Not sure if you're still following this, Corey, but just a quick
follow-up...
I got around to testing this on my other computers, except for Win98. I
couldn't figure out the correct modifications to make to the VS2008
default build settings to let the executable run on Win98; I always got a
dialog saying that the executable was built for a newer version of the OS
and that I should upgrade.
But, for 2K and XP, in addition to 7, I was able to confirm the behavior.
It seems pretty clear that at least for Vista I'd have gotten the same
results, if I'd tried it on that.
So, the fact that I never noticed that behavior must be entirely due
to the way my program uses the network, as opposed to a "lengthy"
download as you'd get from an HTTP server, for example.
I still agree that it doesn't match what the docs say. But I don't know
which is wrong...the behavior or the docs. Given how long it's been in
there, I'd say that at this point even if it was a bug in the OS, it's
probably really a bug in the docs now. :)
Pete
Well, AFAICT, the docs clearly state that `FD_CLOSE' can indeed be
encountered while there is still more data to be read:
"The FD_CLOSE message is posted when a close indication is received for the
virtual circuit corresponding to the socket. In TCP terms, this means that
the FD_CLOSE is posted when the connection goes into the TIME WAIT or CLOSE
WAIT states. This results from the remote end performing a shutdown on the
send side or a closesocket. FD_CLOSE should only be posted after all data is
read from a socket, but an application should check for remaining data upon
receipt of FD_CLOSE to avoid any possibility of losing data."
Pay attention to the last sentence. I hacked Corey's sample
application into a form that works well for me. Issuing a final recv
upon reception of `FD_CLOSE' sure seems to do the trick, and it
follows the docs' advice as well; that's probably why it "seems" to
work...
Also, one needs to determine if `FD_CLOSE' was issued by a graceful shutdown
condition or not. If it was indeed graceful, well, the docs basically tell
you to post an additional recv.
> "Peter Duniho" <NpOeS...@nnowslpianmk.com> wrote in message
> news:op.uolwx...@macbook-pro.local...
>> [...]
>> I still agree that it doesn't match what the docs say. But I don't
>> know which is wrong...the behavior or the docs. Given how long it's
>> been in there, I'd say that at this point even if it was a bug in the
>> OS, it's probably really a bug in the docs now. :)
>
> Well, AFAICT, the docs clearly state that `FD_CLOSE' can indeed be
> encountered while there is still more data to be read:
I feel the docs _also_ clearly state that for this to happen is a bug.
> "The FD_CLOSE message is posted when a close indication is received for
> the virtual circuit corresponding to the socket. In TCP terms, this
> means that the FD_CLOSE is posted when the connection goes into the TIME
> WAIT or CLOSE WAIT states. This results from the remote end performing a
> shutdown on the send side or a closesocket. FD_CLOSE should only be
> posted after all data is read from a socket, but an application should
> check for remaining data upon receipt of FD_CLOSE to avoid any
> possibility of losing data."
>
> Pay attention to the last sentence. I hacked Corey's sample
> application into a form that works well for me. Issuing a final recv
> upon reception of `FD_CLOSE' sure seems to do the trick, and it
> follows the docs' advice as well; that's probably why it "seems" to
> work...
I read all that. I still think it's a bug, either in the docs or the
library.
Even the text you quoted specifically says "FD_CLOSE should only be posted
after all data is read from a socket", which is clearly not happening
here. Immediately after that, the docs say "Be aware that the application
will only receive an FD_CLOSE message to indicate closure of a virtual
circuit, and only when all the received data has been read if this is a
graceful close". So, for a graceful close (i.e. shutdown() was called),
"ONLY when all the received data has been read" is the condition for
getting FD_CLOSE.
Elsewhere in the same page, it says "After remote system initiated
graceful close, when no data currently available to receive (Be aware
that, if data has been received and is waiting to be read when the remote
system initiates a graceful close, the FD_CLOSE is not delivered until all
pending data has been read)", which seems pretty black & white to me (not
only do they specifically say you won't get FD_CLOSE until no data is
available to receive, they even put a parenthetical warning reiterating
the point).
> Also, one needs to determine if `FD_CLOSE' was issued by a graceful
> shutdown condition or not. If it was indeed graceful, well, the docs
> basically tell you to post an additional recv.
The closest they come is to essentially say "the provider _should_ only
post FD_CLOSE once all the data's been read, but just in case, you ought
to do the defensive thing and check for more data anyway". It's true that
calling recv() kick-starts the FD_READ pump again (since one of the
triggering conditions is to call recv() and have more data remaining than
is returned in that call). So your work-around is reasonable.
But that's a long way from saying that the behavior is as documented.
Pete
> Elsewhere in the same page, it says "After remote system initiated
> graceful close, when no data currently available to receive (Be aware
> that, if data has been received and is waiting to be read when the remote
> system initiates a graceful close, the FD_CLOSE is not delivered until all
> pending data has been read)", which seems pretty black & white to me (not
> only do they specifically say you won't get FD_CLOSE until no data is
> available to receive, they even put a parenthetical warning reiterating
> the point).
I think the reason for the apparent conflict is that here they are
describing how their own stack generates FD_CLOSE and elsewhere they
are explaining FD_CLOSE itself.
DS
Well, even ignoring that an API document really should only talk about the
specification of the API, not implementation details (unless those
implementation details are in fact part of the specification), as I
pointed out in my previous reply, the text you're quoting me as quoting is
not the only place they say that. In fact, the "elsewhere" to which you
are presumably referring includes a statement to pretty much the same
effect.
As I said: there's a bug here somewhere. Either the docs are wrong or the
implementation is. At this point, the implementation has stood for so
long that IMHO that makes it likely the de facto standard. But one way or
the other Microsoft needs to decide and fix _something_.
Pete
I'm still following. I ended up doing basically the same thing Chris
recommended, and my FD_CLOSE and FD_READ event handlers are the same
thing now:

case FD_READ:
case FD_CLOSE:
    rc = recv(...);
    if (rc > 0) {          // handle data
    } else if (rc == 0) {  // signal EOF
    } else {               // handle error
    }
    break;

This way, even though the FD_CLOSE comes "early", the corresponding
recv() not only processes data, but primes the pump for more FD_READ
events to occur, until EOF is read. This works and is handled
properly.
> I still agree that it doesn't match what the docs say. But I don't know
> which is wrong...the behavior or the docs. Given how long it's been in
> there, I'd say that at this point even if it was a bug in the OS, it's
> probably really a bug in the docs now. :)
I agree with you. While I concede that the docs mention that one
must be prepared to recv() data even after an FD_CLOSE, what stood out
to me was the earlier statement about FD_CLOSE not asserting while
data was still waiting to be read. At best, I found it misleading.
Perhaps a reference implementation provided in MSDN would clarify what
they meant to say.
I appreciate everyone's input on the matter!
I'm not really sure how to implement MsgWaitForMultipleObjects() when
I've got nested message pumps going, and can have more than one async
socket being processed at the same time.
In my implementation, I have a main window that has an async socket.
To do a download, that window starts a dialog (via DialogBoxParam()),
which then creates another socket with its own async handlers.
In some of the references of MsgWaitForMultipleObjects() I found
online, it seemed to be 50-50 about whether it was a good idea to use
it for event sockets, as it required some semi-major surgery to all
the message pumps.
I tried your approach and it doesn't work. You will not always get
informed about EOF. Try different buffer sizes and you will see what I
mean.
My solution was:
case FD_READ:
{
    if (eofReached)
    {
        break;
    }

    char buffer[bufferSize];
    int res = recv(sock, buffer, sizeof(buffer), 0);
    std::cout.write(buffer, res);
}
break;

case FD_CLOSE:
{
    WSAAsyncSelect(sock, windowHandle, 0, 0);

    for (;;)
    {
        char buffer[bufferSize];
        int res = recv(sock, buffer, sizeof(buffer), 0);

        if (res > 0)
        {
            std::cout.write(buffer, res);
        }
        else if (res == 0)
        {
            std::cout << "EOF\n";
            eofReached = true;
            closesocket(sock);
            break;
        }
    }
}
break;
> I tried your approach and it doesn't work. You will not always get
> informed about EOF.
I think you failed to flesh out his example correctly.
> Try different buffer sizes and you will see what i
> mean.
>
> My solution was:
>
> case FD_READ:
> {
> if(eofReached)
> {
> break;
> }
>
> char buffer[bufferSize];
> int res = recv(sock, buffer, sizeof(buffer), 0);
> std::cout.write(buffer, res);
> }
> break;
> case FD_CLOSE:
> {
> WSAAsyncSelect(sock, windowHandle, 0, 0);
>
> for(;;)
> {
> char buffer[bufferSize];
> int res = recv(sock, buffer, sizeof(buffer), 0);
>
> if(res > 0)
> {
> std::cout.write(buffer, res);
> }
> else if(res == 0)
> {
> std::cout << "EOF\n";
> eofReached = true;
> closesocket(sock);
> break;
> }
> }
> }
> break;
Your code is just his, bastardized a bit and fleshed out. Your code is
a special case of his code, where the commented code is replaced with
what his comments say the code should do and FD_CLOSE and FD_READ are
separated just to add extra fragility. (For example, his code will
work even if it does not get an FD_CLOSE, yours will not, since you
don't check the return value from 'recv' in the FD_READ case.)
DS
Humm... What exactly do different buffer sizes have to do with anything
wrt this subject at hand?
> My solution was:
> [...]
Why are you calling `recv()' in `FD_CLOSE' multiple times? You only really
need to call it once in response to a `FD_CLOSE' generated by a graceful
shutdown.
I will post an example this weekend.
> The problem is that a FD_READ event will NOT be posted when recv would
> return zero.
So what? His code treated FD_READ and FD_CLOSE the same. In either
case, he'll call 'recv', and if it returns zero, treat it as a normal
connection close. And if neither a FD_READ or FD_CLOSE is generated,
your code won't work either.
DS
> Why are you calling `recv()' in `FD_CLOSE' multiple times? You only really
> need to call it once in response to a `FD_CLOSE' generated by a graceful
> shutdown.
The problem is that some providers, non-Microsoft providers, will
issue an FD_CLOSE while there is still data left to be received. If
you just call 'recv' once, you'll get the data. No new events will
ever be generated for the socket because its closure has already been
reported.
This is a required workaround to a documented bug.
DS
http://www.nopaste.com/p/abrvoacufb
If you compile it (you can use the project I made with the sample.zip
setup I posted earlier in this thread), and run in the same manner,
you get the correct behavior. FD_CLOSE happens "early", but
because you're processing the data on every recv() (in this case, just
logging it), and then do your EOF/error processing if EOF or error
happens on EITHER FD_READ or FD_CLOSE, it works as expected.
Can you remember where you read that particular documentation?
Thanks David.
> > The problem is that some providers, non-Microsoft providers, will
> > issue an FD_CLOSE while there is still data left to be received. If
> > you just call 'recv' once, you'll get the data. No new events will
> > ever be generated for the socket because its closure has already been
> > reported.
> > This is a required workaround to a documented bug.
> Can you remember where you read that particular documentation?
You can find it earlier in this thread.
"[A]n application should check for remaining data upon receipt of
FD_CLOSE to avoid any possibility of losing data."
FD_CLOSE has no re-enabling function. If an implementation sends you
an FD_CLOSE before all data has been read, which the documentation
specifically allows, there is no documented way to re-arm the read
engine to send you a new indication.
DS
I tried your example and it has the problem! Try a buffer size of 3
and do multiple tests. Sometimes you will get EOF, sometimes you won't
get it.
Indeed. Well, you have to call `recv()' in a loop in response to `FD_CLOSE'.
shi% happens! Now, I am glad I never used `WSAAsyncSelect()'.
Anyway, try this one:
http://www.nopaste.com/p/a7adzy44j
I cannot get it to fail on my end. Can you?
Yup; you're right and I am wrong. I thought that one could reliably re-enable
the pump. Here is a hack to Corey's code:
http://www.nopaste.com/p/a7adzy44j
Unfortunately, it calls `recv()' in a loop during `FD_CLOSE' handler.
My apologies to the OP!
;^(...
It does not work all the time. It seemed to work, but has random failures.
Ugh. I'm seeing the same thing. I thought the pump was being
reprimed reliably as well, and it seems to be as long as the buffers
are being emptied fast enough. Using the small buffer size does
indeed show that there is loss of (at least one) FD_READ event that
would detect the final EOF. And of course reading the remaining data
in a single FD_CLOSE handler isn't async...
So much for that. Anyone have any other ideas? A separate thread
with a normal select()?
> [...]
> So much for that. Anyone have any other ideas? A separate thread
> with a normal select()?
That's certainly one reasonable alternative, especially if you are dealing
with 64 or fewer sockets. Or you could go whole-hog and use IOCP.
Of course, you could just change your code so that it drains the socket in
response to FD_CLOSE. Rather than relying on Winsock to post the FD_READ
notifications, just post them yourself until recv() returns 0, after
you've seen FD_CLOSE (you can call WSAAsyncSelect() again when you get
FD_CLOSE to disable FD_READ notifications, so that you don't get duplicate
messages).
Do you have the option of using .NET? The .NET Socket API is actually
_really_ nice (IMHO, of course), and even wraps up IOCP for you for the
asynchronous version.
Pete
> On Feb 3, 5:49 pm, "Chris M. Thomasson" <n...@spam.invalid> wrote:
Can you clarify what you mean by "non-Microsoft providers"? I was able to
reproduce the issue on three different operating system versions, and had
I bothered to figure out what I needed to fix in my VS project settings to
get a compiled version that runs on Win98, it'd probably be four.
AFAIK, on every single one of those systems, the only Winsock
implementation is the one provided by Microsoft.
Pete
> Of course, you could just change your code so that it drains the socket in
> response to FD_CLOSE. Rather than relying on Winsock to post the FD_READ
> notifications, just post them yourself until recv() returns 0, after
> you've seen FD_CLOSE (you can call WSAAsyncSelect() again when you get
> FD_CLOSE to disable FD_READ notifications, so that you don't get duplicate
> messages).
Right. I may go that route just to make it work, but then we're no
longer async and my UI thread doesn't get updated (think: progress bar
of transfer). When transferring say a 3MB file, if FD_CLOSE didn't
come until the last couple recv()'s, that wouldn't be that big of a
deal, but I'm seeing it triggered *way* early in the stream. Since
I'm also controlling the sender, I could always put an artificial
delay after all the write()s complete before it closes the socket, but
that's a poor workaround.
> Do you have the option of using .NET? The .NET Socket API is actually
> _really_ nice (IMHO, of course), and even wraps up IOCP for you for the
> asynchronous version.
Unfortunately no. I'm at the 11th hour to get this completed and
porting to .NET isn't an option currently. Plus there are .NET
distribution issues on Win98 and Win2K (and even XP). Oh, and don't
even get me started on .NET CF (for CE).
Where do you see the problem? When FD_CLOSE is posted, no new data will
come in; you only have to drain the internal receive buffer once.
As far as I can see, you used the same workaround which I already
posted before. ;-)
The issue is the recv() loop inside FD_CLOSE is no longer posting
async "chunks" for progress indication back to my GUI thread. So in
my tests of transferring say a 200K file, I'm getting FD_CLOSE with
about 60-100K left in the transfer. That last batch of recv()s isn't
allowing the message pump to run, so the GUI looks hung.
> [...]
>> Of course, you could just change your code so that it drains the socket
>> in
>> response to FD_CLOSE. Rather than relying on Winsock to post the
>> FD_READ
>> notifications, just post them yourself until recv() returns 0, after
>> you've seen FD_CLOSE (you can call WSAAsyncSelect() again when you get
>> FD_CLOSE to disable FD_READ notifications, so that you don't get
>> duplicate
>> messages).
>
> Right. I may go that route just to make it work, but then we're no
> longer async and my UI thread doesn't get updated (think: progress bar
> of transfer).
I'm not sure why you say "we're no longer async". My proposal is to
_post_ FD_READ messages yourself, once you've seen an FD_CLOSE (I'm
thinking static local variable here, but just set a flag
somewhere...doesn't really matter where). In other words, simulate
exactly the behavior that recv() ought to do on its own. It's not clear
to me why Winsock doesn't continue to trigger FD_READ messages until
there's no more data to read (that's how I interpret the concerns about
the work-arounds posted so far...sounds like sometimes it fails to), but
it seems to me you can fix that by adding code to make sure that they are
posted as required.
Doing it that way, you would continue to go through the thread's message
queue for notifications; it's just that it's your code posting the
notifications instead of Winsock.
The only part of my suggestion that would remove the "async" behavior is
the suggestion to call WSAAsyncSelect() to remove FD_READ notifications
(and _only_ FD_READ notifications, though at that point that's likely to
be the only relevant ones left anyway). But that's just to get Winsock to
stop trying to notify you; you'll still be notifying yourself, in the same
way Winsock would have, asynchronously (inasmuch as we can call anything
that always happens on the same thread asynchronous :) ).
Pete
Is this what you mean?
http://www.nopaste.com/p/avzgDbJ0T
This does have the side effect of making the socket event handler have
to know what the message type (WM_SOCKET_SELECT) is (for its own call
to WSAAsyncSelect and PostMessage), so the same code can't be reused
for multiple message types.
In testing, I'm getting two FD_READ EOFs at the end for some reason.
Not sure what that's about. I also noted that when calling
WSAAsyncSelect() in FD_CLOSE with all the flags *except* FD_READ, you
get another FD_SEND/FD_CONNECT event again. So for this test I just
clear all flags to disable all "winsock generated" events.
It also "feels" kinda kludgy too, but as you said it forces the trailing
recv()s to go back through the dialog's procedure. I guess there
are some tradeoffs here...
I think this is basically the same thing as what you posted, but
without duplicating the processing code in the FD_READ/FD_CLOSE
handlers.
http://www.nopaste.com/p/aHetrnBsS
This does function as expected, except for forcing the FD_CLOSE
recv()'s to be non-async. I'm not sure I like the idea of requiring a
static/global flag though, as that limits this code to a single async
socket instance per handler. My code could have multiple download
dialogs at the same time...
> Is this what you mean?
>
> http://www.nopaste.com/p/avzgDbJ0T
Pretty much.
> This does have the side effect of making the socket event handler have
> to know what the message type (WM_SOCKET_SELECT) is (for its own call
> to WSAAsyncSelect and PostMessage), so the same code can't be reused
> for multiple message types.
You could always just pass the message ID into the handler too. That's a
common enough idiom for general-purpose message handler functions.
> In testing, I'm getting two FD_READ EOFs at the end for some reason.
> Not sure what that's about.
Probably because Winsock posted an FD_READ along with the FD_CLOSE, before
you get a chance to disable notifications. That puts an extra FD_READ
into the message queue.
I don't know whether it's safe for you to assume it will always do that.
Seems simple enough to just ignore any FD_READs for the socket after
you've seen the end-of-stream.
> I also noted that when calling
> WSAAsyncSelect() in FD_CLOSE with all the flags *except* FD_READ, you
> get another FD_SEND/FD_CONNECT event again. So for this test I just
> clear all flags to disable all "winsock generated" events.
Ah, yes. That makes sense. At least that behavior's documented. :) I
agree that disabling async-select altogether should work fine. You are
basically taking over from Winsock at this point.
> It also "feels" kinda kludgy too, but as you said it forces the trailing
> recv()s to go back through the dialog's procedure. I guess there
> are some tradeoffs here...
Well, I agree it's kludgy. But, you're working around a bug. Work-arounds
are, in my experience, always kludgy. :)
Pete
> [...] I'm not sure I like the idea of requiring a
> static/global flag though, as that limits this code to a single async
> socket instance per handler. My code could have multiple download
> dialogs at the same time...
The flag issue exists in my proposal too. It's easily addressed by,
instead of having a single flag, maintaining a collection of flags
associated with the socket.
Ideally, you already have a mechanism for associating the socket with some
data structure describing the state of the connection (e.g. where to put
the bytes you've read from the socket, that sort of thing). You can just
put the flag in there, rather than making it global.
Pete
case FD_CLOSE:
case FD_READ:
{
    char buffer[BUFFER_SIZE];
    int res = recv(sock, buffer, sizeof(buffer), 0);

    if (res == 0)
    {
        closesocket(sock);
        std::cout << "EOF\n";
        break;
    }
    else if (res > 0)
    {
        std::cout.write(buffer, res);
    }
    else
    {
        MessageBox(NULL, "recv failed!", "", MB_OK);
        break;
    }

    if (WSAGETSELECTEVENT(lParam) == FD_CLOSE)
    {
        WSAAsyncSelect(sock, windowHandle, WM_SOCKET_EVENT, FD_CLOSE);
    }
}
break;
Please report if you find a case where the code fails.
Yes, exactly; it seems to work okay.
;^)
Although, it really does suck that we have to work around a bug like this in
the first place. Hmm, it does not sound as if it's fixed in Windows 7
either; I have not downloaded the beta yet.
Also, to the OP, I think the idea of posting custom messages that represent
a `FD_READ_AFTER_FD_CLOSE' condition is a good one. It should solve the
phantom GUI stall problem. I say phantom because a user thinks the
application is hung up doing nothing, when in reality it's churning right
along perfectly fine receiving the "remaining" data after an `FD_CLOSE'
message...
To the OP, how are you storing per-socket metadata? Are you using a
per-window storage area, or a hash collection with socket-to-metadata
mapping? You can easily store the "close" flag on a per-socket basis to
ignore any possible remaining `FD_READ' messages after you disable
everything with `WSAAsyncSelect(socket, hwnd, 0, 0)'.
You should push for another design in future projects. I know that you can
create a crude emulation of IOCP using `WSAEventSelect()'. I did this in an
old Windows server framework I used to support for NT. With the emulation
layer, it ran on Windows CE perfectly fine, and would also run and scale
well on NT using native IOCP. I am in the process of creating a new
scalable IOCP framework. Not sure when I can present anything, but I will
let this group know. It will work on NT, CE, and Win9x. I have to make some
more design decisions. Here is one idea I am thinking of incorporating into
the system:
http://groups.google.com/group/alt.winsock.programming/browse_frm/thread/f06ce523e5d3bd11
> [...]
> I got around to testing this on my other computers, except for Win98. I
> couldn't figure out the correct modifications to make to the VS2008
> default build settings to let the executable run on Win98; I always got
> a dialog saying that the executable was built for a newer version of the
> OS and that I should upgrade.
>
> But, for 2K and XP, in addition to 7, I was able to confirm the
> behavior. It seems pretty clear that at least for Vista I'd have gotten
> the same results, if I'd tried it on that.
>
> So, the fact that I never noticed that behavior must be entirely due to
> the way my program uses the network, as opposed to a "lengthy"
> download as you'd get from an HTTP server, for example.
Okay, maybe I'll win an award for longest delay before a relevant
follow-up, but...
I wasn't able to run the program on Win98 because I was using VS2008,
which doesn't produce executables compatible with Win98. I eventually got
around to compiling using VS2005 (which is still compatible with Win98),
and tested the program. And in stark contrast to what we've found on
other platforms, it does _not_ send the FD_CLOSE message until the last
byte has actually been read from the stream.
In other words, on Win98 it works just like you'd expect. It's the NT
code-base that has the "problem" (I think it's a real problem, but I put
the word in quotes because I think we've hashed that part of the question
out enough at this point :) ). The Win9x implementation is fine.
Anyway, just thought y'all would like to know. :)
Pete