MacOS error reporting results


Greg Childers

Nov 10, 2021, 7:55:57 PM
to Boinc Projects
Hi,

At NFS@Home I'm seeing errors when some MacOS clients report results. Looking at the latest Apache access.log, 667 of the latest 1030 MacOS attempts to report results failed, while none of the over 30000 attempts failed for other OSs. This was first reported in late August, but we have fewer participants using MacOS so I'm just now trying to track it down. The MacOS clients have no issue connecting to the project, getting work, or uploading results. But when they try to report results the client reports the following:

Project communication failed: attempting access to reference site
Scheduler request failed: Failure when receiving data from the peer

The corresponding entry in access.log shows HTTP status code 400, Bad Request:

10.67.149.53 - - [10/Nov/2021:16:20:53 -0800] "POST /nfs_cgi/cgi HTTP/1.1" 400 25 "-" "BOINC client (x86_64-apple-darwin 7.14.2)"

while error.log has the following corresponding entry:

[Wed Nov 10 16:20:53.434734 2021] [cgi:error] [pid 29762] (104)Connection reset by peer: [client 10.67.149.53:61801] AH01225: Error reading request entity data

There is no corresponding entry in the BOINC logs. The IP address in the above is mine. Client versions exhibiting the error range from 7.12.0 to 7.16.19. 
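
For anyone wanting to reproduce this kind of tally, here is a minimal Python sketch. It assumes Apache's combined log format, as in the access.log entry above. It counts every POST to /nfs_cgi/cgi, because access.log alone does not say whether a given scheduler request was reporting results. The function name `tally_scheduler_posts` and the macos/other split are illustrative only; they are not part of BOINC.

```python
import re
from collections import Counter

# Apache "combined" log format, matching the access.log entry quoted above.
LOG_RE = re.compile(
    r'^(?P<ip>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+) [^"]*" '
    r'(?P<status>\d{3}) \S+ "[^"]*" "(?P<agent>[^"]*)"'
)

# BOINC user agent, e.g. "BOINC client (x86_64-apple-darwin 7.14.2)".
AGENT_RE = re.compile(r'BOINC client \((?P<platform>\S+) (?P<version>[\d.]+)\)')


def tally_scheduler_posts(lines, path="/nfs_cgi/cgi"):
    """Count scheduler POSTs by (OS family, outcome).

    OS family is "macos" when the platform string contains "apple-darwin",
    otherwise "other". Outcome is "ok" for 2xx statuses, otherwise "failed".
    """
    counts = Counter()
    for line in lines:
        m = LOG_RE.match(line)
        if not m or m["method"] != "POST" or m["path"] != path:
            continue  # not a scheduler request
        agent = AGENT_RE.search(m["agent"])
        if not agent:
            continue  # not a BOINC client
        family = "macos" if "apple-darwin" in agent["platform"] else "other"
        outcome = "ok" if m["status"].startswith("2") else "failed"
        counts[(family, outcome)] += 1
    return counts
```

Calling `tally_scheduler_posts(open("access.log"))` returns a `Counter` whose keys are pairs such as `("macos", "failed")`.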

The server is running fully updated Ubuntu 18.04.6 LTS and the current BOINC server code. Do you have any suggestions about how I might track down what's happening with the result reporting on MacOS clients that's different from the others?

Thanks,
Greg 






Jérôme Cadet

Nov 11, 2021, 6:49:56 AM
to Greg Childers, Boinc Projects
Hi Greg

I wonder why your email had a white font?

Anyway, I had those "Failure when receiving data from the peer" errors on NFS and TN-Grid since the end of August on my recent iMac (late 2020, the last Intel iMac; I was in no hurry to move to M1), just after I upgraded it to Big Sur. I had some tasks I couldn't report, and I couldn't get any new tasks on either project, always with that error.

I posted on both projects' forums without success, until very recently someone on the TN-Grid forum simply suggested that I reset the project: hurray, it worked! I lost those old, outdated tasks, but now it connects to the projects without any error.

So "something from the old world" (pre-Big Sur) was stuck in my BOINC installation and got cleaned out by this simple action.

I feel silly that I didn't think of this earlier...

Hope this helps, bon courage :)

Jérôme



Jérôme Cadet

Nov 11, 2021, 6:52:32 AM
to Greg Childers, Boinc Projects
There is something very weird with your initial email: even my answer (with the font changed back to black) seems to have been sent in a white font again!

J.

Greg Childers

Nov 11, 2021, 1:28:31 PM
to Jérôme Cadet, Boinc Projects
Resetting the project isn't working for me. I even tried removing the project and attaching it again. It connects to the project, downloads files, gets work, processes the work, and uploads the result files, all successfully. But it fails when it tries to report the results. Below is the log file from the client.

Greg

Thu Nov 11 08:30:13 2021 | NFS@Home | Resetting project
Thu Nov 11 08:30:13 2021 | NFS@Home | Detaching from project
Thu Nov 11 08:30:25 2021 |  | Project communication failed: attempting access to reference site
Thu Nov 11 08:30:27 2021 |  | Internet access OK - project servers may be temporarily down.
Thu Nov 11 08:30:35 2021 |  | Fetching configuration file from https://escatter11.fullerton.edu/nfs/get_project_config.php
Thu Nov 11 08:31:15 2021 |  | Fetching configuration file from https://escatter11.fullerton.edu/nfs/get_project_config.php
Thu Nov 11 08:32:29 2021 |  | Fetching configuration file from https://escatter11.fullerton.edu/nfs/get_project_config.php
Thu Nov 11 08:32:50 2021 | NFS@Home | Master file download succeeded
Thu Nov 11 08:32:55 2021 | NFS@Home | Sending scheduler request: Project initialization.
Thu Nov 11 08:32:55 2021 | NFS@Home | Requesting new tasks for CPU and Intel GPU
Thu Nov 11 08:32:57 2021 | NFS@Home | Scheduler request completed: got 1 new tasks
Thu Nov 11 08:32:59 2021 | NFS@Home | Started download of lasieved_1.10_x86_64-apple-darwin
Thu Nov 11 08:32:59 2021 | NFS@Home | Started download of C160_79_97_30.poly
Thu Nov 11 08:33:00 2021 | NFS@Home | Finished download of lasieved_1.10_x86_64-apple-darwin
Thu Nov 11 08:33:00 2021 | NFS@Home | Finished download of C160_79_97_30.poly
Thu Nov 11 08:33:00 2021 | NFS@Home | Started download of banner_290.png
Thu Nov 11 08:33:00 2021 | NFS@Home | Started download of nfs_40.png
Thu Nov 11 08:33:00 2021 | NFS@Home | Starting task C160_79_97_30_58864_0
Thu Nov 11 08:33:01 2021 | NFS@Home | Finished download of banner_290.png
Thu Nov 11 08:33:01 2021 | NFS@Home | Finished download of nfs_40.png
Thu Nov 11 08:33:07 2021 | NFS@Home | Sending scheduler request: To fetch work.
Thu Nov 11 08:33:07 2021 | NFS@Home | Requesting new tasks for CPU
Thu Nov 11 08:33:10 2021 | NFS@Home | Scheduler request completed: got 3 new tasks
Thu Nov 11 08:33:12 2021 | NFS@Home | Started download of lasievee_1.10_x86_64-apple-darwin
Thu Nov 11 08:33:12 2021 | NFS@Home | Started download of 13_2_912m1.poly
Thu Nov 11 08:33:12 2021 | NFS@Home | Starting task C160_79_97_30_59040_0
Thu Nov 11 08:33:13 2021 | NFS@Home | Finished download of lasievee_1.10_x86_64-apple-darwin
Thu Nov 11 08:33:13 2021 | NFS@Home | Finished download of 13_2_912m1.poly
Thu Nov 11 08:33:13 2021 | NFS@Home | Starting task 13_2_912m1_162908_0
Thu Nov 11 08:33:13 2021 | NFS@Home | Starting task 13_2_912m1_163016_0
Thu Nov 11 08:36:16 2021 |  | Resuming GPU computation
Thu Nov 11 09:03:43 2021 |  | Suspending GPU computation - computer is in use
Thu Nov 11 09:06:58 2021 |  | Resuming GPU computation
Thu Nov 11 09:11:14 2021 |  | Suspending GPU computation - computer is in use
Thu Nov 11 09:17:14 2021 |  | Resuming GPU computation
Thu Nov 11 09:34:25 2021 | NFS@Home | Computation for task 13_2_912m1_163016_0 finished
Thu Nov 11 09:34:28 2021 | NFS@Home | Started upload of 13_2_912m1_163016_0_r1203217953_0
Thu Nov 11 09:34:30 2021 | NFS@Home | Sending scheduler request: To fetch work.
Thu Nov 11 09:34:30 2021 | NFS@Home | Requesting new tasks for CPU
Thu Nov 11 09:34:31 2021 | NFS@Home | Finished upload of 13_2_912m1_163016_0_r1203217953_0
Thu Nov 11 09:34:31 2021 | NFS@Home | Scheduler request completed: got 5 new tasks
Thu Nov 11 09:34:33 2021 | NFS@Home | Started download of lasieve5f_1.11_x86_64-apple-darwin
Thu Nov 11 09:34:33 2021 | NFS@Home | Started download of 8m3_341.poly
Thu Nov 11 09:34:33 2021 | NFS@Home | Starting task C160_79_97_30_62784_0
Thu Nov 11 09:34:34 2021 | NFS@Home | Finished download of 8m3_341.poly
Thu Nov 11 09:34:34 2021 | NFS@Home | Started download of 9p2_542L.poly
Thu Nov 11 09:34:35 2021 | NFS@Home | Finished download of lasieve5f_1.11_x86_64-apple-darwin
Thu Nov 11 09:34:35 2021 | NFS@Home | Finished download of 9p2_542L.poly
Thu Nov 11 09:35:42 2021 | NFS@Home | Computation for task C160_79_97_30_59040_0 finished
Thu Nov 11 09:35:42 2021 | NFS@Home | Starting task C160_79_97_30_60096_0
Thu Nov 11 09:35:44 2021 | NFS@Home | Started upload of C160_79_97_30_59040_0_r1603864018_0
Thu Nov 11 09:35:50 2021 | NFS@Home | Finished upload of C160_79_97_30_59040_0_r1603864018_0
Thu Nov 11 09:40:00 2021 | NFS@Home | Computation for task C160_79_97_30_58864_0 finished
Thu Nov 11 09:40:00 2021 | NFS@Home | Starting task 8m3_341_190697_0
Thu Nov 11 09:40:02 2021 | NFS@Home | Started upload of C160_79_97_30_58864_0_r1601277300_0
Thu Nov 11 09:40:08 2021 | NFS@Home | Finished upload of C160_79_97_30_58864_0_r1601277300_0
Thu Nov 11 09:40:55 2021 | NFS@Home | Computation for task 13_2_912m1_162908_0 finished
Thu Nov 11 09:40:55 2021 | NFS@Home | Starting task 9p2_542L_143168_0
Thu Nov 11 09:40:57 2021 | NFS@Home | Started upload of 13_2_912m1_162908_0_r947559821_0
Thu Nov 11 09:41:03 2021 | NFS@Home | Finished upload of 13_2_912m1_162908_0_r947559821_0
Thu Nov 11 09:41:15 2021 | NFS@Home | Computation for task 8m3_341_190697_0 finished
Thu Nov 11 09:41:15 2021 | NFS@Home | Starting task 13_2_912m1_164420_0
Thu Nov 11 09:41:17 2021 | NFS@Home | Started upload of 8m3_341_190697_0_r2072136589_0
Thu Nov 11 09:41:19 2021 | NFS@Home | Finished upload of 8m3_341_190697_0_r2072136589_0
Thu Nov 11 10:19:25 2021 |  | Suspending GPU computation - computer is in use
Thu Nov 11 10:20:21 2021 | NFS@Home | update requested by user
Thu Nov 11 10:20:25 2021 | NFS@Home | Sending scheduler request: Requested by user.
Thu Nov 11 10:20:25 2021 | NFS@Home | Reporting 5 completed tasks
Thu Nov 11 10:20:25 2021 | NFS@Home | Requesting new tasks for CPU
Thu Nov 11 10:20:26 2021 |  | Project communication failed: attempting access to reference site
Thu Nov 11 10:20:26 2021 | NFS@Home | Scheduler request failed: Failure when receiving data from the peer
Thu Nov 11 10:20:28 2021 |  | Internet access OK - project servers may be temporarily down.


Greg Childers

Nov 11, 2021, 5:31:09 PM
to Jérôme Cadet, Boinc Projects
I did a packet capture on a failed reporting of the results from my Mac.

$ tcpdump -r mac.pcap
reading from file mac.pcap, link-type EN10MB (Ethernet)
13:45:50.932508 IP 10.67.149.53.62797 > escatter11.fullerton.edu.http: Flags [S], seq 3931536691, win 65535, options [mss 1386,nop,wscale 6,nop,nop,TS val 2162807049 ecr 0,sackOK,eol], length 0
13:45:50.933011 IP escatter11.fullerton.edu.http > 10.67.149.53.62797: Flags [S.], seq 2568786091, ack 3931536692, win 65160, options [mss 1460,sackOK,TS val 2141512305 ecr 2162807049,nop,wscale 7], length 0
13:45:50.935690 IP 10.67.149.53.62797 > escatter11.fullerton.edu.http: Flags [.], ack 1, win 2061, options [nop,nop,TS val 2162807073 ecr 2141512305], length 0
13:45:50.936852 IP 10.67.149.53.62797 > escatter11.fullerton.edu.http: Flags [P.], seq 1:281, ack 1, win 2061, options [nop,nop,TS val 2162807073 ecr 2141512305], length 280: HTTP: POST /nfs_cgi/cgi HTTP/1.1
13:45:50.936958 IP escatter11.fullerton.edu.http > 10.67.149.53.62797: Flags [.], ack 281, win 507, options [nop,nop,TS val 2141512309 ecr 2162807073], length 0
13:45:50.946875 IP escatter11.fullerton.edu.http > 10.67.149.53.62797: Flags [P.], seq 1:26, ack 281, win 507, options [nop,nop,TS val 2141512319 ecr 2162807073], length 25: HTTP: HTTP/1.1 100 Continue
13:45:50.967650 IP 10.67.149.53.62797 > escatter11.fullerton.edu.http: Flags [.], ack 26, win 2060, options [nop,nop,TS val 2162807086 ecr 2141512319], length 0
13:45:50.974096 IP 10.67.149.53.62797 > escatter11.fullerton.edu.http: Flags [.], seq 281:9899, ack 26, win 2060, options [nop,nop,TS val 2162807086 ecr 2141512319], length 9618: HTTP
13:45:50.974316 IP 10.67.149.53.62797 > escatter11.fullerton.edu.http: Flags [.], seq 9899:14021, ack 26, win 2060, options [nop,nop,TS val 2162807086 ecr 2141512319], length 4122: HTTP
13:45:50.974534 IP escatter11.fullerton.edu.http > 10.67.149.53.62797: Flags [.], ack 14021, win 431, options [nop,nop,TS val 2141512346 ecr 2162807086], length 0
13:45:50.977021 IP 10.67.149.53.62797 > escatter11.fullerton.edu.http: Flags [.], seq 14021:15395, ack 26, win 2060, options [nop,nop,TS val 2162807113 ecr 2141512346], length 1374: HTTP
13:45:50.977161 IP 10.67.149.53.62797 > escatter11.fullerton.edu.http: Flags [.], seq 15395:16769, ack 26, win 2060, options [nop,nop,TS val 2162807113 ecr 2141512346], length 1374: HTTP
13:45:50.977224 IP escatter11.fullerton.edu.http > 10.67.149.53.62797: Flags [.], ack 16769, win 495, options [nop,nop,TS val 2141512349 ecr 2162807113], length 0
13:45:50.977423 IP 10.67.149.53.62797 > escatter11.fullerton.edu.http: Flags [.], seq 16769:18143, ack 26, win 2060, options [nop,nop,TS val 2162807113 ecr 2141512346], length 1374: HTTP
13:45:50.977955 IP 10.67.149.53.62797 > escatter11.fullerton.edu.http: Flags [R.], seq 19517, ack 26, win 2060, length 0
13:45:50.978026 IP escatter11.fullerton.edu.http > 10.67.149.53.62797: Flags [.], ack 18143, win 501, options [nop,nop,TS val 2141512350 ecr 2162807113], length 0
13:45:50.978043 IP 10.67.149.53.62797 > escatter11.fullerton.edu.http: Flags [R.], seq 18143, ack 26, win 2060, length 0

It appears to be sending the results as expected, but then the Mac sends a reset to the server. I'm not sure at all if this helps clarify where the problem lies.
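
To pick out which side sent the reset in a longer capture, a small Python sketch like the following could scan the text that `tcpdump -r` prints. It assumes the one-line-per-packet format shown above. The name `find_resets` is illustrative, and the sketch only reports who sent RST packets and when, not why.

```python
import re

# One line per packet, as printed by `tcpdump -r` in the capture above:
#   <time> IP <src> > <dst>: Flags [<flags>], ...
PKT_RE = re.compile(
    r'^(?P<time>\S+) IP (?P<src>\S+) > (?P<dst>\S+): Flags \[(?P<flags>[^\]]+)\]'
)


def find_resets(lines):
    """Return a list of (time, src, dst) for every packet with the RST flag set."""
    resets = []
    for line in lines:
        m = PKT_RE.match(line)
        if m and "R" in m["flags"]:  # tcpdump prints RST as "R" (e.g. "R.")
            resets.append((m["time"], m["src"], m["dst"]))
    return resets
```

Feeding it the saved text of the capture lists each reset with its sender, so it is easy to confirm that the RST packets come from the client side rather than from the server.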

Greg

Charlie Fenton

Nov 11, 2021, 7:59:56 PM
to boinc_projects email List, Greg Childers
Hi Greg,

I wonder if these two bits of information might imply that the version of MacOS plays a role in this issue:

On Nov 10, 2021, at 4:55 PM, Greg Childers <jgchilde...@gmail.com> wrote:
> 667 of the latest 1030 MacOS attempts to report results failed


On Nov 11, 2021, at 3:49 AM, Jérôme Cadet <jerome...@pobox.com> wrote:
> just after I upgraded it to Big Sur.


Might it be worth checking if there is any correlation between the failed tasks and the version of MacOS on the client machines?

Cheers,
--Charlie

Greg Childers

Nov 12, 2021, 4:53:24 PM
to Charlie Fenton, boinc_projects email List
I took a sampling of the Mac IPs failing to connect to /nfs_cgi/cgi with HTTP error 400. BOINC reports most are on Big Sur, but there were a few running Catalina and one still on High Sierra. It's not strictly related to the OS version.

Greg

James Wanless

Nov 12, 2021, 6:14:04 PM
to Greg Childers, Charlie Fenton, boinc_projects email List

Hi Greg,
I have two suggestions for diagnosing this:
1) I expect you’re already aware, but fairly recently, Windows users had problems with SSL certs, so maybe this is similar??
2) The web is crazy these days with its levels of redirection - if you copy a local file from one machine to another within your local network, the DNS at least, and possibly more, goes all round the world, via Server Alley, now in Virginia, in your neck of the woods… I have noticed that Apple is a particular offender, and when the level of redirect exceeds some max value, weird things like you’re reporting can happen. On the bright side, if that is what it is, at some point somebody will notice and up the level of redirect allowed for your project specifically (or at least some AI might)
Regards,
J

Sent from my iPad

Greg Childers

Nov 12, 2021, 7:02:10 PM
to James Wanless, Charlie Fenton, boinc_projects email List
Thanks James. I don't think this is related to the SSL certs issue since the timing is wrong, and I don't use encryption for scheduler or file uploader URLs. Perhaps that's somehow the issue? Should I enable SSL for everything rather than just setting SECURE_URL_BASE?

I switched the project to FastCGI to see if it made any difference. I'm still seeing the problem, but it's now giving an HTTP status error 500 instead of 400:

10.67.149.53 - - [12/Nov/2021:15:50:15 -0800] "POST /nfs_cgi/cgi HTTP/1.1" 500 25 "-" "BOINC client (x86_64-apple-darwin 7.14.2)"

And the error message is a bit different:

[Fri Nov 12 15:50:15.520691 2021] [fcgid:warn] [pid 40094] (104)Connection reset by peer: [client 10.67.149.53:63819] mod_fcgid: can't get data from http client

Greg



James Wanless

Nov 12, 2021, 7:07:27 PM
to Greg Childers, Charlie Fenton, boinc_projects email List
You’re welcome :) I’m not an expert on SSL certs, but I would add that I suspect my points 1) and 2) are actually related…
J

Sent from my iPad

Charlie Fenton

Nov 12, 2021, 8:52:41 PM
to boinc_projects email List
The Mac clients have not had the same problem with SSL certs as Windows clients, as the Mac clients have used a newer version of OpenSSL for a very long time.

Cheers,
--Charlie

Greg Childers

Nov 13, 2021, 2:56:55 AM
to Charlie Fenton, boinc_projects email List
I changed NFS@Home to use SSL for all traffic including scheduler requests and file uploads, and the problem with Mac clients vanished. Mac clients are now successfully getting and reporting work. Perhaps something was mangling the network traffic from only Macs when it was unencrypted...

Greg
