mender-1.3.0: client update fails systematically

356 views
Skip to first unread message

Alan Alberghini

unread,
May 15, 2018, 6:33:54 AM5/15/18
to Mender List mender.io
Greetings,

I'm trying to deploy a new artifact version on a client but it keeps failing every time and I don't understand what's wrong.

This is is the only client (as of now) that exhibits this problem: other similar clients have no problems at all updating from the same server and using the very same artifact.

I tried also launching the mender client with `-debug -daemon` to better see error messages, if any, but to no avail.
The relevant debug session output is this:
```
INFO[0001] successfuly received new authorization data   module=mender
INFO[0001] State transition: authorize [Sync] -> check-wait [Idle]  module=mender
DEBU[0001] statescript: timeout for executing scripts is not defined; using default of 60 seconds  module=executor
DEBU[0001] statescript: timeout for executing scripts is not defined; using default of 60 seconds  module=executor
DEBU[0001] handle check wait state                       module=state
DEBU[0001] check wait state; next checks: (update: 0001-01-01 00:05:00 +0000 UTC) (inventory: 0001-01-01 00:00:00 +0000 UTC)  module=state
DEBU[0001] next check: 0001-01-01 00:00:00 +0000 UTC:&{{4 2}}, (2018-05-15 09:08:43.194575885 +0000 UTC m=+1.201161619)  module=state
DEBU[0001] check wait returned: &{{4 2}}                 module=state
INFO[0001] State transition: check-wait [Idle] -> inventory-update [Sync]  module=mender
DEBU[0001] statescript: timeout for executing scripts is not defined; using default of 60 seconds  module=executor
DEBU[0001] statescript: timeout for executing scripts is not defined; using default of 60 seconds  module=executor
DEBU[0004] Read data from device manifest file: device_type=my-device-type  module=mender
DEBU[0004] Found needed line: device_type=my-device-type     module=mender
DEBU[0004] Current manifest data: my-device-type             module=mender
DEBU[0004] Read data from device manifest file: artifact_name=update_20180322_1  module=mender
DEBU[0004] Found needed line: artifact_name=update_20180322_1  module=mender
DEBU[0004] Current manifest data: update_20180322_1      module=mender
DEBU[0004] inventory update sent, response &{200 OK 200 HTTP/1.1 1 1 map[Content-Type:[application/json; charset=utf-8] Content-Length:[0] Vary:[Accept-Encoding] X-Inventory-Version:[unknown] X-Men-Requestid:[cbc781ff-4f9a-40ea-84a8-6f9e731da1e2] X-Xss-Protection:[1; mode=block] Cache-Control:[no-cache, no-store] Connection:[keep-alive] X-Frame-Options:[DENY] Date:[Tue, 15 May 2018 09:08:46 GMT] Content-Encoding:[gzip] Server:[nginx/1.10.3] X-Content-Type-Options:[nosniff] Pragma:[no-cache]] 0x44201c8020 0 [] false false map[] 0x44201a0600 0x442033a000}  module=client_inventory
DEBU[0004] inventory refresh complete                    module=state
INFO[0004] State transition: inventory-update [Sync] -> check-wait [Idle]  module=mender
DEBU[0004] statescript: timeout for executing scripts is not defined; using default of 60 seconds  module=executor
DEBU[0004] statescript: timeout for executing scripts is not defined; using default of 60 seconds  module=executor
DEBU[0004] handle check wait state                       module=state
DEBU[0004] check wait state; next checks: (update: 0001-01-01 00:05:00 +0000 UTC) (inventory: 2018-05-15 09:23:43.196025803 +0000 UTC m=+901.202611336)  module=state
DEBU[0004] next check: 0001-01-01 00:05:00 +0000 UTC:&{{6 2}}, (2018-05-15 09:08:46.720001488 +0000 UTC m=+4.726587121)  module=state
DEBU[0004] check wait returned: &{{6 2}}                 module=state
INFO[0004] State transition: check-wait [Idle] -> update-check [Sync]  module=mender
DEBU[0004] statescript: timeout for executing scripts is not defined; using default of 60 seconds  module=executor
DEBU[0004] statescript: timeout for executing scripts is not defined; using default of 60 seconds  module=executor
DEBU[0004] handle update check state                     module=state
DEBU[0004] Read data from device manifest file: artifact_name=update_20180322_1  module=mender
DEBU[0004] Found needed line: artifact_name=update_20180322_1  module=mender
DEBU[0004] Current manifest data: update_20180322_1      module=mender
DEBU[0004] Read data from device manifest file: device_type=my-device-type  module=mender
DEBU[0004] Found needed line: device_type=my-device-type     module=mender
DEBU[0004] Current manifest data: my-device-type             module=mender
DEBU[0004] Received response:200 OK                      module=client_update
DEBU[0004] Have update available                         module=client_update
INFO[0004] Correct request for getting image from: https://update.example.com/mender-artifact-storage/<OMISSIS> [name: 0x5eeeb0; devices: [my-device-type]]  module=client_update
DEBU[0004] received update response: {{{https://update.example.com/mender-artifact-storage/<OMISSIS> 2018-05-15T10:08:46.760786361Z} [my-device-type] update_20180328_2} 0cc6f0e3-e6c6-42c5-9d1d-0cc68983446d}  module=mender
INFO[0004] State transition: update-check [Sync] -> update-fetch [Download]  module=mender
DEBU[0004] statescript: timeout for executing scripts is not defined; using default of 60 seconds  module=executor
DEBU[0004] statescript: timeout for executing scripts is not defined; using default of 60 seconds  module=executor
INFO[0004] State transition: update-fetch [Download] -> update-status-report [none]  module=mender
DEBU[0004] handle update status report state             module=state
DEBU[0004] status reported, response &{204 No Content 204 HTTP/1.1 1 1 map[Server:[nginx/1.10.3] X-Frame-Options:[DENY] X-Content-Type-Options:[nosniff] Access-Control-Allow-Headers:[Authorization,Content-type] Connection:[keep-alive] X-Deployments-Version:[unknown] Cache-Control:[no-cache, no-store] Access-Control-Allow-Origin:[*] Date:[Tue, 15 May 2018 09:08:46 GMT] Content-Encoding:[gzip] Pragma:[no-cache] X-Xss-Protection:[1; mode=block] Access-Control-Allow-Methods:[GET, POST, OPTIONS, PUT, DELETE] Content-Type:[application/json; charset=utf-8] Vary:[Accept-Encoding] X-Men-Requestid:[c5a15df4-645f-4cb1-a5be-290734ae950c]] 0x44201b8880 0 [] false false map[] 0x44204d8100 0x442033a000}  module=client_status
DEBU[0004] attempting to upload deployment logs for failed update  module=state
ERRO[0004] got unexpected HTTP status when uploading log: 400  module=client_log
ERRO[0004] error uploading logs: uploading logs failed, bad status 400  module=mender
ERRO[0004] failed to report deployment logs: transient error: uploading logs failed, bad status 400  module=state
ERRO[0004] failed to send deployment logs to server: transient error: failed to send deployment logs: transient error: uploading logs failed, bad status 400  module=state
INFO[0004] State transition: update-status-report [Download] -> update-retry-report [none]  module=mender
DEBU[0124] wait complete                                 module=state
INFO[0124] State transition: update-retry-report [Download] -> update-status-report [Download]  module=mender
DEBU[0124] handle update status report state             module=state
DEBU[0124] attempting to upload deployment logs for failed update  module=state
ERRO[0125] got unexpected HTTP status when uploading log: 400  module=client_log
ERRO[0125] error uploading logs: uploading logs failed, bad status 400  module=mender
ERRO[0125] failed to report deployment logs: transient error: uploading logs failed, bad status 400  module=state
ERRO[0125] failed to send deployment logs to server: transient error: failed to send deployment logs: transient error: uploading logs failed, bad status 400  module=state
INFO[0125] State transition: update-status-report [Download] -> update-retry-report [none]  module=mender
```

The client seems to fail as soon as the "Download" state is reached, but I cannot understand what's causing it to fail.
It then starts trying to upload the error log, but the file does not exist at all, resulting in HTTP 400 errors when trying to PUT it on the server.
The corresponding errors on the deployments server are these:

```
mender-deployments_1        | time="2018-05-15T09:33:34Z" level=error msg="no messages: invalid deployment log" device_id=5abb88e148949e0001934536 file="controller_deployments.go" func="controller.(*DeploymentsController).PutDeploymentLogForDevice" line=377 request_id=ba8816f4-8311-4776-82be-c1e79f634e23
mender-deployments_1        | time="2018-05-15T09:33:34Z" level=info msg="400 1922μs PUT /api/devices/v1/deployments/device/deployments/db2a7bfd-e25b-4fe9-b87e-e7af32ed2aa8/log HTTP/1.0 - Go-http-client/1.1" device_id=5abb88e148949e0001934536 file=middleware.go func="accesslog.(*AccessLogMiddleware).MiddlewareFunc.func1" line=60 request_id=ba8816f4-8311-4776-82be-c1e79f634e23
```

I tried creating a fake log file in `/var/lib/mender/deployments.0001.0cc6f0e3-e6c6-42c5-9d1d-0cc68983446d.log` and "unlocked" the client which could now upload something to the server and return to the "Idle" state. From here, trying to perform another deployment results in the client failing the same way again as soon as the "Download" state is reached, failing to upload a non-existing log file.

Is this a known issue?

Alan Alberghini

unread,
May 17, 2018, 4:20:31 AM5/17/18
to Mender List mender.io
Just found out this is happening on another device with a different (older) artifact installed.

Any advice?

Mirza Krak

unread,
May 17, 2018, 4:39:23 AM5/17/18
to mender
On 17 May 2018 at 10:20, Alan Alberghini <al...@iooota.com> wrote:
> Just found out this is happening on another device with a different (older)
> artifact installed.
>
> Any advice?

Definitely some odd things going on here and I do not really have any
good answers yet. But will try :).

>INFO[0004] State transition: update-check [Sync] -> update-fetch [Download] module=mender
>DEBU[0004] statescript: timeout for executing scripts is not defined; using default of 60 seconds module=executor
>DEBU[0004] statescript: timeout for executing scripts is not defined; using default of 60 seconds module=executor
>INFO[0004] State transition: update-fetch [Download] -> update-status-report [none] module=mender

Can you verify that the download URL is valid that the client receives
from the server?

If I understand correctly this should be the full URL that you get
from the server to download the artifact from:

https://update.example.com/mender-artifact-storage/<OMISSIS>

Which seems a bit malformed? Or did you mask it when pasting log?

>DEBU[0004] attempting to upload deployment logs for failed update module=state
>ERRO[0004] got unexpected HTTP status when uploading log: 400 module=client_log

Above seems like a bug to me, that is that it keeps failing when the
update log file does not exist.

--
Med Vänliga Hälsningar / Best Regards

Mirza Krak

Alan Alberghini

unread,
May 17, 2018, 4:55:00 AM5/17/18
to Mender List mender.io

Il giorno giovedì 17 maggio 2018 10:39:23 UTC+2, Mirza Krak ha scritto:

On 17 May 2018 at 10:20, Alan Alberghini <al...@iooota.com> wrote:
> Just found out this is happening on another device with a different (older)
> artifact installed.
>
> Any advice?

Definitely some odd things going on here and I do not really have any
good answers yet. But will try :).

Much appreciated :-)


>INFO[0004] State transition: update-check [Sync] -> update-fetch [Download]  module=mender
>DEBU[0004] statescript: timeout for executing scripts is not defined; using default of 60 seconds  module=executor
>DEBU[0004] statescript: timeout for executing scripts is not defined; using default of 60 seconds  module=executor
>INFO[0004] State transition: update-fetch [Download] -> update-status-report [none]  module=mender

Can you verify that the download URL is valid that the client receives
from the server?

If I understand correctly this should be the full URL that you get
from the server to download the artifact from:

     https://update.example.com/mender-artifact-storage/<OMISSIS>

Which seems a bit malformed? Or did you mask it when pasting log?

I masked it before pasting here, but the real download URL is definitely legit and working: I tested it directly on the affected devices via wget and it started downloading an application/vnd.mender-artifact (so the mime type is correct) file of the expected size.

>DEBU[0004] attempting to upload deployment logs for failed update  module=state
>ERRO[0004] got unexpected HTTP status when uploading log: 400  module=client_log

Above seems like a bug to me, that is that it keeps failing when the
update log file does not exist.

Could there be a race condition or something that hinders the creation of the log file on mender client and then other parts assume it to be present while it is not?
I think, however, that this problem with the log is only a symptom (i.e. an unforeseen state) and not the root cause of the update failure: for that, I think we should look at what could go wrong in the client immediately after entering the Download state…

Mirza Krak

unread,
May 17, 2018, 6:36:42 AM5/17/18
to mender
Agree with the above, that it is only a unforeseen state, and not the
root cause.

Mirza Krak

unread,
May 17, 2018, 6:40:38 AM5/17/18
to mender
On 17 May 2018 at 10:55, Alan Alberghini <al...@iooota.com> wrote:
Looking at the code at
https://github.com/mendersoftware/mender/blob/1.3.0/client/client_update.go#L85-L116,
it does not seem that it even reaches this method based on your log.

Which means it fails "silently" somewhere else. Harder to find where :)

Mirza Krak

unread,
May 17, 2018, 7:02:11 AM5/17/18
to mender
My guess is that it fails here,
https://github.com/mendersoftware/mender/blob/master/state.go#L604-L608.
There is very little log output if it fails here which explains what
we are seeing in your log and that explains the missing deployment log
file.

Which could mean:

- Out of storage on device and it fails to create a deployment log file
- insufficient permissions to create/open a file deployment log file

These are speculations at best :).

Alan Alberghini

unread,
May 17, 2018, 7:16:25 AM5/17/18
to Mender List mender.io
You're definitely right! it didn't occur to me that the space on the partition where `/var/lib/mender` is redirected to got accidentally 100% full.
By freeing up some space I was finally able to start updating those devices.
Thank you very much for your help!
Do you think this should make it into a bug report, at least to have an informative message printed by the client?

Mirza Krak

unread,
May 17, 2018, 7:24:55 AM5/17/18
to men...@lists.mender.io
Definitely.

Room for improvement here when it comes to:

- log output 
- this can be avoided by pre-allocating storage space for the Mender client deployment logs (full /data part should not result in not being to do updates)
- able to handle "non existing" files when trying to upload deployment log.

/ Mirza

Reply all
Reply to author
Forward
0 new messages