mender-deployments service is restarting constantly

Shawn Stevenson

Jan 28, 2019, 8:27:30 PM
to Mender List mender.io
Hello,

I have followed the instructions for production installation for Mender 1.7 (https://docs.mender.io/1.7/administration/production-installation) and I have a problem with the mender-deployments service. It is restarting about once every minute. There appears to be a problem occurring inside the file migrator_simple.go. Here is a small piece of the log file (the same messages repeat over and over again):


mender-deployments_1    | WARNING: ca-certificates.crt does not contain exactly one certificate or CRL: skipping
mender-deployments_1    | time="2019-01-29T00:22:16Z" level=info msg="Deployments Service, version unknown starting up" file=main.go func=main.cmdServer line=103
mender-deployments_1    | time="2019-01-29T00:22:16Z" level=info msg="automigrate is ON, will apply migrations" file=migrations.go func=migrations.Migrate line=48
mender-deployments_1    | time="2019-01-29T00:22:16Z" level=info msg="migrating deployment_service" file=migrations.go func=migrations.MigrateSingle line=70
mender-deployments_1    | time="2019-01-29T00:22:16Z" level=info msg="applying migration from version 0.0.0 to 1.2.1" db="deployment_service" file="migrator_simple.go" func="migrate.(*SimpleMigrator).Apply" line=101
mender-deployments_1    | time="2019-01-29T00:22:16Z" level=error msg="migration from 0.0.0 to 1.2.1 failed: index not found with name [deploymentconstructor.name_text_deploymentconstructor.artifactname_text]" db="deployment_service" file="migrator_simple.go" func="migrate.(*SimpleMigrator).Apply" line=106
mender-deployments_1    | failed to run migrations: failed to apply migrations: failed to apply migration from 0.0.0 to 1.2.1: index not found with name [deploymentconstructor.name_text_deploymentconstructor.artifactname_text]

Does anybody have any idea what the problem could be?
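Logs copied from a colour terminal can carry ANSI colour codes, which show up as `^[[36m` and `^[[0m` when pasted; `docker-compose logs --no-color` avoids them at the source. For a log that has already been saved, a minimal POSIX shell sketch can strip the codes and pull out the name of the index the migration reports as missing. `strip_ansi` and `missing_index` are hypothetical helpers, not part of Mender, and the message text is assumed to match the lines quoted above.

```shell
# ESC byte that starts an ANSI colour sequence (pasted as "^[").
ESC=$(printf '\033')

# Remove colour sequences such as ESC[36m and ESC[0m from stdin.
strip_ansi() {
    sed "s/${ESC}\[[0-9;]*m//g"
}

# Print the first index name reported as "index not found with name [...]".
missing_index() {
    strip_ansi | sed -n 's/.*index not found with name \[\([^]]*\)].*/\1/p' | head -n 1
}
```

Usage, with a hypothetical saved log file: `missing_index < deployments.log`.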

Docker engine version is 18.09.1
Docker-compose version is 1.23.2 (docker-py version 3.6.0)
Ubuntu version is 16.04

Thanks,
Shawn

Mirza Krak

Jan 29, 2019, 12:50:23 PM
to Mender List mender.io
I have pinged some people to get some feedback.

Can you also provide the output of "docker ps" and "docker images | grep mendersoftware"?

--
Mirza Krak | Embedded Solutions Architect | https://mender.io

Northern.tech AS | @northerntechHQ

Shawn Stevenson

Jan 29, 2019, 1:32:22 PM
to Mender List mender.io, mirza...@northern.tech
I was able to find a temporary work-around by commenting out the "command: server --automigrate" line for mender-deployments in the prod.yml file.
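The same work-around can be scripted. If other services in prod.yml carry the same `command: server --automigrate` line, a plain search-and-replace would disable their migrations too, so this sketch only touches the line inside the `mender-deployments` block. It assumes that block starts with a bare `mender-deployments:` key and that the command line appears verbatim; `comment_deployments_automigrate` is a hypothetical name. Like the manual edit, it sidesteps the failing migration rather than fixing it.

```shell
# Hypothetical helper: read a compose file on stdin and write it to stdout with
# "command: server --automigrate" commented out only in mender-deployments.
comment_deployments_automigrate() {
    awk '
        # Width of the leading whitespace of a line.
        function indent(s) { match(s, /^[ \t]*/); return RLENGTH }

        # Entering the mender-deployments service block.
        /^[ \t]*mender-deployments:[ \t]*$/ { in_svc = 1; svc_indent = indent($0); print; next }

        # Any key at the same or a shallower indentation ends the block.
        in_svc && /^[ \t]*[^ \t#-][^:]*:/ && indent($0) <= svc_indent { in_svc = 0 }

        # Inside the block: insert a comment marker after the indentation.
        in_svc && /^[ \t]*command: server --automigrate/ {
            n = indent($0)
            $0 = substr($0, 1, n) "# " substr($0, n + 1)
        }

        { print }
    '
}
```

Usage: `comment_deployments_automigrate < prod.yml > prod.yml.new`, then check `diff prod.yml prod.yml.new` before swapping the files.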

Here is the output you asked for (with the work-around removed again):

docker ps
CONTAINER ID        IMAGE                                      COMMAND                  CREATED              STATUS                          PORTS                    NAMES
0f32a56eb783        mendersoftware/api-gateway:1.6.0           "/entrypoint.sh"         44 seconds ago       Up 27 seconds                   0.0.0.0:443->443/tcp     menderproduction_mender-api-gateway_1
b2808b122eeb        mendersoftware/deployments:1.6.0           "/entrypoint.sh --co…"   About a minute ago   Restarting (3) 12 seconds ago                            menderproduction_mender-deployments_1
a481bd329137        openresty/openresty:1.13.6.2-0-alpine      "/usr/local/openrest…"   About an hour ago    Up 58 seconds                   0.0.0.0:9000->9000/tcp   menderproduction_storage-proxy_1
624edf14a27a        mendersoftware/inventory:1.5.0             "/usr/bin/inventory …"   About an hour ago    Up 44 seconds                   8080/tcp                 menderproduction_mender-inventory_1
f20deb5f282c        mendersoftware/deviceauth:1.7.0            "/usr/bin/deviceauth…"   About an hour ago    Up 45 seconds                   8080/tcp                 menderproduction_mender-device-auth_1
28b4ff1ea244        mendersoftware/useradm:1.7.0               "/usr/bin/useradm --…"   About an hour ago    Up 46 seconds                   8080/tcp                 menderproduction_mender-useradm_1
b4a9efef2f29        mendersoftware/mender-conductor:1.2.0      "/srv/start_conducto…"   About an hour ago    Up 47 seconds                   8080/tcp                 menderproduction_mender-conductor_1
2d135cae3c9c        minio/minio:RELEASE.2018-09-25T21-34-43Z   "/usr/bin/docker-ent…"   About an hour ago    Up About a minute (healthy)     9000/tcp                 menderproduction_minio_1
f6a8987b8cf2        mendersoftware/gui:1.7.0                   "/entrypoint.sh"         About an hour ago    Up About a minute               80/tcp                   menderproduction_mender-gui_1
193ce2d061ee        mongo:3.4                                  "docker-entrypoint.s…"   About an hour ago    Up About a minute               27017/tcp                menderproduction_mender-mongo_1
a73401ce828a        redis:3.2.11-alpine                        "/redis/entrypoint.sh"   19 hours ago         Up About a minute               6379/tcp                 menderproduction_mender-redis_1
56bf06d86a7b        mendersoftware/elasticsearch:2.4           "/docker-entrypoint.…"   19 hours ago         Up About a minute               9200/tcp, 9300/tcp       menderproduction_mender-elasticsearch_1
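In `docker ps`, the number in `Restarting (3)` is the exit code of the container's last run. A small sketch for pulling out just the restarting containers: `restarting_containers` queries Docker directly using the `status=restarting` filter and `docker inspect`, and `restarting_from_table` works on saved `docker ps` text instead. Both function names are illustrative.

```shell
# List restarting containers with the exit code of their last run.
restarting_containers() {
    docker ps --filter status=restarting --format '{{.Names}}' |
    while read -r name; do
        printf '%s exit=%s\n' "$name" "$(docker inspect --format '{{.State.ExitCode}}' "$name")"
    done
}

# Offline variant: from saved `docker ps` output on stdin, print the NAMES
# column (last field) of every row whose STATUS mentions "Restarting".
restarting_from_table() {
    awk 'NR > 1 && /Restarting/ { print $NF }'
}
```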

docker images | grep mendersoftware
mendersoftware/useradm            1.7.0                          df9da74a42e1        6 weeks ago         20.5MB
mendersoftware/mender-conductor   1.2.0                          64a0c19ffb56        6 weeks ago         213MB
mendersoftware/api-gateway        1.6.0                          b59195670aad        6 weeks ago         54MB
mendersoftware/inventory          1.5.0                          82654929df3a        6 weeks ago         20.3MB
mendersoftware/gui                1.7.0                          a18fdc3136e9        6 weeks ago         42.2MB
mendersoftware/deviceauth         1.7.0                          4f8a5f15ed41        6 weeks ago         20.9MB
mendersoftware/deployments        1.6.0                          269b4f9da434        6 weeks ago         25.5MB
mendersoftware/useradm            1.6.0                          130aa6597c5d        6 weeks ago         20.4MB
mendersoftware/mender-conductor   1.1.0                          4c6241bb901f        6 weeks ago         213MB
mendersoftware/api-gateway        1.5.0                          746efa095c2e        6 weeks ago         54.1MB
mendersoftware/inventory          1.4.1                          808629dab7d9        6 weeks ago         19.1MB
mendersoftware/gui                1.6.0                          bc3ef0473f7a        6 weeks ago         42.2MB
mendersoftware/deviceauth         1.6.0                          34218cad95f5        6 weeks ago         20.5MB
mendersoftware/deviceadm          1.4.1                          cbec9beee1c7        6 weeks ago         18.9MB
mendersoftware/deployments        1.5.0                          0225f24fc43a        6 weeks ago         24.9MB
mendersoftware/elasticsearch      2.4                            122c75b1adca        23 months ago       344MB
mendersoftware/minio              RELEASE.2016-12-13T17-19-42Z   52d05f1497a3        2 years ago         275MB
mendersoftware/openresty          1.11.2.2-alpine                79d83bdde8a7        2 years ago         44.9MB

Drew Moseley

Jan 30, 2019, 8:49:31 AM
to men...@lists.mender.io

FWIW, I ran my local Mender 1.7.0 production environment, and the mender-deployments logs are the same as yours up to the "migration failed" line; from there, mine show a successful migration, i.e.:

mender-deployments_1    | WARNING: ca-certificates.crt does not contain exactly one certificate or CRL: skipping
mender-deployments_1    | time="2019-01-30T13:30:28Z" level=info msg="Deployments Service, version unknown starting up" file=main.go func=main.cmdServer line=103
mender-deployments_1    | time="2019-01-30T13:30:28Z" level=info msg="automigrate is ON, will apply migrations" file=migrations.go func=migrations.Migrate line=48
mender-deployments_1    | time="2019-01-30T13:30:28Z" level=info msg="migrating deployment_service" file=migrations.go func=migrations.MigrateSingle line=70
mender-deployments_1    | time="2019-01-30T13:30:28Z" level=info msg="applying migration from version 0.0.0 to 1.2.1" db="deployment_service" file="migrator_simple.go" func="migrate.(*SimpleMigrator).Apply" line=101
mender-deployments_1    | time="2019-01-30T13:30:28Z" level=info msg="DB migrated to version 1.2.1" db="deployment_service" file="migrator_simple.go" func="migrate.(*SimpleMigrator).Apply" line=140
mender-deployments_1    | time="2019-01-30T13:30:28Z" level=info msg="Deployments Service, version unknown starting up" file=main.go func=main.cmdServer line=123
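The two runs diverge at the migrator's final message: the healthy start logs `DB migrated to version 1.2.1`, while the failing one logs `failed to apply migration from 0.0.0 to 1.2.1`. A sketch that classifies a saved log on that basis; the matched strings are copied from the logs in this thread, and `migration_status` is a hypothetical helper.

```shell
# Read deployments log text on stdin; print failed, migrated or unknown.
migration_status() {
    log=$(cat)
    # Check for the failure first: a failing run never reaches "DB migrated".
    if printf '%s\n' "$log" | grep -q 'failed to apply migration'; then
        echo failed
    elif printf '%s\n' "$log" | grep -q 'DB migrated to version'; then
        echo migrated
    else
        echo unknown
    fi
}
```

Usage: `docker logs menderproduction_mender-deployments_1 2>&1 | migration_status`.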

In my case, I started with a completely empty docker environment, so no existing volumes, containers, etc. 

I replicated this on both Ubuntu 16 and 18.

Did you have anything existing in your docker environment?



Drew Moseley | Technical Solutions Architect | (+1) 480-797-0552 | https://mender.io

Northern.tech AS | @northerntechHQ | @drewmoseley

Shawn Stevenson

Jan 30, 2019, 11:32:54 AM
to Mender List mender.io
I started with a new virtual machine and installed Docker Engine and Docker Compose and then set up Mender. So no, there was nothing existing in the Docker environment.

Peter Grzybowski

Jan 30, 2019, 1:25:28 PM
to men...@lists.mender.io
Hello Shawn!

I have recreated the environment with exactly the versions you mentioned:

# lsb_release -a
No LSB modules are available.
Distributor ID: Ubuntu
Description:    Ubuntu 16.04.5 LTS
Release:        16.04
Codename:       xenial
# docker -v
Docker version 18.09.1, build 4c52b90
# docker-compose --version
docker-compose version 1.23.2, build 1110ad01
# uname -a
Linux xxxx 4.4.0-1072-aws #82-Ubuntu SMP Fri Nov 2 15:00:21 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux

(note slight difference in docker-compose)

I got this in deployments logs:

WARNING: ca-certificates.crt does not contain exactly one certificate or CRL: skipping
time="2019-01-30T18:05:27Z" level=info msg="Deployments Service, version unknown starting up" file=main.go func=main.cmdServer line=103
time="2019-01-30T18:05:27Z" level=info msg="automigrate is ON, will apply migrations" file=migrations.go func=migrations.Migrate line=48
time="2019-01-30T18:05:27Z" level=info msg="migrating deployment_service" file=migrations.go func=migrations.MigrateSingle line=70
time="2019-01-30T18:05:27Z" level=info msg="migration to version 1.2.1 skipped" db="deployment_service" file="migrator_simple.go" func="migrate.(*SimpleMigrator).Apply" line=125
time="2019-01-30T18:05:27Z" level=info msg="DB migrated to version 1.2.1" db="deployment_service" file="migrator_simple.go" func="migrate.(*SimpleMigrator).Apply" line=140
time="2019-01-30T18:05:27Z" level=info msg="Deployments Service, version unknown starting up" file=main.go func=main.cmdServer line=123

Could you please provide the output of:

# while read; do echo; echo $REPLY; docker logs "${REPLY}"; done < <(docker ps | tail -n+2 | awk '{print($1);}')

and maybe ./run ps
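The loop above prints each container's logs to the terminal. If the output is redirected to a file for attaching, note that `docker logs` replays a container's stderr on stderr, so lines a service writes there are lost unless stderr is redirected as well. A sketch that writes everything to one file with a header per container; `dump_all_logs` and the default `log.txt` name are illustrative.

```shell
# Write the logs of every running container into one file (default log.txt).
dump_all_logs() {
    out=${1:-log.txt}
    : > "$out"    # start with an empty file
    docker ps --format '{{.ID}} {{.Names}}' |
    while read -r id name; do
        printf '\n==== %s %s ====\n' "$id" "$name" >> "$out"
        # 2>&1 keeps the container's stderr lines in the file too.
        docker logs "$id" >> "$out" 2>&1
    done
}
```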

Maybe you could also share the list of commands you issued to arrive at this state. Could you also provide the uname output and the host setup (memory, swap, hard disks)?
I guess I would try again; just remove all containers:

# docker system prune; docker container stop $(docker container ls -aq); docker container rm $(docker container ls -aq); docker image rm $(docker image ls -q)

and run

./run up -d

again.
Please keep in mind that all these commands assume you have a clean, new host used only for testing, since they act on all Docker containers and images. Do not run them on any existing environment of yours.
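One thing the clean-up above leaves behind: `docker system prune` removes volumes only when run with `--volumes`, and the other commands do not remove volumes either, so data such as a MongoDB volume from an earlier attempt can survive. A sketch for listing what is left, optionally narrowed with Docker's `name` filter; `leftover_volumes` is a hypothetical helper, and whether any listed volume should be removed depends on the setup.

```shell
# List volumes still present; an optional argument narrows by name substring.
leftover_volumes() {
    if [ -n "$1" ]; then
        docker volume ls -q --filter "name=$1"
    else
        docker volume ls -q
    fi
}
```

Usage: `leftover_volumes mender`.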

cheers,
peter


--
You received this message because you are subscribed to the Google Groups "Mender List mender.io" group.
To unsubscribe from this group and stop receiving emails from it, send an email to mender+un...@lists.mender.io.
To post to this group, send email to men...@lists.mender.io.
Visit this group at https://groups.google.com/a/lists.mender.io/group/mender/.

Shawn Stevenson

Jan 30, 2019, 2:20:35 PM
to Mender List mender.io, piotr.gr...@northern.tech
Hello Peter,

I have attached a file, log.txt, with the output of the docker logs command you provided. Here is the output of ./run ps:

                 Name                                Command                  State               Ports        
----------------------------------------------------------------------------------------------------------------
menderproduction_mender-api-gateway_1     /entrypoint.sh                   Up             0.0.0.0:443->443/tcp 
menderproduction_mender-conductor_1       /srv/start_conductor.sh          Up             8080/tcp             
menderproduction_mender-deployments_1     /entrypoint.sh --config /e ...   Restarting                          
menderproduction_mender-device-auth_1     /usr/bin/deviceauth --conf ...   Up             8080/tcp             
menderproduction_mender-elasticsearch_1   /docker-entrypoint.sh elas ...   Up             9200/tcp, 9300/tcp   
menderproduction_mender-gui_1             /entrypoint.sh                   Up             80/tcp               
menderproduction_mender-inventory_1       /usr/bin/inventory --confi ...   Up             8080/tcp             
menderproduction_mender-mongo_1           docker-entrypoint.sh mongod      Up             27017/tcp            
menderproduction_mender-redis_1           /redis/entrypoint.sh             Up             6379/tcp             
menderproduction_mender-useradm_1         /usr/bin/useradm --config  ...   Up             8080/tcp             
menderproduction_minio_1                  /usr/bin/docker-entrypoint ...   Up (healthy)   9000/tcp             
menderproduction_storage-proxy_1          /usr/local/openresty/bin/o ...   Up             0.0.0.0:9000->9000/tcp


And here are the versions of Docker, Ubuntu, kernel:

$ lsb_release -a

No LSB modules are available.
Distributor ID:    Ubuntu
Description:    Ubuntu 16.04.1 LTS
Release:    16.04
Codename:    xenial

$ docker -v

Docker version 18.09.1, build 4c52b90

$ docker-compose --version

docker-compose version 1.23.2, build 1110ad01

$ uname -a
Linux ubuntu16 4.4.0-31-generic #50-Ubuntu SMP Wed Jul 13 00:07:12 UTC 2016 x86_64 x86_64 x86_64 GNU/Linux

So my kernel build appears to be a lot older than yours. Perhaps I need to apply some updates...

I removed all containers and started again. I still had the same problem.

Cheers,
Shawn
log.txt

Peter Grzybowski

Jan 30, 2019, 5:35:26 PM
to Shawn Stevenson, Mender List mender.io

Thanks.
I do not see these lines in my logs:

2019-01-30T18:36:28.632+0000 I COMMAND  [conn6] CMD: dropIndexes deployment_service.deployments

I have to investigate further.
Is that a VirtualBox VM you are running?
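Comparing environments here comes down to whether mongod logged `dropIndexes` against the deployments database. A sketch that extracts the affected collections from a saved mongod log, assuming the line format of the single MongoDB 3.4 line quoted above; `dropindex_collections` is a hypothetical helper.

```shell
# Read mongod log text on stdin; print each deployment_service collection
# that appears in a "CMD: dropIndexes" line, once.
dropindex_collections() {
    sed -n 's/.*CMD: dropIndexes \(deployment_service\.[^ ]*\).*/\1/p' | sort -u
}
```

Usage: `docker logs menderproduction_mender-mongo_1 2>&1 | dropindex_collections`.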

peter

Shawn Stevenson

Jan 30, 2019, 6:19:43 PM
to Mender List mender.io, shawn.e....@gmail.com, piotr.gr...@northern.tech
No it is not virtualbox. It is a KVM (kernel-based virtual machine) running under EdgeLinux.

Drew Moseley

Jan 31, 2019, 3:11:13 AM
to men...@lists.mender.io

On 1/30/19 8:20 PM, Shawn Stevenson wrote:

> So my kernel build appears to be a lot older than yours. Perhaps I need to apply some updates...

Hi Shawn,

Is upgrading your VM a possibility? You are on an older point release of Ubuntu as well.  It would definitely be interesting to know if that is somehow the cause.

Shawn Stevenson

Jan 31, 2019, 12:32:47 PM
to men...@lists.mender.io
Hi Drew,

I upgraded the VM to Ubuntu 18.04.1 yesterday. I then removed all containers as Peter had shown and restarted. The problem is still there.

Peter Grzybowski

Jan 31, 2019, 6:26:23 PM
to Mender List mender.io
Hello Shawn,

I have created a VM and run it with KVM, and I could not reproduce the issue.
I can share the image and the config of the VM with you if you would like to try running it on your host; it is about 1.7 GB. That way we could rule out some strange host-related problem.

cheers,
peter


Piotr Grzybowski

Feb 1, 2019, 3:38:21 AM
to men...@lists.mender.io
Hello Shawn,

One more thing I need: your prod.yml file and the contents of /etc/resolv.conf from the VM. Could you please provide them?

yours,
peter


Shawn Stevenson

Feb 1, 2019, 6:55:23 PM
to men...@lists.mender.io
Hi Peter,

I am not in the office today, so I'll have to get you those files next week. 

One thing I noticed yesterday when starting/stopping the Docker containers was a warning that a network timeout had been exceeded, so I suspect I have performance problems to deal with. I will investigate more next week. I may also try the same setup on a different computer running an Ubuntu virtual machine under VMware.

Peter Grzybowski

Feb 2, 2019, 2:50:27 AM
to Mender List mender.io
Hi Shawn,

Of course. Please check that the values you put in prod.yml for:

 mender-api-gateway.ALLOWED_HOSTS

 storage-proxy.networks.mender.aliases

 mender-deployments.environment.DEPLOYMENTS_AWS_URI

and the values you supplied as environment variables to this command:

CERT_API_CN=here CERT_STORAGE_CN=and.here ../keygen

are resolvable and reachable from the host where you run the containers:

ping -c1 here;
ping -c1 and.here;
nc -z -v and.here 9000;

and that they match each other (e.g. 'ALLOWED_HOSTS: here' and, keeping the same example names, 'aliases: - and.here' and 'DEPLOYMENTS_AWS_URI: https://and.here:9000').
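The checks above can be bundled into a sketch. `check_names` compares the prod.yml values with the certificate names passed to keygen (assuming `ALLOWED_HOSTS` holds a single name), and `check_reachable` repeats the resolution and port-9000 probe using `getent` and `nc`. `uri_host`, `check_names` and `check_reachable` are hypothetical helpers.

```shell
# Host part of a URI such as https://and.here:9000/path.
uri_host() {
    rest=${1#*://}                 # drop the scheme
    rest=${rest%%/*}               # drop any path
    printf '%s\n' "${rest%%:*}"    # drop any port (IPv6 literals not handled)
}

# Arguments: CERT_API_CN CERT_STORAGE_CN ALLOWED_HOSTS storage-alias DEPLOYMENTS_AWS_URI
# Prints each mismatch and returns non-zero if there was any.
check_names() {
    rc=0
    [ "$3" = "$1" ] || { echo "ALLOWED_HOSTS ($3) differs from CERT_API_CN ($1)"; rc=1; }
    [ "$4" = "$2" ] || { echo "storage-proxy alias ($4) differs from CERT_STORAGE_CN ($2)"; rc=1; }
    uri=$(uri_host "$5")
    [ "$uri" = "$2" ] || { echo "DEPLOYMENTS_AWS_URI host ($uri) differs from CERT_STORAGE_CN ($2)"; rc=1; }
    return "$rc"
}

# Resolve both names and probe the storage port, like the ping/nc checks.
check_reachable() {
    for name in "$1" "$2"; do
        getent hosts "$name" >/dev/null || echo "cannot resolve $name"
    done
    nc -z -w 3 "$2" 9000 || echo "cannot connect to $2:9000"
}
```

Usage: `check_names here and.here here and.here https://and.here:9000 && check_reachable here and.here`.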
Good luck.

cheers,
peter





Shawn Stevenson

Feb 4, 2019, 1:47:09 PM
to Mender List mender.io, piotr.gr...@northern.tech
Hi Peter,

Here is the contents of /etc/resolv.conf:

# Dynamic resolv.conf(5) file for glibc resolver(3) generated by resolvconf(8)
#     DO NOT EDIT THIS FILE BY HAND -- YOUR CHANGES WILL BE OVERWRITTEN
# 127.0.0.53 is the systemd-resolved stub resolver.
# run "systemd-resolve --status" to see details about the actual nameservers.

nameserver 192.168.1.1
nameserver 127.0.0.53
search bblv

I have also emailed you prod.yml separately.

Thanks,
Shawn

Shawn Stevenson

Feb 4, 2019, 1:56:01 PM
to Mender List mender.io, piotr.gr...@northern.tech
Hi Peter,

The ping and netcat checks work fine. In my setup the URL for "here" and "and.here" is exactly the same. Is that a problem?

I tried moving my setup to a different VM on a different computer. The host machine is running VMware Workstation 12 Player, and the VM itself is running Ubuntu 16.04.1. I still see the same behaviour with the different setup.

Thanks,
Shawn

Peter Grzybowski

Feb 5, 2019, 6:09:03 PM
to Shawn Stevenson, Mender List mender.io
Hello Shawn,

I have verified the same setup you have (including "here" and "and.here" being the same name ;-)), including the names and the limits, and I still can't replicate the issue.
I would configure dnsmasq and point your domain to localhost (as we discussed), since that is the only difference I can see.
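dnsmasq's `address=/domain/ip` option answers a domain and its subdomains with a fixed address. A sketch that prints such lines for the Mender host names; `dnsmasq_localhost_lines` is hypothetical, the details discussed off-list are not in this thread, and on hosts where systemd-resolved already listens on 127.0.0.53 (as in the resolv.conf above) dnsmasq may need extra configuration to avoid clashing with it.

```shell
# Print one dnsmasq "address=" line per domain, resolving it to 127.0.0.1.
dnsmasq_localhost_lines() {
    for domain in "$@"; do
        printf 'address=/%s/127.0.0.1\n' "$domain"
    done
}
```

Usage: `dnsmasq_localhost_lines here and.here`, with the output placed in a dnsmasq configuration file (for example under /etc/dnsmasq.d/ on Ubuntu, an assumed location).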

cheers,
peter

