Frequent deploys slowing QA down or blocking it


jbe...@soldevelo.com

Dec 6, 2017, 9:27:00 AM
to OpenLMIS Dev
I would like to raise again an issue that, to my knowledge, has already been discussed several times without a solution being found. Testing has to be done against the most recent changes. It takes some time for changes to appear on the test and performance servers, and testing is also impossible for a while after a server restarts because of the Bad Gateway error. When changes are made frequently, this can block testing for a long time. As a result, testing a ticket that should normally take up to 20 minutes can take as long as two hours (this is what happened today). I think something needs to be done about it, as it can slow the work down considerably.

Mateusz Kwiatkowski

Dec 6, 2017, 12:22:03 PM
to jbe...@soldevelo.com, OpenLMIS Dev
Hi everyone,

I started a topic about this a few days ago. I was trying to test something on the test server and it reloaded 3 times in 15 minutes. In my opinion this could be fixed simply by creating one job that deploys to the test server with a 1h delay, instead of having one job per repo. A problem could occur when someone wants to start testing right away and the changes haven't been deployed to the test server yet, or when changes split across multiple repositories depend on each other and the deploy runs after only the first commit. For those cases I would simply create another build without a delay that could be run when needed. What do you think about this approach?
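
A minimal sketch of the proposal above, assuming a single shared deploy job that collects triggers from all repositories and a manual override without a delay. The class name `ThrottledDeploy`, its methods, and the repository names are hypothetical; this is an illustrative Python model of the throttling logic, not the actual Jenkins configuration.

```python
from dataclasses import dataclass, field
from datetime import datetime, timedelta
from typing import Optional, Set


@dataclass
class ThrottledDeploy:
    """Hypothetical model: one deploy job shared by every repository."""

    min_interval: timedelta = timedelta(hours=1)  # the "1h delay" (assumed value)
    last_deploy: Optional[datetime] = None
    pending: Set[str] = field(default_factory=set)  # repos waiting for a deploy

    def request(self, repo: str, now: datetime) -> bool:
        """Record a change from `repo`; deploy only if the interval has elapsed."""
        self.pending.add(repo)
        if self.last_deploy is None or now - self.last_deploy >= self.min_interval:
            return self._deploy(now)
        return False  # throttled: the change waits for a later deploy

    def force(self, now: datetime) -> bool:
        """The 'build without a delay': deploy immediately on manual request."""
        return self._deploy(now)

    def _deploy(self, now: datetime) -> bool:
        # One deploy picks up everything that is pending, from all repos.
        self.last_deploy = now
        self.pending.clear()
        return True
```

In this model a second commit five minutes after a deploy is held back instead of restarting the server again, while `force` covers the case where connected changes in several repositories need to go out together right away.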

Regards
Mateusz

SolDevelo
Sp. z o.o. [LLC] / www.soldevelo.com
Al. Zwycięstwa 96/98, 81-451, Gdynia, Poland
Phone: +48 58 782 45 40 / Fax: +48 58 782 45 41

--
You received this message because you are subscribed to the Google Groups "OpenLMIS Dev" group.
To unsubscribe from this group and stop receiving emails from it, send an email to openlmis-dev+unsubscribe@googlegroups.com.
To post to this group, send email to openlm...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/openlmis-dev/e1e543e7-c4b6-4179-bd95-fb54fc4c470c%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.




Paweł Gesek

Dec 6, 2017, 12:44:22 PM
to Mateusz Kwiatkowski, Joanna Bebak, OpenLMIS Dev
So yes, in the past we throttled each deploy to one per hour. This was sufficient at the time, but that was back when we had one UI repo. I understand that now the throttle hardly matters given our number of repositories. From what I recall, the reason we wanted separate deploy jobs for each component had to do with pipelines, but I'm not entirely sure about that. Would one throttled deploy job reused by all pipelines work here? I remember I was also looking into plugins that could throttle a group of jobs, but I don't think we found anything sufficient.

Regards,
Paweł







--

Paweł Gesek
Technical Project Manager
pge...@soldevelo.com / +48 690 020 875

Nick Reid

Dec 6, 2017, 12:48:44 PM
to Mateusz Kwiatkowski, jbe...@soldevelo.com, OpenLMIS Dev

I'd be ok with changing when test is reloaded, so it only happens once an hour (perhaps on the hour, if possible). I think Pawel is right that we can point all our tasks to one deployment task, not ... 7 or 8


This does assume the root of the problem is that test.openlmis.org isn't predictable, which makes working with the test server difficult.


An alternative (just to bring it up) would be to use UAT as our QA server, and make updating that server something that is only done when QA-ing a ticket.


Nick Reid | nick...@villagereach.org
Software Developer, Information Systems Group


VillageReach Starting at the Last Mile
2900 Eastlake Ave. E, Suite 230, Seattle, WA 98102, USA
CELL: +1.510.410.0020
SKYPE: nickdotreid
www.villagereach.org




Łukasz Lewczyński

Dec 7, 2017, 3:20:33 AM
to Nick Reid, Mateusz Kwiatkowski, jbe...@soldevelo.com, OpenLMIS Dev
I am a little worried about having a single deploy job, mainly because there could be situations in which a developer needs to make changes to several services, and the test server then has the changes from only one repository because the others are waiting an hour for the next deploy. Maybe there could be only a manual deploy to the test server (linked with the deploy to the perftest server) that is executed when there are tickets to QA, but this would probably break CI/CD rules, so it is more of a loose thought. Also, if we decide to change the deploy jobs, it might be good to remove the *-deploy-to-uat jobs because, from what I know, we don't use them.


Łukasz Lewczyński
Software Developer
llewc...@soldevelo.com


jbe...@soldevelo.com

Dec 7, 2017, 3:53:02 AM
to OpenLMIS Dev
I'm not sure whether this would be feasible or a good solution for this project, but to me as a tester it would be best to use UAT as the test server (where changes for finished tickets are deployed) and to use the test server more as a dev server, i.e. for deploying changes that, for example, make some of the tests fail and still need to be worked on. This is only my suggestion, of course.

Josh Zamor

Dec 7, 2017, 6:58:40 AM
to OpenLMIS Dev
I agree with many of the thoughts here.  test.openlmis.org doesn't quite fit the QA need, and we should look at using UAT in the meantime for QA purposes.  If that's not possible, we need a separate qa.openlmis.org.  This server's re-deployment would be owned by QA.  There is still a need for multiple QA environments, notably when doing team-wide regression testing, however we can still wait on that.

There is also a desire I'm hearing from QA to be able to deploy finished tickets to this QA server. We would need a mechanism such as GitHub Flow or feature flags to be able to do something like that, and indeed this is similar to our need for a Git flow approach, which we think we need for a more stable release process. That said, those are tools we don't have today. A tool which we do have today is our CI and CD approach, which can very well help with in-process tickets affecting the stability of the SNAPSHOT versions QA is testing. CI and CD however rely on good automated testing: enough coverage, and testing the right things. When was the last time we added a new contract test? Why don't our unit / integration / component tests catch more and provide us early feedback? While we work on the other tools (Git flow, GitHub flow, etc), I'd like to encourage us to remember the importance of our existing test infrastructure. We already have that tool and it can help us in many different ways, QA testing included.

That's what I'd encourage.  As for immediate next step on my end, I'll be sure to bring this up to Team ILL for the UAT server.

Best,
Josh

Nick Reid

Dec 7, 2017, 2:17:49 PM
to Łukasz Lewczyński, Mateusz Kwiatkowski, jbe...@soldevelo.com, OpenLMIS Dev

Lukasz --


Could you explain the need for a developer to get their changes "live" on test quickly?


I feel like getting to some concrete needs might help guide this discussion....


❤znick



samim.vil...@gmail.com

Dec 7, 2017, 3:26:58 PM
to OpenLMIS Dev
I agree with Josh's points about the automated testing, but that is not the problem that I've experienced, and what I believe Joanna is experiencing in the test environment. When one ticket in the sprint is ready to be tested there are others that are in progress, and as a tester starts validating the first ticket other changes are being introduced to the test environment and can cause problems with our testing or sometimes wipe out the test data. 

I've talked to Josh about these problems many times, and I think the best case scenario is the one he suggested, where we have a separate qa.openlmis.org environment that the qa team owns and controls when changes are deployed. 

I would like suggestions on how to manage this environment or manage the tickets better so that we continue to support scenarios where tickets might need to be tested together if they require changes to the same component so that we catch any regressions. Scheduling an hourly refresh also sounds like a good idea.

- Sam

Paweł Albecki

Dec 11, 2017, 5:47:02 AM
to Josh Zamor, OpenLMIS Dev
I agree that we should make more use of our existing test infrastructure. We definitely need more contract tests, which haven't been added for new features for a while. From my experience, the lack of them is one of the most frequent reasons that the test server is "blocked". The question is when such tests should be added (during work on a new feature or in a separate ticket created by QA?) and how to ensure that they are sufficient.
Regarding integration and component tests, do we even really have the latter? I feel like in the IT Gradle task we only have integration tests, and only some of them meet the definition of a component test (from RTD).

Regards,
Paweł




--

Paweł Albecki
Software Developer
palb...@soldevelo.com

Łukasz Lewczyński

Dec 11, 2017, 7:12:58 AM
to Paweł Albecki, Josh Zamor, OpenLMIS Dev
From my point of view it would be better to create contract tests in separate tickets to avoid situations where we create tests that only check the happy path.



Łukasz Lewczyński

Dec 11, 2017, 7:26:52 AM
to Nick Reid, OpenLMIS Dev
Nick, it is easy to imagine a situation where we modify an existing endpoint that is used in many places by different services (e.g. the facility endpoints). If we change the endpoint in a way that modifies its schema, for example by moving it to a pageable version, other services will not be able to handle the new schema. This will probably make the system unavailable, QA will be blocked, and we will not be able to do anything until the next deploy to the test server fixes the issue. There would probably be another job to run the deploy manually (without any delay), but I think it would be used too often: if it has no delay, why should I wait?

I think a separate server for QA (handled only by the QA) is a very good idea and I hope it will speed up the QA process. I am only afraid that we could have too many servers to maintain. From what I know now we have four: test server, UAT, perftest server, demo server. Maybe instead of creating a new one we could use the UAT?



Paweł Albecki

Dec 11, 2017, 8:07:24 AM
to Łukasz Lewczyński, OpenLMIS Dev
"If now we change the endpoint in that way the schema will be modified and other services will not be able to handle the new schema - like move endpoint to pageable version. This will probably make the system unavailable, the QA will be blocked and we are not be able to do anything until to the next deploy to the test server which will fix the issue."

Is it not possible to push all changes at the same time so they will be deployed once?
 


Łukasz Lewczyński

Dec 11, 2017, 8:33:31 AM
to Paweł Albecki, OpenLMIS Dev
Even if a developer pushes changes to different repositories at the same time, the jobs will be executed at different times, so it is basically impossible to achieve.



Paweł Albecki

Dec 11, 2017, 8:47:41 AM
to Łukasz Lewczyński, OpenLMIS Dev
How so? I thought we were talking about one deploy per hour, so all the developer needs to do is push all changes before the bell tolls.

Łukasz Lewczyński

Dec 11, 2017, 8:56:02 AM
to Paweł Albecki, OpenLMIS Dev
Yes, there will be a single deploy job, but before this job we have a lot of other jobs, for instance: reference-data-service, reference-data-erd-generation, reference-data-sonar, requisition-service, etc. So it is possible that the referencedata-service pipeline will execute the deploy job before the requisition-service pipeline has completed, for example because the contract tests for requisition take longer than the same type of tests for reference-data.



Paweł Albecki

Dec 11, 2017, 9:02:37 AM
to Łukasz Lewczyński, OpenLMIS Dev
This is actually a good point: currently our job for deploying all services doesn't wait for contract tests to pass. It only needs the image to be built. We will have to modify this somehow, so that the Jenkins job pulls an image from Docker Hub only if the contract tests pass for the given service.

Łukasz Lewczyński

Dec 11, 2017, 9:05:33 AM
to Paweł Albecki, OpenLMIS Dev
I think this is handled by Jenkins, so if any job in the pipeline fails then the deploy job is not executed: Pipeline #1848 <- I hope this pipeline will still be visible.



Paweł Albecki

Dec 11, 2017, 9:21:04 AM
to Łukasz Lewczyński, OpenLMIS Dev
You said yourself that "referencedata-service pipeline will execute the deploy job before requisition-service pipeline will be completed". So isn't it possible that the requisition service is built into an image and deployed by the deploy-all-job before the requisition contract tests finish?

Łukasz Lewczyński

Dec 11, 2017, 9:29:25 AM
to Paweł Albecki, OpenLMIS Dev
Probably yes, but what about a situation where I have to modify 10 services? Currently Jenkins only handles 3 jobs at the same time (and there can be a gap because sometimes a slave instance has to be created), and for some types of jobs only one can be executed at a time (like deploy, performance). In this situation not all Docker images will be created before the deploy.

Also, I don't think it is correct to reason like: this pipeline is not completed, but the test server will probably have the new version of the image. In my opinion, if a service's pipeline is not completed, the service should not be deployed to the test server, because what should we do when a new version of a service was created and deployed to the test server, but then the related contract tests fail?
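
The rule argued for above (deploy a service only once its whole pipeline has completed successfully) can be sketched as a small selection function. This is an illustrative Python model under stated assumptions: the `Build` record, its fields, and the build numbers are hypothetical and do not reflect how the Jenkins jobs or Docker Hub tags are actually organised.

```python
from typing import Dict, List, NamedTuple, Optional


class Build(NamedTuple):
    """Hypothetical record of one pipeline run for one service."""

    service: str
    number: int
    image_built: bool
    contract_tests_passed: Optional[bool]  # None means the tests are still running


def deployable_versions(builds: List[Build]) -> Dict[str, int]:
    """For each service, pick the newest build whose pipeline fully passed."""
    chosen: Dict[str, int] = {}
    for build in builds:
        # An image alone is not enough: the contract tests must have passed.
        if build.image_built and build.contract_tests_passed is True:
            if build.number > chosen.get(build.service, -1):
                chosen[build.service] = build.number
    return chosen


builds = [
    Build("referencedata", 10, True, True),
    Build("referencedata", 11, True, None),   # image exists, tests still running
    Build("requisition", 7, True, True),
    Build("requisition", 8, True, False),     # image exists, contract tests failed
]
print(deployable_versions(builds))
```

With these sample records the function keeps referencedata at build 10 and requisition at build 7, so an image that was built but not yet verified never reaches the test server.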


Paweł Albecki

Dec 11, 2017, 10:33:36 AM
to Łukasz Lewczyński, OpenLMIS Dev
Yes, this is what I'm trying to raise, and I wonder how we can resolve it. Maybe we should stay with many deploy-to-test jobs and change them so that only one service is deployed, without restarting the whole of OpenLMIS.

Łukasz Lewczyński

Dec 12, 2017, 3:19:25 AM
to Paweł Albecki, OpenLMIS Dev
According to the Slack channel, QA will use the UAT server. I hope this will resolve some issues and speed up the testing.



jbe...@soldevelo.com

Dec 21, 2017, 8:43:43 AM
to OpenLMIS Dev
On December 12, Sam posted the following message on the #qa Slack channel:

@here Re: Dev forum discussion about testing environment "Frequent deploys slowing QA down or blocking it"
We would like to come to an agreement on which environment to use for testing. Since uat.openlmis.org is not used for demos anymore (we demo in demo-v3.openlmis.org), we would like to have the test team (@sammieim, @jbebak) use uat instead of test.openlmis.org so that we can avoid QA downtime. I would like to hear back from the teams if anyone disagrees. Otherwise we will move forward with uat. 
- If anyone is using uat for any other purposes, please let @joshzamor, @jbebak and myself know
- Deployment to uat will be scheduled. Should this be hourly or a different schedule? Please provide feedback here.
- @joshzamor will notify the channel here when the environment is ready (deploy jobs scheduled and running)

Nobody has responded to this message so far. Do any of you have feedback on the above ideas? Has any progress been made on resolving the frequent deploys issue? As far as I can see, UAT still doesn't have the latest changes, so it can't be used for testing.


Josh Zamor

Dec 29, 2017, 6:11:28 PM
to OpenLMIS Dev
It took longer than we thought because a blocking issue had to be resolved first, but the UAT instance is now ready for QA to use:

  • it's running the latest SNAPSHOT versions, just as test is.
  • it rebuilds every hour on the hour - and wipes any previous data.
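
Because each rebuild wipes previous data, it can help to know how much time is left before the next one. The following is a small Python sketch of that calculation; the function names are hypothetical, and it assumes the rebuild fires exactly at minute 0 of every hour (in standard cron syntax such a schedule would be `0 * * * *`, though the actual job configuration is not shown here).

```python
from datetime import datetime, timedelta


def next_rebuild(now: datetime) -> datetime:
    """Return the next top-of-the-hour instant strictly after `now`."""
    top_of_this_hour = now.replace(minute=0, second=0, microsecond=0)
    return top_of_this_hour + timedelta(hours=1)


def minutes_until_rebuild(now: datetime) -> float:
    """Minutes of testing time left before the data is wiped."""
    return (next_rebuild(now) - now).total_seconds() / 60


if __name__ == "__main__":
    print(minutes_until_rebuild(datetime.now()))
```

A tester starting a ticket that needs about 20 minutes could check this value first and wait for the rebuild if less time than that remains.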

This issue tracked the change:  https://openlmis.atlassian.net/browse/OLMIS-3873


Thanks for bringing this up Joanna, please take a look at http://uat.openlmis.org and let us know if this new approach helps your process.


Best,

Josh

jbe...@soldevelo.com

Jan 18, 2018, 3:30:12 AM
to OpenLMIS Dev
After a whole sprint with the new solution, I can say that moving testing to UAT and scheduling once-an-hour deploys has helped me greatly. I am not slowed down or blocked by deploys anymore, so thank you very much for implementing this solution.