ETT hosted locally locks up when multiple large documents are validated simultaneously

dan venton

Mar 11, 2019, 7:10:05 AM
to Edge Test Tool (ETT)
My QA says, "The Java service never recovers until a hard restart. With a 3 MB document, about 5 to 6 pseudo-concurrent hits are enough to kill it. A ~1.3 MB doc can also kill it rather easily with my load test. While not exactly scientific, I am trying, via trial and error, to find a document size it can sustain under concurrent load."

AFAIK, we are running default settings for almost everything. If there are particular config files you want to see, I can provide them.

Java 8 (update 151)
Tomcat 8.5.27
referenceccdaservice.war, Jan 2019 release

Dan Brown

Mar 12, 2019, 5:45:27 PM
To be clear, so we can diagnose and fix this issue: are we talking about a possible need to enhance the validator's multi-threaded support, or about isolated documents being tested sequentially by the same person from the same server? How is the pseudo-concurrent testing being done?
In either case, the issue appears to be related to large documents. Do you know how much RAM is allocated to your server?

Thanks,
Dan

Dan Brown

Mar 12, 2019, 5:47:11 PM
Also, I'd like to know whether your QA sees the same issues on the Feb 2019 release, as this is the current base:

dan venton

Mar 13, 2019, 3:40:38 PM
Our server has 24 GB of memory. The Java heap was capped at 4 GB (--JvmMx 4096). (We have since raised that value to 8192 to see if it helps.)
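One way to confirm that a heap setting such as --JvmMx (the Tomcat Windows service option for maximum heap, in MB) actually took effect is to ask the JVM itself. This is a minimal sketch, assuming it runs inside the same JVM as the service (for example from a hypothetical diagnostic page in the webapp); from outside the process, `jcmd <pid> VM.flags` reports MaxHeapSize. The class name `HeapCheck` is illustrative only.

```java
// Minimal sketch: report the maximum heap this JVM will use, to confirm that a
// heap setting such as --JvmMx 8192 actually took effect.
final class HeapCheck {
    // Max heap in MiB, or -1 if the JVM reports no limit (Long.MAX_VALUE).
    static long maxHeapMiB() {
        long max = Runtime.getRuntime().maxMemory();
        return max == Long.MAX_VALUE ? -1 : max / (1024L * 1024L);
    }

    public static void main(String[] args) {
        long mib = maxHeapMiB();
        // The reported value is often slightly below the configured maximum.
        System.out.println(mib < 0 ? "No heap limit reported" : "Max heap: " + mib + " MiB");
    }
}
```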

I've provided the Feb 2019 war file to them. We'll see what that does.

They are using multiple simultaneous submits.

Tedford Johnson

Mar 13, 2019, 5:42:31 PM
Using v1.0.39 of the WAR, we ran a 20-minute test with 12 callers, each sending 1.3 MB documents. During that time 48 requests were issued and only 2 succeeded, with an average call duration of 295 s. Tomcat used all 8 GB allocated to it and kept the CPU pegged.

Thanks,
Ted

Dan Brown

Mar 19, 2019, 11:45:44 AM
Thanks. Could you provide the Tomcat log file for the test described? Also, after the test completed, did the application crash, run out of memory, etc.?

Sundar

Mar 29, 2019, 10:03:38 AM
Hello Dan,
Sorry about the delay.  I reproduced the issue with my load test.  Details below. 


I have put up a zip of the logs folder, a zip of the entire apache-tomcat folder, and the 3 MB CDA document I send via my load test. Let me know if you cannot download these.

Details:
Server: Windows Server 2012 R2; RAM: 16 GB
Tomcat version: 8.5.27
Your WAR file version: 1.0.39
JvmMx 8192
Java version: java version "1.8.0_151"

Loadtest: 
- 12 virtual users, sending requests with a 3 MB CDA document. Over 14 minutes, 44 requests were sent (approx. 3 requests per minute).
- The first 10 requests passed (got a 200 response); after that, all requests started failing.
- The Tomcat service used all 8 GB and continued to peg the CPU at almost 100% even after the load test was stopped. I had to kill the Tomcat service and restart it to get it operational again.
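The load test above can be approximated with a small closed-loop harness. This is a minimal sketch, not the tool QA used: `LoadTest` and its `request` parameter are hypothetical, and `request` stands in for whatever code POSTs the CDA document to the validator and returns true on an HTTP 200.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.concurrent.atomic.AtomicLong;

// Minimal closed-loop load test: `users` worker threads share `totalRequests`
// calls. `request` is a hypothetical stand-in for "POST the CDA document and
// return true on HTTP 200"; any exception counts as a failure.
final class LoadTest {
    static final class Result {
        final int successes;
        final int failures;
        final double avgMillis;

        Result(int successes, int failures, double avgMillis) {
            this.successes = successes;
            this.failures = failures;
            this.avgMillis = avgMillis;
        }
    }

    static Result run(int users, int totalRequests, Callable<Boolean> request) {
        ExecutorService pool = Executors.newFixedThreadPool(users);
        AtomicInteger ok = new AtomicInteger();
        AtomicInteger failed = new AtomicInteger();
        AtomicLong totalNanos = new AtomicLong();

        for (int i = 0; i < totalRequests; i++) {
            pool.execute(() -> {
                long start = System.nanoTime();
                try {
                    if (Boolean.TRUE.equals(request.call())) {
                        ok.incrementAndGet();
                    } else {
                        failed.incrementAndGet(); // e.g. non-200 response
                    }
                } catch (Exception e) {
                    failed.incrementAndGet(); // e.g. timeout or connection reset
                } finally {
                    totalNanos.addAndGet(System.nanoTime() - start);
                }
            });
        }

        pool.shutdown(); // accept no new tasks; queued ones still run
        try {
            pool.awaitTermination(1, TimeUnit.HOURS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        int done = ok.get() + failed.get();
        double avg = done == 0 ? 0.0 : totalNanos.get() / 1e6 / done;
        return new Result(ok.get(), failed.get(), avg);
    }
}
```

For example, `LoadTest.run(12, 44, request)` approximates the 12-user, 44-request run, minus the pacing: this sketch sends the next request as soon as a worker is free, whereas the test in this thread averaged about 3 requests per minute.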

NOTE: If I use CDA documents under 1 MB, the Tomcat service does not lock up. So, per our trials, a few hits with larger documents are enough to lock up the Tomcat service.
Also: for the 44 requests I sent with the 3 MB CDA, I saw exactly 44 .tmp files, 3 MB each, under this path: ReferenceCcdaService\apache-tomcat-8.5.27\work\Catalina\localhost\referenceccdaservice
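To watch whether those files keep accumulating during a load test, a small scan of the work directory is enough. What writes the .tmp files is not established in this thread; one assumption is that they are request bodies spooled to disk by the container. `TmpFileAudit` is a hypothetical helper, and it lists only the top level of the directory, not subdirectories.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.stream.Stream;

// Minimal sketch: count *.tmp files (and their total size) in one directory,
// e.g. Tomcat's work directory for the referenceccdaservice webapp.
// Only the top level is scanned; subdirectories are ignored.
final class TmpFileAudit {
    static final class Summary {
        final long count;
        final long totalBytes;

        Summary(long count, long totalBytes) {
            this.count = count;
            this.totalBytes = totalBytes;
        }
    }

    static Summary scan(Path dir) throws IOException {
        long count = 0;
        long bytes = 0;
        // Files.list holds an open directory handle, so close the stream.
        try (Stream<Path> entries = Files.list(dir)) {
            for (Path p : (Iterable<Path>) entries::iterator) {
                if (Files.isRegularFile(p) && p.getFileName().toString().endsWith(".tmp")) {
                    count++;
                    bytes += Files.size(p);
                }
            }
        }
        return new Summary(count, bytes);
    }

    // Usage: java TmpFileAudit <directory>
    public static void main(String[] args) throws IOException {
        Summary s = scan(Paths.get(args[0]));
        System.out.println(s.count + " .tmp files, " + s.totalBytes + " bytes total");
    }
}
```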


Let us know what you find. This has become a high-priority issue for us. As a poor man's workaround, we are rejecting requests larger than 1 MB to keep the Tomcat service alive.
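A minimal sketch of the 1 MB workaround, assuming the check runs before the body reaches the validator (for example in a front-end proxy or a servlet filter; `SizeGate` is a hypothetical name, not part of referenceccdaservice). It rejects early when a Content-Length is declared and otherwise counts bytes while reading, so an upload without a declared length cannot slip past the limit.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;

// Minimal sketch of a request-size gate; the 1 MB limit mirrors the workaround
// described in this thread. Class and method names are hypothetical.
final class SizeGate {
    static final long ONE_MB = 1024L * 1024L;

    static final class TooLargeException extends IOException {
        TooLargeException(long limit) {
            super("Request body exceeds " + limit + " bytes");
        }
    }

    // Cheap early check on the declared Content-Length (-1 means unknown).
    static boolean declaredSizeAllowed(long contentLength, long limitBytes) {
        return contentLength < 0 || contentLength <= limitBytes;
    }

    // Reads the body while counting bytes and stops as soon as the limit is exceeded.
    static byte[] readCapped(InputStream in, long limitBytes) throws IOException {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buf = new byte[8192];
        long total = 0;
        int n;
        while ((n = in.read(buf)) != -1) {
            total += n;
            if (total > limitBytes) {
                throw new TooLargeException(limitBytes);
            }
            out.write(buf, 0, n);
        }
        return out.toByteArray();
    }
}
```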
Thank you

Sundar

Apr 8, 2019, 4:23:16 PM
Hello Dan,
Any update on this thread for us?

Even after filtering out larger requests (CDA documents > 1 MB), we still see the service falling over in production. It won't recover until we restart Tomcat. Now that we have eliminated large CDA documents, our speculation is that some data in the CDA documents is causing it to fall over.

Dan Brown

Apr 9, 2019, 10:10:29 AM
Is there somewhere else you can host the data? I can't access it.

Sundar

Apr 9, 2019, 2:08:57 PM
1. I have attached the logs (zipped) to this reply. 
2. I have also zipped up our entire apache-tomcat folder and put it here (you should be able to download). https://drive.google.com/open?id=1pgcGSTXimsjVFopThmnU781P8Pz7TtXe

Please let us know if you need anything more from us, or if you need us to try something. Also, is there any way we can get on a call with you?

Thanks again.
Sundar.
logs.zip
CDADoc3mb.zip

Dan Brown

Apr 16, 2019, 11:48:55 AM
Hi,

I had a short window to look into this.

In the logs there are multiple issues going on:
-Issue loading vocab directory on startup (not sure of the cause, it's not familiar to me; I will look at the Tomcat dir provided when I have time)
--08:42:10,517 ERROR [VocabularyLoadRunner:107] Failed to load 
-Every call seems to provide an invalid objective
--That means vocab is not run, so there is no reason to have loaded it
-Issues parsing the specific file
--org.eclipse.emf.ecore.xml.type.impl.XMLTypeDocumentRootImpl cannot be cast to org.eclipse.mdht.uml.cda.DocumentRoot
at org.eclipse.mdht.uml.cda.util.CDAUtil.load(CDAUtil.java:273)
-It is likely there are issues with concurrency at this extreme

At the most basic level, I think we need to improve both the handling of large documents and (most importantly in this case) concurrency handling in the application to meet the demand you have specified. A fix for this would require a large overhaul of the codebase. This would be a focused effort, so I can't be certain when it will be completed. We are looking into it for the near future, though, and are always interested in improving the tool.

On a more specific level, I tested CDADoc3mb.xml locally and it ran without issue against both IG_Only and IG_with vocab (although the latter takes ~5 mins).

Curiously, though, the first test in your log encounters a service error. This would seemingly be unrelated to concurrency. Was the first file you tested CDADoc3mb.xml? Do you always get a service error in the log for the file CDADoc3mb.xml, even if it's the first and only file tested and the only test run? If so, that is some sort of local issue. Otherwise, it is related to the concurrency.

Thanks,
Dan

Sundar

Apr 18, 2019, 9:50:53 AM
Hello Dan,
Please see my answers in-line below. (Also, when you review the Tomcat dir via the Google Drive link in my earlier post, please let us know if you find anything wrong; that would help improve our situation.)

Thanks again.


On Tuesday, April 16, 2019 at 11:48:55 AM UTC-4, Dan Brown wrote:
Hi,

I had a short window to look into this.

In the logs there are multiple issues going on:
-Issue loading vocab directory on startup (not sure of the cause, it's not familiar to me; I will look at the Tomcat dir provided when I have time)
--08:42:10,517 ERROR [VocabularyLoadRunner:107] Failed to load 
-Every call seems to provide an invalid objective
--That means vocab is not run, so there is no reason to have loaded it
-Issues parsing the specific file
--org.eclipse.emf.ecore.xml.type.impl.XMLTypeDocumentRootImpl cannot be cast to org.eclipse.mdht.uml.cda.DocumentRoot
at org.eclipse.mdht.uml.cda.util.CDAUtil.load(CDAUtil.java:273)
-It is likely there are issues with concurrency at this extreme

At the most basic level, I think we need to improve both the handling of large documents and (most importantly in this case) concurrency handling in the application to meet the demand you have specified. A fix for this would require a large overhaul of the codebase. This would be a focused effort, so I can't be certain when it will be completed. We are looking into it for the near future, though, and are always interested in improving the tool.
[Sundar] Yes, even after preventing larger documents from reaching the service, the service still falls over at our production load levels. So we too believe there is a concurrency issue.
Question: Can you confirm whether the referenceccdaservice.yyyy-mm-dd.log file can contain any PHI (patient health information such as patient demographics, IDs, etc.)? I want to get you this log file from our production environment, but I don't know whether it could contain PHI.
Also, in the interim, do you have any suggestions for minimizing this concurrency issue? Maybe scale out to many Tomcat validation service instances (as Docker containers) or something?
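On the interim question, one common pattern is to cap how many validations run at once in each instance and turn extra callers away quickly instead of letting them pile up on the heap. The sketch below assumes that pattern; `ValidationLimiter` is a hypothetical helper, not something the service provides, and the limit and wait time would have to be tuned by load testing.

```java
import java.util.Optional;
import java.util.concurrent.Callable;
import java.util.concurrent.Semaphore;
import java.util.concurrent.TimeUnit;

// Minimal sketch: allow at most `maxConcurrent` validations at once; other
// callers wait up to `waitMillis` and are then turned away (e.g. HTTP 503).
final class ValidationLimiter {
    private final Semaphore permits;
    private final long waitMillis;

    ValidationLimiter(int maxConcurrent, long waitMillis) {
        this.permits = new Semaphore(maxConcurrent, true); // fair: waiting callers served FIFO
        this.waitMillis = waitMillis;
    }

    // Runs the validation if a permit frees up in time; otherwise returns
    // Optional.empty() so the caller can answer "busy, retry later".
    // The validation must return a non-null result.
    <T> Optional<T> tryRun(Callable<T> validation) throws Exception {
        if (!permits.tryAcquire(waitMillis, TimeUnit.MILLISECONDS)) {
            return Optional.empty();
        }
        try {
            return Optional.of(validation.call());
        } finally {
            permits.release(); // always return the permit, even if validation throws
        }
    }

    int availablePermits() {
        return permits.availablePermits();
    }
}
```

Combined with several instances behind a load balancer, as suggested above, a small per-instance limit is one way to keep a burst of large documents from exhausting a single heap.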

On a more specific level, I tested CDADoc3mb.xml locally and it ran without issue against both IG_Only and IG_with vocab (although the latter takes ~5 mins).
[Sundar] Yes, if it's the only request and the calling client waits long enough, then it works. 

Curiously, though, the first test in your log encounters a service error. This would seemingly be unrelated to concurrency. Was the first file you tested CDADoc3mb.xml?
[Sundar]: I did some more testing and observed the referenceccdaservice log. We have a health check feature that runs every 30 seconds and sends a request to the Tomcat validator service; the payload there is a tiny XML stub, not a CDA document. The error "The service has encountered an error parsing the document." that you see repeating every 30 secs comes from those health check requests.
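If the health-check stub is what triggers the parse errors (the thread suggests this for the repeating service error but does not confirm it for the ClassCastException), a cheap pre-check can tell a CDA document apart from other XML before the full parser runs. This sketch assumes CDA R2 input, whose root element is `ClinicalDocument` in the `urn:hl7-org:v3` namespace; `CdaRootCheck` is a hypothetical name. It reads only up to the first start element, so the cost stays small even for a 3 MB file.

```java
import java.io.InputStream;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

// Minimal sketch: true only if the root element is <ClinicalDocument> in the
// HL7 v3 namespace. Stops at the first start element; does not validate.
final class CdaRootCheck {
    static final String HL7_V3_NS = "urn:hl7-org:v3";

    static boolean looksLikeCda(InputStream in) {
        XMLInputFactory factory = XMLInputFactory.newInstance();
        // Payloads come from callers, so disable DTDs and external entities.
        factory.setProperty(XMLInputFactory.SUPPORT_DTD, false);
        factory.setProperty(XMLInputFactory.IS_SUPPORTING_EXTERNAL_ENTITIES, false);
        XMLStreamReader reader = null;
        try {
            reader = factory.createXMLStreamReader(in);
            while (reader.hasNext()) {
                if (reader.next() == XMLStreamConstants.START_ELEMENT) {
                    return "ClinicalDocument".equals(reader.getLocalName())
                            && HL7_V3_NS.equals(reader.getNamespaceURI());
                }
            }
            return false; // no elements at all
        } catch (XMLStreamException e) {
            return false; // not well-formed XML
        } finally {
            if (reader != null) {
                try {
                    reader.close();
                } catch (XMLStreamException ignored) {
                    // nothing useful to do here
                }
            }
        }
    }
}
```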


Do you always get a service error in the log for the file CDADoc3mb.xml, even if it's the first and only file tested and the only test run? If so, that is some sort of local issue. Otherwise, it is related to the concurrency.
[Sundar] For test purposes, we turned off that health check, then sent a request with that 3 MB CDA. While I got a successful response back after about 5 minutes, the error reported in the log file (repeated 15 times) was this:
"14:29:09,636 INFO  [VocabularyValidationService:101] Property referenceccda.isDynamicVocab is false; using preloaded default config for this and all future validations.
14:29:23,246 ERROR [CodeSystemCodeValidator:81] The following error was encountered when trying to check codeRepository.codeIsActive(...). It will be handled internally and considered inactive as the source is likely corrupt since it is returning multiple values.javax.persistence.NonUniqueResultException: result returns more than one elements
at org.hibernate.jpa.internal.QueryImpl.getSingleResult(QueryImpl.java:539)"

I have attached a snippet of the log file with this error.

Thanks
errorswhen3mbCDA.txt