jobs not kicking off


azhrei

May 3, 2019, 4:53:50 PM
to IRIS, Caché, Ensemble
Hi,

We are running Caché 2015.2 on VMS 8.4, and we cannot find the cause of the following. Once a month (usually mid-month), right after midnight, we have a window of time, usually about 2 hours, sometimes less, sometimes more, during which we are unable to "start background job". Has anyone seen this, and if so, where? Our vendor has not been able to come up with a solution or work out where the problem lies, and it only started in November.

Also, there is a time during that Saturday when ODBC jobs cannot connect either.

We are pegged CPU-wise at 100% across 20 CPUs from midnight until about 7am.

Any thoughts would be appreciated.


thanks 
Paul 

Bharathsimha reddy

May 3, 2019, 5:31:43 PM
to intersystems...@googlegroups.com
Hey Paul,
Do you happen to have any dates that the issue specifically occurred ?

Do you know the list of daily midnight jobs that run on your application ?

Is there any different job on the days when you happen to see issues ?

Is your disk space good ?

Journal DB disk space ?

Any shadow server scheduled ?

Do you have any disk mount/unmount activities or backups scheduled ?

Did you try capturing the process list when your system hits 100% ?


Some of the questions may not be valid but help rule out some root causes.

Thanks,
Bharath.


--
--
IRIS, Caché, Ensemble
---
You received this message because you are subscribed to the Google Groups "IRIS, Caché, Ensemble" group.
To unsubscribe from this group and stop receiving emails from it, send an email to intersystems-publi...@googlegroups.com.
To post to this group, send email to intersystems...@googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

azhrei

May 9, 2019, 1:15:44 PM
to IRIS, Caché, Ensemble
see below as to the answers...


On Friday, May 3, 2019 at 5:31:43 PM UTC-4, Simha wrote:
Hey Paul,
Do you happen to have any dates that the issue specifically occurred ?  It happens Friday night, from midnight into Saturday morning, and can last up to 2 hours, about once a month, mid-month.

Do you know the list of daily midnight jobs that run on your application ?  Yes, our night jobs, which don't change.

Is there any different job on the days when you happen to see issues ?  Nope.

Is your disk space good ? Yes.

Journal DB disk space ? Also good.

Any shadow server scheduled ? No, we are using mirroring: 2 asyncs, a DR and reporting.

Do you have any disk mount/unmount activities or backups scheduled ?  Backups run on cloned disks from a freeze and thaw in the 6pm hour.

Did you try capturing the process list when your system hits 100% ? Yes, nothing is jumping out at us; we are at 100% every night from midnight to 6am.

 

Some of the questions may not be valid but help rule out some root causes.

Thanks,
Bharath.



Jim

May 17, 2019, 3:44:16 AM
to IRIS, Caché, Ensemble


On Friday, May 3, 2019 at 4:53:50 PM UTC-4, azhrei wrote:

we are unable to "start background job" 


Any entries recorded in Cache's CCONSOLE.LOG during this time?

Do you know that the jobs never start? Or might they start and 
then abort? If the latter then you'll likely have VMS accounting 
entries that will have recorded the exit status (which may be 
informative).
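
One way to check is to pull out only the log entries stamped inside the problem window. Here is a minimal Python sketch, assuming each cconsole.log line starts with a timestamp of the form MM/DD/YY-HH:MM:SS (please verify that against your own log). The function name and the sample lines are made up for illustration and do not come from a real system.

```python
from datetime import datetime

def entries_in_window(lines, start, end):
    """Yield log lines whose leading timestamp falls in [start, end]."""
    for line in lines:
        stamp = line[:17]  # assumed layout: MM/DD/YY-HH:MM:SS
        try:
            when = datetime.strptime(stamp, "%m/%d/%y-%H:%M:%S")
        except ValueError:
            continue  # continuation line or unexpected format, skip it
        if start <= when <= end:
            yield line.rstrip("\n")

# Hypothetical sample lines, not taken from a real system.
sample = [
    "05/17/19-23:59:58:120 (1234) 0 example entry before the window",
    "05/18/19-00:05:10:450 (1235) 1 example entry inside the window",
    "05/18/19-02:30:00:000 (1236) 0 example entry after the window",
]
window_start = datetime(2019, 5, 18, 0, 0, 0)
window_end = datetime(2019, 5, 18, 2, 0, 0)
hits = list(entries_in_window(sample, window_start, window_end))
for hit in hits:
    print(hit)
```

Against a real file you would pass the open file object instead of the sample list, since the function only needs an iterable of lines.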

OldMster

May 17, 2019, 3:44:16 AM
to IRIS, Caché, Ensemble
If ODBC jobs can't connect either, then the machine doesn't have enough resources to start a new process.  Starting a process on VMS is very 'expensive', and if the machine is pegged at 100% on all CPUs, then it probably can't get a new process created before the connection/Caché job creation attempt times out.  Find out what is running at that time and taking all the CPU, and you'll find your problem.
Mark
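
To test the theory that process creation is too slow during the window, you could time how long a trivial process takes to start and exit, and run that probe on a schedule through the night. A rough Python sketch follows; it assumes a Python interpreter is available on the host (the same idea could be scripted in DCL), and the function name and the 10-second default timeout are illustrative choices, not anything Caché uses.

```python
import subprocess
import sys
import time

def time_process_start(cmd, timeout=10.0):
    """Return seconds taken to start and finish cmd, or None on timeout."""
    began = time.monotonic()
    try:
        subprocess.run(cmd, timeout=timeout, check=False,
                       stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL)
    except subprocess.TimeoutExpired:
        return None
    return time.monotonic() - began

# A trivial child: the Python interpreter itself running an empty program.
probe = [sys.executable, "-c", "pass"]
elapsed = time_process_start(probe)
stamp = time.strftime("%Y-%m-%d %H:%M:%S")
if elapsed is None:
    print(f"{stamp} process did not start and finish within the timeout")
else:
    print(f"{stamp} process start + exit took {elapsed:.3f} s")
```

If the logged times climb sharply after midnight on the bad nights, and not on the good ones, that would support the resource-starvation explanation.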



Brendan Bannon

May 17, 2019, 7:05:51 AM
to intersystems...@googlegroups.com

To see if this is what is happening, you could turn on ODBC logging; there is a check box on the DSN config page.

Each request from the client to the server is identified by 2 capital letters (such as the HS and CN visible in the Sent lines below) and is made up of a send and receive pair.

Here is what the connection looks like from my system:


>> Sent: (11:32:00:040)
  0000:  02 00 00 00 01 00 00 00 00 00 00 00 48 53            ............HS
  000E:  39 00                                                9.
<< Received: (11:32:00:091)
  0000:  0D 00 00 00 01 00 00 00 00 00 00 00 00 00            ..............
<< Received: (11:32:00:091)
  0000:  35 00 01 00 09 01 55 6E 69 63 6F 64 65               5.....Unicode
>> Sent: (11:32:00:091)
  0000:  61 00 00 00 02 00 00 00 00 00 00 00 43 4E            a...........CN
  000E:  05 01 57 52 43 09 01 39 F8 B8 99 D8 5B 79            ..WRC..9....[y
  001C:  09 01 B2 F3 33 33 B9 79 7A 09 01 62 72 65            ....33.yz..bre
  002A:  6E 64 61 6E 10 01 55 53 50 35 35 32 30 42            ndan..USP5520B
  0038:  52 45 4E 44 41 4E 0C 01 57 69 6E 53 51 4C            RENDAN..WinSQL
  0046:  2E 65 78 65 0E 01 06 00 00 00 02 00 00 00            .exe..........
  0054:  F0 23 00 00 0F 01 31 37 32 2E 31 36 2E 36            .#....XXX.XX.X
  0062:  32 2E 32 34 32 03 01 00 03 04 01 02 04               X.XXX........
<< Received: (11:32:00:123)
  0000:  4C 00 00 00 02 00 00 00 00 00 00 00 00 00            L.............
<< Received: (11:32:00:123)
  0000:  3B 01 43 61 63 68 65 20 4F 62 6A 65 63 74            ;.Cache Object
  000E:  73 20 56 65 72 73 69 6F 6E 20 32 30 31 37            s Version 2017
  001C:  2E 32 2E 31 2E 38 30 31 20 43 61 63 68 65            .2.1.801 Cache
  002A:  20 78 44 42 43 20 53 65 72 76 65 72 20 38             xDBC Server 8
  0038:  2E 30 31 03 04 01 03 04 01 03 04 23 06 01            .01........#..
  0046:  34 34 37 36 02 04                                    4476..

 

Check the timestamps.  You have 10 seconds to make a connection before the client fails.
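
If the log is long, checking the timestamps by eye gets tedious. The following Python sketch pairs each "Sent" line with the first "Received" line after it and reports the delay, flagging anything over 10 seconds. It assumes only the `>> Sent: (HH:MM:SS:mmm)` and `<< Received: (HH:MM:SS:mmm)` line format shown in the ODBC log above; the function name is made up, and the sketch does not handle a request that straddles midnight.

```python
import re
from datetime import timedelta

STAMP = re.compile(r"^(>> Sent|<< Received): \((\d+):(\d+):(\d+):(\d+)\)")

def response_delays(lines):
    """Return (sent_line, seconds) for each Sent followed by a Received."""
    delays = []
    pending = None  # (raw line, time) of the most recent unanswered Sent
    for line in lines:
        m = STAMP.match(line.strip())
        if not m:
            continue  # hex dump line or anything else
        h, mnt, s, ms = (int(x) for x in m.groups()[1:])
        t = timedelta(hours=h, minutes=mnt, seconds=s, milliseconds=ms)
        if m.group(1) == ">> Sent":
            pending = (line.strip(), t)
        elif pending is not None:
            delays.append((pending[0], (t - pending[1]).total_seconds()))
            pending = None  # only time the first reply to each request
    return delays

# Timestamp lines copied from the log excerpt above.
log = """>> Sent: (11:32:00:040)
<< Received: (11:32:00:091)
<< Received: (11:32:00:091)
>> Sent: (11:32:00:091)
<< Received: (11:32:00:123)
<< Received: (11:32:00:123)
""".splitlines()

delays = response_delays(log)
for stamp, secs in delays:
    flag = "  <-- over 10 s" if secs > 10 else ""
    print(f"{stamp} answered after {secs:.3f} s{flag}")
```

A "Sent" line with no matching "Received" never shows up in the result, so a request that got no reply at all has to be spotted separately, for example by comparing the count of Sent lines with the number of delays returned.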


azhrei

May 17, 2019, 10:42:15 AM
to IRIS, Caché, Ensemble

I'll try to answer all the questions...

There is nothing in the cconsole.log.

We get a handful of errors stating “cannot start background job” in the application, so the job never starts, hence nothing in the accountng.log...

ODBC and Samba connections fail as well, but again not all of them.

I'll look at the ODBC logging, though I'm not sure how that will help in showing the issue on the system.

I have tried capturing packets during this time frame; again, nothing shows any issues.

When I have logged into the system after the midnight hour, the initial login response is always slow.

thanks
pc
