Creation of indices - SCI_SUBMITTED_ENTITY

30 views
Skip to first unread message

Pablo Maldonado

unread,
Jul 19, 2019, 12:18:01 PM7/19/19
to schedulix
Hello,

I wanted to know if generating an index on the table SCI_SUBMITTED_ENTITY field submit_ts could cause problems.
The reason is that by some controls I do the query takes 16 seconds and it runs every 1 minute on a table of case 1400000 rows, but if I put an index the query takes 0.59 seconds.

What I control through Nagios is if there are any batch canceled.

I did the test on a copy of the SCI_SUBMITTED_ENTITY table that calls it SCI_SUBMITTED_ENTITY_PABLO.

Step what probe:
-------------------------------------------------
SELECT sme.master_id,
       'CANCELADO' status,
       se.name,
       getFullPath(se.folder_id) fullPath
FROM   SCI_SUBMITTED_ENTITY_PABLO        sme,
       SCI_V_SCHEDULING_ENTITY     se,
       SCI_V_EXIT_STATE_DEFINITION esd,
       ( SELECT DISTINCT master_id
         FROM   SCI_SUBMITTED_ENTITY_PABLO sme2
         WHERE      sme2.commandline IS NOT NULL
                AND sme2.merge_mode = 'NOMERGE'
                AND sme2.state IN ('FINISHED','ERROR')
                AND sme2.submit_ts >= (NOW() - INTERVAL 24 HOUR)
       ) btfail
WHERE      sme.master_id = btfail.master_id
       AND se.id = sme.se_id
       AND sme.se_version >= se.valid_from
       AND sme.se_version <  se.valid_to
       AND esd.id = sme.final_esd_id
       AND sme.commandline IS NULL
       AND sme.final_ts IS NULL
       AND sme.merge_mode='FAILURE'
       AND sme.state='FINISHED'
       AND sme.submit_ts >= (NOW() - INTERVAL 24 HOUR)
       
CREATE TABLE SCI_SUBMITTED_ENTITY_PABLO AS SELECT * FROM SCI_SUBMITTED_ENTITY;
CREATE INDEX index_pablo ON SCI_SUBMITTED_ENTITY_PABLO(submit_ts);

Thank you.

Pablo Maldonado

Pablo Maldonado

unread,
Jul 19, 2019, 1:57:59 PM7/19/19
to schedulix
Hello,

Improve the time to 2 seconds by getting the master_id that are running.
Maybe if you can give me your advice I would appreciate it.

SELECT sme.master_id,
       'CANCELADO' status,
       se.name,
       getFullPath(se.folder_id) fullPath
FROM   SCI_SUBMITTED_ENTITY        sme,
       SCI_V_SCHEDULING_ENTITY     se,
       SCI_V_EXIT_STATE_DEFINITION esd,
       ( SELECT DISTINCT ms_m_id master_id 
         FROM   MASTER_STATE 
         WHERE      ms_m_state IN (11) 
                AND ms_m_final_ts IS NULL 
       ) btrun
WHERE      sme.master_id = btrun.master_id
       AND se.id = sme.se_id
       AND sme.se_version >= se.valid_from
       AND sme.se_version <  se.valid_to
       AND esd.id = sme.final_esd_id
       AND sme.commandline IS NULL
       AND sme.final_ts IS NULL
       AND sme.merge_mode='FAILURE'
       AND sme.state='FINISHED'

       
Pablo Maldonado       

Dieter Stubler

unread,
Jul 22, 2019, 6:03:19 AM7/22/19
to sche...@googlegroups.com
Hi Pablo,

since SCI_SUBMITTED_ENTITY is a view, there is no create index option.
The SUBMITTED_ENTITY table defines the timestamps as unix date time integers and the SCI View converts them to a RDBMS date type.
Since for submit_ts is the result of a conversion function, creating an index on the submit_ts of the undelying SUBMITTED_ENTITY will not help.

I recommend to add the following condition to your query:

sme.id = sme.master_id

Thus the RDBMS has to process master rows only.
This should speed up your query quite a lot.
Regards
Dieter

Pablo Maldonado

unread,
Jul 22, 2019, 10:33:40 AM7/22/19
to schedulix
Hello Dieter,

I spend what I did on Friday to solve my problem, but again I ask if you can give me your opinion and advice.

1 - The query what to look for is to inform that Batch canceled by means of nagios.

2 - To do this, implement the following query that runs every 1 minute.

   2a - Index creation :
        CREATE INDEX SME_TEST_IDX ON SUBMITTED_ENTITY (master_id,parent_id,state,final_ts); (improves the time from 13 sec to 0.59 sec)

   2b - Query :        
        SELECT sme.master_id,
               'CANCELADO' status,
               se.name,
               getFullPath(se.folder_id) fullPath
        FROM   SCI_SUBMITTED_ENTITY        sme,
               SCI_V_SCHEDULING_ENTITY     se,
               SCI_V_EXIT_STATE_DEFINITION esd
        WHERE      sme.master_id IN ( SELECT DISTINCT master_id
                                      FROM   SUBMITTED_ENTITY sme2
                                      WHERE      sme2.parent_id IS NOT NULL
                                             AND sme2.state IN (11,15)
                                             AND sme2.final_ts IS NULL
                                    )
               AND se.id = sme.se_id
               AND sme.se_version >= se.valid_from
               AND sme.se_version <  se.valid_to
               AND esd.id = sme.final_esd_id
               AND sme.parent_id IS NULL
               AND sme.final_ts IS NULL
               AND sme.state='FINISHED';

         Nota : 
         - getFullPath = own function that returns the fullpath of the batch.
         - The subquery in the IN gets the jobs (no batchs) that failed.
         - Query response time 1.25 sec, before 16 - 20 sec.        
         
Resultado : 
------------------

[ jobID :: 7466010        Status :: CANCELADO        JobName :: S13_XXX        FullPath :: ORACLE.TES.HOST.XXX.S13_XXX    ]


Pablo

Dieter Stubler

unread,
Jul 24, 2019, 6:28:41 AM7/24/19
to schedulix
Hi Pablo,

What should your query do ?
If you just want to get a list of cancelled masters you could just do (state 10 is CANCELLED):

SELECT id, se_id  from submitted_entity where master_id = id and state = 10

and optionally join that with SCI_C_SCHEDULING_ENTITY and SCI_SUBMITTED_ENTITY to get the name and path and other attributes you need.

Regards
Dieter

Pablo Maldonado

unread,
Jul 24, 2019, 10:53:21 AM7/24/19
to schedulix
Hi Dieter,

All the works in Schedulix are defined as follows (this is how I did it):

BATCH
  -> JOB1
  -> JOB2
  -> etc ...
  
And the only thing I want to report by Nagios are the lots that end up as canceled (not what job produced the cancellation), that's why the subquery looks for the canceled jobs and then from there the lots.
From what I saw, if I didn't see badly, if a job cancels the batch, it continues with state FINISHED, but while it is also running, that's why my crazy query.

Is what I tell you correct?

Thank you


Pablo

Dieter Stubler

unread,
Jul 29, 2019, 4:17:31 AM7/29/19
to schedulix
Sorry Pablo,

I do not understand what you mean with 'lots'.
What do you mean with 'if a jobs cancels the batch'.
As long a job does not do an alter job ... with cancel using the api, a job does not 'cancel' a batch.
Really do not understand what yourt setup looks like and which information should be collected using a query to the rdbms.

Regards
Dieter

Ronald Jeninga

unread,
Jul 29, 2019, 4:47:37 AM7/29/19
to schedulix
Hi Pablo,

somehow I have the feeling there's a conflict in terminology.

A job that runs and fails will return some exit code (usually one that is unequal zero).
This exit code is mapped (by the used exit state mapping) to some exit state.
This exit state is either a final or restartable state.

If this exit state is restartable, the job won't reach a final state and remains in state FINISHED.
If this exit state is defined to be a final state (this is done in the exit state profile), the job will reach the state FINAL.
This conversion from RUNNING -> FINISHED -> FINAL happens within a single transaction.
Hence you won't ever see a job that exited with an exit code that maps to a final exit state that is still in state FINISHED.

Jobs that are FINAL are immutable. Jobs that are FINISHED can be restarted, you can set their exit state manually (usually to some final state), or you can cancel them.
First after the operator action "Cancel", a job will end up in state CANCELLED.
A job in state CANCELLED is immutable too. This means that there's a fundamental difference between being in a FINISHED state compared to the CANCELLED state.

Thus there's no way that a job cancels some batch (with the exception of a job that executes a program that is designed to cancel batches and in fact automates the operator's intervention) in a natural way.

This again means that you use the word "(to) cancel" for something else than we do.

With the batch structure you've showed (only a master (BATCH) with n jobs as children), the query you need is probably very simple:

SELECT id
   FROM SUBMITTED_ENTITY
 WHERE id = master_id    /* masters only */
       AND cnt_finished > 0; /* with children in a FINISHED state */

And sure, you can join with the scheduling_entity table to retrieve the full name.
As an alternative, you can execute the query in sdmsh and use

SELECT id, se_id
   FROM SUBMITTED_ENTITY
 WHERE id = master_id    /* masters only */
       AND cnt_finished > 0  /* wiith children in a FINISHED state */
WITH se_id job;

Note: this simple approach doesn't work in deeper hierarchies!

I might have missed your point entirely, but in that case I simply don't understand your terminology.

Best regards,

Ronald
Reply all
Reply to author
Forward
0 new messages