Need some clarifications

Soumen Sarkar

Feb 27, 2015, 4:36:53 AM

to sche...@googlegroups.com

Hello Group,

Good Day!!

I have the resource hierarchy described below, and thanks to all your earlier help I was able to set it up :-) But I am still a little confused.

I have three job servers, SERVER1, SERVER2 and SERVER3, each residing on a different physical machine/host. I have a level called DATAWAREHOUSE, to which I have assigned a named resource called NUM_PROCS (Amount=20). That means that at any point in time it can serve at most 20 processes when jobs are submitted. Further down the hierarchy are the three servers mentioned above. Going one step further, I have also assigned one named resource to each server: NUM_PROCS1 (Amount=10) on SERVER1, NUM_PROCS2 (Amount=5) on SERVER2 and NUM_PROCS3 (Amount=5) on SERVER3. In other words, of the 20 processes spawned from the root level DATAWAREHOUSE, SERVER1 can serve only 10, and SERVER2 and SERVER3 can serve 5 each.

1. My job is a bash script that creates and submits jobs dynamically (using sdms-submit and an alias, with static unchecked). Since I have three job server agents on three different machines and my scheduling server is installed on yet another machine, on which machine should the .sh script physically be kept: on the scheduling server machine, on all job server agent machines, or on both?

2. Now I am thinking one step further, about high availability. Let's say my job has been submitted and SERVER1 has taken 10 requests and is processing them. If SERVER1 itself goes down, or the physical host running SERVER1 goes down, all 10 running processes will fail. If I now rerun the 10 failed processes/jobs from the Monitor Batches and Jobs window, will they be handled by SERVER2 or SERVER3 once those are free from their own set of tasks?

Let's say SERVER1 is unavailable for a long time due to some issue. I don't want any of my subsequently scheduled jobs to fail; everything should be handled by SERVER2 and SERVER3 until SERVER1 is up again. In that case, do I need to deregister SERVER1 explicitly, fix it and then register it again, or does schedulix have some other mechanism for automatic failover?

3. Our present scheduling system is a single installation: we can connect to prod/dev from a single window, but the repositories for the production and development environments are different. I believe this is also doable in schedulix, since it is all about environments, but I am not sure how to maintain different repositories (databases) with a single installation. Can you please give me some ideas on that?

Thanks for your help..

Thanks

Soumen S

Soumen Sarkar

Feb 27, 2015, 4:40:41 AM

to sche...@googlegroups.com

Oops, typo. That should read: among them SERVER1 can serve only 10, and SERVER2 and SERVER3 can serve 5 each.

Dieter Stubler

Feb 27, 2015, 5:22:10 AM

to sche...@googlegroups.com

Hi Soumen,

As I understand it, you want one resource to limit the total number of processes in your data warehouse and additional resources to limit the number of processes on each of the servers [1-3].
To do that, you should rename your named resource NUM_PROCS to something like NUM_PROCS_TOTAL to make that clear (if you rename the named resource, all references will still be OK).
Then you should create only ONE named resource for the local limits. Let's call it NUM_PROCS_LOCAL.
Now you create resources for NUM_PROCS_LOCAL with amounts 10, 5 and 5 on your three servers.
Best practice is now to create a Footprint with two resource requirements of amount 1, one for NUM_PROCS_TOTAL and one for NUM_PROCS_LOCAL.
Let's call it DW_PROCESS.
If you assign the DW_PROCESS Footprint to a job, this job will require one NUM_PROCS_LOCAL and one NUM_PROCS_TOTAL to run.
Since both resources are available on all three servers, this does not limit the selection of the executing jobserver.
If you use an environment that allows execution on any of these servers, schedulix does the load balancing for you.
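
To make the effect of the two limits concrete, here is a small toy model in Python. It is only a sketch and not schedulix code: the amounts come from this thread, while the function name place_jobs and the "most free slots first" balancing rule are assumptions made for illustration.

```python
# Toy model of the DW_PROCESS footprint; not schedulix code.
TOTAL_LIMIT = 20  # NUM_PROCS_TOTAL on DATAWAREHOUSE
LOCAL_LIMITS = {"SERVER1": 10, "SERVER2": 5, "SERVER3": 5}  # NUM_PROCS_LOCAL

# The footprint requires one unit of each named resource per job.
DW_PROCESS = {"NUM_PROCS_TOTAL": 1, "NUM_PROCS_LOCAL": 1}


def place_jobs(job_count):
    """Place jobs greedily; return (running jobs per server, waiting jobs)."""
    running = {server: 0 for server in LOCAL_LIMITS}
    total_used = 0
    waiting = 0
    for _ in range(job_count):
        # Servers that still have a free local slot for this footprint
        free = [s for s in LOCAL_LIMITS
                if running[s] + DW_PROCESS["NUM_PROCS_LOCAL"] <= LOCAL_LIMITS[s]]
        total_ok = total_used + DW_PROCESS["NUM_PROCS_TOTAL"] <= TOTAL_LIMIT
        if not free or not total_ok:
            waiting += 1  # the job has to wait until resources are released
            continue
        # Hypothetical balancing rule: prefer the server with most free slots
        target = max(free, key=lambda s: LOCAL_LIMITS[s] - running[s])
        running[target] += DW_PROCESS["NUM_PROCS_LOCAL"]
        total_used += DW_PROCESS["NUM_PROCS_TOTAL"]
    return running, waiting
```

With 25 jobs submitted at once, this model ends up with 10, 5 and 5 jobs running on the three servers and 5 jobs waiting, because both the local and the total limit are exhausted at 20.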

your topic 1:
schedulix does not provide automatic distribution of your software to agent machines.
So everything necessary to run your command line has to be available on the machine where it should be executed.

your topic 2:
When set up as described above, your jobs will be distributed between your three servers automatically.
When a server crashes, the system still treats the jobs on that server as running, because schedulix doesn't know anything about the
state of this server (maybe just the network is down). In this case you use the deregister operation on the crashed server.
As a result, no more jobs will be assigned to this server and, very importantly, all 'running' jobs on that server are put into state BROKEN_FINISHED.
If you now restart them, they will be executed by one of the remaining servers.
When the crashed server comes up again, the jobserver will automatically register itself again on connect.
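
The recovery sequence above can be pictured with a small Python toy model. This is only a sketch of the bookkeeping, not the schedulix API: the job names, the dictionaries and the rule for picking a remaining server are assumptions; only the state name BROKEN_FINISHED is taken from the description above.

```python
# Toy model of deregister and restart; not the schedulix API.
registered = {"SERVER1", "SERVER2", "SERVER3"}

# job name -> (jobserver, state); the job names are hypothetical
jobs = {
    "JOB_A": ("SERVER1", "RUNNING"),
    "JOB_B": ("SERVER2", "RUNNING"),
}


def deregister(server):
    """Stop assigning jobs to the server and mark its running jobs as broken."""
    registered.discard(server)
    for name, (host, state) in list(jobs.items()):
        if host == server and state == "RUNNING":
            jobs[name] = (host, "BROKEN_FINISHED")


def restart(name):
    """Restart a broken job on one of the servers that are still registered."""
    host, state = jobs[name]
    if state != "BROKEN_FINISHED":
        raise ValueError(f"{name} is in state {state}, nothing to restart")
    if not registered:
        raise RuntimeError("no registered jobserver left")
    # Hypothetical choice: take the first remaining server by name
    target = sorted(registered)[0]
    jobs[name] = (target, "RUNNING")
```

Calling deregister("SERVER1") followed by restart("JOB_A") moves JOB_A to one of the remaining servers in this model, while JOB_B on SERVER2 is untouched.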

your topic 3:
One schedulix server uses exactly one repository, in which it stores all of its data.
The web interface allows you to define more than one server connection.
You have to use the Zope management interface http://yourmachine:yourport/Custom/manage and edit the SDMSServers Python script to add additional servers.

Example:

#
# define all accessible SDMS Servers here
#
return {
    'DEFAULT' : {
        'HOST'    : 'devserver',
        'PORT'    : '2506',
        'VERSION' : 'BASIC',
        'CACHE'   : 'Y'
    },
    'PRODUCTION' : {
        'HOST'    : 'prodserver',
        'PORT'    : '2506',
        'VERSION' : 'BASIC'
    }
}

Now users can use the 'schedulix!web Users' dialog to define connections to those servers.
On the Main Desktop you can then select the connection that new windows should work on, using the connection option field in the upper right corner of the Main Desktop window.

Hope that helped.
Feel free to ask further questions.

Regards
Dieter

Ronald Jeninga

Feb 27, 2015, 5:27:21 AM

to sche...@googlegroups.com

Hi Soumen,

well, let me try....

First of all, the NUM_PROCS resource on DATAWAREHOUSE (I'll abbreviate this to DWH) level is useless. To explain this: during Resource Scheduling it is first determined which jobservers are candidates for the execution of the job.
To do this, all visible resources are "collected", starting from the leaf node of the tree (that is, the jobserver). If a resource is found, that is the resource that will be used. Any instances on higher levels are ignored.
Since in your case all jobservers have their own resources defined, the resource on DWH level doesn't play a role. I think that, for the sake of clarity, it would be best to delete the resource.
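
The lookup rule (start at the jobserver, walk up, take the first instance found) can be sketched in Python as follows. This is an illustration and not schedulix code: the data layout and the function name visible_resource are assumptions, and NUM_PROCS is deliberately defined on two levels here just to show how the nearer instance hides the one above it.

```python
# Hypothetical scope tree: child -> parent (None marks the root)
PARENT = {"SERVER1": "DWH", "SERVER2": "DWH", "SERVER3": "DWH", "DWH": None}

# Resource instances per scope: {scope: {named resource: amount}}
RESOURCES = {
    "DWH": {"NUM_PROCS": 20},
    "SERVER1": {"NUM_PROCS": 10},  # hides the DWH instance for SERVER1
    "SERVER2": {},
    "SERVER3": {},
}


def visible_resource(jobserver, name):
    """Return (scope, amount) of the nearest instance of name, or None."""
    scope = jobserver
    while scope is not None:
        amount = RESOURCES.get(scope, {}).get(name)
        if amount is not None:
            return scope, amount  # first hit wins, higher levels are ignored
        scope = PARENT[scope]
    return None
```

In this model SERVER1 sees its own NUM_PROCS instance with amount 10, SERVER2 sees the DWH instance with amount 20, and a named resource that is defined nowhere yields None, which would rule the jobserver out as a candidate.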

1. To rephrase the situation: you have some job which executes a shell script that dynamically submits child jobs. If you submit this job, it will be executed, if possible, on a jobserver offering all required resources. The scheduling system is not aware of installed software. Hence the script you want to execute must be found by the jobserver executing that job. Which one that is, I don't know. It depends on the resource requirements. If the job doesn't do anything other than submit child jobs, you could even write the script as the run program.

You might want to have a look at the run program of SYSTEM.'EXAMPLES'.'E0275_PROGRESS'.'PROGRESS_1'. You don't need to understand fully what we are doing there, it's more an example of how to write small shell scripts within the run program. In simple cases that is easier than creating a shell script in the file system, especially if more than one jobserver must be able to execute it.

You could see the scheduling system as a very fast and thorough working employee. It has a list of job definitions to execute. Depending on the resource requirements it looks for a suitable run time environment. Then it "types in" the provided commandline and watches what happens. Depending on the result, it proceeds with the next job definition from the list, or it stops.

2. As always, it depends. :-) But seriously: after rerunning a job, it will pass through the resource scheduler again. If your other scopes are capable of executing the job (they offer all required resources), it will be written onto their lists of jobs to execute. A job can be present on more than one list at the same time. The first jobserver that issues a "GET NEXT JOB" will get it, and the job will be removed from all the lists.

But IF you have set the KEEP option of one or more resources, the jobservers selected for execution must be able to "see" this resource. This is why I said: it depends.
(Another possibility is that you are working with the STICKY flag).
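
The list handling described for a rerun job can be pictured with a few lines of Python. This is a toy model under stated assumptions, not the server's implementation: the class ToyScheduler and its method names are invented for illustration, and KEEP and STICKY are not modelled.

```python
class ToyScheduler:
    """Keeps one list of runnable jobs per jobserver (illustration only)."""

    def __init__(self, servers):
        self.lists = {server: [] for server in servers}

    def schedule(self, job, capable_servers):
        # The job is written onto the list of every server that could run it
        for server in capable_servers:
            self.lists[server].append(job)

    def get_next_job(self, server):
        # The first server asking gets the job ...
        if not self.lists[server]:
            return None
        job = self.lists[server][0]
        # ... and the job is removed from all lists, including the other servers'
        for entries in self.lists.values():
            if job in entries:
                entries.remove(job)
        return job
```

If a job is scheduled for both SERVER2 and SERVER3 and SERVER3 asks first, SERVER3 receives it and a later request from SERVER2 returns nothing.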

If SERVER1 is unavailable for a long time, the system will continue to work. Logically it doesn't make a difference whether it is registered or not. There is a small performance impact, though, if the server remains registered.
Since all jobs that could be executed by SERVER1 are written onto its list and removed from the list as soon as some jobserver issues the "GET NEXT JOB", there will be write overhead with no value.
Hence, if you know the server won't be started for a long time, it is a good idea to deregister it. As soon as you start the jobserver, it will register itself anyway.

3. You can configure one Zope installation to talk with several schedulix servers. That makes it easy to switch between several installations.
To do so, you'll have to edit the /Custom/SDMSServers script (within Zope) and add connection information in the Web Users Dialogue.

HTH, but if I didn't explain everything very well, please continue asking.

Regards,

Ronald

Ronald Jeninga

Feb 27, 2015, 6:10:54 AM

to sche...@googlegroups.com

Hi Soumen,

I have to apologize, I missed the fact that you created a separate Named Resource for each jobserver and created an instance there. This means that the first paragraph from my previous message isn't entirely correct. The NUM_PROCS resource on DWH level does play a role (as long as it is required by some Job).

But in your setup, if any job requires NUM_PROCS1, it will only run on SERVER1. No load balancing and no failover. The same applies to jobs requiring NUM_PROCS2 or NUM_PROCS3.

Regards,

Ronald

Soumen Sarkar

Feb 27, 2015, 6:15:47 AM

to sche...@googlegroups.com

Hi Ronald

Yeah, I am still reading the post and was a little confused, but it's clear now. I have a silly question :-) How does schedulix connect to the Linux boxes, is it ssh?

Thanks

Soumen S

Ronald Jeninga

Feb 27, 2015, 6:29:39 AM

to sche...@googlegroups.com

Hi Soumen,

it's the jobserver that connects to the scheduling server. In case of schedulix, we use plain sockets. In case of BICsuite SSL/TLS is an option.
But provided that your production environment is shielded from the outside world, there's no strict need to use SSL/TLS.

The communication is quite simple. The jobserver sends a command in plain text to the server. The server responds with a Java serialized object.
In the case of Zope, the server responds with an ASCII string that can be transformed into a Python data structure using the eval() function.
In fact, the communication is simple enough that in case of emergency even telnet can be used to access the server.

There are a few words on this topic in the syntax documentation.
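
As an illustration of turning such an ASCII reply into a Python data structure, here is a minimal sketch. The sample string is made up, since the real layout of the server's replies is not shown in this thread, and ast.literal_eval is used here as a more cautious stand-in for eval(); it only works if the reply consists purely of Python literals, which is an assumption.

```python
import ast

# Hypothetical reply text; the actual reply format is not shown in this thread
sample_response = "{'TITLE': 'demo', 'ROWS': [['SERVER1', 10], ['SERVER2', 5]]}"


def parse_response(text):
    """Turn a literal-only ASCII reply into a Python data structure."""
    # literal_eval accepts only literals (dicts, lists, strings, numbers, ...)
    # and raises ValueError or SyntaxError for anything else.
    return ast.literal_eval(text)


data = parse_response(sample_response)
```

After parsing, data is an ordinary dictionary, so data['ROWS'][0] gives the first row as a list.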

Regards,

Ronald

Soumen Sarkar

Feb 27, 2015, 6:56:37 AM

to sche...@googlegroups.com

Thanks, Ronald, for the clarification. I have been exploring this product for about the last one and a half months, and so far it has been a really awesome journey. I am really enjoying it and hope I will continue to :-) I am from the Neustar DWH Tech Team, based in Pune, India.

Thanks

Soumen S

Soumen Sarkar

Feb 27, 2015, 7:00:29 AM

to sche...@googlegroups.com

Hi Dieter,

Thank you very much for your post. It was very informative and helped me a lot. Thanks again, sir.

Regards

Soumen S

Ronald Jeninga

Feb 27, 2015, 8:05:39 AM

to sche...@googlegroups.com

Hi Soumen,

thank you for the compliments! I'm happy to hear you're enjoying the journey. I can promise you, it's not over yet. There's still a lot to learn and there is still a vast amount of features and possibilities to discover.
Actually I already knew you're from Neustar. I had a small e-mail exchange with someone else from there and you were on carbon copy in the last few mails.
We were talking about the possibilities of a workshop, which is still a good idea, I think. It would save you from writing another ten thousand messages here, which would mean enormous time-savings, as well as faster and better results.

But whatever you decide, we're here. And answering your questions might help others too, who always wanted to know about schedulix but were afraid to ask ;-)

Regards,

Ronald
