Hi,
Here are the machines I'm setting up:
1) Mac (Intel OS X) as the Condor central server
2) Parallels VM running Windows within the Mac as an execute machine
3) a separate Windows desktop
4) after everything else works: EC2 Windows machines, which I suppose would run as a cluster that attaches as a flock (perhaps with Cycle Computing).
I have tried (for days):
* playing with various configurations of condor_config & condor_config.local on both machines.
* taking down firewalls on both sides.
* reading manuals, googling, etc.
* running condor_store_cred with various settings on both sides.
STATUS:
So far I have Condor up and running on the Mac as an execute, submit and manager installation, and I successfully ran a test job. The Windows execute node is up, but I can't test it until I get credd security working properly (I think that's the problem). I can see both the Windows and Mac slots from both sides (see below).
When I submit a job from the Mac that has Windows requirements, it doesn't run. Presently, condor_q -analyze says "not yet been considered by the matchmaker" and "match but reject the job for unknown reasons." Under a previously attempted configuration it said "reject your job because of their own requirements"; the Windows slot would go to 'Matched', but the job would stay Idle and the logs suggested a security issue.
I can't even condor_rm the Idle jobs on the Mac side. I'm guessing that their being matched to Windows ceded control of them:
------
jimi:~ root# condor_q
-- Submitter: jimi.westell.com : <169.254.177.117:49371> : jimi.westell.com
ID      OWNER    SUBMITTED    RUN_TIME    ST PRI SIZE CMD
11.0    Jason    8/17 22:10   0+01:46:05  I  0   0.0  sample-job 60
13.0    Jason    8/18 01:12   0+01:24:43  I  0   0.0  sample-job 60
14.0    Jason    8/18 01:24   0+00:02:49  I  0   0.0  sample-job 60
15.0    Jason    8/18 01:53   0+00:00:00  I  0   0.0  sample-job 60
4 jobs; 4 idle, 0 running, 0 held
jimi:~ root# condor_rm 11.0
AUTHENTICATE:1003:Failed to authenticate with any method
No result found for job 11.0
------
CONFIGURATIONS:
-------- condor_config.local on MAC:
--------
CREDD_HOST = 10.211.55.10
STARTER_ALLOW_RUNAS_OWNER = True
CREDD_CACHE_LOCALLY = True
ALLOW_CONFIG = root@$(CONDOR_HOST), *
SEC_CONFIG_NEGOTIATION = REQUIRED
SEC_CONFIG_AUTHENTICATION = REQUIRED
SEC_CONFIG_ENCRYPTION = REQUIRED
SEC_CONFIG_INTEGRITY = REQUIRED
SEC_PASSWORD_FILE = /usr/local/condor/etc/pool_password
-------- condor_config.local on Windows:
--------
CREDD_HOST = xx.xxx.55.10
STARTER_ALLOW_RUNAS_OWNER = True
CREDD_CACHE_LOCALLY = True
SEC_CLIENT_AUTHENTICATION_METHODS = NTSSPI, PASSWORD
ALLOW_CONFIG = *
SEC_CONFIG_NEGOTIATION = REQUIRED
SEC_CONFIG_AUTHENTICATION = REQUIRED
SEC_CONFIG_ENCRYPTION = REQUIRED
SEC_CONFIG_INTEGRITY = REQUIRED
------- condor_config on Windows
------- I made this low-security just to try to get it working:
-------
ALLOW_WRITE = *
ALLOW_READ = *
#... not sure what else you need to see
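Comparing the two condor_config.local listings by eye is error-prone, so here is a minimal Python sketch that parses simple KEY = value lines and reports settings that appear on only one machine or differ between them. It is a simplification under stated assumptions: it ignores macro expansion, line continuations and the other configuration files Condor reads (running condor_config_val on each machine is the authoritative check), and the helper names parse_config and diff_configs are hypothetical. The sample text is copied from the listings above.

```python
"""Compare two Condor-style config listings (simplified KEY = value parser)."""

MAC_LOCAL = """\
CREDD_HOST = 10.211.55.10
STARTER_ALLOW_RUNAS_OWNER = True
CREDD_CACHE_LOCALLY = True
ALLOW_CONFIG = root@$(CONDOR_HOST), *
SEC_CONFIG_NEGOTIATION = REQUIRED
SEC_CONFIG_AUTHENTICATION = REQUIRED
SEC_CONFIG_ENCRYPTION = REQUIRED
SEC_CONFIG_INTEGRITY = REQUIRED
SEC_PASSWORD_FILE = /usr/local/condor/etc/pool_password
"""

WIN_LOCAL = """\
CREDD_HOST = xx.xxx.55.10
STARTER_ALLOW_RUNAS_OWNER = True
CREDD_CACHE_LOCALLY = True
SEC_CLIENT_AUTHENTICATION_METHODS = NTSSPI, PASSWORD
ALLOW_CONFIG = *
SEC_CONFIG_NEGOTIATION = REQUIRED
SEC_CONFIG_AUTHENTICATION = REQUIRED
SEC_CONFIG_ENCRYPTION = REQUIRED
SEC_CONFIG_INTEGRITY = REQUIRED
"""


def parse_config(text):
    """Return {KEY: value} for simple 'KEY = value' lines.

    Blank lines, '#' comments and lines without '=' are skipped. Keys are
    upper-cased so the comparison ignores case. No macro expansion is done.
    """
    settings = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        settings[key.strip().upper()] = value.strip()
    return settings


def diff_configs(a, b):
    """Return (keys only in a, keys only in b, keys whose values differ)."""
    only_a = sorted(set(a) - set(b))
    only_b = sorted(set(b) - set(a))
    differing = sorted(k for k in set(a) & set(b) if a[k] != b[k])
    return only_a, only_b, differing


if __name__ == "__main__":
    mac, win = parse_config(MAC_LOCAL), parse_config(WIN_LOCAL)
    only_mac, only_win, differing = diff_configs(mac, win)
    print("Mac only:    ", only_mac)
    print("Windows only:", only_win)
    for key in differing:
        print(f"{key}: Mac={mac[key]!r} Windows={win[key]!r}")
```

On these two listings it flags SEC_PASSWORD_FILE as Mac-only, SEC_CLIENT_AUTHENTICATION_METHODS as Windows-only, and ALLOW_CONFIG and CREDD_HOST as differing; the CREDD_HOST difference is only an artifact of the Windows value being redacted in the post.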
LOG FILES:
--------- CredLog - on windows
--------- this is after turning the Mac & Windows firewalls off; not a permanent solution, and it's not working anyway:
---------
08/18/11 14:42:18 Failed to start non-blocking update to <xxx.xxx.1.21:9618>.
08/18/11 14:42:18 Return from Handler <SecManStartCommand::WaitForSocketCallback UPDATE_AD_GENERIC> 0.0000s
08/18/11 14:47:18 Calling Handler <SecManStartCommand::WaitForSocketCallback UPDATE_AD_GENERIC> (2)
08/18/11 14:47:18 Return from Handler <SecManStartCommand::WaitForSocketCallback UPDATE_AD_GENERIC> 0.0000s
08/18/11 14:47:18 Calling Handler <SecManStartCommand::WaitForSocketCallback UPDATE_AD_GENERIC> (2)
08/18/11 14:47:18 SECMAN: required authentication with <xxx.xxx.1.21:9618> failed, so aborting command UPDATE_AD_GENERIC.
08/18/11 14:47:18 ERROR: SECMAN:2004:Failed to create security session to <xxx.xxx.1.21:9618> with TCP.
|AUTHENTICATE:1003:Failed to authenticate with any method
08/18/11 14:47:18 Failed to start non-blocking update to <xxx.xxx.1.21:9618>.
08/18/11 14:47:18 Return from Handler <SecManStartCommand::WaitForSocketCallback UPDATE_AD_GENERIC> 0.0000s
08/18/11 14:52:39 attempt to connect to <xxx.xxx.1.21:9618> failed: timed out after 20 seconds.
08/18/11 14:52:39 Calling Handler <SecManStartCommand::WaitForSocketCallback UPDATE_AD_GENERIC> (2)
08/18/11 14:52:39 ERROR: SECMAN:2004:Failed to create security session to <xxx.xxx.1.21:9618> with TCP.
|SECMAN:2003:TCP connection to <xxx.xxx.1.21:9618> failed.
08/18/11 14:52:39 Failed to start non-blocking update to <xxx.xxx.1.21:9618>.
08/18/11 14:52:39 Return from Handler <SecManStartCommand::WaitForSocketCallback UPDATE_AD_GENERIC> 0.0000s
--------- MasterLog - on windows
---------
---------
08/18/11 14:51:50 condor_read(): timeout reading 21 bytes from <10.211.55.10:53043>.
08/18/11 14:51:50 IO: Failed to read packet header
08/18/11 14:51:50 store_pool_cred: failed to receive all parameters
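The CredLog excerpt mixes two different failures against <xxx.xxx.1.21:9618>: at 14:47 authentication fails (AUTHENTICATE:1003), while at 14:52 the TCP connection itself times out (SECMAN:2003). When logs get longer, tallying the error codes helps keep the two apart. The sketch below is a rough Python helper; the regular expression is an assumption inferred only from the lines shown here, not a description of Condor's log format, and scan_errors is a hypothetical name.

```python
"""Tally SUBSYSTEM:CODE error tokens in a Condor daemon log excerpt."""
import re
from collections import Counter

# Assumed token shape, inferred from the lines above, e.g.
# "SECMAN:2004:Failed to create ..." or "|AUTHENTICATE:1003:Failed ..."
ERROR_RE = re.compile(r"\b([A-Z]+):(\d{4}):([^|\n]*)")

SAMPLE_LOG = """\
08/18/11 14:47:18 SECMAN: required authentication with <xxx.xxx.1.21:9618> failed, so aborting command UPDATE_AD_GENERIC.
08/18/11 14:47:18 ERROR: SECMAN:2004:Failed to create security session to <xxx.xxx.1.21:9618> with TCP.
|AUTHENTICATE:1003:Failed to authenticate with any method
08/18/11 14:52:39 attempt to connect to <xxx.xxx.1.21:9618> failed: timed out after 20 seconds.
08/18/11 14:52:39 ERROR: SECMAN:2004:Failed to create security session to <xxx.xxx.1.21:9618> with TCP.
|SECMAN:2003:TCP connection to <xxx.xxx.1.21:9618> failed.
"""


def scan_errors(log_text):
    """Return (code, message) pairs such as ("AUTHENTICATE:1003", "...") in log order."""
    return [(f"{sub}:{num}", msg.strip()) for sub, num, msg in ERROR_RE.findall(log_text)]


if __name__ == "__main__":
    errors = scan_errors(SAMPLE_LOG)
    for code, count in Counter(code for code, _ in errors).most_common():
        print(f"{count:3d}  {code}")
```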
COMMAND LINE OUTPUT:
---------- condor_status - on windows
---------- Manual says to run this when you are done, doesn't mention the command
---------- only works on the windows side:
C:\Users\Administrator>condor_status -f "%s\t" Name -f "%s\n" ifThenElse(isUndefined(LocalCredd),\"UNDEF"\",LocalCredd)
slot1@JASONHERMANB752 UNDEF
sl...@jimi.westell.com UNDEF
slot2@JASONHERMANB752 UNDEF
sl...@jimi.westell.com UNDEF
sl...@jimi.westell.com UNDEF
sl...@jimi.westell.com UNDEF
sl...@jimi.westell.com UNDEF
sl...@jimi.westell.com UNDEF
sl...@jimi.westell.com UNDEF
sl...@jimi.westell.com UNDEF
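Every slot above reports UNDEF, i.e. none of them advertises a LocalCredd attribute. If this check is scripted, building the argument list directly sidesteps the cmd.exe or shell escaping of the quotes inside the ClassAd expression. A minimal Python sketch, assuming condor_status is on the PATH and each output line is Name, a tab, then the value (as the -f formats request); the function names are hypothetical.

```python
"""Run the manual's LocalCredd check without shell quoting and list UNDEF slots."""
import subprocess

# Same ClassAd expression as the command above, passed as one argv element,
# so the inner double quotes need no escaping.
LOCALCREDD_EXPR = 'ifThenElse(isUndefined(LocalCredd),"UNDEF",LocalCredd)'


def localcredd_argv():
    # Raw strings keep the literal \t and \n that the shell version passes through.
    return ["condor_status", "-f", r"%s\t", "Name", "-f", r"%s\n", LOCALCREDD_EXPR]


def run_check():
    """Invoke condor_status (assumed to be on PATH) and return its stdout."""
    result = subprocess.run(localcredd_argv(), capture_output=True, text=True, check=True)
    return result.stdout


def slots_without_localcredd(output):
    """Return the slot names whose LocalCredd column reads UNDEF."""
    missing = []
    for line in output.splitlines():
        parts = line.split()
        if len(parts) == 2 and parts[1] == "UNDEF":
            missing.append(parts[0])
    return missing


if __name__ == "__main__":
    print(slots_without_localcredd(run_check()))
```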
------- condor_status - MAC (identical on windows)
-------
-------
jimi:log root# condor_status
Name                OpSys    Arch   State     Activity LoadAv Mem  ActvtyTime
sl...@jimi.westell  OSX      X86_64 Unclaimed Idle     0.210  1024 0+19:09:01
sl...@jimi.westell  OSX      X86_64 Unclaimed Idle     0.000  1024 1+11:24:12
sl...@jimi.westell  OSX      X86_64 Unclaimed Idle     0.000  1024 1+03:18:37
sl...@jimi.westell  OSX      X86_64 Unclaimed Idle     0.000  1024 0+23:14:03
sl...@jimi.westell  OSX      X86_64 Unclaimed Idle     0.000  1024 0+15:05:52
sl...@jimi.westell  OSX      X86_64 Unclaimed Idle     0.000  1024 0+11:04:54
sl...@jimi.westell  OSX      X86_64 Unclaimed Idle     0.000  1024 0+06:59:54
sl...@jimi.westell  OSX      X86_64 Unclaimed Idle     0.000  1024 1+15:27:42
slot1@JASONHERMANB  WINNT60  INTEL  Unclaimed Idle     0.120  1023 0+00:00:04
slot2@JASONHERMANB  WINNT60  INTEL  Unclaimed Idle     0.100  1023 0+00:00:02

              Total Owner Claimed Unclaimed Matched Preempting Backfill
INTEL/WINNT60     2     0       0         2       0          0        0
X86_64/OSX        8     0       0         8       0          0        0
Total            10     0       0        10       0          0        0
-------- condor_store_cred on Windows:
--------
--------
C:\Users\Administrator>condor_store_cred -c add
Account: condor_pool@JASONHERMANB752
Enter password:
Operation failed.
Make sure you have CONFIG access to the target Master.
Thanks kindly for any assistance,
Jason
Hi Jason,
I had similar problems, but I was running Windows machines only. In my case it was important to include the port number in CREDD_HOST. We use the following settings:
LOCAL_CREDD = $(CONDOR_HOST)
CREDD_HOST = $(LOCAL_CREDD):$(CREDD_PORT)
After that, it was important to add the pool password correctly on all machines and, of course, the user passwords. Finally, run "condor_reconfig -all" and make sure that the LocalCredd attribute is set; see the manual: http://www.cs.wisc.edu/condor/manual/v7.6/6_2Microsoft_Windows.html.
For one of my machines I had trouble that looked like credd issues, but it turned out that the host was not correctly registered in the domain (in Active Directory). Removing it and re-adding it solved that problem.
Hope that can be of some help.
/Tomas
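Tomas's two lines rely on Condor's $(NAME) macro substitution, so CREDD_HOST ends up as host:port. As a rough illustration, here is a tiny Python expander (a simplification, not Condor's actual expansion rules; undefined names simply become empty here) plus a check for a trailing :port. CONDOR_HOST is a hypothetical Windows host name and 9620 is only a placeholder for whatever CREDD_PORT your configuration defines. In Tomas's all-Windows pool CONDOR_HOST is itself a Windows machine; given Tim's point that the credd host must be a Windows machine, a pool with a Mac central manager would presumably need LOCAL_CREDD to name the Windows host instead.

```python
"""Toy $(NAME) macro expansion for Tomas's CREDD_HOST settings (simplified)."""
import re

MACRO_RE = re.compile(r"\$\((\w+)\)")
HOST_PORT_RE = re.compile(r"[^:\s]+:\d+")

CONFIG = {
    "CONDOR_HOST": "winhost.example.com",  # hypothetical Windows host
    "CREDD_PORT": "9620",                  # placeholder; use your own CREDD_PORT
    "LOCAL_CREDD": "$(CONDOR_HOST)",
    "CREDD_HOST": "$(LOCAL_CREDD):$(CREDD_PORT)",
}


def expand(name, config, depth=0):
    """Recursively substitute $(NAME) references; undefined names become ''."""
    if depth > 20:
        raise ValueError(f"macro expansion of {name} nested too deeply")
    value = config.get(name, "")
    return MACRO_RE.sub(lambda m: expand(m.group(1), config, depth + 1), value)


def has_port(host):
    """True if the value looks like host:port."""
    return HOST_PORT_RE.fullmatch(host) is not None


if __name__ == "__main__":
    credd_host = expand("CREDD_HOST", CONFIG)
    print(credd_host, has_port(credd_host))          # host:port, as Tomas recommends
    print("10.211.55.10", has_port("10.211.55.10"))  # Jason's value: no port
```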
You should not need a credd unless you are running as owner, which is
not the default.
Also, your CREDD_HOST *must be* a Windows machine. It may be too early in
the a.m., but I can't tell from the logs below whether that is the case.
Cheers,
Tim
> _______________________________________________
> Condor-users mailing list
> To unsubscribe, send a message to condor-use...@cs.wisc.edu with a
> subject: Unsubscribe
> You can also unsubscribe by visiting
> https://lists.cs.wisc.edu/mailman/listinfo/condor-users
>
> The archives can be found at:
> https://lists.cs.wisc.edu/archive/condor-users/
From: Jason Herman <jbhe...@gmail.com>
To: Condor-Users Mail List <condor...@cs.wisc.edu>
Date: 08/24/2011 09:51 PM
Subject: Re: [Condor-users] credd issues: heterogenous system MAC-central; WIN-execute + EC2 (win) when this works
Sent by: condor-use...@cs.wisc.edu
Try adding -debug to the command line for the tool,
and check the CONFIG_LOG output on Windows.
I'm not certain, but I don't believe it's allowed to set the pool
password from a non-Windows machine. I will have to run a test to
verify.
Cheers,
Tim
On Wed, 2011-08-24 at 23:43 -0400, Jason Herman wrote:
I am probably missing something from previous emails, but for clarification: I am assuming the credd is running on the Windows machine and not on the central manager in your case. Second, why is the pool domain different when you set the pool password on Windows and on the Mac? We only use Windows in our pool, but maybe this will help.
Windows: condor_pool@JASONHERMANB752
MAC: condo...@jimi.westell.com
Should these not be the same?
mike
On Thu, 2011-08-25 at 13:33 -0400, Jason Herman wrote:
> I'll try -debug and check CONFIG_LOG.
>
> The docs say the pool password needs to be set on all machines. The
> main reason I'm setting up the credd at all is so that I can submit jobs to
> the Windows box from the Mac side. My understanding is that all computers
> need to share the pool password so their daemons can communicate. Then,
> further, I need to have the same logon/password accounts on both Mac &
> Windows so that from the Mac I can run_as_owner on the Windows box. Am I
> understanding that correctly?
The only way that would be true is if your entire single sign-on was
validating against Active Directory (usually not the case unless you are
primarily a Windows shop) and you wish to run as owner. (The two
features combined are extremely rare.)
You can most certainly submit jobs from the Mac side and have them run
on Windows; you only need to set your requirements correctly. You will
need to verify that your ALLOW_* settings are correct.
e.g. Linux submit -> windows run
--------------------------------
universe = vanilla
executable = your_bat_file.bat
arguments = 2
requirements = ( Arch=="X86_64" ) && ( OpSys=="WINNT51" || OpSys=="WINNT52" || OpSys=="WINNT61" )
# need the line below
should_transfer_files = YES
when_to_transfer_output = ON_EXIT
iwd = /tmp
queue 20
--------------------------------
Cheers,
Tim
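Tim's example targets generic 64-bit Windows machines. Jason's Windows slots, per the condor_status listing in his first message, advertise OpSys WINNT60 and Arch INTEL, so those requirements as written would not match them. The Python sketch below models only the job-side requirements check against the advertised (OpSys, Arch) pairs. It is a simplification: real matchmaking also evaluates the machine's own START/Requirements, and ClassAd string == ignores case, which is approximated here with casefold(). The predicate names are hypothetical; in a submit file the adjusted expression would presumably read something like requirements = (OpSys == "WINNT60") && (Arch == "INTEL").

```python
"""Check which advertised slots satisfy a job's OpSys/Arch requirements (simplified)."""

# (Name, OpSys, Arch) as shown in Jason's condor_status output; the Mac slot
# names are elided in the archive, so one representative row is kept.
SLOTS = [
    ("slot1@JASONHERMANB752", "WINNT60", "INTEL"),
    ("slot2@JASONHERMANB752", "WINNT60", "INTEL"),
    ("sl...@jimi.westell.com", "OSX", "X86_64"),
]


def eq(a, b):
    # ClassAd "==" on strings ignores case; casefold() approximates that.
    return a.casefold() == b.casefold()


def tims_example(opsys, arch):
    """Arch == X86_64 and OpSys one of WINNT51, WINNT52, WINNT61."""
    return eq(arch, "X86_64") and any(eq(opsys, v) for v in ("WINNT51", "WINNT52", "WINNT61"))


def winnt60_intel(opsys, arch):
    """Hypothetical adjusted requirements: OpSys == WINNT60 and Arch == INTEL."""
    return eq(opsys, "WINNT60") and eq(arch, "INTEL")


def matching_slots(slots, requirements):
    """Return the names of slots whose (OpSys, Arch) satisfy the predicate."""
    return [name for name, opsys, arch in slots if requirements(opsys, arch)]


if __name__ == "__main__":
    print("Tim's example:", matching_slots(SLOTS, tims_example))
    print("WINNT60/INTEL:", matching_slots(SLOTS, winnt60_intel))
```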