Hi Vendors,
Position - Sr. Principal Lead/Architect/Engineer, Workload Automation (IBM Tivoli-TWS/IWS)
Location - Warsaw, IN
Type - Contract
Role Overview
We are seeking a Lead Workload Automation Engineer / Architect to define and drive the enterprise architecture, strategy, and operational model for IBM Tivoli/IBM Workload Scheduler (TWS/IWS) across distributed environments (on-prem and cloud). This role sets platform standards and reference designs, leads modernization and major upgrades/migrations, governs reliability and security practices, and serves as the senior technical partner for application, database, and infrastructure organizations to deliver resilient, scalable scheduling services for mission-critical workloads. The role also assists and supervises two job scheduling teams.
Key Responsibilities
• Own the end-to-end architecture for the TWS/IWS platform (components,
topology, environments, integrations), including standards, patterns, and reference implementations.
• Provide technical oversight for additional (3rd-party) job scheduling
platforms where used; establish operating standards, integration patterns, and support processes to ensure consistent controls and reliability.
• Lead enterprise-scale installations, upgrades, and migrations; define
cutover/rollback strategies, coordinate change windows, and ensure readiness across dependent teams.
• Lead assessments of legacy scheduler instances and batch frameworks to
identify candidates for retirement, consolidation, or migration; produce target-state recommendations, sequencing/roadmaps, and risk-based migration plans.
• Define reliability engineering practices for workload automation:
availability targets, capacity planning, performance tuning, monitoring/alerting, and continuous improvement.
• Design and validate high-availability and disaster recovery solutions
(including DB2 HADR where applicable); plan and execute regular DR tests and remediate gaps.
• Establish governance for workload onboarding and job design: scheduling
standards, dependency modeling, naming conventions, calendars, critical path optimization, and SLA/SLO management.
• Architect and productionize automation for platform operations and
self-service (e.g., provisioning, reporting, batch controls) using shell/Python/Perl and enterprise tooling.
• Own security and compliance posture: access model (LDAP/SSO),
least-privilege controls, audit evidence, vulnerability remediation, and secure configuration baselines.
• Manage and develop two teams (e.g., platform engineering and operations):
set priorities and operating rhythms, oversee delivery and support outcomes, coach/mentor team members, and drive performance management in partnership with leadership.
• Be available for major outages and critical events related to job
scheduling, including quarter-end (QEND) activities up to four (4) times per fiscal year, providing incident leadership, stakeholder communications, and post-incident follow-up.
• Participate in an on-call rotation and provide after-hours/weekend support
as needed to maintain scheduling availability and meet business SLAs.
• Support a global operating model by working flexibly across EMEA and US
business hours to provide required coverage and stakeholder overlap.
• Serve as escalation point for complex incidents; lead root-cause analysis
and drive problem management to prevent recurrence.
• Mentor and guide engineers; lead technical design reviews,
documentation/runbook standards, and knowledge sharing across the organization.
• Develop in-depth knowledge of other job scheduling teams, such as Automate,
AS400, and Robot, and assist in supervising these teams within IT Operations.
Required Qualifications
• High School Diploma or equivalent
• 10+ years of experience in enterprise workload automation, including 7+
years of hands-on IBM TWS/IWS/IWA administration in distributed environments.
• Bachelor’s degree or 10+ years of equivalent IT industry service
experience
• For senior/lead equivalent roles, 8+ years of relevant ITSM/major incident
operations experience may be required.
• IT certification is a plus.
• Proven experience in a lead/architect capacity: defining platform
standards/reference designs, guiding cross-team implementations, and making architecture decisions for reliability, scalability, and security.
• Strong Linux/UNIX engineering and production troubleshooting experience,
including performance and availability triage.
• Advanced automation/scripting skills (shell plus Python and/or Perl) with
experience building supported, maintainable operational tooling.
• Demonstrated ability to lead complex incident response and root-cause
analysis, and to drive preventative action through problem management.
• Strong change leadership in regulated production environments (planning,
risk management, implementation, validation, rollback) aligned with ITIL processes.
• Excellent stakeholder communication and ability to influence across
application, database, infrastructure, and security teams.
Preferred Qualifications
• DB2 administration experience, including High Availability Disaster
Recovery (HADR); familiarity with Oracle/Postgres and SQL.
• Experience with TWS/IWS integrations and APIs (REST/SOAP), event-based
scheduling, and real-time/on-demand workload patterns.
• Experience with Tivoli Dynamic Workload Console and Tivoli Dynamic Workload
Broker (TDWC/TDWB) and critical path monitoring.
• Experience integrating file transfer solutions (e.g., SFTP/PGP/GPG,
managed file transfer platforms) into batch workflows.
• Experience with SAP and other enterprise application integrations via TWS
extended agents.
• Experience building dashboards/metrics and integrating with observability
platforms (e.g., Grafana/Graphite).
• Experience defining platform standards, leading upgrades/migrations, and
coordinating cross-team delivery (e.g., change windows, cutovers, rollback planning).
• Familiarity with cloud patterns and automation (e.g.,
infrastructure-as-code concepts, container/VM scheduling considerations) in support of workload modernization.
• Hands-on experience across ITSM processes (Incident, Problem, Change,
Knowledge) in an enterprise environment.
• ServiceNow experience, including incident lifecycle management,
documentation standards, and reporting.
• Working knowledge of ITIL concepts and IT service management best
practices.
• Artificial Intelligence: ability to navigate AI applications, communicate
with them effectively, and recognize when not to use them because the results do not meet your or the company's expectations.
• Strong analytical and problem-solving skills to investigate issues and
drive resolution.
• Ability to manage multiple tasks in a high-volume, high-urgency operations
environment.
• Strong written and verbal communication skills, including confident
facilitation on conference bridges.
• Able to write and review technical documentation and knowledge articles.
Skills & Tools
• Workload Automation: IBM TWS/IWS/IWA, TDWC/TDWB, dynamic scheduling, JSDL
• Operating Systems: Linux, UNIX (AIX/SunOS), Windows (agent support)
• Databases: DB2 (HADR), Oracle/Postgres (familiarity)
• Scripting: Shell, Python, Perl
• ITSM/Monitoring: ITIL processes; integrations with tools such as
ServiceNow, AppDynamics, OBM, Grafana/Graphite
• Security: LDAP/SSO concepts, role-based access, audit/patch compliance
Best Regards,
Rohit Kumar (Sr. Technical Recruiter)
roh...@intellisofttech.com