Requirement - Sr. Principal Lead/Architect/Engineer Workload Automation

vamsi s

May 8, 2026, 5:28:28 PM

to vam...@sritechsolutions.com

Role: Sr. Principal Lead/Architect/Engineer Workload Automation (IBM Tivoli TWS/IWS)

Location: Minneapolis MN (Onsite/Hybrid)

Experience: 10+ years

Description:

We are seeking a Lead Workload Automation Engineer/Architect to define and drive the enterprise architecture, strategy, and operational model for IBM Tivoli/IBM Workload Scheduler (TWS/IWS) across distributed environments (on-prem and cloud). This role sets platform standards and reference designs, leads modernization efforts and major upgrades/migrations, governs reliability and security practices, and serves as the senior technical partner to the application, database, and infrastructure organizations, delivering resilient, scalable scheduling services for mission-critical workloads. The role also assists with and supervises two job scheduling teams.

Key Responsibilities

Own the end-to-end architecture for the TWS/IWS platform (components, topology, environments, integrations), including standards, patterns, and reference implementations.
Provide technical oversight for additional (3rd-party) job scheduling platforms where used; establish operating standards, integration patterns, and support processes to ensure consistent controls and reliability.
Lead enterprise-scale installations, upgrades, and migrations; define cutover/rollback strategies, coordinate change windows, and ensure readiness across dependent teams.
Lead assessments of legacy scheduler instances and batch frameworks to identify candidates for retirement, consolidation, or migration; produce target-state recommendations, sequencing/roadmaps, and risk-based migration plans.
Define reliability engineering practices for workload automation: availability targets, capacity planning, performance tuning, monitoring/alerting, and continuous improvement.
Design and validate high-availability and disaster recovery solutions (including DB2 HADR where applicable); plan and execute regular DR tests and remediate gaps.
Establish governance for workload onboarding and job design: scheduling standards, dependency modeling, naming conventions, calendars, critical path optimization, and SLA/SLO management.
Architect and productionize automation for platform operations and self-service (e.g., provisioning, reporting, batch controls) using shell/Python/Perl and enterprise tooling.
Own security and compliance posture: access model (LDAP/SSO), least-privilege controls, audit evidence, vulnerability remediation, and secure configuration baselines.
Manage and develop two teams (e.g., platform engineering and operations): set priorities and operating rhythms, oversee delivery and support outcomes, coach/mentor team members, and drive performance management in partnership with leadership.
Be available for major outages and critical events related to job scheduling, including quarter-end (QEND) activities up to four (4) times per fiscal year, providing incident leadership, stakeholder communications, and post-incident follow-up.
Participate in an on-call rotation and provide after-hours/weekend support as needed to maintain scheduling availability and meet business SLAs.
Support a global operating model by working flexibly across EMEA and US business hours to provide required coverage and stakeholder overlap.
Serve as escalation point for complex incidents; lead root-cause analysis and drive problem management to prevent recurrence.
Mentor and guide engineers; lead technical design reviews, documentation/runbook standards, and knowledge sharing across the organization.
Build a deep understanding of the other job scheduling teams and their platforms, such as Automate, AS400, and Robot, and assist in supervising these teams within IT Operations.

Required Qualifications

High School Diploma or equivalent
10+ years of experience in enterprise workload automation, including 7+ years of hands-on IBM TWS/IWS/IWA administration in distributed environments.
Bachelor’s degree or 10+ years of equivalent IT industry service experience
For senior/lead equivalent roles, 8+ years of relevant ITSM/major incident operations experience may be required.
An IT certification is a plus.
Proven experience in a lead/architect capacity: defining platform standards/reference designs, guiding cross-team implementations, and making architecture decisions for reliability, scalability, and security.
Strong Linux/UNIX engineering and production troubleshooting experience, including performance and availability triage.
Advanced automation/scripting skills (shell plus Python and/or Perl) with experience building supported, maintainable operational tooling.
Demonstrated ability to lead complex incident response and root-cause analysis, and to drive preventative action through problem management.
Strong change leadership in regulated production environments (planning, risk management, implementation, validation, rollback) aligned with ITIL processes.
Excellent stakeholder communication and ability to influence across applications, database, infrastructure, and security teams.

Preferred Qualifications

DB2 administration experience, including High Availability Disaster Recovery (HADR); familiarity with Oracle/Postgres and SQL.
Experience with TWS/IWS integrations and APIs (REST/SOAP), event-based scheduling, and real-time/on-demand workload patterns.
Experience with Tivoli Dynamic Workload Console (TDWC) and Tivoli Dynamic Workload Broker (TDWB), including critical path monitoring.
Experience integrating file transfer solutions (e.g., SFTP/PGP/GPG, managed file transfer platforms) into batch workflows.
Experience with SAP and other enterprise application integrations via TWS extended agents.
Experience building dashboards/metrics and integrating with observability platforms (e.g., Grafana/Graphite).
Experience defining platform standards, leading upgrades/migrations, and coordinating cross-team delivery (e.g., change windows, cutovers, rollback planning).
Familiarity with cloud patterns and automation (e.g., infrastructure-as-code concepts, container/VM scheduling considerations) in support of workload modernization.
Hands-on experience across ITSM processes (Incident, Problem, Change, Knowledge) in an enterprise environment.
ServiceNow experience, including incident lifecycle management, documentation standards, and reporting.
Working knowledge of ITIL concepts and IT service management best practices.
Artificial Intelligence: ability to navigate AI applications, communicate with them effectively, and recognize when not to use them because their output does not meet your or the company's expectations.
Strong analytical and problem-solving skills to investigate issues and drive resolution.
Ability to manage multiple tasks in a high-volume, high-urgency operations environment.
Strong written and verbal communication skills, including confident facilitation on conference bridges.
Able to write and review technical documentation and knowledge articles.

Skills & Tools

Workload Automation: IBM TWS/IWS/IWA, TDWC/TDWB, dynamic scheduling, JSDL
Operating Systems: Linux, UNIX (AIX/SunOS), Windows (agent support)
Databases: DB2 (HADR), Oracle/Postgres (familiarity)
Scripting: Shell, Python, Perl
ITSM/Monitoring: ITIL processes; integrations with tools such as ServiceNow, AppDynamics, OBM, Grafana/Graphite
Security: LDAP/SSO concepts, role-based access, audit/patch compliance

Tools & Platforms:

ServiceNow (incident management, dashboards, filters, and reporting).
Power BI (consume and build basic operational reports and dashboards).
Microsoft 365 (Outlook, Teams, Excel, Word, PowerPoint) for collaboration and documentation.
SharePoint and OneNote for knowledge management and operational tracking.
Job scheduling tools – Tivoli, Automate, AS400, Robot.

With regards,

Vamsi Sattaru | US IT Technical Recruiter

Email: vam...@sritechsolutions.com