Requirement - Sr. Principal Lead/Architect/Engineer Workload Automation

vamsi s

May 8, 2026, 5:28:28 PM
to vam...@sritechsolutions.com

Role: Sr. Principal Lead/Architect/Engineer, Workload Automation (IBM Tivoli TWS/IWS)

Location: Minneapolis MN (Onsite/Hybrid)

Experience: 10+ years

Description:

We are seeking a Lead Workload Automation Engineer / Architect to define and drive the enterprise architecture, strategy, and operational model for IBM Tivoli/IBM Workload Scheduler (TWS/IWS) across distributed environments (on-prem and cloud). This role sets platform standards and reference designs, leads modernization and major upgrades/migrations, governs reliability and security practices, and serves as the senior technical partner for application, database, and infrastructure organizations to deliver resilient, scalable scheduling services for mission-critical workloads. The role also assists with and supervises two job scheduling teams.

Key Responsibilities

  • Own the end-to-end architecture for the TWS/IWS platform (components, topology, environments, integrations), including standards, patterns, and reference implementations.
  • Provide technical oversight for additional (3rd-party) job scheduling platforms where used; establish operating standards, integration patterns, and support processes to ensure consistent controls and reliability.
  • Lead enterprise-scale installations, upgrades, and migrations; define cutover/rollback strategies, coordinate change windows, and ensure readiness across dependent teams.
  • Lead assessments of legacy scheduler instances and batch frameworks to identify candidates for retirement, consolidation, or migration; produce target-state recommendations, sequencing/roadmaps, and risk-based migration plans.
  • Define reliability engineering practices for workload automation: availability targets, capacity planning, performance tuning, monitoring/alerting, and continuous improvement.
  • Design and validate high-availability and disaster recovery solutions (including DB2 HADR where applicable); plan and execute regular DR tests and remediate gaps.
  • Establish governance for workload onboarding and job design: scheduling standards, dependency modeling, naming conventions, calendars, critical path optimization, and SLA/SLO management.
  • Architect and productionize automation for platform operations and self-service (e.g., provisioning, reporting, batch controls) using shell/Python/Perl and enterprise tooling.
  • Own security and compliance posture: access model (LDAP/SSO), least-privilege controls, audit evidence, vulnerability remediation, and secure configuration baselines.
  • Manage and develop two teams (e.g., platform engineering and operations): set priorities and operating rhythms, oversee delivery and support outcomes, coach/mentor team members, and drive performance management in partnership with leadership.
  • Be available for major outages and critical events related to job scheduling, including quarter-end (QEND) activities up to four (4) times per fiscal year, providing incident leadership, stakeholder communications, and post-incident follow-up.
  • Participate in an on-call rotation and provide after-hours/weekend support as needed to maintain scheduling availability and meet business SLAs.
  • Support a global operating model by working flexibly across EMEA and US business hours to provide required coverage and stakeholder overlap.
  • Serve as escalation point for complex incidents; lead root-cause analysis and drive problem management to prevent recurrence.
  • Mentor and guide engineers; lead technical design reviews, documentation/runbook standards, and knowledge sharing across the organization.
  • Develop a deep understanding of the other job scheduling platforms used in IT Operations (such as Automate, AS400, and Robot) and assist in supervising the teams that run them.

Required Qualifications

  • High School Diploma or equivalent
  • 10+ years of experience in enterprise workload automation, including 7+ years of hands-on IBM TWS/IWS/IWA administration in distributed environments.
  • Bachelor’s degree or 10+ years of equivalent IT industry service experience
  • For senior/lead equivalent roles, 8+ years of relevant ITSM/major incident operations experience may be required.
  • An IT certification is a plus.
  • Proven experience in a lead/architect capacity: defining platform standards/reference designs, guiding cross-team implementations, and making architecture decisions for reliability, scalability, and security.
  • Strong Linux/UNIX engineering and production troubleshooting experience, including performance and availability triage.
  • Advanced automation/scripting skills (shell plus Python and/or Perl) with experience building supported, maintainable operational tooling.
  • Demonstrated ability to lead complex incident response and root-cause analysis, and to drive preventative action through problem management.
  • Strong change leadership in regulated production environments (planning, risk management, implementation, validation, rollback) aligned with ITIL processes.
  • Excellent stakeholder communication and ability to influence across applications, database, infrastructure, and security teams.

Preferred Qualifications

  • DB2 administration experience, including High Availability Disaster Recovery (HADR); familiarity with Oracle/Postgres and SQL.
  • Experience with TWS/IWS integrations and APIs (REST/SOAP), event-based scheduling, and real-time/on-demand workload patterns.
  • Experience with Tivoli Dynamic Workload Console (TDWC), Tivoli Dynamic Workload Broker (TDWB), and critical path monitoring.
  • Experience integrating file transfer solutions (e.g., SFTP/PGP/GPG, managed file transfer platforms) into batch workflows.
  • Experience with SAP and other enterprise application integrations via TWS extended agents.
  • Experience building dashboards/metrics and integrating with observability platforms (e.g., Grafana/Graphite).
  • Experience defining platform standards, leading upgrades/migrations, and coordinating cross-team delivery (e.g., change windows, cutovers, rollback planning).
  • Familiarity with cloud patterns and automation (e.g., infrastructure-as-code concepts, container/VM scheduling considerations) in support of workload modernization.
  • Hands-on experience across ITSM processes (Incident, Problem, Change, Knowledge) in an enterprise environment.
  • ServiceNow experience, including incident lifecycle management, documentation standards, and reporting.
  • Working knowledge of ITIL concepts and IT service management best practices.
  • Artificial intelligence: comfortable working with AI applications, including knowing how to communicate with them effectively and when not to use them because the output does not meet your or the company's expectations.
  • Strong analytical and problem-solving skills to investigate issues and drive resolution.
  • Ability to manage multiple tasks in a high-volume, high-urgency operations environment.
  • Strong written and verbal communication skills, including confident facilitation on conference bridges.
  • Able to write and review technical documentation and knowledge articles.

Skills & Tools

  • Workload Automation: IBM TWS/IWS/IWA, TDWC/TDWB, dynamic scheduling, JSDL
  • Operating Systems: Linux, UNIX (AIX/SunOS), Windows (agent support)
  • Databases: DB2 (HADR), Oracle/Postgres (familiarity)
  • Scripting: Shell, Python, Perl
  • ITSM/Monitoring: ITIL processes; integrations with tools such as ServiceNow, AppDynamics, OBM, Grafana/Graphite
  • Security: LDAP/SSO concepts, role-based access, audit/patch compliance

Tools & Platforms

  • ServiceNow (incident management, dashboards, filters, and reporting).
  • Power BI (consume and build basic operational reports and dashboards).
  • Microsoft 365 (Outlook, Teams, Excel, Word, PowerPoint) for collaboration and documentation.
  • SharePoint and OneNote for knowledge management and operational tracking.
  • Job scheduling tools: Tivoli, Automate, AS400, Robot.

With regards,

Vamsi Sattaru
US IT Technical Recruiter