Position: Big Data Engineer (Lead)
Location: Malvern, PA. Hybrid schedule: on-site Tuesday, Wednesday, and Thursday; remote Monday and Friday (no flexibility on this).
USC/GC Only
Interview method: video
Start Date: 3-4 weeks from offer
(Candidates who are relocating must be relocated by day 1. Commuting by weekly flight is not permitted.)
Job Description
We are seeking a highly experienced and hands-on Senior Data Engineer to join our Data Engineering teams. You will play a key role in supplementing existing capacity, upgrading our data architecture, and ensuring the highest quality, performance, and cost-efficiency of our data platforms. The work is focused on critical deliverables for personal investment, personal wealth, and comprehensive data analytics, while preparing the platform for a larger strategic move in the future.
Key Responsibilities
· Design, build, and maintain high-performance ETL/ELT data pipelines using Python and PySpark.
· Apply expert-level coding skills to develop and manage data processing jobs leveraging PySpark for distributed computing across large-scale datasets.
· Take full ownership of the data workflow, including ingesting data from multiple sources and scrubbing and validating it to ensure the highest quality.
· Write and optimize complex, performant SQL queries for data extraction, integrity checks, and performance tuning.
· Contribute to platform modernization by exploring and increasing the adoption of AI/ML, including using tools like Copilot and Claude for acceleration, and building models to fill data gaps or improve systems.
· Collaborate with data architects by proposing ideas and asking insightful questions, taking ownership as the expert on data, pipelines, and systems.
· Implement DevOps practices for the automated deployment and orchestration of Python applications and data pipelines (e.g., using Docker, Jenkins, Terraform).
· Apply hands-on SQL experience to complex performance tuning.
Required Technical Skills
· Programming: Expert-level proficiency in Python, including libraries like Pandas and NumPy.
· Design: Experience designing data pipelines for data arriving from multiple sources.
· Data Processing: Solid hands-on experience with PySpark for building scalable data workflows.
· Data Querying: Expert-level knowledge of writing complex SQL queries (Oracle or Snowflake), with proven ability to perform performance tuning on large datasets and complex database code.
· Cloud Platform: Strong experience with AWS cloud services and associated data services, specifically:
    · AWS Glue (ETL)
    · S3
    · Lambda
    · Redshift
    · DynamoDB, Athena, ECS, EventBridge, OpenSearch, RDS
· ETL & Data Management: Strong proficiency in ETL/ELT methodologies and tools, as well as Data Quality, Data Validation, and Anomaly Detection techniques.
· Scripting: Working experience with scripting and automation using Unix shell and Python.
Desired Skills & Professional Attributes
· Familiarity with AI/ML and Large Language Model (LLM) approaches to data analysis and validation.
· Knowledge of data warehousing concepts and data modeling techniques.
· Experience with DevOps, Continuous Integration, and Continuous Delivery (e.g., Jenkins, GitHub).
· Experience with BI Reporting tools such as Power BI or Tableau.
· Strong preference for candidates with prior experience in the investment data domain.
· Strong analytical and problem-solving skills, with the ability to work independently through complex data challenges.
Regards,
Dhananjay Yadav
Technical Recruiter
1 Point System LLC
115 Stone Village Drive,
Suite C, Fort Mill, SC 29708
Fax: 803-832-7973 | W: www.1pointsys.com