We are seeking an experienced Big Data Engineer with strong expertise in Hadoop, Spark, and Google Cloud Platform (GCP). The ideal candidate will design, develop, and optimize large-scale data processing pipelines and analytical solutions in the cloud.
Responsibilities:
Design and implement data pipelines and ETL processes using Spark, Hadoop, and GCP services (BigQuery, Dataflow, Dataproc, Pub/Sub, Cloud Storage); a representative pipeline sketch follows this list.
Work with structured and unstructured data from multiple sources and perform data cleansing, transformation, and aggregation.
Collaborate with data scientists, analysts, and application teams to deliver scalable data solutions.
Optimize pipeline performance and ensure the reliability, availability, and scalability of data systems.
Implement data governance, quality, and security best practices.
Troubleshoot performance and data quality issues in distributed systems.
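For illustration, here is a minimal sketch of the kind of batch pipeline this role involves: a PySpark job on Dataproc that reads raw files from Cloud Storage, cleanses and aggregates them, and writes the result to BigQuery. All bucket, project, dataset, and column names are hypothetical placeholders, and the BigQuery write assumes the spark-bigquery connector is available on the cluster.

```python
# Minimal PySpark batch ETL sketch: read raw CSV from Cloud Storage,
# cleanse, aggregate, and write to BigQuery. Paths, table names, and
# columns are hypothetical placeholders.
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("orders-daily-etl").getOrCreate()

# Read structured source data from a GCS bucket (hypothetical path).
orders = spark.read.option("header", True).csv("gs://example-bucket/raw/orders/")

# Cleanse and transform: drop malformed rows, normalize types.
clean = (
    orders
    .dropna(subset=["order_id", "amount"])
    .withColumn("amount", F.col("amount").cast("double"))
    .withColumn("order_date", F.to_date("order_ts"))
)

# Aggregate to a daily revenue summary.
daily = clean.groupBy("order_date").agg(
    F.sum("amount").alias("revenue"),
    F.countDistinct("order_id").alias("orders"),
)

# Write to BigQuery via the spark-bigquery connector (bundled with or
# added to the Dataproc cluster).
(
    daily.write.format("bigquery")
    .option("table", "example_project.analytics.daily_revenue")
    .option("temporaryGcsBucket", "example-bucket-tmp")
    .mode("overwrite")
    .save()
)
```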
Requirements:
5+ years of experience in Big Data technologies.
Strong hands-on experience with Hadoop ecosystem (HDFS, Hive, Pig, Sqoop, Oozie).
Expertise in Apache Spark (Core, SQL, Streaming).
Strong experience with GCP data services: BigQuery, Dataflow, Dataproc, Composer, Cloud Storage, Pub/Sub.
Proficiency in Python/Scala/Java for data processing.
Good knowledge of SQL and data modeling concepts.
Familiarity with CI/CD, Git, and cloud deployment tools.
Experience with Airflow, Terraform, or Dataform.
Knowledge of Kafka or other real-time streaming frameworks; see the streaming sketch after this list.
Familiarity with Docker/Kubernetes.
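As a companion to the batch example above, here is a minimal sketch of the real-time work referenced in the streaming requirements: a Spark Structured Streaming job that consumes JSON events from Kafka and maintains windowed counts. The broker address, topic name, and event schema are hypothetical assumptions, and the job requires the spark-sql-kafka package on the classpath.

```python
# Minimal Spark Structured Streaming sketch: consume JSON events from
# Kafka and count them per 1-minute window. Broker, topic, and schema
# are hypothetical placeholders.
from pyspark.sql import SparkSession
from pyspark.sql import functions as F
from pyspark.sql.types import StructType, StructField, StringType, TimestampType

spark = SparkSession.builder.appName("clickstream-counts").getOrCreate()

# Expected shape of each event payload (hypothetical schema).
schema = StructType([
    StructField("user_id", StringType()),
    StructField("event_ts", TimestampType()),
])

# Subscribe to a Kafka topic (hypothetical broker and topic names).
raw = (
    spark.readStream.format("kafka")
    .option("kafka.bootstrap.servers", "broker:9092")
    .option("subscribe", "clickstream")
    .load()
)

# Parse the JSON payload, then count events per 1-minute event-time
# window, tolerating up to 5 minutes of late data.
events = (
    raw.select(F.from_json(F.col("value").cast("string"), schema).alias("e"))
    .select("e.*")
)
counts = (
    events
    .withWatermark("event_ts", "5 minutes")
    .groupBy(F.window("event_ts", "1 minute"))
    .count()
)

# Stream results to the console for demonstration; a production job
# would sink to BigQuery, Pub/Sub, or Cloud Storage instead.
query = counts.writeStream.outputMode("update").format("console").start()
query.awaitTermination()
```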