PySpark Module Lead

Sopra Steria
Noida | 4-6 LPA | Posted 22 May 2025
FULL TIME
Machine Learning
Spark
Deployment
Module Lead
Data Processing

Job Description

We are seeking a highly skilled and motivated Data Engineer to join our dynamic team. As a Data Engineer, you will collaborate closely with our Data Scientists to develop and deploy machine learning models. Proficiency in the skills listed below will be crucial for building and maintaining pipelines for training and inference datasets.

Responsibilities:

  • Work in tandem with Data Scientists to design, develop, and implement machine learning pipelines.
  • Utilize PySpark for data processing, transformation, and preparation for model training.
  • Leverage AWS EMR and S3 for scalable and efficient data storage and processing.
  • Implement and manage ETL workflows using StreamSets for data ingestion and transformation.
  • Design and construct pipelines to deliver high-quality training and inference datasets.
  • Collaborate with cross-functional teams to ensure smooth deployment and real-time/near real-time inferencing capabilities.
  • Optimize and fine-tune pipelines for performance, scalability, and reliability.
  • Ensure IAM policies and permissions are appropriately configured for secure data access and management.
  • Design Spark-based architectures and optimize Spark jobs for scalable data processing.