PySpark Module Lead

Sopra Steria

Noida ₹4-6 LPA Posted 22 May 2025

FULL TIME

Machine Learning

Spark

Deployment

Module Lead

Data Processing

Job Description

We are seeking a highly skilled and motivated Data Engineer to join our dynamic team. As a Data Engineer, you will collaborate closely with our Data Scientists to develop and deploy machine learning models. Proficiency in the skills listed below will be crucial for building and maintaining pipelines for training and inference datasets.

Responsibilities:

Work in tandem with Data Scientists to design, develop, and implement machine learning pipelines.
Utilize PySpark for data processing, transformation, and preparation for model training.
Leverage AWS EMR and S3 for scalable and efficient data storage and processing.
Implement and manage ETL workflows using StreamSets for data ingestion and transformation.
Design and construct pipelines to deliver high-quality training and inference datasets.
Collaborate with cross-functional teams to ensure smooth deployment and real-time/near real-time inferencing capabilities.
Optimize and fine-tune pipelines for performance, scalability, and reliability.
Ensure IAM policies and permissions are appropriately configured for secure data access and management.
Apply knowledge of Spark architecture to optimize Spark jobs for scalable data processing.

Required Skills

Machine Learning, Spark, Deployment, Module Lead, Data Processing, Management, AWS, Architecture