Job Description:

Build and scale data solutions that power smarter decisions.
In this role, you'll work at the intersection of software engineering and data engineering, using Python, PySpark, and ETL to transform raw, complex datasets into reliable, analytics-ready assets.
You'll collaborate closely with data engineers, analysts, and stakeholders to understand requirements, design efficient pipelines, and deliver high-quality outputs on time.
If you enjoy solving performance challenges, improving data quality, and creating maintainable code that runs in production, this is a great opportunity to grow your impact.
Expect a supportive, collaborative environment where ownership is encouraged, learning is continuous, and your contributions directly improve how teams access and trust data.

Key Responsibilities:

Data Pipeline Development
Develop and maintain scalable batch ETL pipelines using Python and PySpark for data ingestion, transformation, and loading.
Implement reusable transformation logic, ensuring pipelines are modular, testable, and easy to maintain.
Optimize Spark jobs for performance (partitioning, caching, joins, shuffles) and cost efficiency.
Data Quality & Reliability
Apply data validation checks, handle schema evolution, and ensure the accuracy and completeness of processed datasets.
Troubleshoot pipeline failures, analyze logs, and implement robust error handling and retry mechanisms.
Monitor job runs and support operational stability through alerts, runbooks, and timely incident resolution.
Collaboration & Delivery
Work with cross-functional teams to gather requirements, define data mappings, and deliver datasets aligned to business needs.
Participate in code reviews, follow engineering best practices, and contribute to continuous improvement of standards and tooling.
Document pipeline logic, dependencies, and operational procedures for smooth handovers and long-term maintainability.

Technology: Analytics Packages (Python, Big Data); Big Data Data Processing (PySpark); ETL

Bachelor's degree in Computer Science, Engineering, Information Systems, or a related field, or equivalent practical experience.
2-5 years of hands-on experience building data pipelines using Python and PySpark.
Strong understanding of ETL concepts, data transformations, and handling large-scale datasets.
Proficiency in writing clean, maintainable code and debugging production issues.
Working knowledge of data structures, algorithms, and software development best practices.