Job Description
• Design and develop data ingestion pipelines using Apache NiFi for structured, semi-structured, and unstructured data sources.
• Build and maintain real-time and batch processing workflows to ingest data into AWS-based data lakes or warehouses.
• Configure and optimize NiFi processors, templates, and flows for scalability and reliability.
• Implement monitoring, alerting, and recovery strategies for NiFi flows and downstream processes.
• Integrate NiFi with AWS services such as S3, Lambda, Redshift, Glue, Kinesis, and RDS.
• Ensure data quality, integrity, and compliance across pipelines.
• Work closely with data analysts, architects, and business teams to define data requirements and delivery models.
• Document workflows, metadata, and lineage using tools like Apache Atlas or custom catalogs.
Required Skills & Qualifications
• 6-7 years of experience in data engineering or data integration roles.
• Strong hands-on experience with Apache NiFi including custom processor development and flow orchestration.
• Solid experience with AWS services, especially S3, Lambda, Glue, Kinesis, Redshift, and RDS.
• Strong understanding of data ingestion patterns (CDC, batch, streaming).
• Proficiency in SQL, Python, or Java for data transformations or scripting.
• Experience in handling large datasets and performance optimization in data workflows.
Good to Have
• Exposure to Apache Kafka, Airflow, or Step Functions.
• Experience with data lake architectures and tools like Delta Lake, Lake Formation, or Athena.
• Knowledge of data governance and cataloging tools (AWS Glue Data Catalog, Apache Atlas).
• Experience working in Agile teams with tools like Jira and Confluence.
