IO
Job Description
Key responsibilities:
- Analyze, parse and clean large datasets of structured and unstructured data
- Collaborate closely with security analysts and lead developers to understand bottlenecks and to prioritize, correct and ingest cleaned data
- Develop data ingestion frameworks to handle high-volume and high-variety data, including structured and unstructured data recovered from 2,000+ data sources across the deep, dark and open web
- Implement data quality control measures to detect and handle data anomalies, duplicates, and inconsistencies
- Optimize data pipeline performance for scalability, reliability, and low latency
- Troubleshoot data pipeline issues and data quality problems
Your skills, experience, and qualifications:
- Strong command of Linux CLI tooling and the AWS CLI
- Deep understanding of common data formats such as CSV, JSON, SQL, and Excel files
- Deep understanding of file encoding and related issues
- Strong Python scripting skills to develop tooling
- Strong problem-solving skills, with the ability to troubleshoot complex data pipeline issues
- Excellent communication and collaboration skills, with the ability to work with data scientists, engineers, and senior stakeholders
- Degree in Computer Science or a related subject
- Excellent spoken and written English
- Positive, can-do approach to work, delivering on commitments
- Creative approach to problem solving, with the ability to look at existing situations and problems in novel ways and propose innovative solutions
