Data Scientist 3

Oracle
Bangalore
3-12 LPA
Posted 24 Oct 2025
FULL TIME
PyTorch
Python

Job Description

  • Design and develop large-scale datasets to power generative AI models in multimodal domains (e.g., text, vision, speech), with a focus on synthetic data creation.
  • Build robust pipelines and tooling for data acquisition, cleaning, transformation, and quality assurance to support model training and evaluation.
  • Research, implement, and adapt cutting-edge techniques (e.g., fine-tuning, RLHF, data augmentation) to align generative models with domain-specific needs.
  • Curate and annotate datasets, ensuring diversity, representativeness, and compliance with responsible AI practices.
  • Evaluate open-source and research models, integrating best practices into data generation workflows.
  • Collaborate with engineering teams to ensure datasets and synthetic data pipelines are scalable, reliable, and production-ready.
  • Develop metrics and benchmarking frameworks to assess data quality, model alignment, and downstream impact across modalities.
  • Partner cross-functionally with product, research, and infrastructure teams to drive innovation in data preparation and generative AI applications.

Qualifications and Skills:

  • Bachelor's or Master's in Computer Science, Data Science, AI/ML, or a related field with 3+ years of industry experience.
  • Proficiency in Python and a solid foundation in applied ML methods.
  • Proficiency with PyTorch, torchvision, OpenCV, and similar libraries, as well as experience building and deploying DNN models in production.
  • Experience building large-scale data pipelines for acquisition, cleaning, augmentation, and validation.
  • Ability to evaluate datasets for distribution, diversity, anomalies, and fairness to assess overall quality and suitability for generative AI.
  • Experience with Computer Vision, NLP, Transformers, Large Language Models, Generative AI, and optimizations for LLM training and serving. Experience with multimodal models is a bonus.
  • Proven track record of delivering scalable, data-centric ML solutions.
