The Artificial Intelligence Data Engineer II designs, develops, and manages scalable data pipelines and feature stores that enable AI/Machine Learning (ML) model training and deployment across the enterprise. This position collaborates with technical team members to automate data flows, integrate structured and unstructured data sources, and optimize performance for large-scale processing. The AI Data Engineer II also implements data quality validation, metadata management, and lineage tracking to ensure trusted data delivery for AI applications in compliance with healthcare regulations.
Job Responsibilities:
Design and implement scalable data pipelines for AI/ML workloads
Develop and deploy AI/ML solutions using Python, Snowpark, or cloud-native ML services
Build and manage feature stores to support model training and inference
Integrate structured and unstructured data sources from internal and external systems
Collaborate with data scientists to understand data requirements and optimize pipelines
Implement data quality checks, metadata tagging, and lineage tracking
Ensure compliance with Health Insurance Portability and Accountability Act (HIPAA), Centers for Medicare and Medicaid Services (CMS), and enterprise data governance standards
Automate data ingestion and transformation using tools like AWS Glue, Snowflake, and Informatica Data Management Cloud (IDMC)
Implement DevOps/MLOps and Continuous Integration (CI)/Continuous Delivery (CD) pipelines using GitHub Actions or similar tools
Monitor pipeline performance and troubleshoot issues in production environments
Contribute to backlog grooming and sprint planning for AI data initiatives
Perform other duties as assigned
Requirements:
Bachelor's Degree in Computer Science or Related Field
At least 5 years of experience in data engineering
At least 2 years of experience focused on AI/ML data pipelines
Hands-on experience with GenAI projects (chatbot implementations, Natural Language Processing (NLP), sentiment analysis, recommendation systems, anomaly detection, etc.)
Proficiency in Python, SQL, Spark, AWS (Glue, S3, Lambda), Snowflake (Snowpark Container Services), IDMC, prompt engineering, model inference and fine-tuning, RAG, MCP, and vector databases
Proficient technical and data engineering skills
Solid understanding of supervised and unsupervised machine learning methods, feature engineering, model evaluation, and validation techniques
Ability to operationalize models in production environments, including basic MLOps practices (version control, CI/CD, reproducibility)
Ability to communicate complex AI/ML concepts effectively to non-technical stakeholders
Excellent documentation skills, ensuring reproducibility, clarity of assumptions, and transparency of model design
Strong collaboration skills, with proven ability to work cross-functionally with key stakeholders
Analytical problem-solving skills with the ability to translate business challenges into actionable AI/ML solutions
Effective written and verbal communication skills, including documentation of modeling processes, assumptions, and results
Training in data pipeline development and cloud platforms
Nice to have:
Master's Degree in Data Science or Related Field
Experience in health plan payer systems and regulatory data handling
Experience with Fast Healthcare Interoperability Resources (FHIR), Health Level Seven (HL7), HIPAA compliance, and healthcare data standards