Provide front-line technical support: Diagnose and resolve production issues related to our generative AI applications, including performance bottlenecks, API errors, data inconsistencies, and infrastructure problems.
Monitor application health: Utilize monitoring tools and dashboards to track key metrics, identify anomalies, and proactively address potential issues before they impact users.
Incident management: Follow established incident management procedures to document, escalate, and resolve production incidents, ensuring timely communication with stakeholders.
Collaborate with engineering teams: Work closely with development and infrastructure teams to identify the root cause of issues, implement fixes, and prevent future occurrences.
Develop and maintain documentation: Create and update technical documentation, including runbooks, knowledge base articles, and troubleshooting guides.
Automate support tasks: Identify opportunities to automate repetitive tasks and improve support efficiency through scripting and tooling.
Participate in the on-call rotation: Provide on-call support on a rotational basis to ensure 24/7 coverage for critical applications.
Continuous improvement: Contribute to the continuous improvement of our support processes and tools by identifying areas for optimization and implementing best practices.
Requirements
Bachelor's degree in Computer Science, Engineering, or a related field.
8-13 years of experience in application production support or a related role.
Strong understanding of the software development lifecycle and DevOps principles.
Experience supporting cloud-based applications, preferably on AWS, Azure, or GCP.
Experience with change management, incident management, problem management, and stakeholder management.
Proficiency in at least one scripting language (e.g., Python, Bash).
Familiarity with monitoring tools (e.g., Datadog, Prometheus, Grafana).
Excellent problem-solving and analytical skills.
Strong communication and collaboration skills.
A passion for AI and machine learning.
Nice to have
Experience with generative AI models and algorithms (e.g., GANs, VAEs, Transformers).
Knowledge of MLOps principles and practices.
Experience with containerization technologies (e.g., Docker, Kubernetes).