Soliton is a high-technology software company working with global customers across Semiconductor, Medical, Automotive, Industry 4.0, and High-Tech domains. We are seeking an Applied AI Engineer to design, build, and deploy intelligent applications leveraging Generative AI, Large Language Models (LLMs), and modern AI engineering practices.
Job Responsibilities:
Design, implement, and optimize Generative AI applications using Python and frameworks such as FastAPI
Build AI solutions using LLM frameworks like LlamaIndex and LangChain
Implement containerized deployments using Docker
Develop and optimize Retrieval-Augmented Generation (RAG) pipelines for improved information retrieval
Work with self-hosted and cloud-based vector databases for efficient search and retrieval
Design and manage knowledge graphs and graph-based RAG systems
Implement re-ranking models and retrieval optimization techniques
Apply prompt engineering and context engineering to enhance model performance
Establish guardrails to ensure safe, ethical, and compliant AI deployments
Build data preprocessing and transformation pipelines for structured and unstructured data
Run inference on offline (locally hosted) LLMs such as Llama or Mistral via platforms like Ollama or Hugging Face
Integrate online LLM providers such as OpenAI, Anthropic, or Google Cloud (GCP) for real-time inference
Monitor AI workflows using observability tools like MLflow or Arize Phoenix
Evaluate model performance using frameworks such as TruLens or custom-built evaluation systems
Continuously improve AI systems based on evaluation insights, metrics, and user feedback
Requirements:
Bachelor’s or Master’s degree in Computer Science, Engineering, or a related field
Proven experience in AI/ML engineering and related technologies
3+ years of experience building applications using Python and asynchronous programming
Experience working with SQL and NoSQL databases
Strong problem-solving skills and ability to work in a fast-paced environment
Excellent communication and teamwork skills
Experience building Generative AI applications using Python and FastAPI
Hands-on knowledge of LLM frameworks such as LangChain or LlamaIndex
Ability to work with unstructured data (PDFs, documents, chunking, search) and structured data
Experience designing RAG-based systems, including prompt engineering and retrieval optimization
Familiarity with vector databases (Qdrant, Pinecone, Weaviate) and search solutions
Exposure to AI agents, workflows, and basic orchestration concepts
Experience using cloud platforms like Azure or AWS
Working knowledge of online and offline LLMs (OpenAI, Llama, Mistral)
Understanding of AI evaluation, monitoring, and observability concepts
Experience with Docker and CI/CD pipelines for deploying AI applications
Nice to have:
Experience with MCP clients and servers
Knowledge of multimodal LLMs for image and voice processing
Knowledge of deploying applications in cloud or on-prem infrastructure
Knowledge of fine-tuning techniques and data preparation for fine-tuning