This is a great opportunity to be part of one of the fastest-growing infrastructure companies in history, an organization at the center of the hurricane created by the artificial intelligence revolution. VAST Data is the data platform company for the AI era. We are building the enterprise software infrastructure to capture, catalog, refine, enrich, and protect massive datasets and make them available for real-time data analysis and AI training and inference. Designed from the ground up to make AI simple to deploy and manage, VAST takes the cost and complexity out of deploying enterprise and AI infrastructure across data center, edge, and cloud.

We are seeking an experienced Solutions Data Engineer who possesses both technical depth and strong interpersonal skills to partner with internal and external teams on scalable, flexible, and cutting-edge solutions. Solutions Engineers collaborate with operations and business development to craft solutions that address customers' business problems.
Job Responsibilities:
Build distributed data pipelines using technologies like Kafka, Spark (batch and streaming), Python, Trino, Airflow, and S3-compatible data lakes (a sketch of such a pipeline follows this list)
Design, deploy, and troubleshoot hybrid cloud/on-prem environments using Terraform, Docker, Kubernetes, and CI/CD automation tools
Implement event-driven and serverless workflows
Create technical guides, architecture docs, and demo pipelines
Integrate data validation, observability tools, and governance directly into the pipeline lifecycle
Own the end-to-end platform lifecycle: ingestion → transformation → storage (Parquet/ORC on S3) → compute layer (Trino/Spark)
Benchmark and tune storage backends (S3/NFS/SMB) and compute layers for throughput, latency, and scalability using production datasets
Work cross-functionally with R&D to push performance limits across interactive, streaming, and ML-ready analytics workloads
Operate and debug object store–backed data lake infrastructure
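As a concrete illustration of the pipeline work above, here is a minimal PySpark Structured Streaming sketch that consumes JSON events from Kafka and lands them as Parquet on an S3-compatible data lake. The broker address, topic name, bucket paths, and event schema are hypothetical placeholders, not details from this posting:

```python
# Minimal sketch: Kafka -> parse JSON -> Parquet on an S3-compatible store.
# Requires the spark-sql-kafka connector package on the Spark classpath.
from pyspark.sql import SparkSession
from pyspark.sql.functions import from_json, col
from pyspark.sql.types import StructType, StructField, StringType, TimestampType

spark = SparkSession.builder.appName("events-to-lake").getOrCreate()

# Hypothetical shape of each Kafka message value.
schema = StructType([
    StructField("user_id", StringType()),
    StructField("action", StringType()),
    StructField("ts", TimestampType()),
])

raw = (
    spark.readStream
    .format("kafka")
    .option("kafka.bootstrap.servers", "broker:9092")  # placeholder broker
    .option("subscribe", "events")                     # placeholder topic
    .load()
)

parsed = (
    raw.selectExpr("CAST(value AS STRING) AS json")
    .select(from_json(col("json"), schema).alias("event"))
    .select("event.*")
)

# Write micro-batches as Parquet; the checkpoint makes the stream restartable.
query = (
    parsed.writeStream
    .format("parquet")
    .option("path", "s3a://data-lake/events/")              # placeholder bucket
    .option("checkpointLocation", "s3a://data-lake/_chk/events/")
    .trigger(processingTime="1 minute")
    .start()
)
query.awaitTermination()
```

In a production setting, Airflow would typically orchestrate the batch counterparts of this job, and validation and observability hooks would be wired into each stage, as the responsibilities above describe.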
Requirements:
2–4 years in software, solutions, or infrastructure engineering
2–4 years focused on building and maintaining large-scale data pipelines, storage, and database solutions
Proficiency in Trino and Spark (Structured Streaming and batch), plus solid working knowledge of Apache Kafka
Coding background in Python (must-have)
Deep understanding of data storage architectures, including SQL, NoSQL, and HDFS
Solid grasp of DevOps practices, including containerization (Docker), orchestration (Kubernetes), and infrastructure provisioning (Terraform)
Experience with distributed systems, stream processing, and event-driven architecture
Hands-on familiarity with benchmarking and performance profiling for storage systems, databases, and analytics engines (a minimal example follows this list)
Excellent communication skills
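To illustrate the benchmarking requirement, here is a minimal Python sketch that estimates read throughput from an S3-compatible store by timing repeated GETs of one object with boto3. The bucket, key, and run count are hypothetical placeholders:

```python
# Minimal sketch: time repeated GETs of one object to estimate read throughput.
import time
import boto3

BUCKET = "bench-bucket"        # placeholder bucket
KEY = "data/sample.parquet"    # placeholder object key
RUNS = 10

# Pass endpoint_url=... here for a non-AWS S3-compatible store.
s3 = boto3.client("s3")

total_bytes = 0
start = time.perf_counter()
for _ in range(RUNS):
    body = s3.get_object(Bucket=BUCKET, Key=KEY)["Body"].read()
    total_bytes += len(body)
elapsed = time.perf_counter() - start

print(f"{RUNS} GETs, {total_bytes / 1e6:.1f} MB in {elapsed:.2f}s "
      f"-> {total_bytes / elapsed / 1e6:.1f} MB/s")
```

A real benchmark would add concurrency, varied object sizes, and latency percentiles; this shows only the basic measurement loop.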
Nice to have:
Familiarity with Bash and scripting tools