Collaborating with stakeholders on requirements gathering and design discovery, ensuring alignment between business needs and scalable data engineering solutions
Designing scalable, low-latency services to host models (a minimal serving sketch follows this list)
Productionizing prototypes on the cloud, including data pipelines, training/inference pipelines, and pre-/post-processing routines
Designing, implementing, and optimizing new data pipelines using Apache Spark and PySpark for large-scale distributed applications and AI systems (a PySpark sketch appears after this list)
Developing and optimizing data pipelines that collect, consolidate, and normalize data fed to AI models for offline evaluation and real-time execution
Building and orchestrating ETL workflows using Apache Airflow, ensuring reliable and timely data ingestion, transformation, and delivery (a minimal DAG sketch is included below)
Developing and optimizing scalable data pipelines for multiple tenants using Google Cloud Platform (GCP) services such as Dataproc and BigQuery, ensuring secure, isolated environments for each tenant
Migrating and integrating multiple data sources into a centralized architecture to improve the performance, consistency, and operational efficiency of the AI evaluation pipeline
Performing data validation and quality checks across pipelines using PySpark, conducting root cause analysis for mismatches, and implementing resolutions to maintain data trust (see the validation sketch below)
Working with structured and semi-structured data in databases such as BigQuery, Cosmos, and Microsoft SQL Server for data manipulation and querying
Creating monitoring dashboards, performing latency tuning of deep learning models, scaling solutions to the enterprise level, and investigating and resolving performance issues
Running experiments to compare models, features, and hyperparameters, and utilizing A/B testing and continuous monitoring to validate and adjust models (an A/B comparison sketch follows this list)
Writing and maintaining automation scripts in Node.js and Python to streamline repetitive tasks and data operations (a small automation sketch appears below)
Managing code repositories and version control using Git, performing merges, resolving conflicts, and maintaining clean deployment practices
Maintaining documentation and dashboards in Confluence to monitor pipeline health, performance, and data insights
Collaborating cross-functionally with engineering, analytics, and product teams to ensure data quality and meet evolving business needs
Participating in testing activities using manual and automated test scripts to validate data transformations and pipeline reliability
Supporting continuous integration and deployment (CI/CD) using Docker, Jenkins, and infrastructure-as-code tools
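To illustrate the model-hosting responsibility above, here is a minimal sketch of a low-latency serving endpoint. It assumes a FastAPI stack and a scikit-learn-style model artifact; the artifact path and feature schema are hypothetical, not taken from the posting.

```python
# Minimal model-serving sketch (assumed stack: FastAPI + a joblib-loaded model).
# The artifact path "model.joblib" and the feature schema are hypothetical.
from fastapi import FastAPI
from pydantic import BaseModel
import joblib

app = FastAPI()
model = joblib.load("model.joblib")  # loaded once at startup, not per request

class Features(BaseModel):
    values: list[float]

@app.post("/predict")
def predict(features: Features):
    # Keeping the handler free of I/O beyond inference is the main latency lever.
    prediction = model.predict([features.values])
    return {"prediction": prediction.tolist()}
```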
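The PySpark pipeline work described in the list might, in its simplest form, look like the sketch below: read raw events, normalize fields, deduplicate, and write a partitioned curated layer. The bucket paths and column names (event_time, event_id, user_id) are assumptions for illustration.

```python
# PySpark sketch: consolidate and normalize raw event data before it feeds an
# AI model. Paths and column names are illustrative assumptions.
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("event-normalization").getOrCreate()

raw = spark.read.json("gs://example-bucket/raw/events/")  # hypothetical source

normalized = (
    raw
    .withColumn("event_ts", F.to_timestamp("event_time"))
    .withColumn("event_date", F.to_date("event_ts"))
    .withColumn("user_id", F.lower(F.trim(F.col("user_id"))))
    .dropDuplicates(["event_id"])
    .filter(F.col("event_ts").isNotNull())
)

# Partitioning by date keeps downstream evaluation reads cheap.
normalized.write.mode("overwrite").partitionBy("event_date").parquet(
    "gs://example-bucket/curated/events/"
)
```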
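For the Airflow orchestration item, a minimal DAG wiring extract, transform, and load into a daily schedule could look like this. Airflow 2.4+ is assumed for the schedule parameter, and the task bodies are placeholders rather than real pipeline logic.

```python
# Minimal Airflow ETL sketch: three dependent tasks on a daily schedule.
# The DAG id, schedule, and task bodies are placeholder assumptions.
from datetime import datetime

from airflow import DAG
from airflow.operators.python import PythonOperator

def extract():
    print("pull raw data from the source system")

def transform():
    print("clean and normalize the extracted data")

def load():
    print("write the transformed data to the warehouse")

with DAG(
    dag_id="daily_etl",
    start_date=datetime(2024, 1, 1),
    schedule="@daily",  # use schedule_interval on Airflow versions before 2.4
    catchup=False,
) as dag:
    extract_task = PythonOperator(task_id="extract", python_callable=extract)
    transform_task = PythonOperator(task_id="transform", python_callable=transform)
    load_task = PythonOperator(task_id="load", python_callable=load)

    extract_task >> transform_task >> load_task
```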
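The validation-and-quality-checks item could be grounded in checks like the following: compare row counts between a source and target layer and flag null-rate drift on a key column. The paths, the column name, and the 1% tolerance are illustrative assumptions.

```python
# PySpark validation sketch: row-count reconciliation plus a null-rate check.
# Table paths, the key column, and the tolerance are assumptions.
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("pipeline-validation").getOrCreate()

source = spark.read.parquet("gs://example-bucket/curated/events/")  # hypothetical
target = spark.read.parquet("gs://example-bucket/serving/events/")  # hypothetical

src_count, tgt_count = source.count(), target.count()
if src_count != tgt_count:
    # A mismatch here triggers root-cause analysis rather than a silent reload.
    print(f"Row-count mismatch: source={src_count}, target={tgt_count}")

null_rate = (
    target.select(F.mean(F.col("user_id").isNull().cast("double")).alias("rate"))
    .first()["rate"]
)
if null_rate is not None and null_rate > 0.01:  # assumed 1% tolerance
    print(f"user_id null rate {null_rate:.2%} exceeds tolerance")
```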
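For the experimentation and A/B testing item, an offline comparison of two model variants might reduce to a two-proportion z-test over conversion counts, as sketched here; the counts are made-up example data, not results from the posting.

```python
# A/B comparison sketch: two-proportion z-test on conversion counts for two
# model variants. All numbers are hypothetical example data.
from statsmodels.stats.proportion import proportions_ztest

conversions = [412, 469]    # hypothetical successes for variants A and B
exposures = [10000, 10000]  # hypothetical trials per variant

stat, p_value = proportions_ztest(conversions, exposures)
print(f"z = {stat:.3f}, p = {p_value:.4f}")
if p_value < 0.05:
    print("Variants differ significantly; consider promoting the better model.")
```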
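Finally, the automation-scripts item might cover small housekeeping jobs like this Python sketch, which archives stale files from a landing directory; both directory paths and the seven-day cutoff are hypothetical.

```python
# Automation sketch: move CSV files older than seven days from a landing
# directory to an archive. Paths and the cutoff are assumptions.
import shutil
import time
from pathlib import Path

LANDING = Path("/data/landing")  # hypothetical
ARCHIVE = Path("/data/archive")  # hypothetical
CUTOFF = time.time() - 7 * 24 * 3600

ARCHIVE.mkdir(parents=True, exist_ok=True)
for path in LANDING.glob("*.csv"):
    if path.stat().st_mtime < CUTOFF:
        shutil.move(str(path), str(ARCHIVE / path.name))
```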