We are seeking a Software Engineer (Data Engineering) who combines the roles of a Data Engineer and a Data Scientist. The ideal candidate will design robust data pipelines, build AI/ML models, and deliver data-driven insights that address complex business challenges. This is a client-facing role requiring close collaboration with US-based stakeholders, so the candidate must be willing to work US hours when needed.
Job Responsibilities:
Data Engineering:
Design, build, and maintain scalable ETL and ELT pipelines for large-scale data processing
Develop and optimize data architectures supporting analytics and ML workflows
Ensure data integrity, security, and compliance with organizational and industry standards
Collaborate with DevOps teams to deploy and monitor data pipelines in production environments
AI/ML Development:
Build predictive and prescriptive models leveraging AI and ML techniques
Develop and deploy machine learning and deep learning models using TensorFlow, PyTorch, or scikit-learn (a minimal example follows this group)
Perform feature engineering, statistical analysis, and data preprocessing
Continuously monitor and optimize models for accuracy and scalability
Integrate AI-driven insights into business processes and strategies
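A minimal sketch of the modeling workflow described above, using scikit-learn; the CSV path, column names, and churn target are hypothetical placeholders rather than any client dataset.

    # Feature engineering + model training sketch (hypothetical columns).
    import pandas as pd
    from sklearn.compose import ColumnTransformer
    from sklearn.ensemble import GradientBoostingClassifier
    from sklearn.model_selection import train_test_split
    from sklearn.pipeline import Pipeline
    from sklearn.preprocessing import OneHotEncoder, StandardScaler

    df = pd.read_csv("customer_events.csv")  # hypothetical input file
    X, y = df.drop(columns=["churned"]), df["churned"]

    # Preprocessing: scale numeric features, one-hot encode categoricals.
    preprocess = ColumnTransformer([
        ("num", StandardScaler(), ["tenure_days", "monthly_spend"]),
        ("cat", OneHotEncoder(handle_unknown="ignore"), ["plan", "region"]),
    ])

    model = Pipeline([
        ("prep", preprocess),
        ("clf", GradientBoostingClassifier(random_state=42)),
    ])

    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.2, random_state=42)
    model.fit(X_train, y_train)
    print(f"holdout accuracy: {model.score(X_test, y_test):.3f}")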
Client Collaboration:
Serve as the technical liaison between NStarX and client teams
Participate in client discussions, requirements gathering, and design reviews
Provide status updates, insights, and recommendations directly to client stakeholders
Work flexibly across US time zones for real-time collaboration with customers
Data Lake Architecture:
Design layered data lake to data mart models (raw → processed → merged → aggregated)
Implement Hive-style partitioning (year/month/day) with retention and archival strategies (see the sketch below)
Define schema contracts, decision logic, and state machine handoffs
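A minimal PySpark sketch of a raw-to-processed write with Hive-style year/month/day partitioning; the s3://example-lake bucket and the event_time field are assumptions for illustration.

    from pyspark.sql import SparkSession
    from pyspark.sql import functions as F

    spark = SparkSession.builder.appName("raw-to-processed").getOrCreate()

    raw = spark.read.json("s3://example-lake/raw/events/")  # hypothetical feed

    processed = (
        raw.withColumn("event_ts", F.to_timestamp("event_time"))
           .withColumn("year", F.year("event_ts"))
           .withColumn("month", F.month("event_ts"))
           .withColumn("day", F.dayofmonth("event_ts"))
    )

    # Hive-style directories (.../year=2024/month=6/day=1/) enable partition
    # pruning downstream and make retention a matter of deleting old prefixes.
    (processed.write
        .mode("overwrite")
        .partitionBy("year", "month", "day")
        .parquet("s3://example-lake/processed/events/"))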
Spark Development:
Author robust PySpark or Scala jobs for parsing, flattening, merging, and aggregation
Tune performance using broadcast joins, partition pruning, and shuffle control
Implement atomic, overwrite-by-partition writes and idempotent operations (sketched below)
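A sketch of two of the techniques named above: broadcasting a small dimension table to avoid shuffling the large side of a join, and dynamic partition overwrite so a rerun replaces only the partitions it recomputes. Paths and columns are hypothetical.

    from pyspark.sql import SparkSession
    from pyspark.sql import functions as F

    spark = (SparkSession.builder
             .appName("merge-aggregate")
             # Overwrite only the partitions present in this batch, not the table.
             .config("spark.sql.sources.partitionOverwriteMode", "dynamic")
             .getOrCreate())

    events = spark.read.parquet("s3://example-lake/processed/events/")
    devices = spark.read.parquet("s3://example-lake/reference/devices/")  # small

    # Broadcast the small table so the join needs no shuffle of `events`.
    joined = events.join(F.broadcast(devices), "device_id")

    daily = joined.groupBy("year", "month", "day", "device_type").agg(
        F.count("*").alias("event_count"))

    (daily.write
        .mode("overwrite")
        .partitionBy("year", "month", "day")
        .parquet("s3://example-lake/aggregated/daily_device_counts/"))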
Redshift and SQL:
Perform idempotent DELETE, INSERT, or MERGE operations into Redshift (example below)
Maintain audit-friendly SQL with deterministic predicates and row-level metrics
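A sketch of an idempotent delete-then-insert load from a staging table, issued over Redshift's PostgreSQL-compatible protocol with psycopg2; the cluster endpoint, credentials, and table names are hypothetical.

    import psycopg2

    BATCH_DATE = "2024-06-01"  # the one batch this run owns

    conn = psycopg2.connect(
        host="example-cluster.abc123.us-east-1.redshift.amazonaws.com",
        port=5439, dbname="analytics", user="etl_user", password="...",
    )
    with conn:  # single transaction: commit on success, rollback on error
        with conn.cursor() as cur:
            # Deterministic predicate: a rerun deletes and reloads exactly
            # the same rows, so the load is idempotent.
            cur.execute(
                "DELETE FROM analytics.daily_device_counts WHERE event_date = %s",
                (BATCH_DATE,))
            cur.execute(
                """INSERT INTO analytics.daily_device_counts
                   SELECT event_date, device_type, event_count
                     FROM staging.daily_device_counts
                    WHERE event_date = %s""",
                (BATCH_DATE,))
            print("rows loaded:", cur.rowcount)  # row-level metric for audit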
ETL Automation and Data Quality:
Build scalable, automated ETL pipelines with idempotency and cost efficiency
Implement schema drift checks, duplicate prevention, and partition reconciliation (see the guard sketch below)
Monitor EMR or Kubernetes cluster lifecycle, right-sizing, and cost tracking
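A sketch of a pre-load guard for schema drift and duplicate keys in PySpark, assuming a hypothetical schema contract and primary-key columns.

    from pyspark.sql import SparkSession
    from pyspark.sql.types import (LongType, StringType, StructField,
                                   StructType, TimestampType)

    spark = SparkSession.builder.appName("drift-check").getOrCreate()

    # The agreed schema contract for this feed (hypothetical).
    EXPECTED = StructType([
        StructField("device_id", StringType()),
        StructField("event_ts", TimestampType()),
        StructField("event_count", LongType()),
    ])

    batch = spark.read.parquet(
        "s3://example-lake/processed/events/year=2024/month=6/day=1/")

    expected = {(f.name, f.dataType) for f in EXPECTED.fields}
    actual = {(f.name, f.dataType) for f in batch.schema.fields}
    missing, extra = expected - actual, actual - expected
    if missing or extra:
        raise ValueError(f"schema drift: missing={missing}, unexpected={extra}")

    # Duplicate prevention: fail loudly if the key contract is violated.
    dupes = batch.groupBy("device_id", "event_ts").count().filter("count > 1")
    if dupes.limit(1).count() > 0:
        raise ValueError("duplicate keys found in batch")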
S3 and Logging:
Build log and event pipelines into S3 using CloudWatch, Kinesis, or Firehose
Manage bucket layouts, lifecycle rules, and data catalog consistency (lifecycle example below)
Understand compression formats and Hive-style directory structures
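A sketch of retention and archival via S3 lifecycle rules with boto3, matching the layout above; the bucket, prefix, and day thresholds are hypothetical.

    import boto3

    s3 = boto3.client("s3")
    s3.put_bucket_lifecycle_configuration(
        Bucket="example-lake",
        LifecycleConfiguration={
            "Rules": [{
                "ID": "archive-raw-events",
                "Filter": {"Prefix": "raw/events/"},
                "Status": "Enabled",
                # Move raw data to Glacier after 90 days, delete after a year.
                "Transitions": [{"Days": 90, "StorageClass": "GLACIER"}],
                "Expiration": {"Days": 365},
            }]
        },
    )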
Orchestration:
Implement AWS Step Functions with Choice, Map, and Parallel states, retries, and backoff (definition sketch below)
Automate scheduling using EventBridge and deploy guardrail Lambdas
Parameterize pipelines for multiple environments and selective recomputation
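A sketch of a Step Functions definition combining a Choice branch, a Map fan-out over partitions for selective recomputation, and exponential-backoff retries, deployed with boto3. The ARNs, state names, and the $.mode / $.partitions inputs are hypothetical.

    import json
    import boto3

    RETRY = {
        "ErrorEquals": ["States.TaskFailed"],
        "IntervalSeconds": 30,
        "MaxAttempts": 3,
        "BackoffRate": 2.0,  # 30s, 60s, 120s between attempts
    }
    JOB_ARN = "arn:aws:lambda:us-east-1:123456789012:function:run-etl"  # hypothetical

    definition = {
        "StartAt": "CheckMode",
        "States": {
            # Choice: backfills fan out per partition; everything else runs once.
            "CheckMode": {
                "Type": "Choice",
                "Choices": [{"Variable": "$.mode", "StringEquals": "backfill",
                             "Next": "BackfillPartitions"}],
                "Default": "DailyRun",
            },
            "BackfillPartitions": {
                "Type": "Map",
                "ItemsPath": "$.partitions",  # list of partition keys to redo
                "Iterator": {
                    "StartAt": "RunPartition",
                    "States": {"RunPartition": {"Type": "Task",
                                                "Resource": JOB_ARN,
                                                "Retry": [RETRY],
                                                "End": True}},
                },
                "End": True,
            },
            "DailyRun": {"Type": "Task", "Resource": JOB_ARN,
                         "Retry": [RETRY], "End": True},
        },
    }

    boto3.client("stepfunctions").create_state_machine(
        name="etl-orchestrator",
        definition=json.dumps(definition),
        roleArn="arn:aws:iam::123456789012:role/etl-sfn-role",  # hypothetical
    )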
Requirements:
4+ years in Data Engineering and AI/ML roles
Bachelor’s or Master’s degree in Computer Science, Data Science, or a related field