
Software Engineer (Data Engineering)

India, Hyderabad · Job Posted December 26, 2025

Job Description

We are seeking a Software Engineer (Data Engineering) who can seamlessly integrate the roles of a Data Engineer and Data Scientist. The ideal candidate will design robust data pipelines, build AI/ML models, and deliver data-driven insights that address complex business challenges. This is a client-facing role requiring close collaboration with US-based stakeholders, and the candidate must be flexible to work in alignment with US time zones when needed.

Job Responsibility

  • Design, build, and maintain scalable ETL and ELT pipelines for large-scale data processing
  • Develop and optimize data architectures supporting analytics and ML workflows
  • Ensure data integrity, security, and compliance with organizational and industry standards
  • Collaborate with DevOps teams to deploy and monitor data pipelines in production environments
  • Build predictive and prescriptive models leveraging AI and ML techniques
  • Develop and deploy machine learning and deep learning models using TensorFlow, PyTorch, or Scikit-learn
  • Perform feature engineering, statistical analysis, and data preprocessing
  • Continuously monitor and optimize models for accuracy and scalability
  • Integrate AI-driven insights into business processes and strategies
  • Serve as the technical liaison between NStarX and client teams
  • Participate in client discussions, requirement gathering, and design reviews
  • Provide status updates, insights, and recommendations directly to client stakeholders
  • Work flexibly with customers based on US time zones for real-time collaboration
  • Design layered data lake to data mart models (raw → processed → merged → aggregated)
  • Implement hive-style partitioning (year/month/day) with retention and archival strategies
  • Define schema contracts, decision logic, and state machine handoffs
  • Author robust PySpark or Scala jobs for parsing, flattening, merging, and aggregation
  • Tune performance using broadcast joins, partition pruning, and shuffle control
  • Implement atomic, overwrite-by-partition writes and idempotent operations
  • Perform idempotent DELETE, INSERT, or MERGE operations into Redshift
  • Maintain audit-friendly SQL with deterministic predicates and row-level metrics
  • Build scalable, automated ETL pipelines with idempotency and cost efficiency
  • Implement schema drift checks, duplicate prevention, and partition reconciliation
  • Monitor EMR or Kubernetes lifecycle, cluster right-sizing, and cost tracking
  • Build log and event pipelines into S3 using CloudWatch, Kinesis, or Firehose
  • Manage bucket layouts, lifecycle rules, and data catalog consistency
  • Understand compression formats and Hive-style directory structures
  • Implement AWS Step Functions with Choice, Map, Parallel states, retries, and backoff
  • Automate scheduling using EventBridge and deploy guardrail Lambdas
  • Parameterize pipelines for multiple environments and selective recomputations
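As an illustration of two items in the list above, Hive-style partitioning (year/month/day) and idempotent overwrite-by-partition writes, here is a minimal sketch. It uses the local filesystem as a stand-in for S3, and the names `partition_path`, `overwrite_partition`, and `part-00000.txt` are hypothetical, chosen for the example rather than taken from the posting. The swap is remove-then-rename, so it is repeatable but not strictly atomic; a production job would likely rely on the engine's own partition overwrite instead.

```python
import os
import shutil
import tempfile
from datetime import date
from pathlib import Path


def partition_path(root: Path, day: date) -> Path:
    """Return a Hive-style partition directory: root/year=YYYY/month=MM/day=DD."""
    return root / f"year={day.year:04d}" / f"month={day.month:02d}" / f"day={day.day:02d}"


def overwrite_partition(root: Path, day: date, rows: list[str]) -> Path:
    """Replace the contents of one partition; rerunning with the same rows gives the same result."""
    target = partition_path(root, day)
    target.parent.mkdir(parents=True, exist_ok=True)
    # Stage the new data beside the target so the final rename stays on one filesystem.
    staging = Path(tempfile.mkdtemp(prefix=".staging-", dir=target.parent))
    (staging / "part-00000.txt").write_text("\n".join(rows) + "\n")
    # Drop the old partition only after staging has succeeded.
    # Assumption: a brief gap between remove and rename is acceptable in this sketch.
    if target.exists():
        shutil.rmtree(target)
    os.rename(staging, target)
    return target
```

Calling `overwrite_partition(root, date(2025, 12, 26), rows)` twice with the same rows leaves a single `day=26` directory containing one file, which is the property that makes reruns and selective recomputation safe.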

Requirements

  • 4+ years in Data Engineering and AI/ML roles
  • Bachelor’s or Master’s degree in Computer Science, Data Science, or a related field
  • Python, SQL, Bash, PySpark, Spark SQL, boto3, pandas
  • Apache Spark on EMR (driver/executor model, sizing, dynamic allocation)
  • Amazon S3 (Parquet) with lifecycle management to Glacier
  • AWS Glue Catalog and Crawlers
  • AWS Step Functions, AWS Lambda, Amazon EventBridge
  • CloudWatch Logs and Metrics, Kinesis Data Firehose (or Kafka/MSK)
  • Amazon Redshift and Redshift Spectrum
  • IAM (least privilege), Secrets Manager, SSM
  • Git with CI pipelines (Jenkins, GitHub, GitLab), CloudWatch monitoring
  • Strong analytical and problem-solving capabilities
  • Excellent communication for client engagement and stakeholder presentations
  • Ability to work flexibly with global and US-based teams
  • Team-oriented, proactive, and adaptable in fast-paced environments

Nice to have

  • Scala, Docker, Kubernetes (Spark-on-Kubernetes), k9s
  • Fast data stores such as DynamoDB, MongoDB, or Redis
  • Databricks and Jupyter notebooks
  • FinOps exposure including cost baselines and dashboards
  • Experience with MLOps and end-to-end AI/ML deployment pipelines
  • Knowledge of NLP and Computer Vision
  • Certifications in AI/ML, AWS, Azure, or GCP

What we offer

  • Competitive salary and performance-based incentives
  • Opportunity to work on cutting-edge AI and ML projects
  • Exposure to global clients and international project delivery
  • Continuous learning and professional development opportunities
  • Competitive base + commission
  • Fast growth into leadership roles


Similar Jobs for Software Engineer (Data Engineering)

8 matching positions

Software Engineer / Senior Software Engineer - Data Engineering GitHub

As a Software Engineer at GitHub, you will enhance the collaboration experience ...
Location: Czech Republic, Multiple Locations
Salary: Not provided
Microsoft Corporation
Expiration Date: Until further notice
Requirements
  • Bachelor's Degree in Computer Science or related technical field AND experience in Data Engineering and coding in languages including, but not limited to, C, C++, C#, Java, JavaScript, Go, Ruby, Rust, or Python OR equivalent experience.
  • Master's Degree in Computer Science or related technical field AND engineering experience with coding in languages including, but not limited to, C, C++, C#, Java, JavaScript, Go, Ruby, Rust or Python OR Bachelor's Degree in Computer Science or related technical field AND engineering experience with coding in languages including, but not limited to, C, C++, C#, Java, JavaScript, Go, Ruby, Rust or Python OR equivalent experience.
Job Responsibility
  • Design, develop, test and ship high-quality technical solutions that scale across multiple GitHub services.
  • Collaborate with cross-functional teams to define and implement innovative solutions.
  • Provide technical leadership, mentorship, pairing opportunities, and code reviews to encourage the growth of others.
  • Own and advocate for the health and quality of the systems that the team builds, including participating in on-call and first responder rotations
  • Write architecture briefs and proposals, carry out code experiments, and build prototypes to learn how we can achieve planetary scale with our systems.
  • Design and implement APIs to facilitate seamless integration between software components.
  • Utilize CI/CD tools to set up automated pipelines for continuous integration and delivery.
  • Become intimately familiar with the systems you build and take pride in writing maintainable code.
  • Fulltime

Software engineer 2 / Senior Software engineer - Azure Data

Microsoft's Azure Data engineering team is leading the transformation of analyti...
Location: India, Bangalore
Salary: Not provided
Microsoft Corporation
Expiration Date: Until further notice
Requirements
  • Bachelor's Degree in Computer Science or related technical field AND 3+ years technical engineering experience with coding in languages including, but not limited to, C, C++, C#, Java, JavaScript, or Python OR equivalent experience
  • Experience with the Azure stack including Storage, Compute, Networking, Fabric, Purview, Synapse, AKS, DevOps, Data Factory, or Power BI
  • Experience with big data technologies such as Spark, Kafka, Hadoop, or HBase
  • Experience building data lake or data engineering products, tools, or pipelines
  • Familiarity with container-based architectures (Docker, Kubernetes)
  • Ability to debug complex distributed systems on Linux and/or Windows platforms
Job Responsibility
  • Write extensible, maintainable code in C#, Java, Scala, or Python for Fabric Materialized Lake View services and HDInsight components
  • Use AI tools and coding best practices across the development lifecycle
  • Design data refresh, scheduling, and query optimisation features with minimal supervision
  • Review code from teammates for correctness, test coverage, security risks, and adherence to team standards
  • Coach junior engineers through code reviews
  • Debug complex issues in distributed systems running on Azure, Linux, and Windows
  • Run live site operations on a rotational, on-call basis
  • Integrate logging and instrumentation to gather telemetry on system health, performance, reliability, and security
  • Work with product managers, technical leads, and partners across geographies to define customer requirements for Materialized Lake View features
  • Fulltime

Sr. Software Engineer (Data Engineering)

Location: Canada, Toronto
Salary: 173000.00 - 192000.00 CAD / Year
Uber
Expiration Date: Until further notice
Requirements
  • 5+ years of full-time engineering experience
  • Experience working with multiple multi-functional teams (product, science, product ops etc)
  • Proficient in Big Data architecture, ETL frameworks and platforms
  • Expertise in one or more object-oriented programming languages (e.g. Python, Go, Java, C++) and the eagerness to learn more
  • Expert in data-driven architecture and systems design
  • BS/MS/PhD in Computer Science or related field required
Job Responsibility
  • Work on creating a platform that powers data-driven decision making for Uber's Rides and Eats lines of business
  • Be a technical lead for a team that works closely with the science team to implement and productionize statistical models
  • Design, develop, and deploy new systems to empower fast data-driven decisions
  • Build distributed backend systems serving real-time analytics and machine learning features at Uber scale
  • Work with the product and science teams to build and drive the technical roadmap and vision for the team
What we offer
  • Eligible to participate in Uber's bonus program
  • May be offered an equity award and other types of compensation
  • Eligible for various benefits
  • Fulltime

Principal Software Engineer - Data Engineering

As a Principal Software Engineer at GitHub, you will enhance the collaboration e...
Location: Czech Republic, Multiple Locations
Salary: Not provided
Microsoft Corporation
Expiration Date: Until further notice
Requirements
  • Bachelor's Degree in Computer Science or related technical discipline with proven experience maintaining and delivering production software coding in languages including, but not limited to, Go, Ruby, Rust, Python, JavaScript, C, C++, C#, Java
  • Experience with designing a data strategy and leading the development of its core components by building and optimizing scalable data pipelines, integrations, and robust data models that solve complex business challenges
  • Experience working closely with product management, design, and other engineering teams to drive cross-functional projects and deliver high-quality products
Job Responsibility
  • Design, develop, test and ship high-quality technical solutions that scale across multiple GitHub services and become intimately familiar with the systems you build and take pride in writing maintainable code
  • Provide technical leadership, mentorship, pairing opportunities, and code reviews to encourage the growth of others
  • Support teams in producing extensible and maintainable code, ensuring integration with downstream dependencies and adherence to quality standards
  • Own and advocate for the health and quality of the systems that the team builds, including participating in on-call for first responder rotations and live incidents
  • Write architecture briefs and proposals and carry out code experiments
  • Design and implement APIs to facilitate seamless integration between software components
  • Utilize CI/CD tools to set up automated pipelines for continuous integration and delivery
  • Collaborate with cross-functional teams and partner with stakeholders and lead discussions for technical solutions, including design and cost considerations
  • Create and guide others in (1) developing clear testing plans to assure solution quality, reliability, and performance and (2) defining success metrics
  • Fulltime

Software Engineer - Data Engineering

At Catawiki, data sits at the core of our decision-making, powering everything f...
Location: Netherlands, Amsterdam
Salary: Not provided
Catawiki
Expiration Date: Until further notice
Requirements
  • 3+ years of hands-on experience building and operating data systems in production
  • Fluent in Python and SQL
  • Experience with data integration tools such as Fivetran and/or Airbyte
  • Experience with CI/CD, Infrastructure as Code (e.g. Terraform), and modern DataOps practices
  • Experience with cloud platforms (GCP is a plus)
  • Familiar with parts of the data stack such as BigQuery, PubSub, DataFlow, GKE, Airflow, Airbyte, FastAPI and Prometheus
  • Experience with streaming pipelines using technologies like Kafka, Pub/Sub, Dataflow, or Apache Beam
  • Keen to learn new tools, support data platform and machine learning engineering initiatives
  • Understand the importance of data privacy and GDPR
Job Responsibility
  • Build and Scale Data Pipelines: Maintain and develop reliable batch and streaming pipelines that ingest data from internal systems and third-party sources into Catawiki’s data warehouse
  • Empower Data Science and AI: Maintain and enhance the tools and platforms used by Data Scientists for analysis, experimentation, model training, and model deployment
  • Protect Data and Privacy: Ensure data is stored securely and that governance, access control, and privacy standards are consistently applied across the data platform
  • Run and Evolve the Data Platform: Maintain the infrastructure that hosts our data tools and applications, keeping it scalable, stable, and cost-effective
  • Own Core Data Tooling: Self-host and operate key data engineering tools such as Airflow and Airbyte on Kubernetes
  • Keep the Lights On: Provide operational support to ensure pipelines, platforms, and tools run smoothly and reliably for teams across the business
What we offer
  • €100 Catavoucher when you join
  • €50 Catavoucher on your birthday
  • An extra day off each year to “Pursue Your Passion”
  • Additional leave for key work anniversaries and important life events
  • Fulltime

Staff Software Engineer, Data Engineering

You'll own Gamma's data infrastructure and architecture as we scale to hundreds ...
Location: United States, San Francisco
Salary: 230000.00 - 310000.00 USD / Year
Gamma
Expiration Date: Until further notice
Requirements
  • 10+ years of experience as a data engineer or software engineer working on data infrastructure with deep expertise in distributed systems
  • Expert-level knowledge of event streaming platforms, especially Apache Kafka (producers, consumers, Kafka Connect, stream processing)
  • Extensive hands-on experience with Snowflake, including performance optimization, cost management, and data modeling at massive scale
  • Strong understanding of relational databases (particularly Postgres) and experience with CDC patterns and replication strategies in distributed environments
  • Proven track record architecting and leading major data infrastructure initiatives that handled orders of magnitude growth
  • Experience establishing data engineering best practices and driving technical strategy across organizations
  • Strong communication skills and experience influencing technical direction across engineering, analytics, and leadership
Job Responsibility
  • Own and evolve our end-to-end event pipeline architecture, from Kafka ingestion through Snowflake analytics, setting technical direction for data infrastructure
  • Design and architect distributed data systems that scale to orders of magnitude more data volume while maintaining world-class query performance
  • Lead initiatives to build and optimize CDC (change data capture) pipelines and streaming data transformations at massive scale
  • Establish best practices for data quality, pipeline reliability, and system observability across the organization
  • Drive strategic technical decisions about data modeling, infrastructure architecture, and technology choices
  • Mentor engineers and elevate data engineering practices across analytics, product, and engineering teams
What we offer
  • Competitive equity
  • Fulltime

Senior Software Engineer, Data Engineering

Join us in building the future of finance. Our mission is to democratize finance...
Location: United States, Menlo Park
Salary: 196000.00 - 230000.00 USD / Year
Robinhood
Expiration Date: Until further notice
Requirements
  • 5+ years of professional experience building end-to-end data pipelines
  • Hands-on software engineering experience, with the ability to write production-level code in Python for user-facing applications, services, or systems (not just data scripting or automation)
  • Expert at building and maintaining large-scale data pipelines using open source frameworks (Spark, Flink, etc)
  • Strong SQL (Presto, Spark SQL, etc) skills
  • Experience solving problems across the data stack (Data Infrastructure, Analytics and Visualization platforms)
  • Expert collaborator with the ability to democratize data through actionable insights and solutions
Job Responsibility
  • Help define and build key datasets across all Robinhood product areas. Lead the evolution of these datasets as use cases grow
  • Build scalable data pipelines using Python, Spark and Airflow to move data from different applications into our data lake
  • Partner with upstream engineering teams to enhance data generation patterns
  • Partner with data consumers across Robinhood to understand consumption patterns and design intuitive data models
  • Ideate and contribute to shared data engineering tooling and standards
  • Define and promote data engineering best practices across the company
What we offer
  • Market competitive and pay equity-focused compensation structure
  • 100% paid health insurance for employees with 90% coverage for dependents
  • Annual lifestyle wallet for personal wellness, learning and development, and more
  • Lifetime maximum benefit for family forming and fertility benefits
  • Dedicated mental health support for employees and eligible dependents
  • Generous time away including company holidays, paid time off, sick time, parental leave, and more
  • Lively office environment with catered meals, fully stocked kitchens, and geo-specific commuter benefits
  • Fulltime

Sr Software Engineer - Data Engineering

As an Engineer on the Data Intelligence team, you will be dealing with large sca...
Location: India, Bangalore
Salary: Not provided
Uber
Expiration Date: Until further notice
Requirements
  • 7+ years of extensive Data engineering experience working with large data volumes and different sources of data
  • Strong data modeling skills, domain knowledge and domain mapping experience
  • Strong experience using SQL and writing complex queries
  • Experience with other programming languages such as Java, Scala, or Python
  • Good problem solving and analytical skills
  • Good communication, mentoring and collaboration skills
Job Responsibility
  • Responsible for defining the Source of Truth (SOT) and dataset design for multiple Uber teams
  • Identify unified data models collaborating with Data Science teams
  • Streamline data processing of the original event sources and consolidate them in source of truth event logs
  • Build and maintain real-time/batch data pipelines that can consolidate and clean up usage analytics
  • Build systems that monitor data losses from the different sources and improve the data quality
  • Own the data quality and reliability of the Tier-1 & Tier-2 datasets, including maintaining their SLAs, TTL, and consumption
  • Devise strategies to consolidate and compensate the data losses by correlating different sources
  • Solve challenging data problems with cutting edge design and algorithms