We are seeking a Software Engineer (Data Engineering) who combines the roles of a Data Engineer and a Data Scientist. The ideal candidate will design robust data pipelines, build AI/ML models, and deliver data-driven insights that address complex business challenges. This is a client-facing role requiring close collaboration with US-based stakeholders, so the candidate must be willing to work US hours when needed.
Job Responsibilities:
Data Engineering:
Design, build, and maintain scalable ETL and ELT pipelines for large-scale data processing
Develop and optimize data architectures supporting analytics and ML workflows
Ensure data integrity, security, and compliance with organizational and industry standards
Collaborate with DevOps teams to deploy and monitor data pipelines in production environments
AI/ML Development:
Build predictive and prescriptive models leveraging AI and ML techniques
Develop and deploy machine learning and deep learning models using TensorFlow, PyTorch, or scikit-learn (a minimal example follows this group)
Perform feature engineering, statistical analysis, and data preprocessing
Continuously monitor and optimize models for accuracy and scalability
Integrate AI-driven insights into business processes and strategies
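A minimal sketch of the modeling workflow described above, using scikit-learn; the CSV path, column names, and churn target are hypothetical placeholders rather than any client dataset.

    # Feature engineering + model training sketch (hypothetical columns).
    import pandas as pd
    from sklearn.compose import ColumnTransformer
    from sklearn.ensemble import GradientBoostingClassifier
    from sklearn.model_selection import train_test_split
    from sklearn.pipeline import Pipeline
    from sklearn.preprocessing import OneHotEncoder, StandardScaler

    df = pd.read_csv("customer_events.csv")  # hypothetical input file
    X, y = df.drop(columns=["churned"]), df["churned"]

    # Preprocessing: scale numeric features, one-hot encode categoricals.
    preprocess = ColumnTransformer([
        ("num", StandardScaler(), ["tenure_days", "monthly_spend"]),
        ("cat", OneHotEncoder(handle_unknown="ignore"), ["plan", "region"]),
    ])

    model = Pipeline([
        ("prep", preprocess),
        ("clf", GradientBoostingClassifier(random_state=42)),
    ])

    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.2, random_state=42)
    model.fit(X_train, y_train)
    print(f"holdout accuracy: {model.score(X_test, y_test):.3f}")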
Client Collaboration:
Serve as the technical liaison between NStarX and client teams
Participate in client discussions, requirements gathering, and design reviews
Provide status updates, insights, and recommendations directly to client stakeholders
Work flexibly across US time zones for real-time collaboration with customers
Data Lake Architecture:
Design layered data lake to data mart models (raw → processed → merged → aggregated)
Implement Hive-style partitioning (year/month/day) with retention and archival strategies (see the sketch below)
Define schema contracts, decision logic, and state machine handoffs
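A minimal PySpark sketch of a raw-to-processed write with Hive-style year/month/day partitioning; the s3://example-lake bucket and the event_time field are assumptions for illustration.

    from pyspark.sql import SparkSession
    from pyspark.sql import functions as F

    spark = SparkSession.builder.appName("raw-to-processed").getOrCreate()

    raw = spark.read.json("s3://example-lake/raw/events/")  # hypothetical feed

    processed = (
        raw.withColumn("event_ts", F.to_timestamp("event_time"))
           .withColumn("year", F.year("event_ts"))
           .withColumn("month", F.month("event_ts"))
           .withColumn("day", F.dayofmonth("event_ts"))
    )

    # Hive-style directories (.../year=2024/month=6/day=1/) enable partition
    # pruning downstream and make retention a matter of deleting old prefixes.
    (processed.write
        .mode("overwrite")
        .partitionBy("year", "month", "day")
        .parquet("s3://example-lake/processed/events/"))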
Spark Development:
Author robust PySpark or Scala jobs for parsing, flattening, merging, and aggregation
Tune performance using broadcast joins, partition pruning, and shuffle control
Implement atomic, overwrite-by-partition writes and idempotent operations (sketched below)
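A sketch of two of the techniques named above: broadcasting a small dimension table to avoid shuffling the large side of a join, and dynamic partition overwrite so a rerun replaces only the partitions it recomputes. Paths and columns are hypothetical.

    from pyspark.sql import SparkSession
    from pyspark.sql import functions as F

    spark = (SparkSession.builder
             .appName("merge-aggregate")
             # Overwrite only the partitions present in this batch, not the table.
             .config("spark.sql.sources.partitionOverwriteMode", "dynamic")
             .getOrCreate())

    events = spark.read.parquet("s3://example-lake/processed/events/")
    devices = spark.read.parquet("s3://example-lake/reference/devices/")  # small

    # Broadcast the small table so the join needs no shuffle of `events`.
    joined = events.join(F.broadcast(devices), "device_id")

    daily = joined.groupBy("year", "month", "day", "device_type").agg(
        F.count("*").alias("event_count"))

    (daily.write
        .mode("overwrite")
        .partitionBy("year", "month", "day")
        .parquet("s3://example-lake/aggregated/daily_device_counts/"))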
Redshift and SQL:
Perform idempotent DELETE, INSERT, or MERGE operations into Redshift (example below)
Maintain audit-friendly SQL with deterministic predicates and row-level metrics
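A sketch of an idempotent delete-then-insert load from a staging table, issued over Redshift's PostgreSQL-compatible protocol with psycopg2; the cluster endpoint, credentials, and table names are hypothetical.

    import psycopg2

    BATCH_DATE = "2024-06-01"  # the one batch this run owns

    conn = psycopg2.connect(
        host="example-cluster.abc123.us-east-1.redshift.amazonaws.com",
        port=5439, dbname="analytics", user="etl_user", password="...",
    )
    with conn:  # single transaction: commit on success, rollback on error
        with conn.cursor() as cur:
            # Deterministic predicate: a rerun deletes and reloads exactly
            # the same rows, so the load is idempotent.
            cur.execute(
                "DELETE FROM analytics.daily_device_counts WHERE event_date = %s",
                (BATCH_DATE,))
            cur.execute(
                """INSERT INTO analytics.daily_device_counts
                   SELECT event_date, device_type, event_count
                     FROM staging.daily_device_counts
                    WHERE event_date = %s""",
                (BATCH_DATE,))
            print("rows loaded:", cur.rowcount)  # row-level metric for audit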
ETL Automation and Data Quality:
Build scalable, automated ETL pipelines with idempotency and cost efficiency
Implement schema drift checks, duplicate prevention, and partition reconciliation (see the guard sketch below)
Monitor EMR or Kubernetes cluster lifecycle, right-sizing, and cost tracking
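A sketch of a pre-load guard for schema drift and duplicate keys in PySpark, assuming a hypothetical schema contract and primary-key columns.

    from pyspark.sql import SparkSession
    from pyspark.sql.types import (LongType, StringType, StructField,
                                   StructType, TimestampType)

    spark = SparkSession.builder.appName("drift-check").getOrCreate()

    # The agreed schema contract for this feed (hypothetical).
    EXPECTED = StructType([
        StructField("device_id", StringType()),
        StructField("event_ts", TimestampType()),
        StructField("event_count", LongType()),
    ])

    batch = spark.read.parquet(
        "s3://example-lake/processed/events/year=2024/month=6/day=1/")

    expected = {(f.name, f.dataType) for f in EXPECTED.fields}
    actual = {(f.name, f.dataType) for f in batch.schema.fields}
    missing, extra = expected - actual, actual - expected
    if missing or extra:
        raise ValueError(f"schema drift: missing={missing}, unexpected={extra}")

    # Duplicate prevention: fail loudly if the key contract is violated.
    dupes = batch.groupBy("device_id", "event_ts").count().filter("count > 1")
    if dupes.limit(1).count() > 0:
        raise ValueError("duplicate keys found in batch")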
S3 and Logging:
Build log and event pipelines into S3 using CloudWatch, Kinesis, or Firehose
Manage bucket layouts, lifecycle rules, and data catalog consistency (lifecycle example below)
Understand compression formats and Hive-style directory structures
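A sketch of retention and archival via S3 lifecycle rules with boto3, matching the layout above; the bucket, prefix, and day thresholds are hypothetical.

    import boto3

    s3 = boto3.client("s3")
    s3.put_bucket_lifecycle_configuration(
        Bucket="example-lake",
        LifecycleConfiguration={
            "Rules": [{
                "ID": "archive-raw-events",
                "Filter": {"Prefix": "raw/events/"},
                "Status": "Enabled",
                # Move raw data to Glacier after 90 days, delete after a year.
                "Transitions": [{"Days": 90, "StorageClass": "GLACIER"}],
                "Expiration": {"Days": 365},
            }]
        },
    )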
Orchestration:
Implement AWS Step Functions with Choice, Map, and Parallel states, retries, and backoff (definition sketch below)
Automate scheduling using EventBridge and deploy guardrail Lambdas
Parameterize pipelines for multiple environments and selective recomputation
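A sketch of a Step Functions definition combining a Choice branch, a Map fan-out over partitions for selective recomputation, and exponential-backoff retries, deployed with boto3. The ARNs, state names, and the $.mode / $.partitions inputs are hypothetical.

    import json
    import boto3

    RETRY = {
        "ErrorEquals": ["States.TaskFailed"],
        "IntervalSeconds": 30,
        "MaxAttempts": 3,
        "BackoffRate": 2.0,  # 30s, 60s, 120s between attempts
    }
    JOB_ARN = "arn:aws:lambda:us-east-1:123456789012:function:run-etl"  # hypothetical

    definition = {
        "StartAt": "CheckMode",
        "States": {
            # Choice: backfills fan out per partition; everything else runs once.
            "CheckMode": {
                "Type": "Choice",
                "Choices": [{"Variable": "$.mode", "StringEquals": "backfill",
                             "Next": "BackfillPartitions"}],
                "Default": "DailyRun",
            },
            "BackfillPartitions": {
                "Type": "Map",
                "ItemsPath": "$.partitions",  # list of partition keys to redo
                "Iterator": {
                    "StartAt": "RunPartition",
                    "States": {"RunPartition": {"Type": "Task",
                                                "Resource": JOB_ARN,
                                                "Retry": [RETRY],
                                                "End": True}},
                },
                "End": True,
            },
            "DailyRun": {"Type": "Task", "Resource": JOB_ARN,
                         "Retry": [RETRY], "End": True},
        },
    }

    boto3.client("stepfunctions").create_state_machine(
        name="etl-orchestrator",
        definition=json.dumps(definition),
        roleArn="arn:aws:iam::123456789012:role/etl-sfn-role",  # hypothetical
    )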
Requirements:
4+ years in Data Engineering and AI/ML roles
Bachelor’s or Master’s degree in Computer Science, Data Science, or a related field