Infrastructure Software Engineer

Etched

Location:
United States, San Jose

Contract Type:
Not provided

Salary:

150000.00 - 250000.00 USD / Year

Job Description:

Building cutting-edge model-specific ASICs requires custom infrastructure and toolchains that support ultra-fast, reliable, and scalable development across the stack, from simulation to silicon. We build this infrastructure as software, and we engineer it with the same rigor, design discipline, quality standards, and testing that we apply to our ASIC, software, and platform.

You will lead the development and adoption of next-generation infrastructure tooling, enabling Etched ASIC, Software, and Platform engineers to iterate faster, build more reliably, and push the boundaries of AI performance. This includes building and scaling our hybrid high-performance compute (HPC) cluster, optimized for massively parallel CI, EDA workflows, emulation, and hardware-aware job execution. You'll also architect and implement a state-of-the-art observability stack with LLM integration and a strong emphasis on streaming health and performance telemetry, log aggregation, distributed tracing, insight generation, synthetic testing, and smart alerting across CI pipelines, simulation clusters, and service endpoints.

This role demands a strong software engineering mindset, quality instincts, and a deep understanding of systems. It's not just about writing scripts - it's about writing code that builds and manages infrastructure with precision, repeatability, and intent.

Job Responsibility:

  • Architect and Scale Distributed Compute Systems: Design and build the orchestration layers that drive our hybrid high-performance clusters—enabling simulation, synthesis, and continuous integration of AI ASICs at unprecedented scale
  • Build Infrastructure-as-Code Systems: Develop and maintain a fully programmable infrastructure control plane to ensure reproducibility, auditability, and rapid iteration across the entire stack
  • Optimize End-to-End Developer Experience: Create tools and abstractions that empower engineers to harness massive parallelism without worrying about the underlying complexity
  • Workload Elasticity, Reliability, and Efficiency: Prototype and execute workload orchestration and migration strategies between on-premise and cloud environments, balancing performance, storage availability and replication, uptime, and cost across heterogeneous hardware and compute backends
  • Implement real-time telemetry and tracing systems that surface insights from millions of metrics, enabling proactive debugging and system optimization
  • Push the Limits of Observability: Build a full observability stack that includes dashboards, alerting, automated responses, and a synthetic testing framework that proactively tests infrastructure performance and reliability across application and data flows, so we stay ahead of issues that impact development and productivity workflows

Requirements:

  • Are a systems-minded software engineer who loves building foundational platforms, working close to the metal and cloud, solving high-leverage problems at scale
  • Are a deeply technical engineer who treats infrastructure as a software problem - prioritizing clean abstractions, version control, small change lists, easy rollbacks, testing, and long-term maintainability over ad hoc configuration
  • Have strong programming skills in languages such as Python, Go, Rust, and C++, and are comfortable building production-grade tooling
  • Possess expert-level knowledge of Linux, virtualization, containerization, and CI/CD pipelines, with a deep understanding of how to debug, optimize, and scale complex systems
  • Are familiar with Infrastructure as Code tools like OpenTofu, Ansible, or Puppet, and enjoy designing declarative, reproducible infrastructure systems
  • Understand and use PromQL and other telemetry/query languages, have used LLMs to extract insight from real-time metrics, and know how to architect and tune observability stacks
  • Have a track record of debugging and resolving difficult hardware-software integration problems across bare-metal systems, networks, and distributed workloads
  • Can lead and mentor technical teams, guiding design decisions and helping others develop sound engineering instincts
  • Have 8+ years of experience in infrastructure engineering, systems programming, or backend software development - ideally in environments where performance, scale, or hardware interaction mattered
  • Are driven by curiosity, take initiative, and have an innate sense of ownership — you thrive in uncharted territory, design for edge cases, and love making systems more powerful, reliable, and elegant

Nice to have:

  • Familiarity with Bazel build system
  • Deep understanding of ASIC development flows, especially those involving Synopsys, Cadence, and Verilator, including how EDA tools interact with infrastructure for simulation, synthesis, and verification
  • Hands-on experience architecting systems with AWS, GCP, or Azure, including hybrid on-prem/cloud deployments, workload migration strategies, and cloud-native orchestration tooling
  • Experience monitoring, provisioning, and debugging bare-metal servers, network hardware, and high-performance storage systems in rack-scale environments
  • Comfort profiling and optimizing compute environments for single-threaded latency, memory-bound workloads, or I/O throughput, especially in the context of simulation or CI performance
  • Proficiency building or operating telemetry systems at scale using Prometheus, Grafana, Loki, VictoriaMetrics, and tools for distributed tracing, log aggregation, and real-time alerting across heterogeneous mediums (SMS, email, push alerts, etc.)

What we offer:

  • Medical, dental, and vision packages with generous premium coverage
  • $500 per month credit for waiving medical benefits
  • Housing subsidy of $2k per month for those living within walking distance of the office
  • Relocation support for those moving to San Jose (Santana Row)
  • Various wellness benefits covering fitness, mental health, and more
  • Daily lunch + dinner in our office

Additional Information:

Job Posted:
February 18, 2026

Employment Type:
Full-time
Work Type:
On-site work

Similar Jobs for Infrastructure Software Engineer

Software Engineer, Data Infrastructure

The Data Infrastructure team at Figma builds and operates the foundational platf...
Location:
United States, San Francisco; New York
Salary:
149000.00 - 350000.00 USD / Year
Figma
Expiration Date:
Until further notice
Requirements:
  • 5+ years of Software Engineering experience, specifically in backend or infrastructure engineering
  • Experience designing and building distributed data infrastructure at scale
  • Strong expertise in batch and streaming data processing technologies such as Spark, Flink, Kafka, or Airflow/Dagster
  • A proven track record of impact-driven problem-solving in a fast-paced environment
  • A strong sense of engineering excellence, with a focus on high-quality, reliable, and performant systems
  • Excellent technical communication skills, with experience working across both technical and non-technical counterparts
  • Experience mentoring and supporting engineers, fostering a culture of learning and technical excellence
Job Responsibility:
  • Design and build large-scale distributed data systems that power analytics, AI/ML, and business intelligence
  • Develop batch and streaming solutions to ensure data is reliable, efficient, and scalable across the company
  • Manage data ingestion, movement, and processing through core platforms like Snowflake, our ML Datalake, and real-time streaming systems
  • Improve data reliability, consistency, and performance, ensuring high-quality data for engineering, research, and business stakeholders
  • Collaborate with AI researchers, data scientists, product engineers, and business teams to understand data needs and build scalable solutions
  • Drive technical decisions and best practices for data ingestion, orchestration, processing, and storage
What we offer:
  • equity
  • health, dental & vision
  • retirement with company contribution
  • parental leave & reproductive or family planning support
  • mental health & wellness benefits
  • generous PTO
  • company recharge days
  • a learning & development stipend
  • a work from home stipend
  • cell phone reimbursement
Employment Type:
Full-time

Senior Software Engineer, Infrastructure

The InfraOps team’s primary goal is to enable and empower Kiddom’s engineering b...
Location:
United States, New York City
Salary:
160000.00 - 200000.00 USD / Year
Kiddom
Expiration Date:
Until further notice
Requirements:
  • BS or MS in Computer Science or a related field
  • 5+ years professional software engineering experience
  • Experience with Java, Python, Go, or Clojure in a production environment
  • Experience designing and building REST APIs
  • Exposure to authorization technologies (OAuth)
  • Experience with continuous integration and automation tools and processes
  • Strong knowledge of design patterns and software engineering best practices
  • Excellent problem solving and debugging skills
  • Strong acumen or exposure to DevOps or SRE methodologies
  • Keen sense for SecOps.
Job Responsibility:
  • Evangelizing and fostering a healthy DevOps culture here at Kiddom, working with teams to establish best practices and help guide new and existing services.
  • Practicing Infrastructure as Code (IaC) wherever possible, giving us the confidence in repeatable processes that can be automated.
  • Grow our DevOps efforts from small-scale to large-scale, multi-region operations
  • Share ownership of the entire infrastructure architecture
  • Aim for high availability, high resiliency
  • Support the engineering team with tools to evaluate the performance of their code in production environments and to speed up the CI/CD pipeline and feature verification
  • Design and build a scalable, generalized framework for third-party API integrations
  • Leverage existing infrastructure and components to build RESTful web services
  • Build APIs and robust testing environments for internal and external developers
What we offer:
  • Competitive salary
  • Meaningful equity
  • Health insurance benefits: medical (various PPO/HMO/HSA plans), dental, vision, disability and life insurance
  • One Medical membership (in participating locations)
  • Flexible vacation time policy (subject to internal approval). Average use is 4 weeks off per year.
  • 10 paid sick days per year (prorated depending on start date)
  • Paid holidays
  • Paid bereavement leave
  • Paid family leave after birth/adoption. Minimum of 16 paid weeks for birthing parents, 10 weeks for caretaker parents. Meant to supplement benefits offered by State.
  • Commuter and FSA plans
Employment Type:
Full-time

Senior Software Engineer - ML Infrastructure

We build simple yet innovative consumer products and developer APIs that shape h...
Location:
United States, San Francisco
Salary:
180000.00 - 270000.00 USD / Year
Plaid
Expiration Date:
Until further notice
Requirements:
  • 5+ years of industry experience as a software engineer, with strong focus on ML/AI infrastructure or large-scale distributed systems
  • Hands-on expertise in building and operating ML platforms (e.g., feature stores, data pipelines, training/inference frameworks)
  • Proven experience delivering reliable and scalable infrastructure in production
  • Solid understanding of ML Ops concepts and tooling, as well as best practices for observability, security, and reliability
  • Strong communication skills and ability to collaborate across teams
Job Responsibility:
  • Design and implement large-scale ML infrastructure, including feature stores, pipelines, deployment tooling, and inference systems
  • Drive the rollout of Plaid’s next-generation feature store to improve reliability and velocity of model development
  • Help define and evangelize an ML Ops “golden path” for secure, scalable model training, deployment, and monitoring
  • Ensure operational excellence of ML pipelines and services, including reliability, scalability, performance, and cost efficiency
  • Collaborate with ML product teams to understand requirements and deliver solutions that accelerate experimentation and iteration
  • Contribute to technical strategy and architecture discussions within the team
  • Mentor and support other engineers through code reviews, design discussions, and technical guidance
What we offer:
  • medical, dental, vision, and 401(k)
Employment Type:
Full-time

Senior Software Engineer, Infrastructure

You’ll help shape the future of infrastructure automation for law enforcement sy...
Location:
United States, Seattle; Boston
Salary:
141000.00 - 225600.00 USD / Year
Axon
Expiration Date:
Until further notice
Requirements:
  • Bachelor’s degree in Computer Science, Engineering, or related field (or equivalent experience)
  • 8+ years of professional software development experience
  • Strong background building cloud-native, distributed solutions
  • Experience designing tooling and automation to simplify the operational management of SaaS/PaaS systems
  • Proficiency in backend services with multiple managed languages (e.g., Java, Scala, Go, C#, or similar)
  • Expertise with Infrastructure as Code (IaC) tools (e.g., Terraform, CloudFormation) and building modular, reusable, testable components
  • Familiarity with Kubernetes platforms (e.g., AKS, EKS, or similar)
  • Hands-on experience with CI/CD platforms for automating infrastructure, builds, testing, and releases
  • Strong collaboration and communication skills, with empathy for the needs of engineering teams
Job Responsibility:
  • Lead engineering architecture design reviews
  • Set a high technical bar for the team through code and architecture design reviews
  • Mentor engineers
  • Work across teams with Product, Design, and Engineering to create integrated solutions that delight our customers
  • Improve our Engineering process, including long-term thinking, sprint planning, and stand-ups
  • Build services that adhere to our high bar on availability and latency in this mission-critical space
  • Work with the latest open source technologies
What we offer:
  • Competitive salary and 401k with employer match
  • Discretionary paid time off
  • Paid parental leave for all
  • Medical, Dental, Vision plans
  • Fitness Programs
  • Emotional & Mental Wellness support
  • Learning & Development programs
  • Snacks in our offices
Employment Type:
Full-time

Principal Software Engineer - Research Infrastructure Team

We are seeking a highly motivated and experienced Senior Software Engineer, pass...
Location:
Israel, Tel Aviv
Salary:
Not provided
Palo Alto Networks
Expiration Date:
Until further notice
Requirements:
  • BS in Computer Science or equivalent knowledge or equivalent military experience required
  • 5+ years of software engineering experience
  • Expertise in Python and Python internals
  • Experience in designing, building and maintaining a user facing application/API
  • Experience with Git or other source control systems
  • Good communication skills
  • Self-driven with the ability to work independently, take initiative, and drive processes end-to-end
Job Responsibility:
  • Responsible for the complete software development life cycle including requirement analysis, design, development and deployment
  • Take part in integrating the newest features and technologies, automate workflows, and create user friendly tools and frameworks for researchers
  • Produce elegant, generic, modular and extendable code
  • Actively influence the processes and methods for researchers, affecting their day to day life
Employment Type:
Full-time

Software Engineer, Infrastructure

As a Software Engineer on our Infrastructure team, you will help design and buil...
Location:
United States, New York; San Mateo; Redwood City
Salary:
140000.00 - 150000.00 USD / Year
Fireworks AI
Expiration Date:
Until further notice
Requirements:
  • Bachelor’s degree in Computer Science, Engineering, or a related technical field (or equivalent practical experience)
  • Strong programming skills in Python, C++, or a similar language
  • Solid understanding of computer systems concepts such as networking, storage, and distributed computing
  • Familiarity with cloud platforms like AWS, GCP, or Azure, and containerization tools like Docker or Kubernetes
  • Knowledge and interest in cloud infrastructure, distributed systems, and machine learning
Job Responsibility:
  • Contribute to the design and development of scalable backend infrastructure that supports distributed training, inference, and data pipelines
  • Build and maintain core backend services such as job schedulers, autoscalers, resource managers, and model serving systems
  • Support performance optimization, cost efficiency, and reliability improvements across compute, storage, and networking layers
  • Collaborate with ML, DevOps, and product teams to translate research and product needs into infrastructure solutions
  • Learn and apply modern cloud technologies including Kubernetes, Ray, Kubeflow, and MLFlow
  • Participate in code reviews, technical discussions, and continuous integration and deployment processes
What we offer:
  • Meaningful equity in a fast-growing startup
  • Competitive salary and comprehensive benefits package
Employment Type:
Full-time

Software Engineer, Infrastructure

The Infrastructure team builds foundational systems at scale. We're hundreds o b...
Location:
United States, New York City; San Francisco Bay Area
Salary:
171200.00 - 246000.00 USD / Year
Metronome
Expiration Date:
Until further notice
Requirements:
  • 5+ years building infrastructure systems: Hands-on experience with distributed systems, cloud infrastructure, container orchestration, data pipelines, observability, CI/CD, or other foundational platforms
  • Ownership of production systems: Track record of operating mission-critical infrastructure with strong focus on reliability, scalability, and performance
  • Force multiplier mindset: You build platforms that enable others. You create abstractions that make complex systems approachable. You think about developer experience as a first-class concern
  • Cross-functional collaboration: You partner effectively with product teams, communicate technical decisions clearly, and mentor engineers across experience levels
Job Responsibility:
  • Build platforms that scale: Design and operate foundational infrastructure—Kubernetes clusters, Kafka streaming platforms, Spark batch processing, observability systems—that handle billions of events and enable Metronome to grow with minimal friction
  • Enable product velocity: Create golden paths, abstractions, and tooling that let engineers ship faster and more reliably without becoming infrastructure experts themselves
  • Enable reliability as the product: Take accountability for system uptime, performance, and correctness. Build monitoring, alerting, and incident response systems that help the entire team catch problems before customers notice
  • Drive technical direction: Shape Metronome's infrastructure strategy, make platform-level architectural decisions, and mentor engineers across the organization
What we offer:
  • Excellent medical, dental, vision, and life insurance coverage, including a One Medical membership
  • Paid parental leave
  • FSA (Flexible spending account)
  • Retirement planning - Traditional and Roth 401(k)
  • Flexible time off
  • Employee assistance program (mental health benefits)
  • Culture where personal growth is highly valued
  • market-benched equity
  • sales incentive pay (for eligible roles)
  • comprehensive health benefits
Employment Type:
Full-time
