Staff Software Engineer (Infra)

Amigo

Location:
United States, New York City

Contract Type:
Not provided

Salary:

220000.00 - 260000.00 USD / Year

Job Description:

As a Staff Software Engineer (Infra) at Amigo, you'll own the technical direction of the infrastructure that powers our platform at global scale. You'll architect multi-region systems handling millions of conversations while maintaining the compliance posture required for healthcare. You'll drive architecture decisions for Kubernetes deployments, the Databricks platform, voice/telephony systems, and security infrastructure. You'll mentor engineers, shape technical culture, and ensure we maintain elite engineering standards as the team grows. Reliability and security are non-negotiable: the platform must scale without compromising safety.

Job Responsibility:

  • Own technical architecture for infrastructure across cloud platforms, Kubernetes, Databricks, and supporting systems
  • Drive engineering standards for reliability, security, observability, and incident response
  • Architect multi-region deployment strategies with zero-downtime updates for critical systems
  • Design the compliance & security infrastructure for healthcare (HIPAA, SOC 2) and support future regulatory requirements
  • Own disaster recovery architecture and backup systems meeting healthcare compliance requirements
  • Make build vs. buy decisions for infrastructure tooling and evaluate technical tradeoffs
  • Design auto-scaling systems that handle traffic spikes while maintaining cost efficiency
  • Own our infrastructure as code, ensuring clearly documented, identical deployments across regions
  • Mentor engineers and establish patterns that raise the bar for the infrastructure team
  • Collaborate with backend, platform, and security teams to ensure system-wide coherence
  • Define reliability targets (SLOs/SLIs) and drive operational excellence across the platform

Requirements:

  • 7+ years of production infrastructure experience, with significant time at elite engineering organizations
  • Expert-level experience with Kubernetes and container orchestration at scale
  • Proven track record designing infrastructure that scales across multiple regions
  • Deep experience with cloud platforms (AWS, GCP, or Azure)
  • Strong understanding of infrastructure-level networking and security configurations
  • History of establishing engineering standards and mentoring engineers
  • Extremely high standards for reliability, security, and operational excellence
  • Both execution-oriented and defensive-minded: you ship infrastructure while anticipating failure modes
  • Deep knowledge of infrastructure as code tools (Terraform, Pulumi, or similar)
  • Experience with compliance requirements and data residency controls in regulated industries
  • Excellent written and verbal communication across engineering and executive stakeholders

Nice to have:

  • Experience with healthcare infrastructure or HIPAA compliance at scale
  • Background with voice/telephony systems or real-time communication infrastructure
  • Experience with Databricks platform administration and optimization
  • Track record building and scaling infrastructure teams
  • Knowledge of specific regulatory frameworks (HIPAA, SOC 2, GDPR)
  • Experience with high-availability, mission-critical systems

What we offer:
  • Comprehensive health, dental, and vision insurance
  • Mental health support and wellness coaching
  • Flexible wellness stipend for fitness, therapy, or personal growth
  • Daily catered lunch and dinner
  • Annual learning budget for courses, books, or conferences
  • Conference attendance budget for professional development
  • Development setup of your choice
  • Academic collaboration opportunities

Additional Information:

Job Posted:
January 20, 2026

Employment Type:
Full-time
Work Type:
On-site work

Similar Jobs for Staff Software Engineer (Infra)

Senior Staff Software Engineer, Cloud Proxy

We are seeking a Senior Staff Engineer in Temporal's Cloud Global Services team ...
Location:
United States
Salary:
230000.00 - 290000.00 USD / Year
Temporal
Expiration Date
Until further notice
Requirements:
  • Proven experience architecting and delivering high-availability, security-critical networking or proxy systems
  • Deep understanding of authentication/authorization patterns (OIDC, i.e. OpenID Connect on top of OAuth; mTLS; JWT, i.e. JSON Web Tokens; custom identity integrations)
  • Expertise in data encryption at rest and in transit, including envelope encryption and key management
  • Strong proficiency in Go or a comparable systems programming language
  • Familiarity with distributed systems, RPC frameworks (gRPC), and cloud networking patterns
  • Track record of leading complex, multi-team technical initiatives to successful delivery
  • Ability to navigate ambiguity, define vision, and create alignment
  • Experience influencing technical direction across organizational boundaries
Job Responsibility:
  • Define and drive the architecture for a unified, pluggable proxy framework
  • Establish technical standards for authentication, authorization, encryption, and observability across proxy implementations
  • Evaluate and integrate existing customer-built, S2S, and Cloud Auth proxies into a single supported solution
  • Translate high-level business and security requirements into technical designs
  • Ensure proxy meets Tier 0 workload reliability, security, and performance standards
  • Partner with Product, Security, and Customer Success to align roadmap with customer needs
  • Work closely with Infra Foundations, Security, OSS Server, and CGS teams
  • Engage directly with strategic customers to understand and incorporate their requirements
  • Mentor other engineers on distributed systems architecture, networking, and security
  • Drive the open-source development model, ensuring code quality, documentation, and extensibility
What we offer:
  • Unlimited PTO, 12 Holidays + 2 Floating Holidays
  • 100% Premiums Coverage for Medical, Dental, and Vision
  • AD&D, LT & ST Disability, and Life Insurance (Standard & Supplemental Available)
  • Empower 401K Plan
  • Additional Perks for Learning & Development, Lifestyle Spending, In-Home Office Setup, Professional Memberships, WFH Meals, Internet Stipend and more
  • $3,600 / Year Work from Home Meals
  • $1,500 / Year Career Development & Learning
  • $1,200 / Year Lifestyle Spending Account
  • $1,000 / Year In-Home Office Setup (In addition to Temporal issued equipment)
  • $500 / Year Professional Memberships
Employment Type:
Full-time

Software Engineer II - Backend

The Engineer II plays a crucial role in developing robust systems and tools to s...
Location:
United States
Salary:
117800.00 - 144800.00 USD / Year
Octave
Expiration Date
Until further notice
Requirements:
  • Minimum of 3 years of experience building robust and scalable backend applications
  • Experience with infrastructure-as-code & continuous deployment in production
  • Experience working in healthcare or healthcare technology (including with clinical staff), or in other regulated industries
  • Experience with gRPC and Protobuf
  • Experience with relational database systems like PostgreSQL or MySQL
  • Experience integrating and synchronizing data with third party APIs
  • Experience with continuous delivery and troubleshooting production code
  • Solid working knowledge of Python and at least one of its web frameworks
  • Experience developing and deploying enterprise Python applications into production
  • Must be eligible to work in the United States without sponsorship now or in the future
Job Responsibility:
  • Design, develop, and implement scalable and robust backend systems and APIs using Python
  • Optimize database design, performance, and security to ensure data integrity and efficiency
  • Conduct thorough testing and debugging of backend applications, ensuring high-quality, bug-free software
  • Oversee deployment and maintenance of backend services, ensuring smooth operation and uptime
  • Promote best practices to maintain a high-quality codebase, and consistently follow them
  • Develop and maintain technical documentation for backend systems and processes
  • Take ownership of the backend development lifecycle, from concept to testing, deployment, and monitoring
  • Write correct, clean code (with guidance) that is easily testable, easily understood by other developers, and accounts for edge cases and errors
  • Use comments effectively
  • Participate in the technical design of features, with guidance
What we offer:
  • Equity in the form of stock options
  • Company-sponsored life insurance, disability, and AD&D plans
  • Voluntary benefits such as 401k retirement, medical, dental, vision, FSA, HSA, dependent care, and commuter/parking options
  • Generous paid time off
  • Paid parental leave benefits
Employment Type:
Full-time

Staff Site Reliability Engineer

We are looking for a Site Reliability Engineer to own our internal systems infra...
Location:
United States, Sunnyvale
Salary:
175000.00 - 250000.00 USD / Year
Figure
Expiration Date
Until further notice
Requirements:
  • Strong experience with Linux/Unix systems administration
  • Proficiency in programming/scripting
  • Extensive experience with cloud platforms (Azure, AWS, GCP) and on-prem hardware architectures
  • Experience designing, deploying, and operating high-availability, fault-tolerant, and distributed systems
  • Mastery of infrastructure as code (Terraform, CloudFormation, Ansible…)
  • Familiarity with monitoring, logging, and alerting tools (Prometheus, Grafana, Datadog…)
  • Solid understanding of networking fundamentals (TCP/IP, DNS, HTTP, load balancers, firewalls)
  • Experience defining Service Level Objectives (SLOs), developing runbooks/incident response plans, facilitating post-mortems, and managing system assets
  • Ability to work in cross-functional teams with developers, infra, and product teams
  • Excellent verbal and written communication skills
Job Responsibility:
  • Be the go-to person for mission-critical infrastructure enabling critical operations such as Source Configuration Management, CI/CD systems, software distribution, supplier portals, manufacturing, and more
  • Migrate SaaS tools to self-hosted solutions to enhance security and reliability
  • Implement monitoring and alerting systems, and define incident response plans and runbooks
  • Reduce manual workload by automating deployment and scaling
  • Establish strong relationships with stakeholders to identify infrastructure needs and establish Service Level Objectives
  • Use a data-driven approach to demonstrate service robustness and track optimization work
  • Partner with the security team to ensure that security remediations and updates are applied in a timely manner
Employment Type:
Full-time

Member of Technical Staff - Data Infra - MAI Superintelligence Team

Help build the world’s most advanced multimodal dataset at Microsoft AI. We are ...
Location:
United States, Mountain View
Salary:
163000.00 - 296400.00 USD / Year
Microsoft Corporation
Expiration Date
Until further notice
Requirements:
  • Master's Degree in Computer Science, Math, Software Engineering, Computer Engineering, or related field AND 6+ years of experience in business analytics, data science, software development, data modeling, or data engineering
  • OR Bachelor's Degree in Computer Science, Math, Software Engineering, Computer Engineering, or related field AND 8+ years of experience in business analytics, data science, software development, data modeling, or data engineering
  • OR equivalent experience
  • Preferred: Master's Degree in Computer Science, Math, Software Engineering, Computer Engineering, or related field AND 12+ years of experience in business analytics, data science, software development, data modeling, or data engineering
  • OR Bachelor's Degree in Computer Science, Math, Software Engineering, Computer Engineering, or related field AND 15+ years of experience in business analytics, data science, software development, data modeling, or data engineering
  • OR equivalent experience
  • 4+ years experience with data governance, data compliance and/or data security
  • Passionate about the role of data in large-scale AI model training
  • Thrive in a highly collaborative, fast-paced environment
  • Have a high degree of expertise and pay close attention to details
Job Responsibility:
  • Design and develop data pipelines that ingest enormous amounts of multi-modal training data (text, audio, images, video)
  • Own and maintain critical data infrastructure, including Spark, Ray, vector databases, and others
  • Build and maintain cutting-edge infrastructure that can store and process the petabytes of data needed to power models
  • Partner with the pretraining and post-training teams to improve our data recipe through rigorous, careful experimentation
  • Embody our culture and values
Employment Type:
Full-time

Staff Software Engineer - CAD Infra Engineering

Dandy is hiring a Staff Software Engineer to join our rapidly scaling technology...
Location:
United States
Salary:
221000.00 - 268000.00 USD / Year
Dandy
Expiration Date
Until further notice
Requirements:
  • 8+ years of software engineering experience, preferably in a high-growth startup environment
  • An expert in Google Cloud Platform and Google Kubernetes Engine
  • Experience with GPU infrastructure and maintaining cloud-to-client application test parity is strongly preferred
  • Experience in identifying and remediating security vulnerabilities within a cloud environment
  • Experience building observability platforms (e.g., metrics, logging, and tracing)
  • Experience with infrastructure as code platforms (Terraform, Pulumi)
  • Experience designing the architecture and automation of infrastructure within a cloud environment
  • A collaborative, pragmatic, and growth-oriented mindset
  • The ability to clearly and concisely communicate about complex technical, architectural, and/or organizational problems and propose thorough iterative solutions
  • Experience with performance and optimization problems and a demonstrated ability to both diagnose and prevent these problems
Job Responsibility:
  • Solve technical problems of the highest scope and complexity for your team
  • Collaborate with stakeholders within the tech org to influence the overall objectives and long-term goals of your team
  • Advocate for improvements to product quality, security, and performance that have a particular impact across your team and others
  • Develop and maintain infrastructure, systems, and tooling to support Dandy’s products in a secure, well-tested, and performant way
  • Reinvent an analog experience and disrupt a legacy industry through novel and scalable system design
  • Collaborate with Product Engineers and other stakeholders within Engineering, Product and Data to maintain a high bar for quality in a fast-paced, iterative environment
  • Advocate for improvements to infrastructure quality, security, and performance
  • Craft code that meets our internal standards for style, maintainability, and best practices
  • Recognize impediments to our efficiency as a team ("technical debt"), propose and implement solutions
What we offer:
  • Equity
  • Bonus
  • Healthcare
  • Dental
  • Mental health support
  • Parental planning resources
  • Retirement savings options
  • Generous paid time off
Employment Type:
Full-time

Member of Technical Staff, Training Infra Engineer

Contribute in and provide strong support for model training pipelines, ship stat...
Location:
Salary:
Not provided
Cohere
Expiration Date
Until further notice
Requirements:
  • Extremely strong software engineering skills
  • Proficiency in Python and related ML frameworks such as JAX, PyTorch, and XLA/MLIR
  • Experience with distributed training infrastructure (Kubernetes, Slurm) and associated frameworks (Ray)
  • Experience using large-scale distributed training strategies
  • Hands-on experience training large models at scale and contributing to the tooling and/or setup of the training infrastructure
Job Responsibility:
  • Design and write high-performance, scalable software for training
  • Improve our training setup from an infrastructure and codebase performance standpoint
  • Craft and implement tools to speed up our training cycles and improve the overall efficacy of our training infrastructure
  • Research, implement, and experiment with ideas on our supercompute and data infrastructure
  • Learn from and work with the best researchers in the field
What we offer:
  • An open and inclusive culture and work environment
  • Work closely with a team on the cutting edge of AI research
  • Weekly lunch stipend, in-office lunches & snacks
  • Full health and dental benefits, including a separate budget to take care of your mental health
  • 100% Parental Leave top-up for up to 6 months
  • Personal enrichment benefits towards arts and culture, fitness and well-being, quality time, and workspace improvement
  • Remote-flexible, offices in Toronto, New York, San Francisco, London and Paris, as well as a co-working stipend
  • 6 weeks of vacation (30 working days!)
Employment Type:
Full-time

Staff Software Engineer - AI/ML Infra

GEICO AI platform and Infrastructure team is seeking an exceptional Senior ML Pl...
Location:
United States, Palo Alto
Salary:
90000.00 - 300000.00 USD / Year
Geico
Expiration Date
Until further notice
Requirements:
  • Bachelor’s degree in computer science, Engineering, or related technical field (or equivalent experience)
  • 8+ years of software engineering experience with focus on infrastructure, platform engineering, or MLOps
  • 3+ years of hands-on experience with machine learning infrastructure and deployment at scale
  • 2+ years of experience working with Large Language Models and transformer architectures
  • Proficient in Python
  • Strong skills in Go, Rust, or Java preferred
  • Proven experience working with open source LLMs (Llama 2/3, Qwen, Mistral, Gemma, Code Llama, etc.)
  • Proficient in Kubernetes, including custom operators, Helm charts, and GPU scheduling
  • Deep expertise in Azure services (AKS, Azure ML, Container Registry, Storage, Networking)
  • Experience implementing and operating feature stores (Chronon, Feast, Tecton, Azure ML Feature Store, or custom solutions)
Job Responsibility:
  • Design and implement scalable infrastructure for training, fine-tuning, and serving open source LLMs (Llama, Mistral, Gemma, etc.)
  • Architect and manage Kubernetes clusters for ML workloads, including GPU scheduling, autoscaling, and resource optimization
  • Design, implement, and maintain feature stores for ML model training and inference pipelines
  • Build and optimize LLM inference systems using frameworks like vLLM, TensorRT-LLM, and custom serving solutions
  • Ensure 99.9%+ uptime for ML platforms through robust monitoring, alerting, and incident response procedures
  • Design and implement ML platforms using DataRobot, Azure Machine Learning, Azure Kubernetes Service (AKS), and Azure Container Instances
  • Develop and maintain infrastructure using Terraform, ARM templates, and Azure DevOps
  • Implement cost-effective solutions for GPU compute, storage, and networking across Azure regions
  • Ensure ML platforms meet enterprise security standards and regulatory compliance requirements
  • Evaluate and potentially implement hybrid cloud solutions with AWS/GCP as backup or specialized use cases
What we offer:
  • Comprehensive Total Rewards program that offers personalized coverage tailor-made for you and your family’s overall well-being
  • Financial benefits including market-competitive compensation, a 401K savings plan vested from day one that offers a 6% match, performance and recognition-based incentives, and tuition assistance
  • Access to additional benefits like mental healthcare as well as fertility and adoption assistance
  • Supports flexibility: we provide workplace flexibility as well as our GEICO Flex program, which offers the ability to work from anywhere in the US for up to four weeks per year
Employment Type:
Full-time

Staff Software Engineer - AI/ML Infra

GEICO AI platform and Infrastructure team is seeking an exceptional Senior ML Pl...
Location:
United States, Chevy Chase; New York City; Palo Alto
Salary:
115000.00 - 300000.00 USD / Year
Geico
Expiration Date
Until further notice
Requirements:
  • Bachelor’s degree in computer science, Engineering, or related technical field (or equivalent experience)
  • 8+ years of software engineering experience with focus on infrastructure, platform engineering, or MLOps
  • 3+ years of hands-on experience with machine learning infrastructure and deployment at scale
  • 2+ years of experience working with Large Language Models and transformer architectures
  • Proficient in Python
  • Strong skills in Go, Rust, or Java preferred
  • Proven experience working with open source LLMs (Llama 2/3, Qwen, Mistral, Gemma, Code Llama, etc.)
  • Proficient in Kubernetes, including custom operators, Helm charts, and GPU scheduling
  • Deep expertise in Azure services (AKS, Azure ML, Container Registry, Storage, Networking)
  • Experience implementing and operating feature stores (Chronon, Feast, Tecton, Azure ML Feature Store, or custom solutions)
Job Responsibility:
  • Design and implement scalable infrastructure for training, fine-tuning, and serving open source LLMs (Llama, Mistral, Gemma, etc.)
  • Architect and manage Kubernetes clusters for ML workloads, including GPU scheduling, autoscaling, and resource optimization
  • Design, implement, and maintain feature stores for ML model training and inference pipelines
  • Build and optimize LLM inference systems using frameworks like vLLM, TensorRT-LLM, and custom serving solutions
  • Ensure 99.9%+ uptime for ML platforms through robust monitoring, alerting, and incident response procedures
  • Design and implement ML platforms using DataRobot, Azure Machine Learning, Azure Kubernetes Service (AKS), and Azure Container Instances
  • Develop and maintain infrastructure using Terraform, ARM templates, and Azure DevOps
  • Implement cost-effective solutions for GPU compute, storage, and networking across Azure regions
  • Ensure ML platforms meet enterprise security standards and regulatory compliance requirements
  • Evaluate and potentially implement hybrid cloud solutions with AWS/GCP as backup or specialized use cases
What we offer:
  • Comprehensive Total Rewards program that offers personalized coverage tailor-made for you and your family’s overall well-being
  • Financial benefits including market-competitive compensation, a 401K savings plan vested from day one that offers a 6% match, performance and recognition-based incentives, and tuition assistance
  • Access to additional benefits like mental healthcare as well as fertility and adoption assistance
  • Supports flexibility: we provide workplace flexibility as well as our GEICO Flex program, which offers the ability to work from anywhere in the US for up to four weeks per year
Employment Type:
Full-time