Software Engineer, Infrastructure - Analytics Job at OpenAI (San Francisco)

Software Engineer / Senior Software Engineer - Microsoft eCDN

Microsoft eCDN (enterprise content delivery network) solves the network congesti...

Location

Israel, Tel Aviv, Herzliya

Salary:

Not provided

Microsoft Corporation

Expiration Date

Until further notice

Requirements

4+ years in software development
Proficient in JavaScript or TypeScript
Proficient in back-end development with Node.js
BSc in computer science or equivalent
Excellent problem solving and analytical thinking skills
Experience writing infrastructure and libraries
Proven track record of delivering large amounts of high-quality, complex code

Job Responsibility

Design, implement, test and monitor crucial components of the infrastructure
Develop a fully distributed, scalable and stable back-end
Help design and implement real-world, real-time, peer-to-peer algorithms
Own components that impact high-stakes virtual events

Fulltime

Software Engineer - Analytics Hub

We’re seeking a Software Engineer with full-stack capabilities, someone who’s pa...

Location

United Kingdom; Sweden, London; Stockholm

Salary:

Not provided

Acast

Expiration Date

Until further notice

Requirements

Experience with Node.js, TypeScript and React
Experience working with AI and proficient in utilising it where appropriate
Experience building delightful end user facing applications and the backend infrastructure to support them
Experience in working with AWS or similar
Pragmatic mindset, with the ability to make trade-offs on what to optimise for and when to ship
Keen to mentor others in the technical areas where you are already an expert, and value working in a diverse team with developers of all experience levels and backgrounds

Job Responsibility

Building intuitive, high-quality user experiences on the frontend
Architecting and supporting the backend infrastructure to deliver those experiences reliably at global scale
Building a user interface to compare and contrast consumption and revenue data from multiple sources
Offering proactive recommendations on how insights can be used to improve creators' content
Shipping new features while continuously increasing efficiency, stability and scalability of the systems
Contributing to software development within the team
Understanding and contributing to other areas of product development
Working in a multi-disciplinary environment with agile principles
Supporting own systems and infrastructure in production

Sr Software Engineer Analytics (US Federal)

Your work days are brighter here. We’re obsessed with making hard work pay off,...

Location

United States, Reston

Salary:

151500.00 - 227300.00 USD / Year

Workday

Expiration Date

Until further notice

Requirements

5+ years of hands-on experience working with infrastructure, either on-premises or cloud-based, with a deep understanding of systems architecture, networking, and security.
Strong understanding of Kubernetes concepts, architecture, and administration.
Bachelor's degree in a computer-related field or equivalent work experience
Infrastructure as code: Proficiency in infrastructure automation tools like Terraform.
CI/CD: Experience with building or maintaining CI/CD pipelines and tools like Argo CD.
Cloud experience: 3+ years experience working with AWS cloud services in a production setting.
Programming skills: Proficiency in at least one programming language, preferably Go or Python.
Problem-solving: Strong analytical and problem-solving skills.
Communication: Excellent communication and collaboration skills.

Job Responsibility

Design and implement: Design, develop, and implement solutions for our Kubernetes platform, including infrastructure automation, CI/CD pipelines, and observability tools.
Build and maintain: Build and maintain core platform components, ensuring high availability, scalability, and security.
Automate and optimize: Automate infrastructure provisioning, configuration management, and application deployments using tools like Terraform and Argo CD.
Troubleshooting and support: Provide support and troubleshooting for platform-related issues, working closely with development teams to resolve problems.
Security and compliance: Implement and maintain security best practices for the platform, ensuring compliance with industry standards.
Documentation and knowledge sharing: Create and maintain comprehensive documentation for platform components and processes. Actively participate in knowledge sharing within the team.
Collaboration: Collaborate effectively with other engineers, development teams, and stakeholders across multiple locations and time zones.
Stay current: Stay up to date with the latest technologies and trends in the platform engineering space.

Fulltime

Senior Software Engineer - Infrastructure Reliability

We are seeking a Senior Software Engineer to join our Security Product team, foc...

Location

India, Bangalore

Salary:

Not provided

JFrog

Expiration Date

Until further notice

Requirements

7+ years of experience in software engineering, with at least 3+ years focused on debugging and solving infrastructure-level problems in distributed systems
Strong proficiency in Go; familiarity with Python and Helm is a plus
Deep hands-on experience with RabbitMQ or similar message brokers (Kafka, ActiveMQ) - including queue management, clustering, monitoring, and production troubleshooting
Solid working knowledge of Kubernetes (pod lifecycle, resource management, networking, debugging CrashLoopBackOff / OOMKilled scenarios) and Docker
Experience investigating production incidents and conducting post-incident reviews with clear root cause analysis and follow-through
Strong understanding of Linux systems, networking fundamentals, and cloud infrastructure (AWS, Azure, or GCP)
Ability to read and interpret logs, thread dumps, heap dumps, and system metrics to isolate root causes under time pressure
Excellent analytical and problem-solving skills with a methodical approach to debugging
Strong written and verbal communication skills - ability to produce clear incident reports, root cause analyses, and playbooks, and to communicate effectively across engineering, SRE, and customer-facing teams

Job Responsibility

Investigate system outages and production failures across customer environments (SaaS and self-hosted), spanning RabbitMQ, Kubernetes, Docker, Postgres, and cloud infrastructure (AWS, Azure, GCP)
Identify recurring failure patterns and systemic weaknesses from incident data, and drive them to resolution - whether by writing Go code yourself (resilience features, infrastructure fixes, observability) or by collaborating with service owners to prioritize and address reliability gaps
Lead and participate in post-incident reviews - document root causes, corrective actions, and follow through to ensure issues are properly resolved
Collaborate with production engineering and SRE teams to develop and maintain operational playbooks and runbooks that reduce time-to-resolution
Diagnose root causes across the full stack - message queue failures, container lifecycle issues, cloud networking, disk and memory pressure, and deployment topology mismatches
Design and implement data migrations and lifecycle management for infrastructure components such as queue management and vhost operations
Emit and monitor operational metrics to proactively detect infrastructure degradation and measure service reliability

Software Engineer, Infrastructure Security

OpenAI is seeking a Security Software Engineer to join the Infrastructure Securi...

Location

United States, San Francisco; Seattle; New York City

Salary:

184000.00 - 385000.00 USD / Year

OpenAI

Expiration Date

Until further notice

Requirements

Strong software engineering skills in languages such as Python, Go, Rust, or C/C++
Experience building or operating critical security infrastructure (e.g., auth services, service-to-service proxies, certificate or key-management systems)
Deep understanding of security principles, best practices, and common vulnerabilities
Expertise in securing large-scale cloud platforms (e.g., Azure, AWS, GCP), including multi-cloud networks and cloud-agnostic system design
Familiarity with container and orchestration security (Kubernetes, service meshes) and modern authentication/authorization standards (OIDC, mTLS, SPIFFE/SPIRE)
A proactive mindset, with the ability to identify and address security gaps or inefficiencies through automation and tooling
A track record of delivering scalable solutions and driving impactful changes across infrastructure in real-world projects
Strong analytical and problem-solving skills, with an ability to think critically and objectively assess security risks
Excellent communication skills, with the ability to convey complex security concepts to technical and non-technical stakeholders
Excitement about collaborating with cross-functional teams to build secure, reliable systems that scale globally

Job Responsibility

Architect and implement production-grade security services (e.g., auth services, access brokers, secure proxies, key-management infrastructure)
Partner with infrastructure and research engineers to embed security into high-performance compute clusters
Develop automation and detection tooling to continuously identify and mitigate risks in large-scale cloud and on-prem environments
Drive high-impact initiatives such as line-speed encryption, machine identity, and network isolation
Lead or participate in design reviews and threat models to ensure new systems launch with strong security foundations and operational excellence

What we offer

Medical, dental, and vision insurance for you and your family, with employer contributions to Health Savings Accounts
Pre-tax accounts for Health FSA, Dependent Care FSA, and commuter expenses (parking and transit)
401(k) retirement plan with employer match
Paid parental leave (up to 24 weeks for birth parents and 20 weeks for non-birthing parents), plus paid medical and caregiver leave (up to 8 weeks)
Paid time off: flexible PTO for exempt employees and up to 15 days annually for non-exempt employees
13+ paid company holidays, and multiple paid coordinated company office closures throughout the year for focus and recharge, plus paid sick or safe time
Mental health and wellness support
Employer-paid basic life and disability coverage
Annual learning and development stipend to fuel your professional growth
Daily meals in our offices, and meal delivery credits as eligible

Fulltime

Senior Principal Software Engineer, Infrastructure

At Docker, we make app development easier so developers can focus on what matter...

Location

United States, Seattle

Salary:

251000.00 - 352000.00 USD / Year

Docker

Expiration Date

Until further notice

Requirements

12+ years of software engineering experience with demonstrated expertise across multiple platform domains (identity, billing, data, infrastructure)
Proven track record architecting and delivering large-scale distributed systems serving millions of users and thousands of enterprise customers
Deep expertise in at least two of: identity/access management systems, billing/monetization platforms, data platforms, or cloud infrastructure
Broad working knowledge across all platform domains with ability to make sound architectural decisions spanning multiple areas
Expert-level understanding of API design, service architecture, and system integration patterns at scale
Experience with cloud platforms (AWS, GCP, or Azure) and modern infrastructure patterns (Kubernetes, service mesh, infrastructure-as-code)
Bachelor’s degree in Computer Science, Engineering, or a related field, or equivalent practical experience
Track record of establishing strategic technical plans that directly enabled business outcomes (revenue growth, cost reduction, market expansion)
Experience translating business strategy into technical architecture and roadmaps
Demonstrated ability to identify and prioritize investments that provide maximum platform leverage

Job Responsibility

Define and own the multi-year technical vision for Docker's foundational platform, encompassing accounts, billing, data, enterprise governance, and infrastructure
Establish strategic plans and objectives for major platform initiatives, making architectural decisions that ensure effective achievement of Docker's business objectives
Contribute to and drive the strategic vision in collaboration with the VP of Engineering, translating organizational strategy into technical roadmaps that span multiple teams and years
Identify and prioritize platform investments that provide maximum leverage—capabilities built once that enable rapid iteration across all Docker products
Develop architectural principles and standards that guide technical decisions across the Bridge organization and influence product engineering teams
Anticipate future business needs and ensure platform architecture provides the flexibility to support Docker's evolving commercial models
Lead large cross-company programs that require coordination across Desktop, Hub, AI, Security, Cloud, and Platform teams
Architect the unified platform interfaces ("Control Planes") that enable product teams to answer canonical questions like "Can this user access this feature?" or "How much has this organization consumed?" without understanding underlying complexity
Drive convergence of fragmented systems across Docker—replacing product-specific implementations with shared platform capabilities for authentication, authorization, billing, and observability
Establish technical contracts between platform and product teams that enable independent velocity while ensuring consistency and reliability

What we offer

Freedom & flexibility: fit your work around your life
Designated quarterly Whaleness Days plus end of year Whaleness break
Home office setup: we want you comfortable while you work
16 weeks of paid Parental leave
Technology stipend equivalent to $100 net/month
PTO plan that encourages you to take time to do the things you enjoy
Training stipend for conferences, courses and classes
Equity

Fulltime

Staff Infrastructure Software Engineer, Enterprise AI

Scale GP is building the next generation of enterprise-grade Generative AI produ...

Location

United States, New York; San Francisco

Salary:

216200.00 - 270250.00 USD / Year

Scale

Expiration Date

Until further notice

Requirements

Proven experience in a senior role
5+ years of full-time software engineering experience
Deep understanding of modern infrastructure practices, including CI/CD, IaC (e.g., Terraform, Helm Charts), container orchestration (e.g., Kubernetes) and observability platforms (e.g., Datadog, Prometheus, Grafana)
Extensive experience with at least one major cloud provider (AWS, Azure, or GCP)
Strong knowledge of security and compliance in enterprise environments, with a focus on access management, data isolation, and customer-specific VPC setups
Proficiency in Python or JavaScript/TypeScript, and SQL

Job Responsibility

Define the architectural patterns for our multi-cloud infrastructure to support secure, reliable, and scalable Agentic workflows for enterprise customers
Lead the infrastructure roadmap with a strong focus on compliance, privacy, and security standards, including designing change management and data isolation strategies
Own the development and maintenance of our best-in-class Agentic observability platform (logging, metrics, tracing, and analytics) to proactively ensure system health and enable rapid incident response
Drive developer efficiency by building automated tooling and championing Infrastructure-as-Code (IaC) paradigms throughout the engineering organization
Solve the toughest engineering problems related to multi-tenancy, data isolation, and high-performance inference at a massive scale, taking end-to-end ownership across the full product lifecycle

What we offer

Comprehensive health, dental and vision coverage
Retirement benefits
A learning and development stipend
Generous PTO
Equity-based compensation
Additional benefits such as a commuter stipend

Fulltime

Software Engineer, Infrastructure & Security

Scale AI is seeking a highly skilled and motivated Software Engineer, AI Infrast...

Location

United States, San Francisco; St. Louis; New York; Washington

Salary:

138000.00 - 259440.00 USD / Year

Scale

Expiration Date

Until further notice

Requirements

An active security clearance, and the ability to obtain a TS/SCI with CI Poly
Full Stack Development: Proficiency in both front-end and back-end development, including experience with modern web development frameworks, programming languages, and databases
Cloud-Native Technologies: Understanding of containerization (e.g., Docker) and container orchestration (e.g., Kubernetes)
Security Focused: Experience with federal compliance frameworks and requirements (e.g., Cloud SRG, FedRAMP, STIG benchmarks)
Problem Solving: Strong analytical and problem-solving skills
Collaboration and Communication: Excellent interpersonal and communication skills
Adaptability and Learning Agility: Willingness to embrace new technologies, learn new skills, and adapt to evolving project requirements

Job Responsibility

Design and implement secure, scalable backend systems for Public Sector customers
Own services or systems and define their long-term health goals
Improve our high engineering standards, tooling, and process
Collaborate with cross-functional teams to define and execute the vision for backend solutions
Participate actively in customer engagements
Contribute to the platform roadmap and product strategy for Scale AI's Public Sector business

What we offer

Comprehensive health, dental and vision coverage
Retirement benefits
A learning and development stipend
Generous PTO
Equity grant
Additional benefits such as a commuter stipend

Fulltime
