Research Intern - AI Network Observability Job at Microsoft Corporation (Mountain View)

Senior Software Engineer - Kubernetes & ServiceMesh

Join us in building Roku’s next-generation cloud-agnostic platform that powers K...

Location

India, Bengaluru

Salary:

Not provided

Roku

Expiration Date

Until further notice

Requirements

Strong hands-on experience with cloud technologies (AWS preferred; GCP or Azure is a plus), specifically in architecting and managing performant, large-scale systems handling significant traffic/data
Deep knowledge of Kubernetes (EKS, GKE, AKS, or similar) and service mesh technologies
Proficiency in Go or another programming language, and in Python or another scripting language
Experience designing infrastructure and building automation tools, while collaborating with internal team members and external stakeholders
Experience building CI/CD pipelines and following modern deployment practices
Familiarity with observability tools (Prometheus, Thanos, Loki, Grafana, etc.)
Ability to work independently and communicate effectively with technical and non-technical stakeholders
Passion for learning and solving complex infrastructure challenges
Experience integrating AI tools to improve processes and reduce operational toil (a plus)

Job Responsibility

Architect, design, and deploy Roku’s next-generation cloud platform and service mesh
Build and own solutions to Roku's compute problems using Docker, Kubernetes, Istio/Envoy, Terraform and scripting to evolve our tech stack and deployments
Proactively drive the research and implementation of new technologies to enhance scalability, reliability, and developer experience
Integrate security best practices into infrastructure design and automation
Build tooling to visualize inefficiencies and optimize costs across shared-tenancy clusters, including network traffic insights, cross-cluster communication efficiency, and cost attribution
Collaborate with internal teams to migrate workloads to Kubernetes + Istio, leveraging open-source observability tools
Work closely with the Observability team to scale monitoring and logging solutions for a holistic view of the platform
Leverage SRE principles to maintain high availability and streamline onboarding workflows
Mentor team members and help define best practices for infrastructure and automation

What we offer

global access to mental health and financial wellness support and resources
healthcare (medical, dental, and vision)
life insurance
accident insurance
disability insurance
commuter benefits
retirement options (401(k)/pension)
time off

Fulltime

Senior Full Stack Engineer - Go / React.js

Rapid7’s Metasploit team is building the future of the world’s best-known softwa...

Location

Czechia, Prague

Salary:

Not provided

Rapid7

Expiration Date

Until further notice

Requirements

6+ years of experience in software development using Go, JavaScript, TypeScript and React (Next.js) or equivalent programming languages
Experience with modern cloud infrastructure (AWS, GCP, or Azure)
Experience with design patterns
Experience with message queues (RabbitMQ, SQS)
Understanding of APIs, interprocess communication, and modern networking and deployment tooling (AWS, Docker)
High level of accountability and ownership
Leading with empathy and strong user focus
Ability to learn and evaluate new technologies quickly
Interest in or experience with offensive security, penetration testing, or SOC analysis
Product driven mindset

Job Responsibility

Develop and enhance AI-powered applications within the Metasploit ecosystem
Architect and implement performant, scalable, and reliable solutions that support AI-driven interactions in web development
Collaborate cross-functionally with researchers, engineers and product teams to push the boundaries of AI in cybersecurity
Ensure an exceptional user experience through user-friendly UI/UX
Diagnose and resolve complex issues, ensuring the reliability and performance of AI-powered products
Build tooling and automation to enhance incident response, developer experience, observability, and internal debugging workflows
Champion your teammates' successes, and support each other when needed

Fulltime

Senior Fullstack Engineer - Go / React.js

Rapid7’s Metasploit team is building the future of the world’s best-known softwa...

Location

United Kingdom

Salary:

Not provided

Rapid7

Expiration Date

Until further notice

Requirements

6+ years of experience in software development using Go, JavaScript, TypeScript and React (Next.js) or equivalent programming languages
Experience with modern cloud infrastructure (AWS, GCP, or Azure)
Experience with design patterns
Experience with message queues (RabbitMQ, SQS)
Understanding of APIs, interprocess communication, and modern networking and deployment tooling (AWS, Docker)
High level of accountability and ownership, taking responsibility for outcomes and proactively driving work forward with minimal oversight
Leading with empathy and strong user focus
Ability to learn and evaluate new technologies quickly, digging into code to find answers
Interest in or experience with offensive security, penetration testing, or SOC analysis
Product driven mindset

Job Responsibility

Develop and enhance AI-powered applications within the Metasploit ecosystem
Architect and implement performant, scalable, and reliable solutions that support AI-driven interactions in web development
Collaborate cross-functionally with researchers, engineers and product teams to push the boundaries of AI in cybersecurity
Ensure an exceptional user experience through user-friendly UI/UX
Diagnose and resolve complex issues, ensuring the reliability and performance of AI-powered products
Build tooling and automation to enhance incident response, developer experience, observability, and internal debugging workflows
Champion your teammates' successes, and support each other when needed

Fulltime

Enterprise Account Executive

We are looking for a fast-paced, client-obsessed Account Executive with an entre...

Location

Australia

Salary:

250000.00 - 300000.00 USD / Year

Arize

Expiration Date

Until further notice

Requirements

5+ years enterprise SaaS sales experience: Hungry, aggressive and motivated
Familiarity or willingness to learn sales technologies to find and attract prospects
Self-starter and comfortable working in limited process environments
Full-cycle sales experience and ability to navigate the complexities of enterprise deals
Fast-paced and focused on helping prospects / customers
Team player: Collaboration with peers and other organizations within Arize is critical to success; we deeply value the success of the collective team over individual gains
Strong communication skills: Clearly and objectively communicate observations from the field

Job Responsibility

Be a networker, seller and closer
Build relationships with AI/ML stakeholders and be an active member of the community
Conduct discovery with prospects and share the Arize vision
Run a sophisticated prospecting strategy to 'get the word out' and find deals
Create sales plays, write talk tracks and strategically identify new business opportunities
Deeply research accounts, stakeholders and competitors
Manage proof of concepts, drive adoption and grow accounts
Manage and navigate internal / external stakeholders to ensure success
Understand use cases, scope licensing and find more workloads
BANT or MEDDIC methodology preferred

What we offer

competitive equity package
medical
dental
vision
401(k) plan
unlimited paid time off
generous parental leave plan
others for mental and wellness support
WFH monthly stipend to pay for co-working spaces

Fulltime

Business Consultant, Digital Commerce

As a Consultant - Digital Commerce, you will work as part of our Strategy and Gr...

Location

United States

Salary:

Not provided

Columbus United Kingdom

Expiration Date

Until further notice

Requirements

Deep expertise in Retail, Manufacturing, Food and Beverages, and Life Sciences

Job Responsibility

Leading strategic commitments
Gathering information and creating insight based on analysis to draw meaningful conclusions, identify implications for recommendations, and gain understanding and acceptance by the customer
Follow Columbus framework approaches, tailoring them to specific customer needs
Collaborating and contributing to the Advisory competence network, helping to continuously improve the offering through retrospectives on completed work
Manage activities and deliveries according to plan, review drafts and provide feedback / coaching to other project members
Act as a catalyst and coach in projects that span over your area of expertise
Evaluate and manage risks and problems and ensure that the project objectives are achieved
Use AI appropriately to drive quality and accelerate delivery
Establish a reliable relationship with our clients’ management where they see you as an advisor
Develop and maintain strong, trust-based relationships with our customers (at all levels)

What we offer

Health Insurance
Life Insurance
Dental Insurance
Vision Insurance
Short-Term Disability
Long-Term Disability
paid vacation
sick leave
holidays
401(k)

Fulltime

Staff Software Engineer - AI/ML Infra

GEICO AI platform and Infrastructure team is seeking an exceptional Senior ML Pl...

Location

United States, Chevy Chase; New York City; Palo Alto

Salary:

115000.00 USD / Year

Geico

Expiration Date

Until further notice

Requirements

Bachelor’s degree in Computer Science, Engineering, or related technical field (or equivalent experience)
8+ years of software engineering experience with focus on infrastructure, platform engineering, or MLOps
3+ years of hands-on experience with machine learning infrastructure and deployment at scale
2+ years of experience working with Large Language Models and transformer architectures
Proficient in Python; strong skills in Go, Rust, or Java preferred
Proven experience working with open source LLMs (Llama 2/3, Qwen, Mistral, Gemma, Code Llama, etc.)
Proficient in Kubernetes including custom operators, helm charts, and GPU scheduling
Deep expertise in Azure services (AKS, Azure ML, Container Registry, Storage, Networking)
Experience implementing and operating feature stores (Chronon, Feast, Tecton, Azure ML Feature Store, or custom solutions)

Job Responsibility

Design and implement scalable infrastructure for training, fine-tuning, and serving open source LLMs (Llama, Mistral, Gemma, etc.)
Architect and manage Kubernetes clusters for ML workloads, including GPU scheduling, autoscaling, and resource optimization
Design, implement, and maintain feature stores for ML model training and inference pipelines
Build and optimize LLM inference systems using frameworks like vLLM, TensorRT-LLM, and custom serving solutions
Ensure 99.9%+ uptime for ML platforms through robust monitoring, alerting, and incident response procedures
Design and implement ML platforms using DataRobot, Azure Machine Learning, Azure Kubernetes Service (AKS), and Azure Container Instances
Develop and maintain infrastructure using Terraform, ARM templates, and Azure DevOps
Implement cost-effective solutions for GPU compute, storage, and networking across Azure regions
Ensure ML platforms meet enterprise security standards and regulatory compliance requirements
Evaluate and potentially implement hybrid cloud solutions with AWS/GCP as backup or for specialized use cases

What we offer

Comprehensive Total Rewards program that offers personalized coverage tailor-made for you and your family’s overall well-being
Financial benefits including market-competitive compensation; a 401K savings plan vested from day one that offers a 6% match; performance and recognition-based incentives; and tuition assistance
Access to additional benefits like mental healthcare as well as fertility and adoption assistance
Supports flexibility: we provide workplace flexibility as well as our GEICO Flex program, which offers the ability to work from anywhere in the US for up to four weeks per year

Fulltime

Staff Software Engineer - AI/ML Platform

GEICO AI platform and Infrastructure team is seeking an exceptional Senior ML Pl...

Location

United States, Chevy Chase; New York City; Palo Alto

Salary:

115000.00 USD / Year

Geico

Expiration Date

Until further notice

Requirements

Bachelor’s degree in Computer Science, Engineering, or related technical field (or equivalent experience)
8+ years of software engineering experience with focus on infrastructure, platform engineering, or MLOps
3+ years of hands-on experience with machine learning infrastructure and deployment at scale
2+ years of experience working with Large Language Models and transformer architectures
Proficient in Python; strong skills in Go, Rust, or Java preferred
Proven experience working with open source LLMs (Llama 2/3, Qwen, Mistral, Gemma, Code Llama, etc.)
Proficient in Kubernetes including custom operators, helm charts, and GPU scheduling
Deep expertise in Azure services (AKS, Azure ML, Container Registry, Storage, Networking)
Experience implementing and operating feature stores (Chronon, Feast, Tecton, Azure ML Feature Store, or custom solutions)

Job Responsibility

Design and implement scalable infrastructure for training, fine-tuning, and serving open source LLMs (Llama, Mistral, Gemma, etc.)
Architect and manage Kubernetes clusters for ML workloads, including GPU scheduling, autoscaling, and resource optimization
Design, implement, and maintain feature stores for ML model training and inference pipelines
Build and optimize LLM inference systems using frameworks like vLLM, TensorRT-LLM, and custom serving solutions
Ensure 99.9%+ uptime for ML platforms through robust monitoring, alerting, and incident response procedures
Design and implement ML platforms using DataRobot, Azure Machine Learning, Azure Kubernetes Service (AKS), and Azure Container Instances
Develop and maintain infrastructure using Terraform, ARM templates, and Azure DevOps
Implement cost-effective solutions for GPU compute, storage, and networking across Azure regions
Ensure ML platforms meet enterprise security standards and regulatory compliance requirements
Evaluate and potentially implement hybrid cloud solutions with AWS/GCP as backup or for specialized use cases

What we offer

Comprehensive Total Rewards program that offers personalized coverage tailor-made for you and your family’s overall well-being
Financial benefits including market-competitive compensation; a 401K savings plan vested from day one that offers a 6% match; performance and recognition-based incentives; and tuition assistance
Access to additional benefits like mental healthcare as well as fertility and adoption assistance
Supports flexibility: we provide workplace flexibility as well as our GEICO Flex program, which offers the ability to work from anywhere in the US for up to four weeks per year

Fulltime

Staff Software Engineer - AI/ML Infra

GEICO AI platform and Infrastructure team is seeking an exceptional Senior ML Pl...

Location

United States, Palo Alto

Salary:

90000.00 USD / Year

Geico

Expiration Date

Until further notice

Requirements

Bachelor’s degree in Computer Science, Engineering, or related technical field (or equivalent experience)
8+ years of software engineering experience with focus on infrastructure, platform engineering, or MLOps
3+ years of hands-on experience with machine learning infrastructure and deployment at scale
2+ years of experience working with Large Language Models and transformer architectures
Proficient in Python; strong skills in Go, Rust, or Java preferred
Proven experience working with open source LLMs (Llama 2/3, Qwen, Mistral, Gemma, Code Llama, etc.)
Proficient in Kubernetes including custom operators, helm charts, and GPU scheduling
Deep expertise in Azure services (AKS, Azure ML, Container Registry, Storage, Networking)
Experience implementing and operating feature stores (Chronon, Feast, Tecton, Azure ML Feature Store, or custom solutions)

Job Responsibility

Design and implement scalable infrastructure for training, fine-tuning, and serving open source LLMs (Llama, Mistral, Gemma, etc.)
Architect and manage Kubernetes clusters for ML workloads, including GPU scheduling, autoscaling, and resource optimization
Design, implement, and maintain feature stores for ML model training and inference pipelines
Build and optimize LLM inference systems using frameworks like vLLM, TensorRT-LLM, and custom serving solutions
Ensure 99.9%+ uptime for ML platforms through robust monitoring, alerting, and incident response procedures
Design and implement ML platforms using DataRobot, Azure Machine Learning, Azure Kubernetes Service (AKS), and Azure Container Instances
Develop and maintain infrastructure using Terraform, ARM templates, and Azure DevOps
Implement cost-effective solutions for GPU compute, storage, and networking across Azure regions
Ensure ML platforms meet enterprise security standards and regulatory compliance requirements
Evaluate and potentially implement hybrid cloud solutions with AWS/GCP as backup or for specialized use cases

What we offer

Comprehensive Total Rewards program that offers personalized coverage tailor-made for you and your family’s overall well-being
Financial benefits including market-competitive compensation; a 401K savings plan vested from day one that offers a 6% match; performance and recognition-based incentives; and tuition assistance
Access to additional benefits like mental healthcare as well as fertility and adoption assistance
Supports flexibility: we provide workplace flexibility as well as our GEICO Flex program, which offers the ability to work from anywhere in the US for up to four weeks per year

Fulltime
