Observability Engineer – Trading

Canada, Toronto · 120000.00 - 250000.00 CAD / Year · Job Posted March 19, 2026

Job Description

My client is seeking an engineer with strong Linux experience and expertise in the observability space. The organisation uses a number of different tools to monitor its extensive estate.

Job Responsibility

  • Working across monitoring and observability platforms within a trading environment
  • Supporting and enhancing monitoring across a large-scale estate
  • Working with multiple technical teams to ensure visibility and reliability across systems
  • Working with tools such as VictoriaMetrics, Prometheus, Grafana, Vector, ELK and AlertManager
  • Operating within a Linux-based environment
  • Utilising Python where required
  • Working with Git
  • Supporting environments where Kubernetes understanding is advantageous

Requirements

  • Strong experience within the Observability space
  • Strong Linux experience
  • Experience with Prometheus, Grafana, VictoriaMetrics, Vector, ELK and AlertManager
  • Python knowledge
  • Experience with Git
  • Understanding of Kubernetes is a distinct advantage

What we offer

Bonus

Similar Jobs for Observability Engineer – Trading

8 matching positions

Trading Systems Engineer

We are looking for a highly skilled and hands-on Software Engineer to drive mode...
Location: Canada, Mississauga
Salary: 94300.00 - 141500.00 USD / Year
Citi
Expiration Date: Until further notice
Requirements
  • 5-8 years of relevant experience
  • Experience in systems analysis and programming of software applications
  • Experience in managing and implementing successful projects
  • Working knowledge of consulting/project management techniques/methods
  • Ability to work under pressure and manage deadlines or unexpected changes in expectations or requirements
  • Strong hands-on experience in coding (Kotlin, Java, Python)
  • Deep expertise in system design and microservices architecture
  • Spring Boot, OpenShift
  • Event-Driven & Messaging Systems (Kafka, Solace, Tibco, MQ)
  • Low-Latency & High-Performance Computing
Job Responsibility
  • Design, develop, and maintain robust, scalable, and high-performance applications
  • Implement trunk-based development practices to enable continuous integration and rapid delivery
  • Develop clean, maintainable, and testable code following SOLID principles and software design best practices
  • Ensure high levels of unit test coverage, test-driven development (TDD), and behavior-driven development (BDD)
  • Actively contribute to hands-on coding, code reviews, and refactoring to maintain high engineering standards
  • Drive the adoption of modern engineering ways of working, including Agile, DevOps, and CI/CD
  • Advocate for automated testing, infrastructure as code, and continuous monitoring to enhance software reliability
  • Apply Behavior-Driven Development (BDD), Test-Driven Development (TDD), and unit testing to ensure code quality and functionality
  • Conduct thorough code reviews, ensuring adherence to best practices in readability, performance, and security
  • Implement and enforce secure coding practices, performing vulnerability assessments and ensuring compliance with security standards

Principal Engineer I - Cloud Observability

We’re not just building better tech. We’re rewriting how data moves and what the...
Location: India
Salary: Not provided
Confluent
Expiration Date: Until further notice
Requirements
  • 15+ years of hands-on software development experience, with the ability to anticipate future technical needs for the product and craft plans to realize them
  • Taking ideas to production is something we look for
  • Ready to roll up your sleeves - code, debug, design - do whatever it takes to ship the product to production
  • Experience building and operating large-scale systems. Solid understanding of basic systems operations (disk, network, operating systems, etc). Experience running production services in the cloud
  • Strong fundamentals in distributed systems design and development. Solid fundamentals in concurrent and multi-threading programming
  • A self-starter with the ability to work effectively in teams. Proactively identifying the symptoms of technical issues, reasoning about their causes, and then fixing the root causes is needed
  • Timely shipping of deliverables
  • Being able to trade off short-term technical decisions against the long term. Move fast, build in increments, and iterate. A sense of urgency, a mindset towards achieving results, and excellent prioritization skills
  • Ability to influence the team, peers and upper management in technology decisions using effective communication and collaborative techniques
  • Degree in Computer Science, Engineering or equivalent experience. Understanding of various technologies, programming paradigms and frameworks is needed. Ability to be pragmatic and trade off their usage in production is essential
Job Responsibility
  • You will work with a team of engineers and architects to help evolve Confluent Observability features
  • Work closely with product management, engineering leadership, and other key stakeholders across various teams in Confluent to build and drive the overall roadmap
  • Be a strong technical voice beyond Confluent Observability, across the rest of Confluent
  • Influence the overall domain health and operational hygiene for Confluent Observability
  • We need a tech champion for the observability capabilities we provide to our customers
  • You are expected to review designs and code and improve our technical standards
  • We are looking at you to lead the technology charter for our observability features in Confluent Cloud and in hybrid scenarios with Confluent Platform
  • Mentor a team of high-performing engineers and leads, helping them continue to grow their skill sets through hands-on experience and mentorship
  • Be a strong technical leader and representative for engineering teams in India
  • Provide timely and productive feedback, encourage a growth mindset, and advise team members in setting and working toward personal development goals
What we offer
  • Remote-First Work
  • Robust Insurance Benefits
  • Flexible Time Away
  • The Best Teammates
  • Experience Ambassadors
  • Open and Honest Culture
  • Well-Being and Growth
  • Fulltime

Platform Engineering Consultant

We’re looking for engineers who enjoy building and operating infrastructure that...
Location: United States; Europe
Salary: 75000.00 - 100000.00 GBP / Year
Linux Recruit
Expiration Date: Until further notice
Requirements
  • Strong experience with Kubernetes and cloud-native infrastructure
  • Ability to troubleshoot distributed systems in complex environments
  • Familiarity with Infrastructure as Code, observability tooling, and automation
  • Ability to adapt quickly, reason through unfamiliar systems, and communicate effectively with both technical and non-technical stakeholders
Job Responsibility
  • Building and operating infrastructure that supports open source software at scale
  • Solving complex platform and reliability challenges
  • Collaborating closely with engineering teams
  • Helping organisations design systems they can run and evolve independently over time
  • Contributing to infrastructure across a range of industries
  • Working on Kubernetes platforms, cloud infrastructure, automation, and observability systems
  • Diagnosing difficult production issues
  • Evaluating trade-offs
  • Recommending appropriate tools and approaches
  • Working alongside engineering teams to improve platform maturity
  • Fulltime

Fellow - FPGA architecture

We are seeking a Fellow Architect to define and drive next‑generation architectu...
Location: United States, San Jose
Salary: 268320.00 - 402480.00 USD / Year
AMD
Expiration Date: Until further notice
Requirements
  • Strong Expertise in FPGA configuration, Partial Reconfiguration, and bitstream architecture
  • Experience with readback/debug infrastructure and field diagnostics
  • Background in ASIC/SoC design flows
  • Familiarity with FPGA toolchains (synthesis, P&R, PR flows)
  • Ability to define architecture across silicon, tools, and software layers
  • BS, MS, or PhD in Electrical/Computer Engineering or related field
  • Recognized technical leader in FPGA systems or architecture with significant architectural contributions, patents, publications, or industry impact
Job Responsibility
  • Own FPGA configuration architecture, including bitstream, boot flows, interfaces, compression, security, and reliability
  • Define next‑generation Partial Reconfiguration architecture for low‑latency, secure dynamic updates
  • Drive innovation in runtime reconfiguration for high‑availability and adaptive systems
  • Architect readback, debug, and observability infrastructure for validation and in‑field diagnostics
  • Define configuration security, including secure boot, key management, and update flows
  • Drive system trade-offs across configuration bandwidth, latency, and power, including memory and interconnect interactions
  • Develop clear architecture specifications for configuration, PR, and debug subsystems
  • Align with software/tools teams for end‑to‑end configuration and PR flows
  • Collaborate with silicon and design teams for scalable high‑quality implementation
  • Influence product roadmap and customer engagement for PR‑driven solutions
What we offer
  • Benefits offered are described in "AMD benefits at a glance"
  • Fulltime

Software Engineer, Systems

Meta is seeking a Staff Systems Software Engineer to design and build the founda...
Location: Australia, Sydney
Salary: Not provided
Meta
Expiration Date: Until further notice
Requirements
  • 8+ years of experience in systems software engineering, including work on operating systems, runtime environments, low-level networking, storage systems, or large-scale platform infrastructure
  • Experience leading the end-to-end technical design and delivery of major systems software initiatives, including architecture definition, cross-team coordination, and production rollout
  • Experience diagnosing and resolving complex systems-level issues such as memory management bugs, concurrency and synchronization errors, or latency regressions using advanced debugging and profiling tools
  • Experience building reliable, observable systems software with well-defined SLOs, automated testing, staged rollout strategies, and production monitoring
  • Experience communicating systems architecture decisions and engineering trade-offs in writing to technical and non-technical audiences
Job Responsibility
  • Architect and implement large-scale systems software components, including low-level platform services, runtime environments, or infrastructure frameworks that underpin Meta's product ecosystem
  • Lead the technical design of systems initiatives, evaluating trade-offs across performance, reliability, scalability, and maintainability to drive sound engineering decisions
  • Identify and resolve complex systems-level performance bottlenecks using profiling, instrumentation, and advanced debugging techniques including static analysis and trace-based diagnostics
  • Define and enforce service level objectives, build observability infrastructure including dashboards and alerting, and drive mean-time-to-mitigation improvements during production incidents
  • Establish and evolve coding standards, testing strategies, and rollout practices for systems software across the team, including automated resiliency and overload testing
  • Leverage AI-assisted development workflows to accelerate systems design, code generation, and cross-disciplinary analysis, applying sound judgment on when deep systems expertise is required
  • Collaborate with cross-functional partners across infrastructure, product engineering, and hardware teams to align systems architecture with broader platform requirements
  • Drive execution of multi-team systems initiatives by coordinating dependencies, managing phased rollouts and migrations, and proactively surfacing and mitigating technical risks
  • Mentor other engineers on systems design principles, debugging methodologies, and AI-augmented development practices, and contribute to onboarding and engineering programs
  • Communicate technical decisions, architectural trade-offs, and systems health metrics clearly in writing and presentations to both engineering and non-engineering stakeholders
What we offer
  • Equal Employment Opportunity
  • Reasonable accommodations for qualified individuals with disabilities and disabled veterans
  • Fulltime

Senior Data Scientist

We are looking for an experienced Senior Data Scientist to drive advanced analyt...
Location: Poland
Salary: Not provided
Valtech
Expiration Date: Until further notice
Requirements
  • Strong experience in machine learning, statistics, and applied data science
  • Experience with causal inference, experimentation, or decision science methodologies
  • Solid understanding of forecasting, optimization, or analytical modeling techniques
  • Strong programming skills in Python and SQL
  • Experience building and deploying production-ready data science or ML systems
  • Familiarity with model lifecycle management (training, deployment, monitoring)
  • Hands-on experience with at least one major cloud platform: Azure (preferred), AWS, or GCP
  • Experience working with modern data and AI platforms (e.g., Azure ML / Azure AI, Databricks, or similar ecosystems)
  • Experience working with complex, multi-source datasets (e.g., transactional, behavioral, operational data)
  • Ability to translate business problems into analytical frameworks
Job Responsibility
  • Develop and deploy machine learning models across use cases (forecasting, optimization, recommendation systems)
  • Apply statistical, predictive, and prescriptive modeling techniques to solve business problems
  • Build reusable modeling frameworks that can scale across multiple domains
  • Design and implement causal inference methods (e.g., uplift modeling, experiments, quasi-experimental methods)
  • Translate observational and experimental data into actionable business insights
  • Embed causal reasoning into decision systems that guide actions (e.g., optimization, prioritization, trade-offs)
  • Integrate GenAI capabilities (e.g., LLMs, RAG pipelines, agent-based systems) into data science workflows
  • Contribute to the development of intelligent agents and AI-assisted decision-making systems
  • Combine structured data models with unstructured data and GenAI outputs
  • Build forecasting models (time-series, probabilistic, causal) to support planning and operations
What we offer
  • 24 working days of paid vacation
  • National holidays covered
  • Sick leave (up to 20/year)
  • Unpaid leave (up to 20/year)
  • Medical insurance
  • Multisport card OR Multikafeteria
  • Maternity & paternity leave support
  • Internal workshops & learning initiatives
  • Professional certifications reimbursement
  • Participation in professional local & global communities
  • Fulltime

Senior Data Scientist

Location: Ukraine
Salary: Not provided
Valtech
Expiration Date: Until further notice
Requirements
  • Strong experience in machine learning, statistics, and applied data science
  • Experience with causal inference, experimentation, or decision science methodologies
  • Solid understanding of forecasting, optimization, or analytical modeling techniques
  • Strong programming skills in Python and SQL
  • Experience building and deploying production-ready data science or ML systems
  • Familiarity with model lifecycle management (training, deployment, monitoring)
  • Hands-on experience with at least one major cloud platform: Azure (preferred), AWS, or GCP
  • Experience working with modern data and AI platforms (e.g., Azure ML / Azure AI, Databricks, or similar ecosystems)
  • Experience working with complex, multi-source datasets (e.g., transactional, behavioral, operational data)
  • Ability to translate business problems into analytical frameworks
Job Responsibility
  • Develop and deploy machine learning models across use cases (forecasting, optimization, recommendation systems)
  • Apply statistical, predictive, and prescriptive modeling techniques to solve business problems
  • Build reusable modeling frameworks that can scale across multiple domains
  • Design and implement causal inference methods (e.g., uplift modeling, experiments, quasi-experimental methods)
  • Translate observational and experimental data into actionable business insights
  • Embed causal reasoning into decision systems that guide actions (e.g., optimization, prioritization, trade-offs)
  • Integrate GenAI capabilities (e.g., LLMs, RAG pipelines, agent-based systems) into data science workflows
  • Contribute to the development of intelligent agents and AI-assisted decision-making systems
  • Combine structured data models with unstructured data and GenAI outputs
  • Build forecasting models (time-series, probabilistic, causal) to support planning and operations
What we offer
  • Medical insurance
  • Sports reimbursement budget
  • Home office support
  • A number of free psychological and legal consultations
  • Maternity and paternity leave support
  • Internal workshops and learning initiatives
  • English language classes compensation
  • Professional certifications reimbursement
  • Participation in professional local and global communities
  • Growth Framework to manage expectations and define the steps to move towards the selected career
  • Fulltime

Engineering Manager, Storage SRE

Airbnb was born in 2007 when two hosts welcomed three guests to their San Franci...
Location: United States
Salary: 212000.00 - 265000.00 USD / Year
Airbnb
Expiration Date: Until further notice
Requirements
  • 9+ years of relevant industry experience in database infrastructure, storage systems, or site reliability engineering
  • 3+ years of engineering management experience leading SRE, infrastructure, or platform teams
  • Demonstrated track record of building high-performing teams by hiring strong engineers, developing talent, and maintaining team health through periods of change
  • Strong technical foundation with the ability to partner with technical leads on architectural decisions, roadmap tradeoffs, and delivery quality
  • Proven ability to lead a team through a technology transition while maintaining operational rigor on existing systems
  • Solid understanding of distributed systems, cloud infrastructure, and production database operations
  • Strong communicator able to cut through ambiguity and represent the team credibly to senior leadership
Job Responsibility
  • Own the Storage SRE technical roadmap across a 12+ month horizon, setting the direction for how the team deepens its operational model as it takes on new database technologies alongside its existing systems
  • Lead and grow a team of engineers by providing mentorship, timely feedback, and career development support to build a high-performing, inclusive team
  • Drive the generalization of cluster lifecycle, schema management, and observability tooling as the team broadens its database technology support
  • Partner with engineering teams across Airbnb as the primary expert on reliable database adoption, helping them work with mission-critical storage systems safely and efficiently at scale
  • Establish and uphold operational excellence standards covering on-call strategy, incident response, backup and disaster recovery, and systemic reliability improvements
  • Collaborate with storage infrastructure and platform teams to ensure Storage SRE's tooling and observability stay current as the broader storage platform evolves
  • Improve the developer experience for engineers working with high-traffic transactional storage systems
  • Drive performance, security, scalability, and availability initiatives across Airbnb's database systems
  • Communicate technical strategy and trade-offs clearly to engineers and senior leadership
What we offer
  • bonus
  • equity
  • benefits
  • Employee Travel Credits
  • Fulltime