quality, testing, observability, reliability, and performance. Oversee end-to-end delivery processes, including requirements... and programs to meet employee needs, based on eligibility. These benefits include comprehensive health care coverage, on-site...
with observability, diagnostics, and live-site operations for mission-critical services. Experience working in environments with limited..., monitoring, and live-site support. Collaborate with cross-functional partners and partner engineering teams to translate product...
High-Impact Backend Role | AI-Native Legal Platform | New York / On-Site | Full-time | Mid–Senior.... This is a product-first backend role where reliability, safety, and extensibility are mission-critical. What You'll Build: Core...
, applying site reliability engineering principles to drive automation, observability, and resilience across the data platform... Required Qualifications: 5+ years in platform engineering, data platform operations, site reliability engineering, DevOps, or related roles...
: Minimum 7 years of software-related experience required, with a mixture of Site Reliability, DevOps, or Release Engineering... and observability of systems at scale and detect and alert on trends of information. Define metrics to ensure the high performance...
-site health: improve observability, monitoring/alerting, incident response, and reduce time-to-diagnosis through systemic... of improving reliability, performance, and operational excellence through observability and systematic engineering practices....
. Integrate AI systems with code repositories, CI/CD pipelines, observability tools, and security/compliance frameworks to enhance... reliability and performance. Drive best practices, design reviews, and technical direction, ensuring data governance, security...
delivery schedules, drive alignment across partner teams, and ensure proper end-to-end testing, live-site coverage, scalability..., production reliability, and security hardening for both protections and detections. Hold accountability as a designated...
, reliability, fault tolerance, and cost optimization. Experience using observability tools (logging, metrics, distributed tracing..., security best practices, and deployment infrastructure. Maintain operations of live site services on a rotational on-call basis...
observability tools (logging, metrics, tracing) to diagnose service issues and improve system reliability. Experience.... Build extensible, maintainable services and features with strong diagnosability, reliability, and production-readiness...
. Guarantee Reliability and Security: Define and meet rigorous SLIs/SLOs by engineering robust observability stacks (Prometheus... architectures. Observability & Reliability Mindset: Experience building comprehensive monitoring and logging frameworks (ELK...
. Ensure secure, high-quality product delivery, overseeing system architecture and code quality. Champion Live Site culture..., ensuring reliability and customer delight and mentor engineers, shaping the vision for agentic AI-powered work management. Seek...
improvements across agentic workflows. Oversee Live Site operations for agentic systems, ensuring reliability, rapid incident... for agent interoperability, real-time processing, and fault tolerance. Drive performance optimization and observability...
advanced deployment and support of enterprise software solutions, digital intelligence (monitoring and observability... their development and deployment processes. Mentor junior engineers on automation, observability, and continuous delivery concepts...
, and deployment of applications. System Reliability and Scalability: Implementing Site Reliability Engineering (SRE) principles... to enhance system reliability, availability, and performance. Monitoring and Optimization: Implementing monitoring...
, reliability, and scalability of AI platforms. Implement observability for agentic AI systems to ensure reliability, transparency... for its people and its customers. Respect for both work and play, with vehicles that are equally at home at a camp site...
performance optimization and observability improvements across agentic workflows. Oversee Live Site operations for agentic systems..., ensuring reliability, rapid incident response, and continuous improvement. Collaborate with partner engineering teams to build...
with infrastructure, platform, security, and product teams to embed AI capabilities into operational systems, observability platforms..., reliability engineering, and automation workflows. Conducts architecture and design reviews for AI platforms, data systems, and ML...
GenAI observability pipelines to track trace-level data, prompt inputs and outputs, and model latency. Collaborate closely..., embeddings, reranking) and context engineering focusing on reliability, cost, and latency optimization. Strong agent design...
collaborates closely with Product Managers, Data Engineers, Infrastructure, and Site Reliability Engineering (SRE) teams on a daily.... Ownership-driven. You take responsibility for systems end-to-end, including reliability and operational health. Technically...