collaborates closely with Product Managers, Data Engineers, Infrastructure, and Site Reliability Engineering (SRE) teams on a daily.... Ownership-driven. You take responsibility for systems end-to-end, including reliability and operational health. Technically...
improvements across agentic workflows. Oversee Live Site operations for agentic systems, ensuring reliability, rapid incident... for agent interoperability, real-time processing, and fault tolerance. Drive performance optimization and observability...
, calibration & tuning automation, data pipelines, observability, and reliability. Use AI every day to go faster and improve quality... (simulation-in-the-loop and hardware-in-the-loop), live-site standards, and measurable quality gates across the stack. Collaborate...
cloud services and distributed systems at scale, including reliability, observability, and live-site operations. Deep... across the product lifecycle (security, privacy, safety, accessibility, performance, reliability, and cost). Write proof...
. Establish and maintain high engineering standards for code quality, testing, observability, reliability, and performance... benefits include comprehensive health care coverage, on-site health and wellness centers, a retirement savings plan, backup...
standards, including scalability, reliability, observability, and fault tolerance. Optimize systems for performance, cost... stipends and much (much) more. Currently work for Yahoo? Please apply on our internal career site....
, reliability, and performance. Implement Infrastructure-as-Code using Terraform and Ansible to provision, manage, and scale Azure..., observability, and continuous improvement across engineering teams. Collaborate across teams and with external partners, including...
and Cisco’s global engineering capabilities. Our work spans networking, security, observability, and customer experience... — designing and deploying foundation models that enhance reliability, strengthen security, prevent downtime, and deliver...
, and reliability. Support the training and fine-tuning of Large and Small Language Models (LLMs and SLMs) for domain- and task... and model observability. Exposure to edge computing environments. Experience on various AI cloud platforms such as AWS...
and transaction data. Develop scalable services that connect with multiple eBay systems, establishing standards for observability..., reliability, and operational excellence. Define guardrails, fallback logic, and safety patterns to ensure alignment...
. (In most cases, Comcast prefers to have employees on-site collaborating unless the team has been designated as virtual due to the... integration, and observability using Spring Boot Actuator, Elastic logs, Kibana, and AppDynamics. Data Processing & Automation...
to establish robust observability standards and ensure smooth production deployment and post-production reliability. Work... within 31 miles of Tustin, CA or Austin, TX will need to be able to work 2 days on-site) Position Summary: We are seeking...
architecture, and asynchronous workflows. Experience with telemetry, observability (e.g., App Insights), and live site operations... the quality and reliability of the update publishing service pipeline. You will work with other teams across WSD...
-and was named one of America's best startup employers by Forbes (#12 out of 500). Since day one, our Bangalore site has been... reliability, data integrity, and performance at scale. HR Applications: Develops essential tools and interfaces for customers...
, and observability practices to ensure reliability and performance at scale. Act as a Designated Responsible Individual (DRI) for live..., and microservices on Azure (or similar cloud). Deep understanding of reliability engineering, observability (metrics, logs, tracing...
between deep technical work and on-site collaboration with the engineering teams of global industry leaders. If you are passionate... performance, scalability, and reliability. Develop production-ready code for highly available systems, SaaS applications...
between deep technical work and on-site collaboration with the engineering teams of global industry leaders. If you are passionate..., acceptance thresholds, user rating flows, A/B tests). Instrument AI features with strong observability and testing: Logging...
solutions, and patterns that will improve the availability, reliability, efficiency, observability, and performance of products...). Experience with Virtual Machine Scale Sets (VMSS). Experience with live site operations, Site Reliability Engineering (SRE...
integration tests, and ensure observability and reliability for live services. Participate in on-call rotations to support.... Integrate AI-driven telemetry and automation to enhance security, reliability, and performance. Collaborate with cross...