infrastructure services and making sure they are scalable and are reliable. Site Reliability Engineering (SRE) combines software... and operating hyper-scale datacenters, managing the life cycle of server fleet, providing cloud solutions, and developing various...
Site Reliability Engineering (SRE) at NVIDIA is an engineering discipline to design, build and maintain large scale... at NVIDIA ensures that our internal and external facing GPU cloud services run with maximum reliability and uptime as promised to the...
services rapidly, consistently, and securely. Exemplify cloud-native site reliability best practices. Write code...'s most critical safety and justice issues with our ecosystem of devices and cloud software. Like our products, we work better together...
will be primarily on-site with residency commutable to one of our offices required. Responsibilities As a Principal Engineer of the... systems, CI/CD tooling, and automating cloud-based highly available, high performing applications. Key Skills...
cloud infrastructure within a FedRAMP compliant environment. You'll drive operational excellence, champion SRE..., performance, observability, troubleshooting, security, and reliability. Our Infrastructure Platform stack includes Terraform...
individual to join our team. This role involves ensuring the reliability and performance of our systems and infrastructure. The... ideal candidate will be adept at managing deployments, configurations, and incident responses in cloud environments...
team responsible for ensuring the reliability and support of container platforms both on-premise and in external clouds... like MS Azure, AWS, and Google Cloud. This role involves critical monitoring, troubleshooting, and lifecycle support, offering...
Responsible for ensuring the reliability, performance, and security of our internal and external facing platforms.... This role combines deep expertise in Kubernetes, cloud infrastructure, observability, and Infrastructure-as-Code...
& hybrid platforms. Minimum Qualifications BS/MS in Computer Science or Equivalent At least 3-5 years in a Reliability... in a public, private, or hybrid cloud environment Hands-on experience managing large numbers of diverse systems...
focused on innovation and reliability. Qualifications 7+ Years of experience needed Proven experience in a similar role... within a fast-paced environment. Strong knowledge of cloud platforms and containerization technologies. Experience with automation...
, drift detection, and config/state tracking (e.g., Azure Policy, Resource Graph). Reliability & Observability: Define SLOs..., network engineering, DevOps, or systems administration. 3+ years operating large-scale cloud services in Azure...
highly reliable, scalable, and secure cloud infrastructure within a FedRAMP compliant environment. You'll drive operational... automation, architecture, performance, observability, troubleshooting, security, and reliability. Our Infrastructure Platform...
About EngFlow At EngFlow, we help developers save time by accelerating software builds and tests. Our cloud-based... our engineers to move quickly and confidently. Key Responsibilities Design, build, and maintain cloud infrastructure...
both on-prem and cloud. Join us in this exciting endeavor! What You Will Be Doing: Lead initiatives to transform IT Compute... Core Team, architecture to build new service offerings across On-Prem and Cloud You will design, scale, and deploy core...
technologies. We focus on engineering excellence and we attract the best talent in our industry. Our cloud services are built..., and processes to enhance system uptime and reliability. Continuously evaluate and recommend improvements to platform infrastructure...
Job Description: Min 3-5 years of Service reliability/operation experience running large scale, high performance... applications in a hybrid environment (on-prem and cloud). Min 3-5 years of experience writing automation scripts and building...