and resilience engineer to lead the “AI workloads fault injection and resilience at scale” effort for vLLM and llm-d (distributed... LLM inference on Kubernetes/OpenShift). You will design and automate failure and resilience experiments (some examples...