and a proactive, collaborative approach to DevOps PREFERRED QUALIFICATIONS Experience with observability tools such as DataDog...
Knowledgeable about different monitoring and observability options for proactive incident resolution. Strong demonstrated...
technical documentation, observability, and implementation of business logic. Keep up-to-date with industry trends...
experience with LangChain, OpenAI, Pinecone, and vector databases. Proven success implementing observability, CI/CD, and cost...
Airflow, Glue, or similar tools. Implement robust monitoring, logging, and alerting to ensure reliability and observability...
(Lustre, GPFS, Ceph, VAST), and container runtimes (containerd, NVIDIA Container Toolkit). Drive observability and monitoring...
/AgentOps to build systems for agent evaluation, observability, and performance monitoring. Collaborate with Security... frameworks Airflow, Databricks, Kafka, Spark, and Delta Lake Monitoring and observability tools (Prometheus, Grafana, MLflow...
tools (AWS CloudWatch, AWS CloudTrail, AWS Cost Explorer, and AI-driven observability tools). Operational understanding...
architectures. Your role will be essential in providing continuous performance benchmarking, real-time observability, and long-term..., Prometheus, ELK and Grafana and in defining and implementing the overall observability strategy for HPC validation...
fabric, or lakehouse environment. Hands-on experience with modern catalog, lineage, and observability tools. Familiarity...
-calling models. Experience with AI observability, model evaluation, and performance tuning. Background in distributed...
like observability, SLO/SLI, metrics...
pipelines for continuous integration, delivery, monitoring, and retraining of models. Ensure observability with logging...
with data, platform, and business teams to align technical delivery with business use cases - Ensure observability, guardrails...
performance via query tuning, indexing, and resource management. Implement observability strategies, including metrics, logging... and embedding techniques for AI applications Observability tools: DataDog, Splunk, Prometheus, Grafana CI/CD pipelines: Azure...
. This role combines deep expertise in Kubernetes, cloud infrastructure, observability, and Infrastructure-as-Code... monitor system health using observability tools and rapidly troubleshoot and resolve issues if they arise. Implement...
and maintain infrastructure as code with Terraform, CloudFormation, or AWS CDK. Implement monitoring and observability using...
, and safety Knowledge of reinforcement learning libraries (e.g. RLlib, Ray, OpenAI frameworks) Experience with observability...