and scoring logic to evaluate agent actions. Analyze agent logs, failure modes, and decision paths. Work with code repositories... limits) and how these affect evaluation design. Familiarity with Docker. English proficiency: B2. How it works...
with agent evaluation platforms and MCP CLI. Tools and Technologies: Python (pytest, uv, Pillow), Docker, Bash, Git Submodules..., and improve overall code quality. Requirements: 5+ years of experience as a Software Engineer (primarily Python). Deep...