evaluation scenarios for LLM‑based agents. You'll create test cases that simulate human‑performed tasks and define gold‑standard... behavior to compare agent actions against. You'll ensure each scenario is clearly defined, well‑scored, and easy...