agent actions against. You'll work to ensure each scenario is clearly defined, well-scored, and easy to execute and reuse... like JSON or YAML for scenario description
- Can define expected agent behaviors (gold paths) and scoring logic
- Basic...
About The Role
We're looking for someone who can design realistic and structured evaluation scenarios for LLM-based agents... You'll create test cases that simulate human-performed tasks and define gold-standard behavior to compare agent actions...
This remote, part-time role involves creating test cases, analyzing agent behavior, and ensuring clear documentation. A Bachelor...