complex task structures, policy logic, and agent evaluation frameworks. Throughout the project, you'll have to balance quality... (edge cases, failure modes, "what could go wrong"). Some understanding of how scoring or evaluation works in agent testing...