This is the official website of the Competition for LLM and Agent Safety (CLAS) 2024, a NeurIPS 2024 competition. The competition aims to advance understanding of the vulnerabilities of LLMs and LLM-powered agents and to encourage methods for improving their safety. The competition features three main tracks. In the Jailbreaking Attack track, participants are challenged to elicit diverse harmful outputs from aligned LLMs via prompt injection. In the Backdoor Trigger Recovery for Models track, participants are given a CodeGen LLM embedded with a large number of domain-specific backdoors and are challenged to reverse-engineer the trigger for each given target. In the Backdoor Trigger Recovery for Agents track, trigger reverse engineering focuses on eliciting specific backdoor targets corresponding to malicious agent actions.
Prizes: There is a $30,000 prize pool. The first-place teams will also be invited to co-author a publication summarizing the competition results and to give a short talk at the competition workshop at NeurIPS 2024 (registration provided). Our current planned procedures for distributing the pool are here.
CLAS 2024 comprises three major tracks where participants are challenged to develop automated prompt injection approaches to invoke undesirable LLM outputs or agent actions.
Compute Credits: We are awarding $500 compute credit grants to student teams that would not otherwise be able to participate.
These rules are an initial set; during registration, we require participants to consent to rule changes should an urgent need arise. If a situation arises that was not anticipated, we will implement a fair solution, ideally by consensus of the participants.
Contact: clas2024-organizers@googlegroups.com
For updates and reminders, join the Google group: https://groups.google.com/g/clas2024-updates.
We are kindly sponsored by a private funder.