CLAS 2024

The Competition for LLM and Agent Safety 2024

This is the official website of the Competition for LLM and Agent Safety (CLAS) 2024, a NeurIPS 2024 competition. This competition, focusing on large language model (LLM) and agent safety, marks a significant step forward in advancing the responsible development and deployment of AI technologies. As LLMs and AI agents increasingly permeate various sectors, ensuring their safety and security becomes crucial. This competition aims to bring together leading researchers, developers, and practitioners to address the most pressing challenges in AI safety. Participants are tasked with designing and developing innovative solutions that induce harmful output from LLMs and agents, as well as lead to backdoor trigger recovery for LLMs and agents. The competition will not only encourages technical innovation but also fosters a deeper understanding of the safety implications of AI, aiming to drive the the field toward safer and more trustworthy AI systems.

Prizes: There is a $30,000 prize pool. The first-place teams will also be invited to co-author a publication summarizing the competition results and will be invited to given a short talk at the competition workshop at NeurIPS 2024 (registration provided). Our current planned procedures for distributing the pool are here.

News

December 5: The CLAS 2024 workshop will be held on December 15 at the West Meeting Room 210 at the Vancouver Convention Center
October 28: All test phase submissions have been evluated. The winning teams will be announced soon.
October 21: The test phase begins. Information for testing data and models is here. Submission will close on October 26 at midnight Estern Time.
Oct 18: Development phase ends. No more new registrations.
Sep 21: Online evaluation and leaderboard are available now. Please submit here.
Sep 9: Registration is available!
August 11: All tracks are available in the starter kit now!
July 31: The starter kit is released
April 18: The development of this website begins

Overview

CLAS 2024 comprises three major tracks where participants are challenged to develop automated prompt injection approaches to invoke undesirable LLM outputs or agent actions.

Jailbreaking Attack Track: Given an aligned LLM, develop a jailbreaking attack that elicits harmful responses falling into diverse categories. For more information, see here.
Backdoor Trigger Recovery for Models Track: Given a code generation LLM injected with tens of backdoors (each associated with a malicious target code in the LLM output), develop a trigger recovery approach that identifies the trigger string for each target code. For more information, see here.
Backdoor Trigger Recovery for Agents Track: Given a web agent (powered by LLM) with large number of backdoors and a list of malicious agent actions, develop a trigger recovery approach that identifies the trigger string for each target action. For more information, see here.

Compute Credits: We are awarding $500 compute credit grants to student teams that would not otherwise be able to participate.

Important Dates

July 31: The development phase begins. Starter kit with is released.
Sep 9: Registration is available.
Sep 21: Online evaluation is available.
October 18: Final submissions for the development phase
October 21: The test phase begins. Test phase models and data are released.
October 26: Final submissions for the test phase
October 27: Top-ranking teams are contacted and asked for their code, models, and method details
November 1: Winning teams are announced for all tracks

Rules

Open Format: This is an open competition. All participants are encouraged to share their methods upon conclusion of the competition, and outstanding submissions will be highlighted in a joint publication. To be eligible for prizes and named as a winning team, top-ranking teams in the test phase are required to share their methods, code, and models with the organizers at a minimum, though public releases are highly encouraged.
Registration: Double registration is not allowed. We expect teams to self-certify that all team members are not part of a different team registered for the competition, and we will actively monitor for violations of this rule. Teams may participate in multiple tracks. Organizers are not allowed to participate in the competition or win prizes.
Compute Limits: In all three tracks, although the evaluation will be performed on the submitted prompts, the method that generates the submission must run in at most 5 days on eight 80GB A100 GPUs. These limits do not include initial setup during the development phase. This is a soft requirement; we will only check it for the winning teams, and understandable reasons for exceeding the limit by a modest amount are acceptable. We expect that most methods will not require this much computation.
Constraints on Methods: Methods used for generating submitted test cases must satisfy the following requirements. (1) They must be able to generate any desired number of test cases, not just a fixed number of test cases. (2) They must be fully automated. After an initial setup, they should require no human effort to adapt to a new set of behaviors and no human-in-the-loop for generating test cases. (3) The use of features that are clearly loopholes is not allowed (e.g., metadata). For Track II&III, the submitted triggers should not contain any information about the provided targets. We may not anticipate all loopholes and we encourage participants to alert us to their existence.
Rule-breaking will be handled case-by-case based on the circumstance. Significant rule-breaking will result in disqualification.

These rules are an initial set, and we require participants to consent to a change of rules if there is an urgent need during registration. If a situation should arise that was not anticipated, we will implement a fair solution, ideally using consensus of participants.

The Competition for LLM and Agent Safety 2024

News

Overview

Important Dates

Rules

Organizers

Sponsors