CLAS 2024

The Competition for LLM and Agent Safety 2024

This is the official website of the Competition for LLM and Agent Safety (CLAS) 2024, a NeurIPS 2024 competition. The competition aims to advance the understanding of the vulnerabilities in LLMs and LLM-powered agents to encourage methods for improving their safety. The competition features three main tracks. In the Jailbreaking Attack track, participants are challenged to elicit diverse harmful outputs in aligned LLMs via prompt injection. In the Backdoor Trigger Recovery for Models track, participants are given a CodeGen LLM embedded with a large number of domain-specific backdoors. They are chellenged to reverse-engineer the trigger for each given target. In the Backdoor Trigger Recovery for Agents track, trigger reverse engineering will be focused on eliciting specific backdoor targets corresponding to malicious agent actions.

Prizes: There is a $30,000 prize pool. The first-place teams will also be invited to co-author a publication summarizing the competition results and will be invited to given a short talk at the competition workshop at NeurIPS 2024 (registration provided). Our current planned procedures for distributing the pool are here.

News

  • April 18: The development of this website begins

 

 

 

Overview

CLAS 2024 comprises three major tracks where participants are challenged to develop automated prompt injection approaches to invoke undesirable LLM outputs or agent actions.

  • Jailbreaking Attack Track: Given an aligned LLM, develop a prompt injection approach for jailbreaking that triggers harmful responses falling into diverse categories. For more information, see here.
  • Backdoor Trigger Recovery for Models Track: Given a code generation LLM with a large number of backdoors and a list of malicious target codes, develop a trigger recovery approach that identifies the trigger string for each target code. For more information, see here.
  • Backdoor Trigger Recovery for Agents Track: Given a web agent (powered by LLM) with large number of backdoors and a list of malicious agent actions, develop a trigger recovery approach that identifies the trigger string for each target action. For more information, see here.

Compute Credits: We are awarding $500 compute credit grants to student teams that would not otherwise be able to participate.

Important Dates

  • July 15: Registration starts.
  • July 21: The development phase begins. Development models and data are released.
  • October 12: Final submissions for the development phase
  • October 13 The test phase begins. Test phase models and data are released.
  • October 18: Final submissions for the test phase
  • October 23: Top-ranking teams are contacted and asked for their code, models, and method details
  • October 30: Winning teams are announced for all tracks

Rules

  • Open Format: This is an open competition. All participants are encouraged to share their methods upon conclusion of the competition, and outstanding submissions will be highlighted in a joint publication. To be eligible for prizes and named as a winning team, top-ranking teams in the test phase are required to share their methods, code, and models with the organizers at a minimum, though public releases are highly encouraged.
  • Registration: Double registration is not allowed. We expect teams to self-certify that all team members are not part of a different team registered for the competition, and we will actively monitor for violations of this rule. Teams may participate in multiple tracks. Organizers are not allowed to participate in the competition or win prizes.
  • Compute Limits: In all three tracks, although the evaluation will be performed on the submitted prompts, the method that generates the submission must run in at most 5 days on eight 80GB A100 GPUs. These limits do not include initial setup during the development phase. This is a soft requirement; we will only check it for the winning teams, and understandable reasons for exceeding the limit by a modest amount are acceptable. We expect that most methods will not require this much computation.
  • Constraints on Methods: Methods used for generating submitted test cases must satisfy the following requirements. (1) They must be able to generate any desired number of test cases, not just a fixed number of test cases. (2) They must be fully automated. After an initial setup, they should require no human effort to adapt to a new set of behaviors and no human-in-the-loop for generating test cases. (3) The use of features that are clearly loopholes is not allowed (e.g., metadata). We may not anticipate all loopholes and we encourage participants to alert us to their existence.
  • Rule-breaking will be handled case-by-case based on the circumstance. Significant rule-breaking will result in disqualification.

 

These rules are an initial set, and we require participants to consent to a change of rules if there is an urgent need during registration. If a situation should arise that was not anticipated, we will implement a fair solution, ideally using consensus of participants.

Organizers

Contact: clas2024-organizers@googlegroups.com

For updates and reminders, join the google group: https://groups.google.com/g/clas2024-updates.

We are kindly sponsored by a private funder.