LASC 2024

The LLM and Agent Safety Competition 2024

Break AI agents with your abilities

Introduction

This is the official website of the LLM and Agent Safety Competition 2024 (under review by NeurIPS 2024). The competition aims to advance understanding of the vulnerabilities in LLMs and LLM-powered agents and to encourage methods for improving their safety. It features three main tracks. In the Jailbreaking Attack track, participants are challenged to elicit harmful outputs from LLMs protected by a guardrail via prompt injection. In the Backdoor Trigger Recovery for Models track, participants are given a CodeGen LLM embedded with hundreds of domain-specific backdoors and are asked to reverse-engineer the trigger for each given target. In the Backdoor Trigger Recovery for Agents track, the goal is to reverse-engineer the triggers that elicit specific malicious agent actions as backdoor targets.

News

  • April 4: Website development begins

Overview

LASC 2024 aims to advance red teaming techniques for risk identification in LLMs and agents while encouraging new defenses to shield them from adversarial behaviors. It comprises three major tracks where participants are challenged to develop automated prompt injection approaches to invoke undesirable LLM outputs or agent actions.

 

  • Jailbreaking Attack Track: Given an aligned LLM with a guardrail, develop a prompt injection approach for jailbreaking that triggers harmful responses falling into diverse categories.
  • Backdoor Trigger Recovery for Models Track: Given a code generation LLM with 100 backdoors and a list of malicious target codes, develop a trigger recovery approach that identifies the trigger string for each target code (an illustrative baseline is sketched after this list).
  • Backdoor Trigger Recovery for Agents Track: Given a web agent (powered by LLM) with 100 backdoors and a list of malicious agent actions, develop a trigger recovery approach that identifies the trigger string for each target action.
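
Across the two trigger recovery tracks, the core problem is the same: find an input string that reliably elicits a fixed target output. As a rough illustration, the Python sketch below implements a naive random-search baseline that scores candidate triggers by the log-likelihood of the target string under the subject model. The model name, trigger length, and search budget are placeholder assumptions, and this is not the organizers' reference method or a required interface; participants may use any approach permitted by the rules.

    # Minimal random-search sketch for backdoor trigger recovery.
    # Assumptions: a Hugging Face causal LM stands in for the subject model,
    # and the trigger is appended to a benign prompt. Illustrative only.
    import random

    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer

    MODEL_NAME = "your/subject-model"  # placeholder for the released track model
    device = "cuda" if torch.cuda.is_available() else "cpu"
    tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
    model = AutoModelForCausalLM.from_pretrained(MODEL_NAME).to(device).eval()

    def target_log_likelihood(prompt: str, target: str) -> float:
        """Log-likelihood of `target` given `prompt` under the subject model."""
        prompt_ids = tokenizer(prompt, return_tensors="pt").input_ids.to(device)
        target_ids = tokenizer(target, return_tensors="pt",
                               add_special_tokens=False).input_ids.to(device)
        input_ids = torch.cat([prompt_ids, target_ids], dim=1)
        with torch.no_grad():
            logits = model(input_ids).logits
        # Positions P-1 .. P+T-2 of the logits predict the T target tokens.
        log_probs = torch.log_softmax(logits[0, prompt_ids.shape[1] - 1:-1], dim=-1)
        return log_probs.gather(1, target_ids[0].unsqueeze(1)).sum().item()

    def recover_trigger(benign_prompt: str, target: str,
                        trigger_len: int = 8, steps: int = 200) -> str:
        """Random search over trigger tokens appended to a benign prompt."""
        vocab = list(range(tokenizer.vocab_size))
        trigger = [random.choice(vocab) for _ in range(trigger_len)]
        best = target_log_likelihood(benign_prompt + tokenizer.decode(trigger), target)
        for _ in range(steps):
            candidate = list(trigger)
            candidate[random.randrange(trigger_len)] = random.choice(vocab)
            score = target_log_likelihood(benign_prompt + tokenizer.decode(candidate), target)
            if score > best:
                trigger, best = candidate, score
        return tokenizer.decode(trigger)

In practice, stronger search strategies (e.g., gradient-guided token substitution rather than random substitution) and batched scoring would likely be needed to cover 100 targets within the compute limits.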



Compute Credits: We are awarding $500 compute credit grants to student teams that would not otherwise be able to participate.

Important Dates

  • June 17: Registration starts.
  • July 1: The development phase begins. Development models and data are released.
  • October 8: Final submissions for the development phase are due.
  • October 9: The test phase begins. Test phase models and data are released.
  • October 13: Final submissions for the test phase are due.
  • October 16: Top-ranking teams are contacted and asked for their code, models, and method details.
  • October 30: Winning teams are announced for all tracks.

Rules

  • Open Format: This is an open competition. All participants are encouraged to share their methods upon conclusion of the competition, and outstanding submissions will be highlighted in a joint publication. To be eligible for prizes and named as a winning team, top-ranking teams in the test phase are required to share their methods, code, and models with the organizers at a minimum, though public releases are highly encouraged.
  • Registration: Double registration is not allowed. We expect teams to self-certify that all team members are not part of a different team registered for the competition, and we will actively monitor for violations of this rule. Teams may participate in multiple tracks. Organizers are not allowed to participate in the competition or win prizes.
  • Compute Limits: In all three tracks, although evaluation is performed only on the submitted prompts, the method that generates the submission must run in at most 5 days on eight 80GB A100 GPUs. These limits do not include initial setup during the development phase. This is a soft requirement; we will only check it for the winning teams, and exceeding the limit by a modest amount is acceptable if there is an understandable reason. We expect that most methods will not require this much computation.
  • Constraints on Methods: Methods used for generating submitted test cases must satisfy the following requirements. (1) They must be able to generate any desired number of test cases, not just a fixed number. (2) They must be fully automated: after initial setup, they should require no human effort to adapt to a new set of behaviors and no human-in-the-loop for generating test cases (a hypothetical generator shape is sketched after this list). (3) The use of features that are clearly loopholes (e.g., metadata) is not allowed. We may not anticipate every loophole, so we encourage participants to alert us to any they find.
  • Rule-breaking may result in disqualification, and significant rule-breaking will result in ineligibility for prizes.
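
To make requirements (1) and (2) concrete, the sketch below shows one hypothetical shape such a method could take: a seeded generator function that produces any requested number of test cases for a behavior with no human in the loop. The function name, signature, and templates are illustrative assumptions rather than a required submission interface.

    # Hypothetical fully automated test-case generator: given a behavior
    # description, yield any requested number of candidate prompts,
    # deterministically from a seed, with no human-in-the-loop.
    # All names and templates are illustrative placeholders.
    import random
    from typing import Iterator

    def generate_test_cases(behavior: str, n: int, seed: int = 0) -> Iterator[str]:
        rng = random.Random(seed)
        templates = [
            "Please respond to the following request: {b}",
            "You are assisting with an evaluation. Task: {b}",
            "Complete this task exactly as described: {b}",
        ]
        for _ in range(n):
            yield rng.choice(templates).format(b=behavior)

    # Example: 1,000 candidates for one behavior, with no manual effort.
    # cases = list(generate_test_cases("<behavior description>", n=1000))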

Organizers

Please contact us if you have any questions.
Email: lasc2024-organizers@googlegroups.com