ARMs Documentation

ARMs (Adaptive Red-Teaming Agent against Multimodal Models) is a comprehensive red-teaming framework designed to systematically evaluate the safety and robustness of vision-language models (VLMs). Built on the Model Context Protocol (MCP), ARMs provides a plug-and-play architecture for integrating diverse attack strategies and conducting thorough safety assessments.

Overview of ARMs

ARMs addresses critical challenges in multimodal AI safety evaluation through several key innovations:

🔧 Plug-and-Play Architecture: Built on the Model Context Protocol (MCP) with 17+ attack strategies covering diverse adversarial patterns such as typographic attacks, contextual cloaking, and visual perturbations.

🧠 Adaptive Memory System: A layered memory module with an ε-greedy exploration algorithm that intelligently balances attack diversity and efficiency for comprehensive red-teaming (see the sketch after this list).

🎯 SOTA Performance: Achieves a 90%+ attack success rate on Claude-3.7-Sonnet and a 27.5% improvement over baselines, with a 95.83% diversity enhancement.

📋 Policy-Following Evaluation: Supports comprehensive policy-based safety assessments aligned with real-world regulatory frameworks such as the EU AI Act, OWASP, and FINRA.
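ARMs' layered memory is more involved than a single bandit, but the ε-greedy rule at its core is standard: with probability ε, explore a random attack strategy to maintain diversity; otherwise, exploit the strategy with the best observed payoff. Below is a minimal Python sketch of that selection rule; the class name, reward bookkeeping, and strategy names are illustrative assumptions, not the ARMs implementation.

import random

class EpsilonGreedySelector:
    """Minimal epsilon-greedy strategy selector (illustrative sketch only;
    the ARMs memory module layers additional structure on this rule)."""

    def __init__(self, strategies, epsilon=0.1):
        self.epsilon = epsilon
        # Per-strategy attempt count and cumulative reward.
        self.stats = {s: {"n": 0, "reward": 0.0} for s in strategies}

    def select(self):
        # Explore: with probability epsilon, pick a random strategy.
        if random.random() < self.epsilon:
            return random.choice(list(self.stats))
        # Exploit: pick the strategy with the best average reward;
        # untried strategies score +inf so each gets tried at least once.
        def avg(s):
            st = self.stats[s]
            return st["reward"] / st["n"] if st["n"] else float("inf")
        return max(self.stats, key=avg)

    def update(self, strategy, reward):
        # Record the outcome of one attack attempt (e.g., 1.0 on success).
        self.stats[strategy]["n"] += 1
        self.stats[strategy]["reward"] += reward

selector = EpsilonGreedySelector(
    ["typographic_attack", "contextual_cloaking", "visual_perturbation"],
    epsilon=0.2,
)
strategy = selector.select()
selector.update(strategy, reward=1.0)

Tuning ε trades off diversity (higher ε) against efficiency (lower ε), which is the balance the memory module manages.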

Installation

ARMs requires Python 3.10+ and several dependencies for multimodal processing and model integration.

Environment Setup

Create and activate a conda environment:

conda create -n ARMs python=3.10
conda activate ARMs

Install Dependencies

Install the required packages:

pip install -r requirements.txt

Environment Configuration

Set up the Python path and API keys:

# Set Python path (replace with your actual path)
export PYTHONPATH=/your_path/ARMs-preview:$PYTHONPATH

# Configure API keys
export OPENAI_API_KEY="your-openai-api-key"
export ANTHROPIC_API_KEY="your-anthropic-api-key"
export TOGETHER_API_KEY="your-together-api-key"

Note: Make sure to replace /your_path/ARMs-preview with the actual path to your ARMs directory, and fill in the API keys for the model providers you plan to use.
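
Optionally, sanity-check the configuration before a run. The snippet below is a hypothetical convenience check, not part of ARMs; it only verifies that the variables exported above are set:

import os

# Hypothetical pre-flight check (not part of ARMs): verify the API keys
# exported above are present in the environment before launching a run.
required = ["OPENAI_API_KEY", "ANTHROPIC_API_KEY", "TOGETHER_API_KEY"]
missing = [name for name in required if not os.environ.get(name)]
if missing:
    raise SystemExit(f"Missing environment variables: {', '.join(missing)}")
print("All API keys are set.")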

Quickstart

Here's how to get started with ARMs for red-teaming evaluation:

Basic Usage

Run a debug evaluation on the StrongReject dataset:

python -m src.main \
    --dataset StrongReject \
    --attacker_name 'gpt-4o-2024-11-20' \
    --victim_name claude-3-7-sonnet-20250219 \
    --mcp_config "attack_lib/configs/full_config.json" \
    --instance_per_category 10 \
    --max_actions 30 \
    --debug

This command will:

- Load the StrongReject dataset and sample 10 instances per risk category
- Use gpt-4o-2024-11-20 as the attacker backbone against the claude-3-7-sonnet-20250219 victim
- Load the attack strategy library from attack_lib/configs/full_config.json
- Cap each instance at 30 attack actions
- Run in debug mode for a quick pass through the pipeline

Key Parameters

--dataset: benchmark dataset to red-team against (e.g., StrongReject)
--attacker_name: backbone model driving the attacker agent
--victim_name: target VLM under evaluation
--mcp_config: path to the MCP attack-strategy configuration file (a hypothetical sketch follows below)
--instance_per_category: number of instances sampled per risk category
--max_actions: maximum number of attack actions per instance
--debug: run a quick debug pass through the pipeline (see the note below)
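
The schema of attack_lib/configs/full_config.json is not reproduced here. As a purely hypothetical illustration, a config in the generic MCP client style maps server names to launch commands; the real file may look quite different:

import json

# Hypothetical illustration only -- the actual schema of
# attack_lib/configs/full_config.json may differ. Generic MCP client
# configs map each server (here, an attack strategy) to a launch command.
example_config = {
    "mcpServers": {
        "typographic_attack": {  # illustrative strategy name
            "command": "python",
            "args": ["attack_lib/servers/typographic.py"],  # hypothetical path
        }
    }
}
print(json.dumps(example_config, indent=2))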

Supported Models

Attacker models (backbone): chat models from the configured providers (OpenAI, Anthropic, or Together AI); the quickstart uses gpt-4o-2024-11-20.

Victim models: the VLMs under evaluation; the quickstart targets claude-3-7-sonnet-20250219.

Important: The --debug flag is recommended for first-time usage as it provides a quick run-through of the pipeline. Remove it for full evaluations across all instances.

Results Structure

After running ARMs, results are saved in results/<timestamp>/ with the following structure: