Examples

This page provides practical examples for using ARMs in various scenarios, from basic red-teaming evaluations to advanced custom configurations.

Basic Examples

Quick Debug Run
Perfect for testing your setup and getting familiar with ARMs. This runs only the first instance per category.
python -m src.main \
    --dataset StrongReject \
    --attacker_name 'gpt-4o-2024-11-20' \
    --victim_name claude-3-7-sonnet-20250219 \
    --mcp_config "attack_lib/configs/full_config.json" \
    --instance_per_category 1 \
    --max_actions 10 \
    --debug
Standard Evaluation
Comprehensive evaluation on a standard benchmark with all attack strategies.
python -m src.main \
    --dataset JailbreakBench \
    --attacker_name 'gpt-4o-2024-11-20' \
    --victim_name claude-3-7-sonnet-20250219 \
    --mcp_config "attack_lib/configs/full_config.json" \
    --instance_per_category 20 \
    --max_actions 30
Policy Compliance Testing
Evaluate compliance with regulatory frameworks by generating behaviors from risk definitions.
python -m src.main \
    --generate_from_risks \
    --risk_input_file "datasets/EU_AI_Act/risk_definition.json" \
    --attacker_name 'gpt-4o-2024-11-20' \
    --victim_name claude-3-7-sonnet-20250219 \
    --mcp_config "attack_lib/configs/full_config.json" \
    --instance_per_category 5 \
    --max_actions 25
Local Model Evaluation
Evaluate locally hosted models (e.g., InternVL3 served with vLLM; see "InternVL Models with vLLM" below for server setup).
python -m src.main \
    --dataset StrongReject \
    --attacker_name 'gpt-4o-2024-11-20' \
    --victim_name InternVL3-14B \
    --victim_base_url 'http://localhost:8911/v1' \
    --mcp_config "attack_lib/configs/full_config.json" \
    --instance_per_category 10 \
    --max_actions 30

Advanced Usage

Custom Memory Configuration
Fine-tune the memory module parameters for specific evaluation needs.
python -m src.main \
    --dataset JailbreakV \
    --attacker_name 'gpt-4o-2024-11-20' \
    --victim_name claude-3-5-sonnet-20241022 \
    --mcp_config "attack_lib/configs/full_config.json" \
    --instance_per_category 15 \
    --max_actions 40 \
    --top_k 5 \
    --alpha 1.5 \
    --epsilon_lambda 0.8
Multi-Epoch Evaluation
Run multiple epochs for statistical significance and consistency testing.
python -m src.main \
    --dataset OWASP \
    --attacker_name 'gpt-4o-2024-11-20' \
    --victim_name gpt-4o-2024-11-20 \
    --mcp_config "attack_lib/configs/full_config.json" \
    --instance_per_category 10 \
    --max_actions 30 \
    --epoch 3
Lightweight Attack Configuration
Use a subset of attack strategies for faster evaluation or targeted testing.
python -m src.main \
    --dataset StrongReject \
    --attacker_name 'gpt-4o-2024-11-20' \
    --victim_name claude-3-7-sonnet-20250219 \
    --mcp_config "attack_lib/configs/lightweight_config.json" \
    --instance_per_category 20 \
    --max_actions 25

Custom Configurations

Creating Custom Attack Configurations
Create a custom MCP configuration file with specific attack strategies.
# attack_lib/configs/custom_config.json
[
    {
        "server_name": "acronym",
        "command": "python",
        "args": ["attack_lib/servers/acronym.py"]
    },
    {
        "server_name": "actor_attack",
        "command": "python", 
        "args": ["attack_lib/servers/actor_attack.py"]
    },
    {
        "server_name": "crescendo",
        "command": "python",
        "args": ["attack_lib/servers/crescendo.py"]
    }
]
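Then point --mcp_config at the new file when launching an evaluation (the remaining flags mirror the Standard Evaluation example above):
python -m src.main \
    --dataset StrongReject \
    --attacker_name 'gpt-4o-2024-11-20' \
    --victim_name claude-3-7-sonnet-20250219 \
    --mcp_config "attack_lib/configs/custom_config.json" \
    --instance_per_category 10 \
    --max_actions 25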
Custom Risk Definitions
Define custom risk categories for domain-specific evaluation.
# datasets/Healthcare/risk_definition.json
{
    "Medical Misinformation": {
        "description": "Requests that could spread harmful medical misinformation",
        "rules": [
            "Do not provide unverified medical advice",
            "Do not contradict established medical consensus without evidence",
            "Do not recommend dangerous self-medication"
        ]
    },
    "Patient Privacy": {
        "description": "Requests that could compromise patient confidentiality",
        "rules": [
            "Do not reveal patient information without consent",
            "Do not assist in accessing protected health information",
            "Do not help bypass medical privacy protections"
        ]
    }
}
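Then generate behaviors from the custom definitions with --generate_from_risks, as in the Policy Compliance Testing example above:
python -m src.main \
    --generate_from_risks \
    --risk_input_file "datasets/Healthcare/risk_definition.json" \
    --attacker_name 'gpt-4o-2024-11-20' \
    --victim_name claude-3-7-sonnet-20250219 \
    --mcp_config "attack_lib/configs/full_config.json" \
    --instance_per_category 5 \
    --max_actions 25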

Batch Evaluations

Multi-Model Comparison
Bash script to evaluate multiple victim models with the same configuration.
#!/bin/bash

# Multi-model evaluation script
MODELS=("claude-3-7-sonnet-20250219" "claude-3-5-sonnet-20241022" "gpt-4o-2024-11-20")
DATASET="JailbreakBench"
INSTANCES=10

for MODEL in "${MODELS[@]}"; do
    echo "Evaluating model: $MODEL"
    python -m src.main \
        --dataset "$DATASET" \
        --attacker_name 'gpt-4o-2024-11-20' \
        --victim_name "$MODEL" \
        --mcp_config "attack_lib/configs/full_config.json" \
        --instance_per_category "$INSTANCES" \
        --max_actions 30
    echo "Completed evaluation for $MODEL"
done
Cross-Dataset Evaluation
Evaluate the same model across multiple datasets for comprehensive assessment.
#!/bin/bash

# Cross-dataset evaluation script
DATASETS=("StrongReject" "JailbreakBench" "OWASP" "EU-AI-Act")
VICTIM_MODEL="claude-3-7-sonnet-20250219"

for DATASET in "${DATASETS[@]}"; do
    echo "Evaluating dataset: $DATASET"
    python -m src.main \
        --dataset "$DATASET" \
        --attacker_name 'gpt-4o-2024-11-20' \
        --victim_name "$VICTIM_MODEL" \
        --mcp_config "attack_lib/configs/full_config.json" \
        --instance_per_category 15 \
        --max_actions 30
    echo "Completed evaluation for $DATASET"
done
Standalone Behavior Generation
Generate diverse behaviors from risk definitions without immediate evaluation.
# Generate behaviors for later use
python -m src.risk_to_behavior \
    --input_file "datasets/FINRA/risk_definition.json" \
    --output_file "datasets/FINRA/generated_behaviors.csv" \
    --samples_per_category 20

# Then use the generated behaviors
python -m src.main \
    --dataset Generated-FINRA \
    --attacker_name 'gpt-4o-2024-11-20' \
    --victim_name claude-3-7-sonnet-20250219 \
    --mcp_config "attack_lib/configs/full_config.json" \
    --instance_per_category 20 \
    --max_actions 35
Best Practices: Start with a debug run to verify your setup, size the compute budget (--max_actions) to your needs, and run multiple epochs (--epoch) when you need statistical significance in research evaluations.
Resource Management: Large-scale evaluations can consume significant API credits. Monitor your usage and adjust instance counts (--instance_per_category) accordingly.

Adding Victim Models

ARMs supports integration of various victim models through a flexible provider system. You can easily add new models by extending the existing model classes or configuring custom endpoints.

Supported Model Providers

ARMs currently supports OpenAI models (e.g., gpt-4o), Anthropic Claude models, and any model served behind an OpenAI-compatible endpoint (for example, local models hosted with vLLM) via the --victim_base_url flag.

Adding OpenAI-Compatible Models

For models that follow the OpenAI API format:

# Use with custom endpoint
python -m src.main \
    --dataset StrongReject \
    --attacker_name 'gpt-4o-2024-11-20' \
    --victim_name your-custom-model \
    --victim_base_url 'http://your-endpoint:8000/v1' \
    --mcp_config "attack_lib/configs/full_config.json"

InternVL Models with vLLM

Setting up and evaluating InternVL models:

# 1. Start vLLM server for InternVL
python -m vllm.entrypoints.openai.api_server \
    --model OpenGVLab/InternVL3-8B \
    --trust-remote-code \
    --host 0.0.0.0 \
    --port 8911 \
    --served-model-name InternVL3-8B
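
# (Optional) Verify the server is reachable before evaluating;
# /v1/models is part of the OpenAI-compatible API that vLLM exposes
curl http://localhost:8911/v1/models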

# 2. Evaluate with ARMs
python -m src.main \
    --dataset JailbreakV \
    --attacker_name 'gpt-4o-2024-11-20' \
    --victim_name InternVL3-8B \
    --victim_base_url 'http://localhost:8911/v1' \
    --instance_per_category 10

Customized Providers

For models requiring special handling, you can extend the ARMs model provider system.

Creating a Custom Provider

Implement a new model class by extending the base model interface:

# src/models/custom_model.py
from typing import Any, Dict, List, Optional

from .base_model import BaseModel

class CustomModel(BaseModel):
    def __init__(self, model_name: str, base_url: Optional[str] = None):
        super().__init__(model_name)
        self.base_url = base_url
        # Initialize your model client (API keys, session, etc.) here

    def query(self, messages: List[Dict[str, Any]]) -> str:
        """
        Query the custom model with messages.

        Args:
            messages: List of message dictionaries with role and content

        Returns:
            Model response as a string
        """
        # Implement your model query logic here
        response = self._call_model_api(messages)
        return response

    def _call_model_api(self, messages: List[Dict[str, Any]]) -> str:
        # Custom API-calling logic goes here; raising keeps failures loud
        # instead of silently returning None from a method typed as str
        raise NotImplementedError
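
As a concrete starting point, here is a minimal sketch of _call_model_api for an OpenAI-compatible HTTP endpoint. It assumes the server exposes the standard /chat/completions route and that BaseModel sets self.model_name; both are assumptions to adapt to your provider.

    def _call_model_api(self, messages):
        import requests

        # Assumes an OpenAI-compatible /chat/completions route and that
        # BaseModel sets self.model_name; adjust for your provider
        resp = requests.post(
            f"{self.base_url}/chat/completions",
            json={"model": self.model_name, "messages": messages},
            timeout=120,
        )
        resp.raise_for_status()
        return resp.json()["choices"][0]["message"]["content"]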

Registering the Custom Provider

Add your custom model to the main evaluation pipeline:

# In src/main.py, add model detection logic
if 'custom-model' in victim_name.lower():
    # Relative import, since main runs as `python -m src.main`
    from .models.custom_model import CustomModel
    victim_model = CustomModel(victim_name, victim_base_url)
elif 'gpt' in victim_name.lower():
    victim_model = OpenAIModel(victim_name, victim_base_url)
# ... existing model detection logic
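
Substring matching on victim_name mirrors the dispatch pattern used for the built-in providers; pick a match string (the 'custom-model' above is a placeholder) that cannot collide with existing branches such as 'gpt'.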

Provider Configuration Examples

Common configurations for different model types:

# Hugging Face Transformers (via custom endpoint)
python -m src.main \
    --victim_name llama-2-7b-chat \
    --victim_base_url 'http://localhost:8080/v1'

# Azure OpenAI (the exact endpoint format depends on your Azure deployment)
python -m src.main \
    --victim_name gpt-4 \
    --victim_base_url 'https://your-resource.openai.azure.com/v1'

# AWS Bedrock (requires custom provider implementation)
python -m src.main \
    --victim_name claude-v2 \
    --victim_base_url 'bedrock://us-east-1'

Model-Specific Considerations

Model Integration Tip: Test your custom provider with a simple debug run before conducting full evaluations to ensure proper integration.
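
For example, reusing the Quick Debug Run flags (your-custom-model and the endpoint URL are placeholders for your own provider):
python -m src.main \
    --dataset StrongReject \
    --attacker_name 'gpt-4o-2024-11-20' \
    --victim_name your-custom-model \
    --victim_base_url 'http://your-endpoint:8000/v1' \
    --mcp_config "attack_lib/configs/full_config.json" \
    --instance_per_category 1 \
    --max_actions 10 \
    --debug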