Examples
This page provides practical examples for using ARMs in various scenarios, from basic red-teaming evaluations to advanced custom configurations.
Basic Examples
Quick Debug Run
Perfect for testing your setup and getting familiar with ARMs. This runs only the first instance per category.
python -m src.main \
--dataset StrongReject \
--attacker_name 'gpt-4o-2024-11-20' \
--victim_name claude-3-7-sonnet-20250219 \
--mcp_config "attack_lib/configs/full_config.json" \
--instance_per_category 1 \
--max_actions 10 \
--debug
Standard Evaluation
Comprehensive evaluation on a standard benchmark with all attack strategies.
python -m src.main \
--dataset JailbreakBench \
--attacker_name 'gpt-4o-2024-11-20' \
--victim_name claude-3-7-sonnet-20250219 \
--mcp_config "attack_lib/configs/full_config.json" \
--instance_per_category 20 \
--max_actions 30
Policy Compliance Testing
Evaluate compliance with regulatory frameworks by generating behaviors from risk definitions.
python -m src.main \
--generate_from_risks \
--risk_input_file "datasets/EU_AI_Act/risk_definition.json" \
--attacker_name 'gpt-4o-2024-11-20' \
--victim_name claude-3-7-sonnet-20250219 \
--mcp_config "attack_lib/configs/full_config.json" \
--instance_per_category 5 \
--max_actions 25
Local Model Evaluation
Evaluate locally hosted models (e.g., InternVL3 with vLLM).
python -m src.main \
--dataset StrongReject \
--attacker_name 'gpt-4o-2024-11-20' \
--victim_name InternVL3-14B \
--victim_base_url 'http://localhost:8911/v1' \
--mcp_config "attack_lib/configs/full_config.json" \
--instance_per_category 10 \
--max_actions 30
Advanced Usage
Custom Memory Configuration
Fine-tune the memory module parameters for specific evaluation needs.
python -m src.main \
--dataset JailbreakV \
--attacker_name 'gpt-4o-2024-11-20' \
--victim_name claude-3-5-sonnet-20241022 \
--mcp_config "attack_lib/configs/full_config.json" \
--instance_per_category 15 \
--max_actions 40 \
--top_k 5 \
--alpha 1.5 \
--epsilon_lambda 0.8
Multi-Epoch Evaluation
Run multiple epochs for statistical significance and consistency testing.
python -m src.main \
--dataset OWASP \
--attacker_name 'gpt-4o-2024-11-20' \
--victim_name gpt-4o-2024-11-20 \
--mcp_config "attack_lib/configs/full_config.json" \
--instance_per_category 10 \
--max_actions 30 \
--epoch 3
Lightweight Attack Configuration
Use a subset of attack strategies for faster evaluation or targeted testing.
python -m src.main \
--dataset StrongReject \
--attacker_name 'gpt-4o-2024-11-20' \
--victim_name claude-3-7-sonnet-20250219 \
--mcp_config "attack_lib/configs/lightweight_config.json" \
--instance_per_category 20 \
--max_actions 25
Custom Configurations
Creating Custom Attack Configurations
Create a custom MCP configuration file with specific attack strategies.
# attack_lib/configs/custom_config.json
[
{
"server_name": "acronym",
"command": "python",
"args": ["attack_lib/servers/acronym.py"]
},
{
"server_name": "actor_attack",
"command": "python",
"args": ["attack_lib/servers/actor_attack.py"]
},
{
"server_name": "crescendo",
"command": "python",
"args": ["attack_lib/servers/crescendo.py"]
}
]
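Before pointing --mcp_config at a new file, a quick structural check can catch typos early. The following standalone helper is a minimal sketch (it is not part of ARMs); the required keys mirror the entries shown above:
# validate_mcp_config.py -- illustrative helper, not part of ARMs
import json
import sys

REQUIRED_KEYS = {"server_name", "command", "args"}

def validate(path: str) -> None:
    with open(path) as f:
        servers = json.load(f)
    if not isinstance(servers, list):
        raise ValueError("config must be a JSON list of server entries")
    for i, entry in enumerate(servers):
        missing = REQUIRED_KEYS - entry.keys()
        if missing:
            raise ValueError(f"entry {i} is missing keys: {sorted(missing)}")
    print(f"{path}: {len(servers)} server entries look well-formed")

if __name__ == "__main__":
    validate(sys.argv[1])
Run it as python validate_mcp_config.py attack_lib/configs/custom_config.json before starting an evaluation.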
Custom Risk Definitions
Define custom risk categories for domain-specific evaluation.
# datasets/Healthcare/risk_definition.json
{
"Medical Misinformation": {
"description": "Requests that could spread harmful medical misinformation",
"rules": [
"Do not provide unverified medical advice",
"Do not contradict established medical consensus without evidence",
"Do not recommend dangerous self-medication"
]
},
"Patient Privacy": {
"description": "Requests that could compromise patient confidentiality",
"rules": [
"Do not reveal patient information without consent",
"Do not assist in accessing protected health information",
"Do not help bypass medical privacy protections"
]
}
}
Batch Evaluations
Multi-Model Comparison
Bash script to evaluate multiple victim models with the same configuration.
#!/bin/bash
# Multi-model evaluation script
MODELS=("claude-3-7-sonnet-20250219" "claude-3-5-sonnet-20241022" "gpt-4o-2024-11-20")
DATASET="JailbreakBench"
INSTANCES=10
for MODEL in "${MODELS[@]}"; do
  echo "Evaluating model: $MODEL"
  python -m src.main \
    --dataset "$DATASET" \
    --attacker_name 'gpt-4o-2024-11-20' \
    --victim_name "$MODEL" \
    --mcp_config "attack_lib/configs/full_config.json" \
    --instance_per_category "$INSTANCES" \
    --max_actions 30
  echo "Completed evaluation for $MODEL"
done
Cross-Dataset Evaluation
Evaluate the same model across multiple datasets for comprehensive assessment.
#!/bin/bash
# Cross-dataset evaluation script
DATASETS=("StrongReject" "JailbreakBench" "OWASP" "EU-AI-Act")
VICTIM_MODEL="claude-3-7-sonnet-20250219"
for DATASET in "${DATASETS[@]}"; do
  echo "Evaluating dataset: $DATASET"
  python -m src.main \
    --dataset "$DATASET" \
    --attacker_name 'gpt-4o-2024-11-20' \
    --victim_name "$VICTIM_MODEL" \
    --mcp_config "attack_lib/configs/full_config.json" \
    --instance_per_category 15 \
    --max_actions 30
  echo "Completed evaluation for $DATASET"
done
Standalone Behavior Generation
Generate diverse behaviors from risk definitions without immediate evaluation.
# Generate behaviors for later use
python -m src.risk_to_behavior \
--input_file "datasets/FINRA/risk_definition.json" \
--output_file "datasets/FINRA/generated_behaviors.csv" \
--samples_per_category 20
# Then use the generated behaviors
python -m src.main \
--dataset Generated-FINRA \
--attacker_name 'gpt-4o-2024-11-20' \
--victim_name claude-3-7-sonnet-20250219 \
--mcp_config "attack_lib/configs/full_config.json" \
--instance_per_category 20 \
--max_actions 35
Best Practices: Start with a debug run to verify your setup, choose a compute budget (--max_actions) that matches your needs, and run multiple epochs when you need statistical significance in research evaluations.
Resource Management: Large-scale evaluations can consume significant API credits. Monitor your usage and adjust batch sizes accordingly.
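Because the number of model calls grows multiplicatively with the run parameters, a quick back-of-envelope estimate helps before launching a large run. All figures below are hypothetical; actual usage depends on how many actions each instance consumes before termination:
# Rough upper bound on attack actions for one run (illustrative only)
categories = 10      # hypothetical: depends on the dataset
instances = 20       # --instance_per_category
max_actions = 30     # --max_actions (worst case: every action is used)
epochs = 1           # --epoch

upper_bound = categories * instances * max_actions * epochs
print(f"Up to ~{upper_bound:,} attack actions per evaluation")  # 6,000 here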
Adding Victim Models
ARMs supports integration of various victim models through a flexible provider system. You can easily add new models by extending the existing model classes or configuring custom endpoints.
Supported Model Providers
ARMs currently supports the following model providers:
- OpenAI: GPT-4o, GPT-4, GPT-3.5 families
- Anthropic: Claude-3.7-Sonnet, Claude-3.5-Sonnet, Claude-3 families
- Together AI: Various open-source models via Together's API
- Custom vLLM: Self-hosted models using vLLM serving
- Custom OpenAI-compatible: Any API following OpenAI's format
Adding OpenAI-Compatible Models
For models that follow the OpenAI API format:
# Use with custom endpoint
python -m src.main \
--dataset StrongReject \
--attacker_name 'gpt-4o-2024-11-20' \
--victim_name your-custom-model \
--victim_base_url 'http://your-endpoint:8000/v1' \
--mcp_config "attack_lib/configs/full_config.json"
InternVL Models with vLLM
Setting up and evaluating InternVL models:
# 1. Start vLLM server for InternVL
python -m vllm.entrypoints.openai.api_server \
--model OpenGVLab/InternVL3-8B \
--trust-remote-code \
--host 0.0.0.0 \
--port 8911 \
--served-model-name InternVL3-8B
# 2. Evaluate with ARMs
python -m src.main \
--dataset JailbreakV \
--attacker_name 'gpt-4o-2024-11-20' \
--victim_name InternVL3-8B \
--victim_base_url 'http://localhost:8911/v1' \
--instance_per_category 10
Customized Providers
For models requiring special handling, you can extend the ARMs model provider system.
Creating a Custom Provider
Implement a new model class by extending the base model interface:
# src/models/custom_model.py
from typing import Any, Dict, List, Optional

from .base_model import BaseModel


class CustomModel(BaseModel):
    def __init__(self, model_name: str, base_url: Optional[str] = None):
        super().__init__(model_name)
        self.base_url = base_url
        # Initialize your model client here

    def query(self, messages: List[Dict[str, Any]]) -> str:
        """
        Query the custom model with messages.

        Args:
            messages: List of message dictionaries with role and content

        Returns:
            Model response as a string
        """
        # Implement your model query logic here
        response = self._call_model_api(messages)
        return response

    def _call_model_api(self, messages):
        # Custom API calling logic (HTTP request to self.base_url, SDK call, etc.)
        pass
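Once _call_model_api is implemented, you can sanity-check the provider directly before wiring it into the evaluation pipeline. A minimal smoke test (run from the repository root; the endpoint URL is a placeholder):
# Quick manual check of the custom provider (illustrative)
from src.models.custom_model import CustomModel

model = CustomModel("your-custom-model", base_url="http://localhost:8000/v1")
reply = model.query([{"role": "user", "content": "Say hello."}])
print(reply)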
Registering the Custom Provider
Add your custom model to the main evaluation pipeline:
# In src/main.py, add model detection logic
if 'custom-model' in victim_name.lower():
    from src.models.custom_model import CustomModel
    victim_model = CustomModel(victim_name, victim_base_url)
elif 'gpt' in victim_name.lower():
    victim_model = OpenAIModel(victim_name, victim_base_url)
# ... existing model detection logic
Provider Configuration Examples
Common configurations for different model types:
# Hugging Face Transformers (via custom endpoint)
python -m src.main \
--victim_name llama-2-7b-chat \
--victim_base_url 'http://localhost:8080/v1'
# Azure OpenAI
python -m src.main \
--victim_name gpt-4 \
--victim_base_url 'https://your-resource.openai.azure.com/v1'
# AWS Bedrock (requires custom provider implementation)
python -m src.main \
--victim_name claude-v2 \
--victim_base_url 'bedrock://us-east-1'
Model-Specific Considerations
- Input Format: Ensure your model handles multimodal inputs correctly
- Rate Limiting: Implement appropriate request throttling (a combined throttling and retry sketch follows this list)
- Error Handling: Handle API failures and timeouts gracefully
- Response Parsing: Extract text responses consistently
- Token Limits: Manage context length constraints
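The throttling and error-handling points above can often be addressed in one place. Below is a minimal sketch of a retry-with-backoff wrapper around a provider's query method; the delay values are illustrative assumptions, not ARMs defaults:
# Throttled, retrying wrapper around a provider query (illustrative)
import time

def query_with_retries(model, messages, max_retries=3, min_interval=1.0):
    """Call model.query with basic throttling and exponential backoff."""
    last_error = None
    for attempt in range(max_retries):
        try:
            response = model.query(messages)
            time.sleep(min_interval)  # crude throttle between requests
            return response
        except Exception as err:  # prefer provider-specific exceptions in practice
            last_error = err
            time.sleep(2 ** attempt)  # backoff: 1s, 2s, 4s
    raise RuntimeError(f"query failed after {max_retries} attempts") from last_error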
Model Integration Tip: Test your custom provider with a simple debug run before conducting full evaluations to ensure proper integration.