This project implements a simplified version of Google DeepMind's OPRO (Optimization by PROmpting) framework, introduced in the paper "Large Language Models as Optimizers" and adapted here to optimize prompts for computer science questions from the MMLU dataset.
The original paper (Google DeepMind, 2024) presents OPRO as a novel approach to using LLMs for optimization tasks. Key aspects include:
- Natural Language Optimization: OPRO enables optimization through natural language descriptions rather than formal specifications.
- Meta-Prompt Structure: Uses previous solutions and their scores to guide the optimization process (see the sketch after this list).
- Exploration-Exploitation Balance: Manages the trade-off between exploring new solutions and exploiting known good solutions.
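To make the meta-prompt idea concrete, a builder in the spirit of the paper might look like the sketch below. This is a minimal illustration, not the paper's verbatim template; the prompt wording and the `build_meta_prompt` name are assumptions. The paper orders past solutions by ascending score so the strongest ones appear last.

```python
def build_meta_prompt(history):
    """Turn (instruction, score) pairs into an OPRO-style meta-prompt."""
    lines = ["Below are previous instructions and their accuracy scores:"]
    # List solutions worst-to-best so the model sees the strongest last
    for instruction, score in sorted(history, key=lambda pair: pair[1]):
        lines.append(f"Instruction: {instruction}\nScore: {score:.2f}")
    lines.append("Write a new instruction that is different from the ones "
                 "above and achieves a higher score.")
    return "\n\n".join(lines)
```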
- Configuration (`OptimizationConfig`)

```python
max_steps: int = 150          # Maximum optimization steps
solutions_per_step: int = 8   # Solutions generated per step
max_history: int = 20         # Max number of previous solutions to keep
temperature: float = 1.0      # Temperature for generation
token_weight: float = 0.3     # Weight for token length in scoring
max_tokens: int               # Maximum tokens allowed in prompt (configurable, no fixed default)
```
- Scoring Mechanism

The implementation uses a weighted scoring formula (a worked example follows below):

```
combined_score = (1 - token_weight) * accuracy + token_weight * token_score
token_score    = 1 - (token_count / max_tokens)
```

This balances:
- Solution accuracy (70% weight by default)
- Token efficiency (30% weight by default)
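As a quick sanity check, here is the formula in plain Python; the `max_tokens` default of 150 is an illustrative assumption, since the real value is configurable:

```python
def combined_score(accuracy: float, token_count: int,
                   max_tokens: int = 150, token_weight: float = 0.3) -> float:
    """Blend accuracy with token efficiency using the weighted formula above."""
    token_score = 1 - (token_count / max_tokens)
    return (1 - token_weight) * accuracy + token_weight * token_score

# A 45-token prompt at 0.85 accuracy: 0.7 * 0.85 + 0.3 * 0.7 ≈ 0.805
print(combined_score(accuracy=0.85, token_count=45))
```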
- Key Classes
  - `TokenManager`: Handles token counting and limits (sketched below)
  - `MMluDataHandler`: Manages MMLU dataset operations
  - `Scorer`: Evaluates solutions using the OpenAI API
  - `OptimizerEngine`: Core optimization logic
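For illustration, a minimal `TokenManager` could wrap tiktoken as follows; the method names here are assumptions rather than the exact interface:

```python
import tiktoken

class TokenManager:
    """Counts tokens and enforces the prompt length limit (illustrative)."""

    def __init__(self, max_tokens: int, encoding_name: str = "cl100k_base"):
        self.max_tokens = max_tokens
        self.encoding = tiktoken.get_encoding(encoding_name)

    def count(self, text: str) -> int:
        return len(self.encoding.encode(text))

    def within_limit(self, text: str) -> bool:
        return self.count(text) <= self.max_tokens
```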
- Data Preparation (sketched below)
  - Load MMLU computer science questions
  - Split into train/test sets
  - Sample questions for evaluation
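A minimal sketch of this step, assuming the CSV layout described under the data format section below (the function name and 80/20 split are assumptions):

```python
import pandas as pd

def prepare_data(csv_path: str, train_frac: float = 0.8, seed: int = 42):
    """Load MMLU CS questions and split into train/test sets (illustrative)."""
    df = pd.read_csv(csv_path)  # columns: question, A, B, C, D, answer
    train = df.sample(frac=train_frac, random_state=seed)
    test = df.drop(train.index)
    return train, test
```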
- Optimization Process (sketched below)
  - Generate a meta-prompt from previous solutions
  - Create new candidate solutions
  - Evaluate solutions for accuracy and token efficiency
  - Update the optimization history
  - Repeat until convergence or max steps
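At a high level, the loop could look like the sketch below, reusing `build_meta_prompt` from the earlier sketch. `generate_solutions` and `scorer.score` are placeholders for the LLM call and the evaluation step, not the actual API:

```python
def optimize(config, data_handler, scorer, generate_solutions):
    """OPRO loop: propose candidates, score them, keep the best (illustrative)."""
    history = []  # (instruction, combined_score) pairs
    best = None
    for _ in range(config.max_steps):
        # Only the most recent max_history solutions go into the meta-prompt
        meta_prompt = build_meta_prompt(history[-config.max_history:])
        candidates = generate_solutions(meta_prompt,
                                        n=config.solutions_per_step,
                                        temperature=config.temperature)
        for instruction in candidates:
            score = scorer.score(instruction, data_handler)
            history.append((instruction, score))
            if best is None or score > best[1]:
                best = (instruction, score)
    return best
```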
- Solution Evaluation (sketched below)
  - Calculate accuracy using the OpenAI API, or a Llama model via Groq
  - Count tokens using tiktoken
  - Compute the combined score
  - Track the best solutions
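Accuracy evaluation might look roughly like this, using the OpenAI chat completions API; the grading prompt and model choice are assumptions:

```python
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

def accuracy(instruction: str, questions: list[dict]) -> float:
    """Fraction of sampled MMLU questions answered correctly under `instruction`."""
    correct = 0
    for q in questions:
        prompt = (f"{instruction}\n\n{q['question']}\n"
                  f"A) {q['A']}\nB) {q['B']}\nC) {q['C']}\nD) {q['D']}\n"
                  "Answer with a single letter.")
        resp = client.chat.completions.create(
            model="gpt-3.5-turbo",
            messages=[{"role": "user", "content": prompt}],
        )
        answer = resp.choices[0].message.content.strip().upper()
        if answer.startswith(q["answer"]):
            correct += 1
    return correct / len(questions)
```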
```bash
pip install openai pandas numpy tiktoken tqdm
export OPENAI_API_KEY='your-api-key'
```
The MMLU CSV file should contain the following columns:
- `question`: Question text
- `A`, `B`, `C`, `D`: Multiple-choice options
- `answer`: Correct answer (A, B, C, or D)
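For example, a single (illustrative) row:

```csv
question,A,B,C,D,answer
"What is the worst-case time complexity of binary search?",O(n),O(log n),O(n log n),O(1),B
```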
```python
# Initialize configuration
config = OptimizationConfig()

# Set up the data handler
data_handler = MMluDataHandler("path_to_mmlu_cs_data.csv")
data_handler.prepare_data()

# Initialize the optimizer
optimizer = OptimizerEngine(config)

# Run the optimization
results = optimizer.optimize(data_handler, config.max_steps)
```
Modify `token_weight` in `OptimizationConfig`:
- Higher values (>0.3) prioritize token efficiency
- Lower values (<0.3) prioritize accuracy

Other tuning knobs:
- Adjust `temperature` for the exploration/exploitation balance
- Modify `solutions_per_step` for optimization stability
- Change `max_history` for memory management
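For example, assuming `OptimizationConfig` is a dataclass that accepts these fields as keyword arguments, the defaults can be overridden at construction time:

```python
config = OptimizationConfig(
    token_weight=0.5,       # favor shorter prompts more aggressively
    temperature=1.2,        # explore more diverse candidate instructions
    solutions_per_step=4,   # fewer candidates per step
    max_history=10,         # smaller meta-prompt
)
```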
The optimization process produces:
- The best instruction found
- Accuracy metrics
- Token efficiency metrics
- Combined performance scores
Results are saved to a timestamped JSON file:

```json
{
  "steps": [...],
  "best_solution": {
    "instruction": "...",
    "accuracy": 0.85,
    "token_count": 45,
    "combined_score": 0.78
  },
  "best_score": 0.78
}
```
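To inspect a saved run afterwards (the filename is illustrative, since the actual name includes a timestamp):

```python
import json

with open("results_20240101_120000.json") as f:
    results = json.load(f)

best = results["best_solution"]
print(best["instruction"], best["combined_score"])
```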
- API Costs: Evaluation relies on paid OpenAI API calls
- Rate Limits: Account for API rate limits in the optimization process; a simple backoff wrapper is sketched below
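One common mitigation is exponential backoff around each API call; a minimal sketch, with arbitrary retry parameters:

```python
import time

def with_backoff(call, retries: int = 5, base_delay: float = 1.0):
    """Retry `call` with exponential backoff (e.g. on openai.RateLimitError)."""
    for attempt in range(retries):
        try:
            return call()
        except Exception:
            time.sleep(base_delay * 2 ** attempt)
    return call()  # final attempt; let any error propagate
```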
Potential improvements:
- Support for multiple LLM providers
- Advanced token optimization strategies
- Multi-objective optimization approaches
- Benchmarking and evaluation of this implementation
- Adding a tokenizer for Llama models (tiktoken only supports GPT-family models, not Llama)
- Google DeepMind (2024). "Large Language Models as Optimizers." arXiv:2309.03409.
Feel free to drop your feedback at hjawajiwar@gmail.com.