Update README.md to be more detailed
jshuadvd committed Jul 17, 2024
1 parent 7583b17 commit fdbc510
Showing 1 changed file with 17 additions and 3 deletions.
README.md (20 changes: 17 additions & 3 deletions)
@@ -67,7 +67,7 @@ An in-depth look at the structural modifications and their implications for mode

The **LongRoPE** model architecture is designed to extend the context window of large language models (LLMs) to over 2 million tokens, addressing the limitations of traditional Transformer architectures. The key innovation lies in the progressive extension strategy and the adjustment of positional embeddings.
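
To make the positional-embedding adjustment concrete, the sketch below shows how per-dimension rescale factors can be applied to standard RoPE frequencies. It is an illustrative assumption rather than the repository's implementation: the function names and the use of `torch` are chosen here for clarity, and a uniform factor is used where LongRoPE searches for non-uniform, per-dimension factors (the `lambda_factors` that appear in the code further down).

```python
import torch

def scaled_rope_frequencies(head_dim, lambda_factors, base=10000.0):
    # Standard RoPE inverse frequencies, one per pair of embedding dimensions.
    inv_freq = 1.0 / (base ** (torch.arange(0, head_dim, 2).float() / head_dim))
    # Dividing by per-dimension rescale factors slows the rotation so that
    # positions far beyond the pre-training range map back into familiar angles.
    return inv_freq / lambda_factors

def rotary_angles(positions, inv_freq):
    # One rotation angle per (position, dimension-pair) combination.
    angles = torch.outer(positions.float(), inv_freq)
    return torch.cos(angles), torch.sin(angles)

# Example: 64-dim attention heads, a uniform rescale factor of 4 for a 4x longer context.
lambda_factors = torch.full((32,), 4.0)
cos, sin = rotary_angles(torch.arange(8192), scaled_rope_frequencies(64, lambda_factors))
```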

Key components include:

1. Rotary Position Encoding (RoPE):

@@ -102,9 +102,23 @@ The LongRoPE model extends the context window of large language models beyond 2

3. Progressive Extension Strategy:

```python
def progressive_extension(model, data, base_length, target_length,
                          population_size, num_mutations, num_crossovers, max_iterations):
    # Extend to 128k
    lambda_factors_128k, n_hat_128k = search_lambda_factors(
        model, data, 128000 / base_length,
        population_size, num_mutations, num_crossovers, max_iterations)
    model = fine_tune(model, data, 128000, lambda_factors_128k, n_hat_128k, steps=400)

    # Extend to 256k
    lambda_factors_256k, n_hat_256k = search_lambda_factors(
        model, data, 256000 / base_length,
        population_size, num_mutations, num_crossovers, max_iterations)
    model = fine_tune(model, data, 256000, lambda_factors_256k, n_hat_256k, steps=600)

    # Extend to the target length with a halved search budget; if the target does
    # not exceed 256k, fall back to the 256k factors so the return values are defined.
    final_lambda_factors, final_n_hat = lambda_factors_256k, n_hat_256k
    if target_length > 256000:
        final_lambda_factors, final_n_hat = search_lambda_factors(
            model, data, target_length / base_length,
            population_size // 2, num_mutations // 2, num_crossovers // 2, max_iterations // 2)
        model.lambda_factors["2048k"] = final_lambda_factors
        model.n_hat["2048k"] = final_n_hat

    return model, final_lambda_factors, final_n_hat, lambda_factors_256k, n_hat_256k
```
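
For intuition about what `search_lambda_factors` is doing in the code above, here is a self-contained toy sketch of an evolutionary search over rescale factors and the starting token count `n_hat`. This is an assumption-laden illustration, not the repository's implementation: the function name, candidate encoding, and fitness function are placeholders, whereas a real search would score candidates by the model's quality on long sequences (for example, perplexity).

```python
import random

def toy_search_lambda_factors(extension_ratio, population_size, num_mutations,
                              num_crossovers, max_iterations, dim_pairs=32):
    # A candidate is (per-dimension rescale factors, n_hat), where n_hat is roughly
    # the number of leading tokens left at their original positions.
    def random_candidate():
        factors = [extension_ratio * random.uniform(0.9, 1.1) for _ in range(dim_pairs)]
        return factors, random.randint(0, 256)

    def fitness(candidate):
        # Placeholder objective: prefer factors near the extension ratio and a small
        # n_hat. A real search would evaluate the model on long-context data instead.
        factors, n_hat = candidate
        return -sum((f - extension_ratio) ** 2 for f in factors) - 0.001 * n_hat

    population = [random_candidate() for _ in range(population_size)]
    for _ in range(max_iterations):
        # Keep the fittest half as parents, then refill the population with
        # mutated and crossed-over children.
        parents = sorted(population, key=fitness, reverse=True)[:max(2, population_size // 2)]
        children = []
        for _ in range(num_mutations):
            factors, n_hat = random.choice(parents)
            children.append(([f * random.uniform(0.95, 1.05) for f in factors], n_hat))
        for _ in range(num_crossovers):
            (fa, na), (fb, nb) = random.sample(parents, 2)
            cut = random.randrange(1, dim_pairs)
            children.append((fa[:cut] + fb[cut:], random.choice([na, nb])))
        population = parents + children
    return max(population, key=fitness)

best_factors, best_n_hat = toy_search_lambda_factors(
    extension_ratio=8.0, population_size=16, num_mutations=8, num_crossovers=8, max_iterations=5)
```

Note that `progressive_extension` halves the search budget (population, mutations, crossovers, iterations) for the final stage, presumably to keep the search affordable at the largest extension ratio.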

### Progressive Extension Strategy
