Nonlinear constrained optimization with a gradient descent algorithm.

This work addresses an asset allocation problem: maximization of risk-adjusted return on capital subject to Basel III regulatory constraints, exposure limits, and certain business objectives. The optimization is complicated by the non-linear nature of the problem and by the presence of non-differentiable indicator variables.

A common approach to handling constraints in optimization problems is to incorporate them into the loss function as penalties. This method penalizes the objective function when constraints are violated, effectively guiding the optimization process toward feasible solutions. However, with hinge-style penalties the loss function becomes non-differentiable at the boundary where a constraint becomes active.
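
For concreteness, a minimal Python sketch of such a penalty formulation is shown below; the callables `objective` and `constraints`, the penalty weight, and the convention that `g(w) <= 0` means "satisfied" are illustrative assumptions, not the repository's code:

```python
import torch

def penalty_loss(w, objective, constraints, penalty_weight=10.0):
    """Classic penalty formulation: subtract a hinge penalty for each violated
    constraint g_i(w) <= 0. The hinge max(0, g_i(w)) has a kink at the constraint
    boundary, which is the non-differentiability the sigmoid approach avoids."""
    loss = -objective(w)  # maximize the objective => minimize its negative
    for g in constraints:
        loss = loss + penalty_weight * torch.clamp(g(w), min=0.0)
    return loss
```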

To address the issue of non-differentiability, we use a sigmoid transformation to convert each constraint into a differentiable indicator function. A hyperparameter β controls the steepness of the sigmoid curve: a higher β value makes the sigmoid steeper, approximating a step function. This approach aligns with insights from TensorFlow Constrained Optimization (Cotter and Sridharan 2019; Narasimhan, Cotter, and Gupta 2019).
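
A minimal sketch of the sigmoid indicator, assuming the convention that a constraint value `g(w) <= 0` means "satisfied" (the function name and convention are illustrative, not taken from the repository code):

```python
import torch

def soft_indicator(g_value, beta):
    """Smooth approximation of the indicator 1[g <= 0].
    g_value < 0 (constraint satisfied) -> output close to 1
    g_value > 0 (constraint violated)  -> output close to 0
    Larger beta makes the transition steeper, approaching a step function."""
    return torch.sigmoid(-beta * g_value)

# Example: at a small violation g = 0.1, a soft beta barely penalizes,
# while a large beta switches the indicator almost fully off.
print(soft_indicator(torch.tensor(0.1), beta=1.0))   # ~0.48
print(soft_indicator(torch.tensor(0.1), beta=50.0))  # ~0.007
```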

We define the loss function as the product of the primary objective and a series of sigmoid-transformed constraint indicators. The sigmoid transformation of constraints allows for a continuous, differentiable representation of discrete constraints, enabling gradient-based optimization techniques. When constraints are satisfied, their respective indicators approach 1, allowing the objective function to dominate the loss. Conversely, when constraints are violated, the corresponding indicators approach 0, effectively nullifying the objective and driving the loss towards 0. This behavior ensures that the optimization process strongly favors solutions that satisfy all constraints while maximizing the primary objective.
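
A sketch of the resulting product-form loss, again with hypothetical `objective` and `constraints` callables; a single violated constraint drives the whole product toward zero:

```python
import torch

def constrained_loss(w, objective, constraints, beta):
    """Product of the primary objective and sigmoid-transformed constraint
    indicators. All indicators near 1 -> loss ~ objective; any indicator
    near 0 -> loss ~ 0, so violating solutions are strongly disfavoured."""
    loss = objective(w)
    for g in constraints:
        loss = loss * torch.sigmoid(-beta * g(w))  # ~1 if g(w) <= 0, ~0 otherwise
    return -loss  # gradient descent minimizes, so negate for maximization
```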

The β hyperparameter plays a crucial role in the sigmoid transformation. By adjusting β, we can control how strictly the constraints are enforced: a larger β results in a steeper sigmoid curve, leading to more aggressive penalization of constraint violations. Through experiments, we found it useful to gradually increase β during training. This gradual increase allows the model to initially explore a wider solution space under softer constraints and then progressively enforce stricter compliance as training progresses. This approach helps balance exploration and exploitation, leading to more robust optimization outcomes.
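
One simple way to realize such a schedule is a geometric increase of β over training epochs; the start and end values below are illustrative, not the ones used in this repository:

```python
def beta_schedule(epoch, n_epochs, beta_start=1.0, beta_end=100.0):
    """Geometrically increase beta from a soft start value to a large final
    value, so constraints are enforced more and more strictly over training."""
    t = epoch / max(n_epochs - 1, 1)
    return beta_start * (beta_end / beta_start) ** t

# e.g. with 50 epochs: beta_schedule(0, 50) == 1.0, beta_schedule(49, 50) == 100.0
```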

Local optima are a common problem in optimization and machine learning, particularly in non-convex optimization landscapes. To avoid local optima traps in our optimization process, we implemented (a) dynamic scheduling of the β parameter, (b) random initialization of asset allocations (weights) both within and outside the feasible region, and (c) per-sample gradients for a batch of candidate allocations, which let us efficiently explore multiple potential solutions simultaneously. These strategies collectively enhance the robustness of our model and improve its ability to find global optima.
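
A minimal sketch of the per-sample-gradient idea using `torch.func` (PyTorch 2.x); the toy objective, single constraint, batch size, and β value are illustrative assumptions, and the repository's actual objective and constraints are more involved:

```python
import torch
from torch.func import grad, vmap

beta = 10.0
expected_returns = torch.tensor([0.05, 0.08, 0.12])  # hypothetical asset returns

def objective(w):
    # stand-in for the risk-adjusted return on capital
    return (w * expected_returns).sum()

def constraint(w):
    # hypothetical exposure limit: total allocation <= 1, i.e. g(w) = sum(w) - 1 <= 0
    return w.sum() - 1.0

def loss(w):
    return -objective(w) * torch.sigmoid(-beta * constraint(w))

# Batch of randomly initialized allocation candidates, inside and outside the feasible region.
batch = torch.rand(16, 3) * 2.0

# Per-sample gradients: one gradient per candidate, computed in a single vectorized call.
per_sample_grads = vmap(grad(loss))(batch)
print(per_sample_grads.shape)  # torch.Size([16, 3])
```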