2. Implementation
Our goal is to create a network that takes as an input time $t$ and outputs the vector of nuclide concentrations $\vec{n}(t)$ satisfying the burnup equation

$$\frac{d\vec{n}(t)}{dt} = A\,\vec{n}(t), \qquad \vec{n}(0) = \vec{n}_0,$$

where $A$ is the burnup (or decay) matrix and $\vec{n}_0$ is the known initial composition. The network is trained by minimizing a physics-informed loss with two terms: the ODE residual loss $\mathcal{L}_{ODE}$, evaluated at a set of collocation times, and the initial-condition loss $\mathcal{L}_{IC}$. Since the total loss is a weighted sum of $\mathcal{L}_{ODE}$ and $\mathcal{L}_{IC}$, the ratio of the two weights is itself a hyperparameter of the training (see the weighting schemes below).
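A rough sketch of this loss, assuming a PyTorch implementation (the function name `total_loss`, the tensor shapes, and the toy network at the end are illustrative, not taken from this repository):

```python
import torch

def total_loss(model, A, n0, t, w_ic=1.0):
    """Physics-informed loss for dn/dt = A n with n(0) = n0.

    model : maps times of shape (N, 1) to concentrations of shape (N, d)
    A     : (d, d) burnup/decay matrix
    n0    : (d,) initial composition
    w_ic  : weight of the initial-condition term relative to the ODE term
    """
    t = t.clone().requires_grad_(True)              # collocation times
    n = model(t)                                    # predicted concentrations n(t)
    # dn/dt for every nuclide via automatic differentiation
    dn_dt = torch.stack(
        [torch.autograd.grad(n[:, i].sum(), t, create_graph=True)[0].squeeze(-1)
         for i in range(n.shape[1])], dim=1)
    loss_ode = (dn_dt - n @ A.T).pow(2).mean()      # residual of dn/dt = A n
    loss_ic = (model(torch.zeros(1, 1)) - n0).pow(2).mean()
    return loss_ode + w_ic * loss_ic

# Toy usage with a generic fully connected network
net = torch.nn.Sequential(torch.nn.Linear(1, 16), torch.nn.Tanh(), torch.nn.Linear(16, 2))
A = torch.tensor([[-1.0, 0.0], [1.0, -0.1]])
loss = total_loss(net, A, torch.tensor([1.0, 0.0]), torch.linspace(0.0, 5.0, 32).unsqueeze(1))
```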
It is desired that the network is able to accurately and quickly solve large and stiff problems, for example full decay chains, where the decay constants, and therefore the eigenvalues of $A$, span many orders of magnitude.
The decay matrix of a pure decay problem has real eigenvalues $\lambda_i$, and the exact solution is a linear combination of the exponentials $e^{\lambda_i t}$. These exponentials serve as the basis functions of the network: the weights of the input layer are fixed to the eigenvalues, its biases are removed, and its activation is the exponential function, so the $i$-th output of the input layer is $e^{\lambda_i t}$. Therefore the network only needs to learn the coefficients of the matrix $C$ that maps the basis functions to the concentrations, $\vec{n}(t) = C\,[e^{\lambda_1 t}, \dots, e^{\lambda_n t}]^{T}$. Besides fixing the weights of the input layer and removing biases, we can also constrain the weights of the hidden layer (the coefficients of the matrix $C$). A better approach is to initialize the hidden layer to the constrained values instead of fixing them, so that training can still adjust them.
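A minimal sketch of such an architecture, again assuming PyTorch (the class name `ExpBasisNet`, the identity initialization of $C$, and the tensor shapes are illustrative assumptions):

```python
import torch
import torch.nn as nn

class ExpBasisNet(nn.Module):
    """n(t) = C [exp(lambda_1 t), ..., exp(lambda_d t)]^T with fixed eigenvalues."""

    def __init__(self, eigenvalues: torch.Tensor):
        super().__init__()
        # Input layer: weights fixed to the eigenvalues, no bias, not trainable.
        self.register_buffer("lam", eigenvalues.reshape(1, -1))
        # Hidden layer: the trainable coefficient matrix C (identity is only a placeholder init).
        self.C = nn.Parameter(torch.eye(eigenvalues.numel()))

    def forward(self, t: torch.Tensor) -> torch.Tensor:
        basis = torch.exp(t * self.lam)    # (N, d) basis functions e^{lambda_i t}
        return basis @ self.C.T            # (N, d) nuclide concentrations

lam = torch.tensor([-1.0e-3, -2.0, -3.0e2])   # toy eigenvalues spanning several decades
model = ExpBasisNet(lam)
print(model(torch.linspace(0.0, 1.0, 5).unsqueeze(1)).shape)   # torch.Size([5, 3])
```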
Due to measurement error in matrix $A$ it is not uncommon for two eigenvalues to be the same, which poses an issue since the basis functions are then not solutions of the problem. Additionally, given two equal eigenvalues, two outputs of the input layer would be indistinguishable for all times, so the solution would not be unique. To avoid this we slightly change the eigenvalues, such that the difference is smaller than the measurement error. A problem also arises when an eigenvalue is zero: in this case the particular nuclide does not contribute to the reaction and can therefore be excluded, so its row and column are removed from the matrix.
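The two checks described above might look roughly as follows (a NumPy sketch; the tolerance `meas_err`, the size of the perturbation, and the function name are illustrative assumptions):

```python
import numpy as np

def preprocess_eigenvalues(lam, nuclides, meas_err=1e-12):
    """Separate exactly equal eigenvalues and drop nuclides with a zero eigenvalue."""
    lam = np.asarray(lam, dtype=float).copy()
    # Nudge duplicates apart by less than the measurement error of A.
    for i in range(len(lam)):
        for j in range(i + 1, len(lam)):
            if lam[i] == lam[j]:
                lam[j] += 0.1 * meas_err
    # A zero eigenvalue means the nuclide does not take part in the reaction:
    # drop it (i.e. remove the corresponding row and column of A).
    keep = lam != 0.0
    return lam[keep], [n for n, k in zip(nuclides, keep) if k]

lam, kept = preprocess_eigenvalues([-2.0, -2.0, 0.0, -5.0], ["A", "B", "C", "D"])
print(lam, kept)   # eigenvalues slightly separated, nuclide "C" removed
```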
The burnup matrix in general also has complex eigenvalues, which, since the matrix is real, come in conjugate pairs $\alpha \pm i\omega$; the corresponding real basis functions are $e^{\alpha t}\cos(\omega t)$ and $e^{\alpha t}\sin(\omega t)$. These basis functions then feed into two output layers, where the network will learn the matrices of coefficients multiplying the two families of basis functions. The network weights can be further constrained: for real eigenvalues ($\omega = 0$) the sine basis function vanishes identically, so the corresponding coefficients can be fixed to zero.
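One possible realization of the two output layers, under the assumption that they hold the coefficients of the cosine and sine parts of each eigenvalue (the exact parameterization is not spelled out in the text, so the PyTorch class below is only an illustrative sketch):

```python
import torch
import torch.nn as nn

class ComplexBasisNet(nn.Module):
    """Two output heads over the basis e^{alpha t} cos(omega t) and e^{alpha t} sin(omega t)."""

    def __init__(self, alpha: torch.Tensor, omega: torch.Tensor):
        super().__init__()
        self.register_buffer("alpha", alpha.reshape(1, -1))   # real parts of the eigenvalues
        self.register_buffer("omega", omega.reshape(1, -1))   # imaginary parts (0 for real eigenvalues)
        d = alpha.numel()
        self.C_cos = nn.Parameter(torch.eye(d))   # head for the cosine part
        self.C_sin = nn.Parameter(torch.eye(d))   # head for the sine part

    def forward(self, t: torch.Tensor) -> torch.Tensor:
        damp = torch.exp(t * self.alpha)
        cos_b = damp * torch.cos(t * self.omega)
        sin_b = damp * torch.sin(t * self.omega)
        return cos_b @ self.C_cos.T + sin_b @ self.C_sin.T

    def zero_real_modes(self):
        # For real eigenvalues (omega == 0) the sine basis is identically zero,
        # so the corresponding coefficients are simply set to zero.
        with torch.no_grad():
            self.C_sin[:, self.omega.squeeze(0) == 0.0] = 0.0

model = ComplexBasisNet(torch.tensor([-1.0, -0.5, -0.5]), torch.tensor([0.0, 2.0, 3.0]))
model.zero_real_modes()
```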
In order to improve the performance of the model, the following techniques have been tested.
The LBFGS (limited-memory BFGS) optimizer was chosen instead of more common ones like stochastic gradient descent or Adam; the impact of the optimizer has already been studied in previous work. LBFGS was preferred for its faster convergence and because it eliminates a hyperparameter, the learning rate.
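In PyTorch, LBFGS may re-evaluate the objective several times per step, so it takes a closure that recomputes the loss. A self-contained toy example (the one-dimensional problem and all names are illustrative, not the repository's code):

```python
import torch

# Toy 1-D problem dn/dt = -k n, n(0) = 1, just to show the optimizer usage.
k = 2.0
t = torch.linspace(0.0, 3.0, 64).unsqueeze(1)
model = torch.nn.Sequential(torch.nn.Linear(1, 32), torch.nn.Tanh(), torch.nn.Linear(32, 1))

def pinn_loss():
    tt = t.clone().requires_grad_(True)
    n = model(tt)
    dn_dt = torch.autograd.grad(n.sum(), tt, create_graph=True)[0]
    loss_ode = (dn_dt + k * n).pow(2).mean()
    loss_ic = (model(torch.zeros(1, 1)) - 1.0).pow(2).mean()
    return loss_ode + loss_ic

# The closure zeroes the gradients, recomputes the loss and backpropagates.
optimizer = torch.optim.LBFGS(model.parameters(), max_iter=20, line_search_fn="strong_wolfe")

def closure():
    optimizer.zero_grad()
    loss = pinn_loss()
    loss.backward()
    return loss

for epoch in range(50):
    final_loss = optimizer.step(closure)
```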
Gradual increase of stiffness. Solving the problem is easier when the stiffness of the matrix $A$ is low, so training begins on a less stiff version of the problem and the stiffness is gradually increased until the original matrix is recovered.
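The exact stiffness schedule is not given here; one illustrative way to obtain a less stiff spectrum is to compress the eigenvalue magnitudes in log-space and relax them back to the true values during training (a NumPy sketch with assumed names):

```python
import numpy as np

def relaxed_eigenvalues(lam_true, alpha):
    """Interpolate eigenvalue magnitudes in log-space.

    alpha = 0 -> every eigenvalue has the mean magnitude (stiffness ratio 1),
    alpha = 1 -> the true, fully stiff spectrum.
    """
    lam_true = np.asarray(lam_true, dtype=float)
    log_mag = np.log(np.abs(lam_true))
    relaxed = (1.0 - alpha) * log_mag.mean() + alpha * log_mag
    return np.sign(lam_true) * np.exp(relaxed)

lam = np.array([-1.0e-6, -1.0e-2, -1.0e3])       # eigenvalues spanning nine decades
for alpha in (0.0, 0.5, 1.0):                     # stiffness is increased during training
    print(alpha, relaxed_eigenvalues(lam, alpha))
```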
Gradual expansion of training data. The training data of the model is a list of times at which the ODE residual is evaluated. Training starts with times close to $t = 0$, and the interval is gradually expanded until it covers the whole requested time range.
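A sketch of such a schedule (the growth rate, number of points, and names are illustrative assumptions): the collocation interval starts near $t = 0$ and grows to the full range.

```python
import torch

def training_times(epoch, n_epochs, t_end, n_points=64):
    """Collocation times whose range grows from a short interval to [0, t_end]."""
    frac = min(1.0, (epoch + 1) / (0.5 * n_epochs))   # full range after half the epochs
    return torch.linspace(0.0, frac * t_end, n_points).unsqueeze(1)

for epoch in (0, 100, 400):
    print(epoch, float(training_times(epoch, n_epochs=500, t_end=10.0).max()))
```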
Dynamic weight ratio. The weight ratio determines which loss ($\mathcal{L}_{IC}$ or $\mathcal{L}_{ODE}$) is prioritized during training. The following weighting functions of the training epoch were tested; a sketch of all four follows the list.
- Step function: the weight is changed from its initial value $w_0$ to a different constant value at a chosen epoch.
- Linear function: the weight changes linearly with the epoch between its initial and final values.
- Geometric decrease, $f(epoch) = w_0 \beta^{epoch}$, where $\beta$ is the decrease rate, $\beta < 1$. The problem with this method is that if the network does not learn the initial conditions at the beginning, it will never be able to do so; the output is very dependent on the choice of $\beta$.
- The function does not depend on $epoch$ but rather on the ratio between $\mathcal{L}_{IC}$ and $\mathcal{L}_{ODE}$, so that the weight adapts to the current balance between the two loss terms.
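The four weighting functions could be sketched as follows; apart from the geometric decrease, whose form is given above, the concrete formulas and default values are illustrative assumptions:

```python
def step_weight(epoch, w0=100.0, w1=1.0, switch_epoch=200):
    """Step function: one constant value before switch_epoch, another after it."""
    return w0 if epoch < switch_epoch else w1

def linear_weight(epoch, w0=100.0, w1=1.0, n_epochs=500):
    """Linear change from w0 to w1 over the training run."""
    frac = min(1.0, epoch / n_epochs)
    return w0 + frac * (w1 - w0)

def geometric_weight(epoch, w0=100.0, beta=0.99):
    """Geometric decrease f(epoch) = w0 * beta**epoch with beta < 1."""
    return w0 * beta ** epoch

def adaptive_weight(loss_ic, loss_ode, eps=1e-12):
    """Depends on the ratio of the two losses instead of the epoch number."""
    return float(loss_ode) / (float(loss_ic) + eps)

print(step_weight(10), linear_weight(250), geometric_weight(100), adaptive_weight(0.5, 2.0))
```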
The performance of the model did not improve by employing any of these functions, even after extensive experimentation with their parameters.