Physics-informed neural network (PINN) with a mathematically informed architecture for the nuclear decay equation
Semester project of Miha Pompe with the goal of developing a novel, mathematically informed PINN architecture to solve the nuclear decay equation. All the details and results are also published in PINN_presentation.pdf, PINN_report.pdf and the project's Wiki.
Currently, the algorithm can solve problems with stiffness values up to those listed in the table below:

| Matrix size | Stiffness | Relative error | Relative final error |
|---|---|---|---|
| | | 0.79 % | 0.94 % |
| $50\times50$ | $5\times10^{16}$ | 0.0009 % | 0.0011 % |
| $100\times100$ | $10^{18}$ | 0.13 % | 0.23 % |
| $500\times500$ | $10^{24}$ | 54 % | 63 % |
How to increase accuracy and performance:
- choose the lowest `t_max` and the highest `t_steps`,
- choose the optimal weight ratio,
- decrease the absolute tolerance of the training,
- try different seeds.
Isotopic decay and transmutation is governed by the nuclear decay equation

$$\frac{dN(t)}{dt} = A\,N(t), \qquad N(0) = N_0, \tag{1}$$

where $N(t)$ is the vector of isotope concentrations, $N_0$ the vector of initial concentrations, and $A$ the burnup (decay and transmutation) matrix.
The decay equation is used in virtually all nuclear codes, both for the simulation of reactor cores and for radioactive decay in spent fuel. Thus, accurate and fast methods for solving the equation are very important for modelling reactors and for the criticality safety of spent fuel repositories.
Physics Informed Neural Networks (PINNs) [2] offer a new paradigm for solving differential equations. The student Guglielmo Pacifico is exploring the viability of general-purpose PINNs for solving the nuclear decay equation. The neural network takes the time $t$ as input and outputs an approximation $\tilde{N}(t)$ of the solution of (1); it is trained by minimizing a weighted sum of the ODE residual loss $L_{ODE} = \|\frac{d\tilde{N}}{dt} - A\tilde{N}\|$ and the initial-condition loss $L_{IC} = \|\tilde{N}(0) - N_0\|$, where the weight ratio $w_{IC}/w_{ODE}$ is a tunable hyperparameter.
However, upon closer examination of the problem (1), it seems that there might exist networks with a specific architecture that are better suited to the problem at hand. This project aims to explore this type of network.
It is known [3] that if $A$ has $n$ distinct real eigenvalues, the solution to (1) is the linear combination

$$N(t) = \sum_{i=1}^{n} c_i\, e^{\lambda_i t}\, v_i,$$

where $\lambda_i$ and $v_i$ are the eigenvalues and eigenvectors of $A$, and the coefficients $c_i$ are determined by the initial condition $N_0$. The proposed network mirrors this form: a first layer of exponential activations $e^{\lambda_i t}$ followed by a linear output layer.
This network architecture has the advantage that only $n^2$ parameters need to be learned, and the exact required numbers of neurons and layers are known. Thus many hyperparameters are known in advance (whereas with regular neural networks, tuning the hyperparameters is nontrivial).
Having $n$ distinct eigenvalues is ensured with probability 1, but having only real eigenvalues is not. If some eigenvalues $\lambda_i = a_i + \mathrm{i}\, b_i$ are complex, the solution is

$$N(t) = \sum_{i=1}^{n} e^{a_i t}\left(u_i \cos(b_i t) + w_i \sin(b_i t)\right),$$

with the real vectors $u_i$ and $w_i$ determined by the eigenvectors of $A$ and the initial condition $N_0$. In the network this corresponds to replacing the exponential activations with combined sine and cosine activations.
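A minimal sketch of this architecture is given below. It is an illustration only (it does not reproduce the classes in `Bateman_PINN.py` or `SineCosineLayer.py`), and it assumes the eigenvalues of $A$ are real, distinct, and kept fixed in the first layer, so that only the $n\times n$ output matrix is trainable.

```python
import torch
import torch.nn as nn


class ExpLayer(nn.Module):
    """Maps a time t to the exponential features exp(lambda_i * t)."""

    def __init__(self, eigenvalues: torch.Tensor):
        super().__init__()
        # The eigenvalues come from A and are kept fixed (non-trainable).
        self.register_buffer("lam", eigenvalues)

    def forward(self, t: torch.Tensor) -> torch.Tensor:
        # t has shape (batch, 1); the output has shape (batch, n).
        return torch.exp(t * self.lam)


class DecayPINN(nn.Module):
    """Exponential feature layer followed by an n x n linear map, so that
    N(t) is a learned linear combination of the exponentials exp(lambda_i t)."""

    def __init__(self, A: torch.Tensor):
        super().__init__()
        lam = torch.linalg.eigvals(A).real  # assumes a real spectrum
        n = A.shape[0]
        self.features = ExpLayer(lam)
        self.output = nn.Linear(n, n, bias=False)  # the n^2 trainable weights

    def forward(self, t: torch.Tensor) -> torch.Tensor:
        return self.output(self.features(t))


if __name__ == "__main__":
    # Toy 3x3 lower-triangular decay matrix (hypothetical values).
    A = torch.tensor([[-1.0, 0.0, 0.0],
                      [1.0, -0.5, 0.0],
                      [0.0, 0.5, -0.1]])
    model = DecayPINN(A)
    t = torch.linspace(0.0, 2.5, 100).unsqueeze(1)
    print(model(t).shape)  # torch.Size([100, 3])
```

For complex eigenvalues, the exponential features would be replaced by the sine/cosine features described above.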
First clone the repository:

```
git clone https://github.com/mihapompe/PINN-Nuclear-decay-equation
```
Move into the repository and create a virtual environment with all the necessary libraries. You can change the environment name `PINN` to whatever you prefer.

```
conda create --name PINN python=3.9
conda activate PINN
pip install -r requirements.txt
```
To exit the virtual environment, run:

```
conda deactivate
```
To run a test, execute:

```
python test_configuration.py
```
Repository contents:
- `test_configuration.py`: example of how to solve a burnup or decay problem.
- `Bateman_PINN.py`: defines the PINN classes and their associated functions.
- `CRAMsolve.py`, `RKsolve.py`, `SineCosineLayer.py`: functions and classes used by `Bateman_PINN.py`.
- `analysis/`: folder with Python notebooks for running the analyses.
- `report/`: folder with the project statement, report and presentation.
- `results/`: folder with all the graphs and results.
- Impact of different optimizer parameters on the default test case. By choosing the optimal parameters we can decrease the relative final loss by 25 %.
Final rel. error | Learning rate | max_iter | Tolerance |
---|---|---|---|
0.003269 % | 0.1 | 20 | 1e-7 |
0.004371 % | 0.01 | 20 | 1e-7 |
0.003705 % | 0.1 | 50 | 1e-7 |
0.007665 % | 0.01 | 50 | 1e-7 |
0.003688 % | 0.01 | 100 | 1e-7 |
0.01109 % | 0.001 | 100 | 1e-7 |
0.004873 % | 0.001 | 1000 | 1e-7 |
0.003642 % | 0.1 | 100 | 1e-7 |
0.003642 % | 0.1 | 100 | 1e-8 |
0.003642 % | 0.1 | 100 | 1e-9 |
0.001813 % | 0.1 | 100 | 1e-10 (tol_change) |
0.001573 % | 0.1 | 100 | 1e-11 (tol_change) |
0.001573 % | 0.1 | 200 | 1e-11 (tol_change) |
0.002994 % | 0.01 | 200 | 1e-11 (tol_change) |
0.0009489 % | 0.2 | 200 | 1e-11 (tol_change) |
0.001501 % | 0.5 | 200 | 1e-11 (tol_change) |
0.0009489 % | 0.2 | 100 | 1e-11 (tol_change) |
0.001008 % | 0.2 | 20 | 1e-11 (tol_change) |
0.0007994 % | 0.2 | 100 | 1e-12 (tol_change) |
0.0007585 % | 0.1 | 100 | 1e-12 (tol_change) |
0.0007585 % | 0.1 | 100 | 1e-12 (tol_change) 1e-9 (tol_grad) |
RESULT: The biggest improvement is obtained by lowering `tolerance_change`, followed by slightly increasing `max_iter` and, in some cases, the learning rate.
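The parameter names in the table match those of `torch.optim.LBFGS`; the sketch below shows how the best-performing settings from the table would be passed to that optimizer, assuming it is the one used for training (the model and loss here are dummy stand-ins, not the project's PINN).

```python
import torch
import torch.nn as nn

# Dummy stand-ins so the snippet runs on its own; in the project these would
# be the PINN model and its combined IC/ODE loss.
model = nn.Linear(1, 1)
target = torch.tensor([[2.0]])

optimizer = torch.optim.LBFGS(
    model.parameters(),
    lr=0.1,                  # best learning rate in the table
    max_iter=100,            # inner iterations per .step() call
    tolerance_grad=1e-9,     # tol_grad value from the last row
    tolerance_change=1e-12,  # tol_change value from the last rows
)

def closure():
    # L-BFGS re-evaluates the loss several times per step, hence the closure.
    optimizer.zero_grad()
    loss = ((model(torch.tensor([[1.0]])) - target) ** 2).mean()
    loss.backward()
    return loss

optimizer.step(closure)
```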
- Finalizing the code, report, and documentation.
- Test the use of CRAM48. RESULT: The final relative error was unchanged at 3.269e-5 (CRAM16 vs. CRAM48), while the runtime increased from 14.45 s (CRAM16) to 49.74 s (CRAM48).
- Testing the use of `float32` and `float64`. RESULT: `torch.nn.Linear` (in our setup) supports only `float32`; the final relative error was 3.269e-5 (time 19.4 s). I then tried implementing the linear layer manually: with the datatype set to `float32` the final relative error was 6.74e-5 (time 31.7 s), and with `float64` it was 4.619e-4 (time 44.15 s), all for the default test case. I then tested the default test case with stiffness $10^{22}$: `torch.linear` error 1.081 (time 156 s, 50 epochs), `float64` error 2.57 (time 228 s, 50 epochs), `float32` error 2.52 (time 163 s, 50 epochs). At large stiffnesses the model performs poorly even with higher numerical precision.
- Testing gradual increase past stiffness $10^{20}$. ($\log_{10}$ stiffness, final rel. error): (16, 3.2e-5), (17, 6.0e-2), (18, 1.3e-1), (20, 4.3e-1), (22, 1.06), (25, 3.49). Without gradual increase, the final relative error at stiffness $10^{22}$ was 1.081, only slightly worse than the 1.06 achieved with this method. Interestingly, while the final error is almost the same, the loss with gradual increase is on the order of $10^{-6}$, whereas without it the loss is on the order of $10^{2}$.
- Data analysis and making of graphs.
- Writing of the report.
- Making the presentation.
- Adjustable weight ratio. Tested different functions for lowering the ratio (a sketch of these schedules is given after this list):
  - Step function: lower the ratio after some epoch or at a certain `IC_loss`.
  - Linear function: linearly decrease the ratio over some epochs, then keep it constant.
  - Geometric decrease: every epoch, multiply the ratio by some quotient smaller than 1.

  All of these methods performed worse than simply setting the ratio to a constant value.
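For illustration, the three schedules could look as follows. This is a sketch, not code from the repository; the epoch thresholds and the quotient are made-up values.

```python
def step_schedule(epoch: int, ratio0: float, drop_epoch: int = 500, low: float = 1.0) -> float:
    """Keep the initial ratio, then drop to `low` after `drop_epoch`."""
    return ratio0 if epoch < drop_epoch else low


def linear_schedule(epoch: int, ratio0: float, end_epoch: int = 500, low: float = 1.0) -> float:
    """Decrease linearly to `low` over `end_epoch` epochs, then stay constant."""
    if epoch >= end_epoch:
        return low
    return ratio0 + (low - ratio0) * epoch / end_epoch


def geometric_schedule(epoch: int, ratio0: float, q: float = 0.99) -> float:
    """Multiply the ratio by a quotient q < 1 every epoch."""
    return ratio0 * q ** epoch
```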
- Writing the report, adding graphs, and cleaning the code.
- Get new burnup matrices and analysis.
A burnup matrix of size $43\times43$ with stiffness $|A| = 3.1\times10^{11}$ was cropped from the ENDFB71 dataset. The orange line in the accompanying plot represents the mean of the final error, which is $2.2\times10^{-4}$. The total relative error is $8.2\times10^{-5}$. The computation time of CRAM was 15 s and of the PINN 126 s (on CPU).
- Implement and test an architecture to solve the burn-up problem.
  - Two implementations: the first uses `SineCosineLayer` (complex) and the other combines two networks with `SineLayer` and `CosineLayer` (complex_duo).
  - The network is able to solve the decay problem ($50\times50$, stiffness $5\times10^{16}$, rel. error $3.5\times10^{-6}$, time 332 s) with higher accuracy but a longer runtime.
- Change the initial conditions: set all the concentrations to 10. RESULT: The network performs as well as with the previous initial conditions (first isotope set to 100) and is able to adapt to the dynamics. (picture 102)
- New criterion for stopping the training: the IC loss and the ODE loss both have to fall below a certain threshold.
- Self-adjusting weights. Rule: when $\log_{10}(IC_{loss}) - \log_{10}(ODE_{loss}) \geq 5$, set weights = [1, 1] (a minimal sketch of this rule is given below). RESULT: The rule outperformed the [5, 1] ratio; it was equivalent to [1, 1] in terms of accuracy but slower.
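A minimal sketch of that rule (illustrative only; `weights` holds the [IC, ODE] loss weights):

```python
import math

def adjust_weights(ic_loss: float, ode_loss: float, weights: list) -> list:
    # When the IC loss is at least 5 orders of magnitude above the ODE loss,
    # reset the weight ratio to [1, 1]; otherwise keep the current weights.
    if math.log10(ic_loss) - math.log10(ode_loss) >= 5:
        return [1.0, 1.0]
    return weights
```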
- Trying different weight ratios (st = stiffness = $10^{18}$, $100\times100$ matrix):

  Weight heuristic | Relative error | PINN time [s] | Loss
  ---|---|---|---
  5*st | 11 % | 37 | $10^{-10}$
  5*st/$10^{2}$ | 12 % | 30 | $10^{-10}$
  5*st/$10^{4}$ | 12 % | 38 | $10^{-10}$
  5*st/$10^{6}$ | 12 % | 29 | $10^{-9}$
  5*st/$10^{8}$ | 11 % | 39 | $10^{-9}$
  5*st/$10^{10}$ | 10 % | 30 | $10^{-7}$
  5*st/$10^{12}$ | 10 % | 71 | $10^{-7}$
  5*st/$10^{13}$ | 8 % | 94 | $10^{-6}$
  5*st/$10^{14}$ | 8 % | 101 | $10^{-6}$
  5*st/$10^{16}$ | 4 % | 131 | $10^{-6}$
  10*st/$10^{17}$ | 3.1 % | 195 | $10^{-5}$
  5*st/$10^{17}$ | 3.4 % | 205 | $10^{-5}$
  5*st/$10^{18}$ | Didn't converge | / | /

  RESULT: Bringing the weight ratio down to between 10 and 100 reduces the relative error by an order of magnitude. Going too low can result in the network not converging.
- Increase stiffness gradually. The error stays roughly the same, decreasing slightly from 9.0 % to 8.2 %.
- Model analysis, drawing error and concentration plots; an example is given below.
- Initial tests on feeding the training data sequentially instead of randomly. RESULT: Performs worse; the relative error went from 2 % to 3 %.
- Implementing the paper ($50\times50$, `t_max` = 2.5, `t_steps` = 10000, stiffness = $5\times10^{16}$).

  Method | Weight heuristic | Relative error | Relative final error | Notes
  ---|---|---|---|---
  Normal | / | 2.67 % | 4.14 % |
  Gradual increase | / | 2.11 % | 3.29 % |
  Gradual increase | /$10^{10}$ | 2.42 % | 3.68 % |
  Gradual increase | /$10^{12}$ | 2.44 % | 3.78 % |
  Gradual increase | /$10^{14}$ | 0.11 % | 0.24 % |
  Gradual increase | /$10^{15}$ | 0.018 % | 0.037 % | picture conc_and_error_over_time90
  Normal | /$10^{15}$ | 0.024 % | 0.048 % | picture conc_and_error_over_time91
  Gradual increase, no dilution | /$10^{15}$ | 0.016 % | 0.040 % |
  Slower gradual increase, no dilution | /$10^{15}$ | 0.024 % | 0.045 % |
  Slower gradual increase | /$10^{15}$ | 0.015 % | 0.035 % |
  Slower gradual increase | 50 | 0.007 % | 0.010 % | conc_and_error_over_time92
  Slower gradual increase | 5 | 0.0034 % | 0.0028 % |
  Slower gradual increase, no dilution | 5 | 0.0048 % | 0.0041 % |
  Gradual increase | 5 | 0.0041 % | 0.0022 % |
  Slower gradual increase | 5, with positivity loss | 0.0046 % | 0.0029 % | conc_and_error_over_time93
  Slower gradual increase | 5, with positivity loss x10 | 0.0054 % | 0.0037 % |
  Slower gradual increase | 1 | 0.0009 % | 0.0011 % | conc_and_error_over_time94
  Slower gradual increase | /$10^{16}$ | 0.13 % | 0.23 % | With $100\times100$ matrix, stiffness $10^{18}$, picture 101, loss $10^{-5}$
  Slower gradual increase | /$10^{16}$ | 54 % | 63 % | With $500\times500$ matrix, stiffness $10^{24}$, loss $10^{-10}$, 38 min

  Gradual increase means that, given a training time interval [0, `t_max`], we slowly increase `t_max`, so the number of data points grows as well. "No dilution" means that we always keep the same number of data points but still increase `t_max` with every epoch; this variant is about 10 % to 20 % slower because it uses more training points overall.
  Here I also tested how the model performs if we add another loss that enforces $N(t) > 0$: positivity_loss = $\| |N(t)| - N(t) \|^2$. Tests have shown that no significant improvement is gained by adding this loss. Beware that the results, i.e. the relative errors, fluctuate, since they depend on the random seed.
  RESULT: Gradually increasing `t_max`, as described in the paper (https://arxiv.org/pdf/2304.08289.pdf), decreased the relative error by one to two orders of magnitude. A sketch of the two curriculum variants is given below.
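The sketch assumes `t_max` = 2.5 and `t_steps` = 10000 as in the table above; the linear growth over `n_epochs` is an illustrative choice, not the exact schedule used in the project.

```python
import numpy as np

t_max, t_steps, n_epochs = 2.5, 10_000, 100

for epoch in range(n_epochs):
    # Grow the training interval [0, current_t_max] over the epochs.
    frac = (epoch + 1) / n_epochs
    current_t_max = frac * t_max

    # "Gradual increase": the number of points grows with the interval.
    t_train = np.linspace(0.0, current_t_max, int(frac * t_steps))

    # "No dilution": always the full t_steps points, only the interval grows.
    t_train_no_dilution = np.linspace(0.0, current_t_max, t_steps)

    # ... train the PINN on t_train (or t_train_no_dilution) here ...
```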
- Permute the matrix to lower-triangular form.
- Add small fluctuations to identical eigenvalues (to make them distinct).

  RESULT: Neither of these changes impacted the accuracy of the solution.
- Move the code to Merlin and run it on GPUs.
- Set the gradients of the upper-triangular entries of the output layer to zero.
- Test the model on bigger matrices (without stiffness reduction):

  Matrix size | Stiffness | Relative error | PINN time | CRAM time
  ---|---|---|---|---
  $5\times5$ | $5\times10^{8}$ | $5\times10^{-7}$ | 0.4 s | 12.8 s
  $12\times12$ | $10^{24}$ | 0.9 % | 0.7 s | 13.3 s
  $17\times17$ | $10^{30}$ | 1.8 % | 1.9 s | 13.4 s
  $100\times100$ | $10^{18}$ | 8.1 % | 29.5 s | 19.8 s
  $100\times100$ | $10^{10}$ | $5\times10^{-6}$ | 30.3 s | 23.4 s
  $500\times500$ | $8\times10^{24}$ | 54 % | 38 min | 3.5 min
- Fix the weights of the initial layer. RESULT: We are able to solve problems up to a stiffness of at least $10^{12}$. The model also converges much faster to the final solution; most problems converge in one iteration, making the result almost independent of the random seed.
- Make the initial weights of the output layer lower triangular.
- Tested the Adam optimizer. It did not converge, even with different learning rates.
- Pass the values of the output-layer matrix from a lower-stiffness problem as initial values for a stiffer problem. With this, the model was able to solve problems up to stiffness $10^{16}$; beyond that, storing such large numbers becomes a problem.
- Implement and test the random-seed method (run the algorithm with different seeds until it converges). RESULT: We are able to solve problems up to a stiffness of at least 20,000.
- Import numerical solvers.
- Test the model on bigger matrices.
- Test the modified IC loss function $L_{IC} = \|A(N(t=0)-N_0)\|$ (sketched below). RESULT: We still have a dependence on the weight ratio. With the same weight ratio, the original loss function performed better than the new one.
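The two IC losses that were compared, written out as a sketch (the function names are illustrative, not the repository's API):

```python
import torch

def ic_loss_original(N0_pred: torch.Tensor, N0: torch.Tensor) -> torch.Tensor:
    # Original loss: L_IC = ||N(t=0) - N_0||
    return torch.linalg.norm(N0_pred - N0)

def ic_loss_modified(A: torch.Tensor, N0_pred: torch.Tensor, N0: torch.Tensor) -> torch.Tensor:
    # Modified loss: L_IC = ||A (N(t=0) - N_0)||
    return torch.linalg.norm(A @ (N0_pred - N0))
```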
- Perform a broader and denser hyperparameter search. RESULT: We found that it is best to initialize $w_{IC}/w_{ODE} = 5\times$ stiffness.
- Implement the new architecture.
- Test the new model on small matrices with varying stiffness. RESULT: The new architecture is able to solve problems with stiffness up to 1250.
- Perform a hyperparameter search.
- Reading papers and reports
- Administrative things