\chapter{PINNs}
\graphicspath{{./figures/}{/media/chaztikov/1c9bbb3d-8efa-4b13-b788-5cb0e1a584ed/PINNs/main/examples/IBPINN/notes/FOSLS/}{/media/chaztikov/1c9bbb3d-8efa-4b13-b788-5cb0e1a584ed/PINNs/main/examples/IBPINN/}{/media/chaztikov/1c9bbb3d-8efa-4b13-b788-5cb0e1a584ed/PINNs/main/examples/IBPINN/notes/}}
\section{Network Types}
\section{Activation Functions}
\subsection{Rectified Linear Unit and Variants}
\subsubsection{ReLU}
\subsubsection{Leaky ReLU}
\subsection{Sigmoid and Variants}
\subsubsection{Sigmoid}
\subsubsection{Sigmoidal Functions}
\subsubsection{Swish/SiLU}
The swish function (equal to the SiLU when $\beta = 1$) is defined as
\begin{align*}
\operatorname{swish}(x)=x \operatorname{sigmoid}(\beta x)=\frac{x}{1+e^{-\beta x}} .
\end{align*}
Swish is non-monotonic, a property that may have influenced the proposal of other activation functions with this property, such as Mish.
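As a minimal sketch (not tied to any particular framework), the swish activation with a trainable $\beta$ can be written in a few lines of PyTorch; the module name \texttt{SwishBeta} and its interface are illustrative, not part of any library.
\begin{verbatim}
import torch
import torch.nn as nn

class SwishBeta(nn.Module):
    """Swish/SiLU activation x * sigmoid(beta * x) with a trainable beta."""

    def __init__(self, beta: float = 1.0):
        super().__init__()
        # beta is a learnable scalar; beta = 1 recovers the standard SiLU.
        self.beta = nn.Parameter(torch.tensor(beta))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return x * torch.sigmoid(self.beta * x)
\end{verbatim}
Making $\beta$ a parameter lets swish act as an adaptive activation: the network can move between a nearly linear map (small $\beta$) and ReLU-like behavior (large $\beta$).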
\subsubsection{Mish}
\subsection{Adaptive Activation Functions}
\subsubsection{PReLU Activation (Adaptive Tanh)}
\subsubsection{STAN Activation (Adaptive Tanh)}
\section{Neural Networks and Approximation Theory}
%http://pc-petersen.eu/Neural_Network_Theory.pdf
\section{Advanced Techniques}
\section{Relative Weighting/Balancing of Losses}
\subsection{Signed Distance Function (SDF)}
One area of considerable interest is weighting the losses with respect to each other. For example, for a partial differential equation in primitive variables, we can weight the boundary condition and residual losses as
\begin{equation*}
L=\lambda_{B C} L_{B C}+\lambda_{\text {residual }} L_{\text {residual }}
\end{equation*}
Depending on the values of $\lambda_{B C}$ and $\lambda_{\text {residual }}$, this can impact the convergence of the solver. We can extend this idea to varying the weightings spatially as well. Written out in the integral formulation of the losses, we have
\begin{equation*}
L_{\text {residual }}=\int_0^1 \lambda_{\text {residual }}(x)\left(\frac{\partial^2 u_{\text {net }}}{\partial x^2}(x)-f(x)\right)^2 d x
\end{equation*}
The choice of $\lambda_{\text {residual }}(x)$ can be varied based on the problem definition, and is an active field of research. In general, we have found it beneficial to weight losses lower on sharp gradients or discontinuous areas of the domain. For example, if there are discontinuities in the boundary conditions, we may have the loss decay to zero on these discontinuities. Another example is weighting the equation residuals by the signed distance function (SDF) of the geometry. If the geometry has sharp corners, this often results in sharp gradients in the solution of the differential equation. Weighting by the SDF tends to weight these sharp gradients lower, which often increases the convergence speed and sometimes also improves accuracy. There are many examples of this in what follows, and we defer further discussion to the specific examples.
\begin{figure}
\centering
\includegraphics[width=0.99\linewidth]{./figures/SDF_loss_weighting}
\caption{Improvements in convergence speed by weighting the equation residuals spatially.}
\label{fig:sdflossweighting}
\end{figure}
Figure \ref{fig:sdflossweighting} shows errors for one such example of laminar flow (Reynolds number 50) over a 17 fin heat sink (tutorial FPGA Heat Sink with Laminar Flow) in the initial 100,000 iterations. The multiple closely spaced thin fins lead to several sharp gradients in the flow equation residuals in the vicinity of the heat sink. By weighting them spatially, we essentially minimize the dominance of these sharp gradients during the iterations and achieve a faster rate of convergence.
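A minimal sketch of this spatial weighting, assuming the SDF values are already available at the interior collocation points (the function name is ours, not the Modulus API):
\begin{verbatim}
import torch

def weighted_residual_loss(residual: torch.Tensor,
                           sdf: torch.Tensor,
                           lambda_residual: float = 1.0) -> torch.Tensor:
    """Spatially weighted PDE residual loss.

    residual: PDE residual evaluated at the interior collocation points.
    sdf:      signed distance to the geometry at the same points; it is
              small near walls and corners, so regions with sharp solution
              gradients receive a lower weight.
    """
    return lambda_residual * torch.mean(sdf * residual ** 2)
\end{verbatim}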
\subsection{Neural Tangent Kernel}
\subsection{Homoscedastic Task Uncertainty for Loss Weighting}
\subsection{Relative Loss Balancing with Random Lookback (ReLoBRaLo)}
\subsection{GradNorm}
\subsection{ResNorm}
\section{Collocation Point Sampling}
\subsection{Point Cloud Density}
In this section, we discuss the accuracy improvements obtained by adding more points in the areas where the field is expected to show stronger spatial variation. Increasing the point density in such regions is particularly useful when the validation losses or the validation residual losses start to increase towards the end of training.
\begin{figure}
\centering
\includegraphics[width=0.99\linewidth]{./figures/sampling_point_cloud_density}
\caption{Improvements in accuracy by adding more points in the interior.}
\label{fig:samplingpointclouddensity}
\end{figure}
Figure \ref{fig:samplingpointclouddensity} shows the effect of increasing the point density in the vicinity of the same 17 fin heat sink that we saw in the earlier comparison. By sampling more points close to the heat sink, we are able to achieve better $L_2$ errors for $p$, $v$, and $w$.
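As a rough illustration of biasing the point cloud density, the sketch below draws a base set of uniform collocation points plus a denser set restricted to a user-specified refinement box (e.g., the neighborhood of the heat sink); all names are illustrative.
\begin{verbatim}
import numpy as np

def sample_with_refinement(n_base, n_refined, box, refine_box, rng=None):
    """Uniform collocation points over `box`, plus extra points confined
    to `refine_box`, the region where stronger spatial variation of the
    field is expected. Boxes are (lo, hi) pairs of length-d arrays."""
    rng = rng or np.random.default_rng()
    lo, hi = box
    rlo, rhi = refine_box
    base = rng.uniform(lo, hi, size=(n_base, len(lo)))
    refined = rng.uniform(rlo, rhi, size=(n_refined, len(rlo)))
    return np.concatenate([base, refined], axis=0)
\end{verbatim}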
\subsection{Pseudorandom Sampling}
\subsubsection{Latin Hypercube (LHC) Sampling}
%\subsection{Adaptive Sampling}
\subsection{Residual Adaptive Sampling (RAR)}
%\subsubsection{Weighted Residual Adaptive Sampling (RAR)}
\subsection{Importance Sampling}
Suppose our problem is to find the optimal parameters $\theta^*$ such that the Monte Carlo approximation of the integral loss is minimized
\begin{align*}
\begin{aligned}
\theta^* &=\underset{\theta}{\operatorname{argmin}} \mathbb{E}_f[\ell(\theta)] \\
& \approx \underset{\theta}{\operatorname{argmin}} \frac{1}{N} \sum_{i=1}^N \ell\left(\theta ; \mathbf{x}_{\mathbf{i}}\right), \quad \mathbf{x}_{\mathbf{i}} \sim f(\mathbf{x}),
\end{aligned}
\end{align*}
where $f$ is a uniform probability density function. In importance sampling, the sampling points are drawn from an alternative sampling distribution $q$ such that the estimation variance of the integral loss is reduced, that is
\begin{align*}
\theta^* \approx \underset{\theta}{\operatorname{argmin}} \frac{1}{N} \sum_{i=1}^N \frac{f\left(\mathbf{x}_{\mathbf{i}}\right)}{q\left(\mathbf{x}_{\mathbf{i}}\right)} \ell\left(\theta ; \mathbf{x}_{\mathbf{i}}\right), \quad \mathbf{x}_{\mathbf{i}} \sim q(\mathbf{x}) .
\end{align*}
Modulus offers point cloud importance sampling for improved convergence and accuracy, as originally proposed in [12]. In this scheme, the training points are drawn from a sampling distribution designed to reduce the variance of the loss estimate; a minimal illustration of the reweighting is sketched below.
% in Modulus are presented in examples/ldc/ldc_2d_importance_sampling.py script.
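A minimal sketch of the reweighted Monte Carlo estimate above, assuming the sampling distribution $q$ is given (e.g., proportional to a smoothed residual magnitude); the function names are ours.
\begin{verbatim}
import torch

def importance_sampled_loss(pointwise_loss, x_q, q_density, f_density=1.0):
    """Monte Carlo loss estimate with collocation points x_q drawn from q.

    pointwise_loss: callable returning the per-point loss l(theta; x).
    q_density:      q(x_i) evaluated at the sampled points.
    f_density:      nominal density f(x_i); a constant for a uniform
                    distribution over the domain.
    """
    weights = f_density / q_density      # importance weights f/q
    return torch.mean(weights * pointwise_loss(x_q))
\end{verbatim}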
\begin{figure}
\centering
\includegraphics[width=0.99\linewidth]{./figures/annular_ring_sample_prob}
\caption{A visualization of the training point sampling probability at iteration 100K for the annular ring example.}
\label{fig:annular_ring_sample_prob}
\end{figure}
\begin{figure}
\centering
\includegraphics[width=0.99\linewidth]{./figures/annular_ring_importance_sampling.png}
\caption{A comparison between the uniform and importance sampling validation error results for the annular ring example.}
\label{fig:annular_ring_importance_sampling}
\end{figure}
Figure \ref{fig:annular_ring_importance_sampling} shows a comparison between the uniform and importance sampling validation error results. The training point sampling probability computed at iteration $100\mathrm{K}$ is shown in Figure \ref{fig:annular_ring_sample_prob}.
\subsection{Exact Boundary Constraint Imposition}
\subsubsection{Approximate Distance Function (ADF)}
The standard neural network solvers impose boundary conditions in a soft form, by incorporating the boundary conditions as constraints in the form of additional loss terms. An alternative is to impose the boundary conditions exactly: first, an approximate distance function (ADF) to the boundary is computed, and next, a solution ansatz is formed from the ADF so that the boundary conditions are satisfied by construction.
Let $D \subset \mathbb{R}^d$ denote the computational domain with boundary $\partial D$. The exact distance function gives the shortest distance from any point $\mathbf{x} \in \mathbb{R}^d$ to the domain boundary $\partial D$, and is therefore zero on $\partial D$. The exact distance function is not second- or higher-order differentiable, and thus one uses the ADF $\phi(\mathbf{x})$ instead.
The exact boundary condition imposition in Modulus is currently limited to 2D geometries only. Let $\partial D \subset \mathbb{R}^2$ be a boundary composed of $n$ line segments and curves $\partial D_i$, and let $\phi_i$ denote the ADF to each curve or line segment such that $\phi_1 \cup \phi_2 \cup \ldots \cup \phi_n=\phi$. The properties of an ADF are as follows: (1) for any point $\mathbf{x}$ on $\partial D$, $\phi(\mathbf{x})=0$; and (2) $\phi(\mathbf{x})$ is normalized to the $m$-th order, i.e., its derivative with respect to the unit inward normal vector is one and its second- to $m$-th order derivatives are zero for all points on $\partial D$.
The elementary properties of R-functions, including R-disjunction (union), R-conjunction (intersection), and R-negation, can be used to construct a composite ADF $\phi(\mathbf{x})$ to the boundary $\partial D$ when the ADFs $\phi_i(\mathbf{x})$ to the partitions of $\partial D$ are known. Once the ADFs $\phi_i(\mathbf{x})$ to all the partitions of $\partial D$ are calculated, the ADF to $\partial D$ is obtained using the R-equivalence operation. When $\partial D$ is composed of $n$ pieces $\partial D_i$, the ADF $\phi$ that is normalized up to order $m$ is given by
\begin{equation*}
\phi(\mathbf{x})=\frac{1}{\sqrt[m]{\dfrac{1}{\phi_1^m(\mathbf{x})}+\dfrac{1}{\phi_2^m(\mathbf{x})}+\cdots+\dfrac{1}{\phi_n^m(\mathbf{x})}}} .
\end{equation*}
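As a sketch of these ideas for the simplest case of a unit square (the per-edge distances are exact here, and the function names are ours rather than the Modulus API), the ADFs to the four edges can be combined with the R-equivalence formula above and used to build a solution ansatz that satisfies a homogeneous Dirichlet condition exactly:
\begin{verbatim}
import torch

def adf_unit_square(x, y, m=2, eps=1e-12):
    """ADF to the boundary of the unit square, composed from the per-edge
    distances via m-th order R-equivalence:
        phi = 1 / (1/phi_1^m + ... + 1/phi_n^m)^(1/m).
    eps guards the division for points lying exactly on an edge."""
    phis = [x, 1.0 - x, y, 1.0 - y]   # exact distances to the four edges
    s = sum(1.0 / phi.clamp_min(eps) ** m for phi in phis)
    return s ** (-1.0 / m)

def exact_dirichlet_ansatz(net, x, y, g=0.0):
    """Solution ansatz u = g + phi * net(x, y): since phi vanishes on the
    boundary, u equals g there regardless of the network output."""
    phi = adf_unit_square(x, y)
    xy = torch.stack([x, y], dim=-1)
    return g + phi * net(xy).squeeze(-1)
\end{verbatim}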
\subsection{Gradient Aggregation}
Training of a neural network solver for complex problems can require a batch size that exceeds the available GPU memory. Increasing the number of GPUs effectively increases the batch size; alternatively, one can use gradient aggregation when GPU availability is limited. With gradient aggregation, the required gradients are computed in several forward/backward passes using different mini-batches of the point cloud and are then aggregated and applied to update the model parameters. This effectively increases the batch size, although at the cost of a longer training time. In the case of multi-GPU/node training, gradients corresponding to each mini-batch are aggregated locally on each GPU and then aggregated globally just before the model parameters are updated; therefore, gradient aggregation does not introduce any extra communication overhead between the workers.
%Details on how to use the gradient aggregation in Modulus is provided in Tutorial Modulus Configuration.
%
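A minimal sketch of the accumulation pattern described above, written in plain PyTorch (the loop structure and names are ours; Modulus exposes this feature through its configuration rather than user code):
\begin{verbatim}
import torch

def train_step_with_aggregation(model, optimizer, loss_fn, minibatches):
    """One optimizer update using gradients accumulated over several
    mini-batches of collocation points (emulating a larger batch size)."""
    optimizer.zero_grad()
    n = len(minibatches)
    for batch in minibatches:
        loss = loss_fn(model, batch) / n   # scale so accumulated grads average
        loss.backward()                    # gradients accumulate in .grad
    optimizer.step()                       # single parameter update
\end{verbatim}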
\subsection{Exact Continuity}
Velocity-pressure formulations are the most widely used formulations of the Navier-Stokes equations. However, this formulation has two issues that can be challenging to deal with. The first is the pressure boundary conditions, which are not given naturally. The second is the absence of pressure in the continuity equation, together with the fact that there is no evolution equation for pressure through which mass conservation could be enforced. One way to ensure mass conservation is to define the velocity field from a vector potential:
\begin{equation*}
\vec{V}=\nabla \times \vec{\psi}=\left(\frac{\partial \psi_z}{\partial y}-\frac{\partial \psi_y}{\partial z}, \frac{\partial \psi_x}{\partial z}-\frac{\partial \psi_z}{\partial x}, \frac{\partial \psi_y}{\partial x}-\frac{\partial \psi_x}{\partial y}\right)^T
\end{equation*}
where $\vec{\psi}=\left(\psi_x, \psi_y, \psi_z\right)$. This definition of the velocity field ensures that it is divergence free and that it satisfies continuity:
\begin{equation*}
\nabla \cdot \vec{V}=\nabla \cdot(\nabla \times \vec{\psi})=0
\end{equation*}
%A good overview of related formulations and their advantages can be found in
%Young, D. L., C. H. Tsai, and C. S. Wu. “A novel vector potential formulation of 3D Navier–Stokes equations with through-flow boundaries by a local meshless method.” Journal of Computational Physics 300 (2015): 219-240.
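A sketch of how the divergence-free velocity can be obtained in practice, by letting a network predict $\vec{\psi}$ and taking the curl with automatic differentiation (PyTorch is used here for illustration; the names are ours):
\begin{verbatim}
import torch

def velocity_from_potential(psi_net, xyz):
    """Velocity V = curl(psi) from a network predicting the vector
    potential psi = (psi_x, psi_y, psi_z); div V = 0 by construction."""
    xyz = xyz.requires_grad_(True)
    psi = psi_net(xyz)                      # shape (N, 3)
    grads = [torch.autograd.grad(psi[:, i].sum(), xyz, create_graph=True)[0]
             for i in range(3)]             # grads[i][:, j] = d psi_i / d x_j
    u = grads[2][:, 1] - grads[1][:, 2]     # d psi_z/dy - d psi_y/dz
    v = grads[0][:, 2] - grads[2][:, 0]     # d psi_x/dz - d psi_z/dx
    w = grads[1][:, 0] - grads[0][:, 1]     # d psi_y/dx - d psi_x/dy
    return torch.stack([u, v, w], dim=-1)
\end{verbatim}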
\section{Use Cases of PINNs vs Classical Numerical Methods}
Current AI methods will be slower than traditional solvers for training a single case/geometry. The advantages of Physics-ML technology lie in the following settings (these are also covered with examples in the Modulus presentation):
\begin{enumerate}
\item Multiple (parameterized) cases, where there are several different configurations in the analysis space
\item Inverse problems (data is given, but the coefficients of the PDEs must be determined)
\item Cases where measured data is available but the physics is too complicated or not fully understood
\item Data assimilation cases where there is measured or simulated data for some variables but not the entire field (e.g. digital twins, medical imaging, full waveform inversion)
\item Training an AI model on the compute-intensive part of a solver and calling it from the solver during the analysis to improve speed (constitutive models, turbulence models, radiation view factors, etc.)
\item Point clouds can sometimes yield higher-quality solutions than mesh-based simulations (since, without convergence studies, mesh-based solutions can give questionable results).
\end{enumerate}
CRUNCH group
%https://www.brown.edu/research/projects/crunch/home
In their papers, the CRUNCH group presents various use cases for physics-informed neural networks (PINNs).
An example: in one of these papers, PINNs allowed for Runge-Kutta schemes of extreme order.
From my experience, PINNs shine in design optimization, when you try different model configurations (subject to physical constraints) based on a multitude of parameters.
From my point of view, the general idea of PINNs for design optimization is not too dissimilar to a recent NVIDIA paper concerning automatic LOD: Appearance-Driven Automatic 3D Model Simplification.
\section{Solid Mechanics: Steady Linear Elasticity Examples}
\section{Fluid Flow: Steady Navier-Stokes Examples}
\subsection{Steady Hagen-Poiseuille Channel Flow 2D, Fully Developed, Newtonian}
\subsection{Steady Couette Channel Flow 2D, Fully Developed, Newtonian}
\subsection{Steady Taylor Green Vortex 2D, Fully Developed, Newtonian}
\subsection{Steady Lid Driven Cavity 2D, Fully Developed, Newtonian}
\section{Fluid Flow: Steady Turbulence Examples}
% \subsection{Fully Developed 2D Turbulent Channel Flow} https://docs.nvidia.com/deeplearning/modulus/text/intermediate/two_equation_turbulent_channel.html
\subsection{Fully Developed 2D Turbulent Channel Flow}
\subsubsection{$k$-$\epsilon$ Model}
\subsubsection{$k$-$\omega$ Model}
\section{Neural Network Architectures (``Models'')}
See the FPGA and Industrial heat sink examples.
\section{PINN Methods}
Expert Informed Neural Networks
\subsection{Scaling and Nondimensionalizing the Problem}
\subsection{Hard Constraints}
Using ADF
\subsection{Finite Difference Method Approximation}
Using Meshless Differentiation (MDF)
\subsection{Exact Continuity, Exact Incompressibility}
\subsection{Integral Continuity Plane}
\subsection{Spatially Weighted Loss}
\subsection{Spatially Dependent Sampling}
\subsection{Sobolev Training}
Gradient-Enhanced
\subsection{Selective Equations Term Suppression (SETS)}
\section{Thesis Structure}