\documentclass[12pt]{article}
\usepackage[utf8]{inputenc}
\usepackage[english]{babel}
\usepackage{csquotes}
\usepackage{hyperref}
\usepackage{xcolor}
\usepackage{listings}
\usepackage{mathtools}
\usepackage{graphicx}
\usepackage{amsmath}
\graphicspath{ {images/} }
\hypersetup{
colorlinks=true,
linkcolor=blue,
urlcolor=cyan,
}
\definecolor{myGray}{HTML}{e6e6e6}
\definecolor{myBlue}{HTML}{80D0F0}
\definecolor{myRed}{HTML}{F08080}
\begin{document}
\title{Intro To Artificial Intelligence}
\author{Olu Gbadebo}
\date{December 26th, 2016}
\maketitle
\begin{center}
{\small As taught by Sebastian Thrun and Peter Norvig\\}
\vfill
\end{center}
\setlength{\parindent}{10ex}
What is Artificial Intelligence? It is an \textbf{Intelligent Agent}. An Intelligent Agent interacts with an environment, receiving data from it and sending data back to it. The Agent uses \textbf{sensors} to \enquote{see} the environment and extract information from the environment's state. It then uses \textbf{actuators} to affect the state of the environment. When the Agent receives data, it goes through a \textbf{decision stage} where it runs a function that analyzes the data and makes a decision based on it. The decision it makes is what the \textbf{actuators} will carry out on the environment.\par
The process of the Agent carrying out these actions (see, decide, act) repeatedly is called the \textbf{Perception Action Cycle}. In other words, the Perception Action Cycle simply means that the Agent continuously asks the environment for data, processes the data, makes a decision, and sends that decision back to the environment.
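\par
As a rough illustration, the cycle can be written as a small loop. The sketch below is hypothetical (the \colorbox{myGray}{Agent} and \colorbox{myGray}{Environment} interfaces are my own placeholders, not from the lecture); it only shows the sense--decide--act structure:
\begin{lstlisting}[language=Python]
# Minimal, hypothetical sketch of the Perception Action Cycle.
class ReflexAgent:
    def decide(self, percept):
        # decision stage: map sensed data to an action
        return "act" if percept is not None else "wait"

def perception_action_cycle(agent, environment, steps=10):
    for _ in range(steps):
        percept = environment.sense()    # sensors read the environment's state
        action = agent.decide(percept)   # agent analyzes the data, makes a decision
        environment.apply(action)        # actuators carry the decision out
\end{lstlisting}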
\section*{Applications of A.I.}
Artificial Intelligence has been used in several different areas of life. Some of the areas that AI has been successfully used in are:
\begin{enumerate}
\item Finance
\item Robotics
\item Games
\item Medicine
\item The Web
\end{enumerate}
\subsection*{Finance}
There is a huge use of AI in finance, particularly in making trading decisions. An AI agent in finance is called a \textbf{Trading Agent}. The environment of a trading agent might be a stock market, a bond market, or a commodities market. The agent can read the news, follow certain events, and analyze datasets; these are its forms of sensors. The decisions the agent can make are either to buy or to sell (in other words, to trade). \par
Trading agents have been used to analyze data over time, and those who use them have made a good amount of profit.
\subsection*{Robotics}
Artificial Intelligence also has a good history in robotics. AI agents in robotics can be actual robots. The agents' sensors include cameras, microphones, and tactile (touch) sensors. Robot agents affect the environment by moving, touching, or speaking with humans.
\subsection*{Games}
AI has been used extensively in games. For example, AI has been used in chess to play against humans. The AI agent, a \textbf{Game Agent}, reads your moves, analyzes them, and then makes its own moves. \par
There are two main reasons game agents are used in games. The first is to play against you, with the purpose of beating you and ultimately making you a better player. The second is to make gameplay feel more natural and real, in the sense that the characters in the game seem smarter.
\subsection*{Medicine}
A \textbf{Diagnostic Agent} is used in medicine to make \enquote{diagnostic} decisions. The agent observes you by taking measurements such as blood pressure and heart signals and, in most cases, communicates its findings to a doctor, who then makes the final decision. There are many other versions of diagnostic agents.
\subsection*{The Web}
A web crawler is a type of AI agent on the web. The crawling agent goes out onto the World Wide Web, retrieves web pages, organizes them, and then shows the best pages. This kind of agent is widely used by companies such as Google and Microsoft.
\section*{Terminology}
Terminology relating to environments includes the following:
\begin{enumerate}
\item Fully and Partially Observable
\item Deterministic and Stochastic
\item Discrete and Continuous
\item Benign and Adversarial
\end{enumerate}
\subsection*{Fully and Partially Observable}
An environment is considered a \textbf{Fully Observable environment} if what the agent senses at any point in time is completely sufficient to make the optimal decision. For example, in many card games where all the cards are visible, the momentary sight of all the cards is sufficient for the AI to make a decision. On the contrary, a \textbf{Partially Observable environment} is one where the AI needs some memory to make the best decision. For example, in a game of poker where the cards are hidden, the AI would need previous plays as data to make predictions. \par
In a Perception Action cycle, a Fully Observable environment would have an agent that has full access to the state, whereas a Partially Observable environment would have an agent that has access to only parts of the state and must also rely on memorized data.
\subsection*{Deterministic and Stochastic}
A \textbf{Deterministic} environment is one where the agent's actions uniquely determine the outcome, as in chess, where moves can be predetermined and one move affects the next. A \textbf{Stochastic} environment, by contrast, involves randomness, so outcomes can't be fully predicted, as in a game that involves throwing dice.
\subsection*{Discrete and Continuous}
In a \textbf{Discrete} environment, there are finitely many action choices for the agent to make. While in a \textbf{Continuous} environment, \textit{you guessed it!}, there are infinitely many actions to choose from. Examples are chess, with finitely many positions on the board, and darts, with infinitely many ways of throwing the dart (different angles and accelerations).
\subsection*{Benign and Adversarial}
A \textbf{Benign} environment doesn't have an objective that contradicts your own objective. Its goal is not to \textit{hunt you down}; like the weather, it just does its thing without trying to hurt you. It strives toward a goal that doesn't conflict with yours. An \textbf{Adversarial} environment, on the other hand, exists to \textit{make you not exist}, like a chess game where your opponent is sweating hard to make you lose.
\section*{Problem-solving techniques}
A problem in AI can be broken down into a number of components.
\begin{enumerate}
\item Initial State: This is the state that the agent starts out with.
\item Action: A function, \colorbox{myGray}{action(s)}, that takes in a state of the agent and returns a set of actions \colorbox{myGray}{\{a\textsubscript{1}, a\textsubscript{2}, a\textsubscript{3}, ... , a\textsubscript{n}\} }. Some agents have the same set of actions in every state; others have different actions in different states.
\item Result: A function, \colorbox{myGray}{result(s, a)}, that takes in a state and an action and delivers as output a new state.
\item Goal Test: A function, \colorbox{myGray}{goaltest(s)}, which takes a state and returns a boolean stating whether the state is the goal or not.
\item Path Cost: A function, \colorbox{myGray}{pathcost(s\textsubscript{1}$\xrightarrow{a}$s\textsubscript{2}$\xrightarrow{a}$...)}, which takes a sequence of state-action transitions and returns a number \colorbox{myGray}{n}, which is the cost of that path.
\end{enumerate}
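These five components can be summarized as a small interface. The sketch below is illustrative only (the class and method names are my own placeholders, and states and actions are assumed to be plain Python values):
\begin{lstlisting}[language=Python]
# Hypothetical interface mirroring the five problem components above.
class Problem:
    def __init__(self, initial_state):
        self.initial_state = initial_state        # Initial State
    def actions(self, s):
        raise NotImplementedError                 # action(s) -> {a1, ..., an}
    def result(self, s, a):
        raise NotImplementedError                 # result(s, a) -> new state
    def goal_test(self, s):
        raise NotImplementedError                 # goaltest(s) -> True / False
    def step_cost(self, s, a, s2):
        return 1                                  # cost of a single transition
    def path_cost(self, path):
        # path is a sequence of (state, action) pairs; the path cost is the
        # sum of the individual step costs
        return sum(self.step_cost(s, a, self.result(s, a)) for s, a in path)
\end{lstlisting}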
\par
Search can be used to solve AI problems. For example, given a map, how can you find the best route from a starting position to a destination? A search algorithm can help solve this problem. Some search algorithms used in AI are:
\begin{enumerate}
\item Tree Search
\item Breadth First Search
\item Graph Search
\item Depth First Search
\item Uniform Cost Search (Cheapest Cost)
\item A* Search
\end{enumerate}
\textit{I won't write in detail what these searches are. I know I should. You can find the \href{https://classroom.udacity.com/courses/cs271}{lectures} on Udacity's Intro to AI, \enquote{Problem Solving} section.}
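\textit{Still}, as one concrete illustration, here is a minimal Uniform Cost Search (cheapest-first) sketch written against the hypothetical \colorbox{myGray}{Problem} interface from the previous section; it is a generic textbook version, not the course's exact pseudocode:
\begin{lstlisting}[language=Python]
import heapq, itertools

def uniform_cost_search(problem):
    counter = itertools.count()   # tie-breaker so heapq never compares raw states
    frontier = [(0, next(counter), problem.initial_state, [])]
    explored = set()
    while frontier:
        cost, _, state, path = heapq.heappop(frontier)   # cheapest node first
        if problem.goal_test(state):
            return path, cost                # actions taken and total path cost
        if state in explored:
            continue
        explored.add(state)
        for a in problem.actions(state):
            s2 = problem.result(state, a)
            if s2 not in explored:
                heapq.heappush(frontier, (cost + problem.step_cost(state, a, s2),
                                          next(counter), s2, path + [a]))
    return None, float("inf")                # no goal is reachable
\end{lstlisting}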
\subsection*{Problem with problem-solving}
These problem-solving techniques only work if the following conditions are true:
\begin{itemize}
\item The environment must be \textbf{Fully Observable}: We must be able to know what initial state we start out with.
\item The environment must be \textbf{Known}: We must know the set of actions that can be performed.
\item The environment must be \textbf{Discrete}: There must be finitely many actions to choose from.
\item The environment must be \textbf{Deterministic}: We have to know the result of taking an action.
\item The environment must be \textbf{Static}: There must be nothing else that can change the environment except our own actions.
\end{itemize}
\section*{Probability in AI: Bayes Network}
A \textbf{Bayes Network} is a probability technique that is widely used in almost all fields of smart computing systems, such as diagnostics, prediction, and machine learning. It's also used in fields like finance, robotics, and internal departments at Google. The Bayes Network also serves as the building block for more advanced AI techniques, including particle filters, hidden Markov models, MDPs, POMDPs, Kalman filters, and many others.
\begin{center}
\textit{Problem in \colorbox{myRed}{red}, Causes in \colorbox{myBlue}{blue}, Diagnostic in \colorbox{myGray}{black/gray}}
\end{center}
\par
To understand the Bayes Network, we will use the following example. Suppose you find out in the morning that your \colorbox{myRed}{car won't start}. There are many reasons such a situation could occur. One is that your \colorbox{myBlue}{battery is flat}. A flat battery in turn has its own causes, such as the \colorbox{myBlue}{battery is dead} or the \colorbox{myBlue}{battery is not charging}. \textit{You see the \enquote{chain reaction} here?} If the battery is not charging, it could be that the \colorbox{myBlue}{alternator is broken} or the \colorbox{myBlue}{fanbelt is broken}.\par
Analyzing the Bayes Network, you can see that there are many different causes for the car not to start. The question is, \enquote{can we diagnose the problem?}\\
\vspace{2em}
\includegraphics[width=\textwidth]{bayes_network1}
One diagnostic tool is a \colorbox{myGray}{battery meter}, which would either support, or not support, your belief that the battery is the sole cause. Another reason to think it might be your battery is the \colorbox{myGray}{battery's age}: older batteries tend to die more often. Another diagnostic approach is to inspect the \colorbox{myGray}{light}, \colorbox{myGray}{oil light}, and \colorbox{myGray}{gas gauge}, and you could dip into the engine oil with a \colorbox{myGray}{dipstick}. This approach relates to other reasons why the car won't start, like \colorbox{myBlue}{no oil}, \colorbox{myBlue}{no gas}, \colorbox{myBlue}{fuel line blocked}, or \colorbox{myBlue}{broken starter}. \textit{They are all connected!} A flat battery will cause the lights not to work, and it also has an effect on the oil light and gas gauge. The dipstick and the oil light indicate whether or not there is oil in the engine. And of course, the car won't start if there's no gas, which would be indicated by the gas gauge.\\
\vspace{2em}
\includegraphics[width=\textwidth]{bayes_network2}
\par
A Bayes Network is a graph composed of nodes. Nodes correspond to known or unknown events, typically called \textbf{Random Variables}. These nodes are connected by arcs, and each arc indicates that the child node is affected by its parent, either in a deterministic way or in a probabilistic way. Therefore, the Bayes Network graph structure defines a probability distribution over the space of all the given variables (nodes).
\subsection*{Probabilities}
\textit{Too lazy to write}\\
\vspace{2em}
\includegraphics[width=\textwidth]{probs1}
\vspace{2em}
\includegraphics[width=\textwidth]{probs2}
\vspace{2em}
\includegraphics[width=\textwidth]{probs3}
\vspace{2em}
\includegraphics[width=\textwidth]{probs4}
\vspace{2em}
\includegraphics[width=\textwidth]{probs5}
\vspace{2em}
\textit{I have to write something now!}\par
Suppose two variables \(x\) and \(y\) each take one of two values, \(H\) or \(T\) (as in \(x(H), x(T), y(H), y(T)\)). Suppose we are given the probability of the first value of \(x\), e.g. \(p(x(H)) = 0.5\), and the conditional probabilities linking the variables, e.g. \(p(y(T) \mid x(H)) = 0.9\) \textit{(read: the probability of \(y\) being \(T\) given that \(x\) is \(H\) is 0.9)}. Then, by total probability, the probability of the second value of \(x\) is the \colorbox{myRed}{probability of \(x(T)\) given \(y(H)\)} times the \colorbox{myBlue}{probability of \(y(H)\)}, plus the \colorbox{myRed}{probability of \(x(T)\) given \(y(T)\)} times the \colorbox{myBlue}{probability of \(y(T)\)}.
\begin{equation}
p(x(T)) = \colorbox{myRed}{p(x(T) $\mid$ y(H))} \times \colorbox{myBlue}{p(y(H))} + \colorbox{myRed}{p(x(T) $\mid$ y(T))} \times \colorbox{myBlue}{p(y(T))}
\end{equation}
\textit{I'm so confused!}\\
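To make the equation above concrete, here is the same computation with made-up numbers (illustrative only): suppose \(P(y(H)) = 0.4\), \(P(x(T) \mid y(H)) = 0.1\), and \(P(x(T) \mid y(T)) = 0.9\). Then
\begin{align*}
P(x(T)) &= 0.1 \times 0.4 + 0.9 \times (1 - 0.4)\\
&= 0.04 + 0.54 = 0.58
\end{align*}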
\subsection*{Bayes Rule}
\begin{equation}
P(A \mid B) = \frac{P(B \mid A) \times P(A)}{P(B)}
\end{equation}
\colorbox{myGray}{P(A $\mid$ B)} is Posterior: the probability of A given B, where A is the cause (variable we care about) and B is the evidence\\
\colorbox{myGray}{P(B $\mid$ A)} is the Likelihood: the probability of B given A\\
\colorbox{myGray}{P(A)} is Prior: probability of A\\
\colorbox{myGray}{P(B)} is Marginal Likelihood: probability of B, also the Total Probability\\
\vspace{2em}
Total Probability \colorbox{myGray}{P(B)} = $\sum_{a}^{}P(B \mid A = a) \times P(A = a)$\\
\vspace{2em}
Graphically,\\
\includegraphics[width=\textwidth]{bayesgraph}\\
\vspace{4em}
\textit{You don't get it? \href{https://classroom.udacity.com/courses/cs271/lessons/48624746/concepts/487193570923}{Go here}}\\
\textit{See that P(B) in the denominator of Bayes' Rule? Yeah! Let's get rid of it cos why not?}
\par
We know that P(A $\mid$ B) + P($\neg$A $\mid$ B) = 1 and that P(B) serves as the \enquote*{normalizer} in Bayes' Rule. So, using pseudo probabilities, (P'), that are non-normalized, \colorbox{myGray}{P'(A $\mid$ B) = P(B $\mid$ A) $\times$ P(A)} and P'($\neg$A $\mid$ B) = P(B $\mid$ $\neg$A) $\times$ P($\neg$A), Bayes' Rule becomes the product of the pseudo probability and the normalizer, $\eta$ ( pronounced \enquote*{eta} ) = \colorbox{myGray}{( P'(A $\mid$ B) + P'($\neg$ A $\mid$ B) )\textsuperscript{-1}}:
\begin{equation}
\colorbox{myGray}{P(A $\mid$ B) = $\eta$ P'(A $\mid$ B)}
\end{equation}
\vspace{2em}
Let's use this new equation in an example:\\
\vspace{2em}
\includegraphics[width=\textwidth]{probs6}\\
Given the information in the graphic above, find P(C $\mid$ T\textsubscript{1} = +, T\textsubscript{2} = $-$).
\begin{center}
\begin{tabular}{ |c |c |c |c |c|}
 & P($\cdot$) & P($+ \mid \cdot$) & P($- \mid \cdot$) & P'($\cdot \mid +, -$)\\
C & 0.01 & 0.9 & 0.1 & 0.0009 \\
$\neg$C & 0.99 & 0.2 & 0.8 & 0.1584\\
 & & & & $\eta$ = 0.1593\textsuperscript{-1}
\end{tabular}
\end{center}
\(P(C \mid +, -) = \eta\;P'(C \mid +, -)\)\\\\
\textit{Using the conditional independence of the two tests given C:}\\\\
\(P'(C \mid +, -) = P(+ \mid C) \times P(- \mid C) \times P(C)\)\\\\
\(P'(C \mid +, -) = 0.9 \times 0.1 \times 0.01 = 0.0009\)\\\\
\(P'(\neg C \mid +, -) = P(+ \mid \neg C) \times P(- \mid \neg C) \times P(\neg C) = 0.2 \times 0.8 \times 0.99 = 0.1584\)\\\\
\(\eta = ( P'(C \mid +, -) + P'(\neg C \mid +, -) )^{-1}\)\\\\
\(\eta = (0.0009 + 0.1584)^{-1} = 6.2775\)\\\\
\vspace{2em}
Then \(P(C \mid +, -) = 6.2775 \times 0.0009 = 0.0056\)
\par
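The normalization trick is easy to check numerically. The sketch below reuses the numbers from the table above and assumes, as in the hand computation, that the two tests are conditionally independent given C:
\begin{lstlisting}[language=Python]
# Values from the table above: P(C)=0.01, P(+|C)=0.9, P(+|~C)=0.2.
p_c = 0.01
p_pos_given_c, p_pos_given_not_c = 0.9, 0.2

# Unnormalized (pseudo) posteriors for the evidence T1 = +, T2 = -,
# assuming the two tests are conditionally independent given C.
p_prime_c     = p_pos_given_c * (1 - p_pos_given_c) * p_c                # 0.9*0.1*0.01
p_prime_not_c = p_pos_given_not_c * (1 - p_pos_given_not_c) * (1 - p_c)  # 0.2*0.8*0.99

eta = 1 / (p_prime_c + p_prime_not_c)   # the normalizer
print(eta * p_prime_c)                  # ~0.0056, matching the result above
\end{lstlisting}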
\subsection*{Conditional Independence}
In Bayes' network, given three causes \colorbox{myBlue}{A}, \colorbox{myBlue}{B} and \colorbox{myBlue}{C},\\
\vspace{2em}
\includegraphics[width=\textwidth]{probs7}\\
if A is a known variable i.e. the value of A is certain, then B is \textbf{conditionally independent} of C, (B $\perp$ C $\mid$ A). For example, the probability of C given A and B, provided that the value of A is known, is equivalent to the probability of C given A:\\
\begin{equation}
\colorbox{myGray}{\(P(C \mid A,B) = P(C \mid A)\)}
\end{equation}
\subsection*{Explaining Away}
\vspace{2em}
\includegraphics[width=\textwidth]{probs9}
Given an Effect that could be caused by two or more Causes, if one of the Causes is known to have led to the Effect, then it is less likely that the other Causes led to the Effect. For example, say a \colorbox{myBlue}{sunny day} or a \colorbox{myBlue}{raise} makes Sebastian \colorbox{myRed}{happy}; if Sebastian happens to be \colorbox{myRed}{happy} on a \colorbox{myBlue}{sunny day}, then it is less likely that a \colorbox{myBlue}{raise} is the reason for Sebastian's happiness.\\
\vspace{2em}
Let's do an example,\\
\vspace{2em}
\includegraphics[width=\textwidth]{probs8}
Given the above information, what is the probability of a raise given that Sebastian is happy and it's sunny, \(P(R\mid H, S)\)?\\
\textit{Using Bayes' Rule\\}
\begin{equation}
P(R\mid H, S) = \frac{P(H\mid R, S) \times P(R\mid S)}{P(H\mid S)}
\end{equation}
\textit{Using Conditional Independence\\}
\begin{equation}
P(R\mid H, S) = \frac{P(H\mid R, S) \times P(R)}{P(H\mid S)}
\end{equation}
\textit{Using Total Probability\\}
\begin{equation}
P(R\mid H, S) = \frac{P(H\mid R, S) \times P(R)}{P(H\mid R, S) \times P(R) + P(H\mid \neg R, S) \times P(\neg R)}
\end{equation}
\begin{equation}
P(R\mid H, S) = \frac{1 \times 0.01}{1 \times 0.01 + 0.7 \times 0.99} = 0.0142
\end{equation}
Now, let's do the opposite. Find the probability of a raise given that Sebastian is happy \(P(R \mid H)\).\\
First, we need to compute \(P(H)\) by summing over both \(S\) and \(R\) (which are independent of each other, so \(P(S, R) = P(S) \times P(R)\)):
\begin{align*}
P(H) & = P(H \mid S, R) \times P(S) \times P(R)\\
&+ P(H \mid \neg S, R) \times P(\neg S) \times P(R)\\
&+ P(H \mid S, \neg R) \times P(S) \times P(\neg R)\\
&+ P(H \mid \neg S, \neg R) \times P(\neg S) \times P(\neg R)\\
&= 0.5245
\end{align*}
\textit{Using Bayes' Rule}
\begin{equation}
P(R \mid H) = \frac{P(H \mid R) \times P(R)}{P(H)}
\end{equation}
\textit{Using Total Probability}
\begin{align*}
P(H \mid R) &= P(H \mid R,S) \times P(S) + P(H \mid R, \neg S) \times P(\neg S)\\
&= 0.97
\end{align*}
\(P(R \mid H) = \frac{0.97 \times 0.01}{0.5245} = 0.0185\)\\\\
Let's observe the situation where Sebastian is happy and it's not sunny. What's the probability of a raise \(P(R \mid H, \neg S)\)?\\
\begin{align*}
P(R \mid H, \neg S) &= \frac{P(H \mid R, \neg S) \times P(R \mid \neg S)}{P(H \mid \neg S)}\\
&= \frac{P(H \mid R, \neg S) \times P(R)}{P(H \mid R, \neg S) \times P(R) + P(H \mid \neg R, \neg S) \times P(\neg R)}\\
&= \frac{0.9 \times 0.01}{0.9 \times 0.01 + 0.1 \times 0.99}\\
&= 0.0833
\end{align*}
Bringing all three situations together:\\
\begin{align*}
P(R \mid H, S) &= 0.0142\\
P(R \mid H) &= 0.0185\\
P(R \mid H, \neg S) &= 0.0833\\
\end{align*}
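All three numbers can be reproduced by brute-force enumeration. The sketch below uses the values implied by the worked computations above (\(P(S) = 0.7\), \(P(R) = 0.01\), and the four \(P(H \mid S, R)\) entries), which I assume match the graphic:
\begin{lstlisting}[language=Python]
from itertools import product

# Priors and conditional table implied by the computations above.
P_S, P_R = 0.7, 0.01
P_H = {(True, True): 1.0, (False, True): 0.9,    # P(H | S, R)
       (True, False): 0.7, (False, False): 0.1}

def joint(s, r, h):
    p = (P_S if s else 1 - P_S) * (P_R if r else 1 - P_R)
    return p * (P_H[(s, r)] if h else 1 - P_H[(s, r)])

def prob_raise_given(evidence):
    # evidence maps 's'/'h' to booleans; R is the query, the rest is summed out
    num = den = 0.0
    for s, r, h in product([True, False], repeat=3):
        if all({'s': s, 'h': h}[k] == v for k, v in evidence.items()):
            p = joint(s, r, h)
            den += p
            if r:
                num += p
    return num / den

print(prob_raise_given({'h': True, 's': True}))    # ~0.0142
print(prob_raise_given({'h': True}))               # ~0.0185
print(prob_raise_given({'h': True, 's': False}))   # ~0.0833
\end{lstlisting}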
\subsection*{Conditional Dependence}
\vspace{2em}
The previous section introduced a situation where two Causes lead to one Effect.\\
\vspace{2em}
\includegraphics[width=\textwidth]{probs9}
Given that \(H\) is unknown, \(S\) and \(R\) are independent. They do not affect each other, which is why \(P(R \mid S)\) is the same as \(P(R)\). However, if \(H\) is known, then \(S\) and \(R\) are no longer independent; they become \textbf{conditionally dependent}.\\
Note the difference between \textbf{independence}, \textbf{conditional independence}, and \textbf{conditional dependence}.\par
\subsection*{General Bayes Net}
\includegraphics[width=\textwidth]{probs10}
With a Bayes network, the probability of all nodes in the network can be computed from far fewer numbers than specifying the full joint distribution directly would require. The joint probability of all nodes, \(P(A, B, C, D, E)\), is as follows:
\begin{equation}
P(A, B, C, D, E) = P(A) \times P(B) \times P(C \mid A,B) \times P(D \mid C) \times P(E \mid C)
\end{equation}
The number of probability values needed for each node in the network is determined by the number of \enquote{incoming arcs} the node has. In the graphic above, \(A\) and \(B\) have zero incoming arcs, \(C\) has two incoming arcs, and \(D\) and \(E\) have one incoming arc each.\\
\begin{align*}
\text{number of values for } P(X) &= 2^{k}, \text{ where } k \text{ is the number of incoming arcs}\\
\text{for } A:\; 2^{0} &= 1 : P(A)\\
\text{for } C:\; 2^{2} &= 4 : P(C \mid A, B),\; P(C \mid A, \neg B),\; P(C \mid \neg A, B),\; P(C \mid \neg A, \neg B)
\end{align*}
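As a quick check of the factorization above, the sketch below evaluates \(P(A, B, C, D, E)\) from conditional probability tables. The numeric values are placeholders I made up; only the structure of the factors matches the network in the graphic:
\begin{lstlisting}[language=Python]
# Placeholder CPTs; only the factorization structure matches the network above.
P_A, P_B = 0.3, 0.6
P_C = {(True, True): 0.9, (True, False): 0.5,   # P(C | A, B): 2^2 = 4 entries
       (False, True): 0.4, (False, False): 0.1}
P_D = {True: 0.8, False: 0.2}                   # P(D | C): 2^1 = 2 entries
P_E = {True: 0.7, False: 0.3}                   # P(E | C): 2^1 = 2 entries

def joint(a, b, c, d, e):
    p  = P_A if a else 1 - P_A
    p *= P_B if b else 1 - P_B
    p *= P_C[(a, b)] if c else 1 - P_C[(a, b)]
    p *= P_D[c] if d else 1 - P_D[c]
    p *= P_E[c] if e else 1 - P_E[c]
    return p

print(joint(True, True, True, True, True))   # P(+a,+b,+c,+d,+e) = 0.3*0.6*0.9*0.8*0.7
\end{lstlisting}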
\subsection*{D Separation a.k.a Reachability}
D Separation can be understood through the concepts of \textbf{Active Triplets} and \textbf{Inactive Triplets}, where both concepts apply rules to a sequence of three variables in a network. Active Triplets render variables dependent; Inactive Triplets, on the other hand, render variables independent.\\\\
\textit{A = first node, B = middle node, C = last node, D = extended node}
\begin{center}
\begin{tabular}{|c|c|c|c|}
\hline
Active Triplets
&
&
Inactive Triplets
& \\ \hline
%row 2
\raisebox{-\totalheight}{\includegraphics[width=0.25\textwidth]{probs11}}
&
\parbox[c]{0.25\textwidth}{A and C are dependent when \textbf{B is unknown}.}
& \raisebox{-\totalheight}{\includegraphics[width=0.25\textwidth]{probs12}}
& \parbox[c]{0.25\textwidth}{A and C are independent when \textbf{B is known}.}\\[5em] \hline
%row 3
\raisebox{-\totalheight}{\includegraphics[width=0.25\textwidth]{probs13}}
&
\parbox[c]{0.25\textwidth}{A and C are dependent when \textbf{B is unknown}.}
& \raisebox{-\totalheight}{\includegraphics[width=0.25\textwidth]{probs14}}
& \parbox[c]{0.25\textwidth}{A and C are independent when \textbf{B is known}.}\\[5em] \hline
%row 4
\raisebox{-\totalheight}{\includegraphics[width=0.25\textwidth]{probs15}}
&
\parbox[c]{0.25\textwidth}{A and C are dependent when \textbf{B is known}.}
& \raisebox{-\totalheight}{\includegraphics[width=0.25\textwidth]{probs16}}
& \parbox[c]{0.25\textwidth}{A and C are independent when \textbf{B is unknown}.}\\[5em] \hline
%row 5
\raisebox{-\totalheight}{\includegraphics[width=0.25\textwidth]{probs17}}
&
\parbox[c]{0.25\textwidth}{A, B and C are dependent when \textbf{D is known}.}
& &\\
\hline
\end{tabular}
\end{center}
\section*{Probabilistic Inference}
\vspace{2em}
This section focuses on how to apply Bayes' rule to answering probability questions.\\
\vspace{2em}
\includegraphics[width=\textwidth]{inference1}
Given the above graphic, one possible probabilistic question is: \enquote{what are the possible outputs given some inputs?} In this case, we can say \enquote{given B and E, what are the outputs, J and M?}\\
\par
In probabilistic inference, an \enquote*{input} is referred to as \textbf{evidence} and an \enquote*{output} is referred to as a \textbf{query}. An Evidence variable is one whose value we know, and a Query variable is one whose value we're trying to find. Another term in probabilistic inference is \textbf{hidden}: a Hidden variable is one that we neither observe nor compute a value for. A Hidden variable is neither Evidence nor a Query. A Query variable \textbf{is not} a single number but rather a probability distribution. \\
\par
The \enquote*{output} of a probabilistic inference is a joint distribution over all the Query variables: the probability of one or more Query variables given one or more Evidence variables, where each Evidence variable has a single value, \(P(Q_1, Q_2, \ldots \mid E_1 = e_1, E_2 = e_2, \ldots)\). Another helpful formulation is: out of the possible values for all the Query variables, which combination of values has the highest probability? \(\operatorname{argmax}_q\;P(Q_1 = q_1, Q_2 = q_2, \ldots \mid E_1 = e_1, E_2 = e_2, \ldots)\)\\
\subsection*{Enumeration}
\par
Enumeration is a procedure that goes through all possible assignments of the variables in a network, sums up the corresponding probabilities, and comes up with a single answer.\\
Let's explore Enumeration with the following problem: what's the probability that a burglary (B) occurred given that both John (J) and Mary (M) called \textit{the police, I guess}, \(P(+b\mid +j, +m)\)?\\\\
\textit{Using Conditional Probability}
\begin{align*}
P(+b\mid +j, +m) &= \frac{\textit{joint probability}}{\textit{probability of the evidence}}\\
&= \frac{P(+b, +j, +m)}{P(+j, +m)}\\
\textit{Note:}\;P(E = true) &\equiv P(+e)\\
&\equiv 1 - P(\neg e)
\end{align*}
You will notice that, with Enumeration, we converted the conditional probability \(P(+b\mid +j, +m)\) into unconditional (joint) probabilities. Now we expand the unconditional probabilities, starting with the numerator. The expansion is done by enumerating all the atomic probabilities and calculating the sum of products, i.e. enumerating all possible values of the Hidden variables and, for each combination, taking the product of the probabilities of the atomic variables.
\begin{align*}
P(+b, +j, +m) &= \sum_{e} \sum_{a} P(+b, +m, +j, e, a)\\
\end{align*}
\(P(+b, +m, +j, e, a)\) can be calculated based on the parents of each node.\\\\
\(P(+b, +m, +j, e, a) = P(+b) \times P(+m \mid a) \times P(+j \mid a) \times P(e) \times P(a \mid +b, e)\)\\\\
Since \(e\) and \(a\) each have two possible values, true or false (i.e. there's either an earthquake or there isn't, and the alarm either went off or it didn't), the sum has four terms.
Let \(P(+b, +m, +j, e, a)\) be \(f(e, a)\)\\
\begin{align*}
P(+b, +j, +m) &= \textit{summation of all possible values of \(e\) and \(a\)}\\
&= f(+e, +a) + f(+e, \neg a) + f(\neg e, +a) + f(\neg e, \neg a)
\end{align*}
The values for each probability are provided below.\\
\includegraphics[width=\textwidth]{inference2}
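Brute-force enumeration of this sum is only a few lines of code. The CPT numbers below are placeholders (the actual values are the ones in the graphic above), but the structure of \(f(e, a)\) and the four-term sum is exactly as described:
\begin{lstlisting}[language=Python]
from itertools import product

# Placeholder CPTs for the burglary / earthquake / alarm network;
# substitute the values from the graphic above for the real answer.
P_B, P_E = 0.001, 0.002
P_A = {(True, True): 0.95, (True, False): 0.94,   # P(+a | b, e)
       (False, True): 0.29, (False, False): 0.001}
P_J = {True: 0.9, False: 0.05}                    # P(+j | a)
P_M = {True: 0.7, False: 0.01}                    # P(+m | a)

def f(e, a):
    # f(e, a) = P(+b, +m, +j, e, a), with +b fixed
    return (P_B * P_M[a] * P_J[a]
            * (P_E if e else 1 - P_E)
            * (P_A[(True, e)] if a else 1 - P_A[(True, e)]))

def g(b, e, a):
    # same joint, but with b also free; used for the denominator P(+j, +m)
    return ((P_B if b else 1 - P_B) * P_M[a] * P_J[a]
            * (P_E if e else 1 - P_E)
            * (P_A[(b, e)] if a else 1 - P_A[(b, e)]))

numerator   = sum(f(e, a) for e, a in product([True, False], repeat=2))        # P(+b,+j,+m)
denominator = sum(g(b, e, a) for b, e, a in product([True, False], repeat=3))  # P(+j,+m)
print(numerator / denominator)   # P(+b | +j, +m)
\end{lstlisting}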
\subsection*{Speeding up Enumeration}
For our example, there are only 5 variables. Even if all 5 variables were Hidden, there would be only \(2^5 = 32\) rows (\(f(b, e, a, j, m)\)) to sum up. In the practical world, however, networks are far more complex and could lead to an enormous number of rows to sum up. Therefore, we need methods that are faster than simple Enumeration.
\subsection*{Pulling out terms}
The first technique we can use to speed up inference in a Bayes network is to pull terms out of the enumeration.\\\\
Recall, \(\sum_{e}\sum_{a}P(+b) \times P(+m \mid a) \times P(+j \mid a) \times P(e) \times P(a \mid +b, e)\),\\\\
we can pull out \(P(+b)\) since its value stays the same for all rows.\\\\
\(\Rightarrow P(+b) \sum_{e}\sum_{a} P(+m \mid a) \times P(+j \mid a) \times P(e) \times P(a \mid +b, e)\)\\\\
we can also pull \(P(e)\) out of the inner summation, since its value does not depend on \(a\)\\\\
\(\Rightarrow P(+b) \sum_{e} P(e) \sum_{a} P(+m \mid a) \times P(+j \mid a) \times P(a \mid +b, e)\)\\
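In code, pulling out terms simply means hoisting factors out of the loops they don't depend on. A tiny sketch, reusing the placeholder CPTs from the enumeration sketch above:
\begin{lstlisting}[language=Python]
# Naive: every factor is recomputed inside the double sum over e and a.
naive = sum(P_B * P_M[a] * P_J[a]
            * (P_E if e else 1 - P_E)
            * (P_A[(True, e)] if a else 1 - P_A[(True, e)])
            for e in (True, False) for a in (True, False))

# Factored: P(+b) pulled out entirely, P(e) pulled out of the inner sum over a.
factored = P_B * sum((P_E if e else 1 - P_E)
                     * sum(P_M[a] * P_J[a]
                           * (P_A[(True, e)] if a else 1 - P_A[(True, e)])
                           for a in (True, False))
                     for e in (True, False))

assert abs(naive - factored) < 1e-12   # same value, fewer multiplications
\end{lstlisting}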
\subsection*{Maximize Independence}
Maximizing independence depends on the structure of the Bayes network. For a linearly structured (chain) Bayes network, inference takes \(O(n)\) if all \(n\) variables are boolean.\\\\
\vspace{2em}
\includegraphics[width=\textwidth]{inference3}
For a structure where every node points to every other node, it could take \(O(2^{n})\) if all \(n\) variables are boolean.\\\\
\vspace{2em}
\includegraphics[width=\textwidth]{inference4}
\subsection*{Causal direction}
Bayes networks tend to be more compact, and thus easier to perform inference on, when the network flows from causes to effects. Our example can be drawn in the causal direction as shown in the graphic below.\\
\includegraphics[width=\textwidth]{inference5}
\end{document}