started mkdocs
maniospas committed Sep 6, 2024
1 parent ae2f285 commit db0848a
Showing 16 changed files with 817 additions and 14 deletions.
13 changes: 13 additions & 0 deletions .readthedocs.yml
@@ -0,0 +1,13 @@
version: 2

build:
  os: ubuntu-22.04
  tools:
    python: "3.9"

python:
  install:
    - requirements: docs/requirements.txt

mkdocs:
  configuration: mkdocs.yml
21 changes: 14 additions & 7 deletions JGNN/src/examples/classification/LogisticRegression.java
@@ -7,13 +7,16 @@
import mklab.JGNN.adhoc.train.SampleClassification;
import mklab.JGNN.core.Matrix;
import mklab.JGNN.nn.Model;
import mklab.JGNN.nn.initializers.XavierNormal;
import mklab.JGNN.nn.loss.Accuracy;
import mklab.JGNN.nn.loss.BinaryCrossEntropy;
import mklab.JGNN.nn.loss.report.VerboseLoss;
import mklab.JGNN.core.Slice;
import mklab.JGNN.core.Tensor;
import mklab.JGNN.core.matrix.DenseMatrix;
import mklab.JGNN.core.tensor.DenseTensor;
import mklab.JGNN.nn.optimizers.GradientDescent;
import mklab.JGNN.nn.optimizers.Adam;

/**
* Demonstrates classification with logistic regression.
@@ -45,18 +48,22 @@ public static void main(String[] args) {


long tic = System.currentTimeMillis();
Model model = new SampleClassification()
ModelTraining trainer = new SampleClassification()
    .setFeatures(dataset.features())
    .setOutputs(dataset.labels())
    .setTrainingSamples(nodeIds.range(0, 0.6))
    .setValidationSamples(nodeIds.range(0.6, 0.8))
    .setOptimizer(new GradientDescent(0.01))
    .setEpochs(600)
    .setOptimizer(new Adam(0.01))
    .setNumBatches(10)
    .setParallelizedStochasticGradientDescent(true)
    .setLoss(new BinaryCrossEntropy())
    .setValidationLoss(new Accuracy())
    .setVerbose(true)
    .train(modelBuilder.getModel(),
        dataset.features(),
        dataset.labels(),
        nodeIds.range(0, 0.6), nodeIds.range(0.6, 0.8));
    .setValidationLoss(new VerboseLoss(new Accuracy()));

Model model = modelBuilder.getModel()
    .init(new XavierNormal())
    .train(trainer);
long toc = System.currentTimeMillis();

double acc = 0;
15 changes: 8 additions & 7 deletions docs/index.html
@@ -451,6 +451,14 @@ <h3 id="modelbuilder">3.1. ModelBuilder</h3>
to declare several inputs and outputs. Inputs need to be only one symbol, but a whole expression
for evaluation can be declared in outputs.
</p>

<pre><code class="language-java">ModelBuilder modelBuilder = new ModelBuilder()
    .var("x")
    .operation("y = log(2*x+1)")
    .out("y");
System.out.println(modelBuilder.getModel().predict(Tensor.fromDouble(2)));
</code></pre>

<p>
The operation parses string expressions that are typically structured
as assignments to symbols; the right-hand side of assignments accepts several operators and functions that
@@ -468,13 +476,6 @@ <h3 id="modelbuilder">3.1. ModelBuilder</h3>
<a href="#neuralang">section 3.3</a>.
</p>

<pre><code class="language-java">ModelBuilder modelBuilder = new ModelBuilder()
.var("x")
.operation("y = log(2*x+1)")
.out("y");
System.out.println(model.predict(Tensor.fromDouble(2)));
</code></pre>

<p>Model definitions have so far been too simple to be employed in practice;
we need trainable parameters, which are created inline with the <code>matrix</code>
and <code>vector</code> functions. There is also an equivalent Java
6 changes: 6 additions & 0 deletions docs/index.md
@@ -0,0 +1,6 @@
# JGNN

Graph Neural Networks (GNNs) are an increasingly popular machine learning paradigm, for example for making predictions based on relational information or for performing inference on small datasets. JGNN is a library that provides cross-platform implementations of this paradigm without the need for dedicated hardware or firmware; create highly portable models that fit and are trained in a few megabytes of memory. Find GNN builders, training strategies, and datasets for out-of-the-box experimentation.

While reading this guide, keep in mind that JGNN is not a library for computationally intensive workloads; it has no GPU support and we do not plan to add any (unless such support becomes integrated in the Java virtual machine). So, while the source code is highly optimized and complex architectures are supported, running them on graphs with many nodes may require compromises in the number of learned parameters or in running time.

95 changes: 95 additions & 0 deletions docs/quickstart.md
@@ -0,0 +1,95 @@
# Quickstart

Here we demonstrate usage of JGNN for node classification. This is a transductive learning task that predicts node labels given a graph's structure, node features, and some already known labels. Classifying whole graphs is also supported, though it is harder to explain and set up.

GNN architectures for node classification are typically written as message-passing mechanisms; they diffuse node representations across edges, where node neighbors pick up, aggregate (e.g., average), and transform incoming representations to update their own. Alternatives that boast higher expressive power also exist and are supported, but simple architectures may be just as good as or better than complex ones at solving practical problems ([Krasanakis et al., 2024](https://www.mdpi.com/2076-3417/14/11/4533)). Simpler architectures also enjoy reduced resource consumption.

## Node classification GNN

Our demonstration starts by loading the `Cora` dataset, one of those shipped with the library for out-of-the-box experimentation. The first time an instance of this dataset is created, it downloads its raw data from a web resource and stores them in a local `downloads/` folder. The data are then loaded into a sparse graph adjacency matrix, a dense node feature matrix, and a dense node label matrix.

Sparse and dense representations are interchangeable in terms of operations, with the main difference being that sparse matrices are much more efficient when they contain many zeros. JGNN automatically determines the types of intermediate representations, so focus only on choosing input and desired output data formats. In the loaded matrices, each row contains the corresponding node's neighbors, features, or one-hot encoding of labels. We apply the renormalization trick and symmetric normalization on the dataset's adjacency matrix using in-place operations for a minimal memory footprint. The first of the two makes GNN computations numerically stable by adding self-loops to all nodes, while symmetric normalization is required by spectral-based GNNs, such as the model we implement next.

```java
Dataset dataset = new Cora();
dataset.graph().setMainDiagonal(1).setToSymmetricNormalization();
```

We incrementally create a trainable model using symbolic expressions that resemble math notation. The expressions are part of a scripting language, called Neuralang, that is covered in the namesake [language tutorial](tutorial/neuralang.md). For faster onboarding, stick to the `FastBuilder`, which omits some of the language's features in favor of providing programmatic shortcuts for boilerplate code. Its constructor accepts two arguments `A` and `h0`, respectively holding the graph's adjacency matrix and node features. These arguments are set as constant symbols that parsed expressions can use. Other constants and input variables can be set afterwards, but more on this later. After instantiation, use some builder methods to declare a model's data flow. Some of these methods parse the aforementioned expressions.

- **`config`** - Configures hyperparameter values. These can be used in all subsequent function and layer declarations.
- **`function`** - Declares a Neuralang function, in this case with inputs `A` and `h`.
- **`layer`** - Declares a layer that can use built-in and Neuralang functions. Within layer declarations, the symbols `{l}` and `{l+1}` are replaced by a layer counter.
- **`classify`** - Adds a softmax layer tailored to classification. This also silently declares an input `nodes` that represents the list of node indices for which outputs should be computed.
- **`autosize`** - Automatically sizes matrix and vector dimensions that were originally denoted with a question mark `?`. This method requires an example input; here we provide a dataless list of node identifiers (it has the correct dimensions without allocating memory). The method also checks for integrity errors in the declared architecture, such as computational paths that do not lead to an output.

JGNN promotes method chains, where each builder method returns the builder's instance so that the next method can be called on it. Below, we use this programming pattern to implement the Graph Convolutional Network (GCN) architecture ([Kipf and Welling, 2017](https://arxiv.org/abs/1609.02907)). Details on the symbolic parts of definitions are presented later but, for the time being, we point to the `matrix` and `vector` functions. These declare trainable parameters inline, with given dimensions and regularization. Access the created model via `modelBuilder.getModel()`.

```java
long numSamples = dataset.samples().getSlice().size();
long numClasses = dataset.labels().getCols();
ModelBuilder modelBuilder = new FastBuilder(dataset.graph(), dataset.features())
    .config("reg", 0.005)
    .config("classes", numClasses)
    .config("hidden", 64)
    .function("gcnlayer", "(A,h){Adrop = dropout(A, 0.5); return Adrop@(h@matrix(?, hidden, reg))+vector(?);}")
    .layer("h{l+1}=relu(gcnlayer(A, h{l}))")
    .config("hidden", "classes") // reassigns the output gcnlayer's "hidden" to be equal to the number of "classes"
    .layer("h{l+1}=gcnlayer(A, h{l})")
    .classify()
    .autosize(new EmptyTensor(numSamples));
```


## Training the model

Training epochs for the created model can be implemented manually, by passing inputs, obtaining outputs, computing losses, and triggering backpropagation on an optimizer. These steps could require lengthy Java code, especially if features like batching or threading parallelization are employed. So, JGNN automates common training patterns by extending a base `ModelTraining` class with training strategies tailored to different data formats and predictive tasks. You can find these subclasses in the [adhoc.train](https://mklab-iti.github.io/JGNN/javadoc/mklab/JGNN/adhoc/train/package-summary.html) package's Javadoc.

Instances of model trainers use a method chain notation to set their parameters. Parameters typically include training and validation data (which should be set first and depend on the model training class) and aspects of the training strategy, such as the number of epochs, patience for early stopping, the optimizer, and loss functions. A full example is given at the end of this section.

For training, the graph adjacency matrix and node features are already declared as constants by the `FastBuilder` constructor, since node classification takes place on the same graph with fully known node features. Therefore, the trainer's input features are just a column of node identifiers, which the `classify` method uses to gather predictions for the respective nodes. Architecture outputs are softmax approximations of the one-hot encodings of the respective node labels.

The simplest way to handle missing labels for test data without modifying the example is to leave their one-hot encodings as all zeros. Additionally, this particular training strategy accepts training and validation data slices, where slices are lists of integer entries pointing to rows of the inputs and outputs.

To complete the training setup, the example uses the `Adam` optimization algorithm with a learning rate of _0.01_ and trains over multiple epochs with early stopping. A verbose loss function prints cross-entropy and accuracy on the validation data every 10 epochs, with cross-entropy serving as the early stopping criterion. To run a full training process, pass the strategy to a model.

In a cold start scenario, apply a parameter initializer before training begins. A warm start that resumes training from previously trained parameter values would skip this step. Selecting an initializer is not part of the training strategy, to emphasize its model-dependent nature; dense layers should maintain the expected input variances in their outputs before the first epoch, and therefore the initializer depends on the type of activation functions used.

```java
Slice nodes = dataset.samples().getSlice().shuffle(); // A permutation of node identifiers
Matrix inputFeatures = Tensor.fromRange(nodes.size()).asColumn(); // Each node has its identifier as an input (equivalent to: nodes.samplesAsFeatures())
ModelTraining trainer = new SampleClassification()
    // Set training data
    .setFeatures(inputFeatures)
    .setLabels(dataset.labels())
    .setTrainingSamples(nodes.range(0, 0.6))
    .setValidationSamples(nodes.range(0.6, 0.8))
    // Set training strategy
    .setOptimizer(new Adam(0.01))
    .setEpochs(3000)
    .setPatience(100)
    .setLoss(new CategoricalCrossEntropy())
    .setValidationLoss(new VerboseLoss(new CategoricalCrossEntropy(), new Accuracy()).setInterval(10)); // Print every 10 epochs

Model model = modelBuilder.getModel()
    .init(new XavierNormal())
    .train(trainer);
```

## Saving and inference

Trained models and their generating builders can be saved and loaded. The next snippet demonstrates how to do this and how to obtain raw predictions. The matrix manipulation operations it uses provide transparent access to parts of the dataset's input and output data without copying anything.

```java
modelBuilder.save(Paths.get("gcn_cora.jgnn")); // Needs a Path as an input
Model loadedModel = ModelBuilder.load(Paths.get("gcn_cora.jgnn")).getModel(); // Loading creates a new model builder from which to get the model

Matrix output = loadedModel.predict(Tensor.fromRange(0, nodes.size()).asColumn()).get(0).cast(Matrix.class);
double acc = 0;
for (Long node : nodes.range(0.8, 1)) {
    Matrix nodeLabels = dataset.labels().accessRow(node).asRow();
    Tensor nodeOutput = output.accessRow(node).asRow();
    acc += nodeOutput.argmax() == nodeLabels.argmax() ? 1 : 0;
}
System.out.println("Acc\t " + acc / nodes.range(0.8, 1).size());
```
6 changes: 6 additions & 0 deletions docs/requirements.txt
@@ -0,0 +1,6 @@
mkdocs >= 1.2.2
mkautodoc
pymdown-extensions
mkdocs-material-extensions
mkdocs-material
mkdocstrings
23 changes: 23 additions & 0 deletions docs/setup.md
@@ -0,0 +1,23 @@
# Setup

The simplest way to set up JGNN is to download it as a JAR package from a [release](https://github.com/MKLab-ITI/JGNN/releases) and add it to your Java project's build path. Those working with Maven or Gradle can instead add JGNN's latest nightly release as a dependency from its JitPack distribution. Follow the link below and click on "get it" for a particular version to see full instructions. If you are the first person using that release, you might need to wait a little (less than a minute) until JitPack finishes packaging it for everybody.

[![download JGNN](https://jitpack.io/v/MKLab-ITI/JGNN.svg)](https://jitpack.io/#MKLab-ITI/JGNN)

For example, the fields in the snippet below may be added to a Maven `pom.xml` file to work with the latest nightly release. Replace `SNAPSHOT` with the release name found in the badge above.

```xml
<repositories>
    <repository>
        <id>jitpack.io</id>
        <url>https://jitpack.io</url>
    </repository>
</repositories>
<dependencies>
    <dependency>
        <groupId>com.github.MKLab-ITI</groupId>
        <artifactId>JGNN</artifactId>
        <version>SNAPSHOT</version>
    </dependency>
</dependencies>
```
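
Once the dependency resolves, a short sketch like the following can verify that JGNN classes are visible on the classpath. The class name is illustrative, and `Tensor.fromRange` is used here only because it needs no data files; any JGNN call would do.

```java
import mklab.JGNN.core.Tensor;

public class VerifySetup {
    public static void main(String[] args) {
        // Create and print a small tensor to confirm that the JGNN dependency resolves.
        Tensor range = Tensor.fromRange(0, 5);
        System.out.println(range);
    }
}
```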
96 changes: 96 additions & 0 deletions docs/theme_extend.css
@@ -0,0 +1,96 @@
.gemoji {
    height: 1em;
    width: 1em;
    vertical-align: -0.15em;
}

.doc {
    padding-left: 20px;
    margin-bottom: 30px;
    margin-top: -15px;
    border-left: 5px solid rgba(230, 230, 230);
}

.explain {
    margin-top: 10px;
}

.wy-side-nav-search {
    background-color: #d2d2d2; /* Replace with your desired color code */
}

.parameters {
    background-color: #D0D0D0;
    border: none;
    padding: 0 4px;
    text-align: center;
    text-decoration: none;
    display: inline-block;
    color: #007777;
    border-radius: 6px;
    margin-top: 6px;
}

.component {
    background-color: #CC5555;
    border: none;
    color: white;
    padding: 2px 10px;
    text-align: left;
    text-decoration: none;
    display: inline-block;
    width: 100%;
    font-size: 24px;
    margin: 0 2px 0 2px;
    border-radius: 2px;
}

.code-block {
    margin-top: -10px;
}

.card {
    width: 18rem;
    border: 1px solid #ddd;
    border-radius: 10px;
    box-shadow: 0 4px 8px rgba(0, 0, 0, 0.1);
}

.card-body {
    padding: 20px;
}

.card-title {
    font-weight: bold;
}

.card-text {
    color: #888;
    font-size: 14px;
}

.card-link {
    display: inline-block;
    margin-top: 10px;
    padding: 10px 15px;
    background-color: #007bff;
    color: #fff;
    text-decoration: none;
    border-radius: 5px;
}

.card-link:hover {
    background-color: #0056b3;
}

@media (max-width: 768px) {
    .card-container {
        flex-direction: column;
        align-items: center;
    }

    .card {
        width: 90%; /* Adjust width as needed */
        margin-bottom: 10px;
    }
}
21 changes: 21 additions & 0 deletions docs/topics/advanced.md
@@ -0,0 +1,21 @@
# Advanced practices

Several methods have been proposed as improvements to the basic message passing scheme. However, they tend to provide marginal accuracy improvements at the cost of increased computational complexity. For large graphs, it's best to avoid complex architectures since JGNN is designed to be lightweight and does not leverage GPU acceleration. Nevertheless, JGNN supports the following enhancements, which can be useful in scenarios where runtime is less critical (e.g., transfer learning, stream learning) or for analyzing smaller graphs:

- **Edge dropout**: Apply dropout to the adjacency matrix on each layer using `.layer("h{l+1}=dropout(A,0.5) @ h{l}")`. This operation disables certain caching optimizations under the hood.

- **Heterogeneity**: Some recent approaches account for high-pass frequency diffusion by including the graph Laplacian. This can be inserted into the architecture as a constant, for example: `.constant("L", adjacency.negative().cast(Matrix.class).setMainDiagonal(1))`.

- **Edge attention**: Computes new edge weights by taking dot products of the representations of each edge's end nodes, per the formula `A.(h h^T)`, where `A` is a sparse adjacency matrix, the dot `.` represents the Hadamard product (element-wise multiplication), and `h` is a dense matrix containing node representations. JGNN efficiently implements this operation with the Neuralang function `att(A, h)`. For example, to create weighted adjacency matrices for each layer in gated attention networks: `.operation("A{l} = L1(nexp(att(A, h{l})))")`. A fuller sketch is given after this list.

- **General message passing**: JGNN supports a fully generalized message-passing scheme for more complex relational analyses, such as those described by [Velickovic, 2022](https://arxiv.org/pdf/2202.11097.pdf). In this generalization, each edge transforms the representations of its end nodes and propagates the result to receiving nodes. You can create message matrices by gathering features from edge source and destination nodes. To obtain edge source indexes, use `src=from(A)`, and for destination indexes, use `dst=to(A)`, where `A` is the adjacency matrix. Then use the horizontal concatenation operation `|` to combine node features. After constructing messages, any ad hoc processing can be applied using traditional matrix operations. Make sure to define the correct matrix sizes for dense transformations; for example, concatenation doubles the number of columns of `h{l}`. For any `LayeredBuilder`, use the layered symbol `message{l}` so that the message obtained from `h{l}` is not shared with future layers. Receiver mechanisms usually perform some form of reduction on messages, which JGNN implements via summation. This reduction has the same expressive power as maximum-based reduction but is easier to backpropagate through. Perform this as follows:

```java
modelBuilder
    .operation("src=from(A)")
    .operation("dst=to(A)")
    .operation("message{l}=h{l}[src] | h{l}[dst]") // two times the number of h{l}'s features
    .operation("transformed_message{l}=...") // apply transformations
    .operation("received{l}=reduce(transformed_message{l}, A)");
```
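
The following is a rough sketch of the edge attention bullet above. It assumes an adjacency matrix `adjacency` and node feature matrix `features` loaded as in the quickstart, that `FastBuilder` exposes the same `operation` method used in the snippet above, and that the per-layer symbol `A{l}` declared by `operation` can be referenced by the subsequent `layer` declaration:

```java
ModelBuilder attentionBuilder = new FastBuilder(adjacency, features)
    .config("reg", 0.005)
    .config("hidden", 64)
    // Hypothetical attention-weighted layer; adjust to your architecture.
    .operation("A{l} = L1(nexp(att(A, h{l})))") // attention weights for the current layer
    .layer("h{l+1} = relu(A{l}@(h{l}@matrix(?, hidden, reg)) + vector(?))");
```

A classification head and `autosize` call would then follow the same pattern as in the quickstart.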
