diff --git a/docs/qcl.rst b/docs/qcl.rst
new file mode 100644
index 0000000..5b04af5
--- /dev/null
+++ b/docs/qcl.rst
@@ -0,0 +1,250 @@
+Quantum Circuit Learning (QCL)
+=====================================
+
+Overview
+--------
+
+Quantum Circuit Learning (QCL) [`1 <https://arxiv.org/abs/1803.00745>`_] is a hybrid quantum/classical algorithm that aims to perform supervised or unsupervised learning tasks. In a supervised task, the algorithm is supplied with data :math:`\{x_i\}` and corresponding labels :math:`\{y_i\}`. The algorithm then learns to output predictions :math:`\{f(x_i,\theta)\}` that are as close as possible to the labels of test data not used during training. In this hybrid algorithm a quantum subroutine calculates the output :math:`\{f(x_i,\theta)\}`, whereas the optimization of :math:`\theta` is executed in a classical optimization loop.
+
+The outline of the algorithm for an :math:`N`-qubit circuit is as follows:
+
+1. **Encode input data** :math:`\{x_i\}` into a quantum state :math:`|\Psi_{in}(x_i)\rangle` by applying a unitary transformation :math:`U(x_i)` to the qubits initialized in :math:`|0\rangle`.
+
+2. Apply a :math:`\theta`-parametrized unitary to the input state :math:`|\Psi_{in}(x_i)\rangle` and **generate the output state** :math:`U(\theta)|\Psi_{in}(x_i)\rangle=|\Psi_{out}(x_i,\theta)\rangle`.
+
+3. Measure the **expectation values** of a subset of Pauli operators :math:`\{B\} \subset \{I,X,Y,Z\}^N`.
+
+4. If necessary, transform the expectation values by some function :math:`F` and calculate the **output** :math:`\{f(x_i,\theta)\}=F(\langle B(x_i,\theta) \rangle)`.
+
+5. Calculate the **loss** :math:`L(y_i,f(x_i,\theta))` and **minimize** it using a classical optimization algorithm such as gradient descent.
+
+The quantum subroutine of QCL amounts to encoding the input data into a quantum state, applying the :math:`\theta`-parametrized unitary and calculating expectation values for each training sample. Additionally, the gradient descent method requires gradients, each of which is again obtained from two expectation values :math:`\frac{\partial \langle B \rangle}{\partial \theta_j}=\frac{\langle B \rangle^{+}-\langle B \rangle^{-}}{2}`.
+To calculate :math:`\langle B \rangle^{\pm}`, one modifies the output state by inserting :math:`\pm \frac{\pi}{2}` rotations
+next to the :math:`\theta_j`-dependent unitary :math:`U_j(\theta_j)` (assuming that :math:`U(\theta)` consists of
+a chain of unitary transformations :math:`\{U_j(\theta_j)\}`). For more details see the original paper [`1 <https://arxiv.org/abs/1803.00745>`_].
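+
+To make steps 4 and 5 and the gradient formula above concrete, the following purely classical sketch mimics the optimization loop for a single sample. The ``expectation`` function is a hypothetical stand-in for the quantum subroutine (on the QVM it would be estimated from the circuits described below); for this particular toy function the :math:`\pm \frac{\pi}{2}` shift rule happens to reproduce the derivative exactly.
+
+.. code:: python
+
+    import numpy as np
+
+    # Hypothetical stand-in for the quantum subroutine returning <B(x, theta)>,
+    # optionally with one parameter shifted by +/- pi/2.
+    def expectation(x, theta, shift_idx=None, shift_sign=0):
+        angles = theta.copy()
+        if shift_idx is not None:
+            angles[shift_idx] += shift_sign * np.pi / 2.0
+        return np.cos(x + angles.sum())  # toy model of <B>
+
+    def gradient(x, theta):
+        # d<B>/d(theta_j) = (<B>_j^+ - <B>_j^-) / 2
+        return np.array([(expectation(x, theta, j, +1) -
+                          expectation(x, theta, j, -1)) / 2.0
+                         for j in range(len(theta))])
+
+    # One gradient-descent step on a squared-error loss for a single sample
+    x_i, y_i = 0.3, 0.09
+    theta = np.random.uniform(0.0, 2*np.pi, size=4)
+    f = expectation(x_i, theta)                          # step 4 (F = identity)
+    grad_loss = 2.0 * (f - y_i) * gradient(x_i, theta)   # chain rule for (f - y)^2
+    theta -= 1.0 * grad_loss                             # step 5, learning rate 1.0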
+
+Simple examples of a regression task and a classification task are presented below.
+
+Regression example
+------------------
+
+In this example, QCL will try to learn a simple quadratic function :math:`x^2`.
+
+First, we generate a small dataset:
+
+.. code:: python
+
+    import numpy as np
+
+    np.random.seed(0)
+    m = 8
+    X = np.linspace(-0.95,0.95,m)
+    y = X**2
+
+The next step is to write a pyQuil program that encodes the input data into a quantum state :math:`|\Psi_{in}(x_i)\rangle`. We do that by applying the gate :math:`R^Z(\cos^{-1}(x^2))R^Y(\sin^{-1}(x))` to each qubit of an :math:`N=3`-qubit circuit.
+
+.. code:: python
+
+    import pyquil.quil as pq
+    from pyquil.gates import RX, RY, RZ
+
+    n_qubits = 3
+
+    def input_prog(sample):
+        p = pq.Program()
+        for j in range(n_qubits):
+            p.inst(RY(np.arcsin(sample[0]), j))
+            p.inst(RZ(np.arccos(sample[0]**2), j))
+        return p
+
+The output program is generated by evolving the quantum system according to a fully connected transverse-field Ising model Hamiltonian and then applying an :math:`R^X(\theta_{j,1})R^Z(\theta_{j,2})R^X(\theta_{j,3})` gate to each qubit. This procedure is repeated :math:`D` times to increase the learning capacity of QCL. The Ising model Hamiltonian is given by
+
+:math:`H = \sum_{j=1}^{N}h_j X_j + \sum_{j=1}^{N}\sum_{k=1}^{j-1}J_{jk}Z_j Z_k`
+
+and the evolution for time :math:`T=10` is given by :math:`\exp(-iTH)`. The dynamics of this Hamiltonian generate a highly entangled state, which is a key ingredient for successfully learning the output. To generate a Quil program approximating the Ising model evolution we use a helper function:
+
+.. code:: python
+
+    from grove.pyqcl.qcl import ising_prog_gen
+    ising_prog = ising_prog_gen(trotter_steps=1000, T=10, n_qubits=n_qubits)
+
+The output state is generated with the following function (:math:`D` is denoted here by the ``depth`` variable):
+
+.. code:: python
+
+    depth = 3
+
+    def output_prog(theta):
+        p = pq.Program()
+        theta = theta.reshape(3,n_qubits,depth)
+        for i in range(depth):
+            p += ising_prog
+            for j in range(n_qubits):
+                p.inst(RX(theta[0,j,i], j))
+                p.inst(RZ(theta[1,j,i], j))
+                p.inst(RX(theta[2,j,i], j))
+        return p
+
+The last thing is to write a function generating the Quil programs used for gradient calculations, following the formulas presented in the paper. This is done by inserting :math:`\pm \frac{\pi}{2}` rotations:
+
+.. code:: python
+
+    def grad_prog(theta, idx, sign):
+        theta = theta.reshape(3,n_qubits,depth)
+        idx = np.unravel_index(idx, theta.shape)
+        p = pq.Program()
+        for i in range(depth):
+            p += ising_prog
+            for j in range(n_qubits):
+                p.inst(RX(theta[0,j,i], j))
+                if idx == (0,j,i):
+                    p.inst(RX(sign*np.pi/2.0, j))
+                p.inst(RZ(theta[1,j,i], j))
+                if idx == (1,j,i):
+                    p.inst(RZ(sign*np.pi/2.0, j))
+                p.inst(RX(theta[2,j,i], j))
+                if idx == (2,j,i):
+                    p.inst(RX(sign*np.pi/2.0, j))
+        return p
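+
+As an optional sanity check (not required for training, since QCL performs these calls internally), the :math:`\langle B \rangle^{\pm}` programs produced by ``grad_prog`` can be combined by hand for a single sample and a single parameter, assuming a running QVM:
+
+.. code:: python
+
+    import pyquil.api as api
+    from pyquil.gates import Z
+
+    qvm = api.QVMConnection()
+    operator = [pq.Program(Z(n_qubits-1))]
+
+    theta0 = np.random.uniform(0.0, 2*np.pi, size=3*n_qubits*depth)
+    prog_in = input_prog([X[0]])
+
+    b_plus = qvm.expectation(prog_in + grad_prog(theta0, 0, +1), operator)[0]
+    b_minus = qvm.expectation(prog_in + grad_prog(theta0, 0, -1), operator)[0]
+    print('d<B>/d(theta_0) ~', (b_plus - b_minus) / 2.0)
+
+The same ``qvm`` connection and ``operator`` list are created again in the training snippet below, so this block can be skipped entirely.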
+
+Now, it is time to run QCL. We initialize the :math:`\theta` parameters with random numbers drawn from the uniform distribution on :math:`[0, 2\pi]`. The output is taken from the :math:`Z` expectation value on the first qubit and we use the mean squared error as the loss function to be minimized. The number of training iterations (epochs) is set to :math:`20` and we use full-batch gradient descent. For the mean squared error loss, the expectation value is multiplied by a coefficient which is also optimized; this is done inside the code. The aim of this multiplication is to scale the expectation value.
+
+.. code:: python
+
+    import pyquil.api as api
+    from pyquil.gates import Z
+    from grove.pyqcl.qcl import QCL
+
+    qvm = api.QVMConnection()
+
+    state_generators = dict()
+    state_generators['input'] = input_prog
+    state_generators['output'] = output_prog
+    state_generators['grad'] = grad_prog
+
+    initial_theta = np.random.uniform(0.0, 2*np.pi, size=3*n_qubits*depth)
+
+    operator = [pq.Program(Z(n_qubits-1))]
+    est = QCL(state_generators, initial_theta, loss="mean_squared_error",
+              operator_programs=operator, epochs=20, batch_size=m,
+              verbose=True, qvm=qvm)
+
+We fit the QCL estimator on our data and labels, get the results for inspection and call predict to produce a plot. (Training can take a while, as several thousand expectation values need to be simulated on the QVM.)
+
+.. code:: python
+
+    est.fit(X,y)
+    results = est.get_results()
+
+    X_test = np.linspace(-1.0,1.0,50)
+    y_pred = est.predict(X_test)
+
+    import matplotlib.pyplot as plt
+    plt.plot(X, y, 'bs', X_test, y_pred, 'r-')
+
+As we can see, QCL fits the data nicely. Given a real quantum computer, we could significantly increase the number of qubits, the circuit depth and the number of samples to cope with more complex regression tasks.
+
+.. image:: qcl/qcl_regression.png
+    :align: center
+
+The history of loss values is presented in the plot below.
+
+.. image:: qcl/qcl_loss.png
+    :align: center
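+
+The loss curve above can be reproduced directly from the fitted estimator. ``results.history_loss`` holds one loss value per batch (here one per epoch, since we use full-batch updates); the plot below is a minimal sketch assuming ``matplotlib`` is already imported as above:
+
+.. code:: python
+
+    plt.figure()
+    plt.plot(results.history_loss)
+    plt.xlabel('epoch')
+    plt.ylabel('loss')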
+
+Classification example
+----------------------
+
+In this example, QCL will try to perform a simple nonlinear classification task. The data points belong to two classes: 0 (red dots) and 1 (blue dots).
+
+.. image:: qcl/qcl_classification_data.png
+    :align: center
+
+The structure of the algorithm is very similar to the regression example. We generate the data with scikit-learn's ``make_circles`` method:
+
+.. code:: python
+
+    from sklearn.datasets import make_circles
+
+    np.random.seed(0)
+    m = 10
+    X, y = make_circles(n_samples=m, factor=.1, noise=.0, random_state=0)
+
+Next, we produce the functions generating the input, output and gradient states. The default generators provided by pyQCL can be used:
+
+.. code:: python
+
+    from grove.pyqcl.qcl import (ising_prog_gen, default_input_state_gen,
+                                 default_output_state_gen, default_grad_state_gen)
+
+    n_qubits, depth = 4, 4
+
+    ising_prog = ising_prog_gen(trotter_steps=1000, T=10, n_qubits=n_qubits)
+    state_generators = dict()
+    state_generators['input'] = default_input_state_gen(n_qubits)
+    state_generators['output'] = default_output_state_gen(ising_prog, n_qubits, depth)
+    state_generators['grad'] = default_grad_state_gen(ising_prog, n_qubits, depth)
+
+Compared to the regression task, we increase the number of qubits and the circuit depth to get a better fit. The output is taken from the :math:`Z` expectation values on the first and second qubits and we use binary cross-entropy as the loss function to be minimized. The number of training iterations (epochs) is set to :math:`20` and we use full-batch gradient descent. For the binary cross-entropy loss the expectation values are transformed by a softmax function to obtain valid probabilities; this is done inside the code. (Training can take a while, as many expectation values need to be simulated on the QVM.)
+
+.. code:: python
+
+    initial_theta = np.random.uniform(0.0, 2*np.pi, size=3*n_qubits*depth)
+    operator = [pq.Program(Z(n_qubits-1)), pq.Program(Z(n_qubits-2))]
+    est = QCL(state_generators, initial_theta, loss="binary_crossentropy",
+              operator_programs=operator, epochs=20, batch_size=m,
+              verbose=True, qvm=qvm)
+
+    est.fit(X,y)
+    results = est.get_results()
+
+Now, we can plot the decision surface of the fitted QCL estimator:
+
+.. code:: python
+
+    import matplotlib.pyplot as plt
+    from matplotlib.colors import ListedColormap
+
+    cm = plt.cm.RdBu
+    cm_bright = ListedColormap(['#FF0000', '#0000FF'])
+    xx, yy = np.meshgrid(np.linspace(-1.0, 1.0, 10),
+                         np.linspace(-1.0, 1.0, 10))
+    y_pred = est.predict(np.c_[xx.ravel(), yy.ravel()])[:,0]
+    zz = y_pred.reshape(xx.shape)
+    plt.contourf(xx, yy, zz, cmap=cm, alpha=.8)
+    plt.scatter(X[:, 0], X[:, 1], c=y, cmap=cm_bright, edgecolors='k')
+
+As we can see, QCL fits the data reasonably well. We can increase the predictive power by increasing the number of qubits and the depth of the quantum circuit.
+
+.. image:: qcl/qcl_classification.png
+    :align: center
+
+Links and Further Reading
+-------------------------
+
+This concludes our brief introduction to QCL. There are many areas of machine learning in which QCL can be used, so if you have ideas on how to extend the algorithm, feel free to make suggestions.
+
+Link to the original paper:
+
+- `Quantum Circuit Learning <https://arxiv.org/abs/1803.00745>`_
+
+Source Code Docs
+----------------
+
+Here you can find documentation for the different submodules in pyQCL.
+
+grove.pyqcl.qcl
+~~~~~~~~~~~~~~~
+
+.. automodule:: grove.pyqcl.qcl
+    :members:
+    :undoc-members:
+    :show-inheritance:
+
+grove.pyqcl.optimizer
+~~~~~~~~~~~~~~~~~~~~~
+
+.. automodule:: grove.pyqcl.optimizer
+    :members:
+    :undoc-members:
+    :show-inheritance:
diff --git a/docs/qcl/qcl_classification.png b/docs/qcl/qcl_classification.png
new file mode 100644
index 0000000..012b619
Binary files /dev/null and b/docs/qcl/qcl_classification.png differ
diff --git a/docs/qcl/qcl_classification_data.png b/docs/qcl/qcl_classification_data.png
new file mode 100644
index 0000000..79d5a46
Binary files /dev/null and b/docs/qcl/qcl_classification_data.png differ
diff --git a/docs/qcl/qcl_loss.png b/docs/qcl/qcl_loss.png
new file mode 100644
index 0000000..b89a917
Binary files /dev/null and b/docs/qcl/qcl_loss.png differ
diff --git a/docs/qcl/qcl_regression.png b/docs/qcl/qcl_regression.png
new file mode 100644
index 0000000..ebd221b
Binary files /dev/null and b/docs/qcl/qcl_regression.png differ
diff --git a/grove/pyqcl/__init__.py b/grove/pyqcl/__init__.py
new file mode 100644
index 0000000..30c0ff9
--- /dev/null
+++ b/grove/pyqcl/__init__.py
@@ -0,0 +1,15 @@
+##############################################################################
+# Copyright 2016-2017 Rigetti Computing
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+#    http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+##############################################################################
\ No newline at end of file
diff --git a/grove/pyqcl/optimizer.py b/grove/pyqcl/optimizer.py
new file mode 100644
index 0000000..ff874f1
--- /dev/null
+++ b/grove/pyqcl/optimizer.py
@@ -0,0 +1,262 @@
+##############################################################################
+# Copyright 2016-2017 Rigetti Computing
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+#    http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+##############################################################################
+import math
+import numpy as np
+
+import pyquil.api as api
+import pyquil.quil as pq
+
+class OptResults(dict):
+    """
+    Object for holding theta optimization results from QCL.
+    """
+    def __getattr__(self, name):
+        try:
+            return self[name]
+        except KeyError:
+            raise AttributeError(name)
+
+    __setattr__ = dict.__setitem__
+    __delattr__ = dict.__delitem__
+
+
+class GradientOptimizer(object):
+    """
+    Implementation of the gradient descent method.
+
+    :param initial_theta: (ndarray) initial parameters for optimization.
+    :param loss: (string) loss function definition.
+    :param learning_rate: (float) learning rate for the gradient descent method.
+        Default=1.0
+    :param epochs: (int) number of gradient descent epochs. Default=1
+    :param batch_size: (int) batch size. Default=None
+    :param verbose: (bool) Print intermediate results. Default=False
+    """
+    def __init__(self, initial_theta, loss, learning_rate=1.0, epochs=1,
+                 batch_size=None, verbose=False):
+
+        self.loss_mse = 'mean_squared_error'
+        self.loss_entropy = 'binary_crossentropy'
+        self.available_losses = [self.loss_mse, self.loss_entropy]
+
+        self.initial_theta = initial_theta
+
+        # Loss function
+        self.loss = loss
+        if self.loss not in self.available_losses:
+            raise ValueError("Available losses are: " + ", ".join(self.available_losses))
+
+        # Learning rate
+        self.learning_rate = learning_rate
+        if not isinstance(self.learning_rate, float) and not isinstance(self.learning_rate, int):
+            raise TypeError("learning_rate variable must be a number")
+        if isinstance(self.learning_rate, int):
+            self.learning_rate = float(self.learning_rate)
+        if self.learning_rate <= 0:
+            raise ValueError("learning_rate variable must be a positive number")
+
+        # Epochs and batch size
+        self.epochs = epochs
+        if not isinstance(self.epochs, int):
+            raise TypeError("epochs variable must be an integer")
+        if self.epochs <= 0:
+            raise ValueError("epochs variable must be a positive integer")
+        self.batch_size = batch_size
+        if self.batch_size is not None:
+            if not isinstance(self.batch_size, int):
+                raise TypeError("batch size variable must be an integer")
+            if self.batch_size <= 0:
+                raise ValueError("batch size variable must be a positive integer")
+
+        self.verbose = verbose
+
+    def gradient_descend(self, X, y, state_generators, operator_programs=None,
+                         qvm=None):
+        """
+        Perform theta optimization using the gradient descent method.
+
+        :param X: (ndarray) Training data of shape (n_samples, n_features).
+        :param y: (ndarray) Training labels of shape (n_samples, n_classes) for
+            a classification task and (n_samples,) for a regression task.
+        :param state_generators: (dict) Dictionary with pyQuil programs generating
+            the input state, the output state and the gradient states.
+        :param list operator_programs: A list of Programs, each specifying an
+            operator whose expectation to compute. Default is a list containing
+            only the empty Program.
+        :param qvm: (optional, QVM) forest connection object.
+
+        :return: OptResults object with the following fields initialized:
+                 - theta: the optimized parameters.
+                 - coeff: scalar value of the optimized multiplicative coefficient.
+                 - history_theta: a list of all intermediate parameter vectors.
+                 - history_loss: a list of all intermediate losses.
+                 - history_grad: a list of all intermediate gradient arrays.
+        """
+        history_theta, history_loss, history_grad = [], [], []
+        # Work on a copy so that the caller's initial_theta is not mutated in place.
+        coeff, theta = 1.0, np.copy(self.initial_theta)
+
+        prog_input_gen = state_generators['input']
+        prog_output_gen = state_generators['output']
+        prog_output_grad = state_generators['grad']
+
+        n_samples = len(X)
+        n_theta = len(theta)
+
+        if qvm is None:
+            self.qvm = api.QVMConnection()
+        else:
+            self.qvm = qvm
+
+        # Check operators
+        if operator_programs is None:
+            operator_programs = [pq.Program()]
+        elif not isinstance(operator_programs, list):
+            operator_programs = [operator_programs]
+        n_operators = len(operator_programs)
+
+        # Check batch size
+        if self.batch_size is None:
+            self.batch_size = n_samples
+        self.batch_size = min(self.batch_size, n_samples)
+
+        # Loop over epochs
+        for e in range(self.epochs):
+
+            # Loop over batches
+            batches = self.generate_batches(X, y, self.batch_size)
+            n_batches = len(batches)
+            for i, batch in enumerate(batches):
+
+                batch_X, batch_y = batch
+                n_samples_in_batch = len(batch_X)
+
+                # Predictions
+                batch_y_pred = np.zeros((n_samples_in_batch, n_operators))
+                for k in range(n_samples_in_batch):
+                    prog = prog_input_gen(batch_X[k,:])
+                    prog += prog_output_gen(theta)
+                    batch_y_pred[k,:] = coeff * np.array(self.qvm.expectation(prog, operator_programs))
+                    if self.loss == self.loss_entropy:
+                        batch_y_pred[k,:] = np.exp(batch_y_pred[k,:]) / np.sum(np.exp(batch_y_pred[k,:]))
+
+                # Compute loss
+                loss_value = self._compute_loss(batch_y, batch_y_pred)
+
+                # Display status
+                if self.verbose:
+                    print('Epoch: {}/{} ::: Batch: {}/{} ::: Loss: {:.5f}'.format(e+1, self.epochs, i+1, n_batches, loss_value))
+
+                # Gradient (skipped after the final update of the final epoch)
+                if not (e == self.epochs - 1 and i == n_batches - 1):
+                    grad = np.zeros((n_samples_in_batch, n_operators, n_theta))
+                    for k in range(n_samples_in_batch):
+
+                        # Define input state
+                        prog_input = prog_input_gen(batch_X[k,:])
+
+                        # Calculate gradient for each theta_j
+                        for j in range(n_theta):
+
+                            # Gradient +/-
+                            for sign in [1,-1]:
+                                grad_sign = np.zeros(n_operators)
+                                grad_progs = prog_output_grad(theta, j, sign)
+                                # Generally, the gradient generator could return
+                                # a program or a list of programs (in case the
+                                # gradient +/- is a sum of expectations)
+                                if not isinstance(grad_progs, list):
+                                    grad_progs = [grad_progs]
+                                for grad_prog in grad_progs:
+                                    prog = prog_input
+                                    prog += grad_prog
+                                    # B_j +/- expectation
+                                    grad_sign += np.array(self.qvm.expectation(prog, operator_programs))
+                                # Gradient = (B_j+ - B_j-) / 2
+                                grad[k, :, j] += sign / 2.0 * grad_sign
+
+                    # Gradient of the loss
+                    grad_full = self._compute_grad_full(batch_y, batch_y_pred, grad)
+                    if self.loss == self.loss_mse:
+                        grad_full_coeff = -2.0 * np.mean((batch_y - batch_y_pred) * batch_y_pred)
+
+                    # Update theta
+                    theta -= self.learning_rate * grad_full
+                    if self.loss == self.loss_mse:
+                        coeff -= self.learning_rate * grad_full_coeff
+
+                # Append to history
+                history_loss.append(loss_value)
+                history_theta.append(np.copy(theta))
+                history_grad.append(grad)
+
+        # Prepare results
+        results = OptResults()
+        results.theta, results.coeff = theta, coeff
+        results.loss = loss_value
+        results.history_loss = history_loss
+        results.history_theta = history_theta
+        results.history_grad = history_grad
+
+        return results
+
+    @staticmethod
+    def generate_batches(X, y, batch_size):
+        """
+        Creates a list of shuffled batches from (X, y).
+
+        :param X: (ndarray) Training data of shape (n_samples, n_features).
+        :param y: (ndarray) Training labels of shape (n_samples, n_classes) for
+            a classification task and (n_samples,) for a regression task.
+        :param batch_size: (int) batch size.
+
+        :return batches: a list of tuples (batch_X, batch_y).
+        """
+        m = len(X)
+        batches = []
+
+        # Shuffle
+        permutation = list(np.random.permutation(m))
+        shuff_X = X[permutation,:]
+        shuff_y = y[permutation]
+
+        # Partition
+        num_complete_batches = math.floor(m/batch_size)
+        for k in range(0, num_complete_batches):
+            batch_X = shuff_X[k*batch_size:(k+1)*batch_size, :]
+            batch_y = shuff_y[k*batch_size:(k+1)*batch_size]
+            batches.append((batch_X, batch_y))
+
+        # End case (last mini-batch smaller than batch_size)
+        if m % batch_size != 0:
+            batch_X = shuff_X[num_complete_batches*batch_size:m]
+            batch_y = shuff_y[num_complete_batches*batch_size:m]
+            batches.append((batch_X, batch_y))
+
+        return batches
+
+    def _compute_loss(self, y, y_pred):
+        if self.loss == self.loss_mse:
+            return np.mean((y - y_pred) ** 2)
+        elif self.loss == self.loss_entropy:
+            return -1.0 * np.mean(np.sum(y * np.log(y_pred), axis=1))
+        else:
+            raise ValueError("Available losses are: " + ", ".join(self.available_losses))
+
+    def _compute_grad_full(self, y, y_pred, grad):
+        if self.loss == self.loss_mse:
+            return -2.0 * np.mean((y - y_pred) * grad[:,0,:], axis=0)
+        elif self.loss == self.loss_entropy:
+            return -1.0 * np.mean((grad[:,0,:] - grad[:,1,:]) * (y[:,0,np.newaxis] - y_pred[:,0,np.newaxis]), axis=0)
+        else:
+            raise ValueError("Available losses are: " + ", ".join(self.available_losses))
\ No newline at end of file
diff --git a/grove/pyqcl/qcl.py b/grove/pyqcl/qcl.py
new file mode 100644
index 0000000..e55b128
--- /dev/null
+++ b/grove/pyqcl/qcl.py
@@ -0,0 +1,318 @@
+##############################################################################
+# Copyright 2016-2017 Rigetti Computing
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+#    http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+##############################################################################
+
+import itertools
+import numpy as np
+from scipy import linalg
+
+import pyquil.api as api
+import pyquil.quil as pq
+from pyquil.gates import RX, RY, RZ
+
+from grove.pyqcl.optimizer import GradientOptimizer
+
+
+class QCL(object):
+    """
+    The Quantum Circuit Learning algorithm.
+
+    QCL is an object that encapsulates the Quantum Circuit Learning algorithm -
+    a classical-quantum hybrid algorithm for machine learning on near-term
+    quantum processors. The main components of the QCL algorithm are the
+    gradient descent options used to minimize the loss function, a unitary
+    encoding the input data, a unitary generating the output state, and
+    operator programs whose expectation values are measured in the output state.
+
+    Using this object:
+
+    1) initialize with `inst = QCL(state_generators, initial_theta, loss, **params)`
+
+    2) call `inst.fit(X, y)` to fit QCL to the input data
+
+    3) call `inst.predict(X)` to get predictions from the fitted QCL
+
+    :param state_generators: (dict) Dictionary with pyQuil programs generating
+        the input state, the output state and the gradient states.
+    :param operator_programs: (list) A list of Programs, each specifying an
+        operator whose expectation to compute. Default is a list containing
+        only the empty Program.
+    :param qvm: (optional, QVM) forest connection object.
+    :param initial_theta: (ndarray) initial parameters for optimization.
+    :param loss: (string) loss function definition.
+    :param learning_rate: (float) learning rate for the gradient descent method.
+        Default=1.0
+    :param epochs: (int) number of gradient descent epochs. Default=1
+    :param batch_size: (int) batch size. Default=None
+    :param verbose: (bool) Print intermediate results. Default=False
+    """
+
+    def __init__(self, state_generators, initial_theta, loss, operator_programs=None,
+                 learning_rate=1.0, epochs=1, batch_size=None, qvm=None, verbose=False):
+
+        self.loss_mse = 'mean_squared_error'
+        self.loss_entropy = 'binary_crossentropy'
+
+        self.initial_theta = initial_theta
+        self.loss = loss
+        self.learning_rate = learning_rate
+        self.epochs = epochs
+        self.batch_size = batch_size
+
+        self.state_generators = state_generators
+        if 'input' not in self.state_generators.keys():
+            raise ValueError("Provide a pyQuil program responsible for " +
+                             "encoding X into a quantum state!")
+        if 'output' not in self.state_generators.keys():
+            raise ValueError("Provide a pyQuil program responsible for " +
+                             "generating the output state!")
+        if 'grad' not in self.state_generators.keys():
+            raise ValueError("Provide a pyQuil program responsible for " +
+                             "calculating the gradient!")
+
+        self.operator_programs = operator_programs
+
+        if qvm is None:
+            self.qvm = api.QVMConnection()
+        else:
+            self.qvm = qvm
+
+        self.verbose = verbose
+
+        self.fit_OK = False
+
+    def fit(self, X, y):
+        """
+        Fit QCL.
+
+        :param X: (ndarray) Training data of shape (n_samples, n_features).
+        :param y: (ndarray) Training labels of shape (n_samples, n_classes) for
+            a classification task and (n_samples,) for a regression task.
+
+        :return self: returns an instance of self.
+        """
+
+        X, y = self._make_data(X, y)
+
+        optimizer = GradientOptimizer(self.initial_theta, self.loss,
+                                      self.learning_rate, self.epochs,
+                                      self.batch_size, self.verbose)
+        self.results = optimizer.gradient_descend(X, y, self.state_generators,
+                                                  self.operator_programs, self.qvm)
+
+        self.fit_OK = True
+
+        return self
+
+    def predict(self, X):
+        """
+        Predict with a fitted QCL.
+
+        :param X: (ndarray) Testing data of shape (n_samples, n_features).
+
+        :return y_pred: (ndarray) Predictions of shape (n_samples, n_classes) for
+            a classification task and (n_samples,) for a regression task.
+        """
+        if not self.fit_OK:
+            raise ValueError("The QCL instance must be fitted before calling predict.")
+
+        X = self._make_data(X)
+        n_samples = len(X)
+
+        # Check operators
+        if not isinstance(self.operator_programs, list):
+            operator_programs = [self.operator_programs]
+        else:
+            operator_programs = self.operator_programs
+        n_operators = len(operator_programs)
+
+        prog_input_gen = self.state_generators['input']
+        prog_output_gen = self.state_generators['output']
+
+        y_pred = np.zeros((n_samples, n_operators))
+        for k in range(n_samples):
+            prog = prog_input_gen(X[k,:])
+            prog += prog_output_gen(self.results.theta)
+            y_pred[k,:] = self.results.coeff * np.array(self.qvm.expectation(prog, operator_programs))
+            if self.loss == self.loss_entropy:
+                y_pred[k,:] = np.exp(y_pred[k,:]) / np.sum(np.exp(y_pred[k,:]))
+
+        return y_pred
+
+    def get_results(self):
+        """
+        Extract fitting results.
+
+        :return: OptResults object with the following fields initialized:
+                 - theta: the optimized parameters.
+                 - coeff: scalar value of the optimized multiplicative coefficient.
+                 - history_theta: a list of all intermediate parameter vectors.
+                 - history_loss: a list of all intermediate losses.
+                 - history_grad: a list of all intermediate gradient arrays.
+        """
+        if not self.fit_OK:
+            raise ValueError("The QCL instance must be fitted before results can be extracted.")
+
+        return self.results
+
+    def _make_data(self, X, y=None):
+        if np.ndim(X) == 1:
+            X = X.reshape(-1,1)
+
+        if y is not None:
+            if np.ndim(y) == 1:
+                y = y.reshape(-1,1)
+            if self.loss == self.loss_entropy and y.shape[1] == 1:
+                y = np.c_[y, 1-y]
+            return X, y
+        else:
+            return X
+
+def default_input_state_gen(n_qubits):
+    """
+    Returns a function generating a Quil program that encodes input data in the
+    way described in https://arxiv.org/abs/1803.00745
+
+    :param n_qubits: (int) number of qubits in a circuit.
+
+    :return: (callable) function mapping a sample to a Quil program encoding it.
+    """
+    def prog(sample):
+        p = pq.Program()
+        n_features = len(sample)
+        for j in range(n_qubits):
+            p.inst(RY(np.arcsin(sample[j % n_features]), j))
+            p.inst(RZ(np.arccos(sample[j % n_features]**2), j))
+        return p
+    return prog
+
+def default_output_state_gen(ising_prog, n_qubits, depth):
+    """
+    Returns a function generating a Quil program that prepares the output state
+    in the way described in https://arxiv.org/abs/1803.00745
+
+    :param ising_prog: (Program) Program evolving the system according to a fully
+        connected transverse Ising Hamiltonian.
+    :param n_qubits: (int) number of qubits in a circuit.
+    :param depth: (int) depth of a circuit.
+
+    :return: (callable) function mapping parameters theta to a Quil program
+        preparing the output state.
+    """
+    def prog(theta):
+        p = pq.Program()
+        theta = theta.reshape(3,n_qubits,depth)
+        for i in range(depth):
+            p += ising_prog
+            for j in range(n_qubits):
+                p.inst(RX(theta[0,j,i], j))
+                p.inst(RZ(theta[1,j,i], j))
+                p.inst(RX(theta[2,j,i], j))
+        return p
+    return prog
+
+def default_grad_state_gen(ising_prog, n_qubits, depth):
+    """
+    Returns a function generating a Quil program that prepares a state allowing
+    for the calculation of gradients in the way described in
+    https://arxiv.org/abs/1803.00745. Gradients are obtained by inserting
+    R(+/- pi/2) rotations into the chain of unitary transformations.
+
+    :param ising_prog: (Program) Program evolving the system according to a fully
+        connected transverse Ising Hamiltonian.
+    :param n_qubits: (int) number of qubits in a circuit.
+    :param depth: (int) depth of a circuit.
+
+    :return: (callable) function mapping (theta, idx, sign) to a Quil program
+        preparing a state allowing for gradient calculation.
+    """
+    def prog(theta, idx, sign):
+        theta = theta.reshape(3,n_qubits,depth)
+        idx = np.unravel_index(idx, theta.shape)
+        p = pq.Program()
+        for i in range(depth):
+            p += ising_prog
+            for j in range(n_qubits):
+                p.inst(RX(theta[0,j,i], j))
+                if idx == (0,j,i):
+                    p.inst(RX(sign*np.pi/2.0, j))
+                p.inst(RZ(theta[1,j,i], j))
+                if idx == (1,j,i):
+                    p.inst(RZ(sign*np.pi/2.0, j))
+                p.inst(RX(theta[2,j,i], j))
+                if idx == (2,j,i):
+                    p.inst(RX(sign*np.pi/2.0, j))
+        return p
+    return prog
+
+def ising_prog_gen(trotter_steps, T, n_qubits):
+    """
+    Generates a Quil program evolving the system according to a fully connected
+    transverse Ising Hamiltonian. The exponential of the sum of non-commuting
+    Pauli operators is approximated by a first-order Trotter-Suzuki expansion
+    (Lie product formula).
+    Important: the gate is defined from a dense matrix exponential, so the
+    resulting program can only be run on a QVM connection.
+
+    :param trotter_steps: (int) number of Trotter steps.
+    :param T: (int) evolution time.
+    :param n_qubits: (int) number of qubits in a circuit.
+
+    :return: (Program) Quil program evolving the system according to a fully
+        connected transverse Ising Hamiltonian.
+    """
+    gate_I = np.eye(2)
+    gate_X = np.array([[0.0,1.0],
+                       [1.0,0.0]])
+    gate_Z = np.array([[1.0,0.0],
+                       [0.0,-1.0]])
+
+    def multi_kron(*args):
+        ret = np.array([[1.0]])
+        for q in args:
+            ret = np.kron(ret, q)
+        return ret
+
+    def multi_dot(*args):
+        for i, q in enumerate(args):
+            if i == 0:
+                ret = q
+            else:
+                ret = np.dot(ret, q)
+        return ret
+
+    # Initialize coefficients
+    h_coeff = np.random.uniform(-1.0, 1.0, size=n_qubits)
+    J_coeff = dict()
+    for val in itertools.combinations(range(n_qubits),2):
+        J_coeff[val] = np.random.uniform(-1.0, 1.0)
+
+    # Unitary
+    for steps in range(trotter_steps):
+
+        non_inter = [linalg.expm(-(1j)*T/trotter_steps*multi_kron(*[h * gate_X if i == j else gate_I for i in range(n_qubits)])) for j, h in enumerate(h_coeff)]
+        inter = [linalg.expm(-(1j)*T/trotter_steps*multi_kron(*[J * gate_Z if i == k[0] else gate_Z if i == k[1] else gate_I for i in range(n_qubits)])) for k, J in J_coeff.items()]
+        ising_step = multi_dot(*(non_inter + inter))
+
+        if steps == 0:
+            ising_gate = ising_step
+        else:
+            ising_gate = multi_dot(ising_step, ising_gate)
+
+    ising_name = 'ISING_GATE'
+    ising_prog = pq.Program().defgate(ising_name, ising_gate)
+    ising_prog.inst(tuple([ising_name] + list(reversed(range(n_qubits)))))
+
+    return ising_prog
\ No newline at end of file