Update tn #74
base: main
@@ -62,6 +62,7 @@ def dense_vector_tn_MPI(qibo_circ, datatype, n_samples=8):
         Dense vector of quantum circuit.
     """

+    import cuquantum.cutensornet as cutn
     from cuquantum import Network
     from mpi4py import MPI
@@ -71,21 +72,31 @@ def dense_vector_tn_MPI(qibo_circ, datatype, n_samples=8):
     size = comm.Get_size()

     device_id = rank % getDeviceCount()
     cp.cuda.Device(device_id).use()
+    mempool = cp.get_default_memory_pool()

     # Perform circuit conversion
-    myconvertor = QiboCircuitToEinsum(qibo_circ, dtype=datatype)
-    operands = myconvertor.state_vector_operands()
+    if rank == 0:
+        myconvertor = QiboCircuitToEinsum(qibo_circ, dtype=datatype)
+        operands = myconvertor.state_vector_operands()
+    else:
+        operands = None
Review comment (on the new `if rank == 0: ... else: operands = None` block):
What's the actual purpose of this? If ... Even if it is somehow meaningful (I'm not seeing how, but that may be my limitation), the result could only be trivial, so you could even return immediately, without executing all the other operations...

Reply:
Each rank needs the same initial set of operands for the computation. Here the operands are created on rank 0 only; on every other rank they are just set to None. In line 86, the operands created on rank 0 are then broadcast to all the other ranks.
-    # Assign the device for each process.
-    device_id = rank % getDeviceCount()

Review comment (on removed lines 80-81):
Do you remember why it was repeated before?

Follow-up:
The comment may still be useful; you could lift it to the line above.
+    operands = comm.bcast(operands, root)

     # Create network object.
     network = Network(*operands, options={"device_id": device_id})

     # Compute the path on all ranks with 8 samples for hyperoptimization. Force slicing to enable parallel contraction.
     path, info = network.contract_path(
-        optimize={"samples": n_samples, "slicing": {"min_slices": max(32, size)}}
+        optimize={
+            "samples": n_samples,
+            "slicing": {
+                "min_slices": max(32, size),
+                "memory_model": cutn.MemoryModel.CUTENSOR,
+            },
+        }
     )

     # Select the best path from all ranks.
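The hunk above replaces per-rank circuit conversion with a create-on-root-then-broadcast pattern. A minimal standalone sketch of that pattern, assuming only mpi4py and NumPy; make_operands and root are illustrative stand-ins, not names from this PR:

    import numpy as np
    from mpi4py import MPI

    comm = MPI.COMM_WORLD
    rank = comm.Get_rank()
    root = 0

    def make_operands():
        # Stand-in for QiboCircuitToEinsum(...).state_vector_operands():
        # an einsum expression plus its input tensors.
        return "ij,jk->ik", [np.eye(2), np.eye(2)]

    # Build the operands once, on the root rank only.
    operands = make_operands() if rank == root else None

    # Pickle-based broadcast: every rank ends up with an identical copy.
    operands = comm.bcast(operands, root)

    assert operands is not None  # holds on every rank after the broadcast

This is why operands = None on the non-root ranks is harmless: the value is overwritten by the broadcast before it is ever used.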
@@ -114,6 +125,9 @@ def dense_vector_tn_MPI(qibo_circ, datatype, n_samples=8):
     # Sum the partial contribution from each process on root.
     result = comm.reduce(sendobj=result, op=MPI.SUM, root=root)

+    del network
+    mempool.free_all_blocks()

     return result, rank

Review comment (on `del network`):
What is this for? Why do you now need it?

Follow-up:
Assuming it is for memory management... So, unless it is exactly documented that you should apply...
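One plausible reading of the change being questioned above (an assumption on my part, not something stated in the thread): CuPy's memory pool can only hand back blocks that are no longer referenced, so an object that owns GPU buffers, such as the Network, has to be released before free_all_blocks() can have any effect. A small sketch of that pool behaviour with a plain CuPy array:

    import cupy as cp

    mempool = cp.get_default_memory_pool()

    x = cp.ones((1024, 1024), dtype=cp.complex64)        # allocated from the pool
    print(mempool.used_bytes(), mempool.total_bytes())   # both > 0

    mempool.free_all_blocks()       # cannot release the block backing `x`
    print(mempool.total_bytes())    # unchanged while `x` is alive

    del x                           # drop the last reference first...
    mempool.free_all_blocks()       # ...then the cached block is returned to the GPU
    print(mempool.used_bytes(), mempool.total_bytes())   # 0 0

Whether `del network` followed by free_all_blocks() is the documented way to release the network's workspace is exactly what the reviewer is asking; the sketch only shows the general CuPy pool behaviour.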
@@ -136,6 +150,7 @@ def dense_vector_tn_nccl(qibo_circ, datatype, n_samples=8):
     Returns:
         Dense vector of quantum circuit.
     """
+    import cuquantum.cutensornet as cutn
     from cupy.cuda import nccl
     from cuquantum import Network
     from mpi4py import MPI

Review comment (on the added import): Same as above.
@@ -148,21 +163,33 @@ def dense_vector_tn_nccl(qibo_circ, datatype, n_samples=8):
     device_id = rank % getDeviceCount()

     cp.cuda.Device(device_id).use()
+    mempool = cp.get_default_memory_pool()

     # Set up the NCCL communicator.
     nccl_id = nccl.get_unique_id() if rank == root else None
     nccl_id = comm_mpi.bcast(nccl_id, root)
     comm_nccl = nccl.NcclCommunicator(size, nccl_id, rank)

     # Perform circuit conversion
-    myconvertor = QiboCircuitToEinsum(qibo_circ, dtype=datatype)
-    operands = myconvertor.state_vector_operands()
+    if rank == 0:
+        myconvertor = QiboCircuitToEinsum(qibo_circ, dtype=datatype)
+        operands = myconvertor.state_vector_operands()
+    else:
+        operands = None

+    operands = comm_mpi.bcast(operands, root)

     network = Network(*operands)

     # Compute the path on all ranks with 8 samples for hyperoptimization. Force slicing to enable parallel contraction.
     path, info = network.contract_path(
-        optimize={"samples": n_samples, "slicing": {"min_slices": max(32, size)}}
+        optimize={
+            "samples": n_samples,
+            "slicing": {
+                "min_slices": max(32, size),
+                "memory_model": cutn.MemoryModel.CUTENSOR,
+            },
+        }
     )

     # Select the best path from all ranks.
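For context, the NCCL communicator in this hunk is bootstrapped over MPI: the root creates a unique id, MPI broadcasts it, and every rank joins the same NCCL communicator. A self-contained sketch of that pattern; the toy all-reduce at the end is illustrative only and not part of this PR:

    import cupy as cp
    from cupy.cuda import nccl
    from mpi4py import MPI

    comm_mpi = MPI.COMM_WORLD
    rank = comm_mpi.Get_rank()
    size = comm_mpi.Get_size()
    root = 0

    # One GPU per rank, wrapping around if there are more ranks than devices.
    cp.cuda.Device(rank % cp.cuda.runtime.getDeviceCount()).use()

    # NCCL bootstrap over MPI.
    nccl_id = nccl.get_unique_id() if rank == root else None
    nccl_id = comm_mpi.bcast(nccl_id, root)
    comm_nccl = nccl.NcclCommunicator(size, nccl_id, rank)

    # Toy in-place all-reduce to show the communicator working.
    x = cp.full(4, float(rank))                  # float64 buffer on this rank's GPU
    stream_ptr = cp.cuda.get_current_stream().ptr
    comm_nccl.allReduce(
        x.data.ptr, x.data.ptr, x.size, nccl.NCCL_FLOAT64, nccl.NCCL_SUM, stream_ptr
    )
    cp.cuda.get_current_stream().synchronize()
    # x now holds 0 + 1 + ... + (size - 1) in every element, on every rank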
@@ -200,6 +227,9 @@ def dense_vector_tn_nccl(qibo_circ, datatype, n_samples=8):
         stream_ptr,
     )

+    del network
+    mempool.free_all_blocks()

     return result, rank

Review comment (on `del network`): Same as above.
@@ -226,6 +256,7 @@ def expectation_pauli_tn_nccl(qibo_circ, datatype, pauli_string_pattern, n_sampl
     Returns:
         Expectation of quantum circuit due to pauli string.
     """
+    import cuquantum.cutensornet as cutn
     from cupy.cuda import nccl
     from cuquantum import Network
     from mpi4py import MPI
@@ -238,23 +269,36 @@ def expectation_pauli_tn_nccl(qibo_circ, datatype, pauli_string_pattern, n_sampl
     device_id = rank % getDeviceCount()

     cp.cuda.Device(device_id).use()
+    mempool = cp.get_default_memory_pool()

     # Set up the NCCL communicator.
     nccl_id = nccl.get_unique_id() if rank == root else None
     nccl_id = comm_mpi.bcast(nccl_id, root)
     comm_nccl = nccl.NcclCommunicator(size, nccl_id, rank)

     # Perform circuit conversion
-    myconvertor = QiboCircuitToEinsum(qibo_circ, dtype=datatype)
-    operands = myconvertor.expectation_operands(
-        pauli_string_gen(qibo_circ.nqubits, pauli_string_pattern)
-    )
+    if rank == 0:
+        myconvertor = QiboCircuitToEinsum(qibo_circ, dtype=datatype)
+        operands = myconvertor.expectation_operands(
+            pauli_string_gen(qibo_circ.nqubits, pauli_string_pattern)
+        )
+    else:
+        operands = None

+    operands = comm_mpi.bcast(operands, root)

     network = Network(*operands)

     # Compute the path on all ranks with 8 samples for hyperoptimization. Force slicing to enable parallel contraction.
     path, info = network.contract_path(
-        optimize={"samples": n_samples, "slicing": {"min_slices": max(32, size)}}
+        optimize={
+            "samples": n_samples,
+            "slicing": {
+                "min_slices": max(32, size),
+                "memory_model": cutn.MemoryModel.CUTENSOR,
+            },
+        }
     )

     # Select the best path from all ranks.
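Each of the four functions gains the same memory_model entry in the slicing options. For reference, a hedged sketch of the same configuration spelled with cuQuantum's typed option objects rather than nested dicts, assuming the cuquantum Python package exposes OptimizerOptions/SlicerOptions as in its documented API, and that network, size and n_samples are defined as in the functions above:

    import cuquantum.cutensornet as cutn
    from cuquantum import OptimizerOptions, SlicerOptions

    # MemoryModel.CUTENSOR asks the slicer to use cuTENSOR's workspace-based
    # memory estimate (rather than the default heuristic) when deciding how
    # many slices are needed to fit the contraction in GPU memory.
    optimizer_opts = OptimizerOptions(
        samples=n_samples,                   # hyper-optimizer samples
        slicing=SlicerOptions(
            min_slices=max(32, size),        # force enough slices for all ranks
            memory_model=cutn.MemoryModel.CUTENSOR,
        ),
    )

    path, info = network.contract_path(optimize=optimizer_opts)

Whether the dict form and the dataclass form behave identically is worth checking against the cuQuantum documentation for the installed version; the diff itself only uses the dict form.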
@@ -292,6 +336,9 @@ def expectation_pauli_tn_nccl(qibo_circ, datatype, pauli_string_pattern, n_sampl
         stream_ptr,
     )

+    del network
+    mempool.free_all_blocks()

     return result, rank
@@ -318,6 +365,7 @@ def expectation_pauli_tn_MPI(qibo_circ, datatype, pauli_string_pattern, n_sample
     Returns:
         Expectation of quantum circuit due to pauli string.
     """
+    import cuquantum.cutensornet as cutn
     from cuquantum import Network
     from mpi4py import MPI  # this line initializes MPI
@@ -326,24 +374,35 @@ def expectation_pauli_tn_MPI(qibo_circ, datatype, pauli_string_pattern, n_sample
     rank = comm.Get_rank()
     size = comm.Get_size()

     # Assign the device for each process.
     device_id = rank % getDeviceCount()
     cp.cuda.Device(device_id).use()
+    mempool = cp.get_default_memory_pool()

     # Perform circuit conversion
-    myconvertor = QiboCircuitToEinsum(qibo_circ, dtype=datatype)
-    operands = myconvertor.expectation_operands(
-        pauli_string_gen(qibo_circ.nqubits, pauli_string_pattern)
-    )
+    if rank == 0:
+        myconvertor = QiboCircuitToEinsum(qibo_circ, dtype=datatype)
+        operands = myconvertor.expectation_operands(
+            pauli_string_gen(qibo_circ.nqubits, pauli_string_pattern)
+        )
+    else:
+        operands = None

-    # Assign the device for each process.
-    device_id = rank % getDeviceCount()
+    operands = comm.bcast(operands, root)

     # Create network object.
     network = Network(*operands, options={"device_id": device_id})

     # Compute the path on all ranks with 8 samples for hyperoptimization. Force slicing to enable parallel contraction.
     path, info = network.contract_path(
-        optimize={"samples": n_samples, "slicing": {"min_slices": max(32, size)}}
+        optimize={
+            "samples": n_samples,
+            "slicing": {
+                "min_slices": max(32, size),
+                "memory_model": cutn.MemoryModel.CUTENSOR,
+            },
+        }
     )

     # Select the best path from all ranks.
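The part elided between these hunks (from "# Select the best path from all ranks." down to the reduce) is shared by all four functions and, as far as I can tell, follows the pattern of NVIDIA's cuQuantum MPI sample. A sketch of that pattern, wrapped as a helper so the assumed inputs (network, info, comm, rank, size) are explicit; names and attribute access mirror the cuQuantum sample, not necessarily this repository:

    from mpi4py import MPI

    def select_best_path_and_contract(network, info, comm, rank, size, root=0):
        """Pick the globally cheapest path, split slices across ranks, reduce on root."""
        # Find the rank whose hyper-optimizer produced the cheapest path.
        opt_cost, sender = comm.allreduce(sendobj=(info.opt_cost, rank), op=MPI.MINLOC)

        # Broadcast that rank's optimizer info and pin the path/slicing everywhere.
        info = comm.bcast(info, sender)
        path, info = network.contract_path(
            optimize={"path": info.path, "slicing": info.slices}
        )

        # Split the slices across ranks as evenly as possible.
        num_slices = info.num_slices
        chunk, extra = num_slices // size, num_slices % size
        slice_begin = rank * chunk + min(rank, extra)
        slice_end = num_slices if rank == size - 1 else (rank + 1) * chunk + min(rank + 1, extra)

        # Each rank contracts its share of slices; partial sums are combined on root.
        result = network.contract(slices=range(slice_begin, slice_end))
        return comm.reduce(sendobj=result, op=MPI.SUM, root=root)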
@@ -372,6 +431,9 @@ def expectation_pauli_tn_MPI(qibo_circ, datatype, pauli_string_pattern, n_sample
     # Sum the partial contribution from each process on root.
     result = comm.reduce(sendobj=result, op=MPI.SUM, root=root)

+    del network
+    mempool.free_all_blocks()

     return result, rank
Review comment (general):
Any reason to keep the imports within the functions, instead of at the top level? I know it was like this even before this PR...

Reply:
The reason was that not all functions require these imports; specifically dense_vector_tn(), expectation_pauli_tn(), dense_vector_mps() and pauli_string_gen() do not. Do you think it is better to bring them to the top level?
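For what it is worth, one common middle ground here is to keep heavyweight or environment-specific dependencies (mpi4py, NCCL, cuquantum) inside the functions that need them, so that simply importing the module does not require an MPI- and GPU-enabled environment. A small sketch of that layout; the module and function bodies are illustrative, not the repository's actual file:

    # eval_sketch.py -- illustrative layout only

    import numpy as np  # cheap and always needed: import at module level


    def dense_vector_tn(circuit, dtype):
        """Single-GPU path: nothing extra to import."""
        ...


    def dense_vector_tn_MPI(circuit, dtype, n_samples=8):
        """MPI path: defer the imports so `import eval_sketch` works without mpi4py."""
        from mpi4py import MPI          # initializes MPI only when this path is used
        from cuquantum import Network   # heavy GPU dependency, loaded on demand
        ...

The trade-off is that missing-dependency errors surface at call time rather than at import time, which is usually acceptable for optional backends.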