MKL DSYEVD error when running with twosite_truncation=heev #2

Open
1234zou opened this issue Mar 1, 2021 · 14 comments

@1234zou

1234zou commented Mar 1, 2021

Hi,

I've performed a DMRG-CASCI(18,18)/cc-pVDZ computation for the tetracene molecule, where (18,18) is the active space containing the pi bonding and anti-bonding orbitals. Using GVB orbitals as the initial guess (orbital shapes similar to Pipek-Mezey localized orbitals), I've compared the results from Block and QCMaquis:

QCMaquis: different values of nsweeps are tested
nsweeps = 5, E = -688.897193 a.u.
nsweeps = 9, E = -688.897265 a.u.
nsweeps = 15, E = -688.897297 a.u.
Block: -688.899740 a.u.

max_bond_dimension = 1000 is used in all calculations. The QCMaquis DMRG-CASCI energy decreases only slowly as nsweeps increases. Is there any option or keyword to accelerate the convergence (e.g. orbital ordering, not canonicalizing the localized orbitals, etc.)?
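
For scale, the energies listed above can be converted into gaps relative to the Block result. The following is only a small arithmetic sketch using the numbers quoted in this post; the variable and function names are illustrative.

```python
HARTREE_TO_MILLIHARTREE = 1000.0

# Energies in a.u. as quoted in this post.
qcmaquis_energies = {5: -688.897193, 9: -688.897265, 15: -688.897297}
block_energy = -688.899740


def gap_in_millihartree(energy, reference):
    """Return (energy - reference) converted from hartree to millihartree."""
    return (energy - reference) * HARTREE_TO_MILLIHARTREE


# Gap of each QCMaquis run above the Block energy, keyed by nsweeps.
gaps = {n: gap_in_millihartree(e, block_energy) for n, e in qcmaquis_energies.items()}

for nsweeps, gap in gaps.items():
    print(f"nsweeps = {nsweeps:2d}: {gap:.3f} mEh above Block")
```

All three QCMaquis runs sit roughly 2.4 to 2.5 mEh above Block, and going from 5 to 15 sweeps recovers only about 0.1 mEh of that gap.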

The OpenMolcas input file is attached
tetracene_cc-pVDZ.zip

Thanks for any suggestion!

@kommerck
Collaborator

kommerck commented Mar 1, 2021

There are several ways to accelerate convergence in QCMaquis. One recommended way is to use the Fiedler orbital ordering and CI-DEAS. To enable them in the OpenMolcas interface, you may use the Fiedler and CIDEAS keywords of the DMRGSCF module (see https://molcas.gitlab.io/OpenMolcas/sphinx/users.guide/programs/dmrgscf.html). An additional possibility is to use the perturbative correction in the first several sweeps. This can be achieved, e.g., with the following QCMaquis input (to be added to the RGInput...EndRG or DMRGSettings...EndDMRGSettings block in OpenMolcas):

nsweeps = 10
ngrowsweeps = 2
nmainsweeps = 3
alpha_initial = 0.0005
alpha_main = 1e-5
alpha_final = 0
twosite_truncation = heev
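
As a rough illustration of how these settings might interact, the sketch below prints the perturbation strength per sweep. It assumes that alpha_initial applies during the first ngrowsweeps sweeps, alpha_main during the following nmainsweeps sweeps, and alpha_final afterwards. That schedule is an assumption made for illustration and is not confirmed in this thread; the function name is hypothetical.

```python
def alpha_for_sweep(sweep, ngrowsweeps, nmainsweeps,
                    alpha_initial, alpha_main, alpha_final):
    """Return the perturbation strength for a zero-based sweep index.

    ASSUMPTION: grow sweeps come first, then main sweeps, then final sweeps.
    """
    if sweep < ngrowsweeps:
        return alpha_initial
    if sweep < ngrowsweeps + nmainsweeps:
        return alpha_main
    return alpha_final


# Values copied from the settings above.
settings = {
    "nsweeps": 10,
    "ngrowsweeps": 2,
    "nmainsweeps": 3,
    "alpha_initial": 0.0005,
    "alpha_main": 1e-5,
    "alpha_final": 0.0,
}

schedule = [
    alpha_for_sweep(
        sweep,
        settings["ngrowsweeps"],
        settings["nmainsweeps"],
        settings["alpha_initial"],
        settings["alpha_main"],
        settings["alpha_final"],
    )
    for sweep in range(settings["nsweeps"])
]

for sweep, alpha in enumerate(schedule):
    print(f"sweep {sweep}: alpha = {alpha:g}")
```

Under that assumption, the last five of the ten sweeps run without any perturbation, so the final energy is not biased by the correction.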

@1234zou
Author

1234zou commented Mar 3, 2021

Thanks for your help @kommerck .

I tried some options with a fixed nsweeps = 9:
E = -688.897265 a.u. (using &RASSCF and RGinput)
E = -688.897250 a.u. (using &DMRGSCF)
E = -688.897254 a.u. (using &DMRGSCF and Fiedler = ON)

These energies differ very little. When I tried the perturbative correction, an Intel MKL error occurred:

Intel MKL ERROR: Parameter 10 was incorrect on entry to DSYEVD.

I speculate that the error is due to an improper SVD of a matrix, but I do not know how to solve the problem. Is there any other suggestion?
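
For reference, DSYEVD is LAPACK's divide-and-conquer eigensolver for real symmetric matrices, and the MKL message counts arguments by their position in the Fortran interface. The sketch below maps a position to its argument name and computes the minimum workspace sizes given in the LAPACK documentation. It is pure Python and does not call MKL; the helper names are illustrative.

```python
# Argument order of the Fortran routine
# DSYEVD(JOBZ, UPLO, N, A, LDA, W, WORK, LWORK, IWORK, LIWORK, INFO).
DSYEVD_ARGS = ("JOBZ", "UPLO", "N", "A", "LDA", "W",
               "WORK", "LWORK", "IWORK", "LIWORK", "INFO")


def dsyevd_parameter_name(position):
    """Return the argument name for a 1-based parameter position."""
    return DSYEVD_ARGS[position - 1]


def dsyevd_min_workspace(n, jobz="V"):
    """Return the documented minimum (LWORK, LIWORK) for an n x n matrix."""
    if n <= 1:
        return 1, 1
    if jobz.upper() == "N":          # eigenvalues only
        return 2 * n + 1, 1
    return 1 + 6 * n + 2 * n * n, 3 + 5 * n   # eigenvalues and eigenvectors


print(dsyevd_parameter_name(10))
print(dsyevd_min_workspace(1000))
```

Parameter 10 is LIWORK, the length of the integer workspace, so the library rejected the integer workspace size it was given. The thread does not establish why; a mismatch in integer width between caller and library is one possibility, but that is speculation.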

Files are attached. Many thanks.
tetracene_perturb.zip

@kommerck
Collaborator

kommerck commented Mar 3, 2021

Unfortunately, I cannot reproduce the Intel MKL error; your input runs fine for me. Have you compiled OpenMolcas/QCMaquis with the ILP64 MKL interface?
Also using Fiedler=ON and perturbative correction (both at the same time), I get an energy of -688.8997325 a.u. after only two sweeps.

@1234zou
Author

1234zou commented Mar 6, 2021

Sorry for the delayed feedback. Yes, the OpenMolcas/QCMaquis is compiled with ILP64 MKL interface. I conclude this from

ldd rasscf.exe | grep 'lp'
ldd dmrgscf.exe | grep 'lp'

the results are

libmkl_gf_ilp64.so => /opt/intel/compilers_and_libraries/linux/mkl/lib/intel64/libmkl_gf_ilp64.so (0x00002af8a5e60000)
libalps.so => /home/jxzou/software/OpenMolcas_q/bin/./../qcmaquis/lib/libalps.so (0x00002af8ac526000)
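
Note that grep 'lp' matches lp64, ilp64 and libalps alike. A stricter check can be sketched as below: the hypothetical helper classifies ldd output by the MKL interface library it names. libmkl_rt.so is reported separately because it is the single dynamic library, whose interface layer is selected at run time rather than at link time.

```python
import re


def mkl_interface_from_ldd(ldd_output):
    """Classify ldd output as 'ilp64', 'lp64', 'runtime' (libmkl_rt) or 'none'."""
    names = re.findall(r"libmkl_\w+\.so", ldd_output)
    if any(name.endswith("_ilp64.so") for name in names):
        return "ilp64"
    if any(name.endswith("_lp64.so") for name in names):
        return "lp64"
    if "libmkl_rt.so" in names:
        return "runtime"
    return "none"


# The two lines reported above.
sample = (
    "libmkl_gf_ilp64.so => /opt/intel/compilers_and_libraries/linux/mkl/lib/"
    "intel64/libmkl_gf_ilp64.so (0x00002af8a5e60000)\n"
    "libalps.so => /home/jxzou/software/OpenMolcas_q/bin/./../qcmaquis/lib/"
    "libalps.so (0x00002af8ac526000)\n"
)

print(mkl_interface_from_ldd(sample))
```

In practice the input would come from running ldd on rasscf.exe or dmrgscf.exe and passing the captured text to the function.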

Then I thought maybe the version of GCC matters, or a re-compilation might solve the MKL error. However, the same error occurs after I tried these boring things. On the other hand, using no perturbative correction, and nsweeps = 40, the energy is -688.897320 a.u.

Could you please tell me your versions of GCC, GSL, HDF5, Boost, Intel MKL, OpenMolcas and QCMaquis? I would like to try again with your versions, since the version of MKL or QCMaquis may matter.

@kommerck
Collaborator

kommerck commented Mar 9, 2021

We test our setup with several Docker images, and so far I'm afraid I was not able to reproduce this issue. Which distribution and versions do you have? This way I could fire up a Docker image and check if I can reproduce it. However, perhaps it's better to open a corresponding OpenMolcas issue re compilation and the error.

@1234zou
Author

1234zou commented Mar 9, 2021

Thanks! I've opened an issue in OpenMolcas GitLab, and showed details of my compilation.

@kommerck kommerck closed this as completed Mar 9, 2021
@kommerck kommerck reopened this Mar 16, 2021
@1234zou
Author

1234zou commented Mar 16, 2021

Thank you. I used the same input file as in tetracene_perturb.zip. All package versions are the same as described in 278, and the calculation was run on the same node. The only difference is that this time I specified LINALG=Internal.

I also downloaded lapack-3.9.0.tar.gz and unpacked it into External/lapack/. Without that, this directory is empty and the compilation of OpenMolcas fails with

CMake Error at CMakeLists.txt:1861 (message):
   LAPACK+BLAS sources not available, run "/usr/bin/git submodule update --init /home/jxzou/software/OpenMolcas_q1/External/lapack"

But my node cannot access the Internet, which is why I downloaded lapack-3.9.0.tar.gz manually and unpacked it into External/lapack/. After a successful compilation, running ldd dmrgscf.exe|grep lp gives

        libalps.so => /home/jxzou/software/OpenMolcas_q1/bin/./../qcmaquis/lib/libalps.so (0x00002b73f0d8f000)

And ldd dmrgscf.exe|grep mkl gives

        /opt/intel/mkl/lib/intel64/libmkl_rt.so (0x00002ba5b19cb000)

So I suppose LINALG=Internal worked. The DMRG-CASCI(18,18) energy is then -688.897228 a.u., which is still about 2 mH too high. Adding Fiedler=ON gives -688.897226 a.u. I've uploaded the output file, which may be of some help.
tetracene_perturb1.zip

Sorry for the lengthy descriptions.

@kommerck
Collaborator

kommerck commented Mar 16, 2021

After modifying the part of your OpenMolcas input that follows Gateway/Seward to

&DMRGSCF
ActiveSpaceOptimizer=QCMaquis
Fiedler=ON
OOptimizationSettings
Charge = 0
Spin = 1
RAS2 = 18
nActEl= 18 0 0
FILEORB = tetracene_cc-pVDZ_uhf_gvb42_2CASCI.INPORB
CIonly
EndOOptimizationSettings
DMRGSettings
 conv_thresh = 1E-7
 max_bond_dimension = 1000
 nsweeps = 6
 ngrowsweeps = 2
 nmainsweeps = 3
 alpha_initial = 0.001
 alpha_main = 1e-4
 alpha_final = 0
 twosite_truncation = heev
EndDMRGSettings

I get an energy of -688.8997412270 a.u. after 6 sweeps. Please try this and let me know if you get the same energy.

@1234zou
Author

1234zou commented Mar 17, 2021

Thanks. I copied your input and submitted two jobs. The LINALG=MKL version fails with the same MKL DSYEVD error, while the LINALG=Internal version gives a strange result:

 Fiedler orbital ordering: 9,10,6,13,3,5,16,14,1,18,1
terminate called after throwing an instance of 'std::runtime_error'
  what():  Number of orbitals in the orbital order does not match the total number of orbitals

Program received signal SIGABRT: Process abort signal.

Maybe this is a truncated line? Files are attached.
tetracene_perturb2.zip
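
The truncation can be checked directly. The sketch below parses the printed order and tests whether it is a permutation of 1..N for the 18 active orbitals (RAS2 = 18); the helper names are illustrative and not part of QCMaquis.

```python
def parse_orbital_order(text):
    """Parse a comma-separated orbital order into a list of integers."""
    return [int(token) for token in text.split(",") if token.strip()]


def check_orbital_order(order, n_orbitals):
    """Return a list of problems; an empty list means the order is valid."""
    problems = []
    if len(order) != n_orbitals:
        problems.append(f"expected {n_orbitals} entries, found {len(order)}")
    if sorted(order) != list(range(1, n_orbitals + 1)):
        problems.append(f"not a permutation of 1..{n_orbitals}")
    return problems


# The order exactly as printed in the output above.
printed_order = parse_orbital_order("9,10,6,13,3,5,16,14,1,18,1")
problems = check_orbital_order(printed_order, 18)

for problem in problems:
    print(problem)
```

The printed line holds only 11 entries, and the orbital index 1 appears twice, which is consistent with the line having been cut off mid-number.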

@kommerck
Collaborator

Are you using the latest QCMaquis version? Your output shows QCMaquis version 3.0.1, whereas we are at 3.0.3.

@1234zou
Author

1234zou commented Mar 17, 2021

Yes, I used QCMaquis 3.0.1, as I said in 278. I'll try 3.0.3.

@1234zou
Author

1234zou commented Mar 17, 2021

Hi, QCMaquis-3.0.3 works excellently! Using your recommended input,

for LINALG=Internal, I got -688.899738 a.u. within 6 sweeps (1 h 55 min);

for LINALG=MKL, keeping the perturbative correction still leads to the MKL DSYEVD error. After removing the perturbative correction, I got -688.899741 a.u. within 6 sweeps (26 min).

I'll use QCMaquis >= 3.0.3, no perturbative correction and LINALG=MKL for OpenMolcas in the future.

By the way, was anything concerning MKL DSYEVD changed in QCMaquis-3.0.3?

@kommerck
Collaborator

Which distribution are you using? So far I could not reproduce that error (I know you listed your software version in the OpenMolcas issue, but I'm interested specifically in the distribution so that I can fire up a Docker image to test it).
The DSYEVD call in question is wrapped by Boost numeric bindings, which we provide as part of the ALPS/Boost distribution, so I cannot imagine they could be doing something wrong.

@kommerck kommerck changed the title from "Calculated DMRG-CASCI energy too high" to "MKL DSYEVD error when running with twosite_truncation=heev" Mar 17, 2021
@1234zou
Author

1234zou commented Mar 17, 2021

Oh, I just realized that you were probably asking about the Linux distribution. It's CentOS 7.4.1708. More specifically, the output of cat /proc/version is

Linux version 3.10.0-693.el7.x86_64 (builder@kbuilder.dev.centos.org) (gcc version 4.8.5 20150623 (Red Hat 4.8.5-16) (GCC) ) #1 SMP Tue Aug 22 21:09:27 UTC 2017

The output of lsb_release -a is

LSB Version:    :core-4.1-amd64:core-4.1-noarch:cxx-4.1-amd64:cxx-4.1-noarch:desktop-4.1-amd64:desktop-4.1-noarch:languages-4.1-amd64:languages-4.1-noarch:printing-4.1-amd64:printing-4.1-noarch
Distributor ID: CentOS
Description:    CentOS Linux release 7.4.1708 (Core)
Release:        7.4.1708
Codename:       Core

shivupa pushed a commit to shivupa/qcmaquis that referenced this issue Feb 23, 2024
update latest changes for CQ interface
stefabat pushed a commit to stefabat/qcmaquis that referenced this issue Jul 24, 2024