MKL DSYEVD error when running with twosite_truncation=heev #2

1234zou · 2021-03-01T06:02:40Z

Hi,

I've performed a DMRG-CASCI(18,18)/cc-pVDZ computation for the tetracene molecule, where (18,18) is the active space containing the π bonding and antibonding orbitals. Using GVB orbitals as the initial guess (orbital shapes similar to Pipek-Mezey localized orbitals), I've compared the results from Block and QCMaquis:

QCMaquis: different values of nsweeps are tested
nsweeps = 5, E = -688.897193 a.u.
nsweeps = 9, E = -688.897265 a.u.
nsweeps = 15, E = -688.897297 a.u.
Block: -688.899740 a.u.

max_bond_dimension = 1000 is used in all calculations. The QCMaquis DMRG-CASCI energy decreases only slowly as nsweeps increases. Is there any option or keyword to accelerate the convergence (e.g. orbital ordering, not canonicalizing the localized orbitals, etc.)?
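To put the gap in perspective, the differences between the QCMaquis energies and the Block value can be expressed in millihartree. This is a minimal sketch that uses only the numbers quoted above; the names `qcmaquis`, `block` and `gap_mhartree` are illustrative.

```python
# Energies in hartree, copied from the post above.
qcmaquis = {5: -688.897193, 9: -688.897265, 15: -688.897297}
block = -688.899740


def gap_mhartree(energy, reference):
    """Return energy - reference in millihartree."""
    return (energy - reference) * 1000.0


for nsweeps, energy in sorted(qcmaquis.items()):
    gap = gap_mhartree(energy, block)
    print(f"nsweeps = {nsweeps:2d}: {gap:.3f} mEh above Block")
```

With these values the QCMaquis energies stay roughly 2.4 to 2.5 mEh above the Block result, and tripling nsweeps recovers only about 0.1 mEh.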

The OpenMolcas input file is attached
tetracene_cc-pVDZ.zip

Thanks for any suggestion!

kommerck · 2021-03-01T10:38:54Z

There are several ways to accelerate convergence in QCMaquis. One recommended way is to use the Fiedler orbital ordering and CI-DEAS. To enable them in the OpenMolcas interface, you may use the Fiedler and CIDEAS keywords of the DMRGSCF module (see https://molcas.gitlab.io/OpenMolcas/sphinx/users.guide/programs/dmrgscf.html). An additional possibility is to use the perturbative correction in the first several sweeps. This can be achieved, e.g., with the following QCMaquis input (to be added to the RGInput...EndRG or DMRGSettings...EndDMRGSettings block in OpenMolcas):

nsweeps = 10
ngrowsweeps = 2
nmainsweeps = 3
alpha_initial = 0.0005
alpha_main = 1e-5
alpha_final = 0
twosite_truncation = heev
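These keywords can be read as a three-stage schedule for the perturbation strength alpha: alpha_initial during the first ngrowsweeps sweeps, alpha_main during the next nmainsweeps sweeps, and alpha_final afterwards. The sketch below spells out that reading for the values above. The function name `alpha_for_sweep` is made up for illustration, and the stage boundaries are an assumption about how QCMaquis interprets the keywords, not something confirmed in this thread.

```python
def alpha_for_sweep(sweep, ngrowsweeps, nmainsweeps,
                    alpha_initial, alpha_main, alpha_final):
    """Return the assumed alpha for a zero-based sweep index."""
    if sweep < ngrowsweeps:
        return alpha_initial          # growth stage
    if sweep < ngrowsweeps + nmainsweeps:
        return alpha_main             # main stage
    return alpha_final                # final stage


# Values taken from the input above.
settings = dict(ngrowsweeps=2, nmainsweeps=3,
                alpha_initial=0.0005, alpha_main=1e-5, alpha_final=0.0)

schedule = [alpha_for_sweep(s, **settings) for s in range(10)]
for sweep, alpha in enumerate(schedule):
    print(f"sweep {sweep}: alpha = {alpha:g}")
```

Under this assumption, only the first five of the ten sweeps use a non-zero alpha, and the remaining sweeps run without the perturbative correction.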

1234zou · 2021-03-03T03:15:04Z

Thanks for your help @kommerck .

I tried some options with a fixed nsweeps = 9:
E = -688.897265 a.u. (using &RASSCF and RGinput)
E = -688.897250 a.u. (using &DMRGSCF)
E = -688.897254 a.u. (using &DMRGSCF and Fiedler = ON)

These energies differ very little. When I tried the perturbative correction, an Intel MKL error occurred:

Intel MKL ERROR: Parameter 10 was incorrect on entry to DSYEVD.

I speculate that the error is due to an improper SVD of a matrix, but I do not know how to solve the problem. Is there any other suggestion?
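For reference, DSYEVD is the LAPACK divide-and-conquer eigensolver for real symmetric matrices. Its arguments are, in order, JOBZ, UPLO, N, A, LDA, W, WORK, LWORK, IWORK, LIWORK, INFO, so "parameter 10" is LIWORK, the length of the integer workspace. The sketch below lists the argument positions and the minimum workspace sizes given in the LAPACK documentation. The helper names are illustrative, and this is not how QCMaquis itself calls the routine.

```python
# Argument order of LAPACK's DSYEVD (1-based positions in the error message).
DSYEVD_ARGS = ["JOBZ", "UPLO", "N", "A", "LDA", "W",
               "WORK", "LWORK", "IWORK", "LIWORK", "INFO"]


def dsyevd_min_workspace(n, jobz="V"):
    """Return the documented minimum (LWORK, LIWORK) for an n x n matrix."""
    if n <= 1:
        return 1, 1
    if jobz.upper() == "N":           # eigenvalues only
        return 2 * n + 1, 1
    return 1 + 6 * n + 2 * n * n, 3 + 5 * n   # eigenvalues and eigenvectors


print("parameter 10 is", DSYEVD_ARGS[10 - 1])
print("minimum (LWORK, LIWORK) for n = 1000:", dsyevd_min_workspace(1000))
```

An LIWORK value that the library rejects can come from a workspace that is too small, or possibly from an integer-width mismatch between the caller and the library (LP64 versus ILP64), so both are worth ruling out.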

Files are attached. Many thanks.
tetracene_perturb.zip

kommerck · 2021-03-03T20:02:36Z

Unfortunately I cannot reproduce the Intel MKL error; your input runs fine for me. Have you compiled OpenMolcas/QCMaquis with the ILP64 MKL interface?
Also using Fiedler=ON and perturbative correction (both at the same time), I get an energy of -688.8997325 a.u. after only two sweeps.

1234zou · 2021-03-06T14:13:25Z

Sorry for the delayed feedback. Yes, OpenMolcas/QCMaquis is compiled with the ILP64 MKL interface. I conclude this from

ldd rasscf.exe | grep 'lp'
ldd dmrgscf.exe | grep 'lp'

the results are

libmkl_gf_ilp64.so => /opt/intel/compilers_and_libraries/linux/mkl/lib/intel64/libmkl_gf_ilp64.so (0x00002af8a5e60000)
libalps.so => /home/jxzou/software/OpenMolcas_q/bin/./../qcmaquis/lib/libalps.so (0x00002af8ac526000)
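The same check can be scripted. This is a minimal sketch that classifies the MKL interface from ldd output. The function name `mkl_interface` is made up, and the rule it applies (look for `_ilp64` or `_lp64` in the library name, and treat `libmkl_rt` as undetermined because its interface layer is selected at run time) is an assumption about typical MKL link lines.

```python
import re


def mkl_interface(ldd_output):
    """Guess the MKL integer interface from the text printed by ldd."""
    names = re.findall(r"libmkl_\w+\.so", ldd_output)
    if any(name.endswith("_ilp64.so") for name in names):
        return "ILP64"
    if any(name.endswith("_lp64.so") for name in names):
        return "LP64"
    if "libmkl_rt.so" in names:
        return "runtime-selected (libmkl_rt)"
    return "no MKL interface library found"


# Lines copied from the ldd output above.
sample = """\
libmkl_gf_ilp64.so => /opt/intel/compilers_and_libraries/linux/mkl/lib/intel64/libmkl_gf_ilp64.so (0x00002af8a5e60000)
libalps.so => /home/jxzou/software/OpenMolcas_q/bin/./../qcmaquis/lib/libalps.so (0x00002af8ac526000)
"""
print(mkl_interface(sample))
```

Note that this only shows which interface library the executable is linked against; it does not prove that every component was compiled with matching integer sizes.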

Then I thought that maybe the GCC version matters, or that a re-compilation might solve the MKL error. However, the same error occurred after I tried both. On the other hand, with no perturbative correction and nsweeps = 40, the energy is -688.897320 a.u.

Could you please tell me your versions of GCC, GSL, HDF5, Boost, Intel MKL, OpenMolcas and QCMaquis? I would like to try with your versions. I think the version of MKL or QCMaquis may matter.

kommerck · 2021-03-09T11:02:49Z

We test our setup with several Docker images, and so far I'm afraid I was not able to reproduce this issue. Which distribution and versions do you have? This way I could fire up a Docker image and check if I can reproduce it. However, perhaps it's better to open a corresponding OpenMolcas issue re compilation and the error.

1234zou · 2021-03-09T13:12:42Z

Thanks! I've opened an issue in OpenMolcas GitLab, and showed details of my compilation.

1234zou · 2021-03-16T12:41:40Z

Thank you. I used the same input file in tetracene_perturb.zip. All versions of packages are the same as described in 278, the calculation is run on the same node. The only difference is this time I specify LINALG=Internal.

I also downloaded lapack-3.9.0.tar.gz and unpacked it into External/lapack/. Without that, the directory is empty and the compilation of OpenMolcas results in

CMake Error at CMakeLists.txt:1861 (message):
   LAPACK+BLAS sources not available, run "/usr/bin/git submodule update --init /home/jxzou/software/OpenMolcas_q1/External/lapack"

But my node cannot access the Internet, so I manually downloaded lapack-3.9.0.tar.gz and unpacked it into External/lapack/. After successful compilation, running ldd dmrgscf.exe|grep lp leads to

        libalps.so => /home/jxzou/software/OpenMolcas_q1/bin/./../qcmaquis/lib/libalps.so (0x00002b73f0d8f000)

And ldd dmrgscf.exe|grep mkl leads to

        /opt/intel/mkl/lib/intel64/libmkl_rt.so (0x00002ba5b19cb000)

So I suppose LINALG=Internal worked. The DMRG-CASCI(18,18) energy is then -688.897228 a.u., which is still about 2 mH higher than the Block value. Adding Fiedler=ON leads to -688.897226 a.u. I've uploaded the output file, which may be of some help.
tetracene_perturb1.zip

Sorry for the lengthy descriptions.

kommerck · 2021-03-16T17:16:01Z

After modifying your OpenMolcas input after Gateway/Seward to

&DMRGSCF
ActiveSpaceOptimizer=QCMaquis
Fiedler=ON
OOptimizationSettings
Charge = 0
Spin = 1
RAS2 = 18
nActEl= 18 0 0
FILEORB = tetracene_cc-pVDZ_uhf_gvb42_2CASCI.INPORB
CIonly
EndOOptimizationSettings
DMRGSettings
 conv_thresh = 1E-7
 max_bond_dimension = 1000
 nsweeps = 6
 ngrowsweeps = 2
 nmainsweeps = 3
 alpha_initial = 0.001
 alpha_main = 1e-4
 alpha_final = 0
 twosite_truncation = heev
EndDMRGSettings

I get an energy of -688.8997412270 a.u. after 6 sweeps. Please try this and let me know if you get the same energy.

1234zou · 2021-03-17T02:37:28Z

Thanks. I copied your input and submitted two jobs. The LINALG=MKL version leads to the same MKL DSYEVD error, while for the LINALG=Internal version the result is strange:

 Fiedler orbital ordering: 9,10,6,13,3,5,16,14,1,18,1
terminate called after throwing an instance of 'std::runtime_error'
  what():  Number of orbitals in the orbital order does not match the total number of orbitals

Program received signal SIGABRT: Process abort signal.

Maybe this is a truncated line? Files are attached.
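One way to check this is to test whether the printed list is a permutation of 1..18, as the error message requires. This is a minimal sketch using the line from the output above; the function name `is_valid_ordering` is illustrative.

```python
def is_valid_ordering(order, n_orbitals):
    """Return True if order is a permutation of 1..n_orbitals."""
    return sorted(order) == list(range(1, n_orbitals + 1))


# Ordering exactly as printed in the output above.
printed = "9,10,6,13,3,5,16,14,1,18,1"
order = [int(x) for x in printed.split(",")]

print("entries:", len(order))
print("valid for 18 orbitals:", is_valid_ordering(order, 18))
```

The printed list has only 11 entries and repeats the index 1, which would be consistent with a line cut off partway through a two-digit index, although that remains a guess.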
tetracene_perturb2.zip

kommerck · 2021-03-17T08:46:09Z

Are you using the latest QCMaquis version? Your output shows QCMaquis version 3.0.1, whereas we are at 3.0.3.

1234zou · 2021-03-17T09:04:05Z

Yes, I used QCMaquis 3.0.1, as I said in 278. I'll try 3.0.3.

1234zou · 2021-03-17T14:19:53Z

Hi, QCMaquis-3.0.3 works excellently! Using your recommended input,

for LINALG=Internal, I got -688.899738 a.u. within 6 sweeps (1 h 55 min);

for LINALG=MKL, keeping the perturbative correction still leads to the MKL DSYEVD error. After removing the perturbative correction, I got -688.899741 a.u. within 6 sweeps (26 min).

I'll use QCMaquis >= 3.0.3, no perturbative correction and LINALG=MKL for OpenMolcas in the future.

By the way, anything updated in QCMaquis-3.0.3 concerning MKL DSYEVD?

kommerck · 2021-03-17T14:31:59Z

Which distribution are you using? So far I could not reproduce that error (I know you listed your software version in the OpenMolcas issue, but I'm interested specifically in the distribution so that I can fire up a Docker image to test it).
The DSYEVD call in question is wrapped by the Boost numeric bindings, which we provide as part of the ALPS/Boost distribution, so I cannot imagine they could be doing something wrong.

1234zou · 2021-03-17T15:17:05Z

Oh, I just realized that you are probably asking about the Linux distribution. It's CentOS 7.4.1708. More specifically, the output of cat /proc/version is

Linux version 3.10.0-693.el7.x86_64 (builder@kbuilder.dev.centos.org) (gcc version 4.8.5 20150623 (Red Hat 4.8.5-16) (GCC) ) #1 SMP Tue Aug 22 21:09:27 UTC 2017

The result of command lsb_release -a is

LSB Version:    :core-4.1-amd64:core-4.1-noarch:cxx-4.1-amd64:cxx-4.1-noarch:desktop-4.1-amd64:desktop-4.1-noarch:languages-4.1-amd64:languages-4.1-noarch:printing-4.1-amd64:printing-4.1-noarch
Distributor ID: CentOS
Description:    CentOS Linux release 7.4.1708 (Core)
Release:        7.4.1708
Codename:       Core

kommerck closed this as completed Mar 9, 2021

kommerck reopened this Mar 16, 2021

kommerck changed the title ~~Calculated DMRG-CASCI energy too high~~ MKL DSYEVD error when running with twosite_truncation=heev Mar 17, 2021

shivupa pushed a commit to shivupa/qcmaquis that referenced this issue Feb 23, 2024

Merge pull request qcscine#2 from stknecht/devel

de80d08

update latest changes for CQ interface

stefabat pushed a commit to stefabat/qcmaquis that referenced this issue Jul 24, 2024

fix reference values qcscine#2

69f1d72
