Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Change how electrode flow retrieves charge density data #1055

Open
wants to merge 3 commits into
base: main
Choose a base branch
from

Conversation

esoteric-ephemera
Copy link
Contributor

Small change in the electrode insertion workflow to avoid needing large object storage: Currently there's a get_charge_density_job function which does not point its output to e.g. GridFS. That causes problems when trying to store a CHGCAR in normal mongo stores

I've moved the get_charge_density_job within get_inserted_structures so that the charge density is not stored when generating inserted structures (it still may be stored in the user's GridFS from the static calc)

Another solution would be just adding a data=[Chgcar] kwarg to the get_charge_density_job function, but I think it makes more sense to remove get_charge_density_job

@JaGeo
Copy link
Member

JaGeo commented Nov 12, 2024

@esoteric-ephemera does it make sense to add the data to tge datastore for other parts of the workflow? It will surely fail at some point

@esoteric-ephemera
Copy link
Contributor Author

It should work in this setup because the charge density is only used in one part of the flow (get_inserted_structures). If the user wanted to retrieve their charge densities for analysis, the host structure static calculation should store them in the datastore as well (this static is required in the electrode flow)

Maybe @jmmshn has thoughts on this - don't want to break the flow logic

@@ -204,7 +202,8 @@ def get_insertion_electrode_doc(

@job
def get_inserted_structures(
chg: VolumetricData,
prev_dir: Path | str,
get_charge_density: Callable,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
get_charge_density: Callable,
get_charge_density: Callable[[str | Path, VolumetricData]

@@ -213,7 +212,8 @@ def get_inserted_structures(

Parameters
----------
chg: The charge density.
prev_dir: The previous directory where the static calculation was performed.
get_charge_density: A function to get the charge density from a task document.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
get_charge_density: A function to get the charge density from a task document.
get_charge_density: A function to get the charge density from a run directory.

@jmmshn
Copy link
Contributor

jmmshn commented Nov 14, 2024

@esoteric-ephemera, copying the file over was kind of a fireworks-related hack.
When this was added, whenever there was a large object that was the input of the job. The UI that rendered the fireworks wf would try to render the entire thing which basically crashed it every time. So you might want to confirm that this is no longer the case.

@esoteric-ephemera
Copy link
Contributor Author

Thanks for the suggestions @janosh, wasn't aware of the type hinting for Callable!

Thanks, @jmmshn. This is probably a safer approach then, since get_inserted_structures no longer takes VolumetricData as input, just the path to the previous calc and a function to parse it

Copy link

codecov bot commented Nov 25, 2024

Codecov Report

Attention: Patch coverage is 0% with 3 lines in your changes missing coverage. Please review.

Project coverage is 4.13%. Comparing base (4244da9) to head (59b7f26).

Files with missing lines Patch % Lines
src/atomate2/common/jobs/electrode.py 0.00% 3 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main   #1055       +/-   ##
==========================================
- Coverage   72.82%   4.13%   -68.69%     
==========================================
  Files         187     187               
  Lines       13637   13634        -3     
  Branches     1370    1370               
==========================================
- Hits         9931     564     -9367     
- Misses       3161   13039     +9878     
+ Partials      545      31      -514     
Files with missing lines Coverage Δ
src/atomate2/common/jobs/electrode.py 0.00% <0.00%> (-84.54%) ⬇️

... and 166 files with indirect coverage changes

@jmmshn
Copy link
Contributor

jmmshn commented Jan 7, 2025

Thanks for the suggestions @janosh, wasn't aware of the type hinting for Callable!

Thanks, @jmmshn. This is probably a safer approach then, since get_inserted_structures no longer takes VolumetricData as input, just the path to the previous calc and a function to parse it

@esoteric-ephemera , it looks like current version of get_charge_density_job

def get_charge_density_job(

breaks for large objects because it need to send the entire charge density to the jobstore.
I think this can be fixed by adding data=True but we will run into the same problem with jobflow + Firework,
since you now have the following charge density in the input:

https://github.com/materialsproject/atomate2/blob/95ea0600e00bcded2a0cb6cfe6d190fa6a980c39/src/atomate2/common/jobs/electrode.py#L97C9-L97C16

So I don't see any way of not causing the problem with fireworks where the object is deserialized during viewing of the job in the webgui. Maybe it's best to move on completely from fireworks anyways.

So I think the choices are:

  1. Go back to the old implementation where the charge density is never "sent" between jobs so it never appears as in input.
  2. Just forget about any problems with view in fireworks... I have not used it in months and month and we should just worry about jobflow-remote.

I think we are already down the path of 2 mostly anyways so I can just complete refactor out the get_charge_density_job and send the charge density from static calculation directly.

@esoteric-ephemera
Copy link
Contributor Author

Hey @jmmshn, the way I've rewritten it in the PR shouldn't hit either issue you've identified - it just slightly tweaks the electrode flow to take the path of the charge density file and a function as args. Should not have issues with storing large objects in Mongo, nor with rendering large objects in the fireworks webgui

We're still using fireworks, and I don't see a strong case to store the charge density in a database. Even in that case, the user can write a custom get_charge_density function that pulls from a database directly

@jmmshn
Copy link
Contributor

jmmshn commented Jan 7, 2025

So in my testing the following job fails with a "Document too large" error.

def get_charge_density_job(
prev_dir: Path | str,
get_charge_density: Callable,
) -> VolumetricData:
"""Get the charge density from a task document.
Parameters
----------
prev_dir: The previous directory where the static calculation was performed.
get_charge_density: A function to get the charge density from a task document.
Returns
-------
The charge density.
"""
return get_charge_density(prev_dir)

My current understanding of what is happening is that:

get_charge_density_job calls the VASP version of the function:

def get_charge_density(self, prev_dir: Path | str) -> VolumetricData:
"""Get the charge density of a structure.
Parameters
----------
prev_dir:
The previous directory where the static calculation was performed.
Returns
-------
The charge density.
"""
prev_dir = Path(strip_hostname(prev_dir))
aeccar0 = Chgcar.from_file(prev_dir / "AECCAR0.gz")
aeccar2 = Chgcar.from_file(prev_dir / "AECCAR2.gz")
return aeccar0 + aeccar2

So it will produced a document that contains a VolumetricData/Chgcar object.

It looks like the fact that this result does not have a data=[Chgcar, VolumetricData] in the decorator is the reason I'm seeing the failures.

PS: Thanks for the fast reply!

@esoteric-ephemera
Copy link
Contributor Author

Yeah exactly what I've observed - adding Chgcar and VolumetricData is to the data field of job should work as well, but is there a need for storing the charge density?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants