As per offline discussion with @nerkulec, this PR seeks to improve the ACSD by
Until now, the ACSD has operated on the assumption that, for two distinct grid points, the electronic and the ionic local structure should be similar to the same degree; it measures how far the actual data deviate from this idealized case. As @nerkulec has pointed out, there are two problems with the current approach:
Regarding 2.: The ACSD also penalizes cases where the bispectrum descriptors are more dissimilar than the LDOS vectors, which it shouldn't. The only case we care about is the same bispectrum descriptors yielding different LDOS vectors (i.e., the bispectrum descriptors are less dissimilar than the LDOS vectors). This can be fixed by simply clipping the distance metrics rather than taking their absolute value, and has already been implemented here.
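To illustrate the difference between the two variants, here is a minimal sketch. It is not the code from this PR: the function name `similarity_penalty`, the sign convention (descriptor similarity minus LDOS similarity), and the sample values are assumptions made only for this example.

```python
import numpy as np


def similarity_penalty(descriptor_sim, ldos_sim, clip=True):
    """Per-pair penalty from two arrays of similarities (hypothetical helper).

    Assumed convention: a positive difference means the descriptors are
    more similar (less dissimilar) than the LDOS vectors, which is the
    only case that should be penalized.
    """
    diff = np.asarray(descriptor_sim, dtype=float) - np.asarray(ldos_sim, dtype=float)
    if clip:
        # Clipped variant: negative differences contribute nothing.
        return np.clip(diff, 0.0, None)
    # Absolute-value variant: both directions are penalized.
    return np.abs(diff)


# Pair 0: descriptors more similar than LDOS (should be penalized).
# Pair 1: descriptors less similar than LDOS (should not be penalized).
descriptor_sim = np.array([0.9, 0.2])
ldos_sim = np.array([0.5, 0.8])

penalty_abs = similarity_penalty(descriptor_sim, ldos_sim, clip=False)
penalty_clipped = similarity_penalty(descriptor_sim, ldos_sim, clip=True)
```

With the absolute value, both pairs contribute to the average; with clipping, only the first pair does.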
Regarding 1.: The cosine similarity is not the best metric for this kind of investigation. We compare similarities on two different vector spaces, and the cosine similarity is not equipped for that. There seem to be approaches to adapt it (e.g., here), but if I understand correctly, they require solving an eigenvalue problem. One could also consider the Jaccard distance, since the bispectrum descriptors are not really a vector denoting a direction but rather a set of coefficients; of course, the Jaccard distance is designed for binary sets, so we would have to adapt it to float values. At any rate, I think @nerkulec already had a good idea for this problem, but I didn't take proper notes when we talked about it in person. Could you share your idea here again and/or implement it?
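For reference, one known way to extend the Jaccard distance to float values is the weighted Jaccard (Ruzicka) form, which reduces to the usual Jaccard distance for binary vectors. The sketch below is only meant as a starting point for discussion: the function name is hypothetical, and it assumes non-negative entries, which may not hold for the bispectrum descriptors without shifting or rescaling them first.

```python
import numpy as np


def weighted_jaccard_distance(x, y):
    """Weighted Jaccard (Ruzicka) distance for non-negative float vectors.

    d(x, y) = 1 - sum_i min(x_i, y_i) / sum_i max(x_i, y_i)

    Assumption: all entries are >= 0. Hypothetical helper, not part of
    the existing code base.
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    if x.shape != y.shape:
        raise ValueError("x and y must have the same shape")
    if np.any(x < 0) or np.any(y < 0):
        raise ValueError("weighted Jaccard requires non-negative entries")
    denominator = np.maximum(x, y).sum()
    if denominator == 0.0:
        # Both vectors are all zeros; treat them as identical.
        return 0.0
    return 1.0 - np.minimum(x, y).sum() / denominator


# Float-valued example: sum of minima is 1, sum of maxima is 5.
d_float = weighted_jaccard_distance([1.0, 2.0, 0.0], [1.0, 0.0, 2.0])

# Binary example: intersection has 1 element, union has 3.
d_binary = weighted_jaccard_distance([1, 1, 0], [1, 0, 1])
```

Whether this is a better fit than the cosine similarity for comparing the two vector spaces would still have to be checked on actual data.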