
DFT dataset and machine learning models for high entropy alloys


mathsphy/high-entropy-alloys-dataset-ML


Efficient first principles based modeling via machine learning: from simple representations to high entropy materials

Paper

This is the repository associated with our paper Efficient first principles based modeling via machine learning: from simple representations to high entropy materials (publisher version, arXiv version), in which we create a large DFT dataset for high entropy materials (HEMs) and evaluate the in-distribution and out-of-distribution performance of machine learning models.

DFT dataset for high entropy alloys

Our DFT dataset encompasses bcc and fcc structures composed of eight elements and covers all possible 2- to 7-component alloy systems formed from them. The dataset used in the paper is publicly available on Zenodo; it includes initial and final structures, formation energies, and atomic magnetic moments and charges, among other attributes.

Note: the trajectory data (energies and forces for structures sampled during the DFT relaxations) are not published with this paper; they will be released with a later work on machine learning force fields for HEMs.

Table: Numbers of alloy systems and structures.

| No. of components | 2 | 3 | 4 | 5 | 6 | 7 | Total |
|---|---|---|---|---|---|---|---|
| Alloy systems | 28 | 56 | 70 | 56 | 28 | 8 | 246 |
| Ordered (2-8 atoms) | 4975 | 22098 | 29494 | 6157 | 3132 | 3719 | 69575 |
| SQS (27, 64, or 128 atoms) | 715 | 3302 | 3542 | 4718 | 1183 | 762 | 14222 |
| Ordered + SQS | 5690 | 25400 | 33036 | 10875 | 4315 | 4481 | 83797 |
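As a quick sanity check on the first row of the table, the number of k-component alloy systems that can be formed from eight elements is the binomial coefficient C(8, k):

```python
from math import comb

# Number of k-component alloy systems that can be formed from 8 elements.
n_elements = 8
systems = {k: comb(n_elements, k) for k in range(2, 8)}

print(systems)                 # {2: 28, 3: 56, 4: 70, 5: 56, 6: 28, 7: 8}
print(sum(systems.values()))   # 246, matching the "Total" column
```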

Number of structures as a function of a given constituent element; the legend indicates the number of components.

Generalization performance of machine learning models

The data on Zenodo provide the Matminer features of the initial and final structures, along with a demo script for training tree-based models. The results in the paper can be reproduced by adapting the demo script to different train-test splits. The codes folder contains the scripts we used in the paper.
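To illustrate the kind of tree-based baseline the demo script trains (the actual script and feature files live on Zenodo; the feature matrix and target below are synthetic stand-ins, not the real Matminer features), a minimal scikit-learn sketch:

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import mean_absolute_error
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)

# Stand-in for a Matminer feature matrix: 500 structures x 20 features.
X = rng.normal(size=(500, 20))
# Stand-in target mimicking formation energy per atom (eV/atom):
# a noisy linear function of the first five features.
y = X[:, :5].sum(axis=1) * 0.1 + rng.normal(scale=0.01, size=500)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0
)

model = RandomForestRegressor(n_estimators=200, random_state=0)
model.fit(X_train, y_train)

mae = mean_absolute_error(y_test, model.predict(X_test))
print(f"test MAE: {mae:.3f} eV/atom")
```

Swapping the random 80/20 split above for a split by structure size, number of components, or deviation from equimolar composition gives the out-of-distribution evaluations studied in the paper.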

Generalization performance from small to large structures.

(a) Normalized error obtained by training on structures with ≤ N atoms and evaluating on structures with > N atoms. (b) ALIGNN prediction on SQSs with > 27 atoms, obtained by training on structures with ≤ 4 atoms. (c) Parity plot of the ALIGNN prediction on SQSs with > 27 atoms, obtained by training on structures with ≤ 8 atoms.

Generalization performance from low-order to high-order systems.

(a) Normalized error obtained by training on structures with ≤ N elements and evaluating on structures with > N elements. (b) Parity plot of the ALIGNN prediction on structures with ≥ 3 elements, obtained by training on binary structures. (c) Parity plot of the ALIGNN prediction on structures with ≥ 4 elements, obtained by training on binary and ternary structures.

Generalization performance from (near-)equimolar to non-equimolar structures.

(a) Normalized error obtained by training on structures with maxΔc below a given threshold and evaluating on the rest. (b) Predictions on non-equimolar structures (maxΔc > 0) by the ALIGNN model trained on equimolar structures (maxΔc = 0). (c) Predictions on structures with a relatively strong deviation from equimolar composition (maxΔc > 0.2) by the ALIGNN model trained on structures with a relatively weak deviation (maxΔc ≤ 0.2). maxΔc is defined as the maximum concentration difference between any two elements in a structure.
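The maxΔc quantity defined above can be computed directly from a structure's composition. A small sketch (the element symbols and atom counts are illustrative, not taken from the dataset):

```python
def max_delta_c(counts):
    """Maximum concentration difference between any two elements.

    counts: dict mapping element symbol -> number of atoms in the structure.
    """
    total = sum(counts.values())
    concentrations = [n / total for n in counts.values()]
    return max(concentrations) - min(concentrations)

# Equimolar quaternary: every element at concentration 1/4, so maxΔc = 0.
print(max_delta_c({"Fe": 16, "Co": 16, "Ni": 16, "Cr": 16}))  # 0.0

# Non-equimolar: 32/64 - 8/64 = 0.375
print(max_delta_c({"Fe": 32, "Co": 16, "Ni": 8, "Cr": 8}))  # 0.375
```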

Effects of dataset size and use of unrelaxed vs. relaxed structures


Overview of model performance on different generalization tasks

