
DFT dataset and machine learning models for high entropy alloys


mathsphy/high-entropy-alloys-dataset-ML


Efficient first principles based modeling via machine learning: from simple representations to high entropy materials

Paper

This is the repository associated with our paper Efficient first principles based modeling via machine learning: from simple representations to high entropy materials (publisher version, arXiv version), in which we create a large DFT dataset for high entropy materials (HEMs) and evaluate the in-distribution and out-of-distribution performance of machine learning models.

DFT dataset for high entropy alloys

Our DFT dataset encompasses bcc and fcc structures composed of eight elements and covers all possible 2- to 7-component alloy systems formed from them. The dataset used in the paper is publicly available on Zenodo; it includes initial and final structures, formation energies, and atomic magnetic moments and charges, among other attributes.

Note: the trajectory data (energies and forces for structures sampled during the DFT relaxations) are not published with this paper; they will be released with a later work on machine learning force fields for HEMs.

Table: Numbers of alloy systems and structures.

| No. of components | 2 | 3 | 4 | 5 | 6 | 7 | Total |
|---|---|---|---|---|---|---|---|
| Alloy systems | 28 | 56 | 70 | 56 | 28 | 8 | 246 |
| Ordered (2-8 atoms) | 4975 | 22098 | 29494 | 6157 | 3132 | 3719 | 69575 |
| SQS (27, 64, or 128 atoms) | 715 | 3302 | 3542 | 4718 | 1183 | 762 | 14222 |
| Ordered + SQS | 5690 | 25400 | 33036 | 10875 | 4315 | 4481 | 83797 |
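As a quick sanity check on the first row of the table, the number of k-component alloy systems that can be formed from eight elements is the binomial coefficient C(8, k):

```python
from math import comb

# Number of k-component alloy systems that can be formed from 8 elements.
n_elements = 8
systems = {k: comb(n_elements, k) for k in range(2, 8)}

print(systems)                 # {2: 28, 3: 56, 4: 70, 5: 56, 6: 28, 7: 8}
print(sum(systems.values()))   # 246, matching the "Total" column
```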

Number of structures as a function of a given constituent element; the legend indicates the number of components.

Generalization performance of machine learning models

The data on Zenodo provide the Matminer features of the initial and final structures, along with a demo script for training tree-based models. The results in the paper can be reproduced by adapting the demo script to different train-test splits. The codes folder contains the scripts we used in the paper.
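To illustrate the kind of tree-based baseline the demo script trains (the actual script and feature files live on Zenodo; the feature matrix and target below are synthetic stand-ins, not the real Matminer features), a minimal scikit-learn sketch:

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import mean_absolute_error
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)

# Stand-in for a Matminer feature matrix: 500 structures x 20 features.
X = rng.normal(size=(500, 20))
# Stand-in target mimicking formation energy per atom (eV/atom):
# a noisy linear function of the first five features.
y = X[:, :5].sum(axis=1) * 0.1 + rng.normal(scale=0.01, size=500)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0
)

model = RandomForestRegressor(n_estimators=200, random_state=0)
model.fit(X_train, y_train)

mae = mean_absolute_error(y_test, model.predict(X_test))
print(f"test MAE: {mae:.3f} eV/atom")
```

Swapping the random 80/20 split above for a split by structure size, number of components, or deviation from equimolar composition gives the out-of-distribution evaluations studied in the paper.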

Generalization performance from small to large structures.

(a) Normalized error obtained by training on structures with ≤ N atoms and evaluating on structures with > N atoms. (b) ALIGNN prediction on SQSs with > 27 atoms, obtained by training on structures with ≤ 4 atoms. (c) Parity plot of the ALIGNN prediction on SQSs with > 27 atoms, obtained by training on structures with ≤ 8 atoms.

Generalization performance from low-order to high-order systems.

(a) Normalized error obtained by training on structures with ≤ N elements and evaluating on structures with > N elements. (b) Parity plot of the ALIGNN prediction on structures with ≥ 3 elements, obtained by training on binary structures. (c) Parity plot of the ALIGNN prediction on structures with ≥ 4 elements, obtained by training on binary and ternary structures.

Generalization performance from (near-)equimolar to non-equimolar structures.

(a) Normalized error obtained by training on structures with maxΔc below a given threshold and evaluating on the rest. (b) Predictions on non-equimolar structures (maxΔc > 0) by the ALIGNN model trained on equimolar structures (maxΔc = 0). (c) Predictions on structures with a relatively strong deviation from equimolar composition (maxΔc > 0.2) by the ALIGNN model trained on structures with a relatively weak deviation (maxΔc ≤ 0.2). maxΔc is defined as the maximum concentration difference between any two elements in a structure.
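The maxΔc quantity defined above can be computed directly from a structure's composition. A small sketch (the element symbols and atom counts are illustrative, not taken from the dataset):

```python
def max_delta_c(counts):
    """Maximum concentration difference between any two elements.

    counts: dict mapping element symbol -> number of atoms in the structure.
    """
    total = sum(counts.values())
    concentrations = [n / total for n in counts.values()]
    return max(concentrations) - min(concentrations)

# Equimolar quaternary: every element at concentration 1/4, so maxΔc = 0.
print(max_delta_c({"Fe": 16, "Co": 16, "Ni": 16, "Cr": 16}))  # 0.0

# Non-equimolar: 32/64 - 8/64 = 0.375
print(max_delta_c({"Fe": 32, "Co": 16, "Ni": 8, "Cr": 8}))  # 0.375
```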

Effects of dataset size and use of unrelaxed vs. relaxed structures


Overview of model performance on different generalization tasks

