This is the first year the MSE track participates in an OAEI campaign. We are proud to be part of it and would like to thank everyone who has supported us along the way!
We performed the evaluation using the MELT platform on a Windows 10 system with 16 GB RAM. No background knowledge was used in this evaluation. Every participating system was executed in its standard setting, and we compare precision, recall and F-measure as well as computation time.
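For illustration, a typical evaluation run on the MELT platform looks like the following minimal sketch. This is not the exact code used for this campaign: the matcher below is a placeholder, the Anatomy track handle merely stands in for the MSE test cases, and package paths may differ between MELT versions.

```java
// Minimal sketch of a MELT evaluation run (illustrative only; the matcher is
// a placeholder and the track handle stands in for the MSE test cases).
import de.uni_mannheim.informatik.dws.melt.matching_data.TrackRepository;
import de.uni_mannheim.informatik.dws.melt.matching_eval.ExecutionResultSet;
import de.uni_mannheim.informatik.dws.melt.matching_eval.Executor;
import de.uni_mannheim.informatik.dws.melt.matching_eval.evaluator.EvaluatorCSV;
import de.uni_mannheim.informatik.dws.melt.matching_jena.MatcherYAAAJena;
import de.uni_mannheim.informatik.dws.melt.yet_another_alignment_api.Alignment;
import org.apache.jena.ontology.OntModel;

import java.io.File;
import java.util.Properties;

public class MseEvaluation {

    // Placeholder matcher: a real system would add correspondences here.
    static class PlaceholderMatcher extends MatcherYAAAJena {
        @Override
        public Alignment match(OntModel source, OntModel target,
                               Alignment inputAlignment, Properties properties) {
            return new Alignment();
        }
    }

    public static void main(String[] args) {
        // Stand-in track handle: substitute the MSE track's test cases here;
        // Anatomy is used only because its repository identifier is well known.
        ExecutionResultSet results =
                Executor.run(TrackRepository.Anatomy.Default, new PlaceholderMatcher());
        // Writes precision, recall, F-measure and runtimes as CSV files.
        new EvaluatorCSV(results).writeToDirectory(new File("./results"));
    }
}
```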
This year, five systems registered for this track, each of which was evaluated on the three test cases of the MSE benchmark. AMD produced errors and an empty alignment file, so results are only available for the following four matchers:
- A-LIOn
- LogMap
- LogMapLite
- Matcha
We have collected all generated alignments and make them available as a zip file via the following link. These alignments are the raw results on which the following report is based.
The following results were obtained for the participating matching systems in the MSE track of the OAEI 2022 campaign.
Results for the first test case (MatOnto vs. reduced MaterialInformation):

System | Correspondences | Precision | Recall | F1-Measure | Time [s] |
---|---|---|---|---|---|
A-LIOn | 23 | 0.130 | 0.130 | 0.130 | 38 |
LogMap | 1 | 1.000 | 0.043 | 0.083 | 9 |
LogMapLite | 5 | 0.400 | 0.087 | 0.143 | 27 |
Matcha | 4 | 0.000 | 0.000 | 0.000 | 22 |
The first test case evaluates the matching systems' capability to find "equal" (=), "superclass" (>) and "subclass" (<) correspondences between the mid-sized MatOnto ontology and the small-sized (reduced) MaterialInformation ontology. None of the evaluated systems finds correspondences other than "equal" (=). All evaluated systems compute the alignment in less than a minute. LogMap stands out for its very fast computation time of 9s and its maximum precision of 1.0. However, since LogMap finds only one correspondence, its recall and hence its F1-measure are low (0.083). In direct comparison, LogMapLite takes three times as long and achieves a much lower precision (0.4), but due to a greater number of correctly found correspondences its F1-measure is the best of the tested systems in the first test case, although still low at 0.143. A-LIOn finds the highest number of correspondences, but 20 of its 23 correspondences are false positives, which results in the second-best F1-measure at the slowest pace.
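For reference, the measures reported in the tables are derived from simple counts: precision is the share of found correspondences that are correct, recall is the share of reference correspondences that are found, and the F1-measure is their harmonic mean. The following minimal sketch (illustrative only) reproduces the LogMapLite values of the first test case from these counts:

```java
// Illustrative computation of the measures reported in the tables.
// Counts: LogMapLite in the first test case found 5 correspondences,
// 2 of them correct; the reference alignment contains 23 correspondences.
public class MseMeasures {

    static double precision(int correct, int found)   { return (double) correct / found; }
    static double recall(int correct, int reference)  { return (double) correct / reference; }
    static double f1(double p, double r)              { return (p + r) == 0 ? 0 : 2 * p * r / (p + r); }

    public static void main(String[] args) {
        double p = precision(2, 5);   // 0.400
        double r = recall(2, 23);     // ~0.087
        // Prints P=0.400 R=0.087 F1=0.143, matching the table above.
        System.out.printf("P=%.3f R=%.3f F1=%.3f%n", p, r, f1(p, r));
    }
}
```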
Results for the second test case (MaterialInformation vs. MatOnto):

System | Correspondences | Precision | Recall | F1-Measure | Time [s] |
---|---|---|---|---|---|
A-LIOn | 163 | 0.387 | 0.209 | 0.271 | 208 |
LogMap | 67 | 0.881 | 0.195 | 0.320 | 3 |
LogMapLite | 67 | 0.851 | 0.189 | 0.309 | 83 |
Matcha | 6 | 0.000 | 0.000 | 0.000 | 15 |
The second test case evaluates the matching systems' capability to find correspondences between the large-sized MaterialInformation ontology and the mid-sized, BFO-based MatOnto. Compared to the first test case, two of the four evaluated systems (A-LIOn, Matcha) need much longer to compute the alignment, while, surprisingly, the other two (LogMap, LogMapLite) are even quicker. A-LIOn finds a large number of correspondences and hence has the highest recall of the evaluated systems, but 100 of its 163 correspondences are incorrect, which results in a moderate F1-measure of 0.271 at a rather slow computation time of over three minutes. LogMap again stands out for its very fast computation time of only 3s at a high precision of 0.881. Since LogMap finds only 59 of the 302 reference correspondences, its recall is rather low, but its F1-measure is still the highest of the tested systems. LogMapLite is almost 30 times slower than LogMap and finds the same number of correspondences but with 2 additional false positives, so it achieves a slightly lower F1-measure than LogMap.
Results for the third test case (MaterialInformation vs. EMMO):

System | Correspondences | Precision | Recall | F1-Measure | Time [s] |
---|---|---|---|---|---|
A-LIOn | 0 | 0.000 | 0.000 | 0.000 | 135 |
LogMap | 56 | 0.946 | 0.841 | 0.891 | 14 |
LogMapLite | 56 | 0.911 | 0.810 | 0.857 | 84 |
Matcha | 4 | 0.500 | 0.032 | 0.060 | 21 |
The third test case evaluates the matching systems' capability to find correspondences between the large-sized MaterialInformation ontology and the mid-sized EMMO. All evaluated systems compute the alignment in under three minutes. Surprisingly, A-LIOn takes the longest to compute the alignment but does not find any correspondence, which might be due to the reasoning errors it produced for EMMO. LogMap again stands out for its fast computation time and high precision, with 53 correct correspondences out of 56 found in total. Although LogMap misses 10 reference correspondences, its F1-measure of 0.891 is the best of the whole MSE track. LogMapLite is 6 times slower than LogMap, with a slightly lower precision due to 2 additional false positives. With 2 additional false negatives as well, LogMapLite achieves a slightly worse F1-measure of 0.857, still the second best of the whole MSE track. It is a pleasant surprise that Matcha finds 2 correct correspondences out of the 63 reference correspondences, earning its non-zero recall with a fair precision of 0.5 and a rather fast computation time of 21s.
Unfortunately, none of the evaluated matchers finds all reference correspondences correctly.
LogMap stands out for its very fast computation speed combined with very high precision. LogMapLite is significantly slower in every test case and almost consistently shows worse results; only in the first test case is the recall of LogMapLite higher than that of LogMap. In our opinion, LogMap is clearly recommended for MSE applications where high precision is demanded. In comparison, LogMapLite does not appear to bring any decisive advantage over LogMap.
A-LIOn produces moderate results but does not bring any advantage over LogMap. Furthermore, A-LIOn produces errors while reasoning on EMMO. EMMO is the only one of the used MSE ontologies with a significant proportion of essential axioms. According to the annotations in EMMO, this ontology can be fully reasoned over only with the FaCT++ reasoner. That might be a cause of the reasoning errors of A-LIOn and its bad results in the third test case.
Since the creation of this MSE benchmark, a large number of new MSE ontologies have been developed and utilized in various applications. In contrast to the early development stages of this benchmark, those ontologies are now easily accessible on the new MatPortal. In the future, the MSE benchmark should be updated with the currently most used top- and mid-level MSE ontologies, which include BWMD-mid, MSEO, PMDco and PROV-O. Apart from also considering frequently used domain and application ontologies such as BWMD-domain, LPBFO and others, multi-ontology matching, knowledge graph matching (e.g. using the AluTrace data) and the usage of background knowledge should be considered in future OAEI campaigns.
This track is organized by Engy Nasr, and the evaluation was performed by Martin Huschka. If anything is unclear, if you have comments on the MSE track, or if you would like to contribute, feel free to write an email to huschka [at] emi [.] fraunhofer [.] de.