E-Materials Data
The Time Has Come for a New Approach
Increasing the availability of materials data, especially data from the use of ASTM International test methods.
Given that today the Internet allows you to find virtually any kind of information on virtually any subject, why is it still difficult to find test data for so many common materials? These data are generated almost exclusively in electronic formats, but, unlike in most areas of science, few large collections of test data are available for metals, ceramics, polymers, construction materials, composites and nanomaterials. The collections that do exist are incomplete, rarely up to date and contain data of unknown quality. Further, there are no data repositories for newly generated data and very few standards for sharing such data if desired.
Examples of successful data repositories in other areas of science include:
- BioGRID, an online interaction repository of more than 700,000 protein and genetic interactions of major biological species;
- The European EarthServer, a coordinated collection of more than 11 terabytes of data on the cryospheric, airborne, atmospheric, geologic, oceanographic and planetary sciences; and
- The Tree of Life, an international collaborative effort of biologists and nature enthusiasts with more than 10,000 web pages providing information about biodiversity, the characteristics of different groups of organisms and their phylogeny.
In today's world of big data, the lack of materials data is an anomaly, and ASTM International, as a leading provider of standard test methods and practices, is studying what it can do to help improve the accessibility of materials data, especially those generated by its test methods. We want to bring you up to date on what is going on as well as to encourage you and your committees to think about how to get involved.
First Steps
In October 2012, ASTM International participated in an exploratory intersociety meeting sponsored by the Materials Genome Initiative, which brought together more than 20 materials societies to explore potential actions to work together on increasing the availability of materials data. As a result of that meeting, ASTM began an internal effort to survey what is happening with its committees and what cross-committee activities are needed to maintain ASTM leadership in materials testing in an electronic world.
During the November 2013 ASTM International committee week in Jacksonville, Fla., ASTM held a workshop on digitizing materials test data. The meeting was attended by representatives from 14 technical committees. I provided an overview of today's materials data activities. Brian Hall gave a presentation on ASTM E2215, Practice for Evaluation of Surveillance Capsules from Light-Water Moderated Nuclear Power Reactor Vessels, which contains a section on electronic data formatting. As a result of the vigorous discussion at the workshop, ASTM is interested in exploring actions that will improve the accessibility of materials data.
The following possible actions were discussed:
- Developing recommendations for recording and collecting results from ASTM test methods;
- Providing guidelines and practices to share materials test data when desired;
- Preserving data validating ASTM test methods, guides and practices;
- Working with groups establishing repositories of materials test results; and
- Improving communications within the materials informatics community.
Developing Recommendations
The first step in improving the computerization of materials test results, especially those generated using ASTM International test methods, is to have recording formats included in each ASTM test method procedure. Test method committees are in the best position to determine what information is important to record and report, whether in an informative annex or as a normative section.
Guidelines for the type of data and metadata that should be included were developed previously by the now-disbanded ASTM Committee E49 on Computerized Systems and Chemical and Material Information. ASTM is considering the establishment of a cross-committee coordinating group to update those guidelines to reflect current practice. Examples of such guidelines are given in E2215, mentioned above.
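To make the idea of a recording format concrete, the following is a minimal sketch of one test result stored together with its metadata and written out as JSON. It is an illustration only: the class name, the field names and the sample values are all hypothetical and are not taken from E2215 or from the former Committee E49 guidelines. A test method committee would decide which fields actually matter.

```python
import json
from dataclasses import dataclass, field, asdict


@dataclass
class MaterialTestRecord:
    """One measured value plus the metadata needed to interpret it.

    All field names are hypothetical, chosen only for illustration.
    """
    test_method: str      # designation of the test method used
    material: str         # description of the material tested
    property_name: str    # what was measured
    value: float          # the measured result
    unit: str             # unit of the measured result
    test_date: str        # date of the test, ISO 8601 text
    laboratory: str       # organization that generated the data
    conditions: dict = field(default_factory=dict)  # e.g. temperature


def to_json(record: MaterialTestRecord) -> str:
    """Serialize a record to JSON text with a stable key order."""
    return json.dumps(asdict(record), sort_keys=True, indent=2)


def from_json(text: str) -> MaterialTestRecord:
    """Rebuild a record from JSON text produced by to_json."""
    return MaterialTestRecord(**json.loads(text))


# Made-up sample values, not real test data.
record = MaterialTestRecord(
    test_method="EXAMPLE-METHOD-1",   # placeholder, not a real designation
    material="example alloy, plate",
    property_name="tensile strength",
    value=250.0,
    unit="MPa",
    test_date="2013-11-01",
    laboratory="Example Test House",
    conditions={"temperature_C": 23},
)

text = to_json(record)
restored = from_json(text)
```

The point of the sketch is that the value never travels alone: the method, unit, date and test conditions are recorded in the same structure, so a later reader can judge what the number means.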
Providing Guidelines
ASTM International and its committees recognize that sharing materials test result data has to be the decision of the groups that generated and own such data. For example, product manufacturers often purchase material that has been tested by independent test houses. Designers want to automatically send test results to analysis codes. Production planners want to send material requirements to purchasing agents. These and other groups have a need to share data, and there is great utility in establishing guidelines and practices for data sharing in general.
The largest effort to develop practices for sharing materials data is that of the International Organization for Standardization Technical Committee 184 on Automation Systems and Integration/Subcommittee 4 on Product Data, also known as ISO STEP (Standard for the Exchange of Product Model Data, ISO 10303, Industrial Automation Systems and Integration - Product Data Representation and Exchange). Though the capability of sharing materials data with other computerized engineering activities was included in ISO 10303, this technology has never been implemented. The other prominent effort has been MatML (Materials Markup Language), which is based on XML. MatML has not been fully supported over the last 10 years and appears to be used minimally. Over the past three years, the European Committee for Standardization has sponsored two projects designed to revive materials data sharing efforts, but neither effort appears ready to be adopted by a broad user community.
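As a rough illustration of what an XML-based exchange involves, the sketch below writes a single test result to XML and reads it back using Python's standard library. The element names are invented for this example and do not follow the MatML schema or ISO 10303. The sample values are made up.

```python
import xml.etree.ElementTree as ET


def record_to_xml(record: dict) -> str:
    """Write a flat record as XML, one child element per field.

    Keys must be valid XML element names. The element names used here
    are hypothetical and are not MatML or ISO 10303 names.
    """
    root = ET.Element("TestResult")
    for key, value in record.items():
        child = ET.SubElement(root, key)
        child.text = str(value)
    return ET.tostring(root, encoding="unicode")


def xml_to_record(text: str) -> dict:
    """Read the XML back into a dictionary of text values."""
    root = ET.fromstring(text)
    return {child.tag: child.text for child in root}


# Made-up sample values, not real test data.
sent = {
    "TestMethod": "EXAMPLE-METHOD-1",  # placeholder designation
    "Property": "tensile strength",
    "Value": "250.0",
    "Unit": "MPa",
}

message = record_to_xml(sent)
received = xml_to_record(message)
```

Every value comes back as text, so the sender and receiver still have to agree on which fields are numbers and what the units mean. Settling that kind of agreement is what an exchange standard is for.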
Opportunities exist for ASTM to continue working within the Materials Genome Initiative framework to identify the need for next-generation materials data exchange technology and to determine how materials test data should be part of whatever new technology emerges.
Preserving Data
Each of ASTM International's test methods is based on analysis of extensive test results generated by experts involved in the development of the standard test method. These data can be an invaluable collection of quality data that not only document the standard test method but also provide insight for materials scientists and engineers designing new materials or improving existing ones.
Today most of these data get archived in paper or electronic records that are not easily available. Modern database routines and data repositories make preserving these data much easier (see below). ASTM technical committees should be thinking about how to preserve existing data collections and how best to capture similar data in the future.
Working with Other Groups
Professional societies, such as ASM International and the American Ceramic Society, and national programs such as the Materials Genome Initiative, are exploring how to create modern materials data repositories. Many barriers exist, including proprietary issues, the commercial value of materials data (e.g., the materials data handbook and online industries), the effort to document and deposit materials data, and the cost of maintaining a materials repository over the long term (decades and more).
Regardless, these groups are moving ahead, and ASTM International and its committees, which are responsible for many of the standard test methods that are used to generate the data, must play a role in developing the guidelines for recording and depositing such data. ASTM committees have the expertise to understand what is important and should be actively involved in the process of setting up repositories of data resulting from their standard test methods.
Improving Communications
There is renewed interest in the materials community in collecting and making available materials data of all types, from data typically found on routine materials test certificates to fundamental properties needed to design novel nanomaterials. Some of these efforts are top-down, such as the Materials Genome Initiative started in the United States and now being emulated in Europe and Asia. Other projects are bottom-up and reflect the interest of individual researchers in taking advantage of new data science tools as applied to materials. Regardless of the scale, the sharing of experiences in building, disseminating and exploiting materials data brings together the materials data community and allows knowledge about materials data to spread.
ASTM is exploring reconstituting its successful Computerization and Networking of Materials Databases Symposium Series, which provided an international forum for sharing advances in materials data and in the past resulted in ASTM Special Technical Publications 1017, 1106, 1140, 1257 and 1311.
Materials Data, You and Your Technical Committee
Each of the possible actions described above must involve individual experts, relevant committees and ASTM International as an organization. ASTM is very interested in your thoughts about priorities as well as other possible activities. Please contact
John Rumble is president of R&R Data Services, a consulting service in Gaithersburg, Md. He specializes in planning and implementing a wide variety of data activities. Rumble was chair of the now-disbanded ASTM Committee E49 on Computerized Systems and Chemical and Material Information. He has been involved in many materials data activities over the last three decades.