Automation, advanced instrumentation and high-performance computing have revolutionized science through an exponential growth in data.
Materials scientists, chemists and engineers seeking to discover next-generation materials for energy are stymied by this abundance of information. “Big data” has an intoxicating allure, since the answers are in there — somewhere. But finding them requires new tools, techniques and approaches specifically designed for large datasets.
“In science today, people have big datasets collected from computers, instruments, microscopes,” said Jim Pfaendtner, associate professor of chemical engineering at the University of Washington. “It’s not the amount of data we can collect that’s the limiting factor. Now the limiting factor is data-handling.”
Removing the logjam requires an overhaul in how young scientists are educated. Pfaendtner is leading a new endeavor, funded by the National Science Foundation through its new flagship National Research Traineeship (NRT) program, to bring big data to graduate education in clean energy research at the UW. Known as DIRECT, short for Data Intensive Research Enabling Clean Technologies, this traineeship will phase in practical, data-driven research projects for graduate students in fields such as chemistry, renewable energy and chemical engineering.
“There’s been a recent ‘explosion’ of data in these fields, and we need new approaches to help our graduate students grow into data-intensive researchers in these subjects,” said Pfaendtner, who is also a member of the UW’s Clean Energy Institute and the Molecular Engineering & Sciences Institute.
Master’s and doctoral students in four UW departments (materials science and engineering, chemistry, chemical engineering, and human centered design and engineering) will participate in DIRECT, as will the Clean Energy Institute (CEI), the Molecular Engineering & Sciences Institute (MolES) and the eScience Institute. The program will match students with short, goal-driven projects in renewable energy or materials science early in their graduate education.
These short projects will not replace independent thesis or dissertation research. Instead, students will work temporarily on a big-data project already underway at the UW or a partner institution. Graduate students will learn as they go how to handle, organize and analyze large datasets, both furthering the project and boosting their analysis toolkits for their own master’s or doctoral research projects.
“These are not classroom exercises. These are not simulations. These projects will support ongoing research,” said Pfaendtner. “Graduate students completing our classroom training will be ready to do this, and learn as they go from senior scientists.”
For example, one graduate student could help develop machine-learning approaches to predict the properties of new materials that have not yet been produced. Her classmate might explore new methods to synthesize the next generation of light-harvesting solar cells. Pfaendtner envisions pairing students with projects that fit their interests, though he stresses that the skills they would acquire would be applicable across the physical and engineering sciences.
“Whether students move on to do experiments, simulations or modeling for their research, the big-data skills they learn here will be invaluable,” he said.
Partner institutions, which will field projects for DIRECT, are the Pacific Northwest National Laboratory, Boeing Research and Technology, Zhejiang University in China, the University of Campinas in Brazil and Bellevue College. The NSF is providing $3,000,000 for the five-year NRT project. The UW is also supporting the project, both to allow additional students to participate and to leverage the grant to improve the diversity of doctoral students entering the UW to do clean energy research.
Pfaendtner’s co-principal investigators on DIRECT are associate professor of human centered design and engineering Cecilia Aragon, chemistry professor David Ginger, chemistry professor Xiaosong Li and professor Christine Luscombe in the Department of Materials Science & Engineering. Aragon is also a member of the eScience Institute, while Pfaendtner, Ginger, Li and Luscombe are members of the CEI and the MolES Institute.