Transition to Open Science, with Dr. Jiajia Sun
Open science is fundamentally changing how scientists and researchers approach scholarly communication and collaboration, from publishing preprints and interactive research results in new formats, to sharing methods, code, or data, and having open research meetings and seminars. At Curvenote we strive to support scientists as they conduct their research, compose and develop their findings, and share ideas with collaborators and the world. There can be many benefits to working more openly; however, this can be both daunting and complex for researchers who are adopting these practices.
To better understand how Curvenote can help in the transition towards open science, we regularly talk with researchers who have moved their own work into the open.
For this blog, we talked with Dr. Jiajia Sun, current Assistant Professor of Geophysics at the University of Houston, about his experiences with open science, open-source programming, and open-educational resources. Throughout our discussion we learned more about Dr. Sun’s transition into open-science practices, as well as the questions that arise about the benefits, challenges, and tools necessary to practice open science.
Dr. Sun’s introduction to open science started with open-educational materials. In his first semester at the University of Houston, he was asked to create a machine learning course. He knew about machine learning theory, but hadn’t personally implemented all the algorithms. He discovered the book, Hands-On Machine Learning with Scikit-Learn and TensorFlow by Aurélien Géron, which included openly available code examples. Modifying these examples, Dr. Sun created a collection of Jupyter notebooks that served as lab exercises for his students. When asked how he has benefited from others sharing their work, he explained:
“I feel like I have benefited so much from the open-source community that I feel morally obligated to give back. I would feel guilty if I don’t. They helped me and I should contribute back.”
True to his word, all of Dr. Sun’s course materials are openly available on GitHub. Dr. Sun’s experiences in educational resources, where he had transparent access to methods, data, and previous work to easily build upon, have influenced his passion for open science in many other parts of his team’s research process, specifically around how he uses tools like Jupyter, GitHub, and Curvenote.
Both Jupyter and GitHub are popular within open-source coding and research communities. Jupyter, built within an open-source community, provides a set of tools for scientific computing that have become one of the go-to resources for teaching and learning scientific programming. GitHub’s platform provides collaborative features around version control in many open-source communities. Curvenote is designed to integrate directly with Jupyter, to maintain an active link between research code and the resulting outputs, even when they appear outside of a Jupyter environment. The collaboration and version control features within Curvenote are built specifically for Jupyter notebooks, overcoming many issues of versioning notebooks in Git. Curvenote presents an approachable alternative to sharing and collaborating on scientific work for those not familiar with code development, or who may find GitHub intimidating.
Dr. Sun has continued to develop courses with open-source materials, including an undergraduate geophysics course on electromagnetics using EM GeoSci, which was originally created for the SEG 2017 Distinguished Instructor Short Course on “Geophysical Electromagnetics: Fundamentals and Applications.” For the lab portion of the course, Dr. Sun “used a lot of the [Jupyter] Notebooks that they developed and modified [them] for undergraduates.” He added gratefully that “the core part, background, the hard part, was taken care of by the SimPEG community.” Dr. Sun’s EM course materials are also openly available on GitHub.
#Fostering a Community
As we continued to discuss education, Dr. Sun mentioned that the field of geoscience is experiencing a steep global decline in student enrollment at the university level (Geoscience on the chopping block, 2021). Many fields of geoscience have also been slow to adopt open-science practices, perhaps in part due to the field’s proprietary past in both the oil & gas and mining industries.
Dr. Sun remarked, “On the other side, if you look at the machine learning community, [it] has been exploding! One thing that they have is a really good community that are willing to share. If you look at the papers published in top machine learning conferences almost all of them share their code on GitHub so you can immediately follow up, reproduce, and improve upon their work.”
Open science is an opportunity to get engagement and detailed constructive feedback. By sharing our own contributions we provide resources for fellow scientists, and by allowing others to provide feedback we can help to foster a community. These conversations can lead to collaboration on a global scale, working to move science and discovery forward. Given these ideas, Dr. Sun continued with a call for increased and continued openness in the geosciences with a “mindset of collaboration and sharing.”
This goal of global collaborative science can seem ambitious and daunting to a single researcher, but there are several achievable benefits that open-science practices can provide. Open source helps to accelerate knowledge and discoveries because researchers are no longer required to reinvent or reimplement standard resources. Dr. Sun gave an example of how two of his graduate students are using SimPEG, an open-source Python package for geophysics, to develop their research code. “SimPEG has tremendously accelerated [their] research since they just use it to implement [their ideas],” he said. “Without SimPEG they would be years behind what they have achieved today.”
Using such open-source packages gives researchers access to more than previously written code. Open-source code is built within a group of researchers who communicate, collaborate, and contribute. Dr. Sun and his students are not only using SimPEG, but are contributing members of its open-source community. The SimPEG group hosts weekly community meetups to discuss code development and implementation, as well as a monthly seminar series where community members present their ongoing research.
SimPEG community members use Curvenote to take meeting notes, collaborate on snippets of code in Jupyter, and share their notes and seminar videos. Their material is available on their public Curvenote profile. Dr. Sun was glad that he and his students could share their ongoing research projects within the seminar series and hopes that other students “will be inspired by how we do our research and use open source.”
Communicating early and often throughout the research cycle can provide a variety of benefits, especially to early career researchers. Creating code within an open-source package forces developers to write quality code that others will be able to read, use, and edit in the future. Early career researchers can gain valuable experience, participate in meaningful reviews, and be exposed to new productivity tools and practices within these groups.
By sharing ongoing research projects beyond a small group of co-authors, researchers receive corrective and instructive feedback that can influence a project’s trajectory and results. Often the feedback provided after journal submission or publication is overlooked, as researchers feel that the project is complete and have moved on to other subjects.
The content privacy settings on Curvenote provide opportunities for researchers to share their ideas with the public or with only a select group of collaborators. Dr. Sun gave an example, “My student might want to share with me or the whole SimPEG team. I think that would be a really good thing. First of all it protects the students. You aren’t sharing with the whole world but my student is stepping out of my small group and engaging with more people.”
Dr. Sun and his graduate students have also made an effort to share their data and code along with their publications. “What we do when we first submit our manuscript is to also publish code on an open-source repository like Zenodo, so when the reviewers and editors get our manuscript they will immediately have access to our code and they can reproduce our figures.”
Dr. Sun also mentioned the recent addition of a preprint service to their open-science workflow, “Just a few days ago we submitted a paper to JGR (Journal of Geophysical Research) and they offer a service to transfer your papers to ESSOAr (Earth and Space Science Open Archive).”
Sharing this additional information along with their publications is a “good way to maximize the impact of your work,” Dr. Sun said. This was evident when he and his student published a paper in the Journal of Geophysical Research: Solid Earth (Nurindrawati & Sun, 2020) in September of 2020 and shared their data and code using Zenodo. “In just 4 weeks there were 400 downloads!” he continued. “If you talk about real impact, this is real impact!”
As scientists, we often think of the impact of our research in number of citations, but how often do we actually implement, reproduce, or even fully read a scholarly article before we cite it? Sharing the building blocks of our research allows others to better understand, contribute to, and expand the science. We discussed how Curvenote links its writing platform and the Jupyter coding environment and its potential to improve how we conduct and share our open science.
Dr. Sun commented, “Curvenote can do so many things: version control, [Jupyter] Notebook, text, documentation, image creation. It has almost all the features that researchers would need to develop, document, communicate and collaborate on their projects and all these features are integrated into one unified framework.”
#Challenges or Limitations?
Even with a long list of benefits, there was still the question of challenges or limitations of open science. Is it possible to still contribute or make use of open-source materials when working within exclusive research contracts?
“There are certain situations where you can’t do open source because of an agreement or contract,” Dr. Sun said. He added that he was fortunate not to be limited by these constraints at present, and that even with industry-sponsored projects there are ways to contribute back to open projects. “All the work is done in an open-source framework; it’s just that we can’t publish our results.”
A commonly debated disadvantage to open science is the idea of being “scooped.” When asked about these concerns, Dr. Sun said, “different people have different interests, focuses, hypotheses, tests and questions they want to answer” and a research project is much more than a single idea or method. He explained, “you can scoop my method but are you going to focus on the same study area, and try to answer the same geological questions that I am interested in? Probably not.”
Open science changes the cadence at which researchers openly collaborate. New forms of sharing such as Curvenote, GitHub, and Zenodo, and publishing such as preprints and digital archives, allow registration and comprehension of methods or ideas much earlier in a research cycle. All of science is based on the evolution and adaptation of ideas; sharing earlier and more transparently can increase impact.
After our discussion it’s easy to understand how Dr. Sun can be so passionate and enthusiastic about open-science, open-source, and open-education practices and communities. Sharing information early and often throughout the research process accelerates the discoveries and conclusions we are able to make collectively. By getting feedback sooner, we can implement these ideas, improve our skills as researchers, and expand the impact of our work.
When early career researchers participate in open science they learn valuable skills and practices that can only serve to benefit them throughout their career. With regard to promoting open science among his students, Dr. Sun commented, “these skills are transferable and especially helpful for those students who end up working in a non-geoscience-related industry. Quite a few of my students are now data scientists instead of geoscientists. The skills mentioned above are currently not in the curriculum [but] will serve them well even if they want to switch their career.”
A variety of tools exist to conduct open science. Zenodo makes it possible to link a publication to shared code and data. GitHub serves as a repository for open-source coding packages and freely available datasets. When publishing papers, open-access journals and preprint services, such as arXiv, allow readers earlier access to ideas.
As the open-science movement continues to grow, Curvenote is being designed to integrate and expand on these tools to serve researchers throughout their workflows: promoting reproducibility and open science. Today, researchers can version control Jupyter Notebooks and integrate directly with collaborative tools for technical writing. Results and findings remain linked and reproducible, all of which can remain private, be shared publicly on Curvenote, or exported to preprint services. By supporting reproducibility and allowing granular controls on privacy throughout the research process, we aim to support scientists and researchers as they accelerate and share their discoveries — no matter where they are on their open-science journeys.