LaToya Anderson

Scientific Software Engineering: What is it and How Do I Start?



If you are an undergraduate student who's interested in learning more about the intersection between science and software engineering and want to know how to start learning the software tools used in research, you've come to the right place. Let's cover a few things first...

What is scientific software engineering?


Scientific software engineering is a broad field that includes scientists who develop code to answer research questions, software engineers interested in developing scientific software packages, and enthusiasts interested in contributing to the open science and software community. Given this broad spectrum, I define this field as the application of software engineering principles to scientific research and to the development of computational tools used by scientists.

Great, so why is it needed?


The physics community, to varying degrees, has embraced computational tools to study areas as large as the universe down to the smallest particles. This work often produces vast amounts of data and requires computationally intensive statistical analysis, along with the computational resources to render complex visualizations (think of shows at your local planetarium). The programming languages and tools often used are Python 3.# (and Python 2 in legacy software), C++, Fortran (again for legacy software), high performance computing clusters, and software such as Quantum Espresso, Ab Initio, Vesta etc. Scientific software engineers can be invaluable because they understand how to seamlessly navigate between their scientific field and software engineering, which reduces research time, improves computational resource usage (i.e. faster code), and allows scientists to just focus on research. And scientists are also more likely to share their code on open source repositories (i.e. social media for programmers) like GitHub, GitLab, and Kaggle, leading to improved code development.

What often happens is that scientists, graduate students, and undergraduates are thrown into this intersection. The undergrad is assigned a coding project as part of their coursework without prior training. The graduate student has to use someone else's code base to perform their research or, even worse, doesn't realize that they may be able to tweak someone else's code and starts a coding project from scratch. And the scientist may hit an impasse as they perform their research yet find it difficult to utilize their university's IT resources due to a language barrier. Overall, it leads to coding habits that make their programs inefficient (i.e. computationally expensive) or completely unusable outside of their laptop or desktop, which affects reproducibility, one of the cornerstones of research.


Yikes! I'm an undergrad/grad student, and I didn't have more than a semester of programming. What can I do?

Glad you asked. First, know that I was and still am in your shoes. I'm an undergrad with a couple of semesters of programming under my belt and have also watched Ph.D. students struggle to create code for their thesis work. Talk about pressure. What I did was use my spare time to teach myself how to program using tutorials, create small projects based on what I learned, and just keep coding. This eventually led to my current role as Associate Research Analyst and helped me craft my career as a scientific software engineer. There were a fair number of late-night debugging sessions, too many coding tutorial struggles, and feelings of overwhelm along the way. While I wouldn't trade any of it because it confirmed my career path as a scientific software engineer, I compiled a list of steps I would've taken if I started my coding journey all over again:

  1. Think about how you learn. Are you primarily a tactile, visual, or auditory learner? For example, I'm a tactile learner. This means that the blessing (and curse) is that I need to type out and compile (i.e. have my computer "read" my coding instructions) entire programs in order to fully understand what's happening in my code. I'm also an auditory learner, so I listen to podcasts like The Real Python Podcast to pick up the language software engineers use since I'm developing my own career in software engineering. Here are some resources listed by learning type:

    1. Tactile:

      1. Use coding tutorials that let you do most of the typing. For example, sites such as Total Data Science have great tutorials for beginning programmers (and use the same analysis used by scientists). Don't copy and paste the code. Actually type their program "by hand" and run it to get a "feel" for what you're learning.

      2. Use no-download coding frameworks (i.e. tools) such as Google Colab. That way, you can just start coding without worrying about getting your software to work.

    2. Auditory:

      1. Use text-to-speech software as you read through coding tutorials and code blocks. This means that you can use tutorials that only require you to add small code snippets (i.e. short lines of code added to an existing program), such as the ones on DataCamp.

      2. Listen to software engineering podcasts to eavesdrop on tech speak. You'll begin to understand more and more what they are talking about as you continue to develop your skills. This is especially important if you decide to have a technical career once you graduate.

    3. Visual:

      1. Again, use coding tutorials that only require you to enter code snippets or use sites like W3Schools to run already prepared code snippets. That way, you can see what the code actually does before typing it out yourself.

  2. Learn Python 3, not Python 2. I'm emphasizing this because:

    1. Python 3 is not an improved version of Python 2 the way we think of Windows 11 as the better version of Windows 10. Python 3's syntax, or language structure, is different enough that if you try to add a Python 2 code snippet to a Python 3-based program, it will most likely crash. (By the way, crash in software engineering just means that your program couldn't run because something was wrong with your code. Your physical computer, however, won't suddenly stop working.) It's like inserting a Spanish phrase into a book written in Italian. Same language family, but different enough that it won't make sense to the reader.

    2. Why Python 3? Python 3 is one of the easiest programming languages to learn, it's one of the most popular languages used within (and outside) the scientific community, and it teaches you good programming habits. Just like in English where we use written cues or formatting like indentation for paragraphs and spaces after a period to indicate the next sentence, the same is true for computer languages. We call it readable code. While most programming languages don't technically require you to indent, Python 3 (and 2) does. Establishing good habits early on will make your programming journey much easier and will make your code look more professional.

  3. Sign up for GitHub. Software engineering is a collaborative process. GitHub, as I mentioned earlier, is like the social media site for software engineering. It's a cloud-based website where anyone interested in programming can upload their code for storage and where others can contribute code and submit bug issues (i.e. comments telling the programmer that there's a problem with the code). Unfortunately, there's a stereotype that software engineers are these solitary creatures that program in their mama's (or dad's) basement in the dark listening to trap music in the middle of the night. Not true (mostly lol). College programming courses often don't help because you're often completing assignments by yourself. GitHub is your first introduction to the collaborative process, is a secure way to store your code, and is a site to build your coding portfolio (because tech recruiters do look to see the work you've done).

  4. Learn out loud. This means don't silo yourself during your coding journey. This could be anything from asking questions on tutorial sites, asking computer science professors at your university, or finding Discord coding groups, to finding mentors who are invested in your development. Why? Because of what I said in point 3. Programming is a collaborative effort and no one knows everything. All the software products we use are the result of a collaborative effort of technologists at almost all stages of their career. Here are some ways to learn out loud:

    1. Start a blog/YouTube Channel/Discord stream: Sharing your learning process is a great way to figure out if you actually understood a concept versus just understanding the words. Just as we STEM students face the "I understood the concept but couldn't solve the problem" issue in class, the same is true for programming. Writing forces you to organize your thoughts, use software engineering language, and eventually develop a community around your interest. It's also a great addition to your portfolio for recruiters.

    2. Tweet: There's a huge software engineering community on Twitter, and with enough engagement you can build a network of software engineers and enthusiasts who would be happy to answer your questions and offer insights.

    3. Find a mentor: But where:

      1. Your university computer science club. Regularly attend meetings and try to participate. You'll make friends and acquaintances, and there will most likely be a computer science student who will take you under their wing.

      2. Your science lab. If you are a part of one, is there anyone who appears to be a strong coder? Talk to them and get them to talk about their coding journey. That's a great beginning into mentorship.

      3. Attend science talks given by scientists with a coding background. One of the silver linings of the COVID-19 pandemic is that many conferences were virtual and are available on YouTube. If you find an interesting talk, Google the scientist's name and you'll most likely find their professional email. Send them a business-style email that introduces who you are, explains why you're interested in their work, and asks if they would be willing to have a 15-minute conversation for you to learn more about their career path. That's a great time to listen and then share your interest in programming within science.

      4. And finally, Google. Search terms like, "programming mentorship", "python mentorship", or "software engineering mentor" to find mentorship-based nonprofits to apply to, Slack and Discord groups to join, LinkedIn profiles to view and contact (professionally!), and hackathons for all programming levels to participate in.
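
To make step 2 more concrete, here is a small Python 3 sketch of the two points made there: a Python 2 snippet will not run in a Python 3 program, and indentation is part of Python's syntax. The function names (runs_in_python3, mean) and the sample snippets are made up for illustration and do not come from any particular tutorial. The Python 2 code is stored as text so that the example itself still runs in Python 3.

```python
def runs_in_python3(source):
    """Return True if Python 3 can parse the given source text."""
    try:
        # compile() only checks the syntax here; it does not run the code.
        compile(source, "<snippet>", "exec")
        return True
    except SyntaxError:
        return False


# Python 2 allowed print without parentheses; Python 3 does not.
python2_snippet = 'print "Hello, science!"'
python3_snippet = 'print("Hello, science!")'

# The body of a loop must be indented. This snippet leaves it unindented.
unindented_snippet = "for x in [1, 2, 3]:\nprint(x)"

print(runs_in_python3(python2_snippet))     # False
print(runs_in_python3(python3_snippet))     # True
print(runs_in_python3(unindented_snippet))  # False


def mean(values):
    """Average a list of numbers; indentation marks what belongs to the loop."""
    total = 0
    for value in values:
        total += value  # indented: runs once per value
    # Not indented under the loop: runs once, after the loop finishes.
    # In Python 3, / is true division, so [1, 2] gives 1.5.
    # In Python 2, the same line with integers would give 1.
    return total / len(values)


print(mean([1, 2]))  # 1.5
```

If you are a tactile learner, try typing this out in a no-download tool such as Google Colab and changing the snippets to see which ones Python 3 accepts.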


Starting your programming journey can feel daunting, especially if you're a STEM student or science professional with an overwhelming workload. Taking the steps above can streamline the learning process and make it more enjoyable, and these skills will become even more vital as we continue to push the boundaries of scientific discovery. If nothing else, just start and you're already halfway there!

