Learning to code for Bioinformatics


banner built in canvas

  Hello all! In the last group meeting here in my lab, a colleague mentioned that she was having problems parsing the outputs of some bioinformatics tools. She had two outputs: a FASTA file (a plain text file that contains nucleotide sequences, each with an informative header) and a table (a TSV file) listing the sequence IDs from the FASTA file together with the genome IDs of the organisms. She was curating the table by hand: finding the genome IDs she cared about, noting the corresponding sequence IDs, and then searching the FASTA file for each of those sequence IDs. But we are talking about a FASTA file with thousands of sequences, and the table also had thousands of rows.


A hypothetical FASTA file (with short nucleotide sequences)


A hypothetical table with similar data

  She is a biologist who used to do PCR experiments and other wet lab work, so I volunteered to help make her life easier. It doesn't take a lot of code to solve this: we made a list of the genome IDs of interest, opened the two files, linked the FASTA records to the table rows through the sequence IDs, and filtered by the genome IDs of interest. In the end we produced a FASTA file whose headers concatenate the sequence ID and the genome ID. Instead of hundreds or thousands of sequences, her final FASTA file had only a few dozen. Programming helps a lot when we deal with large amounts of data.
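  To give an idea of how little code this takes, here is a minimal Python sketch of that kind of filtering. It is not the exact script we wrote. The column names (sequence_id, genome_id), the "|" separator in the new header, and all the IDs and sequences are made up for illustration, and it assumes the sequence ID is the first word of each FASTA header.

```python
import csv
import io


def read_fasta(handle):
    """Yield (header, sequence) pairs from a FASTA-formatted text handle."""
    header, chunks = None, []
    for line in handle:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # A new header closes the previous record, if there was one.
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        else:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def filter_fasta(fasta_handle, table_handle, genome_ids_of_interest):
    """Keep only sequences whose genome ID is in the list of interest."""
    # Map sequence ID -> genome ID, but only for the genomes we want.
    # Column names are hypothetical; adjust them to the real table.
    reader = csv.DictReader(table_handle, delimiter="\t")
    seq_to_genome = {
        row["sequence_id"]: row["genome_id"]
        for row in reader
        if row["genome_id"] in genome_ids_of_interest
    }
    records = []
    for header, sequence in read_fasta(fasta_handle):
        seq_id = header.split()[0]  # assumes the ID is the first word
        if seq_id in seq_to_genome:
            # New header: sequence ID and genome ID joined together.
            records.append((f"{seq_id}|{seq_to_genome[seq_id]}", sequence))
    return records


# Hypothetical data standing in for the two real files.
fasta_text = """>seq1 hypothetical protein
ATGCATGC
ATGC
>seq2
GGGTTTAA
>seq3
CCCCAAAA
"""
table_text = (
    "sequence_id\tgenome_id\n"
    "seq1\tNC_000001.1\n"
    "seq2\tNC_000002.1\n"
    "seq3\tNC_000003.1\n"
)
wanted = {"NC_000001.1", "NC_000003.1"}

filtered = filter_fasta(io.StringIO(fasta_text), io.StringIO(table_text), wanted)
for header, sequence in filtered:
    print(f">{header}\n{sequence}")
```

  With real files you would pass open("sequences.fasta") and open("table.tsv") (hypothetical file names) instead of the io.StringIO objects, and write the result to a new file instead of printing it.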

How to start

  My advice here is not to follow the exact path that I took. Everyone has a different path to learning programming, and for a biologist learning to code is very challenging! Why? Because biologists usually learn that nature is rarely black and white and that every rule has exceptions. Computers, on the other hand, are logical: they follow exact steps. For example, while we were coding, my co-worker didn't understand why, in a condition (an if statement), NC_001111.11 was different from NC_001111.10. For her they are the same organism, and the only difference is the genome version. But a computer doesn't think that way, right? It compares the two texts character by character, and seeing them as the same organism is the human interpretation of the data. If you laughed while reading this example, don't! This mistake is pretty normal. Sometimes it takes time to understand that a computer isn't a human and doesn't think like one.
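  The NC_001111 example can be sketched in a few lines of Python. The helper strip_version is a hypothetical name of my own, and it assumes the version is whatever comes after the first dot in the ID.

```python
accession_in_table = "NC_001111.11"
accession_of_interest = "NC_001111.10"

# For the computer these are simply two different strings.
exact_match = accession_in_table == accession_of_interest
print(exact_match)


def strip_version(accession):
    """Return the ID without its '.version' suffix (assumes one dot)."""
    return accession.partition(".")[0]


# To treat both versions as the same genome, we have to say so explicitly.
same_genome = strip_version(accession_in_table) == strip_version(accession_of_interest)
print(same_genome)
```

  The first comparison is false and the second is true. The human interpretation ("same organism, different version") only exists in the program once we write it down as a rule.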

R

  I didn't start coding with R, but I think it is a good way to start building the logic for coding. Why? You can follow the progress of your script line by line. If you type "1+1" and press Enter, the answer shows up in the console. If you type "sum = 1+1", press Enter, and then type "sum" and press Enter again, you get the same result. So it is easy to check the value of a variable, with no need to call the "print" function. I have been using R a lot. It is very good for statistics and plotting, and it comes with tons of functions to help you out. Of course, you can still write loops and conditions, like in any programming language. In addition, there is a cool IDE called RStudio, also available in the cloud for free (with some memory limitations, of course), where you can see your script, console, and outputs at the same time as you generate them.


Screenshot of RStudio

Python

  This is probably the most popular first language for biologists and people in similar careers, probably because there are lots of libraries available for the field. For example, Biopython can read FASTA files and separate the headers from the sequences. I still think that Python has fewer ready-to-go functions than R. There are good IDEs for Python: the one I use is called Spyder, and some people use PyCharm. Python also has some interesting plotting libraries, like matplotlib and Plotly.


screenshot from a random script that I have

Perl

  I learned to code using Perl. Developers often hate this language because it is so flexible: code works even without any indentation, and if you don't put 'use strict' at the beginning, you don't even need to declare a variable before using it! Perl code can be very confusing depending on who wrote it. Still, it is an option for learning. It doesn't seem to have as many libraries as Python, so programs probably take more lines in Perl than in Python, but I love Perl for practicing regex syntax. I don't know of any good IDE for Perl. I used to code in Emacs on a Linux system, where at least I could indent my code.

Ruby

  I don't have much experience with Ruby, but it is similar to Perl: a very flexible language that is good for parsing large amounts of data. People usually write it in nano, Emacs, or similar text editors.

Conclusion

  Whichever way you choose to learn to code, it will build experience that helps when you need to learn another language. Think of programming languages like spoken languages: usually, once you have learned one or two, the third or fourth is easier. Here are some resources for learning:

Tutorial for learning R in biology sciences
Biopython tutorial
Python for beginners
Perl for beginners

  I hope this article helps you on your path.


