Bioinformatics essay: Overview and tools.

Defining Bioinformatics

Bioinformatics is an interdisciplinary field comprising genetics, molecular biology, mathematics, statistics and computer science. The field can be envisioned as a merger between biology, information technology (IT) and computer science (Can, 2014). Primarily, bioinformatics aims to facilitate the discovery and characterisation of new biological insights and to provide a global viewpoint from which unifying biological principles can be discerned. Bioinformatics has three fundamental sub-disciplines, which are pursued by bioinformaticians globally. The first is the creation of new statistics and algorithms that can be used to identify relations among individual members of large data sets (Chiang, 2009). The second is the evaluation and elucidation of different kinds of data, such as nucleotide and amino acid sequences, protein structures, and protein domains. The last sub-discipline is the creation and application of tools that allow for systematic assessment and organisation of various types of biological information. Evidently, there is no single, universally accepted definition of bioinformatics. Researchers, however, agree that bioinformatics is the creation and management of advanced information and computational technologies for solving biological problems (Wightman & Hark, 2012). As a result, this field entails the storage, retrieval, evaluation and interpretation of biological data.

There are four steps in deriving a bioinformatics solution. The initial step entails the collection of statistics from biological data. In the second stage, a computational model is built. Next, the computational modelling problem is solved. The fourth step entails testing and evaluation of a computational algorithm (Can, 2014).

Types of Biological Data

Various types of biological data are used in bioinformatics, including nucleic acid sequences, structures of biological molecules (mainly 3D structures), biochemical pathways, gene expression profiles, and phylogenetic data (Rigden, Fernández-Suárez, & Galperin, 2016). The two kinds of nucleic acids are deoxyribonucleic acid (DNA) and ribonucleic acid (RNA). DNA is the genetic material, often described as the building block of life. It is the material that is inherited and passed from one generation to the next.

Structures of biological molecules are important to biologists, since macromolecules carry out most of the functions of cells. Bioinformatics has a specific focus on the three-dimensional structures of macromolecules. Gene expression profiling entails measuring the expression of thousands of genes at once to build a global picture of cellular function. DNA directs the synthesis of RNA and, through RNA, controls protein synthesis, a process known as gene expression. Gene expression profiling can show which cells are actively dividing, or how particular cells react to a given treatment. In gene expression profiling, an entire genome can be measured simultaneously, meaning that the activity of every gene in a given cell can be characterised. Several transcriptomics technologies are used to generate the data needed for evaluation. DNA microarrays are designed to measure the relative activity of previously identified target genes. Sequence-based techniques, such as RNA-Seq, provide information on the sequences of genes as well as their expression levels. Mantione and colleagues predict that RNA-Seq will form the future of data collection in bioinformatics (Mantione, et al., 2014).
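
To make the idea of comparing expression under a treatment concrete, the following is a minimal sketch in Python. It assumes that expression levels have already been measured (for example by a microarray or RNA-Seq) and summarises each gene as a log2 fold change between a control and a treated sample. The gene names, the values and the pseudocount are hypothetical and chosen only for illustration; real analyses involve normalisation and statistical testing that are not shown here.

```python
import math


def log2_fold_change(control, treated, pseudocount=1.0):
    """Return log2(treated / control) for every gene in `control`.

    A pseudocount (an assumption of this sketch) is added to both values
    so that genes with zero expression do not cause a division by zero.
    """
    changes = {}
    for gene, control_level in control.items():
        treated_level = treated.get(gene, 0.0)
        ratio = (treated_level + pseudocount) / (control_level + pseudocount)
        changes[gene] = math.log2(ratio)
    return changes


# Hypothetical expression levels (arbitrary units) for three made-up genes.
control = {"geneA": 7.0, "geneB": 15.0, "geneC": 3.0}
treated = {"geneA": 31.0, "geneB": 15.0, "geneC": 0.0}

changes = log2_fold_change(control, treated)
for gene, value in sorted(changes.items()):
    # Positive values: higher expression after treatment; negative: lower.
    print(f"{gene}: {value:+.2f}")
```

In this toy data set, geneA is expressed more strongly after treatment, geneB is unchanged and geneC is expressed less strongly.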

Another kind of data that is used in bioinformatics is biochemical pathways. A biochemical pathway is a series of interlinked chemical reactions happening in a cell. Metabolites are the primary reactants in metabolic pathways, and specific enzymes catalyse the reactions. In a metabolic pathway, the product of one enzyme acts as the substrate for the next. In a eukaryotic cell, specific metabolic pathways are confined to particular compartments, according to the role each pathway plays in that compartment. For example, the citric acid cycle takes place in the mitochondrial matrix, while the electron transport chain and oxidative phosphorylation take place at the inner mitochondrial membrane (Campbell, Farrell, & McDougal, 2016). On the other hand, fatty acid biosynthesis, glycolysis and the pentose phosphate pathway take place in the cytosol of a cell (Voet, Voet, & Pratt, 2013).
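
The chaining of reactions described above, in which the product of one enzyme is the substrate of the next, can be sketched as a simple data structure. The example below is only an illustration in Python: it lists the first three steps of glycolysis and checks that they link up. The function names `is_linked` and `run_pathway` are hypothetical helpers, not part of any pathway database or library.

```python
# Each step is (enzyme, substrate, product). These are the first three
# steps of glycolysis; cofactors such as ATP are left out for brevity.
GLYCOLYSIS_START = [
    ("hexokinase", "glucose", "glucose-6-phosphate"),
    ("phosphoglucose isomerase", "glucose-6-phosphate", "fructose-6-phosphate"),
    ("phosphofructokinase", "fructose-6-phosphate", "fructose-1,6-bisphosphate"),
]


def is_linked(pathway):
    """Return True if every step's product is the next step's substrate."""
    return all(prev[2] == nxt[1] for prev, nxt in zip(pathway, pathway[1:]))


def run_pathway(pathway, start):
    """Follow the chain from `start` and return the metabolites visited."""
    visited = [start]
    current = start
    for enzyme, substrate, product in pathway:
        if substrate != current:
            raise ValueError(f"{enzyme} cannot act on {current}")
        current = product
        visited.append(current)
    return visited


metabolites = run_pathway(GLYCOLYSIS_START, "glucose")
print(" -> ".join(metabolites))
```

Representing a pathway as an ordered list of steps keeps the sketch short; pathway databases typically store such reactions as a graph, since real metabolic networks branch and share metabolites.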

Phylogenetic data is also of great importance to biologists. Phylogeny is crucial in bioinformatics because it describes how genes, genomes and species evolve. For instance, phylogenetic data has been used successfully to study the evolution of genes and genomes in Drosophila (Clark, et al., 2007). Molecular sequences are the main focus in phylogeny. Molecular biologists assert that phylogenetic data can be used to trace the evolution of a given sequence up to the present day, and even to predict how the sequence will change in the future.
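
Because molecular sequences are the raw material of phylogeny, a common first step is to measure how different two aligned sequences are. The sketch below, written in Python, computes the proportion of differing positions (often called the p-distance) for every pair of sequences. The species labels and sequences are hypothetical, and the sketch assumes the sequences are already aligned and of equal length; building an actual tree from such distances is a further step that is not shown.

```python
from itertools import combinations


def p_distance(seq_a, seq_b):
    """Proportion of aligned positions at which two sequences differ."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    mismatches = sum(1 for a, b in zip(seq_a, seq_b) if a != b)
    return mismatches / len(seq_a)


# Hypothetical, pre-aligned sequences for three made-up species.
aligned = {
    "species_1": "ACGTACGTAC",
    "species_2": "ACGTACGTTC",
    "species_3": "ACGAACCTTC",
}

# Pairwise distances, keyed by the (alphabetically ordered) pair of names.
distances = {
    (a, b): p_distance(aligned[a], aligned[b])
    for a, b in combinations(sorted(aligned), 2)
}

# The pair with the smallest distance is the most similar pair.
closest = min(distances, key=distances.get)
print(closest, distances[closest])
```

In this toy example, species_1 and species_2 differ at one position out of ten and are therefore the closest pair.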

Biological Databases

These data sets are stored in various biological databases. The biological databases are categorised based on the specific data they hold. Currently, there are databases for nucleotides, proteins, protein structures, and genome maps. Two of the commonly used biological databases are the Universal Protein Resource (UniProt) and the European Nucleotide Archive (ENA). UniProt is a resource for protein sequences and their functional annotation (UniProt Consortium, 2008). These biological databases contain complete sets of nucleotide and protein sequences from all organisms that have been published (deposited) by the international research community. There are, however, specialised biological databases, such as organism-specific and functional databases. Organism-specific databases contain sequence data from particular organisms, such as human and mouse. Functional databases, on the other hand, include vector databases and TRANSFAC, a database of transcription factors.

Matcol: A Bioinformatics Tool for Early Detection of Cancer

Matcol is a bioinformatics tool that has been developed to enhance the early detection of cancer. Matcol helps in the identification of protein and DNA colocalisation visualised through fluorescence microscopy. The development of this tool was motivated by the fact that pixel intensity-based coefficients cannot be used to study object-based colocalisation in biological systems (Khushi, et al., 2017). Matloob Khushi, who developed this tool, acknowledged that analysing images one at a time is slow and takes many hours. Thus, this innovation can replace manual colocalisation counting (Australian Cancer Research Foundation, 2017). Besides, it can be used in many areas of biology. The tool automates the traditional quantification task and can quantify multiple, possibly hundreds of, images automatically in a short time. Matcol identifies regions of fluorescent signal in two channels, determines the co-located sections of these regions and calculates the statistical significance of the colocalisation (CMRI, 2017). The features of Matcol allow users to view an area of interest and customise several parameters to analyse the region of interest completely. Unlike traditional tools that focus on pixel intensity-based correlation, Matcol is meant to quantify object-based colocalisation. It has a threshold multiplier that filters out the background. Cannistraci and colleagues note that the removal of background minimises the visualisation of false-positive signals (Cannistraci, Montevecchi, & Alessio, 2009). This bioinformatics tool is a breakthrough in cancer detection and might assist researchers in designing novel therapies to treat cancer in its early stages.
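
The general idea of object-based colocalisation can be illustrated with a short Python sketch. This is not Matcol's implementation: it is a simplified illustration that thresholds two toy image channels, groups the remaining pixels into objects, and counts the objects in one channel that overlap an object in the other. Defining the cut-off as a multiplier times the mean intensity is an assumption made here for simplicity, all function names and pixel values are hypothetical, and the statistical significance step described above is omitted.

```python
def threshold(image, multiplier):
    """Keep pixels brighter than multiplier x mean intensity.

    Using the mean as the background estimate is an assumption of this sketch.
    """
    pixels = [value for row in image for value in row]
    cutoff = multiplier * sum(pixels) / len(pixels)
    return [[value > cutoff for value in row] for row in image]


def label_objects(mask):
    """Group foreground pixels into 4-connected objects (sets of coordinates)."""
    rows, cols = len(mask), len(mask[0])
    seen, objects = set(), []
    for r in range(rows):
        for c in range(cols):
            if not mask[r][c] or (r, c) in seen:
                continue
            stack, obj = [(r, c)], set()
            seen.add((r, c))
            while stack:  # iterative flood fill from the seed pixel
                y, x = stack.pop()
                obj.add((y, x))
                for ny, nx in ((y + 1, x), (y - 1, x), (y, x + 1), (y, x - 1)):
                    inside = 0 <= ny < rows and 0 <= nx < cols
                    if inside and mask[ny][nx] and (ny, nx) not in seen:
                        seen.add((ny, nx))
                        stack.append((ny, nx))
            objects.append(obj)
    return objects


def count_colocalised(objects_a, objects_b):
    """Count objects in channel A sharing at least one pixel with channel B."""
    return sum(1 for a in objects_a if any(a & b for b in objects_b))


# Hypothetical 5x5 intensity images for two fluorescence channels.
red = [[9, 9, 0, 0, 0], [9, 9, 0, 0, 0], [0, 0, 0, 0, 0],
       [0, 0, 0, 8, 8], [0, 0, 0, 8, 8]]
green = [[0, 0, 0, 0, 0], [0, 7, 7, 0, 0], [0, 7, 7, 0, 0],
         [0, 0, 0, 0, 0], [7, 0, 0, 0, 0]]

red_objects = label_objects(threshold(red, multiplier=2))
green_objects = label_objects(threshold(green, multiplier=2))
colocalised = count_colocalised(red_objects, green_objects)
print(len(red_objects), len(green_objects), colocalised)
```

Counting overlapping objects, rather than correlating pixel intensities, is what distinguishes an object-based measure from the intensity-based coefficients mentioned above.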

GT-Scan: An Online Tool for Genomic Targeting

GT-Scan is a web-based bioinformatics tool that ranks the possible targets in a user-chosen section of a genome by the number of off-targets each has (O’Brien & Bailey, 2014). The tool gives users the flexibility to specify the required attributes of targets and off-targets through a straightforward “target rule”. In addition, GT-Scan delivers an interactive output that allows comprehensive scrutiny of all the potential candidate targets. GT-Scan is mainly used to identify the most favourable targets for clustered regularly interspaced short palindromic repeats (CRISPR)/Cas systems (O’Brien & Bailey, 2014). However, because it is highly flexible, the tool can also be used for other genome-targeting techniques.

GT-Scan builds on the basic idea of genome targeting. The first step in the successful use of a genome-targeting technique is to determine the potential target or targets within the section of interest that have the fewest off-targets. The section of interest might be, for example, a gene, a promoter or an exon. The ideal targets are sub-sequences within the desired section that have no similar copies elsewhere in the genome.

An attribute that makes GT-Scan reliable is its interactive output. The interactivity of the output enables users to evaluate a potential target and the traits of its possible off-targets. These include the positions of mismatches, the number of mismatches and the genomic location. Currently, the web site supports target selection in over 25 Ensembl genomes (O’Brien & Bailey, 2014). When using GT-Scan, researchers choose a suitable genome from a list and submit the DNA sequence of the genomic section in which they want to determine ideal targets. Several options are available to users, depending on how they want to perform the identification. A user can select an existing rule-pair or design a customised rule-pair. A candidate target is a site in the specified genomic section that matches the target rule. For each candidate target, the tool records the possible off-targets in the genome that have fewer than three mismatches with the candidate and that match the off-target filter. The researcher can also independently control the number of mismatches allowed in off-targets. In summary, GT-Scan helps users to answer two specific questions. The first is which candidate targets in the genomic section of interest are ideal. The second is how many potential off-targets each candidate has in the target genome being used by the researcher (GT-Scan, n.d.).
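
The two questions above can be illustrated with a small Python sketch. It is not GT-Scan's algorithm or rule syntax: it is a simplified illustration in which a target rule is a string where `N` matches any base, candidates are the sites in a region that match the rule, and off-targets are other sites in the genome that match the rule and differ from the candidate by no more than a chosen number of mismatches. The toy genome, the short rule `NNNNNNGG` and all function names are hypothetical, and only the forward strand is searched.

```python
def matches_rule(site, rule):
    """True if `site` fits `rule`, where 'N' in the rule matches any base."""
    return len(site) == len(rule) and all(
        r == "N" or r == s for s, r in zip(site, rule)
    )


def mismatches(a, b):
    """Number of positions at which two equal-length sites differ."""
    return sum(1 for x, y in zip(a, b) if x != y)


def find_sites(sequence, rule, offset=0):
    """Return (genome position, site) for every window matching the rule."""
    width = len(rule)
    return [
        (offset + i, sequence[i:i + width])
        for i in range(len(sequence) - width + 1)
        if matches_rule(sequence[i:i + width], rule)
    ]


def rank_targets(genome, region_start, region_end, rule, max_mismatches):
    """Rank candidate targets in a region by their number of off-targets."""
    genome_sites = find_sites(genome, rule)
    candidates = find_sites(genome[region_start:region_end], rule, region_start)
    ranked = []
    for position, site in candidates:
        off_targets = sum(
            1
            for other_position, other in genome_sites
            if other_position != position
            and mismatches(site, other) <= max_mismatches
        )
        ranked.append((position, site, off_targets))
    # Fewest off-targets first; ties are broken by genome position.
    return sorted(ranked, key=lambda item: (item[2], item[0]))


# Hypothetical 32-base "genome" and a short illustrative target rule.
genome = "ACGTACGGTTTTACGTTCGGAAAACCCCCTGG"
ranked = rank_targets(genome, 10, 32, rule="NNNNNNGG", max_mismatches=2)
for position, site, off_targets in ranked:
    print(position, site, off_targets)
```

Allowing at most two mismatches mirrors the "fewer than three mismatches" criterion described above. In this toy genome, the site at position 24 has no off-targets and is therefore ranked ahead of the site at position 12, which has one.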

NOX2 Oxidase: A Target for a Novel Antiviral Drug

Australian researchers, in collaboration with international partners, have developed a novel drug for the treatment of common viral diseases. The drug was developed following analyses of nucleotides and gene expression profiles. In their quest to characterise the occurrence of viral diseases, the researchers found that NOX2 oxidase is activated by single-stranded RNA and DNA viruses in endocytic compartments. Once triggered, NOX2 generates endosomal hydrogen peroxide, which subdues the body’s humoral and antiviral signalling networks (To, et al., 2017). As a result, the body’s ability to fight viral diseases is suppressed, and the viral infection becomes more virulent. Many people experience this pathogenesis, since NOX2 is activated by many viruses, including those responsible for the common cold, influenza, HIV and dengue fever. The primary research on the action of NOX2 was based on mice, and human subjects are yet to be included in the study.

NOX enzymes are absent from prokaryotes but evolved approximately 1.5 billion years ago in single-celled eukaryotes. The enzyme is present in eukaryotic groups such as algae, fungi, amoebae, and nematodes. After characterising the enzyme, the team of scientists designed a novel drug that proved effective in mice. Specifically, the prototype drug inhibited the effect of NOX2 oxidase in mice. The customised drug suppressed the disease caused by influenza infection. However, the drug is still undergoing development and will only be available for human use after five years. This novel antiviral drug, developed using bioinformatics technology, aims to improve the efficiency of treatment. Current treatments are limited because they target circulating viruses and have an uncertain or minimal impact against new viruses that affect humans.

Empirical evidence suggests that the flu virus results in the hospitalisation of 13,500 Australians and in 3,000 deaths among the population aged over 50 years. The global burden of the flu virus is also increasing. It has been found that approximately five million cases of infection are reported annually, and about 10% of these cases lead to death (Kenrick, 2017). Based on this finding, the discovery of a drug for viral diseases is a major milestone towards addressing the disease burden.

MEME Suite: A Toolkit for Motif-Based Sequence Analysis

MEME Suite is a web-based and downloadable software toolkit for conducting motif-based sequence analysis and is accessible through meme-suite.org (MEME Suite, n.d.). Sequence motif analysis is important in many scientific contexts. As such, the MEME Suite is a fundamental toolkit for studying biological processes involving RNA, DNA and proteins. The toolkit has been used to analyse results for approximately 9,800 published papers (Bailey, et al., 2015). The rise of proteomics and genomics means that many researchers will need to conduct motif analysis, and thus MEME Suite will become more important. Even before the advent of these fields of study, MEME Suite had been used widely for biological discoveries.

On the web, MEME Suite contains several tools and integrated databases used to perform motif analyses. The basis of the suite is the MEME motif discovery algorithm. MEME searches for motifs in unaligned collections of protein, RNA and DNA sequences. Since its launch, MEME has continued to gain popularity in the scientific community, and it attracted a large number of unique users in 2014 alone (Bailey, et al., 2015).

The MEME Suite was developed based on the existing understanding of motifs. An RNA, DNA or protein sequence motif is a short pattern that is conserved by evolution. In each of these kinds of sequence, a motif might correspond to a different kind of site. For instance, in DNA, motifs might correspond to specific protein-binding sites. In proteins, on the other hand, motifs might correspond to the active sites of enzymes, or to structural units essential for the correct folding of the specific protein. Hence, a sequence motif is among the elementary functional units of molecular evolution. For these reasons, determining and characterising motifs is important in designing models of cellular processes. The identification of motifs is also important for understanding the mechanisms and pathophysiology of human diseases.

The MEME Suite toolkit consists of 13 tools for conducting motif discovery, motif enrichment analysis, motif scanning and motif-motif comparison. The newest six tools in the MEME Suite toolkit are MCAST, DREME, MEME-ChIP, AME, CentriMo and SpaMo (Bailey, et al., 2009). When performing motif discovery and motif enrichment analysis, the user supplies a set of unaligned protein, RNA or DNA sequences. Typically, the sequences may be promoters of co-expressed genes or proteins with a common role.

Motif discovery locates de novo motifs in the submitted sequences. A researcher can then pass a motif directly to the scanning and comparison tools within the MEME Suite to find any other protein or genomic sequences that contain the identified motif. This process might also aim to discern whether the motif is similar to a previously studied motif. The Suite offers a wide array of genomic and proteomic sequence databases for motif scanning and numerous motif databases for motif comparison.
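
The step of scanning sequences for an already identified motif can be illustrated with a short Python sketch. This is not the algorithm used by MEME or by any other MEME Suite tool: it is a simplified illustration that builds a position count matrix from a few aligned example sites, derives a consensus, and scores every window of a sequence by summing the counts of the bases it contains. The example sites, the sequence, the score threshold and all function names are hypothetical.

```python
def count_matrix(sites):
    """Count how often each base occurs at each position of aligned sites."""
    width = len(sites[0])
    matrix = [{base: 0 for base in "ACGT"} for _ in range(width)]
    for site in sites:
        for i, base in enumerate(site):
            matrix[i][base] += 1
    return matrix


def consensus(matrix):
    """Most frequent base at each position of the matrix."""
    return "".join(max("ACGT", key=column.get) for column in matrix)


def scan(sequence, matrix, min_score):
    """Return (start, window, score) for windows scoring at least min_score.

    The score is a simple sum of counts, an assumption of this sketch;
    real tools use probabilistic scores and report statistical significance.
    """
    width = len(matrix)
    hits = []
    for start in range(len(sequence) - width + 1):
        window = sequence[start:start + width]
        score = sum(matrix[i].get(base, 0) for i, base in enumerate(window))
        if score >= min_score:
            hits.append((start, window, score))
    return hits


# Hypothetical aligned sites representing one motif.
sites = ["TATAAT", "TATGAT", "TACAAT", "TATAAT"]
matrix = count_matrix(sites)
motif = consensus(matrix)

# Hypothetical sequence to scan for occurrences of the motif.
hits = scan("GGTATAATCCTACAATGG", matrix, min_score=20)
print(motif, hits)
```

Scoring against a matrix rather than searching for the exact consensus string allows near matches, such as `TACAAT` here, to be reported alongside perfect matches.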

The MEME Suite toolkit has a flexible and straightforward user interface to facilitate fast motif analysis. Each input field explains the specific information that is needed and how to enter it, and in most cases an example is provided. A question mark (?) next to a field guides the user to the relevant help (Bailey, et al., 2015). The whole interface of the MEME toolkit is flexible and consistent. For instance, a user can enter the required information in a given field by typing, by cutting and pasting, or by choosing a file to upload.

Conclusion

Australia has a huge opportunity in the growing bioinformatics industry. Specifically, there is an opportunity in leveraging the benefits of bioinformatics in medical and health research. Stakeholders across the board now assert that the importance of bioinformatics stretches beyond biotechnology into medical and health research. The application of computers and information technology to match and analyse gene sequences allows a better understanding of the causes of diseases and of differences in their impact across populations. The application and utilisation of these techniques will give the country a competitive advantage in the pharmaceutical industry. This paper has highlighted the Matcol tool, GT-Scan, the development of a drug for viral diseases and MEME Suite as some of the breakthroughs in the application of bioinformatics. Although some of these applications are still in the trial stage, they are fundamental in laying the ground for major achievements through bioinformatics.

References

Australian Cancer Research Foundation. (2017, 9 12). New bioinformatics tool to improve the early detection of cancer. Retrieved 9 12, 2017, from https://acrf.com.au/news/new-bioinformatics-tool-to-improve-the-early-detection-of-cancer/

Bailey, T., Boden, M., Buske, F., Frith, M., Grant, C., Clementi, L., & Noble, W. (2009). MEME SUITE: tools for motif discovery and searching. Nucleic acids research, 37(suppl_2), W202-W208.

Bailey, T., Johnson, J., Grant, C., & Noble, W. (2015). The MEME suite. Nucleic acids research, 43(W1), W39-W49.

Campbell, M. K., Farrell, S. O., & McDougal, O. M. (2016). Biochemistry. Cengage Learning.

Can, T. (2014). Introduction to bioinformatics. Methods in Molecular Biology, 1107, 51-71.

Cannistraci, C., Montevecchi, F., & Alessio, M. (2009). Median-modified Wiener filter provides efficient denoising, preserving spot edge and morphology in 2-DE image processing. Proteomics, 9(21), 4908-4919.

Chiang, J. H. (2009). Tech-ware: Bioinformatics and computational biology resources [Best of the Web]. IEEE Signal Processing Magazine, 26(5), 153-158.

Clark, A., Eisen, M., Smith, D., Bergman, C., Oliver, B., Markow, T., . . . Pollard, D. (2007). Evolution of genes and genomes on the Drosophila phylogeny. Nature, 450(7167), 1-56.

CMRI. (2017, 9 7). Researcher Devises Tool To Speed Up Cancer Discovery. Retrieved 9 12, 2017, from https://www.cmri.org.au/News/Latest-News/CMRI-researcher-devises-tool-to-speed-up-cancer-di

GT-Scan. (n.d.). GT-Scan: Identifying Unique Genomic Targets. Retrieved 9 12, 2017, from https://gt-scan.csiro.au/

Kenrick, J. (2017, 7 12). New research points to treatment breakthrough for viruses. Retrieved 9 12, 2017, from https://www.rmit.edu.au/news/all-news/2017/jul/new-research-points-to-treatment-breakthrough-for-viruses

Khushi, M., Napier, C. E., Smyth, C. M., Reddel, R. R., & Arthur, J. W. (2017). MatCol: a tool to measure fluorescence signal colocalisation in biological systems. Scientific Reports, 7(8879), 1-9.

Mantione, K., Kream, R., Kuzelova, H., Ptacek, R., Raboch, J., Samuel, J., & Stefano, G. (2014). Comparing bioinformatic gene expression profiling methods: microarray and RNA-Seq. Medical science monitor basic research, 20(1), 138.

MEME Suite. (n.d.). The MEME Suite. Retrieved 9 12, 2017, from https://meme-suite.org/

O’Brien, A., & Bailey, T. L. (2014). GT-Scan: identifying unique genomic targets. Bioinformatics, 30(18), 2673–2675.

Rigden, D., Fernández-Suárez, X., & Galperin, M. (2016). The 2016 database issue of Nucleic Acids Research and an updated molecular biology database collection. Nucleic acids research, 44(D1), D1-D6.

To, E., Vlahos, R., Luong, R., Halls, M., Reading, P., King, P., . . . Starkey, M. (2017). Endosomal NOX2 oxidase exacerbates virus pathogenicity and is a target for antiviral therapy. Nature Communications, 8(1), 69.

UniProt Consortium. (2008). The universal protein resource (UniProt). Nucleic acids research, 36(suppl 1), D190-D195.

Wightman, B., & Hark, A. T. (2012). Integration of bioinformatics into an undergraduate biology curriculum and the impact on development of mathematical skills. Biochemistry and Molecular Biology Education, 40(5), 310-319.

Voet, D., Voet, J. G., & Pratt, C. W. (2013). Fundamentals of Biochemistry: Life at the Molecular Level. John Wiley & Sons.

Cite This Work

My Assignment Help. (2018). An Essay On Bioinformatics Data Sets And Tools: An Overview. Retrieved from https://myassignmenthelp.com/free-samples/bioinformatics-interdisciplinary-comprising.