1. Given the following DNA sequence:
- Construct a keyword tree
- Construct a suffix tree
2. How many different nucleotide sequences may code for the following protein sequence:
3. Given the following MSA (Multiple Sequence Alignment), describe (in pseudocode) how you would determine which positions contained informative sites:
4. Describe how gene finding algorithms work. Include a description of all the elements that they search for to help determine whether or not a sequence is a protein coding gene.
5. What is BLAST? Describe how the algorithm works. Be sure to include any statistical measures that are used in determining the strength of any BLAST results.
6. A graduate student has written part of an R script to perform an analysis.
- Describe what each line does by adding comment lines to it as appropriate
- Execute the script and show all of the output it generates
- Modify the script so that there are 3 centroids displayed
- Provide the final modified script and its output.
Constructing a keyword tree and suffix tree from a DNA sequence
BLAST (Basic Local Alignment Search Tool) is a heuristic algorithm for searching nucleic acid and protein databases. It compares a nucleotide or protein query sequence against the sequences in a database and reports the regions of similarity it finds. BLASTP, the protein-protein variant, compares a protein query against a protein database. Because BLAST is heuristic rather than exhaustive, it can search a large protein database quickly (Madden, 2013). The expect value (E-value) reported for each hit estimates how many matches with an equal or better score would be expected by chance in a database of that size; the lower the E-value, the more confidence the user can place in the retrieved alignment. BLAST performs local alignments, which lets it detect sequences that are similar to the query over only part of their length. Local alignments also make it possible to map an mRNA onto genomic DNA during genome analysis and assembly. Before searching, BLAST filters the query sequence with programs designed to mask low-complexity regions. Filtering is applied only to the query sequence, not to the database sequences. Masked residues are shown as N in nucleotide sequences and as X in protein sequences.
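The expect value can be made concrete with the Karlin-Altschul formula E = K·m·n·e^(-λS), where S is the raw alignment score, m the query length and n the database length. The sketch below is a minimal illustration under stated assumptions: the function names are hypothetical, and the values of `lam` and `k` in the usage lines are illustrative placeholders, since the real values depend on the scoring matrix and gap penalties.

```python
import math


def bit_score(raw_score, lam, k):
    """Convert a raw score S into a normalised bit score S'."""
    return (lam * raw_score - math.log(k)) / math.log(2)


def expect_value(raw_score, query_len, db_len, lam, k):
    """E = K * m * n * exp(-lambda * S)."""
    return k * query_len * db_len * math.exp(-lam * raw_score)


def expect_from_bits(bits, query_len, db_len):
    """The same E-value written in terms of the bit score: E = m * n * 2^(-S')."""
    return query_len * db_len * 2.0 ** (-bits)


# Illustrative parameters only (assumed, not taken from the essay).
LAM, K = 0.3176, 0.134
e_weak = expect_value(40, query_len=250, db_len=5_000_000, lam=LAM, k=K)
e_strong = expect_value(120, query_len=250, db_len=5_000_000, lam=LAM, k=K)
```

A higher raw score gives a smaller E-value for the same search space, and a larger database raises the E-value of the same alignment, which is why E-values from searches against different databases are not directly comparable.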
Masking redundant and low-complexity regions lets the search concentrate on the more relevant database hits (Mount, 2004). BLAST is available from NCBI and aligns a DNA query with database sequences so that they can be compared. Rather than comparing every residue of every sequence, the algorithm breaks the query into short words and uses them as seeds for alignments. The user can set the word length. Skipping unnecessary comparisons is what makes BLAST fast. For proteins the words are typically three residues long, and each word is expanded into a list of similar "neighbourhood" words, which limits the regions of the database that need to be evaluated. A neighbourhood word is kept only if it scores at least the threshold T, set by the user, against the query word. When a database word matches the list, BLAST extends the hit in both directions beyond the seed. A cutoff score separates alignments that support homology from those that do not: after a hit is found, the extended segment pair is reported only if its score exceeds the cutoff (Lobo, 2008). Extension stops once the running alignment score has fallen a set amount below the best score reached so far. These parameters can be tuned to trade sensitivity against speed.
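The seed-and-extend idea described above can be sketched in a few lines. This is a toy illustration, not the NCBI implementation: it works on DNA with an assumed match/mismatch scoring scheme, builds the word list by brute force, and performs only ungapped extension. The function names and the parameters `w` (word size), `t` (threshold T), `x_drop` (allowed fall below the best score) and `cutoff` are hypothetical.

```python
from itertools import product

MATCH, MISMATCH = 5, -4  # toy scores, assumed for illustration


def score(a, b):
    return MATCH if a == b else MISMATCH


def neighborhood(query, w, t, alphabet="ACGT"):
    """Map every word scoring >= t against a query word to its query offsets."""
    index = {}
    for i in range(len(query) - w + 1):
        qword = query[i:i + w]
        for cand in product(alphabet, repeat=w):
            if sum(score(a, b) for a, b in zip(qword, cand)) >= t:
                index.setdefault("".join(cand), []).append(i)
    return index


def extend(query, subject, qi, si, w, x_drop):
    """Ungapped extension of a seed in both directions.

    Each direction stops when the running score falls more than x_drop
    below the best score seen so far. Returns (score, qstart, sstart, length).
    """
    seed = sum(score(query[qi + k], subject[si + k]) for k in range(w))
    best_r = run = right = k = 0
    while qi + w + k < len(query) and si + w + k < len(subject):
        run += score(query[qi + w + k], subject[si + w + k])
        if run > best_r:
            best_r, right = run, k + 1
        elif best_r - run > x_drop:
            break
        k += 1
    best_l = run = left = 0
    k = 1
    while qi - k >= 0 and si - k >= 0:
        run += score(query[qi - k], subject[si - k])
        if run > best_l:
            best_l, left = run, k
        elif best_l - run > x_drop:
            break
        k += 1
    return seed + best_l + best_r, qi - left, si - left, w + left + right


def search(query, subject, w=3, t=15, x_drop=10, cutoff=25):
    """Return segment pairs scoring at least cutoff, best first."""
    index = neighborhood(query, w, t)
    hsps = set()
    for si in range(len(subject) - w + 1):
        for qi in index.get(subject[si:si + w], []):
            hsp = extend(query, subject, qi, si, w, x_drop)
            if hsp[0] >= cutoff:
                hsps.add(hsp)
    return sorted(hsps, reverse=True)


hits = search("ACGTACGTGG", "TTACGTACGTCC")
```

With these toy scores, `t=15` admits only exact three-base words, while lowering `t` to 6 also admits words with one mismatch. That is the sensitivity-versus-speed trade-off the threshold controls.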
Calculating the number of different nucleotide sequences that can code for a protein sequence
Sequence similarity measures can be classed as global or local. Global measures optimise the alignment over the full length of both sequences, whereas the local measure used by BLAST ignores non-conserved subsequences and scores only the conserved regions (Altschul, et al., 1990). Local similarity is preferred for database searches, where cDNAs may be compared with partially sequenced genes and where distantly related proteins may share only isolated regions of similarity (Altschul, et al., 1990). MegaBLAST, a nucleotide-nucleotide search, is suited to comparing sequences from closely related species (Madden, 2013). By default it looks for exact matches of 28 bases and then extends those matches into a full alignment. BLASTN also compares nucleotide sequences but is better at finding more distantly related ones (Madden, 2013). BLASTP compares a protein query with a protein database; TBLASTN and BLASTX searches also compare sequences at the protein level. BLASTX translates a nucleotide query in all six reading frames and searches the translations against a protein database. In the report, the alignments for each hit begin with the character “>” followed by the subject sequence name and accession number (Leung, 2017).
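The translation step of a BLASTX-style search can be illustrated as follows. BLASTX translates the nucleotide query in six reading frames (three on each strand) before comparing the products with a protein database. The sketch below covers only that translation step, using the standard genetic code; the function names are hypothetical, and incomplete trailing codons are simply dropped.

```python
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"

# Standard genetic code, built from the conventional TCAG ordering.
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(dna):
    return dna.translate(COMPLEMENT)[::-1]


def translate(dna):
    """Translate complete codons; unknown codons (e.g. containing N) become X."""
    return "".join(
        CODON_TABLE.get(dna[i:i + 3], "X") for i in range(0, len(dna) - 2, 3)
    )


def six_frames(dna):
    """Return translations keyed by frame: +1, +2, +3, -1, -2, -3."""
    dna = dna.upper()
    rc = reverse_complement(dna)
    frames = {}
    for offset in range(3):
        frames[offset + 1] = translate(dna[offset:])
        frames[-(offset + 1)] = translate(rc[offset:])
    return frames


frames = six_frames("ATGGCCTAA")
```

For the short query used here, frame +1 reads ATG GCC TAA, that is, methionine, alanine and a stop codon (shown as `*`).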
In a BLAST report, each hit's definition line is followed by blocks of aligned sequence, each covering a region of local similarity between the query and the subject. The blocks are listed in order of descending score, which often does not match their order along the query DNA molecule. The blocks can be re-ordered with the ‘Sort’ option to arrange them as required. Later revisions of BLAST changed the criterion for triggering word-hit extension and added gapped alignments and position-specific score matrices (Altschul, et al., 1997). BLAST now competes with tools such as USEARCH and UBLAST, which seek high-scoring local and global alignments (Edgar, 2010). Edgar (2010) reports that these algorithms are often orders of magnitude faster than BLAST, although less sensitive to distant protein relationships. BLAST can also be combined with a global alignment algorithm, as in Primer-BLAST, to produce a complete primer-target alignment and detect targets that have mismatches to the primers (Ye, et al., 2012). To compare two sequences directly with BLASTN, BLASTP, BLASTX, TBLASTN or TBLASTX, the user enters the query and subject sequences (or their accession numbers) and, optionally, a query subrange and a subject subrange (NCBI, 2017).
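The difference between score order and positional order is easy to see on a small example. The sketch below uses a hypothetical `Hsp` record with made-up coordinates; it is not the format of any particular BLAST output, only an illustration of why re-sorting the aligned blocks can be useful.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Hsp:
    """One aligned block: its score and its span on the query (hypothetical record)."""
    score: float
    query_start: int
    query_end: int


def by_score(hsps):
    """Default report order: highest score first."""
    return sorted(hsps, key=lambda h: h.score, reverse=True)


def by_query_position(hsps):
    """Order the blocks as they occur along the query sequence."""
    return sorted(hsps, key=lambda h: (h.query_start, h.query_end))


# Made-up blocks for illustration only.
blocks = [Hsp(85.0, 400, 520), Hsp(210.0, 900, 1250), Hsp(132.0, 10, 180)]
score_order = [h.query_start for h in by_score(blocks)]
position_order = [h.query_start for h in by_query_position(blocks)]
```

Here the best-scoring block lies furthest along the query, so the two orderings differ.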
Determining informative sites in a Multiple Sequence Alignment (MSA)
By default, a BLAST search returns a limited list of hits, so retrieving every significant hit in one pass can be difficult. The user may have to run the query several times to collect all significant matches from a large database. This argues for raising the list size (the maximum number of target sequences) so that the desired hits are obtained in a single run, which also widens the scope of the comparison. BLAST likewise displays only a limited number of top alignments by default, and this limit may need adjusting so that the required alignments are shown. Automating these adjustments would make searches easier to run at full sensitivity and specificity. The default word size also constrains the specificity of the results.
Scoring segment pairs ultimately depends on finding complete word matches of a predefined length. A second recommendation is therefore to select the word size automatically while still letting the user vary it, so that sensitivity can be raised when needed. A flexible word length would allow a wider variety of nucleotide patterns to be explored according to the needs of the research. For proteins, being able to change the word size without also having to adjust the threshold T manually would improve the accuracy of the results and reduce the number of unreliable matches. Searching a very large database with BLAST calls for running the queries as batch jobs. At present this means issuing many batch commands that run one after another. Automating this process would save search time and remove the need to execute batch commands by hand. Finally, homologues of a nucleotide sequence should also be sought at the amino acid level, which the translated searches (BLASTX, TBLASTN and TBLASTX) make possible.
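One way to automate batch searches with explicit list-size and word-size settings is to generate the command lines programmatically. The sketch below assumes the NCBI BLAST+ command-line programs are installed and uses their `-query`, `-db`, `-out`, `-outfmt`, `-evalue`, `-max_target_seqs` and `-word_size` options; the helper names, file names and database name are hypothetical. It only builds the commands and does not run them.

```python
from pathlib import Path


def blast_command(program, query, db, out_dir,
                  word_size=None, max_targets=500, evalue=1e-5):
    """Build one BLAST+ command line as an argument list."""
    query = Path(query)
    out = Path(out_dir) / f"{query.stem}.{program}.tsv"
    cmd = [
        program,
        "-query", str(query),
        "-db", db,
        "-out", str(out),
        "-outfmt", "6",                        # tabular output
        "-evalue", str(evalue),
        "-max_target_seqs", str(max_targets),  # list size
    ]
    if word_size is not None:
        cmd += ["-word_size", str(word_size)]
    return cmd


def batch_commands(program, queries, db, out_dir, **options):
    """One command per query file, sharing the same search options."""
    return [blast_command(program, q, db, out_dir, **options) for q in queries]


# Hypothetical query files and database name.
commands = batch_commands(
    "blastn", ["sample1.fa", "sample2.fa"], "my_nt_db", "results", word_size=7
)
```

Each argument list could then be passed to `subprocess.run` in a loop or handed to a job scheduler, so that the whole batch runs without manual intervention.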
References
Altschul, S. F., Gish, W., Miller, W., Myers, E. W., & Lipman, D. J. (1990). Basic Local Alignment Search Tool. Journal of Molecular Biology, 215(3), 403-410.
Altschul, S. F., Madden, T. L., Schäffer, A. A., Zhang, J., Zhang, Z., Miller, W., & Lipman, D. J. (1997). Gapped BLAST and PSI-BLAST: a new generation of protein database search programs. Nucleic Acids Research, 25(17), 3389-3402. https://doi.org/10.1093/nar/25.17.3389
Edgar, R. C. (2010). Search and clustering orders of magnitude faster than BLAST. Bioinformatics, 26(19), 2460-2461. https://doi.org/10.1093/bioinformatics/btq461
Leung, W. (2017). Lab Week 8 – An In-Depth Introduction to NCBI BLAST. Washington: Washington University. Retrieved from https://community.gep.wustl.edu/wiki/images/2/28/2011_8b_BLASTrv7_rev.pdf
Lobo, I. (2008). Basic Local Alignment Search Tool (BLAST). Nature Education, 1(1), 215.
Madden, T. (2013). The BLAST Sequence Analysis Tool. In The NCBI Handbook (pp. 1-11). Bethesda: NCBI. Retrieved from https://www.ncbi.nlm.nih.gov/books/NBK153387/
Mount, D. W. (2004). Using the Basic Local Alignment Search Tool (BLAST). Sequence Database Searching for Similar Sequences.
NCBI. (2017). BLAST. Retrieved Dec 03, 2017, from https://blast.ncbi.nlm.nih.gov/Blast.cgi?PROGRAM=blastp&PAGE_TYPE=BlastSearch&BLAST_SPEC=blast2seq&LINK_LOC=blasttab&LAST_PAGE=blastn&BLAST_INIT=blast2seq
Ye, J., Coulouris, G., Zaretskaya, I., Cutcutache, I., Rozen, S., & Madden, T. L. (2012). Primer-BLAST: A tool to design target-specific primers for polymerase chain reaction. BMC Bioinformatics, 13, 134. Retrieved from https://bmcbioinformatics.biomedcentral.com/track/pdf/10.1186/1471-2105-13-134