villaunique.blogg.se

Are sequence logos better than consensus sequences








Sequence logos have been widely used as graphical representations of conserved nucleic acid and protein motifs. Due to the complexity of the amino acid (AA) alphabet, rich post-translational modification, and diverse subcellular localization of proteins, few versatile tools are available for effective identification and visualization of protein motifs. In addition, various reduced AA alphabets based on physicochemical, structural, or functional properties have been valuable in the study of protein alignment, folding, structure prediction, and evolution. However, there is a lack of tools for applying reduced AA alphabets to the identification and visualization of statistically significant motifs. To fill this gap, we developed an R/Bioconductor package, dagLogo, which has several advantages over existing tools. First, dagLogo allows various formats for input sets and provides comprehensive options to build optimal background models. It implements different reduced AA alphabets to group AAs of similar properties. Furthermore, dagLogo provides statistical and visual solutions for differential AA (or AA group) usage analysis of both large and small data sets. Case studies showed that dagLogo can better identify and visualize conserved protein sequence patterns from different types of inputs and can potentially reveal biological patterns that could be missed by other logo generators.

Since their first introduction in 1990, sequence logos have been widely used to visualize conserved patterns among large sets of nucleic acid and peptide sequences. Dozens of sequence logo generators with different functionalities and performances have been developed; these generators require various types of inputs and background models, and use different algorithms and graphical representations. A comprehensive list of these tools, along with their features, is available in S1 Table. Almost all tools provide no alignment utilities but require multiple sequence alignments in various formats, or position weight/frequency matrices, as inputs. A majority of these tools are also based on information theory without statistical significance tests. More than two thirds of the existing tools cannot effectively display both over- and under-represented residues (S1 Table), even though the information on under-represented residues may be as valuable as that on over-represented residues.

It is known that both background models and sizes of input sets affect the identification and visualization of sequence patterns. However, a majority of logo generators do not allow users to choose a dataset-specific background model. For those that do allow it, the only option for a majority of them is a background model based on global residue frequencies or GC%, which can be specified by users, calculated from input sequences, or derived from a species-specific whole genome or proteome (S1 Table). These simple background models usually work well for identifying and visualizing nucleic acid patterns due to the simple nucleotide alphabet and limited modifications of nucleic acids, especially DNA. However, in many cases, they may not be optimal for protein pattern discovery because of the complexity of the AA alphabet, rich post-translational modification of proteins, diverse subcellular localization, cell or tissue type-specific expression profiles, and experimental protocol-specific biases. To our knowledge, only Two Sample Logo, iceLogo, pLogo, kpLogo, and PTM-Logo offer some advanced options for background models. Among them, iceLogo offers the most choices: i) species-specific, proteome-wide AA frequencies as the background; ii) position-specific AA frequencies as the background; and iii) an experimental protocol bias-aware background. To date, there are no tools that can group AAs based on their physicochemical or other properties and test the differential usage of AA groups. Historically, the twenty naturally occurring AAs have been classified into two to 19 groups based on some measures of their relative similarity in physicochemical, structural, and functional properties; this has resulted in a variety of reduced (degenerate or simplified) alphabets.
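To make the idea of testing differential usage of AA groups against a background set more concrete, here is a minimal Python sketch. It is not dagLogo's API (dagLogo is an R/Bioconductor package). The four-group reduced alphabet, the function names, and the toy peptides are all assumptions made for illustration, and Fisher's exact test is used only as one common choice of significance test.

```python
from collections import Counter
from math import comb

# Illustrative reduced alphabet (an assumption for this sketch,
# not necessarily one of the grouping schemes shipped with any tool).
REDUCED = {
    "hydrophobic": "AVLIMFWC",
    "polar": "STNQYGP",
    "positive": "KRH",
    "negative": "DE",
}
AA_TO_GROUP = {aa: group for group, aas in REDUCED.items() for aa in aas}


def group_counts(seqs, pos):
    """Count reduced-alphabet groups at one aligned position."""
    return Counter(AA_TO_GROUP[s[pos]] for s in seqs)


def fisher_two_sided(a, b, c, d):
    """Two-sided Fisher's exact test for the 2x2 table [[a, b], [c, d]]."""
    n1, n2, k = a + b, c + d, a + c
    total = comb(n1 + n2, k)

    def prob(x):
        # Hypergeometric probability of x group members in the first row.
        return comb(n1, x) * comb(n2, k - x) / total

    p_obs = prob(a)
    lo, hi = max(0, k - n2), min(k, n1)
    # Sum the probabilities of all tables at least as extreme as the observed one.
    p = sum(prob(x) for x in range(lo, hi + 1) if prob(x) <= p_obs * (1 + 1e-9))
    return min(1.0, p)


def differential_usage(inputs, background, pos):
    """Return {group: (frequency difference, p-value)} at one position.

    A positive difference means the group is over-represented in the input
    set relative to the background; a negative one, under-represented.
    """
    fg, bg = group_counts(inputs, pos), group_counts(background, pos)
    n_fg, n_bg = len(inputs), len(background)
    result = {}
    for group in REDUCED:
        a, c = fg[group], bg[group]
        diff = a / n_fg - c / n_bg
        result[group] = (diff, fisher_two_sided(a, n_fg - a, c, n_bg - c))
    return result


# Toy aligned peptides (hypothetical data): compare position 1.
inputs = ["AKA", "SRD", "GKE", "AHA", "TKS", "LRA"]
background = ["AAD", "SLE", "GSK", "AVA", "TDS", "LGA"]
for group, (diff, p) in differential_usage(inputs, background, 1).items():
    print(f"{group:12s} diff={diff:+.2f} p={p:.4f}")
```

Reporting a signed difference together with a p-value is what lets both over- and under-represented groups be shown, rather than only the enriched ones. The result depends directly on which background set is supplied, which is why the choice of background model matters.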









