2008 Rustbelt RNA Meeting
Multiple Sequence Alignments (MSAs) of both protein and nucleic-acid sequences are ubiquitous tools for modeling the sequence motifs that pervade every biological domain. Despite their utility, MSAs and the methods derived from them fail to capture interpositional relationships, which can be as critical to family membership as positional identities. This deficiency is particularly problematic for short nucleic-acid motifs such as endonuclease targeting signals or protein binding sites, since conservation of structure in these signals is often as important as, or more important than, conservation of nucleotide sequence.
The StickWRLD project has developed novel methods to quantify and visualize the interpositional relationships between residues that are signature features of some family alignments. We have identified dependencies in many protein and nucleic-acid families that are critical indicators of family membership. Some of these dependencies cannot be modeled by any existing family modeling method, including Hidden Markov Models (HMMs). In some cases, the dependencies are sufficiently strong that Consensus, Position-Specific Scoring Matrix, and HMM methods all produce models that score sequences specifically excluded from the family as more likely candidate members than any actual member of the family.
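The failure mode described above can be illustrated with a minimal sketch. The four-sequence alignment below is a hypothetical toy example, not data from the StickWRLD work, and mutual information is used here only as a generic stand-in for an interpositional dependency measure; it is not claimed to be the statistic StickWRLD computes. The function names are likewise illustrative.

```python
from collections import Counter
from math import log2

# Hypothetical toy family: position 2 is U only when position 1 is C or G,
# so the two columns are strongly dependent. "AU" is NOT a member.
family = ["AC", "AG", "CU", "GU"]


def column_frequencies(seqs):
    """Per-column residue frequencies, as a position-independent model uses."""
    n = len(seqs)
    return [
        {res: count / n for res, count in Counter(col).items()}
        for col in zip(*seqs)
    ]


def independent_score(seq, freqs):
    """Probability of seq when every column is treated as independent."""
    score = 1.0
    for res, col in zip(seq, freqs):
        score *= col.get(res, 0.0)
    return score


def mutual_information(seqs, i, j):
    """Mutual information (in bits) between alignment columns i and j."""
    n = len(seqs)
    pi = Counter(s[i] for s in seqs)
    pj = Counter(s[j] for s in seqs)
    pij = Counter((s[i], s[j]) for s in seqs)
    return sum(
        (c / n) * log2((c / n) / ((pi[a] / n) * (pj[b] / n)))
        for (a, b), c in pij.items()
    )


freqs = column_frequencies(family)
# Consensus: most frequent residue in each column.
consensus = "".join(max(col, key=col.get) for col in freqs)
member_scores = {s: independent_score(s, freqs) for s in family}
consensus_score = independent_score(consensus, freqs)
mi = mutual_information(family, 0, 1)

print("consensus:", consensus, "in family:", consensus in family)
print("consensus score:", consensus_score)
print("best member score:", max(member_scores.values()))
print("mutual information (bits):", mi)
```

In this toy alignment the consensus sequence AU is not a family member, yet the position-independent model scores it at 0.25 against 0.125 for each real member. The mutual information of 1 bit between the two columns exposes the dependency that the per-column model ignores.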
StickWRLD provides the researcher with a web-based interface to a visual survey of a family, enabling the researcher to determine whether modeling the family with standard tools is appropriate. Ongoing research on the project is developing improved sequence modeling tools that incorporate interpositional dependencies into family models for sequence searches and classification.
Hatice Gulcin Ozer and William C. Ray. "Informative Motifs in Protein Family Alignments", Lecture Notes in Bioinformatics 4645, pp 161-170, 2007.
Hatice Gulcin Ozer and William C. Ray. "MAVL/StickWRLD: Analyzing Structural Constraints using Interpositional Dependencies in Biomolecular Sequence Alignments", Nucleic Acids Research, vol 34, Web Server Issue, pp W133-W136, 2006.
William C. Ray. "MAVL/StickWRLD: Visualizing Protein Sequence Families to Detect Non-Consensus Features", Nucleic Acids Research, vol 33, Web Server Issue, pp W315-W319, 2005.
Keywords: bioinformatics, alignment, structure