Questions about some of the background areas related to silencing RNAs.
Some basics of bioinformatics are provided here.
Questions related to the tools and data sets provided on this website.
RNA stands for ribonucleic acid, a long chain molecule made up of nucleotides that operates within the cells of living creatures. In general, cells have a nucleus, and inside the nucleus is the DNA - a long complex molecule containing thousands of genes. Cells need to produce the correct proteins, including enzymes, that allow the cell to perform its function. The instructions for these proteins are contained in the genes in the DNA, inside the nucleus. RNA provides the mechanism by which the instructions encoded in the genes are transferred to the rest of the cell so that the required proteins can be produced, or synthesized.
A fuller scientific explanation is available from the following sources.
Eukaryotic Organisms are organisms with cells containing nuclei.
More in-depth information is available from the following sites:
Silencing RNAs are RNA molecules that are involved in inhibiting gene expression - the process of negating the instructions contained in the DNA, thereby preventing the protein encoded by the gene sequence from being synthesized.
More in-depth information is available from the following sites:
siRNA stands for Small interfering RNA. Most notably, siRNA is involved in the RNA interference (RNAi) pathway, where it interferes with the expression of a specific gene.
miRNA stands for microRNA. miRNAs regulate gene expression.
Amongst other reasons, with about 157 million base pairs and five chromosomes, Arabidopsis thaliana has one of the smallest genomes among plants. It was the first plant genome to be sequenced, completed in 2000 by the Arabidopsis Genome Initiative. Much work has been done to assign functions to its 27,000 genes and the 35,000 proteins they encode.
Bioinformatics deals with the storage, transport, and analysis of information encoded in the genes and how this information affects the universe of biological processes.
For more information, try the Wikipedia site.
DNA defines the synthesis of protein by way of an RNA intermediary. Documenting, controlling, and modifying this process is the focus of bioinformatics. Sequencing data is one of the most abundant forms of data available and bioinformatics greatly helps in the storage and analysis of the large amounts of generated data.
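As a rough illustration of the DNA to RNA to protein route described above, the following Python sketch transcribes a short coding-strand DNA sequence into mRNA and translates it using the standard genetic code. The function names and the example sequence are made up for illustration; the sketch assumes the sequence contains only the letters A, C, G and T (or U), and it ignores real-world details such as introns and alternative genetic codes.

```python
# Standard genetic code, with codons ordered by base U, C, A, G at each position.
BASES = "UCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def transcribe(dna):
    """Return the mRNA for a coding-strand DNA sequence (T becomes U)."""
    return dna.upper().replace("T", "U")


def translate(mrna):
    """Translate mRNA codon by codon, stopping at the first stop codon."""
    protein = []
    for pos in range(0, len(mrna) - 2, 3):
        amino_acid = CODON_TABLE[mrna[pos:pos + 3]]
        if amino_acid == "*":  # "*" marks a stop codon
            break
        protein.append(amino_acid)
    return "".join(protein)


# Hypothetical nine-base example: start codon, one alanine codon, stop codon.
mrna = transcribe("ATGGCCTAA")
print(mrna, translate(mrna))
```

Any character outside A, C, G, U (for example the ambiguity code N) raises a KeyError here; a real tool would need to decide how to handle such bases.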
One of the aims of Sirocco is to develop tools that assist biologists, so that they do not have to write programs from scratch, and to try to entice holders of biological data to use bioinformatics tools. Links to these tools are listed here. Some knowledge of languages like Perl and possibly R could also be useful.
There are a number of books on the subject of Bioinformatics. Some of the ones that are available are listed here:
Perl is the main language used in Bioinformatics. A good starting point, whether learning the language from scratch or just using its bioinformatics package, is http://www.bioperl.org/wiki/Getting_Started. The following book is also suggested for those thinking about programming in the Perl language: Beginning Perl for Bioinformatics.
R is a very useful statistical analysis package. For more information visit the main page, and don't forget the Bioconductor project, which provides R packages for bioinformatics. The following book is also suggested for those thinking about using the R statistical language: R Programming for Bioinformatics (Chapman & Hall/CRC Computer Science & Data Analysis).
Different biologists and programmers have invented different ways to format sequence data in computer files, so bioinformaticians must deal with all of these formats. There are many such formats, perhaps as many as 20 in regular use for DNA alone.
The main two data formats in use here are FASTA and GenBank, though the FASTQ file format is also used here.
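To give a feel for the FASTA and FASTQ formats, here is a minimal Python sketch that reads FASTA records and converts FASTQ records to FASTA. It is not one of the tools on this site. It assumes simple, well-formed input: FASTA headers start with ">", and each FASTQ record occupies exactly four lines (title, sequence, "+" separator, quality string). The function names and example records are made up for illustration.

```python
import io


def read_fasta(handle):
    """Yield (header, sequence) pairs from FASTA text."""
    header, chunks = None, []
    for line in handle:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        else:
            # A sequence may be wrapped over several lines.
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def fastq_to_fasta(handle):
    """Yield FASTA-formatted records from strict four-line FASTQ records."""
    while True:
        title = handle.readline().strip()
        if not title:  # end of input (or a blank line) ends the conversion
            break
        sequence = handle.readline().strip()
        handle.readline()  # "+" separator line, ignored
        handle.readline()  # quality string, dropped in FASTA
        yield ">" + title[1:] + "\n" + sequence + "\n"


# Hypothetical example records held in memory instead of in files.
fasta_text = io.StringIO(">seq1 test\nACGT\nGGCC\n>seq2\nTTAA\n")
for header, sequence in read_fasta(fasta_text):
    print(header, len(sequence))

fastq_text = io.StringIO("@read1\nACGT\n+\nIIII\n")
print("".join(fastq_to_fasta(fastq_text)), end="")
```

For real data sets an established library such as BioPerl or Bioconductor is usually a safer choice than hand-written parsing, since it copes with the many variations found in practice.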
In addition there may be differences depending on the type of machine used to generate the sequences.
The answer to this depends on the nature of the species being sequenced and then the format that the data is in. Read the answer below for more information on data formats.
The sets of tools provided on the tools page which accept user input data have their required file formats discussed here.
Some microRNA (miRNA) datasets can be used with some of the tools in each of the toolsets listed on the site.
Data can be made available on this site via links to the GEO database if it has been made publicly available there. Data to be made available on the site can also be uploaded directly to our servers. Please note that data on this site can be provided to the general public or restricted to SIROCCO partners only. See our data upload HOWTO for more information.
The Gene Expression Omnibus (GEO) is a public repository that archives and freely distributes microarray and other forms of high-throughput data submitted by the scientific community. In addition to data storage, a collection of web-based interfaces and applications is available to help users query and download the experiments and gene expression patterns stored in GEO.
From a practical point of view this means sequence data can be accessed via this website. Be aware that they use the SOFT file format which is designed to be flexible and hold many different types of data. Sequence data can be converted into the FASTA format - a tool to do this will be provided in Q1 2009.
The sets of tools provided on the tools page which accept user input data have their required file formats discussed here. So far these are all plant related tools, some specific to Arabidopsis thaliana.
Not without some leg work. GBrowse applications store data in the GFF3 format (or GFF2 format databases for older installations). Data from many of these tools are provided as CSV files, and tools may have to be written to provide appropriate conversion. GFF3 format details are here.
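As an indication of the kind of conversion involved, here is a minimal Python sketch that turns a CSV file into GFF3 lines. The CSV column names (chromosome, start, end, strand, name) and the source and type values are hypothetical; the real column layout depends on the tool that produced the file. The sketch assumes coordinates are already 1-based and inclusive, as GFF3 requires, and it does not escape special characters in the attributes column.

```python
import csv
import io


def csv_to_gff3(csv_handle, source="example_tool", feature_type="region"):
    """Convert CSV rows to GFF3 text with nine tab-separated columns."""
    lines = ["##gff-version 3"]
    for row in csv.DictReader(csv_handle):
        fields = [
            row["chromosome"],   # seqid
            source,              # source (hypothetical label)
            feature_type,        # type
            row["start"],        # start, 1-based
            row["end"],          # end, inclusive
            ".",                 # score, not available
            row["strand"],       # strand: + or -
            ".",                 # phase, only meaningful for CDS features
            "ID=" + row["name"], # attributes
        ]
        lines.append("\t".join(fields))
    return "\n".join(lines) + "\n"


# Hypothetical one-row example.
example = io.StringIO("chromosome,start,end,strand,name\nChr1,100,121,+,srna_1\n")
print(csv_to_gff3(example), end="")
```

The sequence names in the first column must match those used by the GBrowse installation the data is loaded into.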
Currently, none of the tools listed here do so. (TBC)
The UEA Plant sRNA toolkit contains a Sequence file pre-processing tool which removes adaptors.
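To illustrate what adaptor removal means in general, here is a minimal Python sketch of exact-match 3' adaptor trimming. It is not the method used by the UEA toolkit, whose internals are not described here. The function name, the min_overlap parameter and the example sequences are chosen purely for illustration, and the sketch does not allow for sequencing errors in the adaptor.

```python
def trim_3prime_adaptor(read, adaptor, min_overlap=3):
    """Remove a 3' adaptor from a read using exact matching only."""
    # Case 1: the whole adaptor occurs in the read; keep what precedes it.
    position = read.find(adaptor)
    if position != -1:
        return read[:position]
    # Case 2: the read ends partway through the adaptor, so look for the
    # longest adaptor prefix that matches the end of the read.
    longest = min(len(adaptor) - 1, len(read))
    for overlap in range(longest, max(min_overlap, 1) - 1, -1):
        if read.endswith(adaptor[:overlap]):
            return read[:-overlap]
    # Case 3: no adaptor found; return the read unchanged.
    return read


# Hypothetical adaptor and reads.
adaptor = "TCGTATGCCGTCTTCTGCTTG"
print(trim_3prime_adaptor("ACGTACGT" + adaptor + "AAA", adaptor))
print(trim_3prime_adaptor("ACGTACGT" + adaptor[:10], adaptor))
```

A small min_overlap removes more partial adaptors but also trims more reads that merely happen to end with the first few adaptor bases.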
Currently, none of the tools listed here do so, but the facility may be introduced in WMD3. (TBC)