Silencing RNAs
Questions about some of the background areas related to Silencing RNAs.
Bioinformatics
Some basics of bioinformatics are provided here.
- What is Bioinformatics?
- How will Bioinformatics help me work with small RNA?
- Will I need to do programming for Bioinformatics?
- What books can I use to learn about Bioinformatics?
- I hear about Perl being used for Bioinformatics – what is it and how do I go about using this?
- I hear about R being used for Bioinformatics – what is it and how do I go about using this?
Tools and Data Formats
Questions related to the tools and data sets provided on this website.
- What are all these data format types?
- Can I use these tools for my sequencing data?
- What format does my data have to be in to use these tools?
- Can I use these tools for my microarray data?
- How do I transfer my data to this site?
- What is GEO/Gene Expression Omnibus?
- Can I use these tools for any species of organism?
- Can I analyse the output of this site in GBrowse?
- Does this site give me a size profile of my small RNAs?
- Does this site remove the adaptors from the small RNA reads?
- Does this site tell me the frequency of the first nucleotide?
Small RNAs
- What is RNA?
RNA stands for ribonucleic acid, a long-chain molecule made up of nucleotides
that operates within the cells of living organisms.
In general, cells have a nucleus, and
inside the nucleus is the DNA – a long, complex molecule containing thousands of genes.
Cells need to produce proteins, including the enzymes that allow
the cell to perform its function. The instructions for these proteins are contained in the genes in the DNA, inside the
nucleus. RNA provides the mechanism by which the instructions encoded in
the genes are carried to the rest of the cell, enabling the required proteins to
be produced, or synthesized.
A fuller scientific explanation is
available from the following sources.
- What are Eukaryotic Organisms?
Eukaryotic Organisms are organisms with cells containing nuclei.
More in-depth information is available from the following sites:
- What is Silencing RNA?
Silencing RNAs are RNA molecules that are involved in inhibiting gene expression:
they negate the instructions contained in the DNA, thereby
preventing the protein that would otherwise be produced from the gene sequence from being synthesized.
More in-depth information is available from the following sites:
- What is siRNA?
siRNA stands for small interfering RNA. Most notably, siRNA is involved in the RNA interference (RNAi) pathway,
where it interferes with the expression of a specific gene.
- What is miRNA?
miRNA stands for microRNA. miRNAs regulate gene expression.
- Why is so much emphasis placed on the species Arabidopsis thaliana?
Amongst other reasons, with about 157 million base pairs and five chromosomes,
Arabidopsis thaliana has one of the smallest genomes among plants.
It was the first plant genome to be sequenced, completed in 2000 by the Arabidopsis Genome Initiative.
Much work has been done to assign functions to its 27,000 genes and the 35,000 proteins they encode.
More information at Wikipedia.
Bioinformatics
- What is Bioinformatics?
Bioinformatics deals with the storage, transport, and analysis of information
encoded in the genes and how this information affects the universe of biological processes.
For more information, see the Wikipedia site.
- How will Bioinformatics help me work with small RNA?
DNA defines the synthesis of protein by way of an RNA intermediary. Documenting, controlling, and modifying this process
is the focus of bioinformatics. Sequencing data is one of the most abundant forms of data
available, and bioinformatics greatly helps in the storage and analysis of the large amounts of data generated.
- Will I need to do programming for Bioinformatics?
One of the aims of SIROCCO is to develop tools that assist biologists without their having to
write programs from scratch, and to try to entice holders of biological data to use Bioinformatics tools.
Links to these tools are listed here. Some
knowledge of languages like Perl and possibly R could also be useful.
- What books can I use to learn about Bioinformatics?
There are a number of books on the subject of Bioinformatics. Some of those available are listed here:
- Introduction to Bioinformatics – for Beginners focussing on theory
- Bioinformatics for Dummies – for Beginners with some theory but more practical guides on interacting with GEO for instance
- Beginning Perl for Bioinformatics – for those thinking about programming in the Perl language
- R Programming for Bioinformatics (Chapman & Hall/CRC Computer Science & Data Analysis) – for those thinking about using the R Statistical language
- I hear about Perl being used for Bioinformatics – what is it and how do I go about using this?
Perl is a programming language widely used in Bioinformatics. A good starting point, whether learning the language from
scratch or just using its bioinformatics package, BioPerl, is
http://www.bioperl.org/wiki/Getting_Started.
The following book is also suggested: Beginning Perl for Bioinformatics – for those thinking about programming in the Perl language.
- I hear about R being used for Bioinformatics – what is it and how do I go about using this?
R is a language and environment for statistical analysis. For more information visit the main page,
and do not forget the Bioconductor project, which provides R packages for Bioinformatics.
The following book is also suggested: R Programming for Bioinformatics (Chapman & Hall/CRC Computer Science & Data Analysis) – for those thinking about using the R statistical language.
Tools And Data
- What are all these data format types?
Different biologists and programmers have invented different ways to format sequence data in computer files,
so bioinformaticians must deal with a variety of formats. There are many such formats,
perhaps as many as 20 in regular use for DNA alone.
The two main data formats in use here are FASTA and GenBank, though the FASTQ file format is also used.
- FASTA – The most common sequence format, and a simple one.
The first line of a sequence entry consists of ‘>’, followed by an identifier, which contains
no whitespace. This can be followed by whitespace and a comment or description. This
first line is referred to as the comment or description line. One or more lines of sequence data
follow; these lines need not all be the same length.
- SOFT – A flexible file format used by GEO, so files downloaded in this format
may not be straightforward sequence data.
More information at the GEO site.
- Genetic Sequence Data Bank (GenBank) format – Includes a lot of information in addition to the sequence,
and is important because GenBank itself is an annotated collection of
all publicly available DNA sequences.
- FASTQ – Combines the sequence data found in a FASTA file with the corresponding quality data in one file.
There may also be differences depending on the type of machine used to generate the sequences.
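As a rough illustration of the FASTA and FASTQ layouts described above, the following Python sketch reads both formats using only the standard library. The function names read_fasta and read_fastq are made up for this example and are not part of any tool on this site. The FASTQ reader assumes the common four-lines-per-record layout (a header starting with '@', the sequence, a '+' line, then the quality string), which may not hold for output from every sequencing machine.

```python
import io
from typing import Iterator, TextIO, Tuple


def read_fasta(handle: TextIO) -> Iterator[Tuple[str, str, str]]:
    """Yield (identifier, description, sequence) for each FASTA entry."""
    identifier = None
    description = ""
    chunks = []
    for line in handle:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            # A new description line closes the previous entry, if any.
            if identifier is not None:
                yield identifier, description, "".join(chunks)
            header = line[1:].split(None, 1)
            identifier = header[0] if header else ""
            description = header[1] if len(header) > 1 else ""
            chunks = []
        elif identifier is not None:
            # Sequence lines may differ in length, so simply join them.
            chunks.append(line)
    if identifier is not None:
        yield identifier, description, "".join(chunks)


def read_fastq(handle: TextIO) -> Iterator[Tuple[str, str, str]]:
    """Yield (identifier, sequence, quality), assuming four lines per record."""
    while True:
        header = handle.readline()
        if not header:
            break  # end of file
        if not header.strip():
            continue  # skip blank lines between records
        if not header.startswith("@"):
            raise ValueError("FASTQ header line must start with '@'")
        sequence = handle.readline().strip()
        handle.readline()  # the '+' separator line is ignored
        quality = handle.readline().strip()
        fields = header[1:].split()
        yield (fields[0] if fields else ""), sequence, quality


if __name__ == "__main__":
    example = io.StringIO(">read1 example entry\nTGACAG\nAAGAGA\n")
    for name, desc, seq in read_fasta(example):
        print(name, desc, seq)
```

Both readers are generators, so a large file is processed one record at a time rather than being loaded into memory at once.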
- Can I use these tools for my sequencing data?
The answer depends on the species
being sequenced and on the format that the data is in.
See the next answer for more information on data formats.
- What format does my data have to be in to use these tools?
The file formats required by the sets of tools on the tools page which accept user input data
are discussed here.
- The UEA plant sRNA toolkit – This provides a number of tools that all work with
the FASTA format. However, there is a Sequence file pre-processing tool
(http://srna-tools.cmp.uea.ac.uk/cgi-bin/input_form.cgi?tool=adaptor) which reads in a
FASTA or FASTQ format file along with 5′ (optional) and 3′
adaptor sequences and creates a FASTA format file with the adaptors
removed, which can then be used with the other tools on the site.
- WMD tools – Although designed for plants in general, these tools have different
input file requirements depending on the species being processed. See the
WMD procedure page for more information.
- SHORE – An analysis suite for Illumina short read data.
We will be providing a tool to convert sequence data in SOFT format files
provided by the GEO site into the more
tool-friendly FASTA format in the first quarter of 2009.
- Can I use these tools for my microarray data?
Some microRNA (miRNA) datasets can be used with some of the tools in each of the toolsets listed on the site.
- How do I transfer my data to this site?
Data can be made available on this site via links to the GEO database if it has been made publicly available
there. Data can also be uploaded directly to our servers. Please note
that data on this site can be provided to the general public or restricted to SIROCCO partners only.
See our data upload HOWTO for more information.
- What is GEO/Gene Expression Omnibus?
The Gene Expression Omnibus (GEO) is a public repository that archives and freely distributes microarray and other forms of high-throughput data submitted by the scientific community. In addition to data storage, a collection of web-based interfaces and applications are available to help users query and download the experiments and gene expression patterns stored in GEO.
From a practical point of view this means sequence data can be accessed via this website. Be aware that
GEO uses the SOFT file format, which is designed to be flexible and to hold many different types of data.
Sequence data can be converted into the FASTA format; a tool to do this will be provided in Q1 2009.
Answers to further queries regarding GEO can be found here.
- Can I use these tools for any species of organism?
The sets of tools provided on the tools page which accept user input data
have their required file formats discussed here. So far these are all plant-related
tools, some specific to Arabidopsis thaliana.
- Can I analyse the output of this site in GBrowse?
Not without some leg work. GBrowse applications store data in the GFF3 format
(or GFF2 format databases for older installations).
Data from many of these tools are
provided as CSV files, and tools may have to be written to provide appropriate conversion.
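A conversion of the kind mentioned above might look something like the following Python sketch. The CSV layout produced by the tools is not specified here, so the column names chromosome, start, end, strand and name are hypothetical, as are the function name and the default source and feature type. The sketch also assumes the coordinates are already 1-based and inclusive, as GFF3 expects. What is standard is the GFF3 layout itself: a "##gff-version 3" header followed by nine tab-separated columns (seqid, source, type, start, end, score, strand, phase, attributes).

```python
import csv
import io
from urllib.parse import quote


def csv_to_gff3(csv_text: str, source: str = "example_tool",
                feature_type: str = "ncRNA") -> str:
    """Convert CSV rows (hypothetical column names) into GFF3 text."""
    lines = ["##gff-version 3"]
    for row in csv.DictReader(io.StringIO(csv_text)):
        # Percent-encode the name so characters such as ';' or '=' cannot
        # break the attributes column.
        attributes = "ID=" + quote(row["name"], safe="")
        lines.append("\t".join([
            row["chromosome"],  # seqid
            source,             # program or database that produced the feature
            feature_type,       # feature type
            row["start"],       # start (1-based, inclusive)
            row["end"],         # end (1-based, inclusive)
            ".",                # score: not available
            row["strand"],      # strand: '+', '-' or '.'
            ".",                # phase: only meaningful for coding features
            attributes,
        ]))
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    example = "chromosome,start,end,strand,name\nChr1,100,120,+,sRNA_1\n"
    print(csv_to_gff3(example), end="")
```

The sequence identifiers in the first column would need to match those used by the GBrowse installation that the file is loaded into.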
GFF3 format details are here.
- Does this site give me a size profile of my small RNAs?
Currently, none of the tools listed here do so. (TBC)
- Does this site remove the adaptors from the small RNA reads?
The UEA Plant sRNA toolkit contains a
Sequence file pre-processing tool which removes
adaptors.
- Does this site tell me the frequency of the first nucleotide?
Currently, none of the tools listed here do so, but the facility may be introduced in WMD3. (TBC)
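Until such a facility exists, both a size profile and a first-nucleotide frequency can be computed with a few lines of code once the reads have had their adaptors removed. The Python sketch below is only an illustration and is not a feature of this site. The function names are invented, and the reads are assumed to be plain sequence strings.

```python
from collections import Counter
from typing import Dict, Iterable


def size_profile(sequences: Iterable[str]) -> Counter:
    """Count how many reads there are of each length."""
    return Counter(len(seq) for seq in sequences)


def first_nucleotide_frequency(sequences: Iterable[str]) -> Dict[str, float]:
    """Return the fraction of reads starting with each nucleotide."""
    # Upper-case the first base so 'a' and 'A' are counted together;
    # empty reads are ignored.
    counts = Counter(seq[0].upper() for seq in sequences if seq)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {base: n / total for base, n in counts.items()}


if __name__ == "__main__":
    # Made-up reads, built by repetition so their lengths are easy to see.
    reads = ["T" + "G" * 20, "T" + "A" * 20, "A" + "C" * 23]
    for length, count in sorted(size_profile(reads).items()):
        print(length, count)
    print(first_nucleotide_frequency(reads))
```

Both functions count every read once. If the data has been collapsed to unique sequences with abundances, the counts would need to be weighted accordingly.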