SIROCCO Data and Bioinformatics

Frequently Asked Questions

Silencing RNAs

Questions about some of the background topics related to silencing RNAs.

Bioinformatics

Some basics of bioinformatics are provided here.

Tools and Data Formats

Questions related to the tools and data sets provided on this website.

Small RNAs

  1. What is RNA?
    RNA stands for ribonucleic acid, a long-chain molecule made up of nucleotides
    that is found in the cells of all living organisms.
    In general, cells have a nucleus, and
    inside the nucleus is the DNA, a long, complex molecule containing thousands of genes.
    Cells need to produce the proteins, including enzymes, that allow
    the cell to perform its function. The instructions for these proteins are contained in the genes in the DNA, inside the
    nucleus. RNA provides the mechanism by which the instructions encoded in
    the genes are carried out of the nucleus into the rest of the cell, where the required proteins are
    produced, or synthesized.
    A fuller scientific explanation is
    available from the following sources.

    1. Wikipedia
    2. Nobelprize site
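
    The flow of information described above, from DNA through RNA to protein, can be sketched in a few lines of Python. This is a simplified illustration rather than one of the tools on this site: it assumes the input is the coding strand of a gene with no introns, so that transcription amounts to replacing thymine (T) with uracil (U), and it translates the RNA with the standard genetic code up to the first stop codon. The function names and the example sequence are illustrative.

```python
# Standard genetic code: codons enumerated with the bases in the order
# U, C, A, G at each of the three positions; "*" marks a stop codon.
BASES = "UCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODONS = [a + b + c for a in BASES for b in BASES for c in BASES]
CODON_TABLE = dict(zip(CODONS, AMINO_ACIDS))


def transcribe(coding_strand: str) -> str:
    """Simplified transcription: the RNA copy of the coding strand has U in place of T."""
    return coding_strand.upper().replace("T", "U")


def translate(rna: str) -> str:
    """Translate RNA codon by codon, stopping at the first stop codon."""
    protein = []
    for i in range(0, len(rna) - 2, 3):
        amino_acid = CODON_TABLE[rna[i:i + 3]]
        if amino_acid == "*":
            break
        protein.append(amino_acid)
    return "".join(protein)


dna = "ATGGCCATTGTAATGGGCCGCTGAAAGGGTGCCCGATAG"
rna = transcribe(dna)
print(rna)             # AUGGCCAUUGUAAUGGGCCGCUGAAAGGGUGCCCGAUAG
print(translate(rna))  # MAIVMGR
```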

  2. What are Eukaryotic Organisms?
    Eukaryotic organisms are organisms whose cells contain a nucleus enclosed within a membrane; plants, animals and fungi are all eukaryotes.
    More in-depth information is available from the following sites:

    1. Wikipedia entry
    2. Cell basics

  3. What is Silencing RNA?
    Silencing RNAs are RNA molecules involved in inhibiting gene expression: they negate
    the instructions contained in a gene in the DNA, thereby
    preventing the protein that the gene encodes from being synthesized.
    More in-depth information is available from the following sites:

    1. History and Overview of RNA Interference and Gene Silencing
    2. Google Books link to ‘RNA Silencing’ by Esra Galun, Eithan Galun

  4. What is siRNA?
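    siRNA stands for small interfering RNA: short, double-stranded RNA molecules, typically 20 to 25 base pairs long. Most notably, siRNA is involved in the RNA interference (RNAi) pathway,
    where it interferes with the expression of a specific gene whose sequence is complementary to it.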

    1. Wikipedia entry
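
    Because an siRNA silences genes whose sequence is complementary to it, a first approximation to finding its target site is to search an mRNA for the reverse complement of the siRNA guide strand. The sketch below is a hypothetical illustration: it assumes perfect Watson-Crick pairing (A with U, G with C), ignores G:U wobble pairs and everything else the RNAi machinery does, and uses made-up sequences.

```python
# Watson-Crick pairs in RNA: A pairs with U, G pairs with C.
COMPLEMENT = str.maketrans("ACGU", "UGCA")


def reverse_complement(rna: str) -> str:
    """Return the reverse complement of an RNA sequence."""
    return rna.upper().translate(COMPLEMENT)[::-1]


def find_target_sites(guide: str, mrna: str) -> list[int]:
    """Return 0-based positions where mrna is perfectly complementary to guide."""
    site = reverse_complement(guide)
    target = mrna.upper()
    positions = []
    start = target.find(site)
    while start != -1:
        positions.append(start)
        start = target.find(site, start + 1)
    return positions


# Made-up sequences for illustration only.
guide = "UUCGAAGUACUCAGCGUAAGU"
mrna = "GGAUCCACUUACGCUGAGUACUUCGAAAAUUGG"
print(find_target_sites(guide, mrna))  # [6]
```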

  5. What is miRNA?
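    miRNA stands for microRNA: small, single-stranded non-coding RNA molecules, about 22 nucleotides long, that regulate gene expression by pairing with complementary sequences in messenger RNA (mRNA), leading to cleavage of the mRNA or repression of its translation.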

    1. Wikipedia entry
    2. miRNA Resource Site
    3. miRNA Animation
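
    Plant miRNAs usually pair with their target mRNAs with only a few mismatches, so a simple extension of an exact search is to slide the reverse complement of the miRNA along the mRNA and count mismatches. The sketch below does this; the function name, the threshold of 3 mismatches and the sequences are illustrative assumptions, G:U wobble pairs are counted as mismatches, and real target prediction tools use more refined, position-dependent scoring.

```python
# Watson-Crick pairs in RNA; G:U wobble pairs are not modelled here.
COMPLEMENT = str.maketrans("ACGU", "UGCA")


def reverse_complement(rna: str) -> str:
    """Return the reverse complement of an RNA sequence."""
    return rna.upper().translate(COMPLEMENT)[::-1]


def scan_for_targets(mirna: str, mrna: str, max_mismatches: int = 3) -> list[tuple[int, int]]:
    """Return (position, mismatches) for each window of mrna that pairs with
    the miRNA with at most max_mismatches mismatched bases (0-based positions)."""
    site = reverse_complement(mirna)
    mrna = mrna.upper()
    hits = []
    for start in range(len(mrna) - len(site) + 1):
        window = mrna[start:start + len(site)]
        mismatches = sum(1 for a, b in zip(site, window) if a != b)
        if mismatches <= max_mismatches:
            hits.append((start, mismatches))
    return hits


# Made-up sequences: the mRNA contains a site with two mismatches.
mirna = "UUCGAAGUACUCAGCGUAAGU"
mrna = "GGAUCCAGUUACGCUAAGUACUUCGAAAAUUGG"
print(scan_for_targets(mirna, mrna))
```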

  6. Why is so much emphasis placed on the species
    Arabidopsis thaliana?

    Amongst other reasons, Arabidopsis thaliana has one of the smallest genomes among plants,
    with about 157 million base pairs and five chromosomes.
    It was the first plant genome to be sequenced, completed in 2000 by the Arabidopsis Genome Initiative.
    Much work has been done to assign functions to its roughly 27,000 genes and the 35,000 proteins they encode.
    More information is available at Wikipedia.

Bioinformatics

  1. What is Bioinformatics?
    Bioinformatics deals with the storage, retrieval, and analysis of biological information, such as the information
    encoded in genes, and with how this information relates to biological processes.

    For more information, see the Wikipedia site.

  2. How will Bioinformatics help me work with small RNAs?
    DNA directs the synthesis of proteins by way of an RNA intermediary. Documenting and analysing this process
    is a central focus of bioinformatics. Sequencing data is one of the most abundant forms of data
    available, and bioinformatics tools are essential for storing and analysing the large amounts of data that sequencing generates.
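
    As a small example of this kind of processing: small RNA sequencing libraries often contain many identical reads, and a common first step is to collapse them into unique sequences with counts, which greatly reduces the data to be stored and analysed. The sketch below assumes the reads are already adaptor-trimmed and held as a list of strings; the function name and reads are illustrative, and it is not one of the tools on this site.

```python
from collections import Counter


def collapse_reads(reads: list[str]) -> list[tuple[str, int]]:
    """Collapse identical reads into (sequence, count) pairs, most abundant first."""
    counts = Counter(read.strip().upper() for read in reads)
    return counts.most_common()


# Made-up, adaptor-trimmed reads.
reads = ["TGACAGAAGAGAGTGAGCAC", "TCGGACCAGGCTTCATTCCCC", "TGACAGAAGAGAGTGAGCAC"]
for sequence, count in collapse_reads(reads):
    print(f"{sequence}\t{count}")
```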
  3. Will I need to do programming for Bioinformatics?
    Not necessarily. One of the aims of SIROCCO is to develop tools that assist biologists without their having to
    write programs from scratch, and to encourage holders of biological data to use Bioinformatics tools. Links to these tools are listed here. Some
    knowledge of languages such as Perl and possibly R could also be useful.
  4. What books can I use to learn about Bioinformatics?
    There are a number of books on the subject of Bioinformatics. Some of those available are listed here:

  5. I hear about Perl being used for Bioinformatics – what is it and how do I go about
    using it?

    Perl is a general-purpose programming language and one of the most widely used languages in Bioinformatics. A good starting point, whether learning the language from
    scratch or just using its Bioinformatics modules (BioPerl), is
    http://www.bioperl.org/wiki/Getting_Started.
    The following book is also suggested for those thinking about programming in the Perl language: Beginning Perl for Bioinformatics.

  6. I hear about R being used for Bioinformatics – what is it and
    how do I go about using it?

    R is a language and environment for statistical computing and graphics. For more information, visit the main R page,
    and don’t forget the Bioconductor project, which provides R packages for the analysis of genomic data.
    The following book is also suggested for those thinking about using the R statistical language: R Programming for Bioinformatics (Chapman & Hall/CRC Computer Science & Data Analysis).

Tools and Data

  1. What are all these data format types?
    Over the years, biologists and programmers have devised many different ways to format sequence data in computer files,
    so bioinformaticians must deal with a variety of formats: there are many,
    perhaps as many as 20 in regular use for DNA alone.
    The two main data formats used on this site are FASTA and GenBank, although the FASTQ format is also used.

    • FASTA – A simple and very common sequence format.
      The first line of a sequence entry consists of ‘>’, followed by an identifier that contains
      no whitespace. This can be followed by whitespace and a comment or description. This
      first line is referred to as the comment or description line. One or more sequence data lines
      follow. The sequence data lines need not all be the same length.
    • SOFT (Simple Omnibus Format in Text) files use a flexible format defined by GEO, so files downloaded in this format
      may not contain straightforward sequence data.
      More information is available at the GEO site.
    • Genetic Sequence Data Bank (GenBank) format – includes a lot of information in addition to the sequence,
      and is important because GenBank itself is a collection of
      all publicly available DNA sequences, with annotations.
    • FASTQ – combines the sequence data of a FASTA file and the corresponding quality data in one file.
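
    A minimal FASTA reader following the description above might look like this. It is a sketch rather than a robust parser: the function name is hypothetical, lines before the first ‘>’ are ignored, and the example records are made up.

```python
def read_fasta(lines):
    """Yield (identifier, description, sequence) tuples from FASTA-formatted lines.

    The identifier is the text after '>' up to the first whitespace; the rest
    of the line is the description. Sequence lines of any length are joined.
    Lines before the first '>' are ignored.
    """
    identifier, description, chunks = None, "", []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if identifier is not None:
                yield identifier, description, "".join(chunks)
            parts = line[1:].split(maxsplit=1)
            identifier = parts[0] if parts else ""
            description = parts[1] if len(parts) > 1 else ""
            chunks = []
        elif identifier is not None:
            chunks.append(line)
    if identifier is not None:
        yield identifier, description, "".join(chunks)


example = """>read_1 example small RNA
TCGGACCAGG
CTTCATTCCCC
>read_2
TGACAGAAGAGAGTGAGCAC
""".splitlines()

for record in read_fasta(example):
    print(record)
```

    To read a file, pass an open file handle instead of a list of lines, for example inside `with open("reads.fa") as handle:` (the file name is hypothetical). For real work, Biopython's `Bio.SeqIO.parse(handle, "fasta")` is a well-tested alternative.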

    In addition, files may differ depending on the type of sequencing machine, and the software version, used to generate the sequences.
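
    One such difference is how FASTQ quality scores are encoded: each quality character stands for a Phred score equal to its ASCII code minus an offset, which is 33 in the Sanger convention but 64 in files from some older Illumina pipeline versions. The sketch below reads simple four-line FASTQ records and makes the offset a parameter; the function name is hypothetical, wrapped multi-line records are not handled, and the example record is made up.

```python
def read_fastq(lines, offset=33):
    """Yield (identifier, sequence, qualities) from four-line FASTQ records.

    Each quality character is decoded as a Phred score, ord(char) - offset.
    Assumes every record occupies exactly four lines (no line wrapping).
    """
    lines = [line.strip() for line in lines if line.strip()]
    if len(lines) % 4 != 0:
        raise ValueError("Expected a multiple of four non-empty lines")
    for i in range(0, len(lines), 4):
        header, sequence, separator, quality = lines[i:i + 4]
        if not header.startswith("@") or not separator.startswith("+"):
            raise ValueError(f"Malformed FASTQ record number {i // 4 + 1}")
        if len(sequence) != len(quality):
            raise ValueError(f"Sequence and quality lengths differ in record {i // 4 + 1}")
        parts = header[1:].split()
        identifier = parts[0] if parts else ""
        yield identifier, sequence, [ord(char) - offset for char in quality]


example = ["@read_1 example", "TGACAGAAGAGAGTGAGCAC", "+", "I" * 18 + "##"]
for identifier, sequence, qualities in read_fastq(example):
    print(identifier, sequence, qualities[:3], qualities[-2:])  # read_1 TGACAGAAGAGAGTGAGCAC [40, 40, 40] [2, 2]
```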

  2. Can I use these tools for my sequencing data?
    That depends on the species being
    sequenced and on the format of the data.
    Read the answer below for more information on data formats.

  4. What format does my data have to be in to use these tools?
    The sets of tools provided on the tools page which accept user input data
    have their required file formats discussed here.

    • The UEA plant sRNA toolkit – This provides a number of tools that all work with
      the FASTA format. However, there is also a Sequence file pre-processing tool
      (http://srna-tools.cmp.uea.ac.uk/cgi-bin/input_form.cgi?tool=adaptor) which reads in a
      FASTA or FASTQ format file along with 5′ (optional) and 3′
      adaptor sequences and creates a FASTA format file with the adaptors
      removed, which can then be used with the other tools on the site.
    • WMD tools – although designed for plants in general, these tools have different
      input file requirements depending on the species being processed. See the
      WMD procedure page for more information.
    • SHORE is an analysis suite for Illumina short read data.

    We will be providing a tool to convert sequence data in SOFT format files
    from the GEO site into the more
    tool-friendly FASTA format in the first quarter of 2009.
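
    Until that tool is available, the general approach can be sketched as follows. A SOFT sample record holds its data table between `!sample_table_begin` and `!sample_table_end` lines, as tab-delimited rows under a header row. The sketch assumes a hypothetical column named `SEQUENCE` holding the small RNA sequences and, optionally, a column of read counts; column names vary between submissions, so check the `#` column descriptions in your own file. The function name and the record naming scheme (`seq_1_x12` for the first sequence with 12 reads) are also illustrative choices, and this is not the tool promised above.

```python
def soft_table_to_fasta(lines, sequence_column="SEQUENCE", count_column=None):
    """Convert the data table of a GEO SOFT sample into FASTA text.

    Assumes the table lies between '!sample_table_begin' and '!sample_table_end',
    is tab-delimited and starts with a header row. The default column name is
    hypothetical: check the '#' column descriptions in your own file.
    """
    records = []
    header = None
    in_table = False
    for line in lines:
        line = line.rstrip("\r\n")
        if line.startswith("!sample_table_begin"):
            in_table, header = True, None
        elif line.startswith("!sample_table_end"):
            in_table = False
        elif in_table and line.strip():
            fields = line.split("\t")
            if header is None:
                # The first row of the table holds the column names.
                header = fields
                seq_index = header.index(sequence_column)
                count_index = header.index(count_column) if count_column else None
                continue
            record_id = f"seq_{len(records) + 1}"
            if count_index is not None:
                record_id += f"_x{fields[count_index]}"
            records.append(f">{record_id}\n{fields[seq_index]}")
    return "".join(record + "\n" for record in records)


# A made-up SOFT sample; real GEO files contain many more header lines.
soft = """^SAMPLE = GSM_EXAMPLE
!Sample_title = hypothetical small RNA library
#SEQUENCE = small RNA sequence
#COUNT = number of reads
!sample_table_begin
SEQUENCE\tCOUNT
TGACAGAAGAGAGTGAGCAC\t12
TCGGACCAGGCTTCATTCCCC\t5
!sample_table_end
""".splitlines()

print(soft_table_to_fasta(soft, count_column="COUNT"), end="")
```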

  5. Can I use these tools for my microarray data?
    Some microRNA (miRNA) datasets can be used with some of the tools in each of the toolsets listed on the site.

  6. How do I transfer my data to this site?
    Data can be made available on this site via links to the GEO database if it has been made publicly available
    there. Data can also be uploaded directly to our servers. Please note
    that data on this site can either be made available to the general public or be restricted to SIROCCO partners only.
    See our data upload HOWTO for more information.

  7. What is GEO/Gene Expression Omnibus?
    The Gene Expression Omnibus (GEO) is a public repository that archives and freely distributes microarray and other forms of high-throughput data submitted by the scientific community. In addition to data storage, a collection of web-based interfaces and applications is available to help users query and download the experiments and gene expression patterns stored in GEO.
    From a practical point of view, this means sequence data can be accessed via the GEO website. Be aware that
    GEO uses the SOFT file format, which is designed to be flexible and to hold many different types of data.
    Sequence data can be converted into the FASTA format; a tool to do this will be provided in Q1 2009.

    Answers to further questions about GEO can be found here.

  8. Can I use these tools for any species of organism?
    The sets of tools provided on the tools page which accept user input data
    are listed below with the species they support. So far these are all plant-related
    tools, some specific to Arabidopsis thaliana.

    • The UEA plant sRNA toolkit – processes plant RNA.
    • WMD tools – plants are supported.
    • SHORE – works with Arabidopsis thaliana.
    • At-TAX – works with Arabidopsis thaliana.

  9. Can I analyse the output of this site in GBrowse?
    Not without some legwork. GBrowse applications store data in the GFF3 format
    (or in GFF2 format databases for older installations).
    Data from many of these tools are
    provided as CSV files, so conversion tools may have to be written.
    GFF3 format details are here.
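
    As an illustration of the kind of conversion involved, the sketch below turns CSV rows into GFF3 lines: nine tab-separated columns (seqid, source, type, start, end, score, strand, phase and attributes) after a `##gff-version 3` header. The CSV column names, the function name, the source name `sRNA_tool` and the use of the Sequence Ontology term `ncRNA` as the feature type are all assumptions for illustration; the real output of each tool will need its own mapping. It also assumes the CSV uses 1-based, inclusive coordinates, as GFF3 does.

```python
import csv
import io


def csv_to_gff3(csv_text, source="sRNA_tool", feature_type="ncRNA"):
    """Convert CSV rows describing small RNA loci into GFF3 text.

    Assumes hypothetical columns chromosome, start, end, strand, sequence and
    abundance, with 1-based inclusive coordinates (as GFF3 requires).
    """
    output = ["##gff-version 3"]
    for number, row in enumerate(csv.DictReader(io.StringIO(csv_text)), start=1):
        # Custom attribute names start lowercase; capitalised ones are reserved by GFF3.
        attributes = f"ID=srna_{number};Name={row['sequence']};abundance={row['abundance']}"
        columns = [row["chromosome"], source, feature_type, row["start"], row["end"],
                   ".", row["strand"], ".", attributes]
        output.append("\t".join(columns))
    return "\n".join(output) + "\n"


csv_text = """chromosome,start,end,strand,sequence,abundance
Chr1,1000,1019,+,TGACAGAAGAGAGTGAGCAC,12
"""
print(csv_to_gff3(csv_text), end="")
```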

  10. Does this site give me a size profile of my small RNAs?
    Currently, none of the tools listed here do so. (TBC)
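
    If you need a size profile in the meantime, it is straightforward to compute yourself. In plants, small RNA classes differ in length (most miRNAs are 21 nt, while many siRNAs are 24 nt), so a length histogram is a useful first check on a library. The sketch below assumes adaptor-trimmed reads held as strings; the function name and the 18 to 26 nt window are illustrative choices, and reads outside the window are simply not reported. Counting collapsed unique sequences instead of all reads gives a non-redundant profile.

```python
from collections import Counter


def size_profile(reads, min_length=18, max_length=26):
    """Count reads of each length from min_length to max_length inclusive."""
    lengths = Counter(len(read.strip()) for read in reads)
    return {length: lengths[length] for length in range(min_length, max_length + 1)}


# Made-up reads; in practice use the adaptor-trimmed reads of a library.
reads = ["A" * 21, "C" * 21, "G" * 24, "T" * 19]
for length, count in size_profile(reads).items():
    print(f"{length:>2} nt {count:>3} " + "#" * count)
```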

  11. Does this site remove the adaptors from the small RNA reads?
    The UEA Plant sRNA toolkit contains a Sequence file pre-processing tool which removes
    adaptors.
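
    The algorithm used by the UEA tool is not described here; purely as a rough illustration, a 3′ adaptor can be removed by exact matching as sketched below. If the whole adaptor occurs in the read, the read is cut there; otherwise a partial adaptor of at least `min_overlap` bases at the 3′ end is removed. Real reads contain sequencing errors, so production tools usually tolerate mismatches. The function name, the `min_overlap` default and the adaptor sequence are example assumptions (substitute the 3′ adaptor used in your own library preparation); reads with no detectable adaptor return `None`, reflecting a common choice to discard them.

```python
def trim_3prime_adaptor(read, adaptor, min_overlap=6):
    """Remove a 3' adaptor from a read using exact matching only.

    If the whole adaptor occurs in the read, the read is cut at its first
    occurrence. Otherwise, if the read ends with at least min_overlap bases
    from the start of the adaptor, those bases are removed. Returns None when
    no adaptor is found.
    """
    position = read.find(adaptor)
    if position != -1:
        return read[:position]
    # Try the longest partial adaptor first; an overlap of zero is never tested.
    for overlap in range(min(len(adaptor) - 1, len(read)), max(min_overlap, 1) - 1, -1):
        if read.endswith(adaptor[:overlap]):
            return read[:len(read) - overlap]
    return None


adaptor = "TCGTATGCCGTCTTCTGCTTG"  # example only: use your library's 3' adaptor
print(trim_3prime_adaptor("TGACAGAAGAGAGTGAGCAC" + "TCGTATGCCGTC", adaptor))  # TGACAGAAGAGAGTGAGCAC
print(trim_3prime_adaptor("TCGGACCAGGCTTCATTCCCC" + adaptor, adaptor))  # TCGGACCAGGCTTCATTCCCC
print(trim_3prime_adaptor("ACGTACGTACGT", adaptor))  # None
```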

  12. Does this site tell me the frequency of the first nucleotide?
    Currently, none of the tools listed here do so, but the facility may be introduced in WMD3 (TBC).
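
    In the meantime, the first-nucleotide frequency is simple to compute from a list of reads. The sketch below reports the fraction of reads starting with each observed nucleotide, treating U as T so that RNA and DNA notation give the same result; the function name and reads are illustrative.

```python
from collections import Counter


def first_nucleotide_frequencies(reads):
    """Return the fraction of reads that start with each observed nucleotide.

    U is treated as T so that RNA and DNA notation give the same result.
    """
    counts = Counter(read.strip()[0].upper().replace("U", "T") for read in reads if read.strip())
    total = sum(counts.values())
    return {base: counts[base] / total for base in sorted(counts)}


reads = ["TGACAG", "TCGGAC", "AGGCTT", "UUUACG"]
print(first_nucleotide_frequencies(reads))  # {'A': 0.25, 'T': 0.75}
```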
