A FASTA Maker – SOFT to FASTA file conversion utility

Many toolkits (for example, the UEA Small Plant and Animal toolkit) require data to be provided in FASTA format. However, sequence data sets from GEO often come in the SOFT format and must be converted before such tools can use them.

This utility operates in two modes: it can either transform local SOFT files containing sequence data into FASTA data or, given an accession number, fetch a dataset from GEO and convert it. It can also create a ZIP version of the newly created FASTA file and offers a choice of redundant or non-redundant FASTA formats.

Starting the Application (from the Web)

The tool requires Java to be installed on your machine. If it is not installed, it can be downloaded from the main Java site. Java is available for Microsoft Windows, Apple OS X and Linux.

Click here to launch the A FASTA Maker utility.

If Java is installed, the application can be run by clicking on the link above. You can also launch the application from a command prompt by typing “javaws http://www.sirocco-project.eu/toolsite/afastamaker/launch.jnlp”.

To start the application without having to revisit this page, you can save the above file to your Desktop (right-click the link and choose Save As).

Using The Application

A FASTA Maker can either transform local SOFT files containing sequence data into FASTA data or, given an accession number, fetch a dataset from GEO. Choose the tab appropriate to your source of sequence data.

From A SOFT file

A Fasta Maker screenshot (source from file interface)

To convert from a local file, ensure the ‘From Soft File’ option has been selected.

Choose your input file. A preview of the input file will appear in the Preview area. Check this to ensure that it is the file you require and that the lines of sequence data are of the format:

SEQUENCE      n

where SEQUENCE consists of the letters A, C, T, G or U and n is a number representing count data. If the sequence data is not the first item on the line, this program cannot be used. If there are multiple columns of digits after the sequence, you may need to adjust the ‘Column With FASTA Counts’ setting. See the Conversion Options section for more information. After you have chosen your output file, you can proceed with the conversion.
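
The line format described above can also be checked outside the application. The following is a minimal Python sketch, not A FASTA Maker's own code: the function name parse_data_line is hypothetical, and it assumes upper-case sequences, whitespace-separated columns and whole-number counts.

```python
import re

# Assumption: a sequence is one or more of the upper-case letters A, C, G, T, U.
SEQUENCE_PATTERN = re.compile(r"[ACGTU]+")


def parse_data_line(line):
    """Return (sequence, [numeric columns]) or None if the line does not fit.

    Hypothetical helper for illustration only.
    """
    fields = line.split()
    if len(fields) < 2:
        # A usable line needs a sequence plus at least one count column.
        return None
    sequence, rest = fields[0], fields[1:]
    if not SEQUENCE_PATTERN.fullmatch(sequence):
        # The sequence must be the first item on the line.
        return None
    try:
        columns = [int(value) for value in rest]
    except ValueError:
        # Assumption: every column after the sequence is a whole number.
        return None
    return sequence, columns
```

Lines for which this returns None (for example SOFT header lines) are the ones that do not match the SEQUENCE n layout; a returned list with more than one entry suggests the ‘Column With FASTA Counts’ setting may need attention.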

From GEO with an Accession Number

To convert a dataset fetched from GEO, ensure the ‘From GEO’ option has been selected.

Enter the accession code of the data, complete with the GSM prefix, into the Accession text field. Press the Preview GEO Data button; if the program can connect to the GEO website successfully, text should appear in the Preview area. Check this to ensure that it is the file you require. In particular, make sure it is a SOFT file and not the source of an HTML page, which would indicate that the accession number is wrong. A successful preview should look something like the screenshot below.

A FASTA Maker previewing GEO Data
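
The same SOFT-versus-HTML check can be approximated in a script. This is a rough Python sketch under stated assumptions, not a description of how A FASTA Maker itself talks to GEO: it assumes GEO's acc.cgi query page accepts the acc, targ, form and view parameters shown, and the function names are hypothetical.

```python
from urllib.parse import urlencode
from urllib.request import urlopen

# Assumption: GEO serves plain-text SOFT records from this query page.
GEO_QUERY_URL = "https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi"


def build_geo_url(accession):
    """Build a text-format query URL for a GSM accession (hypothetical helper)."""
    accession = accession.strip().upper()
    if not (accession.startswith("GSM") and accession[3:].isdigit()):
        raise ValueError("expected an accession such as GSM followed by digits")
    query = urlencode(
        {"acc": accession, "targ": "self", "form": "text", "view": "full"}
    )
    return f"{GEO_QUERY_URL}?{query}"


def looks_like_soft(text):
    """Return True if the preview text resembles SOFT rather than HTML."""
    stripped = text.lstrip()
    if not stripped:
        return False
    start = stripped[:200].lower()
    if start.startswith("<!doctype") or start.startswith("<html"):
        return False
    # SOFT lines begin with '^' (entity), '!' (attribute) or '#' (column note).
    return stripped[0] in "^!#"


def fetch_preview(accession, max_bytes=2000):
    """Download the first part of the record for inspection (needs network)."""
    with urlopen(build_geo_url(accession)) as response:
        return response.read(max_bytes).decode("utf-8", errors="replace")
```

Calling looks_like_soft(fetch_preview(...)) with your own accession gives a quick indication of whether the accession resolved to a SOFT record; it does not replace inspecting the preview by eye.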

Before converting, you should also verify that the lines of sequence data (after the initial header) are of the format:

SEQUENCE      n

where SEQUENCE consists of the letters A, C, T, G or U and n is a number representing count data. If the sequence data is not the first item on the line, this program cannot be used. If there are multiple columns of digits after the sequence, you may need to adjust the ‘Column With FASTA Counts’ setting. See the Conversion Options section for more information. After you have chosen your output file, you can proceed with the conversion.

Conversion Options

The following options are available to the user.

  1. Redundant format – By default A FASTA Maker creates non-redundant files, in which each sequence listed in the SOFT file appears only once regardless of its count. Some programs assess count data by counting occurrences, so each sequence needs to appear in the file exactly as many times as its count. Selecting this option will produce these much larger ‘redundant’ FASTA files.
  2. Create ZIP Archive – If the file is to be uploaded or emailed somewhere, selecting this option creates a smaller ZIP archive in addition to the generated FASTA file. The archive has the same file name as the FASTA file with ‘.zip’ added at the end.
  3. Column With FASTA Counts – The SOFT specification is a container for data, which means the type of content may vary between SOFT files. A FASTA Maker assumes that the first column after the sequence holds the count data. Some SOFT files have the count data in another column, for example column 2. The preview pane allows you to inspect the file values and set the column as required.
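
To illustrate how the three options interact, here is a small Python sketch. It is not A FASTA Maker's implementation: the function names are hypothetical, the FASTA header layouts (`>SEQUENCE(count)` for non-redundant and `>SEQUENCE_i` for redundant output) are assumptions chosen for the example, and column numbers are taken to start at 1 for the first column after the sequence.

```python
import zipfile
from pathlib import Path


def soft_rows_to_fasta(rows, count_column=1, redundant=False):
    """Turn (sequence, [numeric columns]) rows into FASTA text.

    count_column is 1-based: 1 means the first column after the sequence.
    Header layouts are assumptions made for this sketch.
    """
    lines = []
    for sequence, columns in rows:
        count = columns[count_column - 1]
        if redundant:
            # Redundant: write the sequence once per counted occurrence.
            for copy in range(1, count + 1):
                lines.append(f">{sequence}_{copy}")
                lines.append(sequence)
        else:
            # Non-redundant: write the sequence once, keeping the count
            # in the header.
            lines.append(f">{sequence}({count})")
            lines.append(sequence)
    return "".join(f"{line}\n" for line in lines)


def write_fasta(path, fasta_text, make_zip=False):
    """Write the FASTA file and, optionally, '<name>.zip' next to it."""
    path = Path(path)
    path.write_text(fasta_text)
    if not make_zip:
        return path
    zip_path = Path(str(path) + ".zip")
    with zipfile.ZipFile(zip_path, "w", zipfile.ZIP_DEFLATED) as archive:
        archive.write(path, arcname=path.name)
    return zip_path
```

With rows such as `[("ACGT", [2, 5])]`, the default settings yield one record for ACGT, `redundant=True` yields two, and `count_column=2` reads the count as 5 instead of 2, which is why the redundant output grows so quickly for high-count sequences.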

Issues and Feature Requests

If you have any issues with A FASTA Maker or feature requests for future versions of this software please contact the author Rishi Nag at rn202@cam.ac.uk