#########################
How to cluster MS/MS data
#########################
This tutorial explains how to cluster MS/MS data using the
`spectra-cluster-cli`_ command line tool.
.. _spectra-cluster-cli: https://github.com/spectra-cluster/spectra-cluster-cli
Preparing your data
===================
The peak list files must be present in the **MGF format**.
Use `ProteoWizard's msconvert`_ tool to convert
your peak list data from other formats. For raw files,
do not forget to enable the **"peak picking"** filter.
.. _ProteoWizard's msconvert: http://proteowizard.sourceforge.net/
For this tutorial, let's assume that your data looks like this::
C:\ms_data\
sample_1.mgf
sample_2.mgf
sample_3.mgf
Searching your data
===================
Many of the spectra-cluster toolsuite's analysis tools work with
identification data.
To add identification data, first search the above created MGF files with
the search engine of your choice (currently supported are MSGF+,
X!Tandem, MSAmanda, and Scaffold).
**Important:** You must use the above created MGF files as input for your
search engine. Otherwise the identification data cannot be mapped to the
spectra correctly.
If you use **X!Tandem** you must enable the output in the mzIdentML format
by adding the following option to your XML configuration file::
yes
After this step, your files should be prepared similar to this::
C:\ms_data\
sample_1.mgf
sample_1.mzid
sample_2.mgf
sample_2.mzid
sample_3.mgf
sample_3.mzid
Merging identification data
===========================
For the spectra-cluster pipeline tools to integrate your identification data
with the clustering results the identification data needs to be merged with
your MGF files.
This is done using the `mgf_search_result_annotator`_ tool which you can
download as part of the `spectra-cluster-py (download link)`_ tools.
For this example, we assume that you have copied the
*mgf_search_result_annotator.exe* into the directory containing your data.
The `mgf_search_result_annotator`_ tool is a command line tool. To use it,
you first need to open the command line. On Windows, for example, press the
[Windows Key] + [r], enter ``cmd`` and press [Enter].
Next, you have to navigate to the directory containing your search results. In
our example this would look like this::
C:\Documents and Settings\User> cd \
C:\>cd ms_data
C:\ms_data>
Now, to combine your search results with your peak list files you have to
execute the following command (**Note:** you have to adapt the format
to your search engine. For more information see the
`mgf_search_result_annotator`_ documentation)::
C:\ms_data>mgf_search_result_annotator.exe --format MSGF+ --input sample_1.mgf --search sample_1.mzid --output sample_1_annotated.mgf --fdr 0.01 --decoy_string "DECOY"
This command has to be launched for every file (simply adapt the input filename, search result filename, and output name).
Running the clustering
======================
.. warning::
The spectra-cluster-gui is currently out of date. To get the latest version of the spectra-cluster
algorithm, please use the spectra-cluster-cli tool or the
:doc:`Proteome Discoverer node `.
Running the actual clustering job is most likely the easiest job once the
MGF files are created.
For this tutorial, we will use the `spectra-cluster-gui`_ to run the
spectra-cluster algorithm.
You can download the latest release of the `spectra-cluster-gui`_ `here`_. In able to launch
the `spectra-cluster-gui`_ you need to have `Java`_ installed on your computer.
.. _spectra-cluster-gui: https://github.com/spectra-cluster/spectra-cluster-gui
.. _Java: https://www.java.com
Once you have downloaded the `spectra-cluster-gui`_ tool, simply extract the zip file
into any folder and double-click the ``spectra-cluster-gui-[VERSION].jar`` file (**Note:**
[Version] will depend on the current version of the `spectra-cluster-gui`_ tool).
.. _here: https://github.com/spectra-cluster/spectra-cluster-gui/releases
.. _mgf_search_result_annotator: http://spectra-cluster-py.readthedocs.io/en/latest/tools/mgf_search_result_annotator.html
.. _spectra-cluster-py (download link): https://github.com/spectra-cluster/spectra-cluster-py/
After launching the tool, you need to select "Cluster new dataset" to cluster your files.
.. image:: ../_static/spectra-cluster-gui_screen1.png
Next, simply select the **annotated** MGF files as input files for the clustering.
.. image:: ../_static/spectra-cluster-gui_screen2.png
Clustering settings
~~~~~~~~~~~~~~~~~~~
The default values set in this screen should be working for the vast majority of datasets. The two
values that **should be adapted** are:
* Precursor tolerance: Set this value to the precursor tolerance that you would use for your search
* Fragment tolerance: Again, set this value to the fragment ion tolerance you would use for your search
**Note**: If you used a labelled approach you also need to select the appropriate reporter ion type for the
"Remove reporter ion peaks" option.
.. image:: ../_static/spectra-cluster-gui_screen3.png
Launching the clustering
~~~~~~~~~~~~~~~~~~~~~~~~
Simply select where your outputfile should be saved. All other values can generally be left at their default.
.. image:: ../_static/spectra-cluster-gui_screen4.png