MS-DIAL tutorial

Last edited in March. 12, 2020

ABSTRACT

MS-DIAL was launched as a universal program for untargeted metabolomics that supports multiple instruments (GC/MS, GC/MS/MS, LC/MS, and LC/MS/MS) and MS vendors (Agilent, Bruker, LECO, Sciex, Shimadzu, Thermo, and Waters). Common data formats such as netCDF (AIA) and mzML, can also be managed in our project. In addition, we released several MSP files including both EI- and MS/MS spectra as a ‘start-up kit’. Moreover, MS-DIAL internally has a version of Fiehn lab’s GC/MS database (oriented by FAME RI index), and in silico retention time- and MS/MS database for LC/MS/MS based lipidomics. The isotope labeled tracking can also be executed in LC/MS project.
It features (1) spectral deconvolution for both GC/MS and data-independent MS/MS, (2) streamlined criteria for peak identification, (3) support of all data processing steps from raw data import to statistical analysis, and (4) user-friendly graphic user interface.
MS-DIAL has been developed as the collaborative work between Prof. Masanori Arita team (RIKEN) and Prof. Oliver Fiehn team (UC Davis) supported by the JST/NSF SICORP “Metabolomics for the low carbon society” project.

Hiroshi Tsugawa
RIKEN Center for Sustainable Resource Science
hiroshi.tsugawa@riken.jp

Lead developer: Hiroshi Tsugawa (RIKEN)
Main contributors: Diego Pedrosa (UC Davis), Tomas Cajka (Institute of Physiology CAS), Ipputa Tada (NIG), Haruki Uchino (Keio)

alt MS-DIAL screenshot

MS-DIAL tutorial

Chapter 1: General introduction of MS-DIAL
Section 1-1: Downloading file converter to convert a vendor’s format into ABF
Section 1-2: Check the conditions for file conversion
Section 1-3: File conversion
Section 1-4: Start up a project of MS-DIAL
Section 1-5: Centroid or Profile? The content of ABF data
Section 1-6: Database (MSP or Text) for compound identification
Section 1-6-1: MSP format for precursor- and MS/MS library
Section 1-6-2: Adduct ion format
Section 1-6-3: Text format library for retention time and accurate mass search (post identification)
Section 1-6-4: MSP format as GC/MS library
Section 1-6-5: Alkane- or FAME retention time dictionary for the calculation of retention index
Chapter 2: LC/MS/MS (data independent MS/MS) project with MS-DIAL’s LipidBlast database (in silico retention time and MS/MS library for lipids)
Section 2-1: Starting up your project
Section 2-2: Importing ABF files
Section 2-3: Setting parameters
Section 2-3-1: Data collection tab
Section 2-3-2: Peak detection tab
Section 2-3-3: MS2Dec tab
Section 2-3-4: Identification tab
Section 2-3-5: Adduct tab
Section 2-3-6: Alignment tab
Section 2-4: Data curation for the reduction of false positive identifications
Chapter 3: LC/MS or LC/MS/MS (data dependent MS/MS) project with user-defined MS/MS database (MSP format) in MS-DIAL
Section 3-1: Starting up your project
Section 3-2: Importing ABF files
Section 3-3: Setting parameters
Section 3-3-1: Data collection tab
Section 3-3-2: Peak detection tab
Section 3-3-3: MS2Dec tab
Section 3-3-4: Identification tab
Section 3-3-5: Adduct tab
Section 3-3-6: Alignment tab
Section 3-4: Data curation to reduce false positive identifications
Chapter 4: Nominal and accurate GC/MS based metabolic profiling in MS-DIAL
Section 4-1: Starting up your project
Section 4-2: Importing ABF files
Section 4-3: Setting parameters
Section 4-3-1: Data collection tab
Section 4-3-2: Peak detection tab
Section 4-3-3: MS1Dec tab
Section 4-3-4: Identification tab
Section 4-3-5: Alignment tab
Section 4-3-6: Filtering tab
Section 4-4: Data curation for the reduction of false positive identifications
Chapter 5: Graphical user interface of MS-DIAL
Section 5-1: Mouse function
Section 5-2: Overview of the MS-DIAL main window in LC/MS (precursor oriented) project
Section 5-3: Overview of the MS-DIAL main window in GC/MS project
Section 5-4: Compound search for the result curation of peak identification
Section 5-5: Normalization and statistical analysis in MS-DIAL
Section 5-6: Export
Section 5-7: Save
Section 5-8: Open
Section 5-9: Option
Section 5-10: Chromatogram viewers
Section 5-11: Explanation of buttons and tabs of MS-DIAL
Section 5-11-1: Display filters
Section 5-11-2: Tabs of MS-DIAL
Section 5-12: Search
Section 5-12-1: MS/MS fragment searcher
Section 5-12-2: Amalgamation of different polarity peak list
Section 5-13: Pathway map
Chapter 6: Other utilities of MS-DIAL
Section 6-1: Link to MS-FINDER for searching unknowns
Section 6-2: Bridge to MRMPROBS for DIA-MS or GC/MS data sets
Section 6-3: Molecular spectrum networking for metabolite annotations
Chapter 7: MS-DIAL isotope tracking function
Section 7-1: Starting up your project
Section 7-2: Importing ABF files
Section 7-3: Setting parameters
Section 7-3-1: Data collection tab
Section 7-3-2: Peak detection tab and MS2Dec tab
Section 7-3-3: Identification tab
Section 7-3-4: Adduct tab
Section 7-3-5: Alignment tab
Section 7-3-6: Isotope tracking tab
Section 7-4: Data curation for isotope tracking result
Chapter 8: Project for all ion fragmentation with multiple collision energies
Section 8-1: Starting up your project
Section 8-2: Importing ABF files
Section 8-3: Setting parameters
Section 8-3-1: Data collection tab
Section 8-3-2: Peak detection tab
Section 8-3-3: MS2Dec tab
Section 8-3-4: Identification tab
Section 8-3-5: Adduct tab
Section 8-3-6: Alignment tab
Section 8-4: Result checking
Chapter 9: Graphical user interface of MS-DIAL in AIF mode
Section 9-1: Mass spectrum viewers in peak spot and alignment viewer
Section 9-2: MS/MS chromatogram viewer in peak viewer
Section 9-3: Compound search for the curation of peak identification
Section 9-4: Aligned sample table viewer for checking each sample
Section 9-5: Bulk export to MS-FINDER from peak and alignment spot table
Section 9-6: MS viewer for correlation based deconvolution
Chapter 10: LC-Ion mobility project
Section 10-1: Convert a vendor's format into IBF
Section 10-2: Starting up your project
Section 10-3: Importing IBF files
Section 10-4: Setting parameters
Section 10-4-1: Data collection tab
Section 10-4-2: Peak detection tab and MS2Dec tab
Section 10-4-3: Identification tab
Section 10-4-4: Adduct tab
Section 10-4-5: Alignment tab
Section 10-4-6: Mobility tab
Section 10-5: Setting parameters

Chapter 1 General introduction of MS-DIAL

The current MS-DIAL program provides a stream pipeline for untargeted metabolomics. Figure 1 shows the overview of the workflow. (1) The first step of MS-DIAL based metabolomics is to convert your vendor’s format into ABF (analysis base file) format or mzML format by means of the Reifycs file converter or ProteoWizard msconvert, respectively; we describe this in the first section of this chapter. (2) The second step is to choose your project: the current MS-DIAL program supports the pipelines for GC/MS, LC/MS, LC/MS/MS (DDA: data dependent acquisition), and LC/MS/MS (DIA: data independent acquisition (SWATH or All ions)) data sets. After data processing which includes peak picking, deconvolution, compound identification, and peak alignment, MS-DIAL provides several normalization methods (including LOWESS) and a multivariate analysis by principal component analysis (PCA). (3) Finally, for further analysis by other programs, this program can export your result as table format (for SIMCA-P, MetaboAnalyst, and MetFamily etc.), and as several spectral formats including NIST, MassBank, and Mascot formats for compound identifications by MS-FINDER, CSI:FingerID, CFM-ID, MetFrag, and MetFamily etc. For the parameter explanations including the description of MS-DIAL algorithms, see also ‘MS-DIAL mathematics’ which can be downloaded at https://systemsomicslab.github.io/compms/msdial/download/mathematics/MS-DIAL%20FAQ-vs2.pdf.

alt

Section 1-1

Downloading file converter to convert a vendor’s format into ABF

Go to http://www.reifycs.com/AbfConverter/index.html
Check the requirements and license terms, and download the converter. (Figure 2)

* File converter is freely available.

alt

Section 1-2

Check the conditions for file conversion

To convert files of some MS vendors including Bruker, LECO, Shimadzu, Thermo, and Waters, the specific data access library needs to be installed on your PC (Table 1).

Table 1: Summary of PC condition required for file conversion

Vendor	Formats	Required
Agilent	.D	None, but the files from Chemstation should be converted to netCDF
Bruker	.D	CompassXtract
LECO	.PEG	All PEG files should be first converted to netCDF (AIA).
Sciex	.WIFF	None
Shimadzu for GC/MS	.QGD	GCMS solution
Shimadzu for LC/MS	.LCD	LCMS solutions
Thermo	.RAW	MSFileReader
Waters	.RAW	MassLynx Raw Data Reader Interface Library
netCDF	.CDF	Microsoft Visual J# 2.0

FAQ (see the latest at http://prime.psc.riken.jp/Metabolomics_Software/MS-DIAL/index3.html)

Agilent: Almost all of Agilent’s raw data (.D) can readily be converted without any additional steps. However, the only exception is GCMS data sets obtained from ChemStation, which cannot be directly converted to ABF. Therefore, you have to convert your files into netCDF (AIA) in ChemStation. Then, convert your AIA files into ABF using our file converter.
Bruker: Go to https://www.bruker.com/service/support-upgrades/software-downloads/mass-spectrometry.html. You have to register first and obtain the installer. Finally, download and install the version compatible with your operating system environment (32-bit or 64-bit).
LECO: All of your data has to be converted to netCDF (AIA) first. Then, convert them into ABF using our file converter.
Shimadzu: In case that you had a direct conversion of raw data, please convert your data into the common data format (netCDF or mzML). For GC/MS, convert your data into netCDF using GCMS solution. For LC-ITTOF/MS, convert your data into mzML using ProteoWizard (http://proteowizard.sourceforge.net/index.shtml). Then, convert them into ABF.
Thermo: The following link explains how to install MSFileReader. https://github.com/frallain/pymsfilereader/tree/master/MSFileReader. For GC/MS data, you may have to convert your data into netCDF. Then, convert them into ABF using our converter. We validated the direct conversion of GC-QExactive raw data to ABF, but some GC/MS data (DSQ etc) had to be converted to netCDF first.
Waters: 1. Download MassLyncs Raw Data Reader Interface Library (http://www.waters.com/waters/supportList.htm?cid=511442&filter=documenttype|DWNL&locale=en_US). 2. Unzip the archive file ‘watersrawsdkredist.zip’ and copy ‘MassLynxRaw.dll’ (64-bit) to ‘ABFCvtSvrWtrRw’ folder in the ABF converter. Note that 32–bit environments are not supported yet for file conversion.
NetCDF: When you get an error about the J# dependency problem, download and install the Microsoft Visual J# 2.0 library at https://www.microsoft.com/en-us/download/details.aspx?id=15468.

Section 1-3

File conversion

Start “AnalysisBaseFileConverter.exe”.
Drag & drop MS vendor files into this program.
Click “Convert”.
The ABF files are generated in the same directory as the raw data files. (Figure 3)

alt

Section 1-4

Start up a project of MS-DIAL

This tutorial demonstrates four projects, (1) GC/MS, (2) LC/MS or LC/MS/MS (DDA: data dependent acquisition), (3) LC/MS/MS (data independent acquisition), and (4) LC-Ion mobility for the explanation of parameters and required files. In this section, three projects are summarized and you will find a minimum requirement for these processes. The details for LC/MS/MS (DIA), LC/MS/MS (DDA), GC/MS, and LC-Ion mobility processing are described in Chapter 2, Chapter 3, Chapter 4, and Chapter 10 respectively.

alt alt alt

* Prepare the above dictionary file as tab-delimited text.
* In the case of SWATH data-independent analysis, the experiment file can be made at PeakView (Show->sample information). Do not change the column orders. The word “SCAN” should be kept.

alt

Section 1-5

Centroid or Profile? The content of ABF data

One of the problems of ABF file is that we have to determine the data type, i.e. centroid or profile, by ourselves. If your data type is ‘centroid’, the MS-DIAL program does not have to execute the process of spectral centroiding. On the other hand, MS-DIAL should perform the centroiding process for ‘profile’ mode data. We have validated the types for several instruments as follows.
(what is the difference between centroid and profile? http://blog.acdlabs.com/elucidation/2008/03/what-is-the-dif.html)

✓ SCIEX QTOF: Profile
✓ Thermo Q-Exactive: Profile
✓ Thermo LTQ-Orbitrap: Profile for MS1 and Centroid for MS/MS
✓ All nominal GC/MS data: Centroid

For the other instruments, the data type of ABF depends on your MS settings.

✓ Agilent: The ABF converter will recognize your data type as ‘centroid’ if you set the export type as ‘centroid’ or ‘both’ in your MassHunter software. Otherwise, ‘profile’ is assumed.

The data types of Bruker LC-QTOF and FT-ICR, and Waters LC-Xevo QTOF and Synapt G2 were mostly centroid: you have to select ‘centroid’ as data type in the MS-DIAL program. However, there is no guarantee yet.

✓ mzML: it depends on the export method by the ProteoWizard program etc.

* It’s much better to check your data type in MS-DIAL as follows before you start your real project:

Start MS-DIAL project as ‘Centroid’ mode by a representative file.
See the MS1 and MS2 spectrum in ‘Peak spot viewer’ and check the ‘shape’ of the spectra.
If the shape appears to require centroiding, restart the project in ‘Profile’ mode.

Section 1-6

Database (MSP or Text) for compound identification

The database formats for GC/MS or LC/MS datasets are described in this section. The main difference between GC/MS (EI-MS) and LC/MS (ESI-MS/MS) is the availability of precursor ion and its MS/MS. While a precursor m/z and its MS/MS are mostly available in ESI (or the other soft ionization)-MS/MS, the molecular ion in EI-MS is difficult to detect owning to the hard ionization. Several MSP files are downloadable as a starter kit for MS-DIAL at http://prime.psc.riken.jp/Metabolomics_Software/MS-DIAL/index.html.

Section 1-6-1

MSP format for precursor- and MS/MS library

MS-DIAL supports the MSP (http://www.nist.gov/srd/upload/NIST1a11Ver2-0Man.pdf) format in ASCII text (below Figure). In addition, the software can utilize “RETENTIONTIME: ”, “PRECURSORTYPE: ”, and “FORMULA:” information for metabolite identification (cases are ignored). Retention time information must be specified in minutes [min] scale when possible. The adduct ion information, i.e. here ‘Precursor type’, will be used for the adduct ion search algorithm (see section 7).

alt Figure: an example of MSP format library

Section 1-6-2

Adduct ion format

Adduct ion information should be formatted as described in this section: [M+Na]+, [M+2H]2+, [M-2H2O+H]+, [2M+FA-H]-, etc.

The parentheses ‘[’ and ‘]’ must be used to bracket the ion information.
The char + and - are required after ‘]’ and the number must be written before + or -.
When you want to define the organic formula like C6H12O5, you have to write it without any replicate elements or parentheses like [M+C2H5COOH-H] or [M+H+(CH3)3SiOH].
The beginning figure of organic formula like ‘2’H2O is recognized as the H2O × 2. Again, do not use 2(H2O) to specify this.
Sequential equations are acceptable: [2M+H-C6H12O5+Na]2+ (very apt.)
MS-DIAL accepts some abbreviations or common organic formulas for adduct types as follows.
For Acetonitrile: ACN, CH3CN
For Methanol: CH3OH
For Isopropanol: IsoProp, C3H7OH
For Dimethyl sulfoxide: DMSO
For Formic acid: FA, HCOOH
For Acetic acid: Hac, CH3COOH
For Trifluoroacetic acid: TFA, CFCOOH

Section 1-6-3

Text format library for retention time and accurate mass search (post identification)

MS-DIAL also supports tab-delimited text format library for peak identification by means of retention time and MS1 accurate mass information. The identification process is performed after completing peak identification based on MSP format library. This is why we call this identification processing “post identification”. The first row should include header information. The first, second, third, and fourth columns should contain the metabolite name, accurate mass [Da], retention time [min], and adduct ion, respectively. This library can be easily created in Microsoft Excel. Save the spreadsheet in “Tab delimited text format”. This option is useful for internal standard identifications etc. (Even if you don’t have MS/MS libraries, the peak identification based on retention time and accurate mass is available from this option.)

* The minimum requirement for this text library is just ‘metabolite name’ and ‘m/z’ information; i.e. the first two columns are required. Retention time and adduct ion fields provide additional information for MS-DIAL in peak identification and adduct ion picking, respectively.

alt
Figure: an example of text library

Section 1-6-4

MSP format as GC/MS library

MS-DIAL supports the MSP format (http://www.nist.gov/srd/upload/NIST1a11Ver2-0Man.pdf) in ASCII text, same as in section 6-1. MS-DIAL accepts two fields for ‘retention’ information as the reference: “RETENTIONTIME: ” or “RT”, and “RETENTIONINDEX” or “RI”. Retention time information must be specified in minute [min] scale when possible.

alt
Figure: an example of MSP format library for GC/MS

Section 1-6-5

Alkane- or FAME retention time dictionary for the calculation of retention index

Retention time and retention index are used for routine identification of metabolites in GC/MS based metabolomics. The current MS-DIAL program has three options for the use of retention information: 1) retention time, 2) alkane mix based retention index, and 3) FAME (fatty acid methyl ester) mix based retention index.

See the experimental details of alkane and FAME mixtures.
Alkanes: http://www.sciencedirect.com/science/article/pii/S1389172311001848
Alkanes: https://en.wikipedia.org/wiki/Kovats_retention_index
FAMEs: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC2805091/

In order to calculate the retention indexes of detected peaks, users have to prepare a dictionary including the pairs of the carbon number and its retention time as tab-delimited text (see next page).

FAQ

By alkane mixture, the calculation of retention index is based on Kovats’s method (temperature programmed chromatography).
By FAME mixture, the calculation is based on Fiehn’s method. Importantly, FAMEs of carbon number C8, C9, C10, C12, C14, C16, C18, C20, C22, C24, C26, C28, and C30 (total 13 FAMEs) must be included in this dictionary to calculate Fiehn’s index. A polynomial regression of fifth order is applied to chromatographic peaks between C9 and C28. The peaks between C8 (and its earlier) and C9 and between C28 and C30 (and its later) are interpolated by a linear regression. The original indexes are based on the retention time (sec) * 1000 of authentic standards.

alt
Figure: examples of Alkanes or Fames dictionaries

Chapter 2 LC-MS/MS (data independent MS/MS) project for lipidomics using in silico retention time and theoretical MS/MS library of lipids

A project dealing with data independent MS/MS acquisition in combination with the in silico retention time and MS/MS databases for lipids is demonstrated. The experimental protocol is described in previous research: http://www.nature.com/nmeth/journal/v12/n6/abs/nmeth.3393.html.

This tutorial uses 23 demonstration files which are downloadable from the below link.
http://prime.psc.riken.jp/?action=drop_index

alt

Experiment summary:
Liquid chromatography: total 15 min run per sample with Waters Acquity UPLC CSH C18 column (100×2.1 mm; 1.7 μm).
Mass spectrometer: SWATH method with negative ion mode.
MS1 accumulation time, 100 ms
MS2 accumulation time, 10 ms
Collision energy, 45 V
Collision energy spread, 15 V
Cycle time, 731 ms
Q1 window, 21 Da
Mass range, m/z 100-1250
Modifier, Ammonium formate (HCOONH₄)

Section 2-1

Starting up your project

alt

File -> new project
Set your project file path to the directory of your ABF files
Select your MS method type as ‘SWATH-MS or conventional all-ions method’
Select the experimental file: ABSciex_Experiment_Information_CSH21Da.txt.
Choose data type profile data for both MS1 and MS/MS
Choose negative ion mode
Choose target omics as ‘Lipidomics’
If you select ‘lipidomics’ project, you do not have to prepare NIST MSP format library since MS-DIAL internally contains the theoretical MS/MS spectra of lipids (and actually, the fragment ions are future evaluated by the decision tree algorithm to provide the proper lipid structure representation.). Instead, you should select lipid subclasses needed by your data sets. On the other hand, when you select ‘metabolomics’ project, your own MSP file will be required for compound identification (see Chapter 3).
* see section 4 of chapter 1 as well for more detail

Section 2-2

Importing ABF files

alt

Select ABF files
If the file is a “quality control (QC)” sample, set the type as QC. Also, please set the type as Blank for blank samples used for feature reductions based on the blank’s peaks.
Edit Class ID of the files to divide sample data into each experimental group (Blank, QC, Group A, Group B, etc.). Sample files are allocated to each group and after analysis is finished the bar chart (or box plot) of each group will be appeared in the pane “Bar chart of aligned spot” of the main window (See Section 2-4).

Note:

Please finalize your file name here otherwise you cannot change it anymore (these finalized file names will be appeared in the pane “File navigator” of the main window after finishing this analysis. See Section 2-4 for details).
If you want to define the injection volume of each file, you can additionally type each volume in the column “Inject. volume (μL)”.
Section 2-3

Setting parameters

Section 2-3-1

Data collection tab

alt

Mass accuracy:
After the peak detection algorithm is applied along the MS axis with a very low threshold, MS-DIAL performs spectral centroiding. By default, mass spectrum of ±0.01 and ±0.025 Da range from each peak top is integrated in MS1 and MS2, respectively. Importantly, this MS2 tolerance value is also used to build the MS/MS chromatogram for a certain m/z trace. The MS/MS chromatograms are dedicated to the MS2Dec deconvolution program.

Data collection parameters:
You can set analysis ranges (RT, MS1 and MS/MS axis). In this demonstration, your expected data range is 0.5-10 min for 100-1250 Da.

Isotope recognition:
As long as you focus on small molecule researches (less than 2000 Da), the maximum charged number can be set to 2. On the other hand, the parameter can be changed to 8 or more to process proteome or snRNA research data.
Check Consider Cl and Br elements if you assume that compounds in your samples contain chloride (Cl) or bromide (Br) as the formula element showing the unique isotopic patterns.

Multithreading:
Please set the count of threads that you want to use. You can check the maximum thread counts in resource monitor. (open task manager->open resource monitor)

Execute retention time corrections:
For detail, visit ‘The tutorial and parameter files for MS-DIAL ALF dataprocessing and spectral library construction methodology’ in the website of MS-DIAL shown below(URL: http://prime.psc.riken.jp/Metabolomics_Software/MS-DIAL/).

alt

Execute retention time corrections: For detail, visit ‘The tutorial and parameter files for MS-DIAL ALF dataprocessing and spectral library construction methodology’ in the website of MS-DIAL shown below(URL: http://prime.psc.riken.jp/Metabolomics_Software/MS-DIAL/).

alt

Section 2-3-2

Peak detection tab

alt

MS-DIAL provides two simple thresholds; minimum values for peak width and height. Peaks below these thresholds are ignored (see also MS-DIAL mathematics: https://systemsomicslab.github.io/compms/msdial/download/mathematics/MS-DIAL%20FAQ-vs2.pdf).
It is ideal that users put values here based on your own experience that you got looking at the trend of your data. However FYI (based on our experience) the minimum peak height may be set to 500 as a default value for this demo data of Sciex.
Besides, for FT-ICR or Orbirap data, the minimum peak height may be 50,000 or more.
Minimum peak width indicates a threshold of peak width for filtering. See detail by reading the second slide (title ‘In MS-DIAL program’) below.

The width of mass slice is set here. From our experience, 0.1 and 0.05 are suitable for Q-TOF and Orbitrap, respectively.

Smoothing method:
Linear-weighted moving average is used for the peak detection as default to accurately determine the peak left- and right edges.
The recommended smoothing level is 1-3.
If you already know unwanted m/z peaks because of columns or solvent contaminants, you can specify them in the Exclusion mass list.

A part of https://systemsomicslab.github.io/compms/msdial/download/mathematics/MS-DIAL%20FAQ-vs2.pdf

alt
alt alt alt

Section 2-3-3

MS2Dec tab

alt

The sigma window value is highly affected by the resolution of deconvolutions. A higher value (0.7-1.0) will reduce the peak top resolutions, i.e. the number of resolved peaks will be decreased. On the other hand, a lower value (0.1-0.3) may also recognize many noise chromatographic peaks.
You may be able to set a cut off value to reduce the MS noises (see Section 3-3 of Chapter 3).

Exclude after precursor ion:
If you want to remove the product ions after the focused precursor ion (recommended for metabolomics and lipidomics), check this box.

Keep the isotopic ions until:
In fact, the isotopic patterns in MS1 spectra are frequently disturbed. On the other hand, there are some cases that the isotopic patterns in MS/MS spectra are clearer than that of MS1 spectra, which can be used for the accurate annotation of molecular formula. If you set this parameter as 5 Da, the ions until precursor + 5 Da are kept after MS2Dec algorithm is finished.

Keep the isotopic ions w/o MS2Dec:
The MS2Dec algorithm may sometimes erase the precursor’s isotopic ions due to the mathematics issues. Therefore, you can keep the raw MS/MS spectra only for the precursor’s isotopic ions by checking this option.

Section 2-3-4

Identification tab

alt

MSP file:
In the case that you selected ‘lipidomics’ project, select what you want to annotate in your data sets for lipid profiling. Here, for algae lipidomics, tick the above figure’s lipids: FA, LPC, LPE, PA, PC, PE, PG, PI, PS, MGDG, DGDG, and SQDG.
For this tutorial data, in which ammonium formate was used as modifier, select HCOONH4 (ammonium formate) as Solvent type in the window ‘Lipid database setting’, although nowadays CH3COONH4 (ammonium acetate) is basically used as modifier.

Parameters:
If you put retention time (RT) information in your MSP file, set the RT tolerance value . For example, our internal lipid DB includes the predicted RT information optimized for our 15 min LC method. Here RT time tolerance is set to 1.5 min for this demo data. If suitable RT information is unavailable, set the tolerance 100 (default) or larger (larger than your LC time). The two mass tolerances for MS1 and MS2 are required for the compound search and are dependent on your instrument performance.
The cutoff of the identification score should be greater than 70% or 80% to avoid false positives.

Unless you check neither Use retention time for scoring nor Use retention time for filtering, the value of RT tolerance does not mean anything.

Text file and post identification (retention time and accurate mass based) setting:
If you want to perform “post identification” processing, set your text file in Text file. (Tutorial data: Lipid_Nega_IS_PostIdentification_vs1.txt)
The meanings of parameters are the same as MSP based identification explained above.

Relative abundance cut off:
the mass spectrum peak less than the user-defined value will not be used for the MS/MS similarity calculation.

Only report the top hit:
Since some chromatogram peaks will be annotated as the same compound from the identification algorithm, this option allows us to determine only one candidate from such multiple results by means of the identification score.

Section 2-3-5

Adduct tab

alt

Adduct ion setting: You can tick the adduct ions and charge values to be considered.
* see also the Section 1-6-2 of Chapter 1 for the explanation of how to determine your own adduct ion.

Section 2-3-6

Alignment tab

alt

Parameters:
Refult name will be the name of each alignment shown at the pane ‘Alignement navigator’ in the main window.
If you already have a suitable quality control (QC) data, typically a mixed sample data, then specify the QC file in Reference file. All sample data will be aligned to this QC file.
The RT and MS1 tolerances for peak alignment depend on your chromatographic conditions (see MS-DIAL mathematics for details).
If you want to remove specific peaks that are not fully detected in the alignment, specify the peak count filter. For example, the tutorial data include at least 4 biological replicates with the same peak information and the total number of data is 23. Then, you may set the peak count filter as (4/23)∗100 = 17.4 %. This means peaks will be removed when they include missing values for more than 17.4%.
Moreover, in ‘N% detected in at least one group’, the filtering is done within each sample group. If it is set to 100%, the peaks should be detected in all of samples of a class.

Retention time factor and MS1 factor: These values indicate the importance of either RT or MS1 to compare and evaluate the similarity of the spectra among samples based on RT and MS1 tolerance.

Remove features based on blank information:
If you add information of blank samples and want to reflect the blank features information into the result of the analysis, tick this to evaluate the peaks of blank samples.
Checking this box allows you to edit the three checkboxes, Keep ‘Reference matched’ metabolite features, Keep ‘Suggested (without MS2)’ metabolite features, and Keep removable features and assign the tag.

After checking Remove features based on blank information, you will be able to edit the checkbox ‘blank filter’ on the pane ‘Peak spot navigator’ in the main window so that you can check which features were filtered as removable features unless you uncheck Keep removable features and assign the tag.

Keep ‘Reference matched’ metabolite features:
‘Reference matched’ metabolite means the metabolites by reference libraries (MSP or Text library).

Keep ‘Suggested (without MS2)’ metabolite features:
‘Suggested (without MS2)’ means that metabolites are annotated by the MS1 feature. Keep removable features and assign the tag:
If you check this, even though the features do not exceed the blank feature threshold, the peaks will remain in all result features. But you can confirm the result by using “Blank filter” checkbox of MS-DIAL main window. If you uncheck this, removable features will be deleted and thus that blank features will not be included anymore in all analyzed data, meaning you cannot see the blank-oriented peak features in the main window after alignment.

Gap filling by compulsion: If you check this, the peak recognition is performed by the average peak width of samples having the metabolite feature even though no local maximum is observed in the chromatogram. This is validated by default.

Note: When you execute the compound identification, the representative spectra with identification results are automatically determined from one of imported files which has the highest identification score. In the case that an alignment spot is not identified in any samples, the MS/MS spectrum of one sample which has the highest ion abundance in imported files is assigned as the representative spectrum.

Section 2-4

Data curation for the reduction of false positive identifications

MS-DIAL can automatically identify the metabolite peaks by the similarity calculation of retention time, precursor m/z, isotopic ratios, and MS/MS spectrum with the reference databases. However, unfortunately, there are also false positive identifications in the result of peak identifications as well as true positives. Therefore, as an analytical chemist, the result should be manually checked and sometimes some of identified peaks should be curated and modified. Of course, the ultimate goal is the perfect identification without any false positive- and negative identifications.
Practically, what to manually curate the identification result of your representative alignment file since the identification result of its alignment file will be reflected in the final output such as ‘peak height’ matrix etc. Using the GUI of MS-DIAL, you can check if an aligned spot is a false positive/negative identification or not. For further information about GUI of MS-DIAL, see Chapter 5.

alt

Chapter 3 LC/MS or LC/MS/MS (data dependent MS/MS) project with user-defined MS/MS database (MSP format) in MS-DIAL

Here, a project from data dependent MS/MS acquisition in combination with a user-defined MSP library (an integration library of MassBank, GNPS, and Respect) is demonstrated. The experimental protocol is described in the previous research: http://pubs.acs.org/doi/abs/10.1021/acs.jafc.5b04890. The ABF files for this demonstration can be downloaded from our website: http://prime.psc.riken.jp/Metabolomics_Software/MS-DIAL/index.html.

This section uses total 6 files and the MSP file is contained in the same folder of this demonstration. Also, there is a parameter file (Param.med) in this folder to be used in MS-DIAL for the quick start.

alt

Experiment summary:
Liquid chromatography: total 4 min run per sample with Kinetex C18 2.6 μm (50×1.0 mm).
Solvent A: water with 0.1% acetic acid
Solvent B: acetonitrile with 0.1% acetic acid
Mass spectrometer: data dependent method with positive ion mode.
Collision energy, 35 V
Collision energy spread, 15 V
Cycle time, 125 ms
Mass range, m/z 60-1250