MS-DIAL5 tutorial

Abstract

MS-DIAL was launched as a universal program for untargeted metabolomics that supports multiple instruments (GC/MS, GC/MS/MS, LC/MS, and LC/MS/MS) and MS vendors (Agilent, Bruker, LECO, Sciex, Shimadzu, Thermo, and Waters). Common data formats such as netCDF (AIA) and mzML, can also be managed in our project. In addition, we released several MSP files including both EI- and MS/MS spectra as a ‘start-up kit’. Moreover, MS-DIAL internally has a version of Fiehn lab’s GC/MS database (oriented by FAME RI index), and in silico retention time- and MS/MS database for LC/MS/MS based lipidomics. The isotope labeled tracking can also be executed in LC/MS project.

Keep in touch with us and follow the latest developments

YouTube @msdialproject channel

General introduction of MS-DIAL5

The current MS-DIAL program provides a stream pipeline for untargeted metabolomics, lipidomics and proteomics. In this latest version, the user experience has been greatly improved with a new graphical interface and also due to the fact it is no longer necessary to convert raw MS data from different vendor formats into a common ABF format (which is still supported), but the data can now be imported directly (supported file formats and analytical techniques are summarized in Figure 1).

Figure 1: Overview of supported file formats and analytical techniques

The latest version of MS-DIAL aims to be the most versatile tool for multiomics data analysis and therefore supports direct injection, many separation (GC, LC, CE, SFC), ionization (EI, ESI, MALDI), fragmentation (CID, HCD, ETD, ECD, EIEIO, EID, OAD) and MS/MS (DDA, SWATH, AIF) datasets (which are summarized in Figure 2). In addition, technological advances in fragmentation mechanisms such as electron-impact excitation of ions from organics (EIEIO) have also been taken into account, making it possible to work with these types of data to offer comprehensive structural identification of metabolites, lipids and proteins.

Figure 2: Summary of analytical techniques compatible with MS-DIAL 5

After data processing which includes peak picking, deconvolution, compound identification, and peak alignment, MS-DIAL provides several normalization methods (including LOWESS) and a multivariate analysis by principal component analysis (PCA). Finally, for further analysis by other programs, this program can export your result as table format (for SIMCA-P, MetaboAnalyst, and MetFamily etc.), and as several spectral formats including NIST, MassBank, and Mascot formats for compound identifications by MS-FINDER, CSI:FingerID, CFM-ID, MetFrag, and MetFamily etc. For the parameter explanations including the description of MS-DIAL algorithms, see also ‘MS-DIAL mathematics’ which can be downloaded at http://prime.psc.riken.jp/Metabolomics_Software/MS-DIAL/MS-DIAL%20FAQ-vs2.pdf

The difference between MS-DIAL version 4 and 5

The difference between MS-DIAL version 4 and 5
	MS-DIAL4	MS-DIAL5
Annotation DB/level	Single for an analysis	Multi DB/level annotation for an analysis
Annotation candidates	Single for an analysis	Multi candicated for each DB
Separation type	LCMS, LCIMMS, GCMS	LCMS, LCIMMS, DIMS, IMMS
Collision type	CID/HCD	CID/HCD, ECD, HotECD, EIEIO, EID, OAD
Target omics	Metabolomics, Lipidomics	Metabolomics, Lipidomics, Proteomics
MS method type	Single for an analysis	Multiple types of MS method (DDA,SWATH,AIF) can be applied to a single analysis.
Undo/Redo support in “Change annotation to Unknown”	Not supported	Supported https://youtu.be/LeI4OpbHV1s

Start up a project of MS-DIAL5

This tutorial demonstrates four projects, (1) GC/MS, (2) LC/MS or LC/MS/MS (DDA: data dependent acquisition), (3) LC/MS/MS (data independent acquisition), and (4) LC-Ion mobility for the explanation of parameters and required files. In this section, three projects are summarized and you will find a minimum requirement for these processes. The details for LC/MS/MS (DIA), LC/MS/MS (DDA), GC/MS, and LC-Ion mobility processing are described in Chapter 2, Chapter 3, Chapter 4, and Chapter 10 respectively. First you need to start a new project in MS-DIAL5, set up a folder to store your project files and point MS-DIAL5 to the folder where your raw data is stored (Figure 3).

Figure 3: Start up a project in MS-DIAL5. First select new project to be created (1), next select a location where your project files will be saved (2). You need to direct MS-DIAL5 to your raw data file location (3) and select the corresponding data file format in the scrolling selection list and select the data files you want to process (4).

After inserting your raw data files, you will be able to further assign identifiers to your measurements by sample type (Sample, Standard, QC, Blank), class ID (according to your experimental setup), and you can also set the batch (if you analyzed multiple batches), analytical order, dilution factor, or possibly exclude some samples from further data processing (Figure 4).

Figure 4: Specifications of raw measurement files. An example of assigning identifiers to your raw measurement files (more details will be described in the following sections).

After assigning identifiers to the raw measurement files, click “Next” to go to the “Measurement parameters” tab. Here you will need to enter the analytical and instrumental parameters that were used to acquire your data (Figure 5).

Figure 5: Measurement parameters. Overview of the parameters that need to be set according to your analytical and instrumental setup.

The following sections will contain information about the data type (centroid and profile), the data format of custom user database (MSP file), and information about the types of adducts. The next chapters will then demonstrate case studies of processing different types of publicly available data in MS-DIAL5.

Centroid or Profile?

In the previous version of MS-DIAL users needed to define the type of imported data (centroid or profile). However in the new version you can choose centroid data by default. MS-DIAL retrieves data as centroid even for vendor formats which were previously considered as profile (e.g. SCIEX or Thermo). The only exception would be regarding Agilent data which can be stored as centroid, profile or both. To make sure what type of data you have, we recommend using our program “raw_data_viewer.exe” (which is available in the MS-DIAL base folder). The raw data viewer shows the spectrum that has been retrieved by the MS-DIAL backend (here you can determine the data type), and users can additionally determine the threshold values for amplitude cut off etc. by looking at the statistics of peak height vs. the frequency (Figure 6 and Figure 7).

Figure 6: Raw data viewer - data upload. After opening the “Raw data viewer.exe” you can browse to the location of your data (1) and load them (single or multiple files - one by one) (2). Next select the file you want to process (3) and click the “Show” button (4).

Figure 7: Raw data viewer - results. After processing the data you will be shown three histograms (for MS1 and MS2 spectrum intensity and peak height). Additionally you can browse to any ions found in your data (in the table provided at the bottom). You can order the ions by multiple parameters (scan start time, MS level, base peak m/z, base peak intensity, etc.). You can zoom the x (m/z) and y (Intensity) axis by using scroll button to inspect if your data are centroid or profile. Using this overview you can also estimate the ideal peak height cut-off for the MS-DIAL processing parameters.

Additionally, you can also watch our tutorial video:

Database (MSP or Text) for compound identification

The database formats for GC/MS or LC/MS datasets are described in this section. The main difference between GC/MS (EI-MS) and LC/MS (ESI-MS/MS) is the availability of precursor ion and its MS/MS. While a precursor m/z and its MS/MS are mostly available in ESI (or the other soft ionization)-MS/MS, the molecular ion in EI-MS is difficult to detect owning to the hard ionization. Several MSP files are downloadable as a starter kit for MS-DIAL at http://prime.psc.riken.jp/Metabolomics_Software/MS-DIAL/index.html.

MSP format for precursor- and MS/MS library

MS-DIAL supports the MSP (http://www.nist.gov/srd/upload/NIST1a11Ver2-0Man.pdf) format in ASCII text (Figure 8). In addition, the software can utilize “RETENTIONTIME:”, “PRECURSORTYPE:”, and “FORMULA:” information for metabolite identification (cases are ignored). Retention time information must be specified in minutes [min] scale when possible. The adduct ion information, i.e. here ‘Precursor type’, will be used for the adduct ion search algorithm (see section 7).

Figure 8: An example of MSP format library

Adduct ion format

Adduct ion information should be formatted as described in this section: [M+Na]+, [M+2H]2+, [M-2H2O+H]+, [2M+FA-H]-, etc.

The parentheses ‘[’ and ‘]’ must be used to bracket the ion information.
The char + and - are required after ‘]’ and the number must be written before + or -.
When you want to define the organic formula like C6H12O5, you have to write it without any replicate elements or parentheses like [M+C2H5COOH-H] or [M+H+(CH3)3SiOH].
The beginning figure of organic formula like ’2’H2O is recognized as the H2O × 2. Again, do not use 2(H2O) to specify this.
Sequential equations are acceptable: [2M+H-C6H12O5+Na]2+ (very apt.)
MS-DIAL accepts some abbreviations or common organic formulas for adduct types as follows.
For Acetonitrile: ACN, CH3CN
For Methanol: CH3OH
For Isopropanol: IsoProp, C3H7OH
For Dimethyl sulfoxide: DMSO
For Formic acid: FA, HCOOH
For Acetic acid: Hac, CH3COOH
For Trifluoroacetic acid: TFA, CFCOOH

Text format library for retention time and accurate mass search (post identification)

MS-DIAL also supports tab-delimited text format library for peak identification by means of retention time and MS1 accurate mass information. The identification process is performed after completing peak identification based on MSP format library. This is why we call this identification processing “post identification”. The first row should include header information. The first, second, third, and fourth columns should contain the metabolite name, accurate mass [Da], retention time [min], and adduct ion, respectively (Figure 9). This library can be easily created in Microsoft Excel. Save the spreadsheet in “Tab delimited text format”. This option is useful for internal standard identifications etc. (Even if you don’t have MS/MS libraries, the peak identification based on retention time and accurate mass is available from this option.)

* The minimum requirement for this text library is just ‘metabolite name’ and ‘m/z’ information; i.e. the first two columns are required. Retention time and adduct ion fields provide additional information for MS-DIAL in peak identification and adduct ion picking, respectively.

MSP format as GC/MS library

MS-DIAL supports the MSP format (http://www.nist.gov/srd/upload/NIST1a11Ver2-0Man.pdf) in ASCII text, same as in the “MSP format for precursor- and MS/MS library” seciton. MS-DIAL accepts two fields for ‘retention’ information as the reference: “RETENTIONTIME:” or “RT”, and “RETENTIONINDEX” or “RI” (Figure 10). Retention time information must be specified in minute [min] scale when possible.

Figure 10: An example of MSP format library for GC/MS

Alkane- or FAME retention time dictionary for the calculation of retention index

Retention time and retention index are used for routine identification of metabolites in GC/MS based metabolomics. The current MS-DIAL program has three options for the use of retention information: 1) retention time, 2) alkane mix based retention index, and 3) FAME (fatty acid methyl ester) mix based retention index.

See the experimental details of alkane and FAME mixtures.
Alkanes: http://www.sciencedirect.com/science/article/pii/S1389172311001848
Alkanes: https://en.wikipedia.org/wiki/Kovats_retention_index
FAMEs: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC2805091/

In order to calculate the retention indexes of detected peaks, users have to prepare a dictionary including the pairs of the carbon number and its retention time as tab-delimited text (Figure 11).

FAQ

By alkane mixture, the calculation of retention index is based on Kovats’s method (temperature programmed chromatography).
By FAME mixture, the calculation is based on Fiehn’s method. Importantly, FAMEs of carbon number C8, C9, C10, C12, C14, C16, C18, C20, C22, C24, C26, C28, and C30 (total 13 FAMEs) must be included in this dictionary to calculate Fiehn’s index. A polynomial regression of fifth order is applied to chromatographic peaks between C9 and C28. The peaks between C8 (and its earlier) and C9 and between C28 and C30 (and its later) are interpolated by a linear regression. The original indexes are based on the retention time (sec) * 1000 of authentic standards.

Figure 11: Examples of FAMEs and Alkanes dictionaries