MS-DIAL5 tutorial
Keep in touch with us and follow the latest developments
YouTube @msdialproject channel |
General introduction of MS-DIAL5
The current MS-DIAL program provides a stream pipeline for untargeted metabolomics, lipidomics and proteomics. In this latest version, the user experience has been greatly improved with a new graphical interface and also due to the fact it is no longer necessary to convert raw MS data from different vendor formats into a common ABF format (which is still supported), but the data can now be imported directly (supported file formats and analytical techniques are summarized in Figure 1).
The latest version of MS-DIAL aims to be the most versatile tool for multiomics data analysis and therefore supports direct injection, many separation (GC, LC, CE, SFC), ionization (EI, ESI, MALDI), fragmentation (CID, HCD, ETD, ECD, EIEIO, EID, OAD) and MS/MS (DDA, SWATH, AIF) datasets (which are summarized in Figure 2). In addition, technological advances in fragmentation mechanisms such as electron-impact excitation of ions from organics (EIEIO) have also been taken into account, making it possible to work with these types of data to offer comprehensive structural identification of metabolites, lipids and proteins.
After data processing which includes peak picking, deconvolution, compound identification, and peak alignment, MS-DIAL provides several normalization methods (including LOWESS) and a multivariate analysis by principal component analysis (PCA). Finally, for further analysis by other programs, this program can export your result as table format (for SIMCA-P, MetaboAnalyst, and MetFamily etc.), and as several spectral formats including NIST, MassBank, and Mascot formats for compound identifications by MS-FINDER, CSI:FingerID, CFM-ID, MetFrag, and MetFamily etc. For the parameter explanations including the description of MS-DIAL algorithms, see also ‘MS-DIAL mathematics’ which can be downloaded at http://prime.psc.riken.jp/Metabolomics_Software/MS-DIAL/MS-DIAL%20FAQ-vs2.pdf
The difference between MS-DIAL version 4 and 5
MS-DIAL4 | MS-DIAL5 | |
---|---|---|
Annotation DB/level | Single for an analysis | Multi DB/level annotation for an analysis |
Annotation candidates | Single for an analysis | Multi candicated for each DB |
Separation type | LCMS, LCIMMS, GCMS | LCMS, LCIMMS, DIMS, IMMS |
Collision type | CID/HCD | CID/HCD, ECD, HotECD, EIEIO, EID, OAD |
Target omics | Metabolomics, Lipidomics | Metabolomics, Lipidomics, Proteomics |
MS method type | Single for an analysis | Multiple types of MS method (DDA,SWATH,AIF) can be applied to a single analysis. |
Undo/Redo support in “Change annotation to Unknown” | Not supported | Supported https://youtu.be/LeI4OpbHV1s |
Start up a project of MS-DIAL5
This tutorial demonstrates four projects, (1) GC/MS, (2) LC/MS or LC/MS/MS (DDA: data dependent acquisition), (3) LC/MS/MS (data independent acquisition), and (4) LC-Ion mobility for the explanation of parameters and required files. In this section, three projects are summarized and you will find a minimum requirement for these processes. The details for LC/MS/MS (DIA), LC/MS/MS (DDA), GC/MS, and LC-Ion mobility processing are described in Chapter 2, Chapter 3, Chapter 4, and Chapter 10 respectively. First you need to start a new project in MS-DIAL5, set up a folder to store your project files and point MS-DIAL5 to the folder where your raw data is stored (Figure 3).
After inserting your raw data files, you will be able to further assign identifiers to your measurements by sample type (Sample, Standard, QC, Blank), class ID (according to your experimental setup), and you can also set the batch (if you analyzed multiple batches), analytical order, dilution factor, or possibly exclude some samples from further data processing (Figure 4).
After assigning identifiers to the raw measurement files, click “Next” to go to the “Measurement parameters” tab. Here you will need to enter the analytical and instrumental parameters that were used to acquire your data (Figure 5).
The following sections will contain information about the data type (centroid and profile), the data format of custom user database (MSP file), and information about the types of adducts. The next chapters will then demonstrate case studies of processing different types of publicly available data in MS-DIAL5.
Centroid or Profile?
In the previous version of MS-DIAL users needed to define the type of imported data (centroid or profile). However in the new version you can choose centroid data by default. MS-DIAL retrieves data as centroid even for vendor formats which were previously considered as profile (e.g. SCIEX or Thermo). The only exception would be regarding Agilent data which can be stored as centroid, profile or both. To make sure what type of data you have, we recommend using our program “raw_data_viewer.exe” (which is available in the MS-DIAL base folder). The raw data viewer shows the spectrum that has been retrieved by the MS-DIAL backend (here you can determine the data type), and users can additionally determine the threshold values for amplitude cut off etc. by looking at the statistics of peak height vs. the frequency (Figure 6 and Figure 7).
Additionally, you can also watch our tutorial video:
Database (MSP or Text) for compound identification
The database formats for GC/MS or LC/MS datasets are described in this section. The main difference between GC/MS (EI-MS) and LC/MS (ESI-MS/MS) is the availability of precursor ion and its MS/MS. While a precursor m/z and its MS/MS are mostly available in ESI (or the other soft ionization)-MS/MS, the molecular ion in EI-MS is difficult to detect owning to the hard ionization. Several MSP files are downloadable as a starter kit for MS-DIAL at http://prime.psc.riken.jp/Metabolomics_Software/MS-DIAL/index.html.
MSP format for precursor- and MS/MS library
MS-DIAL supports the MSP (http://www.nist.gov/srd/upload/NIST1a11Ver2-0Man.pdf) format in ASCII text (Figure 8). In addition, the software can utilize “RETENTIONTIME:”, “PRECURSORTYPE:”, and “FORMULA:” information for metabolite identification (cases are ignored). Retention time information must be specified in minutes [min] scale when possible. The adduct ion information, i.e. here ‘Precursor type’, will be used for the adduct ion search algorithm (see section 7).
Adduct ion format
Adduct ion information should be formatted as described in this section: [M+Na]+, [M+2H]2+, [M-2H2O+H]+, [2M+FA-H]-, etc.
The parentheses ‘[’ and ‘]’ must be used to bracket the ion information.
The char + and - are required after ‘]’ and the number must be written before + or -.
When you want to define the organic formula like C6H12O5, you have to write it without any replicate elements or parentheses like [M+C2H5COOH-H] or [M+H+(CH3)3SiOH].
The beginning figure of organic formula like ’2’H2O is recognized as the H2O × 2. Again, do not use 2(H2O) to specify this.
Sequential equations are acceptable: [2M+H-C6H12O5+Na]2+ (very apt.)
MS-DIAL accepts some abbreviations or common organic formulas for adduct types as follows.
For Acetonitrile: ACN, CH3CN
For Methanol: CH3OH
For Isopropanol: IsoProp, C3H7OH
For Dimethyl sulfoxide: DMSO
For Formic acid: FA, HCOOH
For Acetic acid: Hac, CH3COOH
For Trifluoroacetic acid: TFA, CFCOOH
Text format library for retention time and accurate mass search (post identification)
MS-DIAL also supports tab-delimited text format library for peak identification by means of retention time and MS1 accurate mass information. The identification process is performed after completing peak identification based on MSP format library. This is why we call this identification processing “post identification”. The first row should include header information. The first, second, third, and fourth columns should contain the metabolite name, accurate mass [Da], retention time [min], and adduct ion, respectively (Figure 9). This library can be easily created in Microsoft Excel. Save the spreadsheet in “Tab delimited text format”. This option is useful for internal standard identifications etc. (Even if you don’t have MS/MS libraries, the peak identification based on retention time and accurate mass is available from this option.)
* The minimum requirement for this text library is just ‘metabolite name’ and ‘m/z’ information; i.e. the first two columns are required. Retention time and adduct ion fields provide additional information for MS-DIAL in peak identification and adduct ion picking, respectively.
MSP format as GC/MS library
MS-DIAL supports the MSP format (http://www.nist.gov/srd/upload/NIST1a11Ver2-0Man.pdf) in ASCII text, same as in the “MSP format for precursor- and MS/MS library” seciton. MS-DIAL accepts two fields for ‘retention’ information as the reference: “RETENTIONTIME:” or “RT”, and “RETENTIONINDEX” or “RI” (Figure 10). Retention time information must be specified in minute [min] scale when possible.
Alkane- or FAME retention time dictionary for the calculation of retention index
Retention time and retention index are used for routine identification of metabolites in GC/MS based metabolomics. The current MS-DIAL program has three options for the use of retention information: 1) retention time, 2) alkane mix based retention index, and 3) FAME (fatty acid methyl ester) mix based retention index.
See the experimental details of alkane and FAME mixtures.
Alkanes: http://www.sciencedirect.com/science/article/pii/S1389172311001848
Alkanes: https://en.wikipedia.org/wiki/Kovats_retention_index
FAMEs: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC2805091/
In order to calculate the retention indexes of detected peaks, users have to prepare a dictionary including the pairs of the carbon number and its retention time as tab-delimited text (Figure 11).
FAQ
By alkane mixture, the calculation of retention index is based on Kovats’s method (temperature programmed chromatography).
By FAME mixture, the calculation is based on Fiehn’s method. Importantly, FAMEs of carbon number C8, C9, C10, C12, C14, C16, C18, C20, C22, C24, C26, C28, and C30 (total 13 FAMEs) must be included in this dictionary to calculate Fiehn’s index. A polynomial regression of fifth order is applied to chromatographic peaks between C9 and C28. The peaks between C8 (and its earlier) and C9 and between C28 and C30 (and its later) are interpolated by a linear regression. The original indexes are based on the retention time (sec) * 1000 of authentic standards.