Tutorial

Follow these steps to process an example study, MTBLS1684 (n = 499), which contains Agilent QTOF 6550 data collected in RP-ESI-POS mode.

  1. Download and install the latest version of R Software from https://cran.r-project.org/

  2. Download and install the latest version of RStudio software from https://www.rstudio.com/products/rstudio/download/

  3. Create a directory named "MTBLS1684" on your computer's hard drive.

  4. Download the raw data (.D) files for this case study from https://www.ebi.ac.uk/metabolights/MTBLS1684 and copy them to the "MTBLS1684" directory.

  5. Download the reference m/z and RT file from https://github.com/idslme/IDSL.IPA/blob/main/Reference_masses_peak_annotation.xlsx and copy it to the "MTBLS1684" directory.

  6. Use the MSConvert utility from ProteoWizard to convert the .D files to mzML format.

  7. The pipeline requires 48 parameters divided into 8 sections. For the "MTBLS1684" data, use the settings listed below in each section to run the pipeline on your computer.

  8. Beginners are advised to use the online form at https://ipa.idsl.me/ipa-comprehensive-analysis/online-form to fill in these sections.

  9. R experts can edit the R script directly in the RStudio IDE.
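
Step 6 (conversion of the .D files to mzML) can also be scripted instead of using the MSConvert graphical interface. The following is a minimal sketch, assuming the ProteoWizard command-line tool `msconvert` is installed and on the PATH; the helper names `build_msconvert_commands` and `convert_all` are hypothetical and are not part of IDSL.IPA. Reading vendor formats such as Agilent .D typically requires the Windows build of ProteoWizard.

```python
import shutil
import subprocess
from pathlib import Path


def build_msconvert_commands(data_dir, out_dir=None, exe="msconvert"):
    """Return one msconvert command (as an argument list) per .D folder.

    Hypothetical helper: only the `--mzML` and `-o` options of msconvert
    are used; everything else is left at the tool's defaults.
    """
    data_dir = Path(data_dir)
    out_dir = Path(out_dir) if out_dir else data_dir
    commands = []
    # Agilent .D "files" are directories; match the suffix case-insensitively.
    raw_folders = sorted(p for p in data_dir.iterdir() if p.suffix.lower() == ".d")
    for raw in raw_folders:
        commands.append([exe, str(raw), "--mzML", "-o", str(out_dir)])
    return commands


def convert_all(data_dir, out_dir=None):
    """Run the conversions one after another, stopping at the first failure."""
    if shutil.which("msconvert") is None:
        raise RuntimeError("msconvert was not found on PATH")
    for cmd in build_msconvert_commands(data_dir, out_dir):
        subprocess.run(cmd, check=True)
```

Calling `convert_all("full path of the MTBLS1684 directory")` would write the mzML files next to the .D folders, which is where PARAM0007 expects them in the settings below.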

Note : The IDSL.IPA pipeline works on both Linux and Windows operating systems.

There are three ways to prepare the parameter input for the IDSL.IPA pipeline; see https://ipa.idsl.me/ipa-comprehensive-analysis for details.

Section 1 (Global Parameters)

PARAM0001 (Peak List for individual LC/HRMS files) : YES

PARAM0002 (Aligned peak table) : YES

PARAM0003 (Gap-filled peak table) : YES

PARAM0004 (Annotate peak table using a reference database) : YES

PARAM0005 (Targeted Analysis) : NO

PARAM0006 (Number of parallel threads) : 8 (change this value to match the number of threads available on your computer; many modern CPUs provide two threads per core).
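
If you are unsure how many threads your machine offers, the count can be queried programmatically. This is a small sketch using Python's standard library; the function name `suggested_threads` and the idea of reserving one thread for the operating system are assumptions for illustration, not IDSL.IPA recommendations.

```python
import os


def suggested_threads(reserve=1):
    """Suggest a value for PARAM0006 from the logical CPU count.

    `reserve` (hypothetical) is the number of threads left free for
    other work; the result is never lower than 1.
    """
    logical = os.cpu_count() or 1  # cpu_count() may return None
    return max(1, logical - reserve)


print(suggested_threads())
```

`os.cpu_count()` reports logical CPUs, so on a CPU with two threads per core it already counts both threads of each core.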

Section 2 (Data Import and Export)

PARAM0007 (Location of the LC/HRMS data) : "full path of the MTBLS1684 directory"

PARAM0008 (List of files) : "All"

Note : provide a semicolon (;) separated list of file names if only a subset of the files needs to be processed.
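
Typing a long semicolon-separated list by hand is error-prone, so it can help to generate it from the directory contents. The sketch below is one way to do this; `file_list_param` and its `keep` filter are hypothetical names introduced here for illustration.

```python
from pathlib import Path


def file_list_param(data_dir, keep=None, ext=".mzML"):
    """Build a semicolon-separated list of data files for PARAM0008.

    `keep` is an optional predicate on the file name that selects the
    subset of files to process; by default every file with `ext` is kept.
    """
    names = sorted(
        p.name
        for p in Path(data_dir).iterdir()
        if p.is_file() and p.suffix.lower() == ext.lower()
    )
    if keep is not None:
        names = [n for n in names if keep(n)]
    return ";".join(names)
```

For example, `file_list_param(data_dir, keep=lambda n: n < "100")` would keep only files whose names sort before "100", and the returned string can be pasted as the PARAM0008 value.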

PARAM0009 (Data format) : "mzML"

Note : IDSL.IPA depends on the mzR package to read the mzML files.

PARAM0010 (Location of the output files) : "full path of the MTBLS1684 directory"

Section 3 (Pairing of potential C12 and C13 peaks)

PARAM0011 (Instrument noise level) : 500

Note : this noise level is used only for removing noisy C12 peaks.

PARAM0012 (Cutoff for the maximum ratio of putative C12 and C13 peaks) : 90

Section 4 (Chromatographic Peak Detection)

PARAM0013 (Mass tolerance to create EICs) : 0.01

PARAM0014 (RT tolerance to remove redundant peaks) : 0.05

PARAM0015 (Smoothing windows for LOESS) : 12

PARAM0017 (Fronting and tailing peaks resolving factor) : 0.05

PARAM0018 (Rounding factor for m/z values) : 2

Section 5 (Chromatographic Peak Analysis and Data Reduction)

PARAM0019 (Perform recursive mass correction) : YES

PARAM0020 (Number of extra scans on both sides of the corrected mass) : 50

PARAM0021 (Minimum peak height) : 1000

PARAM0022 (% cutoff for maximum missing scans) : 30

PARAM0023 (Minimum nIsoPairs) : 3

PARAM0024 (Minimum % nIsoPairs) : 30

PARAM0025 (Maximum ratio of cumulative C12/C13 ratio) : 80

PARAM0026 (Maximum ratio of peak width at half height) : 1

PARAM0027 (Minimum signal to noise (local)) : 2

PARAM0028 (Number of points for data interpolation) : 100

Section 6 (Retention time correction and peak alignment)

PARAM0029 (Perform retention time correction) : YES

PARAM0030 (Reference sample list) : "003.mzML;004.mzML;005.mzML;007.mzML;008.mzML;009.mzML;010.mzML;011.mzML;012.mzML;014.mzML"

Note : the sample names in this list must be separated by semicolons (;).
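
Before starting a long run, it can be worth checking that every reference sample named in PARAM0030 actually exists in the data directory. The sketch below is a hypothetical pre-flight check (the function name `missing_reference_samples` is an assumption and not part of IDSL.IPA); it splits the value on semicolons and reports any names without a matching file.

```python
from pathlib import Path


def missing_reference_samples(param_value, data_dir):
    """Return the names in a semicolon-separated list that are not
    present as files in `data_dir` (original order is preserved)."""
    # Tolerate stray spaces and a trailing semicolon in the list.
    names = [n.strip() for n in param_value.split(";") if n.strip()]
    data_dir = Path(data_dir)
    return [n for n in names if not (data_dir / n).is_file()]
```

An empty list means every reference sample was found; any returned names point to a typo in the list or to a file that has not been converted yet.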

PARAM0031 (minimum % of the recurring peaks in reference samples) : 100

PARAM0032 (Retention time correction method) : "RetentionIndex"

PARAM0033 (Reference peak tolerance for "RetentionIndex" to minimize local RT errors) : 5

PARAM0034 (Degree for the polynomial regression) : ""

PARAM0035 (mass tolerance for peak alignment) : 0.01

PARAM0036 (RT tolerance for peak alignment) : 0.05

PARAM0037 (number of m/z slices for parallel computation) : 20

Section 7 (Gap-filling)

PARAM0038 (Mass tolerance) : 0.01

PARAM0039 (RT tolerance) : 0.1

PARAM0040 (Extra scans on both sides of the peak apex for calculating peak area) : 20

Section 8 (Peak Annotation)

PARAM0041 (Reference file location) : "MTBLS1684"

PARAM0042 (Reference file name) : "Reference_masses_peak_annotation.xlsx"

PARAM0043 (mass tolerance) : 0.01

PARAM0044 (RT tolerance) : 0.05
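
To illustrate how the mass and RT tolerances (PARAM0043 and PARAM0044) act together, the sketch below matches detected peaks against reference entries and accepts a match only when both differences fall within tolerance. This is a simplified illustration, not the IDSL.IPA implementation; the function name, the data layout, and the example values are all hypothetical, and the tolerances are assumed to be in the same units as the m/z and RT columns.

```python
def annotate_peaks(peaks, references, mz_tol=0.01, rt_tol=0.05):
    """Match peaks to reference compounds within both tolerances.

    peaks:      list of (mz, rt) tuples
    references: list of (name, mz, rt) tuples
    Returns a list, parallel to `peaks`, of lists of matching names.
    """
    hits = []
    for mz, rt in peaks:
        names = [
            name
            for name, ref_mz, ref_rt in references
            if abs(mz - ref_mz) <= mz_tol and abs(rt - ref_rt) <= rt_tol
        ]
        hits.append(names)
    return hits


# Hypothetical values, for illustration only.
references = [("ref_A", 180.0634, 5.20), ("ref_B", 300.1000, 8.00)]
peaks = [(180.0690, 5.22), (180.0800, 5.22), (300.1050, 8.20)]
print(annotate_peaks(peaks, references))
```

Only the first peak is annotated here: the second misses the mass tolerance and the third misses the RT tolerance, which shows that widening one tolerance alone does not rescue a peak that fails the other.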

PARAM0045 (Use corrected RT values) : YES

PARAM0046 (Compound centric annotation) : YES

PARAM0047 (Sample centric annotation) : YES

PARAM0048 (Gap-filling for the sample centric annotation) : YES

Expected results and outcomes : Once the IDSL.IPA R script completes the calculations, you should obtain the results available at https://zenodo.org/record/4708401 and https://zenodo.org/record/4708411 .