Getting Started with iPhos
Raw Data Preprocessing
The raw files generated by the mass spectrometer for the AP-treated and untreated samples should first be converted into the open mzXML file format, which is now a routine step in MS data analysis. The conversion can be done with suitable programs developed by the Seattle Proteome Center (SPC). Users can also use MSConvert from the ProteoWizard project, which provides a GUI for Microsoft Windows users.
iPhos Peak Extraction Assister
The iPhos Peak Extraction Assister interface consists of three parts:
-
mzXML Directory Selection: To automate batch-mode peak extraction, users can put all the mzXML files in one folder and specify that folder in iPhos Peak Extraction Assister.
-
Automate Batch Mode Peak Extraction: Starts the batch-mode peak extraction using the default peak extraction tool, msInspect, with which the AP-assisted phosphoproteome investigation has been shown to be successful in cell lysates (link).
-
Processing Status: Shows the total number of mzXML files detected in the specified mzXML directory and the processing status.
Resulting peak list files
The peak list files produced by the batch-mode peak extraction are placed in a folder named "Peak_Lists" within the specified mzXML directory.
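As a rough illustration of the folder layout described above, the following Python sketch lists the mzXML files in a chosen directory and builds the path of the "Peak_Lists" output folder. The function names are hypothetical and are not part of iPhos; the sketch only mirrors the documented directory convention.

```python
from pathlib import Path


def list_mzxml_inputs(mzxml_dir):
    """Return the mzXML files found directly inside mzxml_dir, sorted by name."""
    folder = Path(mzxml_dir)
    return sorted(
        p for p in folder.iterdir()
        if p.is_file() and p.suffix.lower() == ".mzxml"
    )


def peak_list_dir(mzxml_dir):
    """Path of the folder documented to receive the extracted peak lists."""
    return Path(mzxml_dir) / "Peak_Lists"
```

Calling `len(list_mzxml_inputs(folder))` gives a count comparable to the total shown under Processing Status, assuming the tool only looks at files placed directly in the folder.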
iPhos Module-1
The iPhos Module-1 interface consists of 7 parts:
-
Peak Selection Parameter: Users can keep only the top N most intense peaks to reduce the number of peak signals. This allows users to customize their inclusion list of phosphopeptide candidates. The parameter can be set separately for AP-treated and AP-untreated data. To help determine an appropriate N, the 'Threshold test' function shows the relation between the number of peaks and the peak intensity.
-
Mass Shift Calculation Parameters
-
Mass Delta (Da): 79.966 Da is the theoretical mass of one phosphate group (HPO3). Because AP treatment dephosphorylates the peptides, a phosphopeptide and its dephosphorylated form differ in mass by an integer multiple of 79.966 Da.
-
Maximal Modification Number: The maximum number of phosphate groups a phosphopeptide is allowed to lose. The default is 5, meaning that up to five phosphate groups may be lost from a phosphopeptide.
-
Mass Tolerance (Da): The maximum allowed difference between a measured mass shift and its theoretical value. The default value is 0.1 Da.
-
Retention Time Tolerance (min): The maximum allowed difference in elution time from the reverse-phase column between a phosphopeptide and its dephosphorylated form in consecutive LC-MS analyses. The default value is 2 min.
-
Minimum Charge State: The minimum charge state of the extracted peptide peaks, which depends on the ionization method. The default value is 2, which applies when electrospray ionization is used.
-
Headers in the Peak List Files: To accept results from different peak extraction tools, users must specify the headers used in the peak list file so that iPhos Module-1 can recognize and parse it. The required header columns are as follows:
- m/z: The detected m/z value of the extracted peak.
- Charge state: The charge state of the extracted peak given by the peak extraction tool.
- Retention time: The retention time of the extracted peak. This is determined by the peak extraction tool, either by taking the retention time of the most intense scan or the average of the peptide elution time range.
- Peak Intensity: The intensity of the extracted peak signal. This is calculated by the peak extraction tool, either by summing the XIC area of the peak or by taking the maximum detected intensity within the peak area.
- Scan Number: (Optional field) The representative scan of the extracted peak.
- Mass: (Optional field) The calculated mass of the extracted peak. This is the value used to find the mass shift caused by the loss of one or more phosphate groups. If it is not provided by the peak extraction tool, iPhos Module-1 will calculate the mass from the m/z value and the charge state using the formula: [Mass] = [m/z] * [charge state] - [H].
-
Input Peak List Files: The peak list files of replicate LC-MS runs (up to 6 replicates) can be loaded for the samples with (w) and without (w/o) AP treatment. The allowed delimiter-separated file types are .tsv, .csv and .txt. Other file formats should be converted to one of these types using Excel or other software.
-
Threshold Test: iPhos Module-1 can generate an image displaying the relation between threshold intensity and the number of peaks (as shown below). This helps ensure that the included peaks lie above a reasonable intensity threshold, so that noise is not included.
-
Find Mass Shift: iPhos Module-1 mines the data for phosphopeptide signals based on the parameter settings and on the principle that phosphopeptides lose their phosphate groups on AP treatment.
-
Results: The information on the phosphopeptide signals is zipped into a compressed file named "phosphopeptide_signal_output_[generation time]". The compressed file includes the following files:
- Parameter settings used in iPhos Module-1.
- Peaks filtered in the samples without AP treatment: 'file_name.csv'.
- Peaks filtered in the samples with AP treatment: 'file_name.csv'.
- List of phosphopeptide candidates: 'result_[the time users generate the list].csv'. This file is used in iPhos Module-2 to generate customized inclusion lists.
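The Peak Selection Parameter and the Threshold Test described in this section can be pictured with a small Python sketch. It is not the iPhos implementation; the function names and the dictionary layout of a peak (an `intensity` field) are assumptions made for illustration only.

```python
def select_top_n(peaks, n, key="intensity"):
    """Keep the n most intense peaks; each peak is a dict with an intensity field."""
    return sorted(peaks, key=lambda p: p[key], reverse=True)[:n]


def threshold_curve(peaks, thresholds, key="intensity"):
    """For each threshold, count the peaks whose intensity is at or above it."""
    return [(t, sum(1 for p in peaks if p[key] >= t)) for t in thresholds]


# Hypothetical peak list used only to exercise the two functions.
demo_peaks = [{"intensity": v} for v in (10, 500, 50, 2000, 5)]
top_two = select_top_n(demo_peaks, 2)
curve = threshold_curve(demo_peaks, [1, 100, 1000])
```

Plotting the pairs returned by `threshold_curve` gives a curve of the kind the Threshold Test is described as producing, from which a value of N can be read off.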
Headers in the resulting files
- w_AP: w_AP means that the samples were AP-treated.
- w/o_AP: wo_AP means that the samples were not AP-treated.
- Source File: The source tsv file of the peak pairs.
- Scan: The number of the full-scan MS spectrum in which the feature appeared at its highest intensity.
- RT: The retention time (RT) in seconds at which the feature was detected.
- m/z: The detected mass-to-charge ratio of the peptide feature.
- Intensity: The peak intensity calculated from the raw files by the peak extraction tool.
- Charge: The calculated charge state of the peptide feature provided by the peak extraction tool.
- Mass: The calculated mass of the peptide feature provided by the peak extraction tool.
- Delta: The mass shift between the peak from the non-AP-treated group and the peak from the AP-treated group.
- # of Delta: The number of removed modification groups.
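The Delta and # of Delta columns follow from the Mass Shift Calculation Parameters. A minimal Python sketch of such a pairing step is given below, assuming a simple all-against-all comparison; the `Peak` class, the function name and the output keys are hypothetical, and the actual search strategy used by iPhos Module-1 may differ. Defaults are taken from the parameter descriptions above, and retention times are assumed to be in minutes.

```python
from dataclasses import dataclass

PHOSPHATE_DELTA = 79.966  # Da, the documented Mass Delta


@dataclass
class Peak:
    mass: float   # Da
    rt: float     # minutes (assumed unit for this sketch)
    charge: int


def find_mass_shifts(untreated, treated, delta=PHOSPHATE_DELTA, max_mods=5,
                     mass_tol=0.1, rt_tol=2.0, min_charge=2):
    """Pair untreated and AP-treated peaks whose mass difference is n * delta."""
    pairs = []
    for u in untreated:
        if u.charge < min_charge:
            continue
        for t in treated:
            if t.charge < min_charge or abs(u.rt - t.rt) > rt_tol:
                continue
            shift = u.mass - t.mass  # the untreated peak still carries phosphate
            for n in range(1, max_mods + 1):
                if abs(shift - n * delta) <= mass_tol:
                    pairs.append({"untreated": u, "treated": t,
                                  "delta": shift, "n_delta": n})
                    break
    return pairs
```

For example, an untreated peak of mass 1159.932 Da and a treated peak of mass 1000.000 Da eluting within 2 min of each other would be reported with two lost phosphate groups, since 2 × 79.966 = 159.932.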
iPhos Module-2
The iPhos Module-2 interface consists of 5 parts:
-
m/z Segment File Formation Parameters: iPhos Module-2 can split the candidate peaks into several segments based on their retention times. Users can customize the number of segments into which the m/z values of the candidate phosphopeptides are divided for the subsequent targeted LC-MS/MS run. This helps the identification of the candidate phosphopeptides. The parameters are as follows:
- Peptide Elution Time (min): Users can set the start time and the end time of the linear gradient in the LC system, within which the mass spectrometer generates confident peptide signals.
- Percentage of Peptides within One Segment File (%): Users can split the candidate peaks into equal segments. This can improve peptide identification in the subsequent targeted LC-MS/MS analyses.
- Extra Retention Time Tolerance of Segment Files (min): Peak elution times sometimes shift between the LC-MS runs and the targeted LC-MS/MS runs because of nanoLC variation, temperature and so on. An extra retention time tolerance is suggested to compensate for such shifts.
-
Inclusion List Generation Parameters: Since duplicate m/z values are not allowed in the inclusion lists of most mass spectrometers, iPhos Module-2 can remove duplicate peak m/z values according to the round-off decimal place.
- Round Off Decimal Place: Sets the decimal place to which the m/z values of the candidate phosphopeptides are rounded, which removes redundant peaks with similar m/z values. Identical m/z values are not allowed in the inclusion list for the targeted LC-MS/MS analyses. This value should be chosen according to the mass accuracy of the mass spectrometer.
-
.csv Files from iPhos Module-1: The .csv files of phosphopeptide candidates produced by iPhos Module-1 from the control group and the treatment group are combined by iPhos Module-2 to generate the inclusion lists.
-
Inclusion List Generation: iPhos Module-2 generates several .csv files based on the parameters above. The retention time window of the targeted m/z values is marked in each file name for the downstream targeted LC-MS/MS analysis.
-
Results: The generated inclusion lists are zipped into a compressed file named "inclusion_list_output_[generation time]". The compressed file includes the parameter settings used in iPhos Module-2 and the m/z inclusion lists with their retention time windows.
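The segmenting and rounding steps of Module-2 can be sketched in Python as follows. This is an illustrative reading of the parameters, not the iPhos code: the function name, the `(m/z, retention time)` tuple layout, the default values and the choice to floor the window start at 0 min are all assumptions.

```python
import math


def build_inclusion_segments(candidates, percent_per_segment=25.0, decimals=2,
                             extra_rt_tol=1.0, elution_start=0.0,
                             elution_end=120.0):
    """Split (mz, rt_min) candidates into RT-ordered segments of equal size."""
    # Keep only candidates eluting inside the peptide elution time window.
    in_window = sorted(
        (c for c in candidates if elution_start <= c[1] <= elution_end),
        key=lambda c: c[1],
    )
    if not in_window:
        return []
    per_segment = max(1, math.ceil(len(in_window) * percent_per_segment / 100.0))
    segments = []
    for i in range(0, len(in_window), per_segment):
        chunk = in_window[i:i + per_segment]
        segments.append({
            # Widen the window by the extra retention time tolerance.
            "rt_start": max(0.0, chunk[0][1] - extra_rt_tol),
            "rt_end": chunk[-1][1] + extra_rt_tol,
            # Rounding, then de-duplicating, removes near-identical m/z values.
            "mz": sorted({round(mz, decimals) for mz, _ in chunk}),
        })
    return segments
```

With 50% of the peptides per segment, four candidates yield two segments, and two m/z values such as 500.123 and 500.124 collapse into a single entry of 500.12 when rounded to two decimal places.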
iPhos Module-3
The iPhos Module-3 interface consists of 6 parts:
-
Information Integration Parameter: To connect the peptide identification information from major database search engines, such as Mascot, with the LC-MS label-free quantification results from any quantification software, the following parameters should be set: peptide score cut-off, m/z tolerance (Da) and retention time tolerance (min).
- Peptide Score: The cut-off value for confident peptide identification, as scored by the peptide database search engine.
- m/z Tolerance (Da): The allowed m/z difference between the peptide identification entries and the peptide quantification rows.
- Retention Time Tolerance (min): The allowed retention time difference between the peptide identification entries and the peptide quantification rows.
-
Phosphorylation Filter: Users can choose the type of phosphorylation of interest.
-
Headers in the Quantification Results: Users must specify the headers used in the quantification result file so that iPhos Module-3 can recognize and parse it. The required header columns are as follows:
- m/z: The detected m/z value of the quantified peak.
- Charge State: The charge state of the quantified peak.
- Retention Time: The retention time of the quantified peak.
- Intensity ratio: The relative peptide abundance calculated by the quantification tool.
- Highest Mean Condition: (Optional field) The condition in which the peptide showed the higher abundance.
- Statistic p Value: (Optional field) The statistical assessment of the difference in peptide abundance between the two given conditions.
-
Input Peptide Identification File and Quantification File: Peptide identification is suggested to be done with the Mascot search engine, and the label-free peptide quantification can be done with any quantification software, such as msInspect or MZmine. To unify the quantification results from different label-free quantification software packages, users should first convert the results into a delimiter-separated values format (.csv, .tsv or .txt). This can be done with Excel or other programs. Users should then specify the corresponding headers under "Headers in the Quantification Results" so that the file headers can be recognized.
-
Output pTyr Quantitative Result: iPhos Module-3 links the phosphopeptide identification results with the peptide differential quantification results.
-
Results: The linked phosphopeptide quantification results are zipped into a compressed file named "quantification_output_[generation time]". The compressed file includes the parameter settings used in iPhos Module-3 and the linked phosphopeptide identification and quantification results.
Headers in the resulting files
- Sequence: The phosphopeptide sequence identified by the targeted LC-MS/MS analysis.
- m/z: The mass-to-charge ratio of the peptide feature.
- Charge State: The charge state reported by the database search engine.
- RT: The retention time (RT) in seconds at which the feature was detected.
- Score: The confidence score for the peptide feature given by the Mascot/Sequest search engine.
- Modification: The post-translational modification (PTM) on the identified phosphopeptide sequence.
- Mod Location: The modification site of the PTMs on the identified phosphopeptide sequence.
- Protein: The protein from which the identified phosphopeptide was derived. When the protein is not marked in the original xml file, it is reported as 'undefined'. Further information on the peptide can then be obtained only from the database search engine.
- Relative Abundance Ratio: The differential expression level of the phosphopeptide between the samples in different conditions.
- High Mean Condition: The condition with the higher expression of the two given conditions, if provided.
- Statistical p value: The p value of the statistical test on the differential level between the two given conditions, if provided.
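The linking step performed by Module-3 can be pictured with the Python sketch below, which joins identification entries to quantification rows using the score cut-off, m/z tolerance and retention time tolerance. It is a hedged illustration only: the function name, the dictionary keys, the default values and the requirement that charge states match are assumptions, and retention times are taken to be in minutes on both sides.

```python
def link_id_to_quant(identifications, quant_rows, score_cutoff=20.0,
                     mz_tol=0.05, rt_tol=2.0):
    """Attach the intensity ratio of each matching quantification row to an ID."""
    linked = []
    for ident in identifications:
        if ident["score"] < score_cutoff:
            continue  # not a confident identification
        for row in quant_rows:
            if (ident["charge"] == row["charge"]
                    and abs(ident["mz"] - row["mz"]) <= mz_tol
                    and abs(ident["rt"] - row["rt"]) <= rt_tol):
                linked.append({**ident, "ratio": row["ratio"]})
    return linked
```

Each returned entry carries the identification fields together with a `ratio` value, corresponding to the Sequence, Score and Relative Abundance Ratio columns listed above. Optional columns such as the p value could be copied across in the same way.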