Spectral Library Management

Build Library from DDA Data

SpectraST (TPP)

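# Import search results into a raw library
# (-cN takes an output base name; SpectraST adds the .splib extension)
spectrast -cNraw search_results.pep.xml

# Merge replicate spectra into a consensus library
spectrast -cNlibrary -cAC raw.splib

# Filter library for quality
spectrast -cNfiltered -cAQ library.splib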

# Convert to other formats
spectrast -cNlibrary.tsv -cM library.splib

EasyPQP (Skyline/OpenMS)

# Build library from search results
easypqp library \
    --in psm_results.tsv \
    --out library.pqp \
    --psmtsv \
    --rt_reference irt.tsv

# Convert to TSV format
easypqp convert \
    --in library.pqp \
    --out library.tsv \
    --format openswath

EncyclopeDIA (Walnut)

# Build chromatogram library from DIA
EncyclopeDIA \
    -i sample1.mzML \
    -i sample2.mzML \
    -l wide_window_library.dlib \
    -f uniprot.fasta \
    -o results

# Search with narrow-window DIA
EncyclopeDIA \
    -i narrow_sample.mzML \
    -l narrow_library.elib \
    -f uniprot.fasta \
    -o search_results

Predicted Libraries

Prosit (Deep Learning)

# Generate predictions via Prosit API
import requests
import pandas as pd

peptides = pd.DataFrame({
    'modified_sequence': ['PEPTIDEK', 'ANOTHERPEPTIDER'],
    'collision_energy': [30, 30],
    'precursor_charge': [2, 2]
})

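# Submit to Prosit server
response = requests.post(
    'https://www.proteomicsdb.org/prosit/api/predict',
    json=peptides.to_dict(orient='records'),
    timeout=60
)
response.raise_for_status()  # Fail early on HTTP errors

# Decode the JSON payload; converting it to a library table is a separate step
predictions = response.json()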

DeepLC Retention Time Prediction

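import pandas as pd
from deeplc import DeepLC

# Initialize predictor
dlc = DeepLC()

# Peptides to predict, plus calibration peptides with observed RTs
peptides = ['PEPTIDEK', 'ANOTHERPEPTIDER']
calibration_peptides = ['GAGSSEPVTGLDAK', 'VEATFGVDESNAK']
calibration_rts = [22.4, 33.1]

# DeepLC's DataFrame interface expects 'seq', 'modifications', and 'tr' columns
# (this assumes a DeepLC release that still accepts seq_df)
calibration_df = pd.DataFrame({
    'seq': calibration_peptides,
    'modifications': [''] * len(calibration_peptides),
    'tr': calibration_rts
})
peptide_df = pd.DataFrame({'seq': peptides, 'modifications': [''] * len(peptides)})

# Calibrate and predict
dlc.calibrate_preds(seq_df=calibration_df)
predicted_rts = dlc.make_preds(seq_df=peptide_df)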

MS2PIP Fragmentation Prediction

# Assumes ms2pip >= 4, which exposes predict_single and takes
# peptidoforms written as "SEQUENCE/charge"
from ms2pip import predict_single

# Predict fragmentation for each unmodified peptide at charge 2
peptidoforms = ['PEPTIDEK/2', 'ANOTHERPEPTIDER/2']
predictions = [predict_single(p, model='HCD2021') for p in peptidoforms]

Library Formats

DIA-NN TSV Format

# Required columns
PrecursorMz    ProductMz    Annotation    ProteinId    GeneName
PeptideSequence    ModifiedSequence    PrecursorCharge
FragmentCharge    FragmentType    FragmentSeriesNumber
NormalizedRetentionTime    LibraryIntensity
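
A quick way to use the column list above is to check a library's header before handing the file to a search tool. The sketch below is only an illustration: `REQUIRED_COLUMNS` is copied from the list above, and `missing_columns` is a hypothetical helper, not part of DIA-NN.

```python
import csv
import io

# Column names copied from the list above; adjust if your tool expects others.
REQUIRED_COLUMNS = [
    "PrecursorMz", "ProductMz", "Annotation", "ProteinId", "GeneName",
    "PeptideSequence", "ModifiedSequence", "PrecursorCharge",
    "FragmentCharge", "FragmentType", "FragmentSeriesNumber",
    "NormalizedRetentionTime", "LibraryIntensity",
]


def missing_columns(tsv_handle, required=REQUIRED_COLUMNS):
    """Return the required column names absent from a TSV header row."""
    header = next(csv.reader(tsv_handle, delimiter="\t"))
    return [name for name in required if name not in header]


# In-memory stand-in for a library file (hypothetical content)
example = io.StringIO("PrecursorMz\tProductMz\n464.73\t147.11\n")
missing = missing_columns(example)
print(missing)
```

For a real library, pass an open file handle, for example `open('library.tsv', newline='')`, in place of the in-memory example.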

OpenSWATH TSV Format

import pandas as pd

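# Convert to OpenSWATH format
# precursor_mz, product_mz, intensity, etc. are equal-length sequences
# prepared beforehand, one entry per transition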
library = pd.DataFrame({
    'PrecursorMz': precursor_mz,
    'ProductMz': product_mz,
    'LibraryIntensity': intensity,
    'NormalizedRetentionTime': rt,
    'PrecursorCharge': charge,
    'ProductCharge': 1,
    'FragmentType': ion_type,  # 'b' or 'y'
    'FragmentSeriesNumber': ion_num,
    'ModifiedPeptideSequence': mod_seq,
    'PeptideSequence': sequence,
    'ProteinId': protein,
    'GeneName': gene,
    'Decoy': 0
})

library.to_csv('library_openswath.tsv', sep='\t', index=False)
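
The `PrecursorMz` and `ProductMz` values in the table above have to come from somewhere. As a minimal sketch, assuming unmodified peptides and monoisotopic masses, they could be computed as follows. The function names `precursor_mz` and `fragment_mz` are hypothetical, and modified residues would need their mass shifts added.

```python
# Monoisotopic residue masses (Da) for the 20 standard amino acids
MONO = {
    "G": 57.021464, "A": 71.037114, "S": 87.032028, "P": 97.052764,
    "V": 99.068414, "T": 101.047679, "C": 103.009185, "L": 113.084064,
    "I": 113.084064, "N": 114.042927, "D": 115.026943, "Q": 128.058578,
    "K": 128.094963, "E": 129.042593, "M": 131.040485, "H": 137.058912,
    "F": 147.068414, "R": 156.101111, "Y": 163.063329, "W": 186.079313,
}
WATER = 18.010565
PROTON = 1.007276


def precursor_mz(sequence, charge):
    """m/z of the intact, unmodified peptide at the given charge."""
    mass = sum(MONO[aa] for aa in sequence) + WATER
    return (mass + charge * PROTON) / charge


def fragment_mz(sequence, ion_type, ion_num, charge=1):
    """m/z of a b or y ion; ion_num counts residues from the N or C terminus."""
    if ion_type == "b":
        mass = sum(MONO[aa] for aa in sequence[:ion_num])
    elif ion_type == "y":
        mass = sum(MONO[aa] for aa in sequence[-ion_num:]) + WATER
    else:
        raise ValueError(f"unsupported ion type: {ion_type}")
    return (mass + charge * PROTON) / charge


print(round(precursor_mz("PEPTIDEK", 2), 4))
print(round(fragment_mz("PEPTIDEK", "y", 1), 4))
```

Only b and y ions are handled, matching the `FragmentType` values noted in the table above.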

Spectronaut Library Format

# Key columns for Spectronaut
ModifiedPeptide    StrippedPeptide    PrecursorCharge
PrecursorMz    iRT    FragmentLossType
FragmentCharge    FragmentType    FragmentNumber
RelativeIntensity    FragmentMz    ProteinGroups
Genes    ProteinIds

Library QC

import pandas as pd
import matplotlib.pyplot as plt

library = pd.read_csv('library.tsv', sep='\t')

# A precursor is a modified sequence at a given charge state
precursor_cols = ['ModifiedSequence', 'PrecursorCharge']
precursors = library.drop_duplicates(subset=precursor_cols)

# Basic statistics
print(f"Precursors: {len(precursors)}")
print(f"Proteins: {library['ProteinId'].nunique()}")
print(f"Transitions per precursor: {len(library) / len(precursors):.1f}")

# RT distribution
plt.hist(precursors['NormalizedRetentionTime'], bins=50)
plt.xlabel('Normalized RT')
plt.ylabel('Precursors')
plt.savefig('rt_distribution.png')

# Charge state distribution
print(precursors['PrecursorCharge'].value_counts().sort_index())
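
The average number of transitions per precursor can hide individual precursors with very few fragments. The sketch below lists them, assuming the same column names as the QC code above. The helper `low_fragment_precursors` and its default threshold of 6 are illustrative assumptions, not recommended values.

```python
import pandas as pd


def low_fragment_precursors(library, min_fragments=6):
    """Return transition counts for precursors with fewer than min_fragments rows."""
    counts = library.groupby(["ModifiedSequence", "PrecursorCharge"]).size()
    return counts[counts < min_fragments]


# Small in-memory stand-in for a library table (hypothetical content)
example = pd.DataFrame({
    "ModifiedSequence": ["PEPTIDEK"] * 6 + ["ANOTHERPEPTIDER"] * 2,
    "PrecursorCharge": [2] * 8,
})
sparse = low_fragment_precursors(example)
print(sparse)
```

The returned Series is indexed by (sequence, charge), so it can be used to drop or inspect the sparse precursors.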

Merge Libraries

import pandas as pd

# Load libraries and tag each row with its source
lib1 = pd.read_csv('library1.tsv', sep='\t').assign(source='lib1')
lib2 = pd.read_csv('library2.tsv', sep='\t').assign(source='lib2')
combined = pd.concat([lib1, lib2], ignore_index=True)

# For precursors present in both libraries, keep the whole entry from the
# library with the higher total intensity rather than mixing fragments
precursor_cols = ['ModifiedSequence', 'PrecursorCharge']
combined['total_int'] = (
    combined.groupby(precursor_cols + ['source'])['LibraryIntensity'].transform('sum')
)
best_source = (
    combined.sort_values('total_int', ascending=False)
    .drop_duplicates(subset=precursor_cols)[precursor_cols + ['source']]
)
merged = combined.merge(best_source, on=precursor_cols + ['source'])
merged = merged.drop(columns=['total_int', 'source'])

merged.to_csv('merged_library.tsv', sep='\t', index=False)

iRT Calibration

# Biognosys iRT peptides for retention time calibration
IRT_PEPTIDES = {
    'LGGNEQVTR': -24.92,
    'GAGSSEPVTGLDAK': 0.00,  # Reference
    'VEATFGVDESNAK': 12.39,
    'YILAGVENSK': 19.79,
    'TPVISGGPYEYR': 28.71,
    'TPVITGAPYEYR': 33.38,
    'DGLDAASYYAPVR': 42.26,
    'ADVTPADFSEWSK': 54.62,
    'GTFIIDPGGVIR': 70.52,
    'GTFIIDPAAVIR': 87.23,
    'LFLQFGAQGSPFLK': 100.00
}

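# Convert iRT to normalized RT
IRT_MIN = min(IRT_PEPTIDES.values())  # -24.92
IRT_MAX = max(IRT_PEPTIDES.values())  # 100.00

def irt_to_nrt(irt):
    '''Rescale iRT to 0-1 across the range spanned by the reference peptides'''
    return (irt - IRT_MIN) / (IRT_MAX - IRT_MIN)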
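
The reference values above become useful once observed retention times are mapped onto the iRT scale. A minimal sketch of that step is an ordinary least-squares line through the reference peptides detected in a run. The function `fit_rt_to_irt` is a hypothetical helper, and the two observed RTs reuse the calibration values from the DeepLC example; real runs would normally use more anchor peptides.

```python
def fit_rt_to_irt(observed_rt, irt_table):
    """Fit iRT = slope * RT + intercept over peptides present in both inputs."""
    shared = [pep for pep in observed_rt if pep in irt_table]
    if len(shared) < 2:
        raise ValueError("need at least two shared reference peptides")
    xs = [observed_rt[pep] for pep in shared]
    ys = [irt_table[pep] for pep in shared]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Subset of the reference table above, kept local so the sketch is self-contained
IRT_SUBSET = {"GAGSSEPVTGLDAK": 0.00, "VEATFGVDESNAK": 12.39}
# Observed RTs for the same peptides, taken from the DeepLC calibration example
observed = {"GAGSSEPVTGLDAK": 22.4, "VEATFGVDESNAK": 33.1}

slope, intercept = fit_rt_to_irt(observed, IRT_SUBSET)
print(slope, intercept)
```

Any observed RT can then be converted with `slope * rt + intercept` before rescaling or writing it to the library.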

Related Skills

  • dia-analysis - Use libraries in DIA workflows
  • peptide-identification - Generate search results for library building
  • data-import - Load MS data for library generation