CombFold Pipeline

CombFold Pipeline predicts the structure of large protein complexes from the sequences of their constituent chains. It can handle complexes as large as 18,000 amino acids and 32 subunits, and possibly larger.

Pipeline Structure

This Pipeline uses the following three Machine Learning Capsules:

  1. CombFold - Prepare Fasta

  2. Streamlit ColabFold: AlphaFold2 using MMseqs2

  3. CombFold - Combinatorial Assembly

The assembled Pipeline connects these three Capsules in sequence, as described in the steps below.

Create Data Assets

Create a "json" subfolder inside the /data folder and upload a subunit JSON file to it. The JSON format is described in the README of the CombFold Capsules and on GitHub.
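As a rough illustration of what a subunit JSON file might contain, the sketch below builds one in Python and runs a few basic checks before upload. The field names (name, chain_names, start_res, sequence) are assumed from the layout described in the CombFold README, so confirm them there. The subunit names, the sequences, and the validate_subunits helper are hypothetical and exist only for this example.

```python
import json

# Hypothetical two-subunit complex. Field names are assumed from the CombFold
# README; the sequences are short made-up placeholders, not real proteins.
subunits = {
    "A0": {"name": "A0", "chain_names": ["A", "B"], "start_res": 1,
           "sequence": "MKDILEKLEERRAQ"},
    "G0": {"name": "G0", "chain_names": ["G"], "start_res": 1,
           "sequence": "MSTNPKPQRKTKRN"},
}

REQUIRED_FIELDS = {"name", "chain_names", "start_res", "sequence"}
VALID_RESIDUES = set("ACDEFGHIKLMNPQRSTVWY")  # 20 standard amino acids


def validate_subunits(data):
    """Return a list of human-readable problems (empty if none were found).

    These checks are illustrative and are not the checks CombFold itself runs.
    """
    problems = []
    for key, entry in data.items():
        missing = REQUIRED_FIELDS - set(entry)
        if missing:
            problems.append(f"{key}: missing fields {sorted(missing)}")
            continue
        if entry["name"] != key:
            problems.append(f"{key}: 'name' does not match its key")
        if not isinstance(entry["chain_names"], list) or not entry["chain_names"]:
            problems.append(f"{key}: 'chain_names' must be a non-empty list")
        if not isinstance(entry["start_res"], int) or entry["start_res"] < 1:
            problems.append(f"{key}: 'start_res' must be a positive integer")
        if not entry["sequence"] or set(entry["sequence"]) - VALID_RESIDUES:
            problems.append(f"{key}: 'sequence' is empty or has non-standard letters")
    return problems


# Serialize for upload; save this text as a .json file in the "json" subfolder.
subunits_json = json.dumps(subunits, indent=2)
print(validate_subunits(subunits))
print(subunits_json)
```

If validate_subunits returns an empty list, the text in subunits_json can be saved to a file and uploaded into the "json" subfolder.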

You can create a Data Asset containing the ColabFold model either from the Code Ocean public bucket or by downloading the model from the AlphaFold GitHub repository. To use the public bucket, enter the following information when creating a new Data Asset:

Bucket Name: codeocean-public-data

Path: models/colabfold

Attach Data Assets

  1. Click Manage Data Assets

  2. Attach the ColabFold Trained Model.

  3. Drag the ColabFold Trained Model Data Asset and the "json" folder onto the Pipeline UI.

Create Pipeline

  1. Create a Pipeline and add the Capsules from Code Ocean Apps: CombFold - Prepare Fasta, Streamlit ColabFold: AlphaFold2 using MMseqs2, and CombFold - Combinatorial Assembly.

  2. Connect CombFold - Prepare Fasta to Streamlit ColabFold: AlphaFold2 using MMseqs2 using Flatten. Set the Source to "capsule/results/fasta_pairs/*".

Flatten passes each output fasta file separately, so that ColabFold processes them in parallel.

  3. Connect Streamlit ColabFold: AlphaFold2 using MMseqs2 to CombFold - Combinatorial Assembly using Collect. Set the Source to "capsule/results/*/pdb_files/*".

Collect passes all predicted structures downstream together for assembly.

  4. Connect "json" using Default to both CombFold Capsules.

  5. Connect the "ColabFold" Data Asset to Streamlit ColabFold: AlphaFold2 using MMseqs2 using Collect. Set "capsule/data/colabfold" as the Destination.

  6. [optional] Connect CombFold - Prepare Fasta to Results. Set "pipeline/results/pairs" as the Destination.

  7. [optional] Connect Streamlit ColabFold: AlphaFold2 using MMseqs2 to Results. Set "pipeline/results/ColabFold" as the Destination.

  8. Connect the CombFold - Combinatorial Assembly Capsule to Results. Set "pipeline/results/CombFold" as the Destination.

  9. To run the Pipeline, click Reproducible Run in the top right corner of the webpage.
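The difference between the Flatten and Collect connection types used in the steps above can be pictured with a small Python model. This is only a conceptual sketch, not Code Ocean's implementation. The file names are hypothetical, and using pathlib glob matching to stand in for the Source patterns is an assumption.

```python
from pathlib import PurePosixPath


def select(paths, pattern):
    """Return the paths that match a glob-style Source pattern."""
    return [p for p in paths if PurePosixPath(p).match(pattern)]


def flatten(paths, pattern):
    """Model of Flatten: one downstream job per matching item."""
    return [[p] for p in select(paths, pattern)]


def collect(paths, pattern):
    """Model of Collect: one downstream job that receives every matching item."""
    return [select(paths, pattern)]


# Hypothetical outputs of CombFold - Prepare Fasta.
prepare_fasta_outputs = [
    "capsule/results/fasta_pairs/A0_A0.fasta",
    "capsule/results/fasta_pairs/A0_G0.fasta",
    "capsule/results/fasta_pairs/G0_G0.fasta",
    "capsule/results/log.txt",  # not under fasta_pairs, so it is not passed on
]

# Hypothetical outputs of the parallel ColabFold jobs.
colabfold_outputs = [
    "capsule/results/A0_A0/pdb_files/A0_A0_rank_001.pdb",
    "capsule/results/A0_G0/pdb_files/A0_G0_rank_001.pdb",
    "capsule/results/G0_G0/pdb_files/G0_G0_rank_001.pdb",
]

colabfold_jobs = flatten(prepare_fasta_outputs, "capsule/results/fasta_pairs/*")
assembly_jobs = collect(colabfold_outputs, "capsule/results/*/pdb_files/*")

print(len(colabfold_jobs), "ColabFold jobs, each with one fasta file")
print(len(assembly_jobs), "assembly job with", len(assembly_jobs[0]), "PDB files")
```

In this model, Flatten yields three ColabFold jobs, one per fasta file, while Collect yields a single assembly job that sees all three PDB files at once.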

Viewing Outputs

The predicted protein structure can be viewed by opening the CombFold/make_figure.html file, or with the Mol* Viewer for PDB Files in the Apps Library.
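Before opening a viewer, it can be useful to check that an assembled PDB file contains the expected chains. The sketch below counts residues per chain from the fixed-width ATOM records of the PDB format (chain ID in column 22, residue number in columns 23-26, insertion code in column 27). The summarize_chains and atom_line helpers are hypothetical names introduced for this example, and the demo records are synthetic rather than real pipeline output.

```python
def summarize_chains(pdb_text):
    """Count distinct residues per chain from the ATOM records of PDB text."""
    residues = {}
    for line in pdb_text.splitlines():
        # Skip non-ATOM records and lines too short to hold a residue number.
        if not line.startswith("ATOM  ") or len(line) < 26:
            continue
        chain = line[21]       # column 22: chain identifier
        res_id = line[22:27]   # columns 23-27: residue number + insertion code
        residues.setdefault(chain, set()).add(res_id)
    return {chain: len(ids) for chain, ids in residues.items()}


def atom_line(serial, chain, res_seq, res_name="ALA"):
    """Build the first 26 columns of a synthetic ATOM record for the demo."""
    return "ATOM  %5d  CA  %3s %s%4d" % (serial, res_name, chain, res_seq)


# Synthetic input: two residues in chain A and one residue in chain B.
demo = "\n".join([
    atom_line(1, "A", 1),
    atom_line(2, "A", 2),
    atom_line(3, "B", 1),
])

print(summarize_chains(demo))
```

To apply this to a real result, read the text of an output PDB file and pass it to summarize_chains. The exact file name depends on the run.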