Pipeline Overview
TCRsift processes single-cell TCR data through a series of steps, each building on the previous.
Pipeline Steps
┌─────────────────┐
│  Sample Sheet   │
└────────┬────────┘
         │
         ▼
┌─────────────────┐
│    Load Data    │ ← CellRanger VDJ + GEX
└────────┬────────┘
         │
         ▼
┌─────────────────┐
│    Phenotype    │ ← CD4/CD8 classification
└────────┬────────┘
         │
         ▼
┌─────────────────┐
│    Clonotype    │ ← Aggregate by CDR3
└────────┬────────┘
         │
         ▼
┌─────────────────┐
│     Filter      │ ← Tiered selection
└────────┬────────┘
         │
         ▼
┌─────────────────┐
│    Annotate     │ ← VDJdb, IEDB, CEDAR
└────────┬────────┘
         │
         ▼
┌─────────────────┐
│    Match TIL    │ ← Optional TIL matching
└────────┬────────┘
         │
         ▼
┌─────────────────┐
│    Assemble     │ ← Full-length sequences
└─────────────────┘
1. Load Data
Parses CellRanger VDJ and GEX outputs:
- VDJ: filtered_contig_annotations.csv for TCR sequences
- GEX: filtered_feature_bc_matrix/ for gene expression
The output is an AnnData object with:
- Cell barcodes as observations
- Gene expression in X
- VDJ annotations in obs columns
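To illustrate the shape of this step, here is a minimal standard-library sketch that collapses contig rows into one record per cell barcode, which is roughly the kind of per-cell annotation that ends up in obs columns. It is not TCRsift's loader. The column names barcode, chain, cdr3, productive and umis follow the CellRanger contig annotation format; the helper name, the toy rows (which carry only a subset of the real columns), and the rule of keeping the highest-UMI productive contig per chain are assumptions made for illustration.

```python
import csv
import io


def contigs_to_cell_records(csv_text):
    """Collapse contig rows into one dict per barcode (hypothetical helper).

    For each chain (TRA/TRB), keeps the productive contig with the most UMIs.
    This selection rule is an assumption, not TCRsift's documented behavior.
    """
    best = {}  # (barcode, chain) -> (umis, cdr3)
    for row in csv.DictReader(io.StringIO(csv_text)):
        if row["productive"].strip().lower() != "true":
            continue  # skip non-productive contigs
        if row["chain"] not in ("TRA", "TRB"):
            continue  # ignore other chain labels
        key = (row["barcode"], row["chain"])
        umis = int(row["umis"])
        if key not in best or umis > best[key][0]:
            best[key] = (umis, row["cdr3"])

    cells = {}
    for (barcode, chain), (_, cdr3) in best.items():
        record = cells.setdefault(barcode, {"TRA_cdr3": None, "TRB_cdr3": None})
        record[f"{chain}_cdr3"] = cdr3
    return cells


# Toy rows for illustration only, not real data.
example = """barcode,chain,cdr3,productive,umis
AAAC-1,TRA,CAVRDSNYQLIW,true,4
AAAC-1,TRB,CASSLGQAYEQYF,true,9
AAAC-1,TRB,CASSPTGGYEQYF,true,2
AAAG-1,TRB,CASSIRSSYEQYF,false,5
"""
cells = contigs_to_cell_records(example)
```

In this toy input the second barcode has only a non-productive contig, so it produces no record.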
2. Phenotype Cells
Classifies cells as CD4+ or CD8+ based on gene expression.
Classification categories:
| Category | Criteria |
|---|---|
| Confident CD8+ | CD8/CD4 ratio > threshold |
| Confident CD4+ | CD4/CD8 ratio > threshold |
| Likely CD8+ | CD8 > 0 and CD4 = 0 |
| Likely CD4+ | CD4 > 0 and CD8 = 0 |
| Unknown | Similar expression levels |
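The table can be read as a small decision rule. The following Python sketch is one possible reading, not TCRsift's implementation: the function name and the default threshold of 3.0 are hypothetical, and it assumes the ratio rows apply only when both markers are detected (so the ratio is defined), with the "Likely" rows covering cells where one marker is absent.

```python
def classify_tcell(cd4, cd8, threshold=3.0):
    """Assign a phenotype label from CD4 and CD8 expression values.

    `threshold` is a hypothetical default, not a TCRsift setting.
    """
    if cd4 > 0 and cd8 > 0:
        if cd8 / cd4 > threshold:
            return "Confident CD8+"
        if cd4 / cd8 > threshold:
            return "Confident CD4+"
        return "Unknown"  # both expressed at similar levels
    if cd8 > 0:
        return "Likely CD8+"  # CD8 detected, CD4 absent
    if cd4 > 0:
        return "Likely CD4+"  # CD4 detected, CD8 absent
    return "Unknown"  # neither marker detected


labels = [classify_tcell(cd4, cd8) for cd4, cd8 in [(1, 10), (10, 1), (0, 5), (5, 0), (2, 3)]]
```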
3. Aggregate Clonotypes
Groups cells by CDR3 sequences.
Grouping options:
- CDR3ab: Match by both alpha and beta chains (stricter)
- CDR3b_only: Match by beta chain only (more permissive)
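The difference between the two grouping options comes down to the key used to bucket cells. A minimal Python sketch follows, assuming each cell is a (barcode, alpha CDR3, beta CDR3) tuple; the helper name and the choice to skip cells that lack a required chain are assumptions, not documented TCRsift behavior.

```python
from collections import defaultdict


def group_clonotypes(cells, mode="CDR3ab"):
    """Group cell barcodes by CDR3 key (hypothetical helper).

    cells: iterable of (barcode, cdr3_alpha, cdr3_beta) tuples.
    """
    if mode not in ("CDR3ab", "CDR3b_only"):
        raise ValueError(f"unknown mode: {mode}")
    groups = defaultdict(list)
    for barcode, cdr3a, cdr3b in cells:
        if cdr3b is None:
            continue  # both modes need a beta chain (assumption)
        if mode == "CDR3ab":
            if cdr3a is None:
                continue  # paired mode also needs an alpha chain (assumption)
            key = (cdr3a, cdr3b)
        else:
            key = (cdr3b,)
        groups[key].append(barcode)
    return dict(groups)


# Toy sequences for illustration only.
toy_cells = [("c1", "CAVA", "CASSB"), ("c2", "CAVX", "CASSB"), ("c3", "CAVA", "CASSB")]
paired = group_clonotypes(toy_cells, mode="CDR3ab")
beta_only = group_clonotypes(toy_cells, mode="CDR3b_only")
```

With these toy cells, the paired key yields two clonotypes while the beta-only key merges all three cells into one, which is why the latter is the more permissive option.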
4. Filter Clonotypes
Applies tiered filtering to prioritize antigen-specific clones.
See Filtering Strategies for detailed options.
5. Annotate Clonotypes
Matches against public TCR databases:
tcrsift annotate -i filtered/tier1.csv -o annotated.csv \
--vdjdb /path/to/vdjdb \
--iedb /path/to/iedb
Annotations include:
- Known epitope specificity
- Viral vs tumor antigens
- Database source
6. Match TIL (Optional)
For tumor studies, match culture clonotypes against tumor-infiltrating lymphocyte (TIL) clonotypes:
tcrsift match-til -i annotated.csv --til-csv til_clonotypes.csv -o matched.csv
# Multiple TIL samples without a sample sheet:
tcrsift match-til -i annotated.csv -o matched.csv \
--til-sample T1=csv:/path/to/til_t1.csv \
--til-sample T2=h5ad:/path/to/til_t2.h5ad
This identifies clones that:
- Were expanded in culture AND present in tumor
- Are TIL-specific (not in culture)
TIL samples are excluded from culture aggregation/filtering and only used for matching.
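Conceptually, the two categories above are a set intersection and a set difference over clonotype keys. The Python sketch below shows that idea only; the helper name and the use of plain CDR3 strings as keys are assumptions, and the real command may match on richer keys.

```python
def match_til(culture_clonotypes, til_clonotypes):
    """Split clonotype keys into shared and TIL-only sets (hypothetical helper)."""
    culture = set(culture_clonotypes)
    til = set(til_clonotypes)
    return {
        "culture_and_til": culture & til,  # expanded in culture and present in tumor
        "til_only": til - culture,         # TIL-specific, absent from culture
    }


# Toy keys for illustration only.
result = match_til(["CASSA", "CASSB"], ["CASSB", "CASSC"])
```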
For TIL-only analysis (no culture input), use:
tcrsift til-clonotype -o til_clonotypes.csv \
--til-sample T1=csv:/path/to/til_t1.csv \
--til-sample T2=h5ad:/path/to/til_t2.h5ad
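The --til-sample values in these commands follow a NAME=FORMAT:PATH shape. As a rough illustration of how such a value can be split, here is a small Python sketch; the function name is hypothetical, and TCRsift's own parsing and validation may differ.

```python
def parse_til_sample(spec):
    """Split 'NAME=FORMAT:PATH' into its three parts (hypothetical helper)."""
    name, sep, rest = spec.partition("=")
    if not sep or not name:
        raise ValueError(f"expected NAME=FORMAT:PATH, got {spec!r}")
    # Split on the first colon only, so the path itself may contain colons.
    fmt, sep, path = rest.partition(":")
    if not sep or not fmt or not path:
        raise ValueError(f"expected NAME=FORMAT:PATH, got {spec!r}")
    return name, fmt, path


parsed = [
    parse_til_sample("T1=csv:/path/to/til_t1.csv"),
    parse_til_sample("T2=h5ad:/path/to/til_t2.h5ad"),
]
```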
For TIL-only 10x VDJ+GEX timepoint prioritization (CD8 + enrichment + immunogenic masks), use the til-select command.
til-select runs in v2-compatible CSV mode by default, so given the same data and options
it reproduces the CSV outputs of the legacy harmonize_abtcr_timepoints.py script. Figures and PDFs may
still differ at the byte level.
Expected per-timepoint files in --data-dir:
- consensus_annotations.<TP>.csv
- clonotypes.<TP>.csv
- filtered_contig_annotations.<TP>.csv
- sample_filtered_feature_bc_matrix.<TP>.h5
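Before running, it can help to check that every expected file is present for a timepoint. The Python sketch below does this with pathlib, assuming <TP> stands for the timepoint label; the helper name is hypothetical and this check is not part of TCRsift itself.

```python
import tempfile
from pathlib import Path

# File name patterns taken from the list above; {tp} replaces <TP>.
EXPECTED_PATTERNS = (
    "consensus_annotations.{tp}.csv",
    "clonotypes.{tp}.csv",
    "filtered_contig_annotations.{tp}.csv",
    "sample_filtered_feature_bc_matrix.{tp}.h5",
)


def missing_timepoint_files(data_dir, timepoint):
    """Return the expected file names that are absent from data_dir."""
    root = Path(data_dir)
    names = [pattern.format(tp=timepoint) for pattern in EXPECTED_PATTERNS]
    return [name for name in names if not (root / name).is_file()]


# Demo in a temporary directory containing only one of the four files.
with tempfile.TemporaryDirectory() as tmp:
    (Path(tmp) / "clonotypes.T1.csv").touch()
    missing = missing_timepoint_files(tmp, "T1")
```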
7. Assemble Full Sequences
Build full-length TCR sequences:
tcrsift assemble -i annotated.csv -o sequences.csv \
--include-constant \
--linker T2A \
--fasta sequences.fasta
Output includes:
- Leader peptide (from contigs)
- Variable region (VDJ)
- Constant region (from Ensembl)
- Single-chain construct (beta-T2A-alpha)
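The layout of the output can be pictured as simple string concatenation. The Python sketch below assumes protein sequences and uses a commonly cited T2A peptide as the default linker; whether TCRsift uses this exact linker sequence (or a variant with a spacer) is an assumption, and both function names are hypothetical.

```python
# Commonly cited T2A peptide; assumed here, not confirmed as TCRsift's linker.
T2A = "EGRGSLLTCGDVEENPGP"


def chain_protein(leader, variable, constant):
    """Concatenate leader, variable (VDJ) and constant regions of one chain."""
    return leader + variable + constant


def single_chain_construct(beta, alpha, linker=T2A):
    """Join two full-length chains in beta-linker-alpha order."""
    return beta + linker + alpha


# Placeholder segments, not real sequences.
beta = chain_protein("LB", "VB", "CB")
alpha = chain_protein("LA", "VA", "CA")
construct = single_chain_construct(beta, alpha)
```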
Running the Complete Pipeline
Use tcrsift run to execute all steps:
tcrsift run \
--sample-sheet samples.yaml \
--output-dir results/ \
--vdjdb /path/to/vdjdb \
--tcell-type cd8 \
--method threshold
# Report generation is enabled by default; use --no-report to disable it.
This creates a complete output directory with all intermediate files and a summary report.