Long Read Single-Cell Analysis Tutorial

This tutorial walks through the automated, combined analysis of multiple long read single-cell datasets. The automated workflow applies defaults for GFF merging, reference transcript selection, protein translation and statistical filtering. Each of these steps can be customized further (alternative options, statistical thresholds) by calling the underlying Python functions separately.

Prerequisites

Requirements:

Python 3.11 or higher
AltAnalyze3 installed via pip (pip install altanalyze3)
10x Genomics long read matrices (.mtx) and associated GFF files

Sample Metadata: Ensure that your samples are properly annotated in a metadata file. A sample metadata file may look like this:

uid     gff                   matrix              library       reverse    groups
D001    /Diag1-1/D001.gff     /Diag1-1/sciso      D001-HSC      TRUE       Diagnosis
D001    /Diag1-2/D001.gff     /Diag1-2/sciso      D001-MPP      TRUE       Diagnosis
D002    /Diag2/D002.gff       /Diag2/sciso        D002-HSPC     TRUE       Diagnosis
D003    /Diag3/D003.gff       /Diag3/sciso        D003-HSPC     TRUE       Diagnosis
D004    /Relapse1/D004.gff    /Relapse1/sciso     D004-HSPC     TRUE       Relapse
D005    /Relapse2/D005.gff    /Relapse2/sciso     D005-HSPC     TRUE       Relapse
D006    /Relapse3/D006.gff    /Relapse3/sciso     D006-HSPC     FALSE      Relapse
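
A quick check of the table above can catch a missing column or a mistyped reverse value before the pipeline runs. The sketch below is not part of AltAnalyze3. It assumes the metadata file is tab-delimited with the header shown, and the helper name libraries_by_uid is hypothetical.

```python
import csv
import io
from collections import defaultdict

# Column names taken from the example table; the tab delimiter is an assumption.
REQUIRED_COLUMNS = ["uid", "gff", "matrix", "library", "reverse", "groups"]


def libraries_by_uid(handle):
    """Group library names by uid from a tab-delimited metadata table."""
    reader = csv.DictReader(handle, delimiter="\t")
    missing = [c for c in REQUIRED_COLUMNS if c not in (reader.fieldnames or [])]
    if missing:
        raise ValueError(f"metadata is missing columns: {missing}")
    grouped = defaultdict(list)
    for row in reader:
        # Only TRUE/FALSE (any case) are accepted in this sketch.
        if row["reverse"].strip().upper() not in {"TRUE", "FALSE"}:
            raise ValueError(
                f"unexpected reverse value for {row['library']}: {row['reverse']}"
            )
        grouped[row["uid"]].append(row["library"])
    return dict(grouped)


# Three rows copied from the example table, joined with tabs.
example = (
    "uid\tgff\tmatrix\tlibrary\treverse\tgroups\n"
    "D001\t/Diag1-1/D001.gff\t/Diag1-1/sciso\tD001-HSC\tTRUE\tDiagnosis\n"
    "D001\t/Diag1-2/D001.gff\t/Diag1-2/sciso\tD001-MPP\tTRUE\tDiagnosis\n"
    "D006\t/Relapse3/D006.gff\t/Relapse3/sciso\tD006-HSPC\tFALSE\tRelapse\n"
)
print(libraries_by_uid(io.StringIO(example)))
```

For a real file, pass an open file handle instead of the io.StringIO example, for instance open("/path/to/metadata.txt", newline="").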

Note: Multiple sequencing runs or libraries that share the same uid will be combined. If the cell barcodes in a library are reverse complemented relative to those in the barcode-to-cluster file (a two-column file), set reverse to TRUE for that library.
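
If it is unclear whether a library needs the reverse flag, one way to decide is to compare how many matrix barcodes match the cluster file as written versus after reverse complementing. This is a minimal sketch, not an AltAnalyze3 function. The names reverse_complement and needs_reverse are hypothetical, and the barcodes are assumed to be bare nucleotide strings with any suffix (such as a trailing "-1") already removed.

```python
# Translation table for complementing A/C/G/T (N is left as N).
_COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def reverse_complement(barcode):
    """Return the reverse complement of a nucleotide barcode."""
    return barcode.translate(_COMPLEMENT)[::-1]


def needs_reverse(matrix_barcodes, cluster_barcodes):
    """True when more matrix barcodes match the cluster file after
    reverse complementing than they do as written."""
    cluster_set = set(cluster_barcodes)
    forward_hits = sum(1 for b in matrix_barcodes if b in cluster_set)
    reverse_hits = sum(
        1 for b in matrix_barcodes if reverse_complement(b) in cluster_set
    )
    return reverse_hits > forward_hits


# Toy barcodes for illustration only.
matrix_barcodes = ["AACG", "TTGA"]
cluster_barcodes = ["CGTT", "TCAA"]
print(needs_reverse(matrix_barcodes, cluster_barcodes))
```

With real data, a clear majority of matches in one orientation is the signal to look for. A near-zero count in both orientations suggests a formatting mismatch rather than an orientation problem.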

Install Dependencies: Use the following commands to install AltAnalyze3, then download and unpack the Hs.zip database archive:

pip install altanalyze3

curl -O https://altanalyze.org/isoform/Hs.zip

unzip Hs.zip

Step-by-Step Preprocessing

Prepare Metadata and Cluster Files: You need metadata and barcode-cluster files for cluster-guided analyses. Extract database files from the Hs.zip file. Example:

/path/to/metadata.txt
/path/to/barcode_to_clusters.txt

/path/to/gencode.annotation.gff3
/path/to/Hs_Ensembl-annotations.txt
/path/to/Hs_Ensembl_exon.txt
/path/to/genome.fa
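
Before running the steps that follow, it can help to confirm that every input path resolves. The sketch below uses only the standard library. The helper name missing_paths is hypothetical, and the paths are the placeholders from this tutorial, to be replaced with your own.

```python
from pathlib import Path


def missing_paths(paths):
    """Return the subset of paths that do not exist on disk."""
    return [str(p) for p in paths if not Path(p).exists()]


# Placeholder paths from the tutorial; substitute your real locations.
required = [
    "/path/to/metadata.txt",
    "/path/to/barcode_to_clusters.txt",
    "/path/to/gencode.annotation.gff3",
    "/path/to/Hs_Ensembl-annotations.txt",
    "/path/to/Hs_Ensembl_exon.txt",
    "/path/to/genome.fa",
]

missing = missing_paths(required)
if missing:
    print("Missing input files:", *missing, sep="\n  ")
else:
    print("All input files found.")
```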

Run Preprocessing Script: In your Python environment or script, run:

import altanalyze3.components.long_read.isoform_matrix as iso
import altanalyze3.components.long_read.isoform_automate as isoa

metadata_file = "/path/to/metadata.txt"
ensembl_exon_dir = "/path/to/Hs_Ensembl_exon.txt"
barcode_cluster_dirs = ["/path/to/barcode_to_clusters.txt"]

sample_dict = isoa.import_metadata(metadata_file)
isoa.pre_process_samples(metadata_file, barcode_cluster_dirs, ensembl_exon_dir)

Note: The files referenced by ensembl_exon_dir, gene_symbol_file, genome_fasta and gencode_gff must be downloaded in advance from the Hs.zip archive above.

Combining Processed Samples: Once preprocessed, combine them using:

import altanalyze3.components.long_read.comparisons as comp
gencode_gff = "/path/to/gencode.annotation.gff3"
genome_fasta = "/path/to/genome.fa"

isoa.combine_processed_samples(
   metadata_file,
   barcode_cluster_dirs,
   ensembl_exon_dir,
   gencode_gff,
   genome_fasta
)

Compute and Annotate Differential Splicing Events and Isoforms: Once the samples are combined, compute the differentials using:

gene_symbol_file = "/path/to/Hs_Ensembl-annotations.txt"

# Import all cell clusters in order or replace with a list of select cluster(s)
cluster_order = iso.return_cluster_order(barcode_cluster_dirs)

# Differential analyses to perform
analyses = ['junction', 'isoform', 'isoform-ratio']

condition1 = 'Diagnosis'
condition2 = 'Relapse'
conditions = [(condition1, condition2)]

comp.compute_differentials(
   sample_dict,
   conditions,
   cluster_order,
   gene_symbol_file,
   analyses=analyses
)
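
When the metadata contains more than two groups, the conditions list can be built rather than typed by hand. The sketch below assumes each tuple is one pairwise comparison, as in the example above. The helper name pairwise_conditions is hypothetical, and whether the order within a tuple sets the direction of the comparison is not covered here, so check it against your own results.

```python
from itertools import combinations


def pairwise_conditions(groups):
    """Return every unordered pair of distinct group labels,
    keeping the order in which labels first appear."""
    unique = list(dict.fromkeys(groups))  # de-duplicate, preserve order
    return list(combinations(unique, 2))


# Group labels as they appear in the 'groups' column of the metadata.
groups = ["Diagnosis", "Diagnosis", "Diagnosis", "Relapse", "Relapse"]
conditions = pairwise_conditions(groups)
print(conditions)  # [('Diagnosis', 'Relapse')]
```

With three groups this yields three pairs, so the number of comparisons grows quickly. Listing only the comparisons of interest by hand may be preferable.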

Expected Outputs:

gff_output - Directory of isoform exon structure and isoform mappings
sample.h5ad - AnnData for each sample with consensus isoform or junction IDs
protein_sequences.fasta - protein sequence for consensus isoforms
protein_summary.txt - isoform NMD prediction
isoform_combined_pseudo_cluster_tpm.txt - Cluster-level pseudobulk TPMs
junction_combined_pseudo_cluster_counts.txt - Junction, intron & 3’ end counts
psi_combined_pseudo_cluster_counts.txt - PSI for junctions in >2 cluster pseudobulks
dPSI-events.txt - Pairwise group Mann-Whitney U differential PSI events
dPSI-cluster/covariate - Pairwise group Mann-Whitney U differential PSI events
diff-cluster/covariate-isoform - Pairwise group Mann-Whitney U differential isoform log2 TPM
diff-cluster/covariate-ratio - Pairwise group Mann-Whitney U differential isoform/gene ratios

Verify Output: Ensure that the processed outputs include files with differential splicing, isoform, and ratio data in the current working directory.
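
The check can be scripted. The sketch below searches the working directory and its subdirectories for some of the file names listed under Expected Outputs. The helper name report_outputs is hypothetical, and because the exact on-disk layout may differ from this list, a missing entry is a prompt to look more closely rather than proof of failure.

```python
from pathlib import Path

# File names taken from the Expected Outputs list above.
EXPECTED_FILES = [
    "protein_sequences.fasta",
    "protein_summary.txt",
    "isoform_combined_pseudo_cluster_tpm.txt",
    "junction_combined_pseudo_cluster_counts.txt",
    "psi_combined_pseudo_cluster_counts.txt",
    "dPSI-events.txt",
]


def report_outputs(workdir="."):
    """Map each expected file name to True if it is found anywhere
    under workdir (searched recursively), else False."""
    root = Path(workdir)
    return {name: any(root.rglob(name)) for name in EXPECTED_FILES}


for name, found in report_outputs(".").items():
    print(f"{'found  ' if found else 'MISSING'}  {name}")
```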

Next Steps

After these steps complete, you can inspect your results in a spreadsheet editor, perform secondary analyses or visualize the results. See the relevant tutorials for these steps.

Support

For issues, please refer to our GitHub repository: https://github.com/SalomonisLab/altanalyze3