Long Read Single-Cell Analysis Tutorial

This tutorial guides you through the automated, combined analysis of multiple long-read single-cell datasets. The automated workflow applies defaults for GFF merging, reference transcript selection, protein translation, and statistical filtering. Each of these steps can be customized further (alternative options, statistical thresholds) by calling the underlying Python functions separately.

Prerequisites

Requirements:

  • Python 3.11 or higher

  • AltAnalyze3 installed via pip (pip install altanalyze3)

  • 10x Genomics long read matrices (.mtx) and associated GFF files

Sample Metadata: Ensure that your samples are properly annotated in a metadata file. A sample metadata file may look like this:

uid     gff                   matrix              library       reverse    groups
D001    /Diag1-1/D001.gff     /Diag1-1/sciso      D001-HSC      TRUE       Diagnosis
D001    /Diag1-2/D001.gff     /Diag1-2/sciso      D001-MPP      TRUE       Diagnosis
D002    /Diag2/D002.gff       /Diag2/sciso        D002-HSPC     TRUE       Diagnosis
D003    /Diag3/D003.gff       /Diag3/sciso        D003-HSPC     TRUE       Diagnosis
D004    /Relapse1/D004.gff    /Relapse1/sciso     D004-HSPC     TRUE       Relapse
D005    /Relapse2/D005.gff    /Relapse2/sciso     D005-HSPC     TRUE       Relapse
D006    /Relapse3/D006.gff    /Relapse3/sciso     D006-HSPC     FALSE      Relapse
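
A quick sanity check of the metadata table can catch formatting problems before any long-running step starts. The sketch below is not part of AltAnalyze3: check_metadata is a hypothetical helper, and it assumes the file is tab-delimited with the six column names shown in the example table.

```python
import csv
import io

# Column names taken from the example table; the tab delimiter is an assumption.
REQUIRED_COLUMNS = ["uid", "gff", "matrix", "library", "reverse", "groups"]

def check_metadata(handle):
    """Return a list of problems found in a tab-delimited metadata table."""
    reader = csv.DictReader(handle, delimiter="\t")
    header = reader.fieldnames or []
    missing = [col for col in REQUIRED_COLUMNS if col not in header]
    if missing:
        return ["missing columns: " + ", ".join(missing)]

    problems = []
    # The header is line 1, so the first data row is line 2.
    for line_no, row in enumerate(reader, start=2):
        for col in REQUIRED_COLUMNS:
            if not (row.get(col) or "").strip():
                problems.append(f"line {line_no}: empty value for {col}")
        reverse = (row.get("reverse") or "").strip().upper()
        if reverse and reverse not in {"TRUE", "FALSE"}:
            problems.append(f"line {line_no}: reverse must be TRUE or FALSE")
    return problems

# Small in-memory example; replace with open("/path/to/metadata.txt") for a real file.
example = (
    "uid\tgff\tmatrix\tlibrary\treverse\tgroups\n"
    "D001\t/Diag1-1/D001.gff\t/Diag1-1/sciso\tD001-HSC\tTRUE\tDiagnosis\n"
    "D006\t/Relapse3/D006.gff\t/Relapse3/sciso\tD006-HSPC\tmaybe\tRelapse\n"
)
problems = check_metadata(io.StringIO(example))
print(problems)
```

An empty list means the table passed these basic checks; it does not confirm that the GFF and matrix paths exist.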

Note: Multiple sequencing runs or libraries that share the same uid are combined into one sample. If the cell barcodes in a matrix are the reverse complement of the barcodes in the barcode-to-cluster file (a two-column file), set reverse to TRUE.
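
If you are unsure whether a library needs reverse set to TRUE, you can compare a handful of barcodes from the matrix against the barcode-to-cluster file in both orientations. The following is a minimal sketch with hypothetical helper names (reverse_complement, suggest_reverse); it assumes barcodes are plain nucleotide strings with any suffix (such as "-1") already stripped.

```python
# Translation table for complementing bases; N is left unchanged.
COMPLEMENT = str.maketrans("ACGTN", "TGCAN")

def reverse_complement(barcode):
    """Complement each base, then reverse the string."""
    return barcode.translate(COMPLEMENT)[::-1]

def suggest_reverse(matrix_barcodes, cluster_barcodes):
    """Return True when more matrix barcodes match after reverse complementing."""
    cluster_set = set(cluster_barcodes)
    direct = sum(bc in cluster_set for bc in matrix_barcodes)
    flipped = sum(reverse_complement(bc) in cluster_set for bc in matrix_barcodes)
    return flipped > direct

# Toy barcodes for illustration only (real 10x barcodes are longer).
matrix_barcodes = ["AAACCG", "GGATCA"]
cluster_barcodes = ["CGGTTT", "TGATCC"]
needs_reverse = suggest_reverse(matrix_barcodes, cluster_barcodes)
print(needs_reverse)
```

Here both matrix barcodes only match after reverse complementing, so the sketch suggests TRUE for the reverse column.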

Install Dependencies: Use the following commands to install AltAnalyze3, then download and extract the Hs database archive:

pip install altanalyze3

curl -O https://altanalyze.org/isoform/Hs.zip

unzip Hs.zip

Step-by-Step Preprocessing

Prepare Metadata and Cluster Files: You need metadata and barcode-cluster files for cluster-guided analyses. Extract database files from the Hs.zip file. Example:

/path/to/metadata.txt
/path/to/barcode_to_clusters.txt

/path/to/gencode.annotation.gff3
/path/to/Hs_Ensembl-annotations.txt
/path/to/Hs_Ensembl_exon.txt
/path/to/genome.fa
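
Because the later steps take these paths as plain strings, it can help to confirm that every file exists before starting. This is a small sketch using only the standard library; missing_inputs is a hypothetical helper, and the dictionary keys simply mirror the variable names used in this tutorial.

```python
from pathlib import Path

def missing_inputs(paths):
    """Return the names of entries whose path does not exist on disk."""
    return [name for name, location in paths.items() if not Path(location).exists()]

# Placeholder paths from the tutorial; replace them with your own locations.
inputs = {
    "metadata_file": "/path/to/metadata.txt",
    "barcode_cluster_file": "/path/to/barcode_to_clusters.txt",
    "gencode_gff": "/path/to/gencode.annotation.gff3",
    "gene_symbol_file": "/path/to/Hs_Ensembl-annotations.txt",
    "ensembl_exon_dir": "/path/to/Hs_Ensembl_exon.txt",
    "genome_fasta": "/path/to/genome.fa",
}

missing = missing_inputs(inputs)
if missing:
    print("Missing input files:", ", ".join(missing))
else:
    print("All input files found.")
```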

Run Preprocessing Script: In your Python environment or script, run:

import altanalyze3.components.long_read.isoform_matrix as iso
import altanalyze3.components.long_read.isoform_automate as isoa

metadata_file = "/path/to/metadata.txt"
ensembl_exon_dir = "/path/to/Hs_Ensembl_exon.txt"
barcode_cluster_dirs = ["/path/to/barcode_to_clusters.txt"]

sample_dict = isoa.import_metadata(metadata_file)
isoa.pre_process_samples(metadata_file, barcode_cluster_dirs, ensembl_exon_dir)

Note: The files referenced by ensembl_exon_dir, gene_symbol_file, genome_fasta, and gencode_gff must already be downloaded and extracted from the Hs.zip archive above.

Combining Processed Samples: Once preprocessed, combine them using:

import altanalyze3.components.long_read.comparisons as comp
gencode_gff = "/path/to/gencode.annotation.gff3"
genome_fasta = "/path/to/genome.fa"

isoa.combine_processed_samples(
   metadata_file,
   barcode_cluster_dirs,
   ensembl_exon_dir,
   gencode_gff,
   genome_fasta
)

Compute and Annotate Differential Splicing Events and Isoforms: Once the samples are combined, compute the differentials using:

gene_symbol_file = "/path/to/Hs_Ensembl-annotations.txt"

# Import all cell clusters in order or replace with a list of select cluster(s)
cluster_order = iso.return_cluster_order(barcode_cluster_dirs)

# Differential analyses to perform
analyses = ['junction', 'isoform', 'isoform-ratio']

condition1 = 'Diagnosis'
condition2 = 'Relapse'
conditions = [(condition1, condition2)]

comp.compute_differentials(
   sample_dict,
   conditions,
   cluster_order,
   gene_symbol_file,
   analyses=analyses
)
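
The conditions variable above is a list of group pairs. When the groups column of the metadata holds more than two groups, the pairs can be generated rather than typed by hand. The sketch below is an illustration only: condition_pairs is a hypothetical helper, and whether compute_differentials accepts several pairs in one call is an assumption this tutorial does not confirm.

```python
from itertools import combinations

def condition_pairs(groups):
    """Return every pair of distinct groups, keeping first-appearance order."""
    ordered = list(dict.fromkeys(groups))  # de-duplicate while preserving order
    return list(combinations(ordered, 2))

# Values of the groups column from the example metadata table.
groups = ["Diagnosis", "Diagnosis", "Diagnosis", "Diagnosis",
          "Relapse", "Relapse", "Relapse"]
conditions = condition_pairs(groups)
print(conditions)  # [('Diagnosis', 'Relapse')]
```

With the example metadata this reproduces the single Diagnosis versus Relapse comparison defined above.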

Expected Outputs:

  • gff_output - Directory of isoform exon structure and isoform mappings

  • sample.h5ad - AnnData for each sample with consensus isoform or junction IDs

  • protein_sequences.fasta - Protein sequences for consensus isoforms

  • protein_summary.txt - Isoform NMD prediction

  • isoform_combined_pseudo_cluster_tpm.txt - Cluster-level pseudobulk TPMs

  • junction_combined_pseudo_cluster_counts.txt - Junction, intron & 3’ end counts

  • psi_combined_pseudo_cluster_counts.txt - PSI for junctions in >2 cluster pseudobulks

  • dPSI-events.txt - Pairwise group Mann-Whitney U differential PSI events

  • dPSI-cluster/covariate - Pairwise group Mann-Whitney U differential PSI events

  • diff-cluster/covariate-isoform - Pairwise group Mann-Whitney U differential isoform log2 TPM

  • diff-cluster/covariate-ratio - Pairwise group Mann-Whitney U differential isoform/gene ratios
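
For readers new to the PSI and dPSI outputs, the toy example below shows the general idea: PSI (percent spliced in) is the fraction of reads supporting one junction out of all reads for that junction and its competing alternatives, and dPSI is the difference in PSI between two groups. This is a simplified illustration with made-up counts; the exact definition, sign convention, and filtering used by AltAnalyze3 may differ.

```python
from statistics import mean

def psi(junction_count, competing_count):
    """Fraction of reads supporting the junction; None when there is no coverage."""
    total = junction_count + competing_count
    return None if total == 0 else junction_count / total

# Made-up (junction, competing) read counts for three pseudobulks per group.
diagnosis = [psi(30, 10), psi(45, 15), psi(20, 20)]
relapse = [psi(10, 30), psi(5, 15), psi(12, 36)]

# Assumed convention for this sketch: second group minus first group.
dpsi = mean(relapse) - mean(diagnosis)
print(round(dpsi, 3))
```

A rank-based test such as Mann-Whitney U would then compare the two lists of PSI values to decide whether the difference is statistically supported.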

Verify Output: Ensure that the processed outputs include files with differential splicing, isoform, and ratio data in the current working directory.
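
One way to verify the outputs is to check for the named files from the list above. The sketch below uses only the standard library; report_outputs is a hypothetical helper, and the assumption that all of these files sit directly in one directory may not hold for every AltAnalyze3 version.

```python
from pathlib import Path

# File names taken from the Expected Outputs list.
EXPECTED_FILES = [
    "protein_sequences.fasta",
    "protein_summary.txt",
    "isoform_combined_pseudo_cluster_tpm.txt",
    "junction_combined_pseudo_cluster_counts.txt",
    "psi_combined_pseudo_cluster_counts.txt",
    "dPSI-events.txt",
]

def report_outputs(directory="."):
    """Map each expected file name to True if it exists and is non-empty."""
    status = {}
    for name in EXPECTED_FILES:
        path = Path(directory) / name
        status[name] = path.is_file() and path.stat().st_size > 0
    return status

for name, ok in report_outputs().items():
    print(("found  " if ok else "MISSING"), name)
```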

Next Steps

After these steps, you are ready to inspect your results in a spreadsheet editor, perform secondary analyses, or visualize results. See the relevant tutorials for these steps.

Support

For issues, please refer to our GitHub repository: https://github.com/SalomonisLab/altanalyze3