Long Read Single-Cell Analysis Tutorial
This tutorial guides you through the automated, combined analysis of multiple long-read single-cell datasets. The automated workflow applies defaults for GFF merging, reference transcript selection, protein translation, and statistical filtering. Each step can be customized further (alternative options, statistical thresholds) by calling the underlying Python functions separately.
Prerequisites
Requirements:
Python 3.11 or higher
AltAnalyze3 installed via pip (pip install altanalyze3)
10x Genomics long read matrices (.mtx) and associated GFF files
Sample Metadata: Ensure that your samples are properly annotated in a metadata file. A sample metadata file may look like this:
uid   gff                 matrix           library    reverse  groups
D001  /Diag1-1/D001.gff   /Diag1-1/sciso   D001-HSC   TRUE     Diagnosis
D001  /Diag1-2/D001.gff   /Diag1-2/sciso   D001-MPP   TRUE     Diagnosis
D002  /Diag2/D002.gff     /Diag2/sciso     D002-HSPC  TRUE     Diagnosis
D003  /Diag3/D003.gff     /Diag3/sciso     D003-HSPC  TRUE     Diagnosis
D004  /Relapse1/D004.gff  /Relapse1/sciso  D004-HSPC  TRUE     Relapse
D005  /Relapse2/D005.gff  /Relapse2/sciso  D005-HSPC  TRUE     Relapse
D006  /Relapse3/D006.gff  /Relapse3/sciso  D006-HSPC  FALSE    Relapse
Note: Rows that share a uid (multiple sequencing runs or libraries, such as the two D001 rows above) are combined. If the cell barcodes in a matrix are reverse complemented relative to the barcodes in the barcode-to-cluster file (a two-column file), set reverse to TRUE.
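Before running the pipeline, it can help to sanity-check the metadata file. The sketch below assumes the file is tab-delimited with the header shown above; load_metadata and reverse_complement are hypothetical helpers written for illustration and are not part of AltAnalyze3. The second helper only illustrates what "reverse complemented" means for a barcode sequence.

```python
import csv
import io
from collections import defaultdict

REQUIRED_COLUMNS = ["uid", "gff", "matrix", "library", "reverse", "groups"]
_COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def reverse_complement(barcode):
    """Return the reverse complement of a DNA barcode (illustration only)."""
    return barcode.translate(_COMPLEMENT)[::-1]


def load_metadata(handle):
    """Group metadata rows by uid (hypothetical helper, not an AltAnalyze3 function)."""
    reader = csv.DictReader(handle, delimiter="\t")  # assumption: tab-delimited file
    missing = [c for c in REQUIRED_COLUMNS if c not in (reader.fieldnames or [])]
    if missing:
        raise ValueError(f"Missing metadata columns: {missing}")
    samples = defaultdict(list)
    for row in reader:
        flag = row["reverse"].strip().upper()
        if flag not in ("TRUE", "FALSE"):
            raise ValueError(f"reverse must be TRUE or FALSE, got {row['reverse']!r}")
        row["reverse"] = flag == "TRUE"
        samples[row["uid"]].append(row)
    return dict(samples)


# Three rows from the example table; the two D001 libraries share one uid
example = (
    "uid\tgff\tmatrix\tlibrary\treverse\tgroups\n"
    "D001\t/Diag1-1/D001.gff\t/Diag1-1/sciso\tD001-HSC\tTRUE\tDiagnosis\n"
    "D001\t/Diag1-2/D001.gff\t/Diag1-2/sciso\tD001-MPP\tTRUE\tDiagnosis\n"
    "D006\t/Relapse3/D006.gff\t/Relapse3/sciso\tD006-HSPC\tFALSE\tRelapse\n"
)
samples = load_metadata(io.StringIO(example))
for uid, rows in samples.items():
    print(uid, [r["library"] for r in rows], rows[0]["groups"])
```

To check a real file, pass an open file handle instead of the in-memory example, for instance load_metadata(open("/path/to/metadata.txt", newline="")).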
Install Dependencies: Use the following commands to install AltAnalyze3, then download and unzip the Hs database archive:
pip install altanalyze3
curl -O https://altanalyze.org/isoform/Hs.zip
unzip Hs.zip
Step-by-Step Preprocessing
Prepare Metadata and Cluster Files: Cluster-guided analyses require the sample metadata file and one or more barcode-to-cluster files. The remaining database files are extracted from Hs.zip. Example paths:
/path/to/metadata.txt
/path/to/barcode_to_clusters.txt
/path/to/gencode.annotation.gff3
/path/to/Hs_Ensembl-annotations.txt
/path/to/Hs_Ensembl_exon.txt
/path/to/genome.fa
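A missing input path is easier to diagnose before the pipeline starts than partway through a run. The following is a minimal sketch using only the standard library; missing_files is a hypothetical helper, and the paths are the placeholders listed above, to be replaced with your own locations.

```python
from pathlib import Path


def missing_files(paths):
    """Return the paths that do not point to an existing file."""
    return [p for p in paths if not Path(p).is_file()]


# Placeholder paths from this tutorial; replace with your own locations
required = [
    "/path/to/metadata.txt",
    "/path/to/barcode_to_clusters.txt",
    "/path/to/gencode.annotation.gff3",
    "/path/to/Hs_Ensembl-annotations.txt",
    "/path/to/Hs_Ensembl_exon.txt",
    "/path/to/genome.fa",
]

missing = missing_files(required)
if missing:
    print("Missing input files:")
    for path in missing:
        print("  " + path)
else:
    print("All input files found.")
```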
Run Preprocessing Script: In your Python environment or script, run:
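# Import the long-read isoform modules
import altanalyze3.components.long_read.isoform_matrix as iso
import altanalyze3.components.long_read.isoform_automate as isoa
# Input paths (replace with your own locations)
metadata_file = "/path/to/metadata.txt"
ensembl_exon_dir = "/path/to/Hs_Ensembl_exon.txt"
barcode_cluster_dirs = ["/path/to/barcode_to_clusters.txt"]
# Load the sample metadata; sample_dict is reused in the differential analysis step
sample_dict = isoa.import_metadata(metadata_file)
# Preprocess every sample listed in the metadata file
isoa.pre_process_samples(metadata_file, barcode_cluster_dirs, ensembl_exon_dir)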
Note: ensembl_exon_dir, gene_symbol_file, genome_fasta and gencode_gff must point to files extracted from the Hs.zip archive downloaded above.
Combining Processed Samples: Once the samples are preprocessed, combine them using:
# comp is used in the differential analysis step below
import altanalyze3.components.long_read.comparisons as comp
gencode_gff = "/path/to/gencode.annotation.gff3"
genome_fasta = "/path/to/genome.fa"
# Combine the preprocessed samples
isoa.combine_processed_samples(
    metadata_file,
    barcode_cluster_dirs,
    ensembl_exon_dir,
    gencode_gff,
    genome_fasta
)
Compute and Annotate Differential Splicing Events and Isoforms: Once the samples are combined, compute the differential analyses between conditions using:
gene_symbol_file = "/path/to/Hs_Ensembl-annotations.txt"
# Import all cell clusters in order or replace with a list of select cluster(s)
cluster_order = iso.return_cluster_order(barcode_cluster_dirs)
# Differential analyses to perform
analyses = ['junction', 'isoform', 'isoform-ratio']
condition1 = 'Diagnosis'
condition2 = 'Relapse'
conditions = [(condition1, condition2)]
comp.compute_differentials(
    sample_dict,
    conditions,
    cluster_order,
    gene_symbol_file,
    analyses=analyses
)
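The conditions variable is a list of (condition1, condition2) tuples. If your metadata contains more than two groups, one way to enumerate every pairwise comparison is itertools.combinations, as sketched below. Two assumptions are made here: "Remission" is a hypothetical third group that does not appear in the example metadata, and compute_differentials is assumed (not confirmed by this tutorial) to accept a list containing several tuples.

```python
from itertools import combinations

# "Remission" is a hypothetical third group used only for illustration
groups = ["Diagnosis", "Relapse", "Remission"]

# Every unordered pair of groups, in the order the groups are listed
conditions = list(combinations(groups, 2))
print(conditions)
# [('Diagnosis', 'Relapse'), ('Diagnosis', 'Remission'), ('Relapse', 'Remission')]
```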
Expected Outputs:
gff_output - Directory of isoform exon structure and isoform mappings
sample.h5ad - AnnData for each sample with consensus isoform or junction IDs
protein_sequences.fasta - Protein sequences for consensus isoforms
protein_summary.txt - Isoform NMD prediction
isoform_combined_pseudo_cluster_tpm.txt - Cluster-level pseudobulk TPMs
junction_combined_pseudo_cluster_counts.txt - Junction, intron & 3’ end counts
psi_combined_pseudo_cluster_counts.txt - PSI for junctions in >2 cluster pseudobulks
dPSI-events.txt - Pairwise group Mann-Whitney U differential PSI events
dPSI-cluster/covariate - Pairwise group Mann-Whitney U differential PSI events
diff-cluster/covariate-isoform - Pairwise group Mann-Whitney U differential isoform log2 TPM
diff-cluster/covariate-ratio - Pairwise group Mann-Whitney U differential isoform/gene ratios
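The outputs above refer to PSI (percent spliced in), dPSI, and the Mann-Whitney U test. The sketch below illustrates the generic textbook definitions of these quantities with made-up numbers; it is not necessarily how AltAnalyze3 computes them internally, and the function names, read counts, and PSI values are all hypothetical.

```python
from statistics import mean


def psi(inclusion_reads, exclusion_reads):
    """Fraction of reads supporting inclusion; None when there is no coverage."""
    total = inclusion_reads + exclusion_reads
    if total == 0:
        return None
    return inclusion_reads / total


def mann_whitney_u(x, y):
    """U statistic for x: number of (x, y) pairs with x > y, ties counted as 0.5."""
    return sum(1.0 if a > b else 0.5 if a == b else 0.0 for a in x for b in y)


# Hypothetical per-sample PSI values for one junction in one cluster
diagnosis_psi = [0.82, 0.79, 0.91]
relapse_psi = [0.35, 0.42, 0.30]

# dPSI as relapse minus diagnosis (the direction is an arbitrary choice here)
dpsi = mean(relapse_psi) - mean(diagnosis_psi)
u_stat = mann_whitney_u(diagnosis_psi, relapse_psi)

print("PSI for 30 inclusion and 10 exclusion reads:", psi(30, 10))
print("dPSI:", round(dpsi, 3))
print("U statistic:", u_stat)
```

A U statistic equal to len(x) * len(y) means every value in the first group exceeds every value in the second. For p-values, scipy.stats.mannwhitneyu is a common choice if SciPy is installed.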
Verify Output: Ensure that the processed outputs include files with differential splicing, isoform, and ratio data in the current working directory.
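One way to carry out this check is to test for the named output files programmatically. The sketch below assumes the files are written directly to the working directory under the exact names listed in Expected Outputs; the actual layout may differ, so adjust the directory and names to match your run. report_outputs is a hypothetical helper, not an AltAnalyze3 function.

```python
from pathlib import Path

# File names taken from the Expected Outputs list; entries that are
# directories or name patterns are left out of this simple check
EXPECTED = [
    "protein_sequences.fasta",
    "protein_summary.txt",
    "isoform_combined_pseudo_cluster_tpm.txt",
    "junction_combined_pseudo_cluster_counts.txt",
    "psi_combined_pseudo_cluster_counts.txt",
    "dPSI-events.txt",
]


def report_outputs(directory="."):
    """Map each expected file name to whether it exists in the directory."""
    base = Path(directory)
    return {name: (base / name).exists() for name in EXPECTED}


status = report_outputs(".")
for name, found in status.items():
    print(("found    " if found else "MISSING  ") + name)
```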
Next Steps
After the analysis completes, you can inspect your results in a spreadsheet editor, perform secondary analyses, or visualize the results. See the relevant tutorials for these steps.
Support
For issues, please refer to our GitHub repository: https://github.com/SalomonisLab/altanalyze3