Integrates single-cell RNA-seq data directly from SingleCellExperiment or Seurat objects. Supports detection of highly variable genes, scaling, PCA, neighbor graph construction, clustering, and UMAP embedding, with multiple integration methods.
Usage
DO.Integration(
sce_object,
split_key = "orig.ident",
HVG = FALSE,
scale = FALSE,
pca = FALSE,
neighbors = TRUE,
neighbors_dim = seq_len(50),
clusters = TRUE,
clusters_res = 0.3,
clusters_algorithm = 4,
umap = TRUE,
umap_key = "UMAP",
umap_dim = seq_len(50),
integration_method = "CCAIntegration",
selection_method = "vst",
loess_span = 0.3,
clip_max = "auto",
num_bin = 20,
binning_method = "equal_width",
scale_max = 10,
pca_key = "PCA",
integration_key = "INTEGRATED.CCA",
npcs = 50,
verbose = FALSE
)
Arguments
- sce_object
Seurat or SingleCellExperiment (SCE) object
- split_key
Character. Metadata column used to split the samples; default "orig.ident"
- HVG
Logical. Perform detection of highly variable genes
- scale
Logical. Perform scaling of the expression data
- pca
Logical. Perform principal component analysis
- neighbors
Logical. Construct the nearest-neighbor graph after integration
- neighbors_dim
Numeric range. Dimensions of reduction to use as input
- clusters
Logical. Perform clustering of cells
- clusters_res
Numeric. Value of the resolution parameter. Use a value above 1.0 to obtain a larger number of communities, or below 1.0 to obtain a smaller number.
- clusters_algorithm
Numeric. Algorithm used for clustering; default 4 ("Leiden")
- umap
Logical. Run Uniform Manifold Approximation and Projection (UMAP)
- umap_key
Character. Key name to save the UMAP result in
- umap_dim
Numeric range. Which dimensions to use as input features
- integration_method
Character. Integration method to use; see the Seurat::IntegrateLayers function for the supported methods
- selection_method
Character. Method for selecting variable genes. Default "vst". Other options: "mean.var.plot", "dispersion"
- loess_span
Numeric. Loess span parameter used when fitting the variance-mean relationship
- clip_max
Character. After standardization, values larger than clip_max are set to clip_max; the default, "auto", sets this value to the square root of the number of cells
- num_bin
Numeric. Total number of bins to use in the scaled analysis (default is 20)
- binning_method
Character. Options: “equal_width” (default): each bin is of equal width along the x-axis; “equal_frequency”: each bin contains an equal number of features
- scale_max
Numeric. Max value to return for scaled data. The default is 10.
- pca_key
Character. Key name to save the PCA result in
- integration_key
Character. Key name to save the integration result in
- npcs
Numeric. Total number of PCs to compute and store (50 by default)
- verbose
Logical. Verbosity for all functions
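To make two of the variable-gene options more concrete, here is a minimal base-R sketch of what the two binning_method choices and clip_max = "auto" mean. It is not the DOtools or Seurat implementation. The simulated gene_means vector and the n_cells value are assumptions made up for illustration, and the "equal_frequency" behaviour shown (bins holding roughly equal numbers of genes) follows the corresponding option of Seurat::FindVariableFeatures.

```r
# Illustrative only: hypothetical data, not the package's internal code.
set.seed(1)
gene_means <- rexp(1000, rate = 1)  # assumed per-gene mean expression values
num_bin <- 20

# "equal_width": breakpoints evenly spaced over the range of the means
width_bins <- cut(gene_means, breaks = num_bin)

# "equal_frequency": breakpoints at quantiles, so bins hold similar gene counts
freq_breaks <- quantile(gene_means, probs = seq(0, 1, length.out = num_bin + 1))
freq_bins <- cut(gene_means, breaks = freq_breaks, include.lowest = TRUE)

# Compare the smallest and largest bin sizes under each scheme
range(table(width_bins))
range(table(freq_bins))

# clip_max = "auto": the square root of the number of cells
n_cells <- 2807  # assumed cell count for illustration
clip_max <- sqrt(n_cells)
clip_max
```

With skewed expression values, equal-width bins at the high end contain very few genes, while equal-frequency bins trade resolution along the x-axis for evenly populated bins.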
Examples
sce_data <-
readRDS(system.file("extdata", "sce_data.rds", package = "DOtools"))
DO.Integration(
sce_object = sce_data,
split_key = "orig.ident",
HVG = TRUE,
scale = TRUE,
pca = TRUE,
integration_method = "CCAIntegration"
)
#> 2025-10-02 13:45:01 - Splitting object for integration with CCAIntegration by orig.ident
#> 2025-10-02 13:45:01 - Calculating highly variable genes
#> 2025-10-02 13:45:01 - Scaling object
#> 2025-10-02 13:45:01 - Running pca, saved in key: PCA
#> Splitting ‘counts’, ‘data’ layers. Not splitting ‘scale.data’. If you would like to split other layers, set in `layers` argument.
#> 2025-10-02 13:45:02 - Running integration, saved in key: INTEGRATED.CCA
#> 2025-10-02 13:45:06 - Running Nearest-neighbor graph construction
#> 2025-10-02 13:45:07 - Running cluster detection
#> 2025-10-02 13:45:08 - Creating UMAP
#> class: SingleCellExperiment
#> dim: 800 2807
#> metadata(0):
#> assays(3): counts logcounts scaledata
#> rownames(800): HES4 ISG15 ... SERPINA9 DSG2
#> rowData names(0):
#> colnames(2807): AAACCCAGTGCATTTG-1_1 AAACCCATCTCAACGA-1_1 ...
#> TTTGGAGCAACTGGTT-1_2 TTTGGAGGTTACCTGA-1_2
#> colData names(15): orig.ident nCount_RNA ... annotation_recluster
#> leiden0.3
#> reducedDimNames(3): PCA INTEGRATED.CCA UMAP
#> mainExpName: RNA
#> altExpNames(0):
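
The printed object indicates that, for SingleCellExperiment input, the result comes back as a SingleCellExperiment carrying the new reductions and a leiden0.3 cluster column. Below is a short sketch of how one might capture and inspect it, assuming the DOtools and SingleCellExperiment packages are installed. The variable name sce_int is hypothetical, and the accessors are the standard SingleCellExperiment ones rather than anything specific to DOtools.

```r
library(DOtools)
library(SingleCellExperiment)

sce_data <- readRDS(system.file("extdata", "sce_data.rds", package = "DOtools"))

# Keep the returned object instead of only printing it
sce_int <- DO.Integration(
  sce_object = sce_data,
  split_key = "orig.ident",
  HVG = TRUE,
  scale = TRUE,
  pca = TRUE,
  integration_method = "CCAIntegration"
)

# Reductions stored under pca_key, integration_key, and umap_key
reducedDimNames(sce_int)

# First rows of the UMAP embedding
head(reducedDim(sce_int, "UMAP"))

# Cells per cluster; the column name is taken from the printed output above
table(colData(sce_int)[["leiden0.3"]])
```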