This document covers dimensionality reduction, visualization, and clustering of the Blood2K dataset using scAND. More documentation on imputation and batch effect correction can be found at

Load data

The input to scAND consists of three parts:

  1. A pandas.Series() containing the cell barcodes.
  2. A pandas.Series() containing the peak regions.
  3. A pandas.DataFrame() with three columns: Peaks, Cells, and Counts (peak and cell indices start at 1). Each row of the DataFrame represents one entry of the scATAC-seq count matrix.
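
As an illustration of the three-part input described above, the following is a minimal sketch that converts a small dense matrix into that layout. The helper name matrix_to_scand_inputs, the peaks-by-cells orientation of the matrix, the choice to list only nonzero entries, and the toy peak and barcode names are all assumptions made for this example; they are not part of scAND.

```python
import numpy as np
import pandas as pd


def matrix_to_scand_inputs(mat, peak_names, cell_barcodes):
    """Hypothetical helper: turn a peaks x cells matrix into (Count_df, cells, peaks)."""
    mat = np.asarray(mat)
    # Assumption: only nonzero entries are listed, one per row of the DataFrame.
    rows, cols = np.nonzero(mat)
    count_df = pd.DataFrame({
        "Peaks": rows + 1,          # peak indices start at 1
        "Cells": cols + 1,          # cell indices start at 1
        "Counts": mat[rows, cols],
    })
    cells = pd.Series(cell_barcodes)
    peaks = pd.Series(peak_names)
    return count_df, cells, peaks


# Toy data: 4 peaks x 3 cells (made-up values and names).
toy = np.array([[0, 2, 0],
                [1, 0, 0],
                [0, 0, 3],
                [1, 1, 0]])
peak_names = ["chr1:100-600", "chr1:900-1400", "chr2:50-550", "chr2:700-1200"]
cell_barcodes = ["AAACGAA-1", "AAAGTCC-1", "AACTGGT-1"]

Count_df, cells, peaks = matrix_to_scand_inputs(toy, peak_names, cell_barcodes)
print(Count_df)
```

If the data are stored as cells by peaks instead, transpose the matrix before calling the helper.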

Estimate the value of beta

We use a leave-out imputation strategy to select the parameter β. Specifically, we randomly set 10% of the entries (by default) to 0 and calculate the L2 distance between the true data and the data imputed by scAND.

For large-scale datasets, we recommend using peaks from a single chromosome.
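
The leave-out idea can be sketched in a few lines of NumPy. This is only an illustration under stated assumptions: the masked entries are drawn from the nonzero entries, the distance is taken over the whole matrix, and the imputer is a hypothetical placeholder (a blend of each entry with its row mean), not scAND's imputation. The names leave_out_error and make_placeholder_imputer are invented for this example.

```python
import numpy as np


def leave_out_error(X, impute, frac=0.1, seed=0):
    """Zero out a fraction of nonzero entries, impute, and return the L2 distance."""
    rng = np.random.default_rng(seed)
    rows, cols = np.nonzero(X)
    n_mask = max(1, int(round(frac * rows.size)))
    picked = rng.choice(rows.size, size=n_mask, replace=False)

    X_masked = X.copy()
    X_masked[rows[picked], cols[picked]] = 0   # the left-out entries

    X_hat = impute(X_masked)
    # L2 (Frobenius) distance between the true data and the imputed data.
    return float(np.linalg.norm(X - X_hat))


def make_placeholder_imputer(beta):
    """Hypothetical stand-in for scAND: blend each entry with its row mean."""
    def impute(X):
        row_mean = X.mean(axis=1, keepdims=True)
        return (1.0 - beta) * X + beta * row_mean
    return impute


# Toy binary matrix standing in for scATAC-seq data (made-up values).
rng = np.random.default_rng(1)
X = (rng.random((50, 30)) < 0.2).astype(float)

betas = [0.0, 0.2, 0.4, 0.6, 0.8]   # illustrative grid, not a recommendation
errors = {b: leave_out_error(X, make_placeholder_imputer(b)) for b in betas}
best_beta = min(errors, key=errors.get)
print(errors, best_beta)
```

To apply the same procedure to real data, replace the placeholder imputer with a call that returns the scAND-imputed matrix for a given β. Restricting X to the peaks of one chromosome keeps the search cheap on large datasets.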

Run scAND

Run_scAND(Count_df, d, weights, cells, peaks, random_seed=2019, L2_norm=True, Binary=True, Graph_norm=True, return_peaks=False, verbose=True)

The parameters of the Run_scAND() function:

  1. Count_df: A pandas.DataFrame() with three columns: Peaks, Cells, and Counts (peak and cell indices start at 1). Each row of the DataFrame represents one entry of the scATAC-seq count matrix.
  2. d: The number of dimensions of the low-dimensional representation produced by scAND.
  3. weights: The parameter beta in the scAND model. The input should be a list() or a np.array(). scAND can compute results for several values of beta simultaneously at very little additional computational cost.
  4. cells: A pandas.Series() containing the cell barcodes.
  5. peaks: A pandas.Series() containing the peak regions.
  6. random_seed: The random seed.
  7. Binary: Logical. Should binarization be applied to the scATAC-seq matrix?
  8. Graph_norm: Logical. Should graph normalization be applied to the adjacency matrix of the network?
  9. return_peaks: Logical. Should the scAND representation of the peaks be returned?
  10. return_Norm_factor: Logical. Should the norm factor from the graph normalization step be returned?
  11. verbose: Logical. Should progress messages be printed while the function runs?
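
To make the parameter list concrete, the sketch below assembles the arguments for a call that matches the signature shown above. The toy data, the value of d, and the grid of beta values are assumptions chosen for illustration only. How Run_scAND is imported depends on your installation, so the call itself is left as a comment.

```python
import numpy as np
import pandas as pd

# Toy inputs (made-up barcodes, regions, and counts).
cells = pd.Series(["AAACGAA-1", "AAAGTCC-1", "AACTGGT-1"])
peaks = pd.Series(["chr1:100-600", "chr1:900-1400"])
Count_df = pd.DataFrame({
    "Peaks": [1, 1, 2],    # 1-based peak indices
    "Cells": [1, 3, 2],    # 1-based cell indices
    "Counts": [2, 1, 4],
})

scand_kwargs = dict(
    Count_df=Count_df,
    d=2,                                          # illustrative value only
    weights=np.array([0.0, 0.2, 0.4, 0.6, 0.8]),  # illustrative beta grid
    cells=cells,
    peaks=peaks,
    random_seed=2019,
    L2_norm=True,
    Binary=True,
    Graph_norm=True,
    return_peaks=False,
    verbose=True,
)

# Sanity checks: every index must point at an existing peak or cell.
assert Count_df["Peaks"].between(1, len(peaks)).all()
assert Count_df["Cells"].between(1, len(cells)).all()

# With Run_scAND available in your session, the call would then be:
# result = Run_scAND(**scand_kwargs)
```

Passing several beta values in weights lets one run cover the whole grid, which is how the parameter description above suggests comparing settings.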

Visualization and Clustering