mosaicmpi.dataset.Dataset.select_hvf

Dataset.select_hvf(stratify_by: str | None = None, stratify_mode: Literal['intersection', 'union'] = 'union', use_normalized=True, max_missingness: float = 0.0, max_cells_proportion: float = 1.0, min_cells_proportion: float = 0.0, min_cells_mean: float = 0.0, min_cells_mean_quantile: float = 0.0, min_features: int = 0, min_raw_sum: float = 0.0, n_splines: int = 5, spline_order: int = 3, score_type: Literal['vscore', 'odscore'] = 'odscore', min_score: float | None = None, top_n: int | None = None, top_quantile: float | None = None, alpha: float | None = None, adjust_pvals: bool = True, feature_list: Collection[str] = None, multiple_threshold_mode: Literal['intersection', 'union'] = 'intersection')

Select highly variable features (HVFs) for cNMF factorization.

Parameters:
  • stratify_by (str, optional) – model gene-variance relationship separately for each class of samples/cells based on the provided metadata field (for example, Sample ID for single-cell datasets), defaults to None

  • stratify_mode (Literal["intersection", "union"]) – select the union or intersection of gene lists identified from dataset strata, defaults to “union”

  • use_normalized (bool) – model mean and variance of the normalized data (rather than the raw/count data), if it exists, defaults to True

  • max_missingness (float, optional) – For datasets imputed using mosaicMPI, exclude features with greater than this proportion of imputed values, defaults to 0.0

  • max_cells_proportion (float, optional) – Exclude features with greater than this proportion of positive values, defaults to 1.0

  • min_cells_proportion (float, optional) – Exclude features with less than this proportion of positive values, defaults to 0.0

  • min_cells_mean (float, optional) – Exclude features with a mean less than this value, defaults to 0.0

  • min_cells_mean_quantile (float, optional) – Exclude features whose mean is below this quantile of the feature means, defaults to 0.0

  • min_features (int, optional) – Exclude samples/cells with fewer than this number of positive features, defaults to 0

  • min_raw_sum (float, optional) – Exclude samples/cells with a summed signal less than this threshold, defaults to 0.0

  • n_splines (int, optional) – Number of splines to use for fitting the Linear GAM, must be greater than spline_order, defaults to 5

  • spline_order (int, optional) – spline order (constant = 0, linear = 1, quadratic = 2, and cubic = 3), defaults to 3

  • score_type (Literal["vscore", "odscore"], optional) – Type of score for calculating overdispersion, defaults to “odscore”

  • min_score (Optional[float], optional) – Minimum score threshold for feature selection, defaults to None

  • top_n (Optional[int], optional) – Number of features to select after ranking features by score, defaults to None

  • top_quantile (Optional[float], optional) – Proportion of top features to select after ranking features by score, defaults to None

  • alpha (Optional[float], optional) – Alpha (p-value) threshold for selection of HVFs, defaults to None

  • adjust_pvals (bool, optional) – Adjust p-values using the Benjamini-Hochberg procedure, defaults to True

  • feature_list (Collection[str], optional) – Select features using a custom list of features, defaults to None

  • multiple_threshold_mode (Literal["intersection", "union"]) – how to combine multiple thresholds, using either “union” or “intersection”, defaults to “intersection”
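
The interaction between the score-based criteria (min_score, top_n, top_quantile) and multiple_threshold_mode can be illustrated with a small standalone sketch. This is not mosaicMPI's implementation: the helper select_by_thresholds, the example scores, the inclusive min_score comparison, and the rounding used for top_quantile are all assumptions made for illustration.

```python
from typing import Literal, Optional


def select_by_thresholds(
    scores: dict[str, float],
    min_score: Optional[float] = None,
    top_n: Optional[int] = None,
    top_quantile: Optional[float] = None,
    mode: Literal["intersection", "union"] = "intersection",
) -> set[str]:
    """Hypothetical helper: combine score-based criteria into one feature set."""
    # Rank features from highest to lowest score.
    ranked = sorted(scores, key=scores.get, reverse=True)

    # Each criterion that is set yields its own candidate set.
    selections = []
    if min_score is not None:
        # Assumption: the threshold is inclusive.
        selections.append({f for f in ranked if scores[f] >= min_score})
    if top_n is not None:
        selections.append(set(ranked[:top_n]))
    if top_quantile is not None:
        # Assumption: the top fraction is rounded to the nearest feature count.
        k = round(len(ranked) * top_quantile)
        selections.append(set(ranked[:k]))

    if not selections:
        raise ValueError("No HVF selection criteria have been selected.")

    # "intersection" keeps features passing every criterion,
    # "union" keeps features passing at least one.
    if mode == "intersection":
        return set.intersection(*selections)
    return set.union(*selections)


# Made-up scores for four features.
scores = {"geneA": 3.2, "geneB": 1.1, "geneC": 0.4, "geneD": 2.5}
strict = select_by_thresholds(scores, min_score=1.0, top_n=2, mode="intersection")
loose = select_by_thresholds(scores, min_score=1.0, top_n=2, mode="union")
```

With these scores, the intersection keeps only the features that satisfy both criteria (geneA and geneD), while the union also admits geneB, which passes min_score but is not among the top two.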

Raises:
  • ValueError – No HVF selection criteria have been selected.

  • ValueError – The number of modelled features is less than twice the number of splines when computing the odscore
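
The two error conditions above, together with the documented requirement that n_splines be greater than spline_order, can be checked before running a long selection. The following is a minimal sketch of such a pre-flight check. The function check_hvf_arguments is hypothetical and not part of mosaicMPI; it assumes that "no selection criteria" means every criterion is None, and raising ValueError for the n_splines constraint is a choice made here rather than documented behaviour.

```python
from typing import Any, Mapping


def check_hvf_arguments(
    n_modelled_features: int,
    criteria: Mapping[str, Any],
    n_splines: int = 5,
    spline_order: int = 3,
    score_type: str = "odscore",
) -> None:
    """Hypothetical pre-flight check mirroring the documented constraints."""
    # Assumption: a criterion counts as "selected" when its value is not None.
    if all(value is None for value in criteria.values()):
        raise ValueError("No HVF selection criteria have been selected.")

    # Documented in the parameter list: n_splines must exceed spline_order.
    if n_splines <= spline_order:
        raise ValueError("n_splines must be greater than spline_order.")

    # Documented for the odscore: at least twice as many features as splines.
    if score_type == "odscore" and n_modelled_features < 2 * n_splines:
        raise ValueError(
            "The number of modelled features is less than twice "
            "the number of splines."
        )


# Passes silently: 2000 features, one criterion set, default spline settings.
check_hvf_arguments(2000, {"top_n": 2000, "alpha": None})

# Fails: 8 modelled features is fewer than 2 * 5 splines.
try:
    check_hvf_arguments(8, {"alpha": 0.05})
    message = ""
except ValueError as err:
    message = str(err)
```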

Returns:

Return type: