mosaicmpi.dataset.Dataset.select_hvf
- Dataset.select_hvf(stratify_by: str | None = None, stratify_mode: Literal['intersection', 'union'] = 'union', use_normalized=True, max_missingness: float = 0.0, max_cells_proportion: float = 1.0, min_cells_proportion: float = 0.0, min_cells_mean: float = 0.0, min_cells_mean_quantile: float = 0.0, min_features: int = 0, min_raw_sum: float = 0.0, n_splines: int = 5, spline_order: int = 3, score_type: Literal['vscore', 'odscore'] = 'odscore', min_score: float | None = None, top_n: int | None = None, top_quantile: float | None = None, alpha: float | None = None, adjust_pvals: bool = True, feature_list: Collection[str] = None, multiple_threshold_mode: Literal['intersection', 'union'] = 'intersection')
Select highly variable features (HVFs) for cNMF factorization.
- Parameters:
stratify_by (str, optional) – Model the mean-variance relationship separately for each class of samples/cells defined by the provided metadata field. For example, you could stratify by sample ID in single-cell datasets, defaults to None
stratify_mode (Literal["intersection", "union"], optional) – Select the union or intersection of the feature lists identified in each stratum, defaults to “union”
use_normalized (bool, optional) – Model the mean and variance of the normalized data (if it exists) rather than the raw/count data, defaults to True
max_missingness (float, optional) – For datasets imputed using mosaicMPI, exclude features with greater than this proportion of imputed values, defaults to 0.0
max_cells_proportion (float, optional) – Exclude features with greater than this proportion of positive values, defaults to 1.0
min_cells_proportion (float, optional) – Exclude features with less than this proportion of positive values, defaults to 0.0
min_cells_mean (float, optional) – Exclude features with a mean less than this value, defaults to 0.0
min_cells_mean_quantile (float, optional) – Exclude features whose mean falls below this quantile of all feature means, defaults to 0.0
min_features (int, optional) – Exclude samples/cells with fewer than this number of positive features, defaults to 0
min_raw_sum (float, optional) – Exclude samples/cells with a summed signal less than this threshold, defaults to 0.0
n_splines (int, optional) – Number of splines to use for fitting the Linear GAM, must be greater than spline_order, defaults to 5
spline_order (int, optional) – Spline order (constant = 0, linear = 1, quadratic = 2, cubic = 3), defaults to 3
score_type (Literal["vscore", "odscore"], optional) – Type of score for calculating overdispersion, defaults to “odscore”
min_score (Optional[float], optional) – Minimum score threshold for feature selection, defaults to None
top_n (Optional[int], optional) – Number of features to select after ranking features by score, defaults to None
top_quantile (Optional[float], optional) – Proportion of top features to select after ranking features by score, defaults to None
alpha (Optional[float], optional) – Alpha (p-value) threshold for selection of HVFs, defaults to None
adjust_pvals (bool, optional) – Adjust p-values using the Benjamini-Hochberg procedure, defaults to True
feature_list (Collection[str], optional) – Select features using a custom list of features, defaults to None
multiple_threshold_mode (Literal["intersection", "union"], optional) – How to combine multiple selection thresholds, using either “union” or “intersection”, defaults to “intersection”
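The feature and sample/cell filters listed above can be pictured with the minimal NumPy sketch below. It is not mosaicMPI's implementation. The helper name `filter_matrix`, the cells × features orientation of `X`, the order of filtering (cells first, then features), and the inclusive comparisons are all assumptions made for illustration. `max_missingness` is omitted because it depends on mosaicMPI's imputation metadata.

```python
import numpy as np


def filter_matrix(
    X: np.ndarray,
    max_cells_proportion: float = 1.0,
    min_cells_proportion: float = 0.0,
    min_cells_mean: float = 0.0,
    min_cells_mean_quantile: float = 0.0,
    min_features: int = 0,
    min_raw_sum: float = 0.0,
) -> tuple[np.ndarray, np.ndarray]:
    """Hypothetical helper: boolean masks (cells_kept, features_kept).

    Assumes X is cells x features; filter order and inclusive bounds are assumptions.
    """
    # Cell-level filters: count of positive features and summed signal per row.
    n_positive = (X > 0).sum(axis=1)
    row_sums = X.sum(axis=1)
    cells_kept = (n_positive >= min_features) & (row_sums >= min_raw_sum)

    # Feature-level filters, computed on the retained cells only (assumption).
    Xc = X[cells_kept]
    prop_positive = (Xc > 0).mean(axis=0)
    means = Xc.mean(axis=0)
    mean_cutoff = np.quantile(means, min_cells_mean_quantile)
    features_kept = (
        (prop_positive <= max_cells_proportion)
        & (prop_positive >= min_cells_proportion)
        & (means >= min_cells_mean)
        & (means >= mean_cutoff)
    )
    return cells_kept, features_kept
```

With the defaults every comparison passes, so nothing is removed; each parameter only tightens one mask.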
- Raises:
ValueError – No HVF selection criteria have been selected.
ValueError – The number of modelled features is less than twice the number of splines (raised when computing the odscore).
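To illustrate the kind of mean-variance modelling that overdispersion scores rely on, the sketch below fits log10 variance as a smooth function of log10 mean and scores each feature by its residual. It deliberately swaps in an ordinary polynomial fit (`numpy.polyfit` of degree `spline_order`) for the Linear GAM, so `n_splines` only feeds the guard that mirrors the second ValueError. The residual score is not mosaicMPI's vscore or odscore formula, and the function name `overdispersion_score` is hypothetical.

```python
import numpy as np


def overdispersion_score(
    X: np.ndarray, n_splines: int = 5, spline_order: int = 3
) -> np.ndarray:
    """Hypothetical residual-based overdispersion score for a cells x features matrix.

    A degree-`spline_order` polynomial stands in for the Linear GAM;
    this is not mosaicMPI's vscore or odscore formula.
    """
    n_features = X.shape[1]
    # Mirrors the documented guard on the number of modelled features.
    if n_features < 2 * n_splines:
        raise ValueError(
            "The number of modelled features is less than twice the number of splines"
        )
    means = X.mean(axis=0)
    variances = X.var(axis=0, ddof=1)
    # Logs need strictly positive values; other features are left as NaN.
    keep = (means > 0) & (variances > 0)
    log_m = np.log10(means[keep])
    log_v = np.log10(variances[keep])
    # Fit expected log variance as a smooth function of log mean.
    coefs = np.polyfit(log_m, log_v, deg=spline_order)
    expected = np.polyval(coefs, log_m)
    scores = np.full(n_features, np.nan)
    # Positive residual: more variance than expected at that mean.
    scores[keep] = log_v - expected
    return scores
```

For Poisson-like features the variance tracks the mean, so their residuals sit near zero, while a feature with extra variance at the same mean stands out with a large positive score.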
- Returns:
- Return type:
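The interaction between `min_score`, `top_n`, `top_quantile`, `alpha`, `adjust_pvals`, and `multiple_threshold_mode` can be sketched as set operations over feature names. This is an illustrative assumption, not mosaicMPI's code: the helper names `bh_adjust` and `combine_thresholds`, rounding `top_quantile` up to a whole number of features, the strict `p < alpha` comparison, and passing p-values in directly are all hypothetical. `feature_list` and stratification are left out. The empty-criteria check mirrors the first ValueError listed under Raises.

```python
import numpy as np


def bh_adjust(pvals) -> np.ndarray:
    """Benjamini-Hochberg adjusted p-values (hypothetical helper)."""
    p = np.asarray(pvals, dtype=float)
    n = p.size
    order = np.argsort(p)
    # Scale each sorted p-value by n / rank.
    ranked = p[order] * n / np.arange(1, n + 1)
    # Enforce monotonicity: running minimum from the largest p-value down.
    ranked = np.minimum.accumulate(ranked[::-1])[::-1]
    adjusted = np.empty(n)
    adjusted[order] = np.minimum(ranked, 1.0)
    return adjusted


def combine_thresholds(
    names,
    scores,
    pvals=None,
    min_score=None,
    top_n=None,
    top_quantile=None,
    alpha=None,
    adjust_pvals=True,
    multiple_threshold_mode="intersection",
) -> set:
    """Hypothetical sketch of combining HVF selection thresholds."""
    names = np.asarray(names)
    scores = np.asarray(scores, dtype=float)
    order = np.argsort(-scores)  # feature indices ranked by descending score
    selections = []
    if min_score is not None:
        selections.append(set(names[scores >= min_score].tolist()))
    if top_n is not None:
        selections.append(set(names[order[:top_n]].tolist()))
    if top_quantile is not None:
        # Assumption: round the number of selected features up.
        k = int(np.ceil(top_quantile * names.size))
        selections.append(set(names[order[:k]].tolist()))
    if alpha is not None:
        if pvals is None:
            raise ValueError("alpha requires p-values")
        p = bh_adjust(pvals) if adjust_pvals else np.asarray(pvals, dtype=float)
        selections.append(set(names[p < alpha].tolist()))
    if not selections:
        raise ValueError("No HVF selection criteria have been selected.")
    if multiple_threshold_mode == "intersection":
        return set.intersection(*selections)
    return set.union(*selections)
```

Under this reading, “intersection” keeps only features that pass every requested threshold, while “union” keeps features that pass any of them. `stratify_mode` offers a similar union/intersection choice, applied across the feature lists from each stratum.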