mosaicmpi.integration.Integration

class mosaicmpi.integration.Integration(datasets: dict[str, mosaicmpi.dataset.Dataset], corr_method: str = 'pearson', max_median_corr: float = 0, negative_corr_quantile: float = 0.95, k_subset: Collection[int] | Dict[str, Collection[int]] = (2, 3, 4, 5, 6, 7, 8, 9, 10, 15, 20, 25, 30, 35, 40, 45, 50, 55, 60))

Integrate multiple datasets together.

Parameters:
  • datasets (dict[str, Dataset]) – Dictionary of name: Dataset pairs.

  • corr_method (str, optional) – Correlation method: “pearson”, “spearman”, or “kendall”. Defaults to “pearson”.

  • max_median_corr (float, optional) – Threshold for the rank-reduction procedure, relevant only for datasets where programs tend to be highly correlated. This procedure reduces the maximum rank included for a dataset until the median of the correlation distribution is below the threshold. Defaults to 0.

  • negative_corr_quantile (float, optional) – Threshold for network-based integration, between 0 and 1; values closer to 1 result in fewer edges in the network. Defaults to 0.95.

  • k_subset (Union[Collection[int], Dict[str, Collection[int]]], optional) – k-values to use for integration. Either a Collection of integers applied to every dataset, or a dict specifying k-values separately for each dataset. Defaults to (2, 3, 4, 5, 6, 7, 8, 9, 10, 15, 20, 25, 30, 35, 40, 45, 50, 55, 60).
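The two accepted shapes of k_subset can be illustrated with a small sketch. The helper normalize_k_subset and the dataset names below are hypothetical and are not part of mosaicmpi; the sketch only shows how a shared collection and a per-dataset dict could be brought to a common form.

```python
from collections.abc import Mapping


def normalize_k_subset(k_subset, dataset_names):
    """Hypothetical helper: expand k_subset into one sorted list of k per dataset."""
    if isinstance(k_subset, Mapping):
        # Per-dataset form: every dataset needs its own entry.
        missing = set(dataset_names) - set(k_subset)
        if missing:
            raise KeyError(f"k_subset has no entry for: {sorted(missing)}")
        return {name: sorted(set(k_subset[name])) for name in dataset_names}
    # Shared form: the same k-values apply to every dataset.
    return {name: sorted(set(k_subset)) for name in dataset_names}


names = ["cohort_a", "cohort_b"]  # made-up dataset names
shared = normalize_k_subset((2, 3, 5), names)
per_dataset = normalize_k_subset({"cohort_a": [2, 3], "cohort_b": [4, 5, 10]}, names)
```

The dict form is useful when datasets differ in size or complexity and were factorized over different ranks.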

Attributes

n_datasets

Get the number of datasets in the integration.

sample_to_patient

Returns a Series indexed by dataset and sample ID. Values are the patient from which each sample/observation was derived.

selected_k

Gets the values of k selected for integration.

Methods

compute_corr([method, cpus])

Computes correlation matrix of all programs in the integration from all datasets.
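As a rough illustration of what a program-by-program correlation matrix looks like, the sketch below uses pandas directly rather than mosaicmpi. The program labels and random values are made up, and the assumption that programs are stored as columns over shared features is for illustration only.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
features = [f"gene_{i}" for i in range(50)]

# Hypothetical labels: each column is one program, keyed by (dataset, k, program).
columns = pd.MultiIndex.from_tuples(
    [("cohort_a", 2, 0), ("cohort_a", 2, 1), ("cohort_b", 2, 0), ("cohort_b", 2, 1)],
    names=["dataset", "k", "program"],
)
programs = pd.DataFrame(rng.random((50, 4)), index=features, columns=columns)

# DataFrame.corr accepts "pearson", "spearman", or "kendall",
# the same three names listed for corr_method.
corr = programs.corr(method="spearman")
```

The result is a symmetric matrix with one row and one column per program, so within-dataset and cross-dataset correlations sit in the same table.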

compute_pairwise_thresholds([...])

Compute thresholds for each dataset and dataset pair based on the correlation distribution of programs.
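One plausible way to derive such a threshold from a correlation distribution is sketched below. It assumes the threshold is the negative_corr_quantile quantile of the absolute values of the negative correlations; this is an assumption made for illustration, not a description of the mosaicmpi implementation, and min_corr_threshold is a hypothetical helper.

```python
import numpy as np


def min_corr_threshold(corr_values, negative_corr_quantile=0.95):
    """Hypothetical helper: threshold taken from the negative tail of the distribution."""
    corr_values = np.asarray(corr_values, dtype=float)
    negative = corr_values[corr_values < 0]
    if negative.size == 0:
        return 0.0  # no negative correlations to calibrate against
    # A quantile closer to 1 gives a larger threshold, hence fewer edges.
    return float(np.quantile(np.abs(negative), negative_corr_quantile))


values = [-0.1, -0.2, -0.3, -0.4, 0.5, 0.9]
strict = min_corr_threshold(values, 1.0)
loose = min_corr_threshold(values, 0.5)
```

Under this reading, positive correlations only count as edges when they exceed what the negative tail suggests could arise by chance.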

filter_programs_rank_reduction([max_median_corr])

Filter programs using the rank-reduction procedure, relevant only for datasets where programs tend to be highly correlated.
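The rank-reduction idea can be sketched as follows, assuming the correlation distribution is the set of pairwise correlations among the programs kept for one dataset. The helper reduce_max_rank and the toy matrix are hypothetical; the library's exact distribution and comparison may differ.

```python
import numpy as np


def reduce_max_rank(corr, ranks, max_median_corr):
    """Hypothetical helper: largest rank whose retained programs have a
    median pairwise correlation below max_median_corr."""
    corr = np.asarray(corr, dtype=float)
    ranks = np.asarray(ranks)
    for max_k in sorted(set(ranks.tolist()), reverse=True):
        keep = ranks <= max_k
        sub = corr[np.ix_(keep, keep)]
        # Use each pair once: strictly lower triangle, diagonal excluded.
        pairwise = sub[np.tril_indices_from(sub, k=-1)]
        if pairwise.size and np.median(pairwise) < max_median_corr:
            return max_k
    return None  # no rank satisfies the threshold


# Toy example: two rank-2 programs are anti-correlated, while the three
# rank-3 programs are highly correlated with everything.
corr = np.full((5, 5), 0.8)
np.fill_diagonal(corr, 1.0)
corr[0, 1] = corr[1, 0] = -0.5
ranks = [2, 2, 3, 3, 3]
selected = reduce_max_rank(corr, ranks, max_median_corr=0.1)
```

In this toy case the rank-3 programs push the median up to 0.8, so the maximum rank is lowered to 2.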

get_category_overrepresentation(layer[, ...])

Calculate Pearson residuals of a chi-squared test, associating programs for each rank (k) with categories of samples/observations.
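For reference, the Pearson residual of a contingency-table cell is (observed − expected) / sqrt(expected), where expected is the count implied by independence of rows and columns. The sketch below computes it with NumPy on a made-up 2×2 table; how mosaicmpi builds the program-by-category table is not shown here and is assumed to involve assigning samples to programs.

```python
import numpy as np


def pearson_residuals(observed):
    """Pearson residuals of a contingency table of counts."""
    observed = np.asarray(observed, dtype=float)
    total = observed.sum()
    # Expected counts under independence: row total * column total / grand total.
    expected = np.outer(observed.sum(axis=1), observed.sum(axis=0)) / total
    return (observed - expected) / np.sqrt(expected)


# Made-up counts: rows are programs, columns are sample categories.
table = np.array([[30, 10], [10, 30]])
residuals = pearson_residuals(table)
```

Positive residuals mark program/category pairs that co-occur more often than expected, negative residuals less often; the squared residuals sum to the chi-squared statistic.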

get_corr_matrix_lowertriangle([...])

Get the lower triangular correlation matrix for building the correlation network.
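Because a correlation matrix is symmetric, the strictly lower triangle holds each program pair exactly once, which is what an edge list needs. The sketch below extracts it with NumPy and pandas; the labels and values are made up, and the output layout is an assumption rather than the library's return format.

```python
import numpy as np
import pandas as pd

labels = ["a|k2|p0", "a|k2|p1", "b|k2|p0"]  # hypothetical program labels
corr = pd.DataFrame(
    [[1.0, -0.4, 0.7], [-0.4, 1.0, 0.2], [0.7, 0.2, 1.0]],
    index=labels,
    columns=labels,
)

# k=-1 excludes the diagonal, so self-correlations are dropped.
rows, cols = np.tril_indices(len(corr), k=-1)
edges = pd.DataFrame(
    {
        "source": corr.index[rows],
        "target": corr.columns[cols],
        "corr": corr.to_numpy()[rows, cols],
    }
)
```

For n programs this yields n(n − 1)/2 candidate edges, which can then be filtered by a correlation threshold.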

get_features_overlap_table()

get_hvf_overlap_table()

get_metadata_correlation(layer[, ...])

Calculate correlation of program usage with numerical metadata across samples/observations.
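The underlying calculation can be illustrated with pandas alone. The usage table, sample IDs, and the age column below are made up, and the sketch does not call mosaicmpi.

```python
import numpy as np
import pandas as pd

samples = ["s1", "s2", "s3", "s4"]  # hypothetical sample IDs

# Rows are samples/observations, columns are programs.
usage = pd.DataFrame(
    {"program_1": [0.1, 0.4, 0.5, 0.9], "program_2": [0.9, 0.6, 0.5, 0.1]},
    index=samples,
)
age = pd.Series([30, 45, 55, 70], index=samples, name="age")

# Correlate each program's usage with the numerical metadata column.
usage_vs_age = usage.corrwith(age, method="pearson")
```

Here program_1 rises with age and program_2 falls, so the two correlations have opposite signs.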

get_metadata_df([include_categorical, ...])

Get sample/observation metadata for all datasets.

get_node_table()

Get node counts before and after various node and edge filters.

get_programs([type])

Get programs.

get_usages([discretize, normalize])

Calculate usage of each program in each dataset and sample/observation.

select_k_values([k_subset, ...])

Select k-values for integration.