mosaicmpi.integration.Integration
- class mosaicmpi.integration.Integration(datasets: dict[str, mosaicmpi.dataset.Dataset], corr_method: str = 'pearson', max_median_corr: float = 0, negative_corr_quantile: float = 0.95, k_subset: Collection[int] | Dict[str, Collection[int]] = (2, 3, 4, 5, 6, 7, 8, 9, 10, 15, 20, 25, 30, 35, 40, 45, 50, 55, 60))
Integrate multiple datasets together.
- Parameters:
datasets (dict[str, Dataset]) – Dictionary of name: Dataset pairs.
corr_method (str, optional) – Correlation method: “pearson”, “spearman”, or “kendall”. Defaults to “pearson”.
max_median_corr (float, optional) – Threshold for the rank-reduction procedure, relevant only for datasets whose programs tend to be highly correlated. The procedure lowers the maximum rank included for a dataset until the median of the correlation distribution falls below the threshold. Defaults to 0.
negative_corr_quantile (float, optional) – Threshold for network-based integration, between 0 and 1; higher values result in fewer edges in the network. Defaults to 0.95.
k_subset (Union[Collection[int], Dict[str, Collection[int]]], optional) – k-values to use for integration: either a collection of integers used for every dataset, or a dict specifying k-values separately for each dataset. Defaults to (2, 3, 4, 5, 6, 7, 8, 9, 10, 15, 20, 25, 30, 35, 40, 45, 50, 55, 60).
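The interplay of corr_method and max_median_corr can be illustrated with a small sketch. This is not mosaicmpi's implementation. It makes three assumptions. First, programs are the columns of a samples-by-programs usage table. Second, each program is tagged with the rank k that produced it. Third, the "correlation distribution" is the set of pairwise correlations in the strict lower triangle of the program correlation matrix. The helpers `program_correlation` and `reduce_max_rank` are hypothetical; only `DataFrame.corr` is real pandas API.

```python
import numpy as np
import pandas as pd

CORR_METHODS = ("pearson", "spearman", "kendall")


def program_correlation(usages: pd.DataFrame, corr_method: str = "pearson") -> pd.DataFrame:
    """Correlate program-usage columns of a samples x programs table (hypothetical helper)."""
    if corr_method not in CORR_METHODS:
        raise ValueError(f"corr_method must be one of {CORR_METHODS}, got {corr_method!r}")
    # pandas' DataFrame.corr accepts all three method names.
    return usages.corr(method=corr_method)


def reduce_max_rank(corr: pd.DataFrame, ranks: pd.Series, max_median_corr: float) -> int:
    """Lower the maximum rank k until the median pairwise correlation is below the threshold.

    Assumptions (not mosaicmpi's code): `ranks` maps each program label in `corr` to the
    rank k that produced it, and the "correlation distribution" is the strict lower
    triangle of the correlation matrix among the programs still included.
    """
    ks = sorted(ranks.unique())
    while len(ks) > 1:
        keep = ranks[ranks <= ks[-1]].index
        sub = corr.loc[keep, keep].to_numpy()
        # Strict lower triangle: each program pair once, no self-correlations.
        pairs = sub[np.tril_indices_from(sub, k=-1)]
        if np.median(pairs) < max_median_corr:
            break
        ks.pop()  # drop the current maximum rank and re-check
    # Never goes below the smallest rank, even if its median is still too high.
    return int(ks[-1])


# Toy usage with made-up values: two programs from k=2 and three from k=3.
cols = ["k2_p1", "k2_p2", "k3_p1", "k3_p2", "k3_p3"]
arr = np.full((5, 5), 0.8)
np.fill_diagonal(arr, 1.0)
arr[0, 1] = arr[1, 0] = -0.2
corr = pd.DataFrame(arr, index=cols, columns=cols)
ranks = pd.Series([2, 2, 3, 3, 3], index=cols)

print(reduce_max_rank(corr, ranks, max_median_corr=0.9))  # 3: the median (0.8) is already below 0.9
print(reduce_max_rank(corr, ranks, max_median_corr=0.0))  # 2: k=3 is dropped
```

In this sketch, switching corr_method only changes the `method` argument passed to pandas. The rank-reduction loop then re-examines the shrinking set of programs after each maximum rank is removed.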
Attributes
Get the number of datasets in the integration.
- Returns:
Series indexed by dataset and sample ID, whose values are the patients from which the samples/observations were derived.
Get the values of k selected for integration.
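The patient Series described above can be pictured with a toy example. The sketch assumes the index is a two-level pandas MultiIndex of (dataset, sample ID). The dataset names ("rna", "atac"), sample IDs, patient labels, and the level names `dataset` and `sample_id` are all hypothetical.

```python
import pandas as pd

# Hypothetical (dataset, sample ID) pairs; sample ID "S1" appears in both datasets.
index = pd.MultiIndex.from_tuples(
    [("rna", "S1"), ("rna", "S2"), ("atac", "S1"), ("atac", "S7")],
    names=["dataset", "sample_id"],
)
# Each value is the patient the sample/observation was derived from.
patients = pd.Series(["P01", "P02", "P01", "P03"], index=index, name="patient")

# Look up the patient behind one observation.
print(patients.loc[("atac", "S7")])  # P03

# All observations, in any dataset, derived from patient P01.
print(patients[patients == "P01"].index.tolist())  # [('rna', 'S1'), ('atac', 'S1')]

# Number of distinct patients per dataset.
print(patients.groupby(level="dataset").nunique().to_dict())
```

Because "S1" occurs in both toy datasets, only the (dataset, sample ID) pair identifies an observation uniquely. That is why a two-level index is convenient here.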
Methods
compute_corr([method, cpus]) – Compute the correlation matrix of all programs in the integration, across all datasets.
compute_pairwise_thresholds([...]) – Compute thresholds for each dataset and dataset pair based on the correlation distribution of programs.
filter_programs_rank_reduction([max_median_corr]) – Filter programs using the rank-reduction procedure, relevant only for datasets whose programs tend to be highly correlated.
get_category_overrepresentation(layer[, ...]) – Calculate Pearson residuals of a chi-squared test associating the programs at each rank (k) with categories of samples/observations.
Get the lower-triangular correlation matrix used to build the correlation network.
get_metadata_correlation(layer[, ...]) – Calculate the correlation of program usage with numerical metadata across samples/observations.
get_metadata_df([include_categorical, ...]) – Get sample/observation metadata for all datasets.
Get node counts before and after various node and edge filters.
get_programs([type]) – Get programs.
get_usages([discretize, normalize]) – Calculate the usage of each program in each dataset and sample/observation.
select_k_values([k_subset, ...]) – Select k-values for integration.
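get_category_overrepresentation is described as computing Pearson residuals of a chi-squared test. For each cell of a contingency table, the residual is (O - E) / sqrt(E), where E is the count expected under independence (row total times column total over grand total). The sketch below is for illustration only. It assumes that, at a given k, each sample is assigned to its single dominant program and then cross-tabulated against a categorical metadata column. How mosaicmpi actually builds its table may differ. The function name `pearson_residuals` and all data are hypothetical.

```python
import numpy as np


def pearson_residuals(observed: np.ndarray) -> np.ndarray:
    """Pearson residuals (O - E) / sqrt(E) for a programs x categories contingency table."""
    observed = np.asarray(observed, dtype=float)
    total = observed.sum()
    # Expected counts under independence: row total * column total / grand total.
    expected = np.outer(observed.sum(axis=1), observed.sum(axis=0)) / total
    return (observed - expected) / np.sqrt(expected)


# Hypothetical k = 2 usages (8 samples x 2 programs) and one categorical label per sample.
usages = np.array([
    [0.9, 0.1], [0.7, 0.3], [0.6, 0.4], [0.2, 0.8],
    [0.1, 0.9], [0.3, 0.7], [0.4, 0.6], [0.8, 0.2],
])
category = np.array(["tumor", "tumor", "tumor", "normal", "normal", "normal", "tumor", "normal"])

# Assumed discretization: each sample goes to its highest-usage program.
programs = usages.argmax(axis=1)
cats, cat_idx = np.unique(category, return_inverse=True)

# Count samples per (program, category) cell.
table = np.zeros((usages.shape[1], len(cats)))
np.add.at(table, (programs, cat_idx), 1)

residuals = pearson_residuals(table)
print(cats)       # column order of the table and residuals
print(residuals)  # (program 0, tumor) has residual +1/sqrt(2), about 0.71
```

Positive residuals mark program/category combinations that occur more often than independence would predict. Here program 0 is over-represented among the "tumor" samples and program 1 among the "normal" ones.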