mosaicmpi.dataset.Dataset

class mosaicmpi.dataset.Dataset(adata: AnnData, force_migrate: bool = False)

Creates a Dataset object from an anndata.AnnData object.

Parameters:

adata (anndata.AnnData) – AnnData object containing the expression data and metadata
force_migrate (bool, optional) – force conversion of the AnnData object even when adata.X and adata.raw.X are not linearly scaled relative to each other, defaults to False

Raises:

RuntimeError – Raised when adata is a backed-mode AnnData object, which cannot be migrated.
ValueError – Raised when force_migrate is False and adata.X is not linearly scaled relative to adata.raw.X.
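The linear-scaling condition governed by `force_migrate` can be illustrated with a minimal sketch. It assumes that "linearly scaled" means each observation (row) of adata.X equals a single constant times the same row of adata.raw.X, as after per-observation library-size normalization. `is_linearly_scaled` is a hypothetical helper, not part of mosaicMPI, and works on dense NumPy arrays only.

```python
import numpy as np


def is_linearly_scaled(x: np.ndarray, raw: np.ndarray, rtol: float = 1e-6) -> bool:
    """Return True if every row of `x` is a constant multiple of the same row of `raw`.

    Hypothetical helper: assumes one scale factor per observation (row).
    """
    x = np.asarray(x, dtype=float)
    raw = np.asarray(raw, dtype=float)
    if x.shape != raw.shape:
        return False
    for row_x, row_raw in zip(x, raw):
        nonzero = row_raw != 0
        # Under pure scaling, entries that are zero in raw must stay zero.
        if not np.allclose(row_x[~nonzero], 0.0):
            return False
        if not nonzero.any():
            continue
        # All nonzero entries must share the same ratio (the row's scale factor).
        ratios = row_x[nonzero] / row_raw[nonzero]
        if not np.allclose(ratios, ratios[0], rtol=rtol):
            return False
    return True


raw = np.array([[1, 2, 0], [3, 0, 1]])
print(is_linearly_scaled(raw / raw.sum(axis=1, keepdims=True) * 10, raw))  # True
print(is_linearly_scaled(np.log1p(raw), raw))  # False
```

A per-row rescaling passes the check, while a log transform does not; under the Raises section above, the latter is the kind of input for which migration would require force_migrate=True.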

Returns:

Object with expression and metadata

Return type:

Dataset

Attributes

`has_cnmf_results`	Test for whether the Dataset contains cNMF results.
`hvf`	Highly variable features used for cNMF
`hvf_stats`	Outputs the highly variable genes dataframe in pandas-compatible format.
`is_imputed`	Outputs the imputation status of the dataset.
`is_normalized`	Outputs the normalization status of the dataset.
`mosaicmpi_version`	mosaicMPI version used to create the dataset
`patient_id_col`	Name of the observation metadata column that holds patient IDs.

Methods

`add_cnmf_results`(cnmf_output_dir, cnmf_name)	After factorization, add completed cNMF results in [cnmf_output_dir]/[cnmf_name] to the dataset object.
`append_to_history`(entry)	Add entry to Dataset history.
`calculate_cnmf_prediction_error`([k])	Calculate cNMF prediction error using the method in the original cNMF package.
`cross_validate_imputation`(imputer[, n_folds])	Perform k-fold cross validation of imputation on the dataset without modifying the data.
`from_anndata`(adata[, force_migrate])	Creates a `Dataset` object from an `anndata.AnnData` object.
`from_df`(data, is_normalized[, sparsify, ...])	Creates a `Dataset` object from a pandas DataFrame.
`from_h5ad`(h5ad_file[, force_migrate, backed])	Creates a `Dataset` object from an AnnData-compatible .h5ad file.
`get_approximation`([k, program_type])	Return the approximated data by multiplying the programs and usage matrices for a given rank (k).
`get_category_overrepresentation`(layer[, ...])	Calculate Pearson residual of chi-squared test, associating programs for each rank (k) to categories of samples/observations.
`get_history`()	Returns timestamped history of Dataset object.
`get_metadata_correlation`(layer[, method])	Calculate correlation of program usage to numerical metadata across samples/observations.
`get_metadata_df`([include_categorical, ...])	Get sample/observation metadata.
`get_printable_metadata_type_summary`()	Return a printable summary of metadata and the types.
`get_programs`([k, type])	Get feature scores for programs.
`get_usages`([k, discretize, normalize])	Generate dataframe of program usage.
`impute_knn`([n_neighbors, weights, ...])	Imputation for completing missing values using k-Nearest Neighbors.
`impute_zeros`([cross_validate, n_folds])	Imputation by filling missing values with zeros.
`initialize_cnmf`(cnmf_output_dir, cnmf_name)	Initialize a cNMF run for subsequent factorization.
`map_gene_ids`(source_species, dest_species, ...)	Map the feature IDs in place for a dataset.
`remove_cnmf_results`()
`remove_unfactorizable_features`()	Removes features with missing values or zero variance from the data matrix.
`remove_unfactorizable_observations`()	Removes observations with all zeros from the data matrix.
`select_hvf`([stratify_by, stratify_mode, ...])	Select highly variable features (HVFs) for cNMF factorization.
`select_hvf_cnmf`([stratify_by, ...])
`select_hvf_stdeconvolve`([stratify_by, ...])
`to_df`([normalized])	Get data matrix as a pd.DataFrame.
`update_obs`(obs)	Update the observation metadata with a new metadata matrix
`validate_cnmf_prediction_errors`([tolerance])	Validate the dataset and cNMF solutions for each rank by comparing the prediction error values stored in the object [self.adata.uns.kvals] to those calculated from the dataset's data matrices [based on self.adata.X and self.adata.varm['cnmf_gep_raw']].
`validate_feature_stats`([tolerance])	Validate the dataset and cNMF solutions for each rank by comparing the calculated feature statistics (mean, SD, variance) stored in the object [self.adata.var] to those calculated from the dataset's data matrices [based on self.adata.X].
`write_h5ad`(filename[, safe_mode])	Write dataset to .h5ad file.
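`get_approximation` and `calculate_cnmf_prediction_error` both rest on the NMF reconstruction data ≈ usages × programs. The sketch below assumes dense NumPy arrays, usages shaped (observations × k), programs shaped (features × k) as in an AnnData varm entry such as cnmf_gep_raw, and a prediction error defined as the sum of squared residuals (squared Frobenius norm), as in the original cNMF consensus step. `approximate` and `prediction_error` are hypothetical names, not mosaicMPI functions.

```python
import numpy as np


def approximate(usages: np.ndarray, programs: np.ndarray) -> np.ndarray:
    """Rank-k reconstruction: (obs x k) usages times the transpose of (features x k) programs."""
    return usages @ programs.T


def prediction_error(data: np.ndarray, usages: np.ndarray, programs: np.ndarray) -> float:
    """Sum of squared residuals between the data and its rank-k approximation."""
    residual = data - approximate(usages, programs)
    return float(np.sum(residual ** 2))


usages = np.array([[1.0, 0.0], [0.5, 0.5]])                 # 2 observations, k = 2
programs = np.array([[2.0, 0.0], [0.0, 4.0], [1.0, 1.0]])   # 3 features, k = 2
print(approximate(usages, programs))  # rows: [2, 0, 1] and [1, 2, 1]
```

A data matrix that matches the reconstruction exactly has an error of 0; the error grows with the squared size of any deviation.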
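`get_category_overrepresentation` reports Pearson residuals of a chi-squared test. For a contingency table of observed counts O, the expected counts under independence are E = (row total × column total) / grand total, and each residual is (O − E) / √E; positive values mark category and program pairs that occur more often than expected. The sketch assumes each observation has already been assigned to a single program (for example, its highest-usage program), which may differ from how mosaicMPI builds its table. `pearson_residuals` is a hypothetical helper.

```python
import numpy as np
import pandas as pd


def pearson_residuals(categories: pd.Series, programs: pd.Series) -> pd.DataFrame:
    """Pearson residuals (O - E) / sqrt(E) of a chi-squared test of independence.

    `categories` holds each observation's category label and `programs` the
    program each observation is assigned to; both share the same index.
    """
    observed = pd.crosstab(categories, programs)
    grand_total = observed.to_numpy().sum()
    # Expected counts under independence: row total * column total / grand total.
    expected = np.outer(observed.sum(axis=1), observed.sum(axis=0)) / grand_total
    return (observed - expected) / np.sqrt(expected)


categories = pd.Series(["tumor", "tumor", "normal", "normal"])
programs = pd.Series(["P1", "P1", "P2", "P2"])
print(pearson_residuals(categories, programs))
```

Here every expected count is 1, so "tumor" is over-represented in P1 (residual 1) and under-represented in P2 (residual −1).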
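`get_metadata_correlation` correlates program usage with numerical metadata across observations. A minimal pandas sketch, assuming usages and metadata are DataFrames sharing an observation index and that the method names follow pandas conventions (such as "pearson"); `usage_metadata_correlation` is a hypothetical helper, not the mosaicMPI implementation.

```python
import pandas as pd


def usage_metadata_correlation(usages: pd.DataFrame, metadata: pd.DataFrame,
                               method: str = "pearson") -> pd.DataFrame:
    """Correlate each program's usage with each numeric metadata column.

    Non-numeric metadata columns are ignored. Returns a table with metadata
    columns as rows and programs as columns.
    """
    numeric = metadata.select_dtypes(include="number")
    return pd.DataFrame({
        program: numeric.corrwith(usages[program], method=method)
        for program in usages.columns
    })


usages = pd.DataFrame({"P1": [0.1, 0.2, 0.3, 0.4], "P2": [0.9, 0.8, 0.7, 0.6]})
metadata = pd.DataFrame({"age": [30, 40, 50, 60], "sex": ["F", "M", "F", "M"]})
print(usage_metadata_correlation(usages, metadata))
```

The categorical "sex" column is skipped; P1 rises with age (correlation 1) and P2 falls with it (correlation −1).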
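`cross_validate_imputation` evaluates an imputer by hiding known values and measuring how well they are recovered. The following is a generic sketch of k-fold masking applied to zero imputation (the strategy of `impute_zeros`). It assumes a dense NumPy matrix with np.nan marking missing entries and uses mean squared error as the score; mosaicMPI's own fold construction and metric may differ, and `cross_validate_zero_imputation` is a hypothetical name.

```python
import numpy as np


def cross_validate_zero_imputation(data: np.ndarray, n_folds: int = 5, seed: int = 0) -> float:
    """Estimate zero-imputation error by k-fold masking of observed entries.

    Each fold hides a disjoint subset of the non-missing entries, fills them with
    zeros, and scores the filled values against the hidden truth. Returns the
    mean squared error over all hidden entries. `data` is never modified.
    """
    data = np.asarray(data, dtype=float)
    rng = np.random.default_rng(seed)
    observed = np.argwhere(~np.isnan(data))  # (row, col) of every known entry
    rng.shuffle(observed)                    # shuffle entries, then split into folds
    squared_errors = []
    for fold in np.array_split(observed, n_folds):
        masked = data.copy()
        rows, cols = fold[:, 0], fold[:, 1]
        truth = masked[rows, cols]
        masked[rows, cols] = np.nan
        imputed = np.where(np.isnan(masked), 0.0, masked)  # zero imputation
        squared_errors.append((imputed[rows, cols] - truth) ** 2)
    return float(np.mean(np.concatenate(squared_errors)))


matrix = np.array([[1.0, 2.0], [3.0, np.nan]])
print(cross_validate_zero_imputation(matrix, n_folds=3))
```

For zero imputation the score is simply the mean of the squared observed values (here (1 + 4 + 9) / 3), which makes it a convenient baseline for comparing other imputers such as k-nearest neighbors.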
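`remove_unfactorizable_features` and `remove_unfactorizable_observations` apply simple filters before factorization. A pandas sketch of both filters, assuming observations are rows and features are columns of a DataFrame; `drop_unfactorizable` is a hypothetical helper, not a mosaicMPI function.

```python
import pandas as pd


def drop_unfactorizable(df: pd.DataFrame) -> pd.DataFrame:
    """Drop features (columns) with missing values or zero variance, then all-zero observations (rows)."""
    has_missing = df.isna().any(axis=0)
    zero_variance = df.var(axis=0) == 0
    kept = df.loc[:, ~(has_missing | zero_variance)]
    # Drop observations whose remaining feature values are all zero.
    all_zero = (kept == 0).all(axis=1)
    return kept.loc[~all_zero]


data = pd.DataFrame({
    "g1": [1.0, 0.0, 2.0],
    "g2": [5.0, 5.0, 5.0],            # zero variance -> dropped
    "g3": [1.0, float("nan"), 3.0],   # missing value -> dropped
    "g4": [0.0, 0.0, 4.0],
})
print(drop_unfactorizable(data))  # keeps columns g1, g4 and rows 0, 2
```

Filtering features first means that observations left all-zero by the dropped columns are also caught.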
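`select_hvf` chooses the features used for cNMF. As a simplified illustration of highly variable feature selection, the sketch below ranks features by Fano factor (variance divided by mean). This is a swapped-in criterion that may differ from the statistics used by `select_hvf_cnmf` or `select_hvf_stdeconvolve`, and it ignores stratification via stratify_by. `top_variable_features` is a hypothetical helper.

```python
import pandas as pd


def top_variable_features(df: pd.DataFrame, n_top: int) -> list:
    """Return the `n_top` feature columns with the highest Fano factor (variance / mean).

    Features with zero mean are excluded to avoid division by zero.
    """
    means = df.mean(axis=0)
    expressed = means > 0
    fano = df.var(axis=0)[expressed] / means[expressed]
    return fano.sort_values(ascending=False).head(n_top).index.tolist()


counts = pd.DataFrame({
    "flat": [1, 1, 1, 1],     # no variance
    "bursty": [0, 4, 0, 4],   # high variance relative to mean
    "mild": [2, 3, 2, 3],     # low variance relative to mean
    "off": [0, 0, 0, 0],      # zero mean -> excluded
})
print(top_variable_features(counts, n_top=2))  # ['bursty', 'mild']
```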
