mosaicmpi.dataset.Dataset
- class mosaicmpi.dataset.Dataset(adata: AnnData, force_migrate: bool = False)
Creates a Dataset object from an anndata.AnnData object.
- Parameters:
adata (anndata.AnnData) – AnnData object with data
force_migrate (bool, optional) – forces conversion of AnnData objects even when adata.X and adata.raw.X are not linearly scaled relative to each other, defaults to False
- Raises:
RuntimeError – Backed-mode AnnData objects cannot be migrated.
ValueError – Raised when force_migrate is False and adata.X and adata.raw.X are not linearly scaled relative to each other.
- Returns:
Object with expression and metadata
- Return type:
Dataset
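The force_migrate parameter depends on whether adata.X and adata.raw.X are "linearly scaled relative to each other". The sketch below illustrates one possible reading of that phrase: every value in one matrix equals a single constant times the matching value in the other. It is not mosaicMPI's actual check. The helper name linear_scale_factor, the tolerance, and the use of plain nested lists in place of AnnData matrices are all assumptions made for illustration.

```python
import math


def linear_scale_factor(x, raw, rel_tol=1e-6):
    """Return c if x == c * raw element-wise (within tolerance), else None.

    Hypothetical helper: assumes one global constant relates the two matrices.
    """
    # Estimate the constant from the first non-zero entry of `raw`.
    ratios = [
        xv / rv
        for x_row, raw_row in zip(x, raw)
        for xv, rv in zip(x_row, raw_row)
        if rv != 0
    ]
    if not ratios:
        return None
    c = ratios[0]
    # Every entry, including zeros, must agree with that constant.
    for x_row, raw_row in zip(x, raw):
        for xv, rv in zip(x_row, raw_row):
            if not math.isclose(xv, c * rv, rel_tol=rel_tol, abs_tol=1e-12):
                return None
    return c


raw = [[1.0, 0.0, 4.0], [2.0, 3.0, 0.0]]
scaled = [[10.0, 0.0, 40.0], [20.0, 30.0, 0.0]]            # raw * 10
logged = [[math.log1p(v) for v in row] for row in raw]     # non-linear transform

print(linear_scale_factor(scaled, raw))  # 10.0
print(linear_scale_factor(logged, raw))  # None
```

Under this reading, the log-transformed matrix is the kind of input that would raise ValueError unless force_migrate is set to True.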
Attributes
Tests whether the Dataset contains cNMF results.
Highly variable features used for cNMF
Outputs the highly variable genes dataframe in pandas-compatible format.
Outputs the imputation status of the dataset.
Outputs the normalization status of the dataset.
mosaicMPI version used to create the dataset
Outputs the normalization status of the dataset.
Methods
add_cnmf_results(cnmf_output_dir, cnmf_name) – After factorization, add completed cNMF results in [cnmf_output_dir]/[cnmf_name] to the dataset object.
append_to_history(entry) – Add entry to Dataset history.
Calculate cNMF prediction error using the method in the original cNMF package.
cross_validate_imputation(imputer[, n_folds]) – Perform k-fold cross validation of imputation on the dataset without modifying the data.
from_anndata(adata[, force_migrate]) – Creates a Dataset object from an anndata.AnnData object.
from_df(data, is_normalized[, sparsify, ...]) – Creates a Dataset object from a pandas DataFrame.
from_h5ad(h5ad_file[, force_migrate, backed]) – Creates a Dataset object from an AnnData-compatible .h5ad file.
get_approximation([k, program_type]) – Return the approximated data by multiplying the programs and usage matrices for a given rank (k).
get_category_overrepresentation(layer[, ...]) – Calculate Pearson residual of chi-squared test, associating programs for each rank (k) to categories of samples/observations.
Returns timestamped history of Dataset object.
get_metadata_correlation(layer[, method]) – Calculate correlation of program usage to numerical metadata across samples/observations.
get_metadata_df([include_categorical, ...]) – Get sample/observation metadata.
Return a printable summary of metadata and the types.
get_programs([k, type]) – Get feature scores for programs.
get_usages([k, discretize, normalize]) – Generate dataframe of program usage.
impute_knn([n_neighbors, weights, ...]) – Imputation for completing missing values using k-Nearest Neighbors.
impute_zeros([cross_validate, n_folds]) – Imputation by filling missing values with zeros.
initialize_cnmf(cnmf_output_dir, cnmf_name) – Initialize a cNMF run for subsequent factorization.
map_gene_ids(source_species, dest_species, ...) – Map the feature IDs in place for a dataset.
Removes features with missing values or zero variance from the data matrix.
Removes observations with all zeros from the data matrix.
select_hvf([stratify_by, stratify_mode, ...]) – Select highly variable features (HVFs) for cNMF factorization.
select_hvf_cnmf([stratify_by, ...])
select_hvf_stdeconvolve([stratify_by, ...])
to_df([normalized]) – Get data matrix as a pd.DataFrame.
update_obs(obs) – Update the observation metadata with a new metadata matrix.
validate_cnmf_prediction_errors([tolerance]) – Validate the dataset and cNMF solutions for each rank by comparing the prediction error values stored in the object [self.adata.uns.kvals] to those calculated from the dataset's data matrices [based on self.adata.X and self.adata.varm['cnmf_gep_raw']].
validate_feature_stats([tolerance]) – Validate the dataset and cNMF solutions for each rank by comparing the calculated feature statistics (mean, SD, variance) stored in the object [self.adata.var] to those calculated from the dataset's data matrices [based on self.adata.X].
write_h5ad(filename[, safe_mode]) – Write dataset to .h5ad file.
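The reconstruction that get_approximation describes, multiplying the usage and program matrices for a given rank k, can be illustrated without the library. The sketch below is a plain-Python matrix product, not mosaicMPI's implementation. The function name approximate, the toy values, and the matrix orientation (usages as samples × k, programs as k × features) are assumptions chosen for illustration.

```python
def approximate(usages, programs):
    """Multiply usages (n_samples x k) by programs (k x n_features).

    Hypothetical helper: the orientation of both matrices is an assumption.
    """
    k = len(programs)
    if any(len(row) != k for row in usages):
        raise ValueError("each usage row must have one value per program")
    n_features = len(programs[0])
    # Each approximated value is the usage-weighted sum of program scores.
    return [
        [sum(u[p] * programs[p][f] for p in range(k)) for f in range(n_features)]
        for u in usages
    ]


# Toy rank-2 factorization: 3 samples, 4 features.
usages = [[1.0, 0.0], [0.5, 2.0], [0.0, 1.0]]
programs = [[2.0, 0.0, 1.0, 0.0], [0.0, 3.0, 0.0, 1.0]]

approx = approximate(usages, programs)
for row in approx:
    print(row)
```

A sample that uses only one program reproduces that program's feature scores scaled by its usage. Mixed samples, such as the second row here, are weighted sums of both programs.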