mosaicmpi.Dataset

class mosaicmpi.Dataset(adata: AnnData, force_migrate: bool = False)

Creates a Dataset object from an anndata.AnnData object.

Parameters:
  • adata (anndata.AnnData) – AnnData object with data

  • force_migrate (bool, optional) – forces conversion of AnnData objects even when adata.X and adata.raw.X are not linearly scaled relative to each other, defaults to False

Raises:
  • RuntimeError – Raised for backed-mode AnnData objects, which cannot be migrated.

  • ValueError – Raised when force_migrate is False and adata.X and adata.raw.X are not linearly scaled relative to each other.

Returns:

Dataset object containing expression data and metadata

Return type:

Dataset

Attributes

has_cnmf_results

Tests whether the Dataset contains cNMF results.

hvf

Highly variable features used for cNMF.

hvf_stats

Outputs the highly variable genes dataframe in pandas-compatible format.

is_imputed

Outputs the imputation status of the dataset.

is_normalized

Outputs the normalization status of the dataset.

mosaicmpi_version

mosaicMPI version used to create the dataset.

patient_id_col

Outputs the name of the metadata column that holds patient IDs.

Methods

add_cnmf_results(cnmf_output_dir, cnmf_name)

After factorization, add completed cNMF results in [cnmf_output_dir]/[cnmf_name] to the dataset object.

append_to_history(entry)

Add entry to Dataset history.

calculate_cnmf_prediction_error([k])

Calculate cNMF prediction error using the method in the original cNMF package.

cross_validate_imputation(imputer[, n_folds])

Perform k-fold cross validation of imputation on the dataset without modifying the data.
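As a rough illustration of the idea behind k-fold cross validation of imputation, the sketch below hides a fold of the observed values, imputes them on a copy, and scores the result against the hidden truth. It does not call mosaicMPI: the function names, the column-mean imputer, the RMSE metric, and the list-of-rows matrix layout are all assumptions made for this example.

```python
import math
import random


def mean_imputer(matrix):
    """Hypothetical imputer: replace None with the column mean of observed values."""
    means = []
    for col in zip(*matrix):
        observed = [v for v in col if v is not None]
        means.append(sum(observed) / len(observed) if observed else 0.0)
    return [[means[j] if v is None else v for j, v in enumerate(row)] for row in matrix]


def cross_validate_imputation(matrix, imputer, n_folds=5, seed=0):
    """Return one RMSE per fold. The input matrix is never modified."""
    # Only entries that are actually observed can be held out and scored.
    observed = [
        (i, j)
        for i, row in enumerate(matrix)
        for j, v in enumerate(row)
        if v is not None
    ]
    random.Random(seed).shuffle(observed)
    folds = [observed[f::n_folds] for f in range(n_folds)]

    errors = []
    for held_out in folds:
        if not held_out:
            continue
        masked = [list(row) for row in matrix]  # copy, so the data stay untouched
        for i, j in held_out:
            masked[i][j] = None
        imputed = imputer(masked)
        squared = [(imputed[i][j] - matrix[i][j]) ** 2 for i, j in held_out]
        errors.append(math.sqrt(sum(squared) / len(squared)))
    return errors


data = [
    [1.0, 2.0, 3.0],
    [1.0, None, 3.0],
    [1.0, 2.0, 3.0],
    [1.0, 2.0, None],
]
rmse_per_fold = cross_validate_imputation(data, mean_imputer, n_folds=2)
```

Because the imputer only ever sees a masked copy, the original missing entries in `data` are still missing after the call.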

from_anndata(adata[, force_migrate])

Creates a Dataset object from an anndata.AnnData object.

from_df(data, is_normalized[, sparsify, ...])

Creates a Dataset object from a pandas DataFrame.

from_h5ad(h5ad_file[, force_migrate, backed])

Creates a Dataset object from an AnnData-compatible .h5ad file.

get_approximation([k, program_type])

Return the approximated data by multiplying the programs and usage matrices for a given rank (k).
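The approximation described above is a plain matrix product. The sketch below shows it on toy numbers without using mosaicMPI; the `matmul` helper and the orientation (usages as samples × programs, programs as programs × features) are assumptions for illustration, and the library may store its matrices differently.

```python
def matmul(a, b):
    """Multiply an (n x k) matrix by a (k x m) matrix, both given as lists of rows."""
    if len(a[0]) != len(b):
        raise ValueError("inner dimensions must match (both equal the rank k)")
    return [
        [sum(a_ik * b_kj for a_ik, b_kj in zip(row, col)) for col in zip(*b)]
        for row in a
    ]


# Toy factorization at rank k = 2.
usages = [[1.0, 0.0], [0.5, 0.5], [0.0, 2.0]]    # 3 samples x 2 programs
programs = [[2.0, 0.0, 4.0], [0.0, 6.0, 2.0]]    # 2 programs x 3 features

approximation = matmul(usages, programs)          # 3 samples x 3 features
```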

get_category_overrepresentation(layer[, ...])

Calculate Pearson residuals of a chi-squared test that associates the programs at each rank (k) with categories of samples/observations.
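For reference, the Pearson residual of a chi-squared test for one cell of a contingency table is (observed − expected) / √expected, where expected = row total × column total / grand total. The sketch below computes it for a toy table; how mosaicMPI builds the table from program usages is not stated here, so the category × program layout and the function name are assumptions.

```python
import math


def pearson_residuals(observed):
    """Pearson residuals for a contingency table given as a list of rows.

    Assumes every expected count is greater than zero.
    """
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    grand_total = sum(row_totals)

    residuals = []
    for i, row in enumerate(observed):
        out_row = []
        for j, count in enumerate(row):
            expected = row_totals[i] * col_totals[j] / grand_total
            out_row.append((count - expected) / math.sqrt(expected))
        residuals.append(out_row)
    return residuals


# Hypothetical counts: rows are sample categories, columns are programs.
counts = [[30, 10], [10, 30]]
residuals = pearson_residuals(counts)
```

Positive residuals mark category/program pairs that occur more often than independence would predict, and negative residuals mark pairs that occur less often.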

get_history()

Returns timestamped history of Dataset object.

get_metadata_correlation(layer[, method])

Calculate correlation of program usage to numerical metadata across samples/observations.
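To make the quantity concrete, the sketch below computes a Pearson correlation between one program's usage and one numerical metadata column. It is a stand-alone illustration, not the library's implementation: the `pearson_r` helper and the toy values are assumptions, and the correlation methods that the `method` argument accepts are not listed in this summary.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length numeric sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (sd_x * sd_y)


# Hypothetical usage of one program across four samples, and a numeric covariate.
usage = [0.1, 0.4, 0.5, 0.9]
age = [20, 50, 60, 100]

r = pearson_r(usage, age)
```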

get_metadata_df([include_categorical, ...])

Get sample/observation metadata.

get_printable_metadata_type_summary()

Return a printable summary of metadata and the types.

get_programs([k, type])

Get feature scores for programs.

get_usages([k, discretize, normalize])

Generate dataframe of program usage.

impute_knn([n_neighbors, weights, ...])

Imputation for completing missing values using k-Nearest Neighbors.

impute_zeros([cross_validate, n_folds])

Imputation by filling missing values with zeros.

initialize_cnmf(cnmf_output_dir, cnmf_name)

Initialize a cNMF run for subsequent factorization.

map_gene_ids(source_species, dest_species, ...)

Map the feature IDs in place for a dataset.

remove_cnmf_results()

remove_unfactorizable_features()

Removes features with missing values or zero variance from the data matrix.

remove_unfactorizable_observations()

Removes observations with all zeros from the data matrix.
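The two filters above can be sketched on a small list-of-rows matrix, with rows as observations and columns as features. The helper names are hypothetical and the code does not use mosaicMPI. Treating both None and NaN as missing is an assumption.

```python
import math


def unfactorizable_features(matrix):
    """Indices of columns that contain a missing value or have zero variance."""
    bad = []
    for j, col in enumerate(zip(*matrix)):
        has_missing = any(v is None or math.isnan(v) for v in col)
        # A column with a single distinct value has zero variance.
        if has_missing or len(set(col)) == 1:
            bad.append(j)
    return bad


def unfactorizable_observations(matrix):
    """Indices of rows whose values are all zero."""
    return [i for i, row in enumerate(matrix) if all(v == 0 for v in row)]


data = [
    [1.0, 5.0, 0.0, None],
    [0.0, 5.0, 0.0, 2.0],
    [3.0, 5.0, 1.0, 4.0],
]

bad_features = unfactorizable_features(data)
kept = [[v for j, v in enumerate(row) if j not in bad_features] for row in data]
zero_rows = unfactorizable_observations(kept)
```

In this toy example, dropping features first leaves one observation with only zeros, so the order in which the two filters run can matter.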

select_hvf([stratify_by, stratify_mode, ...])

Select highly variable features (HVFs) for cNMF factorization.

select_hvf_cnmf([stratify_by, ...])

select_hvf_stdeconvolve([stratify_by, ...])

to_df([normalized])

Get data matrix as a pd.DataFrame.

update_obs(obs)

Update the observation metadata with a new metadata matrix.

validate_cnmf_prediction_errors([tolerance])

Validate the dataset and cNMF solutions for each rank by comparing the prediction errors stored in the object (self.adata.uns.kvals) with those recalculated from the dataset's data matrices (self.adata.X and self.adata.varm['cnmf_gep_raw']).
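The check described above amounts to recomputing a reconstruction error and comparing it with a stored value. The sketch below assumes the error is the sum of squared residuals between the data and the usages × programs product, and that `tolerance` is a relative tolerance. Both are assumptions for illustration, as are the function names. The code does not use mosaicMPI.

```python
import math


def prediction_error(data, usages, programs):
    """Sum of squared residuals between data and the product usages x programs."""
    total = 0.0
    for row, usage_row in zip(data, usages):
        for j, value in enumerate(row):
            approx = sum(u * programs[k][j] for k, u in enumerate(usage_row))
            total += (value - approx) ** 2
    return total


def errors_match(stored, recalculated, tolerance=1e-6):
    """True when the stored and recalculated errors agree within a relative tolerance."""
    return math.isclose(stored, recalculated, rel_tol=tolerance)


data = [[2.0, 1.0], [0.0, 3.0]]        # 2 samples x 2 features
usages = [[1.0, 0.0], [0.0, 1.0]]      # 2 samples x 2 programs
programs = [[2.0, 0.0], [0.0, 3.0]]    # 2 programs x 2 features

recalculated = prediction_error(data, usages, programs)
is_valid = errors_match(1.0, recalculated)
```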

validate_feature_stats([tolerance])

Validate the dataset and cNMF solutions for each rank by comparing the feature statistics (mean, SD, variance) stored in the object (self.adata.var) with those recalculated from the dataset's data matrix (self.adata.X).

write_h5ad(filename[, safe_mode])

Write dataset to .h5ad file.