mowgli.models#

class mowgli.models.MowgliModel(latent_dim: int = 50, highly_variable: bool = True, use_mod_weight: bool = False, h_regularization: float = {'adt': 0.01, 'atac': 0.1, 'prot': 0.01, 'rna': 0.01}, w_regularization: float = 0.001, eps: float = 0.05, cost: str = 'cosine', pca_cost: bool = False, cost_path: dict = None)#

Bases: object

The Mowgli model, which performs integrative NMF with an Optimal Transport loss.

Parameters:
  • latent_dim (int, optional) – The latent dimension of the model. Defaults to 50.

  • highly_variable (bool, optional) – Whether to use highly variable features. Defaults to True. For now, only True is supported.

  • use_mod_weight (bool, optional) – Whether to use a different weight for each modality and each cell. If True, the weights are expected in the mod_weight obs field of each modality. Defaults to False.

  • h_regularization (float or dict, optional) – The entropy parameter for the dictionary. Defaults to 0.01 for RNA, ADT and protein, and 0.1 for ATAC. If needed, other modalities should be specified by the user. We advise setting values between 0.001 (biological signal driven by very few features) and 1.0 (very diffuse biological signals).

  • w_regularization (float, optional) – The entropy parameter for the embedding. As with h_regularization, small values mean sparse vectors. Defaults to 1e-3.

  • eps (float, optional) – The entropy parameter for epsilon transport. Large values decrease importance of individual genes. Defaults to 5e-2.

  • cost (str, optional) – The function used to compute an empirical ground cost. All metrics from SciPy’s cdist are allowed. Defaults to ‘cosine’.

  • pca_cost (bool, optional) – If True, the empirical ground cost will be computed on PCA embeddings rather than raw data. Defaults to False.

  • cost_path (dict, optional) – Will look for an existing cost as a .npy file at this path. If not found, the cost will be computed then saved there. Defaults to None.
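
A minimal instantiation sketch (the modality keys below are illustrative and should match the modalities present in your MuData object):

    from mowgli import models

    # Instantiate the model with a 50-dimensional latent space and a
    # per-modality entropy parameter for the dictionary.
    model = models.MowgliModel(
        latent_dim=50,
        h_regularization={"rna": 0.01, "atac": 0.1, "adt": 0.01},
    )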

build_optimizer(params, lr: float, optim_name: str) Optimizer#

Generates the optimizer. The PyTorch LBFGS implementation is parametrized following the discussion in https://discuss.pytorch.org/t/unclear-purpose-of-max-iter-kwarg-in-the-lbfgs-optimizer/65695.

Parameters:
  • params (Iterable of Tensors) – The parameters to be optimized.

  • lr (float) – Learning rate of the optimizer.

  • optim_name (str) – Name of the optimizer, among ‘lbfgs’, ‘sgd’ and ‘adam’.

Returns:

The optimizer.

Return type:

torch.optim.Optimizer
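
This helper is normally called internally by train. A hedged sketch of a direct call, where the tensor below is a stand-in for the model’s actual parameters:

    import torch

    # A dummy leaf tensor standing in for a parameter such as W or H.
    params = [torch.rand(100, 50, requires_grad=True)]

    # Returns a torch.optim.LBFGS instance parametrized as described above.
    optimizer = model.build_optimizer(params, lr=1.0, optim_name="lbfgs")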

init_parameters(mdata: MuData, dtype: dtype, device: device, force_recompute: bool = False, normalize_rows: bool = False) None#

Initialize parameters based on input data.

Parameters:
  • mdata (md.MuData) – The input MuData object.

  • dtype (torch.dtype) – The dtype to work with.

  • device (torch.device) – The device to work on.

  • force_recompute (bool, optional) – Whether to recompute the ground cost. Defaults to False.

  • normalize_rows (bool, optional) – Whether to normalize the rows of the input data. Defaults to False.
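
This method is called internally by train; a sketch of a direct call, assuming mdata is an existing MuData object:

    import torch

    # Initialize the parameters and the ground cost on CPU,
    # in double precision.
    model.init_parameters(
        mdata,
        dtype=torch.float64,
        device=torch.device("cpu"),
        force_recompute=False,
    )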

loss_fn_h() Tensor#

Computes the loss for the update of H.

Returns:

The loss.

Return type:

torch.Tensor

loss_fn_w() Tensor#

Computes the loss for the update of W.

Returns:

The loss.

Return type:

torch.Tensor

optimize(loss_fn: Callable, max_iter: int, history: List, tol: float, pbar, device: str) None#

Optimize a given function.

Parameters:
  • loss_fn (Callable) – The function to optimize.

  • max_iter (int) – The maximum number of iterations.

  • history (List) – A list to append the losses to.

  • tol (float) – The tolerance before early stopping.

  • pbar (A tqdm progress bar) – The progress bar.

  • device (str) – The device to work on.

total_dual_loss() Tensor#

Computes the total dual loss. This is only used by the user and for early stopping, not by the optimization algorithm.

Returns:

The loss.

Return type:

torch.Tensor
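
Since this method is intended for the user, a short sketch of inspecting the loss of a trained model:

    # Compute the total dual loss after training.
    loss = model.total_dual_loss()
    print(float(loss))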

train(mdata: MuData, max_iter_inner: int = 1000, max_iter: int = 100, device: device = 'cpu', dtype: dtype = torch.float64, lr: float = 1, optim_name: str = 'lbfgs', tol_inner: float = 1e-12, tol_outer: float = 0.0001, normalize_rows: bool = False) None#

Train the Mowgli model on an input MuData object.

Parameters:
  • mdata (md.MuData) – The input MuData object.

  • max_iter_inner (int, optional) – How many iterations for the inner optimization loop (optimizing H or W). Defaults to 1_000.

  • max_iter (int, optional) – How many iterations for the outer optimization loop (how many successive optimizations of H and W). Defaults to 100.

  • device (torch.device, optional) – The device to work on. Defaults to ‘cpu’.

  • dtype (torch.dtype, optional) – The dtype to work with. Defaults to torch.double.

  • lr (float, optional) – The learning rate of the optimizer. The default is suited to LBFGS and should be changed for other optimizers. Defaults to 1.

  • optim_name (str, optional) – The optimizer to use (lbfgs, sgd or adam). LBFGS is advised, but requires more memory. Defaults to “lbfgs”.

  • tol_inner (float, optional) – The tolerance for the inner iterations before early stopping. Defaults to 1e-12.

  • tol_outer (float, optional) – The tolerance for the outer iterations before early stopping. Defaults to 1e-4.

  • normalize_rows (bool, optional) – Whether to normalize the rows of the input data. Defaults to False.
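
A hedged end-to-end sketch. The .h5mu path is illustrative, and the obsm key used to retrieve the embedding is an assumption that may vary across versions:

    import muon as mu
    from mowgli import models

    # Load a multimodal dataset (illustrative path).
    mdata = mu.read_h5mu("my_dataset.h5mu")

    # Instantiate and train the model. LBFGS with lr=1 is the advised
    # default; use a smaller learning rate with 'sgd' or 'adam'.
    model = models.MowgliModel(latent_dim=50)
    model.train(mdata, device="cpu", optim_name="lbfgs", lr=1.0)

    # Retrieve the low-dimensional embedding (assumption: stored in
    # mdata.obsm under the key 'W_OT').
    embedding = mdata.obsm["W_OT"]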