API Reference#

This page documents the public API of diempy.

diemtype#

diempy.diemtype.flip_polarity(diemMatrix, newPolarity, oldPolarity=None)[source]#

Flip the polarity of the Diem Matrix according to the polarity array. The Diem Matrix is an array with individuals as rows and sites as columns. The polarity is a NumPy array of shape (nMarkers,) with entries 0 or 1, where 1 indicates that the marker should be flipped.

Parameters:
  • diemMatrix (np.ndarray) – The Diem Matrix to be flipped.

  • newPolarity (np.ndarray) – Array indicating the desired polarity for each marker (0 or 1).

  • oldPolarity (np.ndarray, optional) – Array indicating the current polarity for each marker (0 or 1). If oldPolarity is None, the markers where newPolarity is 1 are flipped.

Returns:

The Diem Matrix with updated polarity.

Return type:

np.ndarray
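The flipping rule described above can be sketched in NumPy as follows. This is only an illustration under stated assumptions, not the diempy implementation: it assumes states are coded 0 (unknown), 1 and 3 (the two homozygous states) and 2 (heterozygous), that flipping a marker swaps 1 and 3, and that when oldPolarity is given only markers whose polarity changes are flipped. The helper name flip_polarity_sketch is hypothetical.

```python
import numpy as np


def flip_polarity_sketch(diem_matrix, new_polarity, old_polarity=None):
    """Hypothetical stand-in for flip_polarity; not the diempy implementation."""
    new_polarity = np.asarray(new_polarity, dtype=bool)
    if old_polarity is None:
        # Flip exactly the markers where the new polarity is 1.
        to_flip = new_polarity
    else:
        # Assumption: flip only the markers whose polarity changes.
        to_flip = new_polarity ^ np.asarray(old_polarity, dtype=bool)

    flipped = np.array(diem_matrix, copy=True)
    cols = flipped[:, to_flip]
    # Assumed state coding: swap 1 <-> 3, leave 0 and 2 untouched.
    flipped[:, to_flip] = np.where(cols == 1, 3, np.where(cols == 3, 1, cols))
    return flipped


# Rows are individuals, columns are markers.
example = np.array([[1, 3, 2],
                    [0, 1, 3]])
result = flip_polarity_sketch(example, new_polarity=[1, 0, 1])
```

Here the first and third markers are flipped, and the input array is left unchanged because the sketch works on a copy.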

diempy.diemtype.load_DiemType(pcklPath)[source]#

Load a dict of objects that make up a DiemType, then construct that DiemType from the loaded data. This helps ensure that, so long as variables are not renamed or removed, DiemType objects can be saved and loaded even if the class definition changes. Attributes can be added, and code to handle removed attributes could be added later if needed.

Parameters:

pcklPath (str) – Path to the pickle file.

Returns:

The loaded DiemType object.

Return type:

DiemType

diempy.diemtype.save_DiemType(diemTypeObj, pcklPath)[source]#

Save the dictionary of variables for a DiemType object to a pickle file. A suggested file name pattern is my_name.diemtype.dict.pkl.

Parameters:
  • diemTypeObj (DiemType) – The DiemType object to be saved.

  • pcklPath (str) – Path to save the pickle file.
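The idea behind load_DiemType and save_DiemType, pickling a dictionary of attributes rather than the object itself and rebuilding the object on load, can be sketched with the standard library alone. The class TinyRecord and the helpers save_record and load_record are hypothetical stand-ins; the real functions may differ in detail.

```python
import os
import pickle
import tempfile


class TinyRecord:
    """Hypothetical stand-in for a class such as DiemType."""

    def __init__(self, names, lengths):
        self.names = names
        self.lengths = lengths
        self.threshold = None  # set later, not a constructor argument


def save_record(obj, path):
    # Pickle only the attribute dictionary, not the object itself.
    with open(path, "wb") as handle:
        pickle.dump(vars(obj), handle)


def load_record(path):
    with open(path, "rb") as handle:
        state = pickle.load(handle)
    # Rebuild from constructor arguments, then restore remaining attributes.
    obj = TinyRecord(state["names"], state["lengths"])
    for key, value in state.items():
        setattr(obj, key, value)
    return obj


record = TinyRecord(["chr1", "chr2"], [1000, 2000])
record.threshold = -20.0
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "my_name.diemtype.dict.pkl")
    save_record(record, path)
    restored = load_record(path)
```

Because only plain attributes are stored, a later version of the class can add new attributes in its constructor and still load older files, which is the robustness property described for load_DiemType.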

class diempy.diemtype.DiemType(DMBC, indNames, chrPloidies, chrNames, posByChr, chrLengths, exclusionsByChr=None, indExclusions=None)[source]#

Class describing the raw data for state matrices, and functions for thresholding, kernel smoothing, etc.

Parameters:
  • DMBC (List[np.ndarray]) – State matrix by chromosome. For arrays, each row is an individual, each column a marker.

  • indNames (np.ndarray) – Names of individuals, same order as DMBC.

  • chrPloidies (List[np.ndarray]) – For each chromosome, the ploidy of each individual, same order as DMBC.

  • chrNames (np.ndarray) – Names of chromosomes as ordered in DMBC.

  • posByChr (List[np.ndarray]) – Positions of markers by chromosome.

  • chrLengths (List[int]) – Lengths of chromosomes.

  • exclusionsByChr (List[np.ndarray], optional) – List of arrays of positions indicating which sites to exclude for each chromosome when polarizing. If None, includes all sites. If some sites are excluded, but a given chromosome has no exclusions, that list entry should be None.

  • indExclusions (np.ndarray, optional) – Array of names of individuals to exclude when polarizing. If None, includes all individuals.

Variables:
  • DMBC – List[np.ndarray]. State matrix by chromosome.

  • indNames – np.ndarray. Names of individuals.

  • chrPloidies – List[np.ndarray]. For each chromosome, the ploidy of each individual.

  • chrNames – np.ndarray. Names of chromosomes.

  • posByChr – List[np.ndarray]. Positions of markers by chromosome.

  • chrLengths – List[int]. Lengths of chromosomes.

  • MapBC – List[np.ndarray]. Genetic map positions by chromosome, computed on initialization.

  • HIs – np.ndarray. Hybrid indices of individuals, to be computed.

  • PolByChr – List[np.ndarray]. Polarity by chromosome, to be computed.

  • initialPolByChr – List[np.ndarray]. Initial polarity by chromosome for EM start (random or test), to be computed.

  • DIByChr – List[np.ndarray]. Diagnostic index by chromosome, to be computed.

  • SupportByChr – List[np.ndarray]. Support values by chromosome, to be computed.

  • threshold – float. Threshold value to be set.

  • smoothScale – float. Scale for kernel smoothing, to be set.

  • contigMatrix – np.ndarray dtype=object. Matrix of Contig objects, to be created.

  • siteExclusionsByChr – List[np.ndarray]. List of arrays of positions indicating which sites to exclude for each chromosome when polarizing.

  • indExclusions – np.ndarray. Array of names of individuals to exclude when polarizing.

  • relativeRecRateDict – dict. Dictionary of relative recombination rates by chromosome name.
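To make the expected shapes of the constructor arguments concrete, here is a small sketch that builds toy inputs and checks them against the parameter descriptions above. The helper check_inputs, the toy values and the lowercase variable names are hypothetical, and DiemType may enforce additional requirements that are not listed in this reference.

```python
import numpy as np

# Two individuals, two chromosomes with 3 and 2 markers respectively.
ind_names = np.array(["ind_A", "ind_B"])
chr_names = np.array(["chr1", "chrX"])
dmbc = [
    np.array([[1, 2, 3], [3, 3, 0]]),  # chr1: rows = individuals, columns = markers
    np.array([[1, 1], [2, 3]]),        # chrX
]
chr_ploidies = [np.array([2, 2]), np.array([1, 2])]
pos_by_chr = [np.array([100, 250, 900]), np.array([40, 75])]
chr_lengths = [1000, 500]


def check_inputs(dmbc, ind_names, chr_ploidies, chr_names, pos_by_chr, chr_lengths):
    """Hypothetical helper: True if the shapes agree with the parameter descriptions."""
    n_chr = len(chr_names)
    if not (len(dmbc) == len(chr_ploidies) == len(pos_by_chr) == len(chr_lengths) == n_chr):
        return False
    for matrix, ploidy, pos in zip(dmbc, chr_ploidies, pos_by_chr):
        # One row per individual, one column per marker position.
        if matrix.shape != (len(ind_names), len(pos)):
            return False
        if len(ploidy) != len(ind_names):
            return False
    return True


inputs_ok = check_inputs(dmbc, ind_names, chr_ploidies, chr_names, pos_by_chr, chr_lengths)

# With diempy installed, these arrays would be passed in the documented order:
# d = diempy.diemtype.DiemType(dmbc, ind_names, chr_ploidies, chr_names, pos_by_chr, chr_lengths)
```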

apply_threshold(threshold, sort_by_HI=False)[source]#

Apply a threshold to the diagnostic index (DI), removing sites whose DI is below the threshold. The instance itself is not modified; a modified copy is returned.

Parameters:

threshold (float) – Threshold value for diagnostic index. Sites with DI below this value will be removed.

Returns:

A new DiemType instance with sites below the threshold removed.

Return type:

DiemType
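The filtering step can be pictured as a boolean mask over marker columns. The sketch below shows this for a single chromosome; the helper keep_sites_at_or_above and the DI values are hypothetical, and apply_threshold itself presumably also updates the other per-chromosome attributes.

```python
import numpy as np


def keep_sites_at_or_above(matrix, positions, di, threshold):
    """Hypothetical helper: drop marker columns whose DI is below threshold."""
    di = np.asarray(di)
    keep = di >= threshold
    # The same mask is applied to every per-marker array so they stay aligned.
    return matrix[:, keep], positions[keep], di[keep]


matrix = np.array([[1, 2, 3, 1],
                   [3, 3, 0, 1]])
positions = np.array([100, 250, 900, 1200])
di = np.array([-5.0, -40.0, -12.5, -60.0])

kept_matrix, kept_positions, kept_di = keep_sites_at_or_above(
    matrix, positions, di, threshold=-20.0
)
```

Only the first and third markers survive a threshold of -20.0 in this toy data, and the original arrays are untouched.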

copy()[source]#

Create a deep copy of the current instance.

create_contig_matrix(includeSingle=True)[source]#

Create a matrix of Contig objects from the current DiemType instance and store it in self.contigMatrix.

Parameters:

includeSingle (bool, optional) – If True, includes contigs with a single marker. Default is True.

get_intervals_of_state(state, individualSubset=None, chromosomeSubset=None)[source]#

Get intervals of a specified state for given individuals and chromosomes.

Parameters:
  • state (int) – The state to find intervals for (0, 1, 2, or 3).

  • individualSubset (List[int], optional) – List of individual indices to include. If None, includes all individuals.

  • chromosomeSubset (List[int], optional) – List of chromosome indices to include. If None, includes all chromosomes.

Returns:

A list of Interval objects for the specified state across the specified individuals and chromosomes.

Return type:

List[Interval]

intervals_to_bed(outputDir)[source]#

Export intervals of each state to BED files for each chromosome and state.

Parameters:

outputDir (str) – Directory to save the BED files.

polarize(ncores=None, boolTestData=False, maxItt=500, epsilon=0.99999, sort_by_HI=False)[source]#

Polarize the state matrices by initializing test polarities and running the EM algorithm. Does not change self, but rather returns a polarized copy. Note that it will use the individual and site exclusions defined in self.

Parameters:
  • ncores (int) – Number of cores to use for parallel processing. Default is None, which uses all available cores.

  • boolTestData (bool) – If True, initializes polarity using the test data method. If False, initializes polarity randomly.

  • maxItt (int, optional) – Maximum number of iterations for the EM algorithm. Default is 500.

  • epsilon (float, optional) – Convergence threshold for the EM algorithm. Default is 0.99999.

Returns:

A new DiemType instance with polarized data.

Return type:

DiemType

smooth(scale, reSort=False, reSmooth=False, parallel=True)[source]#

Smooth the state matrices using a Laplace kernel and return a copy. By default, individuals are NOT re-sorted by hybrid index, which allows direct comparison with the pre-smoothed data. The result can be re-sorted later by calling sort() on it.

Parameters:
  • scale (float) – Scale parameter for the Laplace kernel smoothing.

  • reSort (bool, optional) – If True, resorts individuals by hybrid index after smoothing. Default is False.

  • reSmooth (bool, optional) – If True, allows re-smoothing even if smoothing has already been done. Default is False.

  • parallel (bool, optional) – If True, uses parallel processing for smoothing. Default is True.

Returns:

A new DiemType instance with smoothed state matrices.

Return type:

DiemType
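For readers unfamiliar with the term, a Laplace kernel weights a neighbour at distance d by exp(-|d| / scale). The sketch below applies such weights as a plain weighted average over a numeric signal. It is a generic illustration only: how smooth() combines the weights with discrete states, and the units in which scale is measured, are not specified in this reference. The function name laplace_smooth is hypothetical.

```python
import math


def laplace_smooth(values, positions, scale):
    """Generic Laplace-kernel weighted average; not the diempy algorithm."""
    smoothed = []
    for x in positions:
        # Weight of every site relative to the site at position x.
        weights = [math.exp(-abs(x - y) / scale) for y in positions]
        total = sum(weights)
        smoothed.append(sum(w * v for w, v in zip(weights, values)) / total)
    return smoothed


flat = laplace_smooth([2.0, 2.0, 2.0], [0.0, 1.0, 5.0], scale=1.0)
spike = laplace_smooth([0.0, 1.0, 0.0], [0.0, 1.0, 2.0], scale=1.0)
```

A constant signal is left unchanged, while an isolated spike is spread onto its neighbours, more strongly as scale increases.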

sort(newHIs=None)[source]#

Sort DMBC and individuals (and their ploidies) by hybrid index.
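Since polarize, apply_threshold and smooth each return a new instance, they can be chained without modifying the original object. The sketch below shows one possible order of calls using the documented signatures. The order, the argument values and the function run_pipeline are assumptions, and _FakeDiemType is only a stand-in that records calls so the example runs without diempy.

```python
def run_pipeline(raw, threshold, scale):
    """Hypothetical workflow; the order and argument values are assumptions."""
    polarized = raw.polarize(ncores=1, maxItt=500)
    filtered = polarized.apply_threshold(threshold)
    smoothed = filtered.smooth(scale)
    # create_contig_matrix stores its result on the instance it is called on.
    smoothed.create_contig_matrix(includeSingle=True)
    return smoothed


class _FakeDiemType:
    """Minimal stand-in so the sketch runs without diempy; records calls only."""

    def __init__(self, history=()):
        self.history = tuple(history)
        self.contigMatrix = None

    def _next(self, step):
        return _FakeDiemType(self.history + (step,))

    def polarize(self, ncores=None, boolTestData=False, maxItt=500,
                 epsilon=0.99999, sort_by_HI=False):
        return self._next("polarize")

    def apply_threshold(self, threshold, sort_by_HI=False):
        return self._next("apply_threshold")

    def smooth(self, scale, reSort=False, reSmooth=False, parallel=True):
        return self._next("smooth")

    def create_contig_matrix(self, includeSingle=True):
        self.contigMatrix = "created"


raw = _FakeDiemType()
final = run_pipeline(raw, threshold=-20.0, scale=1e-5)
```

With a real DiemType in place of the stand-in, raw would remain available for comparison with the polarized, filtered and smoothed results.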

contigs and intervals#

class diempy.contigs.Contig(chrName=None, indName=None, intervalList=None)[source]#

Represents a contiguous sequence of genomic intervals for a specific individual and chromosome.

Parameters:
  • chrName (str) – Chromosome name.

  • indName (str) – Individual name.

  • intervalList (list) – List of Interval objects.

Variables:
  • chr (str) – Chromosome name.

  • ind (str) – Individual name.

  • num_intervals (int) – Number of intervals.

  • intervals (list) – List of Interval objects.

class diempy.contigs.Interval(chrName, indName, idxl, idxr, l, r, state)[source]#

Represents a genomic interval for a specific individual and chromosome.

Parameters:
  • chrName (str) – Chromosome name.

  • indName (str) – Individual name.

  • idxl (int) – Left index (inclusive).

  • idxr (int) – Right index (inclusive), so the corresponding slice of the state matrix is [idxl:idxr+1].

  • l (float) – Left position (physical).

  • r (float) – Right position (physical).

  • state (int) – State of the interval.

Variables:
  • chrName (str) – Chromosome name.

  • indName (str) – Individual name.

  • idxl (int) – Left index (inclusive).

  • idxr (int) – Right index (inclusive), so the corresponding slice of the state matrix is [idxl:idxr+1].

  • l (float) – Left position (physical).

  • r (float) – Right position (physical).

  • state (int) – State of the interval.
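The inclusive index convention is easiest to see on a small example. The sketch below splits one individual's row of states into maximal runs and records inclusive idxl and idxr for each. The Run tuple and runs_of_states are hypothetical stand-ins for Interval and for whatever diempy does internally, and taking l and r to be the positions of the first and last marker in the run is an assumption.

```python
from collections import namedtuple

# Hypothetical lightweight stand-in for diempy.contigs.Interval.
Run = namedtuple("Run", "idxl idxr l r state")


def runs_of_states(states, positions):
    """Split one individual's row of states into maximal runs of equal state."""
    runs = []
    start = 0
    for i in range(1, len(states) + 1):
        # Close the current run at the end of the row or when the state changes.
        if i == len(states) or states[i] != states[start]:
            runs.append(Run(start, i - 1, positions[start], positions[i - 1], states[start]))
            start = i
    return runs


states = [1, 1, 2, 2, 2, 3]
positions = [100.0, 180.0, 400.0, 650.0, 700.0, 950.0]
runs = runs_of_states(states, positions)
het = runs[1]
het_states = states[het.idxl:het.idxr + 1]
```

Because idxr is inclusive, the slice needs the + 1 to recover all three heterozygous markers of the middle run.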

diempy.contigs.export_contigs_to_ind_bed_files(diemType, outputDir)[source]#

Exports contig intervals to BED files for each individual.

Parameters:
  • diemType (DiemType) – DiemType object containing contig data.

  • outputDir (str) – Directory where BED files will be saved.
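BED files use tab-separated columns with a 0-based start and an exclusive end. The sketch below writes intervals in that layout. It assumes the interval positions are 1-based and inclusive, and the name column (state_<n>) is made up for illustration; the columns and file names actually produced by export_contigs_to_ind_bed_files are not described in this reference.

```python
import io


def write_bed(intervals, handle):
    """Write (chrom, left, right, state) tuples as BED lines.

    Assumption: left and right are 1-based inclusive positions, so the BED
    start is left - 1 and the BED end is right.
    """
    for chrom, left, right, state in intervals:
        handle.write(f"{chrom}\t{int(left) - 1}\t{int(right)}\tstate_{state}\n")


intervals = [("chr1", 100, 180, 1), ("chr1", 400, 700, 2)]
buffer = io.StringIO()
write_bed(intervals, buffer)
bed_text = buffer.getvalue()
```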

plots submodule#

class diempy.plots.GenomeMultiSummaryPlot(dPol, chrom_indices, max_cols=3, *, prefill_cache=False, prefill_step=None, cache_tol=None, progress=None)[source]#

Plots genome summaries per chromosome with DI filtering and interactive widgets. These summaries include HI, HOM1, HET, HOM2, and U proportions per individual. Cursor hover displays individual IDs. Reorder button sorts individuals by global HI given current DI filter.

Drop-in extension:
  • optional incremental cache prefill using StatewiseDIIncrementalCache

  • DI slider uses cached results (nearest match within tol)

  • keeps existing hover behaviour and plot style

Parameters:
  • dPol – DiemType object containing genomic data.

  • chrom_indices – List of chromosome indices to plot.

  • max_cols – max subplot columns.

  • prefill_cache – precompute incremental cache over DI grid.

  • prefill_step – DI step for grid (defaults to span/200).

  • cache_tol – nearest-cache tolerance (defaults to prefill_step/2).

  • progress – “text” | “none”
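The defaults for prefill_step (span/200) and cache_tol (prefill_step/2) suggest a lookup of the following kind: precompute results on an evenly spaced DI grid, then serve a slider value from the nearest grid point if it lies within the tolerance. The sketch below shows only that lookup logic. The function names are hypothetical and the real cache (StatewiseDIIncrementalCache) is not reproduced here.

```python
import bisect


def build_di_grid(di_min, di_max, prefill_step=None):
    """Evenly spaced DI values; the step defaults to span / 200."""
    span = di_max - di_min
    step = span / 200 if prefill_step is None else prefill_step
    count = int(round(span / step))
    return [di_min + i * step for i in range(count + 1)], step


def nearest_cached(grid, value, cache_tol):
    """Return the grid value closest to `value`, or None if none is within cache_tol."""
    i = bisect.bisect_left(grid, value)
    # Only the neighbours on either side of the insertion point can be nearest.
    candidates = grid[max(i - 1, 0):i + 1]
    best = min(candidates, key=lambda g: abs(g - value))
    return best if abs(best - value) <= cache_tol else None


grid, step = build_di_grid(-100.0, 0.0)
cache_tol = step / 2
hit = nearest_cached(grid, -37.6, cache_tol)
miss = nearest_cached(grid, 5.0, cache_tol)
```

With a tolerance of half the step, every slider value inside the grid range maps to some cached point, while values outside the range fall through to a fresh computation.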

class diempy.plots.GenomeSummaryPlot(dPol, *, prefill_cache=False, prefill_step=None, cache_tol=None, progress=None)[source]#

Plots genome summaries with DI filtering and interactive widgets.

These summaries include HI, HOM1, HET, HOM2, and U proportions per individual. Cursor hover displays individual IDs. Reorder button sorts individuals by HI given current DI filter.

Parameters:

dPol – DiemType object containing genomic data.

Drop-in extension:
  • optional cache prefill over a DI grid (text progress)

  • DI slider uses cached results (nearest match within tol)

  • PREFILL uses StatewiseDIIncrementalCache (incremental, sorted-DI sweep)
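The per-individual summaries named above can be illustrated with a small sketch. It assumes the state coding 0 = U, 1 = HOM1, 2 = HET, 3 = HOM2, a diploid individual, a hybrid index computed over called sites only, and proportions taken over all sites; none of these details is confirmed by this reference, and summarize_individual is a hypothetical helper.

```python
from collections import Counter


def summarize_individual(states):
    """Return HI and state proportions for one diploid individual's states.

    Assumed coding: 0 = U (unknown), 1 = HOM1, 2 = HET, 3 = HOM2.
    """
    if not states:
        raise ValueError("states must not be empty")
    counts = Counter(states)
    hom1, het, hom2, unknown = counts[1], counts[2], counts[3], counts[0]
    called = hom1 + het + hom2
    total = called + unknown
    # Hybrid index as the share of side-2 alleles among called sites.
    hi = (het + 2 * hom2) / (2 * called) if called else float("nan")
    return {
        "HI": hi,
        "HOM1": hom1 / total,
        "HET": het / total,
        "HOM2": hom2 / total,
        "U": unknown / total,
    }


summary = summarize_individual([1, 1, 2, 3, 3, 3, 0, 2])
```

Applying a DI filter before such a summary changes which sites are counted, which is why the plot recomputes these values as the slider moves.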

class diempy.plots.GenomicContributionsPlot(dPol, chrom_indices=None, *, prefill_cache=False, prefill_step=None, cache_tol=None, progress=None)[source]#

Plots per-chromosome genomic contributions (HOM1, HET, HOM2, U, excluded) with DI filtering and interactive widgets.

Drop-in extension:
  • optional incremental cache prefill using StatewiseDIIncrementalCache

  • DI slider uses cached statewise snapshots (nearest within tol)

  • output unchanged

Uses statewise_genomes_summary_given_DI (or cached equivalent).

Parameters:

dPol – DiemType object containing genomic data.

class diempy.plots.GenomicDeFinettiPlot(dPol, *, prefill_cache=False, prefill_step=None, cache_tol=None, progress=None)[source]#

Plots a genomic de Finetti plot with DI filtering and interactive widgets. Cursor hover displays individual IDs.

c.f. Figure 2, Figure 4: Petružela, J., Nürnberger, B., Ribas, A., Koutsovoulos, G., Čížková, D., Fornůsková, A., Aghová, T., Blaxter, M., de Bellocq, J.G. and Baird, S.J.E. (2025), Comparative Genomic Analysis of Co-Occurring Hybrid Zones of House Mouse Parasites Pneumocystis murina and Syphacia obvelata Using Genome Polarisation. Mol Ecol, 34: e70044. https://doi.org/10.1111/mec.70044

Figure 4: Ebdon, S., Laetsch, D. R., Vila, R., Baird, S. J. E., & Lohse, K. (2025). Genomic regions of current low hybridisation mark long-term barriers to gene flow in scarce swallowtail butterflies. PLoS Genetics, 21(4), 30. doi:https://doi.org/10.1371/journal.pgen.1011655

Drop-in extension:
  • optional incremental cache prefill using StatewiseDIIncrementalCache

  • DI slider uses cached results (nearest match within tol)

  • keeps output + hover behaviour the same

Uses:
  • summaries_from_statewise_counts(statewise counts)

  • StatewiseDIIncrementalCache (fast prefill)

Parameters:

dPol – DiemType object containing genomic data.

class diempy.plots.GenomicMultiDeFinettiPlot(dPol, chrom_indices, max_cols=3, *, prefill_cache=False, prefill_step=None, cache_tol=None, progress=None)[source]#

Multiple de Finetti plots, one per chromosome, all controlled by a shared DI slider and size slider.

c.f. Figure 2, Figure 4: Petružela, J., Nürnberger, B., Ribas, A., Koutsovoulos, G., Čížková, D., Fornůsková, A., Aghová, T., Blaxter, M., de Bellocq, J.G. and Baird, S.J.E. (2025), Comparative Genomic Analysis of Co-Occurring Hybrid Zones of House Mouse Parasites Pneumocystis murina and Syphacia obvelata Using Genome Polarisation. Mol Ecol, 34: e70044. https://doi.org/10.1111/mec.70044

Figure 4: Ebdon, S., Laetsch, D. R., Vila, R., Baird, S. J. E., & Lohse, K. (2025). Genomic regions of current low hybridisation mark long-term barriers to gene flow in scarce swallowtail butterflies. PLoS Genetics, 21(4), 30. doi:https://doi.org/10.1371/journal.pgen.1011655

Drop-in extension:
  • optional incremental cache prefill using StatewiseDIIncrementalCache

  • DI slider uses cached results (nearest match within tol)

  • output + hover semantics unchanged

Uses statewise_genomes_summary_given_DI + summaries_from_statewise_counts.

Parameters:
  • dPol – DiemType object containing genomic data.

  • chrom_indices – List of chromosome indices to plot.

diempy.plots.diemIrisFromPlotPrep(prepped, chrom_indices=None)[source]#

Uses per-chromosome diemDITgenomes_ordered as the canonical form.

  • If chrom_indices is None: flatten to whole-genome rings for WheelDiagram.

  • If chrom_indices is provided: pass the per-chromosome structure through unchanged (_restrict_chromosomes will select and pack).

diempy.plots.diemLongFromPlotPrep(prepped, chrom_indices=None)[source]#

Uses per-chromosome diemDITgenomes_ordered as the canonical form.

  • If chrom_indices is None: flatten to whole-genome rings for WheelDiagram.

  • If chrom_indices is provided: pass the per-chromosome structure through unchanged (_restrict_chromosomes will select and pack).

class diempy.plots.diemMultiPairsPlot(dPol, chrom_indices, DIthreshold=-inf, max_cols=3, figsize=(12, 8), prefill_cache=True, prefill_step=None, cache_tol=None, progress=None, cache_statewise_for_HI=True, row_hspace=0.6, col_wspace=0.35)[source]#

Multi-chromosome version of diemPairsPlot.

One brick heatmap per chromosome, ordered by global Hybrid Index, arranged in a grid. The top-right grid cell contains the shared colour key.

Widgets:
  • DI slider (updates matrices, keeps current order)

  • Reorder by HI button (recomputes order at current DI)

  • Label font size slider (all subplots)

Optional:
  • prefill caching (pairwise per chromosome + optional HI/statewise)

class diempy.plots.diemPairsPlot(dPol, DIthreshold=-inf, figsize=(9, 6), chrom_indices=None, prefill_cache=True, prefill_step=None, cache_tol=None, progress=None, cache_statewise_for_HI=True)[source]#

Pairwise distance plot using BRICK rectangles (no imshow), now with:
  • DI slider

  • Reorder by HI button

  • optional incremental cache prefill (pairwise + optional HI/statewise)

This is the end of the API documentation