coresg_graphhdbscan.core.CoreSGHDBSCAN

class coresg_graphhdbscan.core.CoreSGHDBSCAN(min_samples_list, metric='euclidean', eps=1e-12, min_cluster_size=None, save_models=False)

Bases: object

CoreSG-based hierarchical density clustering backend.

This class implements the lower-level CoreSG-HDBSCAN pipeline. It operates either on feature vectors (fit) or on a precomputed distance matrix (fit_from_distance_matrix).

Workflow

1. Compute the full distance matrix once.
2. Compute self-inclusive core distances (each point counts as its own first neighbor) for all values in min_samples_list.
3. Build the CORE-SG graph as the union of:
   - the kmax nearest-neighbor graph, with ties kept
   - the MST of the complete mutual reachability distance (MRD) graph for kmax
4. Precompute a sparse neighbor table for fast edge distance lookup.
5. For each m in min_samples_list:
   - compute MRD edge weights
   - build the sparse weighted graph
   - compute the MST
   - build the single-linkage tree
   - condense the tree and extract clusters
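The core-distance step and the per-m MST step above can be sketched with NumPy as follows. This is a minimal illustration under stated assumptions, not the package's implementation: the helper names are hypothetical, the graph edges are passed in as an (E, 2) integer array, MRD is taken to be the standard HDBSCAN mutual reachability distance max(core_m(i), core_m(j), D[i, j]), and Kruskal's algorithm stands in for whatever MST routine the package uses.

```python
import numpy as np


def core_distances_all_m(D: np.ndarray, m_values: list[int]) -> dict[int, np.ndarray]:
    """Self-inclusive core distances for every m from a single row-wise sort.

    core_m[i] is the distance from point i to its m-th nearest neighbor,
    where i itself counts as the first neighbor (D[i, i] = 0).
    """
    D_sorted = np.sort(D, axis=1)  # column 0 holds the zero self-distance
    return {m: D_sorted[:, m - 1].copy() for m in m_values}


def kruskal_mst(n: int, edges: np.ndarray, weights: np.ndarray) -> list[tuple[int, int, float]]:
    """Kruskal's MST over an explicit edge list, using union-find."""
    parent = list(range(n))

    def find(a: int) -> int:
        while parent[a] != a:
            parent[a] = parent[parent[a]]  # path halving
            a = parent[a]
        return a

    mst: list[tuple[int, int, float]] = []
    for e in np.argsort(weights, kind="stable"):  # lightest edges first
        u, v = int(edges[e, 0]), int(edges[e, 1])
        ru, rv = find(u), find(v)
        if ru != rv:  # edge joins two components, so keep it
            parent[ru] = rv
            mst.append((u, v, float(weights[e])))
            if len(mst) == n - 1:
                break
    return mst


def msts_for_all_m(D: np.ndarray, edges: np.ndarray, m_values: list[int]) -> dict[int, list[tuple[int, int, float]]]:
    """Reweight the same edge set by MRD_m and compute one MST per m."""
    core = core_distances_all_m(D, m_values)
    u, v = edges[:, 0], edges[:, 1]
    result = {}
    for m in m_values:
        c = core[m]
        mrd = np.maximum(D[u, v], np.maximum(c[u], c[v]))  # MRD_m edge weights
        result[m] = kruskal_mst(D.shape[0], edges, mrd)
    return result


# Four points on a line; every pair is used as an edge here only to keep the example small.
x = np.array([0.0, 1.0, 2.0, 10.0])
D = np.abs(x[:, None] - x[None, :])
edges = np.array([(i, j) for i in range(4) for j in range(i + 1, 4)])
msts = msts_for_all_m(D, edges, [1, 3])
```

In the pipeline described above, edges would be the CORE-SG edge set rather than all pairs; the point of building CORE-SG once is that the same sparse edge set is reweighted and reused for every m.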

Parameters:

min_samples_list (list[int]): List of min_samples values to evaluate.
metric (str, default="euclidean"): Distance metric mode.
eps (float, default=1e-12): Numerical tolerance used in graph construction.
min_cluster_size (int or None, default=None): Minimum cluster size. If None, the package default behavior is used.
save_models (bool, default=False): If True, run() saves a model for each m in models_.

__init__(min_samples_list, metric='euclidean', eps=1e-12, min_cluster_size=None, save_models=False)

Parameters:

min_samples_list (List[int])
metric (str)
eps (float)
min_cluster_size (int | None)
save_models (bool)

Return type:

None

Methods

`__init__`(min_samples_list[, metric, eps, ...])
`fit`(X)
`fit_from_distance_matrix`(D)	Build CORE-SG from a precomputed distance matrix D (NxN).
`model`(min_samples)
`plot_condensed_tree`(m[, figsize])
`run`([cluster_selection_method, ...])	Run CoreSG-HDBSCAN clustering for all requested `min_samples` values.

Attributes

`A_knn_`
`D_`
`N_`
`X_`
`dst_no_self_`
`dst_with_self_`
`edges_ut_`
`eps`
`idx_no_self_`
`idx_with_self_`
`kmax_`
`metric`
`min_cluster_size`
`save_models`
`min_samples_list`
`core_`
`msts_`
`mst_times_`
`models_`
`condensed_trees_`
`labels_by_m_`
`times_`

min_samples_list: List[int]

metric: str = 'euclidean'

eps: float = 1e-12

min_cluster_size: int | None = None

save_models: bool = False

X_: numpy.ndarray | None = None

N_: int | None = None

D_: numpy.ndarray | None = None

core_: Dict[int, numpy.ndarray]

kmax_: int | None = None

edges_ut_: numpy.ndarray | None = None

idx_with_self_: numpy.ndarray | None = None

dst_with_self_: numpy.ndarray | None = None

idx_no_self_: numpy.ndarray | None = None

dst_no_self_: numpy.ndarray | None = None

A_knn_: scipy.sparse.csr_matrix | None = None

msts_: Dict[int, Tuple[numpy.ndarray, numpy.ndarray, numpy.ndarray]]

mst_times_: Dict[int, float]

models_: Dict[int, CoreSGModel]

condensed_trees_: Dict[int, object]

labels_by_m_: Dict[int, numpy.ndarray]

times_: Dict[int, float]

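fit(X)

Build CORE-SG from the feature matrix X, starting by computing the full distance matrix once (see Workflow). Call run() afterwards to extract clusters.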

Parameters: X (numpy.ndarray)
Return type: CoreSGHDBSCAN
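For metric='euclidean', the first Workflow step (computing the full distance matrix once) could look like the sketch below. It is an illustration only: euclidean_distance_matrix is a hypothetical helper, and the package may compute the matrix differently or support other metrics. It expands ||a - b||^2 = ||a||^2 + ||b||^2 - 2 a^T b so that the whole matrix costs a single matrix product.

```python
import numpy as np


def euclidean_distance_matrix(X: np.ndarray) -> np.ndarray:
    """Return the N x N Euclidean distance matrix of the rows of X."""
    X = np.asarray(X, dtype=np.float64)
    sq_norms = np.einsum("ij,ij->i", X, X)  # ||x_i||^2 for each row
    sq = sq_norms[:, None] + sq_norms[None, :] - 2.0 * (X @ X.T)
    np.maximum(sq, 0.0, out=sq)  # clip tiny negatives caused by round-off
    D = np.sqrt(sq)
    np.fill_diagonal(D, 0.0)  # force exact zeros on the diagonal
    return D


X = np.array([[0.0, 0.0], [3.0, 4.0], [6.0, 8.0]])
D = euclidean_distance_matrix(X)
```

The resulting matrix needs N x N memory, so this approach suits the moderate N for which a full distance matrix is practical.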

fit_from_distance_matrix(D)

Build CORE-SG from a precomputed N x N distance matrix D.

D[i, j] is the base dissimilarity between points i and j. Self-inclusive core distances and the kmax nearest-neighbor graph (kmax-NNG) are computed from D, and the CORE-SG edge set is kmax-NNG ∪ MST_kmax, where MST_kmax is the minimum spanning tree of the complete MRD_kmax (mutual reachability distance) graph.

After this, you can call self.run(…) exactly as after fit().

Parameters: D (numpy.ndarray)
Return type: CoreSGHDBSCAN
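A minimal NumPy sketch of the CORE-SG construction described above (kmax-NNG ∪ MST_kmax), for illustration only. Assumptions: D is symmetric with a zero diagonal; kmax is passed explicitly (presumably the largest value in min_samples_list); ties are kept by admitting every j with D[i, j] <= core_kmax(i) + eps, which is one plausible use of the eps parameter; and a dense O(N^2) Prim's algorithm computes MST_kmax. All helper names are hypothetical.

```python
import numpy as np


def mrd_matrix(D: np.ndarray, m: int) -> np.ndarray:
    """Mutual reachability distances for min_samples = m (self-inclusive core distances)."""
    core = np.sort(D, axis=1)[:, m - 1]  # column 0 is the zero self-distance
    return np.maximum(D, np.maximum(core[:, None], core[None, :]))


def knn_edges_with_ties(D: np.ndarray, k: int, eps: float = 1e-12) -> set[tuple[int, int]]:
    """Undirected edges of the self-inclusive k-NN graph, keeping ties (eps use is an assumption)."""
    core_k = np.sort(D, axis=1)[:, k - 1]
    edges: set[tuple[int, int]] = set()
    for i in range(D.shape[0]):
        for j in map(int, np.flatnonzero(D[i] <= core_k[i] + eps)):
            if j != i:
                edges.add((min(i, j), max(i, j)))
    return edges


def prim_mst_dense(W: np.ndarray) -> set[tuple[int, int]]:
    """MST edges of the graph with weight matrix W (np.inf means no edge), O(N^2) Prim."""
    n = W.shape[0]
    in_tree = np.zeros(n, dtype=bool)
    best = np.full(n, np.inf)  # cheapest known edge from each vertex to the tree
    parent = np.full(n, -1)
    best[0] = 0.0
    edges: set[tuple[int, int]] = set()
    for _ in range(n):
        u = int(np.argmin(np.where(in_tree, np.inf, best)))
        in_tree[u] = True
        if parent[u] >= 0:
            p = int(parent[u])
            edges.add((min(p, u), max(p, u)))
        closer = ~in_tree & (W[u] < best)  # vertices now reached more cheaply via u
        best[closer] = W[u][closer]
        parent[closer] = u
    return edges


def core_sg_edges(D: np.ndarray, kmax: int, eps: float = 1e-12) -> set[tuple[int, int]]:
    """CORE-SG edge set: kmax-NNG (with ties) union the MST of the complete MRD_kmax graph."""
    return knn_edges_with_ties(D, kmax, eps) | prim_mst_dense(mrd_matrix(D, kmax))


rng = np.random.default_rng(0)
P = rng.normal(size=(40, 2))
D = np.sqrt(((P[:, None, :] - P[None, :, :]) ** 2).sum(axis=-1))
E = core_sg_edges(D, kmax=5)
```

For every m <= kmax, an MST of the MRD_m graph restricted to these edges has the same total weight as one over all pairs: an edge outside the kmax-NNG has MRD_m equal to its base distance, and MST_kmax already joins its endpoints through edges that are no heavier under MRD_m. This is what lets the per-m MSTs run on the sparse graph.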

model(min_samples)

run(cluster_selection_method='eom', allow_single_cluster=False, match_reference_implementation=False, cluster_selection_epsilon=0.0)

Run CoreSG-HDBSCAN clustering for all requested min_samples values.

Stores

models_ (dict): Saved per-m models when save_models=True.
condensed_trees_ (dict): Condensed tree objects for all fitted m values.
labels_by_m_ (dict): Stored labels for all fitted m values.

Parameters:

cluster_selection_method (str)
allow_single_cluster (bool)
match_reference_implementation (bool)
cluster_selection_epsilon (float)

Return type:

CoreSGHDBSCAN
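For each m, the Workflow turns the MST into a single-linkage tree before condensing it. A minimal sketch of that conversion, assuming the MST is given as an array of (u, v, weight) rows (the three arrays stored in msts_ may be laid out differently) and that the tree uses the linkage-matrix layout of scipy.cluster.hierarchy.linkage. The helper name is hypothetical, and the condensing and cluster-extraction steps controlled by cluster_selection_method and related options are not shown.

```python
import numpy as np


def single_linkage_from_mst(n: int, mst: np.ndarray) -> np.ndarray:
    """Turn MST rows (u, v, weight) into a SciPy-style linkage matrix.

    Row r is [child_a, child_b, merge_distance, merged_size]; the cluster it
    creates gets label n + r, as in scipy.cluster.hierarchy.
    """
    order = np.argsort(mst[:, 2], kind="stable")  # merge in order of increasing weight
    parent = np.arange(2 * n - 1)  # union-find over points and merged clusters
    size = np.ones(2 * n - 1, dtype=np.int64)

    def find(a: int) -> int:
        while parent[a] != a:
            parent[a] = parent[parent[a]]  # path halving
            a = parent[a]
        return a

    Z = np.zeros((n - 1, 4))
    for r, e in enumerate(order):
        ru, rv = find(int(mst[e, 0])), find(int(mst[e, 1]))
        new = n + r  # label of the cluster created by this merge
        parent[ru] = parent[rv] = new
        size[new] = size[ru] + size[rv]
        Z[r] = [min(ru, rv), max(ru, rv), mst[e, 2], size[new]]
    return Z


# MST of four points on a line at 0, 1, 2 and 10.
mst = np.array([[0, 1, 1.0], [1, 2, 1.0], [2, 3, 8.0]])
Z = single_linkage_from_mst(4, mst)
```

Here the first two merges happen at distance 1.0 and the outlier at 10 joins last at 8.0; the condensing step would then prune small branches using min_cluster_size.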

plot_condensed_tree(m, figsize=(8, 5))

Parameters: m (int)
