coresg_graphhdbscan.core.CoreSGHDBSCAN

class coresg_graphhdbscan.core.CoreSGHDBSCAN(min_samples_list, metric='euclidean', eps=1e-12, min_cluster_size=None, save_models=False)

Bases: object

CoreSG-based hierarchical density clustering backend.

This class implements the lower-level CoreSG-HDBSCAN pipeline. It operates either on feature vectors (fit) or on a precomputed distance matrix (fit_from_distance_matrix).

Workflow

  1. Compute the full distance matrix once.

  2. Compute self-inclusive core distances for all values in min_samples_list.

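  3. Build the CORE-SG graph, where kmax is the largest value in min_samples_list, as the union of:
       - the kmax nearest-neighbor graph, with ties included
       - the MST of the complete mutual reachability distance (MRD) graph for kmax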

  4. Precompute a sparse neighbor table for fast edge distance lookup.

  5. For each m:
       - compute MRD edge weights
       - build the sparse weighted graph
       - compute the MST
       - build the single-linkage tree
       - condense the tree and extract clusters
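
The workflow above can be sketched end to end in plain NumPy for small N. This is a minimal illustration under stated assumptions, not the package's code: distances are Euclidean; kmax is max(min_samples_list); the kmax nearest-neighbor graph "with ties" is read as every pair whose distance is within either endpoint's self-inclusive kmax core distance; and Prim's and Kruskal's algorithms stand in for whatever MST routines the package actually uses. Step 4 and the tree-building parts of step 5 are omitted, core distances are recomputed per m for brevity, and all function names are hypothetical.

```python
import numpy as np


def core_distances(D, m):
    # Self-inclusive: the point itself (distance 0) counts as its 1st neighbor.
    return np.sort(D, axis=1)[:, m - 1]


def mrd(D, core):
    # Mutual reachability distance: max(core[i], core[j], D[i, j]).
    return np.maximum(D, np.maximum(core[:, None], core[None, :]))


def prim_mst(W):
    # Dense O(N^2) Prim's algorithm on a complete weighted graph.
    n = W.shape[0]
    in_tree = np.zeros(n, dtype=bool)
    in_tree[0] = True
    best, parent = W[0].copy(), np.zeros(n, dtype=int)
    edges = []
    for _ in range(n - 1):
        j = int(np.argmin(np.where(in_tree, np.inf, best)))
        edges.append((int(parent[j]), j, float(best[j])))
        in_tree[j] = True
        closer = W[j] < best
        best, parent = np.where(closer, W[j], best), np.where(closer, j, parent)
    return edges


def kruskal_mst(n, edges, weights):
    # Kruskal's algorithm with a path-halving union-find on a sparse edge list.
    root = list(range(n))

    def find(a):
        while root[a] != a:
            root[a] = root[root[a]]
            a = root[a]
        return a

    mst = []
    for e in np.argsort(weights, kind="stable"):
        i, j = int(edges[e, 0]), int(edges[e, 1])
        ri, rj = find(i), find(j)
        if ri != rj:
            root[ri] = rj
            mst.append((i, j, float(weights[e])))
    return mst


def build_core_sg(D, kmax):
    core_k = core_distances(D, kmax)
    # Assumption: "with ties" = keep (i, j) if D[i, j] is within either core radius.
    keep = np.triu((D <= core_k[:, None]) | (D <= core_k[None, :]), k=1)
    # Add the MST of the complete MRD graph for kmax.
    for i, j, _ in prim_mst(mrd(D, core_k)):
        keep[min(i, j), max(i, j)] = True
    return np.argwhere(keep)  # (i, j) pairs with i < j


def core_sg_msts(X, min_samples_list):
    D = np.sqrt(((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=-1))  # step 1
    edges = build_core_sg(D, max(min_samples_list))                     # step 3
    msts = {}
    for m in min_samples_list:                                          # step 5
        core_m = core_distances(D, m)
        i, j = edges[:, 0], edges[:, 1]
        w = np.maximum(D[i, j], np.maximum(core_m[i], core_m[j]))
        msts[m] = kruskal_mst(len(X), edges, w)
    return D, edges, msts


rng = np.random.default_rng(0)
X = rng.normal(size=(60, 2))
D, edges, msts = core_sg_msts(X, [2, 4, 8])
print(len(edges), "CORE-SG edges vs", len(X) * (len(X) - 1) // 2, "in the complete graph")
```

For every m ≤ kmax, an MRD edge outside the kmax-NN graph equals the raw distance, and MST_kmax already connects its endpoints through edges that are no heavier. That is why, in this sketch, the per-m MST over the sparse CORE-SG edge set has the same total weight as the MST of the complete MRD graph for that m.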

Parameters:
  • min_samples_list (list[int]) – List of min_samples values to evaluate.

  • metric (str, default="euclidean") – Distance metric mode.

  • eps (float, default=1e-12) – Numerical tolerance used in graph construction.

  • min_cluster_size (int or None, default=None) – Minimum cluster size. If None, the package's default behavior is used.

  • save_models (bool, default=False) – If True, run() keeps a per-m model in models_.
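
Because every m in min_samples_list shares the same distance matrix, the self-inclusive core distances of workflow step 2 can be read off a single partial sort per row. A minimal sketch, assuming "self-inclusive" means each point counts as its own first neighbor at distance 0; the function name and the dict-by-m layout (chosen to mirror the core_ attribute) are illustrative assumptions.

```python
import numpy as np


def core_distances_all(D, min_samples_list):
    """Hypothetical helper: self-inclusive core distances for every m.

    Assumption: column 0 of each sorted row is the point's distance to itself,
    so the m-th smallest entry (index m - 1) is the core distance for m.
    """
    kmax = max(min_samples_list)
    if kmax > D.shape[0]:
        raise ValueError("max(min_samples_list) cannot exceed the number of points")
    # Only the kmax smallest entries per row are needed: partition, then sort them.
    part = np.partition(D, kmax - 1, axis=1)[:, :kmax]
    part.sort(axis=1)
    return kmax, {m: part[:, m - 1].copy() for m in min_samples_list}


rng = np.random.default_rng(0)
X = rng.normal(size=(50, 2))
D = np.sqrt(((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=-1))
kmax, core = core_distances_all(D, [1, 5, 10])
```

With this convention m = 1 always yields zero core distances, and core distances never decrease as m grows.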

__init__(min_samples_list, metric='euclidean', eps=1e-12, min_cluster_size=None, save_models=False)

Return type:

None

Methods

__init__(min_samples_list[, metric, eps, ...])

fit(X)

fit_from_distance_matrix(D) – Build CORE-SG from a precomputed distance matrix D (NxN).

model(min_samples)

plot_condensed_tree(m[, figsize])

run([cluster_selection_method, ...]) – Run Core-SG clustering for all requested min_samples values.

Attributes

min_samples_list: List[int]
metric: str = 'euclidean'
eps: float = 1e-12
min_cluster_size: int | None = None
save_models: bool = False
X_: numpy.ndarray | None = None
N_: int | None = None
D_: numpy.ndarray | None = None
core_: Dict[int, numpy.ndarray]
kmax_: int | None = None
edges_ut_: numpy.ndarray | None = None
idx_with_self_: numpy.ndarray | None = None
dst_with_self_: numpy.ndarray | None = None
idx_no_self_: numpy.ndarray | None = None
dst_no_self_: numpy.ndarray | None = None
A_knn_: scipy.sparse.csr_matrix | None = None
msts_: Dict[int, Tuple[numpy.ndarray, numpy.ndarray, numpy.ndarray]]
mst_times_: Dict[int, float]
models_: Dict[int, CoreSGModel]
condensed_trees_: Dict[int, object]
labels_by_m_: Dict[int, numpy.ndarray]
times_: Dict[int, float]
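
Workflow step 4 precomputes a sparse neighbor table for fast edge-distance lookup, and the attribute name edges_ut_ suggests that CORE-SG edges are kept in upper-triangular form. The sketch below is one possible layout, an assumption rather than the package's documented one: encode each pair (i, j) with i < j as the integer key i * N + j, sort the keys once, and resolve whole batches of queries with np.searchsorted. Function names are hypothetical.

```python
import numpy as np


def build_edge_table(edges_ut, D):
    """Hypothetical: sorted linear keys (i * N + j, with i < j) plus their distances."""
    n = D.shape[0]
    i, j = edges_ut[:, 0], edges_ut[:, 1]
    keys = i.astype(np.int64) * n + j
    order = np.argsort(keys)
    return n, keys[order], D[i, j][order]


def lookup(table, a, b):
    """Vectorised distance lookup for edges (a[k], b[k]); endpoint order is ignored."""
    n, keys, dists = table
    lo, hi = np.minimum(a, b), np.maximum(a, b)
    q = lo.astype(np.int64) * n + hi
    # searchsorted may return len(keys) for keys past the end, so clip before indexing.
    pos = np.clip(np.searchsorted(keys, q), 0, len(keys) - 1)
    if not np.all(keys[pos] == q):
        raise KeyError("queried pair is not an edge in the table")
    return dists[pos]


rng = np.random.default_rng(1)
X = rng.normal(size=(30, 3))
D = np.sqrt(((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=-1))
edges = np.argwhere(np.triu(D < np.median(D), k=1))  # stand-in sparse edge set
table = build_edge_table(edges, D)
got = lookup(table, edges[:, 1], edges[:, 0])  # endpoints deliberately reversed
```

Compared with a Python dict keyed by pairs, two contiguous arrays keep memory predictable and let a whole edge batch be resolved in one vectorised call.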

fit(X)

Build CORE-SG from the feature vectors X, starting from the full distance matrix.
Parameters:

X (numpy.ndarray)

Return type:

CoreSGHDBSCAN

fit_from_distance_matrix(D)

Build CORE-SG from a precomputed distance matrix D (NxN).

  • D[i,j] is the base dissimilarity between points i and j.

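  • We compute self-inclusive core distances and the kmax nearest-neighbor graph (kmax-NNG) from D.

  • We build the CORE-SG edge set as kmax-NNG ∪ MST_kmax, where MST_kmax is the minimum spanning tree of the MRD graph for kmax.

After this call, run(…) can be used exactly as after fit(X).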

Parameters:

D (numpy.ndarray)

Return type:

CoreSGHDBSCAN
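
fit_from_distance_matrix derives a kmax-NNG "with ties" from D. One reading of that phrase, assumed here, is to keep every pair whose dissimilarity does not exceed either endpoint's self-inclusive core distance, rather than truncating each row to a fixed neighbor count. The sketch contrasts that rule with a naive argsort cutoff on four points at the corners of a unit square, where every point has two neighbors at distance 1. Both function names are hypothetical.

```python
import numpy as np


def knn_graph_with_ties(D, k):
    """Assumed tie-inclusive k-NN graph from a precomputed (N x N) matrix D.

    core[i] is the self-inclusive core distance (i counts as its own 1st
    neighbor). Pair (i, j) is kept if D[i, j] is within the core radius of i or j.
    """
    D = np.asarray(D, dtype=float)
    if D.ndim != 2 or D.shape[0] != D.shape[1]:
        raise ValueError("D must be a square (N x N) matrix")
    core = np.sort(D, axis=1)[:, k - 1]
    A = (D <= core[:, None]) | (D <= core[None, :])
    np.fill_diagonal(A, False)
    return A


def knn_graph_argsort(D, k):
    """Naive alternative: exactly k - 1 other neighbors per row, ties broken by index."""
    n = D.shape[0]
    A = np.zeros((n, n), dtype=bool)
    order = np.argsort(D, axis=1, kind="stable")
    for i in range(n):
        nbrs = [j for j in order[i] if j != i][: k - 1]
        A[i, nbrs] = True
    return A | A.T


# Four points on the corners of a unit square.
X = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
D = np.sqrt(((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=-1))
A_ties = knn_graph_with_ties(D, k=2)
A_naive = knn_graph_argsort(D, k=2)
```

Here the argsort version drops one of the four tied unit edges, and which one it drops depends on how the points are numbered; the tie-inclusive rule keeps all four, so the resulting graph does not depend on index order.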

model(min_samples)

run(cluster_selection_method='eom', allow_single_cluster=False, match_reference_implementation=False, cluster_selection_epsilon=0.0)

Run Core-SG clustering for all requested min_samples values.

Stores
  • models_ (dict) – Saved per-m models when save_models=True.

  • condensed_trees_ (dict) – Condensed tree objects for all fitted m values.

  • labels_by_m_ (dict) – Stored labels for all fitted m values.

Parameters:
  • cluster_selection_method (str)

  • allow_single_cluster (bool)

  • match_reference_implementation (bool)

  • cluster_selection_epsilon (float)

Return type:

CoreSGHDBSCAN
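
In workflow step 5, each per-m MST becomes a single-linkage tree before it is condensed. The usual way to do this, assumed here rather than confirmed for this package, is to process MST edges in increasing weight order with a union-find and record each merge. The sketch emits rows in the SciPy linkage-matrix convention, [cluster_a, cluster_b, distance, size], with new clusters numbered n, n + 1, and so on. The function name is hypothetical.

```python
import numpy as np


def mst_to_single_linkage(n, mst_edges):
    """Hypothetical: convert MST edges (i, j, weight) into a SciPy-style linkage matrix."""
    edges = sorted(mst_edges, key=lambda e: e[2])
    parent = list(range(2 * n - 1))   # union-find over points and merged clusters
    size = [1] * n + [0] * (n - 1)
    Z = np.zeros((n - 1, 4))

    def find(a):
        while parent[a] != a:
            parent[a] = parent[parent[a]]  # path halving
            a = parent[a]
        return a

    for step, (i, j, w) in enumerate(edges):
        a, b = find(i), find(j)        # distinct roots, since the input is a tree
        new = n + step
        parent[a] = parent[b] = new
        size[new] = size[a] + size[b]
        Z[step] = [a, b, w, size[new]]
    return Z


# A 4-point MST: {2, 3} merge first, then {0, 1}, then the two pairs.
Z = mst_to_single_linkage(4, [(0, 1, 0.5), (2, 3, 0.2), (1, 2, 1.0)])
```

Because MST edges are consumed in weight order, merge distances in the output never decrease, which is the property the later condensing step relies on.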

plot_condensed_tree(m, figsize=(8, 5))
Parameters:

m (int)

Parameters: