coresg_graphhdbscan.core.CoreSGHDBSCAN
- class coresg_graphhdbscan.core.CoreSGHDBSCAN(min_samples_list, metric='euclidean', eps=1e-12, min_cluster_size=None, save_models=False)
Bases: object
CoreSG-based hierarchical density clustering backend.
This class implements the lower-level CoreSG-HDBSCAN pipeline operating on feature vectors or distance representations.
Workflow
1. Compute the full distance matrix once.
2. Compute self-inclusive core distances for all values in min_samples_list.
3. Build the CORE-SG graph from:
   - the kmax nearest-neighbor graph with ties
   - the MST on the complete MRD graph for kmax
4. Precompute a sparse neighbor table for fast edge distance lookup.
5. For each m:
   - compute MRD edge weights
   - build the sparse weighted graph
   - compute the MST
   - build the single-linkage tree
   - condense the tree and extract clusters
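The first two workflow steps and the per-m MRD weights can be sketched in plain Python. This is only an illustration under stated assumptions, not the package's implementation: the helper names (`pairwise_distances`, `core_distances`, `mutual_reachability`) are hypothetical, and "self-inclusive" is assumed to mean that a point counts as its own first neighbor at distance 0.

```python
import math

def pairwise_distances(points):
    """Full Euclidean distance matrix as a list of lists (hypothetical helper)."""
    return [[math.dist(p, q) for q in points] for p in points]

def core_distances(D, min_samples):
    """Self-inclusive core distance (assumed convention): the distance to the
    min_samples-th nearest point, where the point itself (distance 0) is
    counted as the first neighbor."""
    return [sorted(row)[min_samples - 1] for row in D]

def mutual_reachability(D, core, i, j):
    """MRD edge weight between i and j for one min_samples value."""
    return max(core[i], core[j], D[i][j])

points = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (5.0, 5.0)]

# Step 1: the distance matrix is computed once.
D = pairwise_distances(points)

# Step 2: core distances for every requested min_samples value reuse D.
core_by_m = {m: core_distances(D, m) for m in (2, 3)}

# Per-m MRD weight for the edge (0, 1).
for m, core in core_by_m.items():
    print(m, mutual_reachability(D, core, 0, 1))
```

Because the core distances for every m come from the same sorted rows of D, the expensive distance computation is not repeated per m.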
- param min_samples_list:
List of min_samples values to evaluate.
- type min_samples_list:
list[int]
- param metric:
Distance metric mode.
- type metric:
str, default="euclidean"
- param eps:
Numerical tolerance used in graph construction.
- type eps:
float, default=1e-12
- param min_cluster_size:
Minimum cluster size. If None, the package default behavior is used.
- type min_cluster_size:
int or None, default=None
- __init__(min_samples_list, metric='euclidean', eps=1e-12, min_cluster_size=None, save_models=False)
Methods
- __init__(min_samples_list[, metric, eps, ...])
- fit(X): Build CORE-SG from a precomputed distance matrix D (NxN).
- model(min_samples)
- plot_condensed_tree(m[, figsize])
- run([cluster_selection_method, ...]): Run Core-SG clustering for all requested min_samples values.
Attributes
- X_: numpy.ndarray | None = None
- D_: numpy.ndarray | None = None
- core_: Dict[int, numpy.ndarray]
- edges_ut_: numpy.ndarray | None = None
- idx_with_self_: numpy.ndarray | None = None
- dst_with_self_: numpy.ndarray | None = None
- idx_no_self_: numpy.ndarray | None = None
- dst_no_self_: numpy.ndarray | None = None
- A_knn_: scipy.sparse.csr_matrix | None = None
- msts_: Dict[int, Tuple[numpy.ndarray, numpy.ndarray, numpy.ndarray]]
- models_: Dict[int, CoreSGModel]
- labels_by_m_: Dict[int, numpy.ndarray]
- fit(X)
- Parameters:
X (numpy.ndarray)
- Return type:
- fit_from_distance_matrix(D)
Build CORE-SG from a precomputed distance matrix D (NxN).
D[i,j] is the base dissimilarity between points i and j.
We compute self-inclusive core distances and kmax-NNG from D.
We build CORE-SG edges via kmax-NNG ∪ MST_kmax (on MRD_kmax).
After this, you can call self.run(…) exactly as usual.
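The edge set kmax-NNG ∪ MST_kmax can be illustrated on a tiny distance matrix. The sketch below is not the package's code: `knn_edges_with_ties` and `mst_edges` are hypothetical helpers, the use of `eps` as a tie tolerance is an assumption, and Prim's algorithm on a dense matrix stands in for whatever MST routine the package actually uses.

```python
import math

def knn_edges_with_ties(D, k, eps=1e-12):
    """Undirected k-NN edges (i, j) with i < j. The point itself counts as
    neighbor 1 (assumed self-inclusive convention), and every point tied with
    the k-th neighbor, within eps, is kept."""
    n = len(D)
    edges = set()
    for i in range(n):
        cutoff = sorted(D[i])[k - 1]
        for j in range(n):
            if j != i and D[i][j] <= cutoff + eps:
                edges.add((min(i, j), max(i, j)))
    return edges

def mst_edges(W):
    """Prim's algorithm on a dense symmetric weight matrix W."""
    n = len(W)
    in_tree = [False] * n
    best = [math.inf] * n   # cheapest known edge weight into the tree
    parent = [-1] * n
    best[0] = 0.0
    edges = set()
    for _ in range(n):
        u = min((v for v in range(n) if not in_tree[v]), key=lambda v: best[v])
        in_tree[u] = True
        if parent[u] >= 0:
            edges.add((min(u, parent[u]), max(u, parent[u])))
        for v in range(n):
            if not in_tree[v] and W[u][v] < best[v]:
                best[v] = W[u][v]
                parent[v] = u
    return edges

# Two well-separated pairs of points; D plays the role of the precomputed matrix.
points = [(0.0, 0.0), (1.0, 0.0), (10.0, 0.0), (11.0, 0.0)]
kmax = 2
n = len(points)
D = [[math.dist(p, q) for q in points] for p in points]

# Self-inclusive core distances and the complete MRD matrix for kmax.
core = [sorted(row)[kmax - 1] for row in D]
mrd = [[0.0 if i == j else max(core[i], core[j], D[i][j]) for j in range(n)]
       for i in range(n)]

knn = knn_edges_with_ties(D, kmax)
mst = mst_edges(mrd)
core_sg = sorted(knn | mst)
print(core_sg)  # [(0, 1), (1, 2), (2, 3)]
```

In this example the kmax-NNG alone contains only the edges (0, 1) and (2, 3), so it is disconnected; the MST on the MRD graph contributes the bridging edge (1, 2), which is why the union is taken.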
- Parameters:
D (numpy.ndarray)
- Return type:
- run(cluster_selection_method='eom', allow_single_cluster=False, match_reference_implementation=False, cluster_selection_epsilon=0.0)
Run Core-SG clustering for all requested min_samples values.
Stores
- models_: dict
Saved per-m models when save_models=True.
- condensed_trees_: dict
Condensed tree objects for all fitted m values.
- labels_by_m_: dict
Stored labels for all fitted m values.
- Parameters:
- Return type: