GraphHDBSCAN* documentation
GraphHDBSCAN* is a graph-based, density-based hierarchical clustering method that works effectively on very high-dimensional data. It builds on the well-known HDBSCAN* algorithm by operating on a sparse graph representation of the data.
A key advantage is its ability to recover interpretable density-based hierarchies that support visualization without requiring dimensionality reduction techniques like UMAP, t-SNE, or PCA.
GraphHDBSCAN* can also produce high-quality flat partitions with clusters of arbitrary shapes and varying densities, while detecting noise and outliers. It includes an optional label-propagation approach to assign cluster labels to noise points.
It builds on the CORE-SG graph sparsification theory to compute multiple hierarchies efficiently in a single run. Exploring these hierarchies side by side largely removes the need to hand-tune the hyperparameters of the clustering algorithm itself.
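To make the idea of assigning cluster labels to noise points concrete, here is a minimal sketch of one simple form of label propagation. It is not GraphHDBSCAN*'s own implementation: the function name `propagate_labels_to_noise`, the convention that noise is marked with `-1`, and the k-nearest-neighbor majority vote are all assumptions chosen for illustration, using NumPy only.

```python
import numpy as np


def propagate_labels_to_noise(X, labels, k=3, max_iter=20):
    """Hypothetical helper (not part of GraphHDBSCAN*).

    Repeatedly gives each noise point (label -1, an assumed convention)
    the majority label among its k nearest neighbors that already have
    a cluster label. Points with no labeled neighbor stay noise.
    """
    X = np.asarray(X, dtype=float)
    labels = np.asarray(labels).copy()

    # Dense pairwise Euclidean distances; fine for small illustrative inputs.
    diff = X[:, None, :] - X[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=-1))
    np.fill_diagonal(dist, np.inf)  # a point is not its own neighbor

    # Indices of the k nearest neighbors of every point.
    neighbors = np.argsort(dist, axis=1)[:, :k]

    for _ in range(max_iter):
        noise = np.flatnonzero(labels == -1)
        if noise.size == 0:
            break
        new_labels = labels.copy()
        for i in noise:
            votes = labels[neighbors[i]]
            votes = votes[votes != -1]  # only labeled neighbors vote
            if votes.size:
                values, counts = np.unique(votes, return_counts=True)
                new_labels[i] = values[np.argmax(counts)]
        if np.array_equal(new_labels, labels):
            break  # nothing changed, so further rounds cannot help
        labels = new_labels
    return labels


# Two tight 1-D clusters plus one noise point near each of them.
X = np.array([[0.0], [0.1], [0.2], [0.5], [9.5], [10.0], [10.1], [10.2]])
labels = np.array([0, 0, 0, -1, -1, 1, 1, 1])
filled = propagate_labels_to_noise(X, labels, k=3)
print(filled)
```

Each round updates all noise points at once from the previous round's labels, so the result does not depend on the order in which points are visited. For the library's actual option and its behavior, see the "Noise reassignment" section.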
Contents
- Overview
- Installation and quick start
- Parameter selection
- Usage
- Basic workflow
- Minimal example
- fit vs. fit_predict
- Single min_samples value
- Multiple min_samples values in one run
- Inspecting the hierarchy
- Choosing a graph-construction backend
- Using different distance metrics
- Using precomputed graphs
- Connectivity handling
- Noise reassignment
- Useful outputs after fitting
- Common usage patterns
- Working with fitted results
- Practical notes
- Related pages
- API reference
- Examples
- References
- Third-party notices