Topological data analysis (TDA) relies heavily on the mature C++ libraries PHAT, Dionysus, and GUDHI. While these libraries have interfaces to Python and, through the TDA package, R, they have been developed primarily by and for statistical topologists. As TDA matures and standard workflows emerge, the need arises for more accessible and modular implementations. The SciKit-TDA project, an extension of SciPy, is underway in Python for this purpose. The tdaverse collection is intended to meet these needs in R through a tidyverse lens.
The tidyverse consists of numerous R packages that are built upon a shared set of syntactic and grammatical conventions and designed to interface naturally with each other. With its sibling collections r-lib and tidymodels, it provides a comprehensive toolkit for building advanced data analysis and modeling pipelines. The goal of tdaverse is to provide the data structures, computational engines, statistical models, and visualization tools needed to efficiently explore and analyze topological data in R and to integrate these tasks into tidyverse workflows.
Packages published on CRAN
simplextree is an R package aimed at simplifying computation for simplicial complexes. The package provides R bindings to a simplex tree data structure implemented in C++11 and exported as an Rcpp module. Instances can be created from abstract or geometric data and exported and imported via serialization, and they can be efficiently inspected, queried, modified, and traversed using both Rcpp and S3 methods. The underlying library implementation also exports a C++ header, which can be specified as a dependency and used in other packages via Rcpp attributes.
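The simplex tree data structure can be illustrated with a minimal, language-neutral Python sketch. Each simplex is stored as the path spelled by its vertices in increasing order in a trie. This is not the simplextree API, which is written in R and C++. The class and method names (`SimplexTree`, `insert`, `find`, `simplices`, `f_vector`) are hypothetical. As an assumption of this sketch, inserting a simplex also inserts all of its faces, so the stored complex stays closed under taking faces.

```python
class SimplexTree:
    """Minimal simplex tree: a trie whose root-to-node paths are simplices
    written as increasing vertex sequences (illustrative sketch only)."""

    def __init__(self):
        self.root = {}  # maps vertex label -> child node (another dict)

    def insert(self, simplex):
        """Insert a simplex together with all of its faces."""
        vertices = sorted(set(simplex))
        n = len(vertices)
        # Every face is a nonempty subsequence of the sorted vertex list,
        # so enumerate subsets with a bitmask and insert each as a path.
        for mask in range(1, 1 << n):
            face = [vertices[i] for i in range(n) if (mask >> i) & 1]
            node = self.root
            for v in face:
                node = node.setdefault(v, {})

    def find(self, simplex):
        """Return True if the simplex is stored in the tree."""
        node = self.root
        for v in sorted(set(simplex)):
            if v not in node:
                return False
            node = node[v]
        return True

    def simplices(self):
        """Depth-first traversal yielding every simplex as a tuple."""
        stack = [((), self.root)]
        while stack:
            prefix, node = stack.pop()
            for v, child in node.items():
                s = prefix + (v,)
                yield s
                stack.append((s, child))

    def f_vector(self):
        """Number of simplices in each dimension 0, 1, 2, ..."""
        counts = {}
        for s in self.simplices():
            counts[len(s) - 1] = counts.get(len(s) - 1, 0) + 1
        return [counts[d] for d in sorted(counts)]


st = SimplexTree()
st.insert([1, 2, 3])  # a filled triangle
st.insert([3, 4])     # plus a dangling edge
print(st.f_vector())  # [4, 4, 1]
```

Because sorted vertex lists share prefixes, faces such as {1, 2} and {1, 2, 3} share storage. A membership query takes one dictionary lookup per vertex.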
simplextree will interface with other packages for various tasks: to sample geometric complexes based on arbitrary manifolds with tdaunif, to construct and update the nerves of mappers in Mapper, and to perform computations involving simplicial complexes stored in other formats.
ripserr ports the C++ persistent homology engines Ripser and Cubical Ripser to R via Rcpp. It can be used as a convenient and efficient tool in TDA pipelines involving point cloud data (Ripser) or image and volume data (Cubical Ripser).
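To make concrete what such an engine returns, here is a minimal Python sketch of only the simplest case: 0-dimensional persistence of a Vietoris–Rips filtration on a point cloud. This is not Ripser's algorithm. Ripser also computes higher dimensions using far more elaborate machinery. The function name `h0_persistence` is hypothetical. The sketch assumes the convention that an edge enters the filtration at its length. Every point is born at 0, and each edge that merges two components, processed in order of length, kills one of them. This is Kruskal's algorithm with union-find.

```python
import itertools
import math


def h0_persistence(points):
    """(birth, death) pairs for 0-dimensional persistent homology of the
    Vietoris-Rips filtration of a finite point cloud."""
    n = len(points)
    if n == 0:
        return []
    parent = list(range(n))

    def find(i):
        # Union-find root lookup with path halving.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    # Edges sorted by length = the order in which they enter the filtration.
    edges = sorted(
        (math.dist(points[i], points[j]), i, j)
        for i, j in itertools.combinations(range(n), 2)
    )
    pairs = []
    for length, i, j in edges:
        ri, rj = find(i), find(j)
        if ri != rj:  # this edge merges two components, so one dies
            parent[ri] = rj
            pairs.append((0.0, length))
    pairs.append((0.0, math.inf))  # the last component never dies
    return pairs


print(h0_persistence([(0.0, 0.0), (1.0, 0.0), (5.0, 0.0)]))
# [(0.0, 1.0), (0.0, 4.0), (0.0, inf)]
```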
ripserr is designed as a minimal standalone package and will be called to compute persistence data when underlying simplicial filtrations are not needed.
Persistent homology can be used in hypothesis testing to compare the topological structure of two point clouds. TDAstats uses a permutation test in conjunction with the Wasserstein metric for nonparametric statistical inference.
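The idea of a permutation test on persistence diagrams can be sketched in Python as follows. This is not TDAstats's implementation. Several choices here are assumptions made for brevity: the test statistic (mean Wasserstein distance between diagrams in the same group), p = 2, the L-infinity ground metric, and brute-force optimal matching, which is only practical for diagrams with a few points. Function names are hypothetical. Under the null hypothesis that both groups come from the same distribution, group labels are exchangeable. Relabeling therefore shows how small the within-group distances get by chance.

```python
import itertools
import random


def wasserstein(d1, d2, p=2):
    """p-Wasserstein distance between two small persistence diagrams.

    Diagrams are lists of (birth, death) pairs. Each point is matched either
    to a point of the other diagram or to the diagonal; the optimal matching
    is found by brute force over permutations.
    """
    def to_diag(pt):
        # L-infinity distance from (b, d) to the diagonal b == d
        return (pt[1] - pt[0]) / 2

    # Pad each diagram with diagonal slots (None) for the other's points.
    a = list(d1) + [None] * len(d2)
    b = list(d2) + [None] * len(d1)

    def cost(x, y):
        if x is None and y is None:
            return 0.0
        if x is None:
            return to_diag(y) ** p
        if y is None:
            return to_diag(x) ** p
        return max(abs(x[0] - y[0]), abs(x[1] - y[1])) ** p

    best = min(
        sum(cost(a[i], b[j]) for i, j in enumerate(perm))
        for perm in itertools.permutations(range(len(b)))
    )
    return best ** (1 / p)


def permutation_test(group1, group2, n_perm=1000, seed=0):
    """p-value for the hypothesis that two groups of diagrams (each with
    at least two diagrams) come from the same distribution."""
    diagrams = group1 + group2
    n1, n = len(group1), len(diagrams)
    # Distances are computed once; permutations only relabel the diagrams.
    dist = [[wasserstein(x, y) for y in diagrams] for x in diagrams]

    def within_group_mean(labels):
        pairs = [dist[i][j] for i in range(n) for j in range(i + 1, n)
                 if labels[i] == labels[j]]
        return sum(pairs) / len(pairs)

    labels0 = [0] * n1 + [1] * (n - n1)
    observed = within_group_mean(labels0)
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_perm):
        labels = labels0[:]
        rng.shuffle(labels)
        if within_group_mean(labels) <= observed:
            hits += 1
    # Adding one to numerator and denominator keeps the p-value above zero.
    return (hits + 1) / (n_perm + 1)
```

Precomputing the distance matrix is the main design choice. The expensive diagram comparisons run once, and each permutation costs only a pass over precomputed pairs.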
TDAstats was originally designed with three goals in mind: the calculation, statistical inference, and visualization of persistent homology. Since its release, calculation has moved to engine ports such as ripserr, and ggplot2-style visualization has moved to ggtda. Ongoing development of TDAstats will focus on statistical inference.
Methods for detecting topological structure from point cloud data sets are often validated by applying them to point clouds sampled from spaces with known topology. Functions that generate such samples are therefore valuable to developers of topological–statistical software. The goal of tdaunif is to assemble a comprehensive collection of such samplers for convenient use.
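Here is one example of such a sampler, as a minimal Python sketch. It draws points uniformly with respect to surface area from a torus embedded in R^3. This is not tdaunif's interface. The function name `sample_torus` and its parameters (major radius `R`, minor radius `r`, with R > r) are hypothetical. The sketch illustrates a common pitfall. Drawing both angles uniformly would oversample the inner rim, because the area element is proportional to R + r cos(theta). The minor angle is therefore drawn by rejection sampling.

```python
import math
import random


def sample_torus(n, R=2.0, r=1.0, seed=None):
    """Sample n points uniformly (by surface area) from a torus in R^3."""
    rng = random.Random(seed)
    points = []
    while len(points) < n:
        theta = rng.uniform(0, 2 * math.pi)  # angle around the tube
        # Accept theta with probability (R + r*cos(theta)) / (R + r).
        if rng.uniform(0, R + r) <= R + r * math.cos(theta):
            phi = rng.uniform(0, 2 * math.pi)  # angle around the central axis
            ring = R + r * math.cos(theta)
            points.append((ring * math.cos(phi), ring * math.sin(phi), r * math.sin(theta)))
    return points


pts = sample_torus(5000, seed=1)
# With R = 2, r = 1, about (pi*R + 2*r) / (2*pi*R), roughly 66%, of the
# points should lie on the outer half of the torus.
```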
In addition to testing TDA software, tdaunif will be used with simplextree to generate random geometric simplicial complexes, and on its own as an educational tool for the study of ≥3-dimensional manifolds.
Packages in development
The ggtda package provides ggplot2 layers (statistical transformations and graphical elements) and themes for publication-quality plots of data arising from topological objects and models. Persistent homology can be computed for continuous functions and Reeb graphs as well as point clouds, and ggtda layers are in development for numerous plot types that have been proposed to gain insight from persistence data. ggtda also provides layers to conveniently plot ball covers, Vietoris–Rips complexes, and Čech complexes for 2-dimensional point clouds.
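A plot of a Vietoris–Rips complex for a planar point cloud needs only its edges and triangles, that is, its 2-skeleton. The following minimal Python sketch computes them. This is not ggtda code. The function name is hypothetical. The sketch assumes the diameter convention, in which a simplex is included when all pairwise distances among its vertices are at most `eps`. Some tools instead parameterize by ball radius, which corresponds to a diameter of 2 × radius, so ggtda's own convention may differ.

```python
import itertools
import math


def vietoris_rips_2skeleton(points, eps):
    """Edges and triangles of the Vietoris-Rips complex at scale eps."""
    n = len(points)
    # Pairs of points within distance eps become edges.
    close = {
        (i, j)
        for i, j in itertools.combinations(range(n), 2)
        if math.dist(points[i], points[j]) <= eps
    }
    # A triangle is included exactly when all three of its edges are.
    triangles = [
        (i, j, k)
        for i, j, k in itertools.combinations(range(n), 3)
        if (i, j) in close and (i, k) in close and (j, k) in close
    ]
    return sorted(close), triangles


square = [(0.0, 0.0), (1.0, 0.0), (1.0, 1.0), (0.0, 1.0)]
print(vietoris_rips_2skeleton(square, 1.0))  # a 4-cycle with no triangles
print(vietoris_rips_2skeleton(square, 1.5))  # 6 edges, 4 triangles: the hole is filled
```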
The landmark package provides functions to calculate landmark sets for finite metric spaces using the maxmin procedure (for fixed-radius balls) or an adaptation of it for rank data (for roughly fixed-cardinality nearest neighborhoods). These procedures can also return membership lists for the covers centered at these landmark sets. These covering engines will be invoked by Mapper and by other constructions based on arbitrary covers.
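The maxmin procedure itself is greedy farthest-point selection. Starting from a seed point, it repeatedly adds the point farthest from the current landmark set. The following minimal Python sketch covers only the fixed-radius variant, not the rank-based adaptation. It is not the landmark package's interface, and the function and parameter names are hypothetical. Stopping once every point lies within `radius` of a landmark guarantees that the balls of that radius around the landmarks cover the data. `cover_members` then returns the membership list of each ball.

```python
import math


def maxmin_landmarks(points, num=None, radius=None, seed_index=0):
    """Greedy maxmin landmark selection (indices into points).

    Stops once `num` landmarks are chosen or once every point lies within
    `radius` of some landmark, whichever happens first.
    """
    if num is None and radius is None:
        raise ValueError("specify num and/or radius")
    landmarks = [seed_index]
    # nearest[i] = distance from point i to its closest landmark so far
    nearest = [math.dist(p, points[seed_index]) for p in points]
    while num is None or len(landmarks) < num:
        far = max(range(len(points)), key=nearest.__getitem__)
        # Stop if even the farthest point is covered (or nothing is left).
        if nearest[far] == 0 or (radius is not None and nearest[far] <= radius):
            break
        landmarks.append(far)
        nearest = [min(d, math.dist(p, points[far])) for d, p in zip(nearest, points)]
    return landmarks


def cover_members(points, landmarks, radius):
    """Indices of points in the closed ball of `radius` around each landmark."""
    return [
        [i for i, p in enumerate(points) if math.dist(p, points[lm]) <= radius]
        for lm in landmarks
    ]


line = [(float(i),) for i in range(11)]  # points 0, 1, ..., 10 on a line
lms = maxmin_landmarks(line, radius=2.5)  # [0, 10, 5]
print(cover_members(line, lms, 2.5))      # [[0, 1, 2], [8, 9, 10], [3, 4, 5, 6, 7]]
```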
The Mapper package provides a set of tools for computing the mapper construction.
Previous versions of this package included the simplex tree class and the maxmin procedure, which have been spun off and expanded as the simplextree and landmark packages.
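For readers new to the mapper construction, here is a minimal Python sketch of its classic one-dimensional form. A real-valued filter is covered by overlapping intervals. The points in each interval's preimage are clustered. Clusters become nodes, and two nodes are joined when their clusters share a point, which gives the 1-skeleton of the nerve of the pulled-back cover. This is not the Mapper package's interface. The function names, the interval scheme (each of `n_intervals` equal-width intervals widened by `overlap * width` on each side), and the clustering rule (single linkage at threshold `eps`) are assumptions chosen for illustration.

```python
import math


def single_linkage_clusters(indices, points, eps):
    """Connected components of the graph joining points within distance eps."""
    remaining = set(indices)
    clusters = []
    while remaining:
        stack = [remaining.pop()]
        cluster = set(stack)
        while stack:
            i = stack.pop()
            # Move every unvisited neighbor of point i into this cluster.
            near = {j for j in remaining if math.dist(points[i], points[j]) <= eps}
            remaining -= near
            cluster |= near
            stack.extend(near)
        clusters.append(cluster)
    return clusters


def mapper_1d(points, filter_values, n_intervals=4, overlap=0.25, eps=1.0):
    """Mapper graph: nodes are sets of point indices; edges join
    nodes whose clusters share at least one point."""
    lo, hi = min(filter_values), max(filter_values)
    width = (hi - lo) / n_intervals
    nodes = []
    for k in range(n_intervals):
        a = lo + k * width - overlap * width
        b = lo + (k + 1) * width + overlap * width
        in_bin = [i for i, f in enumerate(filter_values) if a <= f <= b]
        nodes.extend(single_linkage_clusters(in_bin, points, eps))
    edges = [
        (u, v)
        for u in range(len(nodes))
        for v in range(u + 1, len(nodes))
        if nodes[u] & nodes[v]
    ]
    return nodes, edges


# 24 points on the unit circle, filtered by height: the mapper graph is a cycle.
circle = [(math.cos(2 * math.pi * m / 24), math.sin(2 * math.pi * m / 24)) for m in range(24)]
nodes, edges = mapper_1d(circle, [y for _, y in circle], eps=0.3)
```

On the circle example, the bottom and top intervals each yield one cluster, and the two middle intervals each split into a left and a right arc. The result is six nodes joined in a single loop, recovering the circle's one-dimensional hole.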
To learn more and contribute to package design or development, please visit the GitHub repositories and consider commenting on or creating an issue!
- Raoul R. Wadhwa (Cleveland Clinic Lerner College of Medicine, Case Western Reserve University)
- Matt Piekenbrock (Department of Computational Mathematics, Science, and Engineering, Michigan State University)
- Jason Cory Brunson (Laboratory for Systems Medicine, Division of Pulmonary, Critical Care, and Sleep Medicine, University of Florida)