TALL is distributed on CRAN and GitHub. The entire installation — including the Shiny interface, analytical modules, and on‑demand language models — fits in a standard R package with MIT‑compatible dependencies.
The official release, indexed on CRAN and RDocumentation. Versioned, metadata‑rich, MIT‑licensed.
# Install TALL from CRAN
install.packages("tall")
# Launch the Shiny interface
library(tall)
tall()
Latest features not yet on CRAN. Requires build tools: Rtools on Windows, Xcode Command Line Tools on macOS.
# Install pak if it is not already available
if (!requireNamespace("pak", quietly = TRUE))
  install.packages("pak")
# Install the development version from GitHub
if (!requireNamespace("remotes", quietly = TRUE))
  install.packages("remotes")
remotes::install_github("massimoaria/tall")
# Launch the Shiny interface
library(tall)
tall()
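Because the GitHub install compiles code, it can help to confirm that a compiler toolchain is visible to R first. The sketch below is one way to do that, assuming the `pkgbuild` package, which this page does not mention; the helper name `build_tools_ready` is purely illustrative.

```r
# Check whether R can find a working compiler toolchain.
# Assumption: relies on the pkgbuild package (not part of TALL's instructions).
# Returns TRUE/FALSE, or NA when pkgbuild itself is not installed.
build_tools_ready <- function() {
  if (!requireNamespace("pkgbuild", quietly = TRUE)) {
    return(NA)
  }
  pkgbuild::has_build_tools(debug = FALSE)
}

ready <- build_tools_ready()
if (isTRUE(ready)) {
  message("Build tools found: the GitHub install should be able to compile.")
} else if (is.na(ready)) {
  message("pkgbuild is not installed, so build tools could not be checked.")
} else {
  message("Build tools missing: install Rtools (Windows) or Xcode Command Line Tools (macOS).")
}
```

If the check fails, the CRAN release may be the simpler route.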
The TALL package targets modern R versions. Compatible with the latest CRAN release.
RStudio is a friendly IDE for R; Positron and VS Code with the R extension also work well.
An internet connection is needed at install time and for on‑demand language‑model downloads. Once cached, models work offline.
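A quick pre-flight check of the requirements above might look like the following base R sketch. The function name `check_tall_setup` and the default `min_r` value are hypothetical: this page does not state a minimum R version, so adjust the threshold to whatever the CRAN page lists.

```r
# Pre-flight check before installing or launching TALL.
# NOTE: min_r is a placeholder; the source does not state a minimum R version.
check_tall_setup <- function(min_r = "4.2.0") {
  list(
    r_version      = as.character(getRversion()),
    # TRUE when the running R is at least min_r
    r_recent       = getRversion() >= min_r,
    # system.file() returns "" when the package is not installed
    tall_installed = nzchar(system.file(package = "tall"))
  )
}

status <- check_tall_setup()
str(status)
```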
TALL preprocessing scales linearly with corpus size. Rcpp‑accelerated modules — multi‑word extraction, Reinert clustering, collocation measures — keep things fast at scale. Memory is the practical ceiling: reference‑vocabulary tasks like keyness analysis are the primary driver.
Benchmarks reported on Apple M4 Pro · 48 GB RAM · macOS 15 · R 4.5.2.
| Corpus size | Recommended RAM | Typical hardware | Runtime |
|---|---|---|---|
| Small (< 10⁵ tokens) | 1–2 GB | Standard laptop | < 10 s preprocessing |
| Medium (10⁵–10⁶ tokens) | ~ 5 GB | Research workstation | ~ 1 min preprocessing |
| Large (> 10⁶ tokens) | 10+ GB | Server / high‑memory desktop | ~ 7 min (5.3M tokens) |
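To see which row of the table a corpus falls into before loading it, a rough token count is usually enough. The sketch below uses base R only. The thresholds and RAM figures are copied from the table, while the function names and the whitespace tokenizer are illustrative assumptions. TALL's own tokenizer will produce different counts, so treat the result as an order-of-magnitude guide.

```r
# Rough token count for a character vector of documents, splitting on whitespace.
# This is NOT TALL's tokenizer; it only approximates corpus size.
count_tokens <- function(texts) {
  sum(lengths(strsplit(trimws(texts), "\\s+")))
}

# Map a token count to the tiers in the table above.
# Thresholds (1e5, 1e6) and RAM figures come from the table;
# the function name and return shape are illustrative.
recommend_resources <- function(n_tokens) {
  stopifnot(is.numeric(n_tokens), length(n_tokens) == 1, n_tokens >= 0)
  if (n_tokens < 1e5) {
    list(tier = "Small", ram = "1-2 GB", hardware = "Standard laptop")
  } else if (n_tokens <= 1e6) {
    list(tier = "Medium", ram = "~5 GB", hardware = "Research workstation")
  } else {
    list(tier = "Large", ram = "10+ GB", hardware = "Server / high-memory desktop")
  }
}

docs <- c("The quick brown fox", "jumps over the lazy dog")
n <- count_tokens(docs)
recommend_resources(n)$tier
```

Since memory rather than runtime is described as the practical ceiling, the RAM column is the one to check first.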
For corpora exceeding 1M tokens, institutional Shiny Server deployment is recommended — the same codebase runs unchanged.
Since its February 2025 CRAN release, TALL has been downloaded more than 4,000 times — sustained monthly growth, with rapid early adoption between April and June 2025. Currently used in postgraduate and doctoral programs at several Italian universities.
Try one of the built‑in example datasets — BBC News, US Airline Tweets — before loading your own corpus.