Version 1.0.0 · Released April 2026

Text analysis,
for all of us.

An interactive R‑Shiny application that unifies import, cleaning, pre‑processing, statistical analysis and visualisation of textual data — designed so researchers without programming skills can move from raw text to interpretable results in minutes.


§ 1 · Abstract

A transparent, code‑free platform for turning unstructured text into structured, reproducible insight.

Analyzing unstructured textual data is essential across disciplines — yet often requires programming skills many researchers lack. TALL bridges the gap by integrating data import, cleaning, pre‑processing, statistical analysis, and visualisation into a single Shiny application.

It supports tokenisation, lemmatisation, and Part‑of‑Speech tagging across 56 languages, and offers topic modeling, correspondence and cluster analysis, co‑occurrence networks, polarity detection, word embeddings, and text summarisation — all documented, exportable, and reproducible by design.

The framework is FAIR‑compliant, open‑source under the MIT license, and available on CRAN. Performance‑critical routines are implemented in C++ via Rcpp, enabling scalable analysis of corpora spanning four orders of magnitude in size on standard research hardware.

A native integration with Google Gemini — TALL AI — produces context‑aware natural‑language explanations alongside each numerical output, while preserving strict data minimisation: raw textual content never leaves the user's environment.

#R‑Shiny #NLP #text mining #open science #FAIR

§ 2 · Modules

A complete analytical suite —
everything you need, in one place.

Fourteen integrated modules covering exploratory, inferential and interpretive methods. Each can be run independently or chained into a continuous analytical narrative; intermediate results are exportable at every step.


Tokenisation, PoS tagging & lemmatisation

Universal‑Dependencies‑based pipelines for 56 languages, implemented via udpipe, with models cached locally for offline use.

udpipe UD v2.15 87 models

Multi‑word expression detection

Four collocation algorithms — including RAKE and Pointwise Mutual Information — for identifying salient n‑grams and candidate keyphrases. C++‑accelerated.

RAKE PMI IS‑score
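Pointwise Mutual Information compares how often two words appear together with how often they would co-occur by chance: PMI(x, y) = log2 [P(x, y) / (P(x) P(y))]. The Python sketch below scores adjacent word pairs this way. It is a textbook illustration, not TALL's C++ routine. The `min_count` threshold and the base-2 logarithm are our assumptions, and TALL's own filtering and normalisation may differ.

```python
import math
from collections import Counter


def bigram_pmi(tokens, min_count=2):
    """Rank adjacent word pairs by pointwise mutual information (base-2 log)."""
    unigrams = Counter(tokens)
    bigrams = Counter(zip(tokens, tokens[1:]))
    n_tokens = len(tokens)
    n_pairs = len(tokens) - 1
    scores = {}
    for (w1, w2), count in bigrams.items():
        if count < min_count:
            continue  # PMI over-rewards pairs seen only once or twice
        p_xy = count / n_pairs
        p_x = unigrams[w1] / n_tokens
        p_y = unigrams[w2] / n_tokens
        scores[(w1, w2)] = math.log2(p_xy / (p_x * p_y))
    # Highest PMI first: the strongest candidate collocations
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


tokens = "the new york office called the new york team and the team called back".split()
print(bigram_pmi(tokens))  # ('new', 'york') ranks above ('the', 'new')
```

The frequency threshold matters: without it, any pair of words that each occur once and happen to be adjacent gets the maximum possible score.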

Topic modeling v1.0

LDA, CTM and STM topic models, with automatic selection of the number of topics K and coherence & exclusivity diagnostics. Parallelised through the future package.

LDA CTM STM

Network analysis & community detection

Co‑occurrence networks with Louvain community detection, dependency‑based word graphs, and syntactic relation filters via igraph and visNetwork.

Louvain dependency igraph

Word embeddings

Train Word2Vec representations directly on your corpus; visualise semantic neighbourhoods via UMAP.

Word2Vec UMAP

Polarity & emotion detection

Sentiment dashboards built on the Hu‑Liu and Loughran‑McDonald polarity lexicons and the NRC Emotion Lexicon (eight emotions: anger, anticipation, disgust, fear, joy, sadness, surprise, trust).

Hu‑Liu Loughran‑McDonald NRC EmoLex
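At its core, lexicon-based emotion detection counts, for each document, the words that a lexicon associates with each emotion. The sketch below shows only that counting step. `TOY_LEXICON` is a made-up stand-in, not NRC EmoLex content, and we assume tokens arrive already lemmatised and lower-cased; TALL may weight or normalise the counts differently.

```python
from collections import Counter

# The eight NRC emotion categories listed above.
EMOTIONS = ("anger", "anticipation", "disgust", "fear",
            "joy", "sadness", "surprise", "trust")

# TOY lexicon, invented for illustration: these are NOT entries from NRC EmoLex.
TOY_LEXICON = {
    "celebrate": {"joy", "anticipation"},
    "threat": {"fear", "anger"},
    "betray": {"anger", "disgust", "sadness"},
    "reliable": {"trust"},
}


def emotion_profile(lemmas, lexicon):
    """Count, for each emotion, how many lemmas the lexicon links to it."""
    counts = Counter()
    for lemma in lemmas:
        for emotion in lexicon.get(lemma, ()):
            counts[emotion] += 1
    # Report all eight emotions, including those with zero hits.
    return {emotion: counts[emotion] for emotion in EMOTIONS}


doc = ["we", "celebrate", "a", "reliable", "partner", "despite", "the", "threat"]
print(emotion_profile(doc, TOY_LEXICON))
```

Because one word can carry several emotions, the eight counts are not mutually exclusive; dividing by document length is a common next step when comparing documents of different sizes.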

Summarisation, SVO triplets & more

Extractive (TextRank) and abstractive summarisation, Subject‑Verb‑Object triplet extraction from dependency trees, syntactic complexity metrics, noun phrase extraction, Reinert clustering, correspondence analysis and keyness analysis.

TextRank SVO Reinert CA keyness
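TextRank ranks sentences by running PageRank on a graph whose edges link sentences that share words; the top-ranked sentences, kept in their original order, form the extractive summary. The stdlib-only Python sketch below assumes the word-overlap similarity from the original TextRank formulation (shared words divided by log|S_i| + log|S_j|, here counting distinct words), a damping factor of 0.85 and a fixed number of iterations. It is not TALL's implementation, which may tokenise, weight and converge differently.

```python
import math
import re


def _word_set(sentence):
    """Lower-cased distinct words of a sentence (a deliberately crude tokeniser)."""
    return set(re.findall(r"[a-z']+", sentence.lower()))


def textrank_summary(sentences, k=2, damping=0.85, iters=50):
    """Return the k highest-ranked sentences, in their original order."""
    n = len(sentences)
    words = [_word_set(s) for s in sentences]

    # Symmetric edge weights: shared words normalised by sentence lengths.
    weights = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            overlap = len(words[i] & words[j])
            if overlap == 0:
                continue
            denom = math.log(len(words[i])) + math.log(len(words[j]))
            if denom > 0:
                weights[i][j] = weights[j][i] = overlap / denom
    out_weight = [sum(row) for row in weights]

    # Weighted PageRank by power iteration (fixed number of iterations).
    scores = [1.0] * n
    for _ in range(iters):
        scores = [
            (1 - damping)
            + damping * sum(
                weights[j][i] / out_weight[j] * scores[j]
                for j in range(n)
                if weights[j][i] > 0
            )
            for i in range(n)
        ]

    top = sorted(range(n), key=lambda i: scores[i], reverse=True)[:k]
    return [sentences[i] for i in sorted(top)]


sents = [
    "Text mining turns raw text into data.",
    "Topic models find themes in text data.",
    "The weather was sunny.",
    "Networks link words that co-occur in text.",
]
print(textrank_summary(sents, k=2))
```

A sentence that shares no words with the rest of the corpus keeps only the baseline score (1 - damping), so it never makes it into the summary unless k exceeds the number of connected sentences.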

§ 3 · Release notes

What's new in version 1.0.0

The milestone release accompanying our SoftwareX paper: seven new analytical modules, an overhauled AI integration, and documented performance benchmarks.

New

SVO triplets

Extract Subject‑Verb‑Object structures from dependency trees to reveal who does what to whom across your corpus.
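In a Universal Dependencies parse, a simple SVO rule is: for each verb, pair its `nsubj` dependents with its `obj` dependents. The sketch below applies that rule to a hand-annotated sentence in a CoNLL-U-like layout. The `Token` class and the parse are illustrative assumptions; TALL's extractor works on udpipe output and may also handle passives, conjuncts and full noun phrases, all of which this version ignores.

```python
from dataclasses import dataclass


@dataclass
class Token:
    id: int       # 1-based position in the sentence
    form: str
    upos: str     # Universal PoS tag
    head: int     # id of the governing token; 0 = root
    deprel: str   # Universal Dependencies relation label


def svo_triplets(sentence):
    """Pair every nsubj with every obj attached to the same verb."""
    triplets = []
    for verb in sentence:
        if verb.upos != "VERB":
            continue
        children = [t for t in sentence if t.head == verb.id]
        subjects = [t.form for t in children if t.deprel == "nsubj"]
        objects = [t.form for t in children if t.deprel == "obj"]
        for s in subjects:
            for o in objects:
                triplets.append((s, verb.form, o))
    return triplets


# Hand-annotated parse of "The committee approved the budget".
sent = [
    Token(1, "The", "DET", 2, "det"),
    Token(2, "committee", "NOUN", 3, "nsubj"),
    Token(3, "approved", "VERB", 0, "root"),
    Token(4, "the", "DET", 5, "det"),
    Token(5, "budget", "NOUN", 3, "obj"),
]
print(svo_triplets(sent))  # [('committee', 'approved', 'budget')]
```

Aggregating such triplets over a corpus (for example, counting which subjects most often govern a given verb) is what turns single parses into the "who does what to whom" view.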

New

Syntactic complexity

Dependency‑tree metrics for measuring and comparing sentence construction across documents and groups.
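Two widely used dependency-tree measures are tree depth (the longest chain from a word up to the root) and mean dependency distance (the average linear distance between each word and its head). The page does not say which metrics TALL reports, so read these as examples of the kind of measure involved, not as TALL's definitions. Heads are given as a 1-based list with 0 marking the root, an assumed input format.

```python
def tree_depth(heads):
    """Length of the longest path from any token up to the root.

    heads[i] is the 1-based index of the head of token i + 1; 0 marks the root.
    Assumes a well-formed tree (no cycles).
    """
    if not heads:
        return 0

    def depth(token):
        steps = 0
        while heads[token - 1] != 0:
            token = heads[token - 1]
            steps += 1
        return steps

    return max(depth(t) for t in range(1, len(heads) + 1))


def mean_dependency_distance(heads):
    """Average |position - head position| over all non-root tokens."""
    distances = [abs(i - h) for i, h in enumerate(heads, start=1) if h != 0]
    return sum(distances) / len(distances) if distances else 0.0


# "The committee approved the budget": "approved" (token 3) is the root.
heads = [2, 3, 0, 5, 3]
print(tree_depth(heads), mean_dependency_distance(heads))  # 2 1.25
```

Averaging either value per document, then comparing across groups, is one straightforward way to contrast sentence construction.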

New

Emotion analysis

Eight‑dimensional emotion profiles via the NRC Emotion Lexicon — well beyond binary polarity.

New

Noun phrase extraction

Identify and rank noun phrases to complement single‑word frequency analysis with concept‑level views.

Updated

Topic modeling

CTM and STM methods added, plus automatic K selection and diagnostics based on coherence & exclusivity.

Updated

Network analysis

Dependency‑based word networks with filters over syntactic relations (subject, object, modifier…).

§ 4 · Comparison

Why TALL, specifically.

Commercial platforms are often proprietary and costly; open‑source alternatives are frequently limited by legacy interfaces or rigid data formats. TALL extends the open‑source ecosystem by combining the flexibility of R packages with Shiny's interactivity — every analytical step is documented, reproducible and exportable.

Feature	TALL	IRaMuTeQ	KH Coder	quanteda	AntConc
Multilingual support
Languages supported	56	~10	13	—	—
Pre‑trained models	87	None	None	None	None
PoS tagging & lemmatisation	✓	✓	✓	—	—
Analytical methods
Topic modeling (LDA/CTM/STM)	✓	—	—	Partial	—
Word embeddings	✓	—	—	—	—
Sentiment analysis	✓	—	—	—	—
Reinert clustering	✓	✓	—	—	—
AI & reproducibility
AI‑assisted interpretation	✓	—	—	—	—
FAIR‑compliant	✓	Partial	Partial	✓	—
Last major update	2026	2020	2017	2025	2024

Adapted from Aria et al. (2026), Table 1. Full comparison available in the paper.

§ 5 · Cite

Published in SoftwareX

The full description of TALL — architecture, benchmarks, illustrative examples — is published as an open‑access Elsevier SoftwareX article. Please cite it if you use TALL in your research.

Reference · 2026

M. Aria, M. Spano, L. D'Aniello, C. Cuccurullo, M. Misuraca

TALL: Text Analysis for All — an interactive R‑Shiny application for exploring, modeling, and visualizing textual data

SoftwareX, 2026 · K‑Synth & University of Naples Federico II

DOI 10.1016/j.softx.2026.102590

