Version 1.0.0 · Released April 2026

Text analysis,
for all of us.

An interactive R‑Shiny application that unifies import, cleaning, pre‑processing, statistical analysis and visualisation of textual data — designed so researchers without programming skills can move from raw text to interpretable results in minutes.

§ Abstract

A transparent, code‑free platform for turning unstructured text into structured, reproducible insight.

Analyzing unstructured textual data is essential across disciplines — yet often requires programming skills many researchers lack. TALL bridges the gap by integrating data import, cleaning, pre‑processing, statistical analysis, and visualisation into a single Shiny application.

It supports tokenisation, lemmatisation, and Part‑of‑Speech tagging across 56 languages, and offers topic modeling, correspondence and cluster analysis, co‑occurrence networks, polarity detection, word embeddings, and text summarisation — all documented, exportable, and reproducible by design.

The framework is FAIR‑compliant, open‑source under the MIT license, and available on CRAN. Performance‑critical routines are implemented in C++ via Rcpp, enabling scalable analysis of corpora spanning four orders of magnitude in size on standard research hardware.

A native integration with Google Gemini — TALL AI — produces context‑aware natural‑language explanations alongside each numerical output, while preserving strict data minimisation: raw textual content never leaves the user's environment.

#R‑Shiny #NLP #text mining #open science #FAIR
§ 1 · The workflow

From raw text to interpretable findings in one continuous pipeline.

Figure 1 TALL's modular architecture organises a complete text‑analytics pipeline into three coherent layers — Data Import & Management, Pre‑processing, and Analysis — connected by a shared data schema and an AI‑assisted interpretation layer.
§ 2 · Modules

A complete analytical suite —
everything you need, in one place.

Fourteen integrated modules covering exploratory, inferential and interpretive methods. Each can be run independently or chained into a continuous analytical narrative; intermediate results are exportable at every step.

01

Tokenisation, PoS tagging & lemmatisation

Universal‑Dependencies‑based pipelines for 56 languages, implemented via udpipe, with models cached locally for offline use.

udpipe · UD v2.15 · 87 models
02

Multi‑word expression detection

Four collocation algorithms — including RAKE and Pointwise Mutual Information — for identifying salient n‑grams and candidate keyphrases. C++‑accelerated.

RAKE · PMI · IS‑score
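TALL's collocation scoring runs in C++ for speed, but the statistic behind PMI fits in a few lines. A minimal base‑R sketch (toy tokens; `pmi_bigrams` is an illustrative helper, not TALL's implementation):

```r
# Pointwise Mutual Information for adjacent word pairs:
#   PMI(x, y) = log2( p(x, y) / (p(x) * p(y)) )
# High PMI = the pair co-occurs more often than chance predicts.
pmi_bigrams <- function(tokens) {
  n <- length(tokens)
  p_word <- table(tokens) / n
  bigrams <- paste(tokens[-n], tokens[-1])
  p_bigram <- table(bigrams) / (n - 1)
  parts <- strsplit(names(p_bigram), " ", fixed = TRUE)
  pmi <- mapply(function(p, w) log2(p / (p_word[[w[1]]] * p_word[[w[2]]])),
                p_bigram, parts)
  sort(pmi, decreasing = TRUE)
}

tokens <- c("new", "york", "is", "big", "and", "new", "york", "is", "busy")
head(pmi_bigrams(tokens))
```

Real collocation tools also filter by frequency, because raw PMI over‑rewards rare pairs.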
03

Topic modeling

LDA, CTM, and STM models with automatic K selection and coherence & exclusivity diagnostics. Parallelised via future.

LDA · CTM · STM
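TALL wraps topic modeling in a point‑and‑click module; for readers curious about the underlying fit, here is a minimal sketch in plain R using the topicmodels package (assumed installed; TALL's automatic K selection and diagnostics are not reproduced here):

```r
# Fit a small LDA model on the AssociatedPress document-term matrix,
# a demo dataset shipped with the topicmodels package.
library(topicmodels)
data("AssociatedPress", package = "topicmodels")

dtm <- AssociatedPress[1:50, ]                     # small slice keeps the demo fast
fit <- LDA(dtm, k = 4, control = list(seed = 42))  # 4 topics, fixed seed

terms(fit, 5)     # top 5 terms for each of the 4 topics
topics(fit)[1:5]  # most likely topic for the first 5 documents
```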
04

Network analysis & community detection

Co‑occurrence networks with Louvain community detection, dependency‑based word graphs, and syntactic relation filters via igraph and visNetwork.

Louvain · dependency · igraph
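A minimal sketch of this step using igraph directly (toy edge list; in TALL the co‑occurrence counts come from your corpus and visNetwork renders the result):

```r
library(igraph)

# Hypothetical word co-occurrence counts (derived from a corpus in real use)
edges <- data.frame(
  from   = c("text", "text", "topic", "topic", "graph", "graph"),
  to     = c("mining", "corpus", "model", "lda", "node", "edge"),
  weight = c(5, 3, 4, 6, 2, 4)
)

g    <- graph_from_data_frame(edges, directed = FALSE)
comm <- cluster_louvain(g, weights = E(g)$weight)  # Louvain community detection

membership(comm)  # community id per word
sizes(comm)       # words per community
```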
05

Word embeddings

Train Word2Vec representations directly on your corpus; visualise semantic neighbourhoods via UMAP.

Word2Vec · UMAP
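A sketch of training embeddings in plain R with the word2vec package (a reasonable stand‑in for this module, not necessarily TALL's internals; the UMAP projection, e.g. via the uwot package, is omitted):

```r
library(word2vec)

# Tiny repeated toy corpus -- real corpora should be far larger
txt <- rep(c("the cat sat on the mat",
             "the dog sat on the rug",
             "cats and dogs are pets"), 50)

model <- word2vec(x = txt, type = "skip-gram",
                  dim = 25, iter = 10, min_count = 1)

emb <- as.matrix(model)                              # one row per vocabulary word
predict(model, "cat", type = "nearest", top_n = 3)   # nearest semantic neighbours
```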
06

Polarity & emotion detection

Sentiment dashboards using the Hu‑Liu, Loughran‑McDonald, and NRC Emotion lexicons (eight emotions: anger, anticipation, disgust, fear, joy, sadness, surprise, trust).

Hu‑Liu · Loughran‑McDonald · NRC EmoLex
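Outside TALL, the same NRC‑style scoring can be approximated in a couple of lines with the syuzhet package (one R option among several; not necessarily what TALL uses internally):

```r
library(syuzhet)

sentences <- c("I love this wonderful, joyful result",
               "The delay made everyone angry and afraid")

# One row per sentence; eight emotion columns plus negative/positive polarity
get_nrc_sentiment(sentences)
```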
07

Summarisation, SVO triplets & more

Extractive (TextRank) and abstractive summarisation, Subject‑Verb‑Object triplet extraction from dependency trees, syntactic complexity metrics, noun phrase extraction, Reinert clustering, correspondence analysis, and keyness analysis.

TextRank · SVO · Reinert · CA · keyness
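For the extractive part, the textrank package shows the core idea: rank sentences by how much vocabulary they share. A toy sketch (hand‑built terminology table; in practice it comes from PoS‑tagged output):

```r
library(textrank)

# Sentences to rank
sent <- data.frame(
  textrank_id = 1:3,
  sentence = c("Text mining extracts insight from raw text.",
               "Topic models summarise large corpora.",
               "Text mining and topic models complement each other.")
)

# Which content words occur in which sentence (normally produced by a tagger)
terms <- data.frame(
  textrank_id = c(1, 1, 2, 2, 3, 3, 3),
  lemma       = c("text", "mining", "topic", "model", "text", "mining", "model")
)

tr <- textrank_sentences(data = sent, terminology = terms)
summary(tr, n = 1)  # the single highest-ranked sentence
```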
§ 3 · Release notes

What's new in version 1.0.0

The milestone release that accompanies our SoftwareX paper: seven new analytical modules, overhauled AI integration, and documented performance benchmarks.

New

SVO triplets

Extract Subject‑Verb‑Object structures from dependency trees to reveal who does what to whom across your corpus.

New

Syntactic complexity

Dependency‑tree metrics for measuring and comparing sentence construction across documents and groups.

New

Emotion analysis

Eight‑dimensional emotion profiles via the NRC Emotion Lexicon — well beyond binary polarity.

New

Noun phrase extraction

Identify and rank noun phrases to complement single‑word frequency analysis with concept‑level views.

Updated

Topic modeling

CTM and STM methods added, plus automatic K selection and diagnostics based on coherence & exclusivity.

Updated

Network analysis

Dependency‑based word networks with filters over syntactic relations (subject, object, modifier…).

§ 4 · Comparison

Why TALL, specifically.

Commercial platforms are often proprietary and costly; open‑source alternatives are frequently limited by legacy interfaces or rigid data formats. TALL extends the open‑source ecosystem by combining the flexibility of R packages with Shiny's interactivity — every analytical step is documented, reproducible and exportable.

Feature comparison: TALL · IRaMuTeQ · KH Coder · quanteda · AntConc

Multilingual support
· Languages supported: TALL 56 · IRaMuTeQ ~10 · KH Coder 13
· Pre‑trained models: TALL 87 · none in the other four tools
· PoS tagging & lemmatisation: TALL yes

Analytical methods
· Topic modeling (LDA/CTM/STM): TALL yes · partial in one alternative
· Word embeddings: TALL yes
· Sentiment analysis: TALL yes
· Reinert clustering: TALL yes

AI & reproducibility
· AI‑assisted interpretation: TALL yes
· FAIR‑compliant: TALL yes · partial in two alternatives
· Last major update: TALL 2026 · IRaMuTeQ 2020 · KH Coder 2017 · quanteda 2025 · AntConc 2024

Adapted from Aria et al. (2026), Table 1. Full comparison available in the paper.

§ 5 · Cite

Published in SoftwareX

The full description of TALL — architecture, benchmarks, illustrative examples — is published as an open‑access Elsevier SoftwareX article. Please cite it if you use TALL in your research.

Reference · 2026
M. Aria, M. Spano, L. D'Aniello, C. Cuccurullo, M. Misuraca
TALL: Text Analysis for All — an interactive R‑Shiny application for exploring, modeling, and visualizing textual data
SoftwareX, 2026 · K‑Synth & University of Naples Federico II

Ready to analyse your first corpus?