Use cases

Four corpora.
One workflow.

Four worked examples drawn from the paper and supplementary materials: thematic analysis of news, sentiment detection on social media, science mapping of academic literature, and topic modeling of encyclopedic text. All four run end‑to‑end in TALL, without writing code.

Case · 01

Thematic analysis on BBC News.

The corpus is a collection of 386 short news articles from the BBC News corpus, retrieved from the BBC website (Greene & Cunningham, 2005). After loading, TALL applies standard pre‑processing (tokenisation and lemmatisation via the EN GUM model) and retains only adjectives, nouns, proper nouns, and verbs.
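TALL does this through its interface, but the part-of-speech filter itself is simple to picture. The sketch below is illustrative only: it assumes tokens arrive as (lemma, tag) pairs carrying Universal Dependencies tags, and the function name and toy sentence are hypothetical, not part of TALL.

```python
# Universal Dependencies tags for the four retained word classes.
KEPT_UPOS = {"ADJ", "NOUN", "PROPN", "VERB"}

def filter_by_pos(tagged_tokens, kept=KEPT_UPOS):
    """Keep lemmas whose UPOS tag is in `kept`.

    `tagged_tokens` is a list of (lemma, upos) pairs; this input
    shape is an assumption made for the sketch.
    """
    return [lemma for lemma, upos in tagged_tokens if upos in kept]

# Hand-tagged toy sentence: "The film won two awards in London."
sentence = [
    ("the", "DET"), ("film", "NOUN"), ("win", "VERB"), ("two", "NUM"),
    ("award", "NOUN"), ("in", "ADP"), ("London", "PROPN"), (".", "PUNCT"),
]
print(filter_by_pos(sentence))  # ['film', 'win', 'award', 'London']
```

Function words, numerals, and punctuation drop out, leaving the content-bearing lemmas that the later steps work on.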

English GUM model · 386 docs · Louvain
Community detection on BBC News corpus
Figure · Network: Five dense sub‑graphs from community detection on the entertainment sub‑corpus: Awards, Music, Industry, Media, General News.
Step 1 · Import

Built‑in BBC News dataset loaded, with the TALL AI context populated automatically.

BBC News import
Step 2 · Tokenise & PoS

EN GUM model selected. Adjectives, nouns, proper nouns, verbs retained.

Tokenisation
Step 3 · Multi‑word

RAKE extracts salient collocations; manual curation preserves domain phrases.

Multi-words
Step 4 · Overview

Descriptive statistics, lexical richness, and TF‑IDF for discriminating terms.

Word cloud
Step 5 · Words in Context

Concordance view surfaces co‑text environments for salient terms.

Words in context
Step 6 · Topic modeling

LDA recovers thematic structure across the full news collection.

Topic modeling
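For readers curious about what the concordance view in Step 5 computes, a keyword-in-context listing can be sketched in a few lines. This is a minimal illustration under simple assumptions (whitespace tokens, a fixed window); the function name and sample text are hypothetical and do not reflect how TALL implements it.

```python
def kwic(tokens, keyword, window=3):
    """Return (left context, keyword, right context) for each hit."""
    hits = []
    target = keyword.lower()
    for i, tok in enumerate(tokens):
        if tok.lower() == target:
            left = " ".join(tokens[max(0, i - window):i])
            right = " ".join(tokens[i + 1:i + 1 + window])
            hits.append((left, tok, right))
    return hits

# Invented sample text, not taken from the BBC corpus.
text = "the film won best picture while another film missed the award"
for left, kw, right in kwic(text.split(), "film", window=2):
    print(f"{left:>15} | {kw} | {right}")
```

Aligning the keyword in a central column is what makes recurring co-text patterns easy to spot by eye.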
TALL AI · Interpretation

"The entertainment news focuses heavily on film awards and the movie industry. Music‑related news forms a significant portion of the content — organised around performance, industry and sales."

Polarity analysis on US Airline Tweets
Figure · Polarity: Top word distributions by polarity category. Negative tweets cluster around service disruptions (delay, cancel, miss), while positive tweets express gratitude (thank, appreciate).
Case · 02

Sentiment on airline tweets.

The corpus contains 14,640 tweets directed at six major US airlines during February 2015 (Figure Eight / Kaggle). TALL detects platform‑specific entities (hashtags, mentions, emojis) and runs polarity detection via the Hu‑Liu opinion lexicon.

127 unique emojis · 1.9k hashtags · 930 mentions · 28% negative sentiment
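Counts like these come from tagging platform-specific entities. The sketch below shows one rough way to do so with regular expressions; it is not TALL's implementation. The patterns, the emoji code-point ranges (which cover only common blocks), and the example tweets are all assumptions made for illustration.

```python
import re
from collections import Counter

HASHTAG = re.compile(r"#\w+")
MENTION = re.compile(r"@\w+")

def is_emoji(ch):
    """Rough check covering common emoji blocks, not the full Unicode set."""
    cp = ord(ch)
    return 0x1F300 <= cp <= 0x1FAFF or 0x2600 <= cp <= 0x27BF

def special_entities(tweets):
    """Count hashtags, mentions, and emojis across an iterable of tweets."""
    counts = {"hashtag": Counter(), "mention": Counter(), "emoji": Counter()}
    for tweet in tweets:
        counts["hashtag"].update(h.lower() for h in HASHTAG.findall(tweet))
        counts["mention"].update(m.lower() for m in MENTION.findall(tweet))
        counts["emoji"].update(ch for ch in tweet if is_emoji(ch))
    return counts

# Invented example tweets, not taken from the corpus.
# \U0001F621 is a pouting face, \U0001F600 a grinning face.
tweets = [
    "@united flight delayed again \U0001F621 #fail",
    "Thanks @JetBlue for the upgrade! \U0001F600 #happy",
    "@united lost my bag #fail",
]
counts = special_entities(tweets)
print(counts["mention"].most_common(1))  # [('@united', 2)]
print(counts["hashtag"].most_common(1))  # [('#fail', 2)]
```

Each `Counter` gives the frequency ranking directly through `most_common()`.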
Step 1 · Import

Built‑in corpus loaded; TALL AI receives pre‑populated dataset context.

Import
Step 2 · EWT tokenisation

EN EWT model handles social‑media text. Hashtags, mentions, emojis retained as entities.

Tokenisation
Step 3 · Special entities

Automatic tagging of hashtags, mentions, emojis with frequency ranking.

Special entities
Step 4 · Words in Context

Concordance for @AmericanAirlines reveals customer‑service discourse patterns.

Words in context
Step 5 · Polarity

Polarity scored with the Hu‑Liu lexicon and shown as a donut chart with five levels, from Very Negative to Very Positive.

Polarity
Step 6 · TALL AI

Automated key takeaways & managerial implications.

AI polarity interpretation
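The lexicon-based polarity in Step 5 can be pictured as counting opinion words and mapping the balance to a label. The sketch below is a simplified illustration, not TALL's scoring. The word lists are tiny hypothetical stand-ins for the Hu‑Liu lexicon, and both the cut-offs and the three middle labels are assumptions.

```python
# Tiny stand-in word lists; the real Hu-Liu lexicon has thousands of entries.
POSITIVE = {"thank", "appreciate", "great", "love"}
NEGATIVE = {"delay", "cancel", "miss", "rude"}

def polarity_score(lemmas):
    """(pos - neg) / (pos + neg), or 0.0 when no opinion word is found."""
    pos = sum(1 for w in lemmas if w in POSITIVE)
    neg = sum(1 for w in lemmas if w in NEGATIVE)
    return (pos - neg) / (pos + neg) if pos + neg else 0.0

def polarity_label(score):
    """Map a score in [-1, 1] to five levels; the cut-offs are illustrative."""
    if score <= -0.6:
        return "Very Negative"
    if score <= -0.2:
        return "Negative"
    if score < 0.2:
        return "Neutral"
    if score < 0.6:
        return "Positive"
    return "Very Positive"

print(polarity_label(polarity_score(["flight", "delay", "cancel"])))
print(polarity_label(polarity_score(["thank", "great", "delay"])))
```

The first call prints "Very Negative" (two negative words, no positive ones) and the second prints "Positive" (a score of one third).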
TALL AI · Key takeaways

"The dominance of delay and cancel in negative tweets suggests airlines should focus on improving their processes for handling flight disruptions. Both positive and negative tweets highlight the importance of customer service — the presence of gratitude‑related terms in positive sentiment indicates that effective customer service responses can mitigate dissatisfaction and generate positive outcomes."

Case · 03

Science mapping from abstracts.

TALL natively ingests biblioshiny files from the bibliometrix package. Scientific abstracts flow into the same pre‑processing pipeline — multilingual tokenisation, lemmatisation, co‑word analysis — and can be layered with citation‑based metrics for hybrid semantic/bibliometric analysis.

A natural fit for systematic literature reviews, conceptual‑structure analyses, and policy‑oriented research synthesis.
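At its core, co‑word analysis counts how often two terms appear in the same document. The sketch below illustrates that counting step only, using invented keyword lists; the function name is hypothetical, and any normalisation or network layout TALL applies on top is not shown.

```python
from collections import Counter
from itertools import combinations

def cooccurrence(docs):
    """Count unordered term pairs that appear in the same document."""
    pairs = Counter()
    for doc in docs:
        # Sorting the unique terms gives each pair one canonical order.
        for a, b in combinations(sorted(set(doc)), 2):
            pairs[(a, b)] += 1
    return pairs

# Invented keyword lists standing in for pre-processed abstracts.
abstracts = [
    ["text", "mining", "network"],
    ["text", "mining", "topic"],
    ["network", "topic", "text"],
]
pairs = cooccurrence(abstracts)
print(pairs[("mining", "text")])     # 2
print(pairs[("mining", "network")])  # 1
```

The resulting pair counts are the edge weights of a co‑word network, on which community detection can then be run.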

biblioshiny import · co‑word analysis · citation filters
Figure · Bibliometrix: Biblioshiny ingestion, corpus filtering, co‑word analysis, and TALL AI interpretation of the emerging conceptual structure.

Figure · Wikipedia: An encyclopedic corpus processed through import, overview, multi‑word extraction, co‑word analysis, and topic modeling, ending with TALL AI's topic interpretation.

Case · 04

Topic discovery on Wikipedia.

An encyclopedic corpus pushes TALL through the full analytical narrative: from basic frequency and TF‑IDF to multi‑word extraction, co‑word networks, and finally LDA topic modeling with TALL AI producing concise narrative labels for each topic.

Ideal for digital humanities, educational‑content audits, and cross‑linguistic cultural analysis given TALL's 56‑language coverage.
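The TF‑IDF step in this narrative rewards terms that are frequent in one document but rare across the corpus. The sketch below uses one common formulation (relative term frequency times log(N / df)); the exact variant TALL applies is not stated here, so treat the formula, the function name, and the toy documents as assumptions.

```python
import math
from collections import Counter

def tf_idf(docs):
    """docs: list of token lists. Returns one {term: weight} dict per doc.

    Uses relative term frequency and idf = log(N / df); smoothed or
    normalised variants also exist.
    """
    n_docs = len(docs)
    # Document frequency: number of documents containing each term.
    df = Counter(term for doc in docs for term in set(doc))
    weights = []
    for doc in docs:
        counts = Counter(doc)
        total = len(doc)
        weights.append({
            term: (count / total) * math.log(n_docs / df[term])
            for term, count in counts.items()
        })
    return weights

# Invented toy documents, already reduced to lemmas.
docs = [
    ["film", "award", "actor"],
    ["film", "music", "album"],
    ["film", "award", "music"],
]
weights = tf_idf(docs)
print(max(weights[0], key=weights[0].get))  # actor
```

A term that occurs in every document, such as "film" here, gets a weight of zero, while "actor", unique to the first document, scores highest there.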

topic modeling · co‑occurrence · multi‑word · AI topic labels
§ · Where TALL is used

Across disciplines where text is evidence.

Social & political sciences

Large‑scale thematic and sentiment analyses of institutional reports, parliamentary proceedings, and online debates.

Digital humanities

Multilingual architecture for cross‑linguistic corpus studies and computational cultural analysis across 56 languages.

Bibliometrics

Integration with bibliometrix combines semantic and citation‑based analyses for science mapping and literature reviews.

Healthcare research

Analyses of patient feedback, clinical records and satisfaction surveys (Valentino et al., 2025).

Education

Currently used in postgraduate and doctoral programs at several Italian universities for teaching data‑driven text methods.

Industry & market research

Sentiment tracking and theme discovery across customer feedback, reviews, and social media at scale.

Your corpus is next.