🧪 Use Cases
This page showcases practical examples of how TALL can be applied to real-world text corpora. Each case walks through selected phases of the TALL workflow, from data import and preprocessing to insight generation and interpretation, with support from TALL AI.
1. 🎬 BBC News
Goal
Explore themes and vocabulary, and summarize information in entertainment news articles from the BBC.
Dataset
A curated collection of 386 short news stories from the Entertainment section of BBC News (in English).
Workflow
- Import the dataset directly from TALL's built-in sample collections and write a brief description of the data in the TALL AI box.
- Preprocess the texts with an English-specific NLP pipeline: tokenization and PoS tagging with the appropriate English language model.
- Automatically create multi-word expressions with the RAKE algorithm, then add all the generated multi-words to the dataset.
- Select the PoS categories to keep: verbs, nouns, proper nouns, adjectives, and multi-words.
- Explore the lexicon by visualizing the vocabulary through word clouds.
- Use Word in Context on the term "million dollar baby" and ask TALL AI to interpret the results.
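Conceptually, Word in Context resembles a keyword-in-context (KWIC) concordance: every occurrence of a term is shown with a few words of surrounding text. As a rough illustration of that idea, and not TALL's own implementation, the Python sketch below scans texts for a multi-word phrase; the `kwic` function, the crude regex tokenizer, the window size, and the two sample sentences are all hypothetical.

```python
import re

def kwic(texts, phrase, window=5):
    """Return (left context, match, right context) for each occurrence of `phrase`.

    Hypothetical helper: case-insensitive match on simple word tokens.
    """
    target = phrase.lower().split()
    n = len(target)
    hits = []
    for text in texts:
        # Crude tokenizer: runs of word characters and apostrophes.
        tokens = re.findall(r"[\w']+", text)
        lowered = [t.lower() for t in tokens]
        for i in range(len(tokens) - n + 1):
            if lowered[i:i + n] == target:
                left = " ".join(tokens[max(0, i - window):i])
                match = " ".join(tokens[i:i + n])
                right = " ".join(tokens[i + n:i + n + window])
                hits.append((left, match, right))
    return hits

# Invented sample sentences, for illustration only.
docs = [
    "The film Million Dollar Baby opened to strong reviews.",
    "Critics discussed Million Dollar Baby at length.",
]
for left, match, right in kwic(docs, "million dollar baby", window=3):
    print(f"{left:>20} [{match}] {right}")
```

Matching is done on lower-cased tokens while the original casing is kept for display, so the concordance lines read like the source text.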
Topic Modeling
- Apply LDA (Latent Dirichlet Allocation) to detect latent topics, then ask TALL AI to suggest a label for each topic.
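LDA treats each document as a mixture of topics and each topic as a probability distribution over words. This page does not say which estimator TALL uses, so the following is only a generic, stdlib-only sketch of LDA fitted by collapsed Gibbs sampling on an invented four-document corpus; the function name `lda_gibbs`, the hyperparameters (`alpha`, `beta`, `iters`, `seed`), and the data are illustrative assumptions.

```python
import random
from collections import Counter

def lda_gibbs(docs, k, alpha=0.1, beta=0.01, iters=200, seed=42):
    """Fit LDA with collapsed Gibbs sampling (hypothetical helper).

    docs: list of token lists. Returns (top words per topic,
    document-topic proportions theta). Defaults are illustrative only.
    """
    rng = random.Random(seed)
    v = len({w for doc in docs for w in doc})  # vocabulary size
    ndk = [[0] * k for _ in docs]              # tokens of topic j in document d
    nkw = [Counter() for _ in range(k)]        # count of word w in topic j
    nk = [0] * k                               # total tokens in topic j
    z = []                                     # current topic of every token

    # Random initial assignment of each token to a topic.
    for d, doc in enumerate(docs):
        z.append([])
        for w in doc:
            t = rng.randrange(k)
            z[d].append(t)
            ndk[d][t] += 1
            nkw[t][w] += 1
            nk[t] += 1

    for _ in range(iters):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                t = z[d][i]
                # Take the token out of the counts before resampling it.
                ndk[d][t] -= 1
                nkw[t][w] -= 1
                nk[t] -= 1
                # Full conditional p(z_i = j | all other assignments), unnormalized.
                weights = [
                    (ndk[d][j] + alpha) * (nkw[j][w] + beta) / (nk[j] + v * beta)
                    for j in range(k)
                ]
                t = rng.choices(range(k), weights=weights)[0]
                z[d][i] = t
                ndk[d][t] += 1
                nkw[t][w] += 1
                nk[t] += 1

    top_words = [[w for w, c in nkw[j].most_common(3) if c > 0] for j in range(k)]
    theta = [
        [(ndk[d][j] + alpha) / (len(doc) + k * alpha) for j in range(k)]
        for d, doc in enumerate(docs)
    ]
    return top_words, theta

# Invented toy corpus: two film documents and two music documents.
docs = [
    "film actor oscar film award".split(),
    "album singer band music album".split(),
    "actor film director oscar".split(),
    "band music concert singer".split(),
]
topics, theta = lda_gibbs(docs, k=2)
for j, words in enumerate(topics):
    print(f"Topic {j + 1}: {', '.join(words)}")
```

On a corpus this small the result can vary with the seed and the number of iterations, so treat it as a demonstration of the sampling loop rather than of topic quality.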
Summarization
- Use TextRank to generate a concise summary of a document by extracting its most relevant sentences.
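TextRank builds a graph whose nodes are sentences and whose edge weights measure how much two sentences overlap, runs a PageRank-style iteration over that graph, and extracts the highest-scoring sentences. The stdlib-only Python sketch below illustrates the idea; the function name, the naive regex sentence splitter, the log-normalized word-overlap similarity, the damping factor of 0.85, and the sample text are assumptions for illustration, not TALL's implementation.

```python
import math
import re

def textrank_summary(text, n_sentences=1, damping=0.85, iters=50):
    """Return the n highest-scoring sentences of `text`, in original order."""
    # Naive splitter: a sentence ends at ., ! or ? followed by whitespace.
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]
    words = [set(re.findall(r"\w+", s.lower())) for s in sentences]
    n = len(sentences)

    def similarity(a, b):
        # Shared words, normalized by log sentence lengths (+1 avoids log(1) = 0).
        overlap = len(words[a] & words[b])
        denom = math.log(len(words[a]) + 1) + math.log(len(words[b]) + 1)
        return overlap / denom if denom else 0.0

    # Symmetric weighted adjacency matrix (no self-loops).
    w = [[similarity(i, j) if i != j else 0.0 for j in range(n)] for i in range(n)]
    out_weight = [sum(row) for row in w]

    # Weighted PageRank: WS(i) = (1 - d) + d * sum_j w[j][i] / out(j) * WS(j)
    scores = [1.0] * n
    for _ in range(iters):
        scores = [
            (1 - damping) + damping * sum(
                w[j][i] / out_weight[j] * scores[j]
                for j in range(n) if out_weight[j] > 0
            )
            for i in range(n)
        ]

    best = sorted(range(n), key=lambda i: scores[i], reverse=True)[:n_sentences]
    return [sentences[i] for i in sorted(best)]

# Invented sample text.
text = (
    "The festival opened with a new film by a veteran director. "
    "Critics said the new film was the highlight of the festival. "
    "Ticket sales rose sharply. "
    "The director thanked the festival audience for supporting the film."
)
print(textrank_summary(text, n_sentences=2))
```

A sentence that shares no words with the rest of the text (here the ticket-sales sentence) receives no incoming weight and stays at the minimum score of 1 - d, so it is never selected.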
2. Bibliometrix Abstracts
Goal
Analyze the conceptual landscape of scientific literature that references the Bibliometrix R package.
Dataset
A corpus of 444 scientific abstracts that cite Bibliometrix, enriched with metadata such as authors, publication year, and journal name. The abstracts have already been tokenized and PoS-tagged with TALL.
Workflow
- Import the .tall file. If the dataset has already been processed and exported from TALL, re-importing the .tall file will automatically restore the session and display a summary of all previously completed analytical steps.
- Filter the abstracts to include only articles published between 2017 and 2021.
- Perform a lexical and structural analysis by building a co-word network to detect conceptual clusters, then ask TALL AI to interpret them.
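A co-word network links terms that occur in the same document, with edge weights counting those co-occurrences; groups of densely linked terms suggest conceptual themes. The Python sketch below first applies a year filter, then builds weighted co-occurrence edges and, as a deliberate simplification, reports the connected components of the thresholded graph instead of running a proper community-detection algorithm (such as Louvain). The record layout, the `min_weight` threshold, the function names, and the data are hypothetical.

```python
from collections import Counter
from itertools import combinations

def filter_by_year(records, start, end):
    """Keep records whose 'year' lies in the closed interval [start, end]."""
    return [r for r in records if start <= r["year"] <= end]

def coword_edges(token_lists, min_weight=2):
    """Weighted co-word edges: number of documents in which two terms co-occur."""
    counts = Counter()
    for tokens in token_lists:
        # set(): a pair is counted at most once per document.
        for a, b in combinations(sorted(set(tokens)), 2):
            counts[(a, b)] += 1
    return {pair: w for pair, w in counts.items() if w >= min_weight}

def connected_groups(edges):
    """Connected components of the thresholded graph: a crude stand-in for
    the community detection a real co-word analysis would use."""
    adjacency = {}
    for a, b in edges:
        adjacency.setdefault(a, set()).add(b)
        adjacency.setdefault(b, set()).add(a)
    seen, groups = set(), []
    for start in sorted(adjacency):
        if start in seen:
            continue
        stack, group = [start], set()
        while stack:
            node = stack.pop()
            if node not in group:
                group.add(node)
                stack.extend(adjacency[node] - group)
        seen |= group
        groups.append(sorted(group))
    return groups

# Invented records: a year plus already-lemmatized keywords.
records = [
    {"year": 2016, "tokens": ["citation", "analysis"]},
    {"year": 2018, "tokens": ["bibliometric", "analysis", "citation"]},
    {"year": 2019, "tokens": ["bibliometric", "analysis", "journal"]},
    {"year": 2020, "tokens": ["science", "mapping", "network"]},
    {"year": 2021, "tokens": ["science", "mapping", "visualization"]},
    {"year": 2023, "tokens": ["science", "analysis"]},
]
subset = filter_by_year(records, 2017, 2021)
edges = coword_edges([r["tokens"] for r in subset], min_weight=2)
print(edges)                    # {('analysis', 'bibliometric'): 2, ('mapping', 'science'): 2}
print(connected_groups(edges))  # [['analysis', 'bibliometric'], ['mapping', 'science']]
```

Including the 2016 record would add a second analysis/citation co-occurrence and pull `citation` into the first group, which is why the year filter is applied before the network is built.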
3. ✈️ US Airlines Tweets
Goal: Understand customer feedback and emotional tone in airline-related conversations on Twitter.
Dataset
14,640 tweets mentioning major U.S. airlines, collected in February 2015. The dataset includes tweet content, airline names, and metadata such as time and location.
Workflow
- Import the raw CSV file directly into TALL
- Preprocess the corpus using a domain-specific PoS tagging model trained on social media language
- Tag special entities such as @mentions, #hashtags, and emojis for semantic enrichment
- Build an ego network around the #fail hashtag to identify co-occurring complaint patterns
- Perform sentiment analysis with the NRC Emotion Lexicon to detect emotional polarity and dominant emotions (e.g., anger, trust, fear)
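Tagging @mentions and #hashtags is largely pattern matching, and an ego network keeps only the links between one focal node and the terms that co-occur with it. The Python sketch below uses two simple regular expressions and counts, across the tweets that contain #fail, how often each other mention or hashtag appears alongside it; those counts are the ego-to-neighbor edge weights. The patterns, function names, and tweets are invented for illustration, emoji tagging is left out because it needs Unicode emoji data rather than a short regex, and TALL's own tagging rules may differ.

```python
import re
from collections import Counter

# Hypothetical patterns: '@' or '#' followed by word characters.
MENTION = re.compile(r"@\w+")
HASHTAG = re.compile(r"#\w+")

def tag_entities(tweet):
    """Return lower-cased mentions and hashtags found in a tweet."""
    return {
        "mentions": [m.lower() for m in MENTION.findall(tweet)],
        "hashtags": [h.lower() for h in HASHTAG.findall(tweet)],
    }

def ego_network(tweets, ego):
    """Edge weights ego -> neighbor: tweets in which each entity co-occurs with `ego`."""
    ego = ego.lower()
    weights = Counter()
    for tweet in tweets:
        tags = tag_entities(tweet)
        entities = set(tags["mentions"] + tags["hashtags"])
        if ego in entities:
            weights.update(entities - {ego})
    return weights

# Invented tweets, for illustration only.
tweets = [
    "@united lost my bag again #fail #lostluggage",
    "Two hour delay @united #fail #delay",
    "Great crew today @SouthwestAir #thanks",
    "@AmericanAir cancelled my flight #fail #delay",
]
print(ego_network(tweets, "#fail").most_common())
```

Tweets without the focal hashtag contribute nothing, so entities such as #thanks never enter the ego network.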
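Lexicon-based emotion analysis, the approach behind the NRC Emotion Lexicon, looks up each token in a word-to-category dictionary and tallies the categories it triggers; the NRC categories are eight basic emotions plus positive and negative sentiment. The sketch below shows only that counting step: `TOY_LEXICON` is invented and is not an excerpt of the NRC data, the function names are hypothetical, and no lemmatization is applied, so "delayed" would not match "delay".

```python
import re
from collections import Counter

# Invented toy lexicon (NOT the real NRC data): word -> associated categories.
TOY_LEXICON = {
    "delay": {"anger", "negative"},
    "lost": {"sadness", "negative"},
    "cancelled": {"anger", "sadness", "negative"},
    "great": {"joy", "trust", "positive"},
    "thanks": {"joy", "positive"},
}

def emotion_profile(text, lexicon=TOY_LEXICON):
    """Count the lexicon categories triggered by the tokens of `text`."""
    counts = Counter()
    for token in re.findall(r"[a-z']+", text.lower()):
        counts.update(lexicon.get(token, ()))
    return counts

def polarity(counts):
    """Positive minus negative hits: > 0 leans positive, < 0 leans negative."""
    return counts["positive"] - counts["negative"]

profile = emotion_profile("Flight cancelled after a long delay, bag lost too")
print(profile.most_common(), polarity(profile))
```

Aggregating these profiles per airline or per day gives the kind of dominant-emotion comparison the step above describes.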
4. 🧾 Wikipedia Pages
Goal: Discover sub-themes and semantic structures within machine learning content.
Dataset
A collection of 15 Wikipedia pages related to machine learning, retrieved directly via TALL's import interface.
Workflow
- Import Wikipedia articles on "machine learning" through the integrated TALL import module.
- Generate multi-word expressions using the RAKE algorithm to extract domain-relevant collocations.
- Explore lexical insights, including document and sentence length, word frequency distribution, and keyword clouds.
- Build a co-word network to visualize thematic associations, with TALL AI support for identifying latent sub-themes in machine learning discourse.
- Apply topic modeling (LDA) to extract six key topics and their representative terms, enriched by TALL AI interpretation and summary.
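RAKE (Rapid Automatic Keyword Extraction) splits text into candidate phrases at stopwords and punctuation, scores each word as its degree (the total length of the candidate phrases it appears in) divided by its frequency, and scores a phrase as the sum of its word scores. The stdlib-only Python sketch below follows those steps; the stopword list is a tiny hypothetical one, the sample text is invented, and TALL's RAKE step may use different stopwords, filters, and thresholds.

```python
import re
from collections import Counter

# Hypothetical, deliberately tiny stopword list for this example.
STOPWORDS = {"a", "an", "and", "are", "as", "for", "in", "is", "of",
             "on", "the", "to", "uses", "with"}

def rake(text, stopwords=STOPWORDS, top_n=3):
    """Return the top_n (phrase, score) pairs ranked by RAKE score."""
    # 1. Split at punctuation, then at stopwords, to get candidate phrases.
    phrases = []
    for fragment in re.split(r"[.,;:!?()]", text.lower()):
        current = []
        for word in re.findall(r"[a-z0-9-]+", fragment):
            if word in stopwords:
                if current:
                    phrases.append(tuple(current))
                current = []
            else:
                current.append(word)
        if current:
            phrases.append(tuple(current))

    # 2. Word score = degree / frequency, where degree sums the lengths of
    #    all candidate phrases containing the word (the word itself included).
    freq, degree = Counter(), Counter()
    for phrase in phrases:
        for word in phrase:
            freq[word] += 1
            degree[word] += len(phrase)
    word_score = {w: degree[w] / freq[w] for w in freq}

    # 3. Phrase score = sum of its word scores; ties broken alphabetically.
    scored = {p: sum(word_score[w] for w in p) for p in set(phrases)}
    ranked = sorted(scored.items(), key=lambda kv: (-kv[1], kv[0]))
    return [(" ".join(p), round(s, 2)) for p, s in ranked[:top_n]]

# Invented sample text.
text = ("Machine learning is a field of artificial intelligence. "
        "Supervised learning uses labeled data, and unsupervised learning "
        "finds structure in unlabeled data.")
print(rake(text))
# [('unsupervised learning finds structure', 14.67), ('machine learning', 4.67), ('supervised learning', 4.67)]
```

The top result shows a known trait of RAKE: summing word scores favors longer phrases, so practical setups often filter candidates by length or frequency before keeping them as multi-words.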
✨ Your Own Use Case
TALL is flexible and scalable for use in:
- Social science research
- Digital humanities
- Public policy analysis
- Journalism and media studies
- Education and learning analytics
- Marketing and brand sentiment monitoring
Have a use case to share? Contribute on GitHub or get in touch via the About page.