Data Description
Overview
The table below provides an overview of the scientific production retrieved for Italian Academic Health Science Centres (AHSCs) from the OpenAlex and Dimensions databases. It summarizes the number and percentage of outputs across five categories: publications, clinical trials, grants, patents, and datasets, grouped by AHSC type.
AHSC | OpenAlex | Dimensions | ||||
---|---|---|---|---|---|---|
Publications | Altmetric | Patents | Clinical Trials | Grants | Datasets | |
N (%) | N (%) | N (%) | N (%) | N (%) | N (%) | |
AOU | 60819 (26.54) | 17954 (24.26) | 109 (13.64) | 6446 (40.85) | 116 (13) | 2887 (28.44) |
AOU_SSN | 30151 (13.16) | 8895 (12.02) | 0 | 1584 (10.04) | 18 (2.02) | 363 (3.58) |
IRCCS | 138179 (60.30) | 47156 (63.72) | 690 (86.36) | 7749 (49.11) | 758 (84.98) | 6900 (67.98) |
Total | 229149 | 74005 | 799 | 15779 | 892 | 10150 |
This section provides detailed metadata documentation for the variables used in the SciK-Health project datasets. Each block corresponds to a thematic group of indicators derived from OpenAlex and other integrated sources. The indicators are designed to support reproducible and comparative research on Italian AHSCs. They are measured by AHC
(the short name of the hospital/university institution) and PY
(Publication Year).
📊 Bibliometric Dimension
These indicators describe the scientific production volume, authorship patterns, and collaboration dynamics of AHSCs.
They are primarily computed using publication metadata from OpenAlex and enriched with journal quality metrics from SCOPUS.
🔹 Productivity
Variable | Description |
---|---|
nTotDoc |
Total number of documents/articles published in the year |
meanAUPerArt |
Average number of authors per article |
meanAU_IN_PerArt |
Average number of internal authors (AU_IN ) per article |
meanArtAU_IN |
Average number of articles per internal author |
meanArtPer_AU_IN_FIRST |
Average number of articles per internal author as first author |
meanArtPer_AU_IN_LAST |
Average number of articles per internal author as last author |
meanArtPer_AU_IN_CORR |
Average number of articles per internal author as corresponding author |
ArtOA |
Percentage of Open Access articles |
singleAuArt |
Number of single-author articles |
🔹 Impact
Variable | Description |
---|---|
Q1_nSJR |
Number of articles published in Q1 journals (quartile 1) according to SJR* |
Q2_nSJR |
Number of articles published in Q2 journals according to SJR |
Q3_nSJR |
Number of articles published in Q3 journals according to SJR |
Q4_nSJR |
Number of articles published in Q4 journals according to SJR |
Q1_percSJR |
Percentage of articles published in Q1 journals |
Q2_percSJR |
Percentage of articles published in Q2 journals |
Q3_percSJR |
Percentage of articles published in Q3 journals |
Q4_percSJR |
Percentage of articles published in Q4 journals |
meanTCperArt |
Average number of citations per article |
mean_NTC |
Average Normalized Total Citations (NTC) per institution and year** |
meanCitationTrend |
Average citation trend over time*** |
* Indicator based on data from SCOPUS. Journals assigned to multiple subject categories were selected based on the best percentile.
** Measures the average citation performance of an institution’s publications in a given year, normalized to the global average citation rate for that year. Enables cross-institution comparison.
*** Citation trend is a metric assigned to articles by OpenAlex starting from 2012.
🔹 Collaboration
Variable | Description |
---|---|
InternCoauthorship |
Percentage of articles with international co-authors |
AffCoauthorship |
Percentage of articles with co-authors affiliated with different institutions |
🔹 Talent
These indicators focus on the concentration and distribution of scientific contributions across internal and external authors.
They provide insights into the presence of high-performing individuals, institutional equity, and authorship leadership roles.
Variable | Description |
---|---|
GC_AUIN_wFractional |
Gini concentration index (fractional) on internal authors* |
GC_AUIN_FIRSTname |
Gini concentration index on internal authors in first position (first name)** |
GC_AUIN_LASTname |
Gini concentration index on internal authors in last position** |
GC_AUIN_CORRESP |
Gini concentration index on internal corresponding authors** |
top_AU |
Most frequent authors (internal and external) (names, _freq : frequencies) |
top_AU_wFract |
Most frequent authors (internal and external) by fractional frequency |
top_AU_freq_wFract |
Fractional frequencies associated with each author in top_AU_wFract |
top_AUIN |
Most frequent internal authors (names, _freq : frequencies) |
top_AUIN_wFract |
Most frequent internal authors by fractional frequency |
top_AUIN_freq_wFract |
Fractional frequencies associated with each internal author in top_AUIN_wFract |
topAUbyCit |
Most cited (internal and external) authors (names; _TC total citations value) |
topAUINbyCit |
Most cited internal authors (names; _TC total citations value) |
topAUINbyCit_norm_byYear |
Most cited internal authors by normalized citations per year*** |
topAUINbyCit_TC_norm_byYear |
Normalized citations associated with most cited internal authors in topAUINbyCit_norm_byYear |
* The weight assigned to each author is fractional (considering both publication count and co-authorship).
** Gini concentration computed using the number of published articles per author as weight.
*** Top cited authors, where citation counts are normalized by dividing each article’s citation count by the average citations of all articles published in the same year.
🔹 Topics
This group of indicators summarizes the conceptual and thematic content of the scientific output produced by AHSCs.
It includes hierarchical concept levels (from general disciplines to specific biomedical entities), MeSH terms, authors’ keywords, and text-mined bigrams from titles and abstracts.
Variable | Description |
---|---|
topConcept_L0 |
Most frequent level 0 concepts (general disciplines, e.g., MEDICINE, BIOLOGY) |
topFreq_L0 |
Frequencies associated with top level 0 concepts |
topConcept_L1 |
Most frequent level 1 concepts (subdisciplines or specializations, e.g., CARDIOLOGY, SURGERY) |
topFreq_L1 |
Frequencies associated with top level 1 concepts |
topConcept_L2 |
Most frequent level 2 concepts (more specific topics) |
topFreq_L2 |
Frequencies associated with top level 2 concepts |
topConcept_L3 |
Most frequent level 3 concepts (more detailed, e.g., related to diseases or methods) |
topFreq_L3 |
Frequencies associated with top level 3 concepts |
topConcept_L4 |
Most frequent level 4 concepts (highest specificity, e.g., biological entities, anatomical structures) |
topFreq_L4 |
Frequencies associated with top level 4 concepts |
topMESH |
Most frequent MeSH terms associated with the articles (names; _n frequency) |
topprimary_topic.display_name |
Most frequent primary topics (names; _n frequency) |
topDE |
Most frequent authors’ keywords (names; _n frequency) |
topTI_TM |
Most frequent bigrams extracted from titles (names; _n frequency) |
topAB_TM |
Most frequent bigrams extracted from abstracts (names; _n frequency) |
💡 Industrial Impact & Research Innovation
This section includes a description of the key indicators measured on patents collected from the Dimensions database. These variables describe yearly statistics on patent status, classification, inventorship, and jurisdiction, along with metadata from original documentation.
Variable | Description |
---|---|
n_patentsDistinct |
Number of distinct patents (unique Family ID) per year. |
nPatentStatus_ |
Number of patents by legal status relates to information on the events during the lifetime of a patent application (e.g., Abandoned, Pending, Expired). |
nPatentPubType_Application |
Number of patents of type ‘Application’. |
nPatentPubType_Grant |
Number of patents of type ‘Grant’. |
nPatentPubType_N/A |
Number of patents with unspecified publication type. |
avg_citations_Patent |
Average number of citations received by patents per year. |
avg_Inventors_Patent |
Average number of inventors per patent per year. |
inventor_per_AUIN_100 |
Average number of inventors listed in patents per 100 internal authors (AUIN). |
avg_InstitutesInvolved_Patent |
Average number of institutes involved per patent per year. |
nPatent_singleInventor |
Number of patents with a single inventor. |
n_Patents_CoAff |
Number of patents with institutional co-affiliations (institutions involved > 1). |
n_Jurisdiction_distinct |
Number of distinct jurisdictions where patents were granted. |
nJurisd_ |
Number of patents granted in one of the Jurisdictions (AU = Australia, US = United States, CH = Switzerland, etc.). |
nPatentKC_ |
Number of patents by Kind Code (a standardized alphanumeric code used to classify and categorize patents based on their type and purpose) from DOCDB. |
top_Inventor names_Patent |
Most frequent inventors of the patent (names; _n frequency). |
top_Original assignee names_Patent |
Most frequent organisations that first owned the patent, as they appear in the original document (names; _n frequency). |
top_Fields of Research (ANZSRC 2020)_Patent |
Most frequent Fields of Research covers all areas of research and emerging areas of study from the Australian and New Zealand Standard Research Classification (ANZSRC) (names; _n frequency). |
top_RCDC Categories_Patent |
Most frequent Research, Condition, and Disease Categorization (RCDC) categories used by the NIH (names; _n frequency). |
top_HRCS RAC Categories_Patent |
Most frequent Health Research Classification System (HRCS) subdivided into the Research Activity Classifications (RAC) (names; _n frequency). |
top_IPCR_Patent |
Most frequent ICRP Cancer Types (names; _n frequency). |
top_CPC_Patent |
Most frequent Cooperative Patent Classification Categorization (names; _n frequency). |
🧪 Clinical Research Activity
This section provides indicators derived from clinical trials and dataset data. These indicators support analysis of research activity using data exported from Dimensions.
🔹 Clinical Trials
Variable | Description |
---|---|
n_clTrial |
Number of clinical trials started in the given year. |
avg_durationYear_clTrial |
Average duration of clinical trials (in years). |
avg_participants_clTrial |
Average number of participants per trial. |
clTrial_perAUIN_100 |
Average number of clinical trials per 100 internal authors (AUIN) |
nClTrial_w_AltmScore |
Number of trials with an Altmetric score. |
cl_AltmS_rel |
Percentage of clinical trials with an Altmetric score among all clinical trials started in the year. |
avg_sponsor_clTrial |
Average number of sponsors/collaborators per trial. |
nclTrialPhase_ |
Number of clinical trials by Phase (e.g., Phase 1, Phase 1/2 combined). |
nclTrialPhase_N/A |
Trials with no phase declared. N/A (not applicable) typically indicates a single-arm trial. |
nclTrialPhase_Post Authorisation Studies |
Number of post-authorisation studies. |
nclTrial_StudyType_ |
Number of clinical trials by study type (e.g., interventional, observational). |
nclTrial_Gender_ |
Number of clinical trials by gender subjects included (All, Male, Female). |
nclTrial_Registry_ClinicalTrials.gov |
Number of trials registered in the ClinicalTrials.gov registry. |
top_funder_country |
Most frequent funder countries (names, _n : frequency). |
top_Fields of Research (ANZSRC 2020)_clTrial |
Most frequent fields of research (names, _n : frequency). |
top_RCDC Categories_clTrial |
Most frequent RCDC categories (names, _n : frequency). |
top_HRCS HC Categories_clTrial |
Most frequent HRCS health categories (names, _n : frequency). |
top_HRCS RAC Categories_clTrial |
Most frequent HRCS research activity categories (names, _n : frequency). |
top_Cancer Types_clTrial |
Most frequent cancer types (names, _n : frequency). |
top_CSO Categories_clTrial |
Most frequent CSO categories (names, _n : frequency). |
🔹 Datasets
Variable | Description |
---|---|
n_dataset |
Number of of free-available datasets published in the year. |
top_Repository_Dataset |
Most frequent dataset repositories (names, _n : frequency) |
top_Fields of Research (ANZSRC 2020)_Dataset |
Most frequent fields of research (names, _n : frequency). |
top_RCDC Categories_Dataset |
Most frequent RCDC categories (names, _n : frequency). |
top_HRCS HC Categories_Dataset |
Most frequent HRCS health categories (names, _n : frequency). |
top_HRCS RAC Categories_Dataset |
Most frequent HRCS research activity categories (names, _n : frequency). |
top_Health Research Areas_Dataset |
Most frequent health research areas (names, _n : frequency) |
top_Broad Research Areas_Dataset |
Most frequent broad research areas (names, _n : frequency) |
top_Cancer Types_Dataset |
Most frequent cancer types addressed in datasets (names, _n : frequency) |
top_Sustainable Development Goals_Dataset |
Most frequently addressed Sustainable Development Goals (names, _n : frequencies) |
💶 Funding and Grants
This section presents indicators related to competitive funding obtained by Italian AHSCs, covering financial aspects, international collaborations, outputs, and key actors involved in funded research projects.
Variable | Description |
---|---|
n_grants |
Number of grants. |
avg_funding_EUR |
Average total funding per grant (in EUR). |
avg_funding_EUR_perOrganization |
Average funding per research organization involved in a grant (in EUR). |
avg_durationYear_grant |
Average duration of grants (in years). |
avg_Organization_per_grant |
Average number of research organizations involved per grant. |
avg_countries_per_grant |
Average number of distinct countries involved per grant. |
perc_grant_internationalColl |
Percentage of grants with international collaboration. |
avg_ResultingPublications_per_grant |
Average number of resulting publications per grant. |
totFundingOrgbyActivedGrant |
Total amount of funding (in EUR) available to the institution in a given year, based on grants active in that year. |
avgPubByActivedFundingGrant_1000 |
Number of publications produced per 1000 EUR of funding available to the institution in a given year, based on grants active in that year. |
top_Research Organization - standardized_Grants |
Most frequent research organizations (names; _n frequency). |
top_Funder_Grants |
Most frequent funding bodies (names; _n frequency). |
top_Funder Group_Grants |
Most frequent funder groups (names; _n frequency). |
top_Program_Grants |
Most frequent funding programs (names; _n frequency). |
top_Fields of Research (ANZSRC 2020)_Grants |
Most frequent research fields based on ANZSRC 2020 (names; _n frequency). |
top_RCDC Categories_Grants |
Most RCDC categories (names; _n frequency). |
top_HRCS HC Categories_Grants |
Most frequent HRCS Health Categories (names; _n frequency). |
top_HRCS RAC Categories_Grants |
Most frequent HRCS Research Activities Categories (names; _n frequency). |
top_Cancer Types_Grants |
Most frequent studied cancer types (names; _n frequency). |
top_CSO Categories_Grants |
Most frequent Common Scientific Outline (CSO) categories (names; _n frequency). |
top_Units of Assessment_Grants |
Most frequent Units of Assessment (names; _n frequency). |
top_Sustainable Development Goals_Grants |
Most frequently addressed Sustainable Development Goals (names; _n frequency). |