Alterism – Analyzing image description usage in mastodon.social
Abstract¶
The present report aims to analyse the use of alt text (image description) in social media posts published on the Fediverse instance mastodon.social [4], on the basis of the dataset mastodon.social alt text use by client app [5] by Stefan Bohacek [6], published on Kaggle under the MIT license [7].
This project was conceived and carried out by the group “Alterism” [3], formed by Cristal Rivera [1] and Tommaso Marmo [2], in the context of the Introduction to Data Science [8] course of the Artificial Intelligence and Sustainable Societies master [9].
Our research focused on understanding image description usage, analyzing it in detail using data science techniques, and comparing it with the literature on related topics.
Project Domain¶
Definitions¶
- Alt-text: also known as “image description”, it is a critical accessibility feature, providing written descriptions for visual content (Voykinska et al., 2016). It is important for several reasons, among which the following are the three main ones.
- Fundamental for users who are blind or visually impaired.
- Central to the concept of Universal Design (Garofolo et al., 2022).
Contextual information¶
Why accessibility?¶
Although often praised in public opinion, accessibility remains a critical yet thorny and overlooked issue, especially in the context of social media (Brady & Bigham, 2014). As our team is particularly concerned with the topic, we chose to develop our project at the intersection of data science and accessibility in social media.
One of our chief intentions is to highlight how visually disabled social media users are often discriminated against by abled users, even if unwillingly, because of the absence of image descriptions. Our analysis will show that adding image descriptions to social media posts is a virtuous but rare practice, rather than the norm.
Why the Fediverse?¶
We opted to base our analysis on a Fediverse instance for three reasons. Firstly, the current policies of the most popular social networks make it extremely difficult and costly to access and analyze data accurately (Graham, 2023). Secondly, decentralized social networks are gaining popularity and adoption, and they pride themselves on being more ethical and human-centric (Zulli et al., 2020). Lastly, but most importantly, Tommaso is a passionate long-term Fediverse user and instance administrator, and Cristal is very curious about the characteristics and potential of this network.
Why focus on clients?¶
Clients are interfaces through which users access social media platforms. The diversity of clients is of paramount importance in Mastodon, given its open and interoperable nature. This allows any client to interact with the software, provided that it adheres to the application programming interface (API) standards.
As they are essential to accessing any content on Mastodon, clients significantly influence the user experience, including the process of adding alt text.
While investigating the specifics of client design is outside the scope of this analysis, we aim to uncover trends, recurring patterns, and relevant insights starting from the information available in the dataset. Furthermore, by pinpointing the most popular clients and their features, we will observe how client design and options could incentivize users to write image descriptions.
Mastodon users’ information and growth¶
According to Dixon (2023), federated Mastodon servers had over ten million registered users collectively, as of March 2023. Mastodon, which shares micro-blogging features with Twitter, gained roughly 500 thousand users within ten days of Elon Musk’s Twitter takeover on October 27th, 2022 (Bin Zia et al., 2023).
Since Mastodon is decentralized software, its users are harder to track than those of centralized networks. Nevertheless, community projects similar to the-federation.info attempt to keep track of users’ statistics in real time.
Project scope and objectives¶
The main aim of this project is to analyse alt text usage on mastodon.social in relation to the clients used to publish posts on the platform. In particular, we will develop our analysis to address the following:
- Global alt text usage within the dataset.
- Pinpointing the most popular clients being used.
Methods¶
To analyze the usage of alt text in Mastodon posts across different clients, the following methods and algorithms will be applied:
Exploratory Data Analysis (EDA)¶
EDA will be essential to explore the data and understand its structure.
Descriptive Statistics¶
Key statistical measures such as mean, median, standard deviation, and percentiles will be calculated to summarize and understand the distribution of posts across clients. These metrics will help identify central tendencies and variability in the data, providing insights into patterns and anomalies.
Visualizations¶
- Bar charts and boxplots will be used to examine the distribution of variables and detect outliers.
- Correlation analysis: relationships between variables will be explored using both Pearson and Spearman correlation coefficients. For example, correlations between posts (the number of statuses posted) and alt text usage percentage (percentage of posts with image descriptions) will help assess how variables might influence each other.
- Scatter plots: relationships between variables such as posts and alt text usage percentage will be plotted, providing a graphical representation of trends.
These steps align with Tukey’s (1977) focus on EDA as a fundamental process to understand data before applying advanced techniques.
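As a minimal sketch of the descriptive statistics mentioned in this section, the following uses Python's standard `statistics` module. The list `posts_per_client` and its values are assumptions made up for illustration; they are not taken from the dataset.

```python
import statistics

# Hypothetical number of posts per client (illustrative values only).
posts_per_client = [1, 1, 1, 2, 3, 5, 8, 40, 1200, 5806]

mean_posts = statistics.mean(posts_per_client)
median_posts = statistics.median(posts_per_client)
std_posts = statistics.stdev(posts_per_client)  # sample standard deviation

# quantiles(n=4) returns the three quartile cut points (25th, 50th, 75th percentiles).
q1, q2, q3 = statistics.quantiles(posts_per_client, n=4)

print(f"mean={mean_posts:.1f} median={median_posts} std={std_posts:.1f}")
print(f"Q1={q1} Q2={q2} Q3={q3}")
```

In a distribution dominated by a few very large values, the mean ends up far above the median, which is one simple sign of the skewness that the later sections deal with.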
Data Preprocessing and Feature Engineering¶
Feature transformation: certain features will be transformed using logarithmic transformations. A logarithmic scale makes it possible to visualize the broad range of post counts. Such transformations also help improve model performance by addressing skewness or heteroscedasticity in the data (Field, 2013).
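To illustrate why a logarithmic transformation helps with a broad range of post counts, here is a small sketch using only the standard library. The values in `post_counts` are hypothetical, and base 10 is an arbitrary choice made for readability.

```python
import math

# Hypothetical post counts spanning several orders of magnitude.
post_counts = [1, 3, 12, 150, 1200, 5806]

# Base-10 logarithm: each unit on the new scale is a factor of ten.
# If zero counts were possible, math.log1p(c) would avoid log(0).
log_counts = [math.log10(c) for c in post_counts]

raw_spread = max(post_counts) - min(post_counts)
log_spread = max(log_counts) - min(log_counts)

print(f"raw spread: {raw_spread}, log10 spread: {log_spread:.2f}")
```

On the raw scale the largest value dwarfs all the others, while on the logarithmic scale all values fall within a few units of each other and can be shown on the same axis.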
Dataset structure and description¶
- The dataset was created by Bohacek (2024) and retrieved from Kaggle: mastodon.social alt text use by client app.
- All the posts under analysis come exclusively from the Fediverse instance mastodon.social.
Dataset import and preview¶
We import and print the dataset, assigning it to the variables data and original_data, so that the original dataset is preserved while we operate on the data variable.
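The idea of keeping an untouched `original_data` next to a working copy `data` can be sketched as follows. This is an illustration using only the standard library, not the notebook's actual import cell: the inline CSV text, the column names, and the values are all hypothetical stand-ins for the Kaggle file.

```python
import copy
import csv
import io

# Stand-in for the downloaded CSV file; columns and values are hypothetical.
csv_text = """client,posts,posts_with_alt_text
Web,100,30
unknown,40,5
ExampleApp,10,0
"""

# Read every row into a dictionary keyed by column name.
original_data = list(csv.DictReader(io.StringIO(csv_text)))

# Deep copy, so that edits to the working copy never reach the original.
data = copy.deepcopy(original_data)

# Preview the first rows.
for row in data[:3]:
    print(row)

# Modifying the working copy leaves the original untouched.
data[0]["posts"] = "0"
print(original_data[0]["posts"], data[0]["posts"])
```

A deep copy is used here because a plain assignment (`data = original_data`) would make both names point to the same rows.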
Dataset dictionary¶
To make sure that all columns correspond to relevant data, we will go through each column, one by one, to understand its meaning and rename it with a more explanatory name.
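Renaming columns to more explanatory names can be sketched with a simple mapping. Both the raw names (`app`, `total`, `alt`) and the new names below are hypothetical, since the actual column names of the dataset are not listed in this section.

```python
# Hypothetical mapping from terse raw column names to explanatory ones.
rename_map = {
    "app": "client_name",
    "total": "total_posts",
    "alt": "posts_with_alt_text",
}

# Hypothetical rows using the raw column names.
rows = [
    {"app": "Web", "total": 100, "alt": 30},
    {"app": "ExampleApp", "total": 10, "alt": 0},
]

# Columns missing from the mapping keep their original name.
renamed = [{rename_map.get(key, key): value for key, value in row.items()} for row in rows]

print(renamed[0])
```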
Data analysis and results¶
In this section, we explore the dataset to uncover patterns, trends, and insights that inform our understanding of the data and guide the subsequent analysis of the Mastodon dataset. The primary focus is on performing Exploratory Data Analysis (EDA) to assess the dataset's structure, identify relationships between variables, and detect potential anomalies. By applying statistical techniques and visualizations, we aim to analyze both individual variables and their interactions, uncovering meaningful clusters, trends, and dependencies. This stage also involves identifying feature importance, analyzing variability, and validating insights within the broader context of alt text usage on Mastodon, contributing to a deeper understanding of its role in promoting accessibility.
Data exploration¶
All columns represent relevant data. To ensure clarity and usability, we will review each column individually, analyze its meaning, and assign it a more descriptive and explanatory name.
Exclude posts from unknown client(s)¶
Based on our initial observation, the third most used client is labelled as “unknown”. Since our project strongly focuses on the impact of specific clients on the use of alt text, we make the arbitrary choice to drop (exclude) from the dataset the posts that come from an undefined client.
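Dropping the rows of an undefined client can be sketched as a simple filter. The rows, the column names `client` and `posts`, and the exact label `"unknown"` are assumptions for illustration; the label in the real data may be spelled differently.

```python
# Hypothetical rows: one per client, with its total number of posts.
rows = [
    {"client": "Web", "posts": 100},
    {"client": "unknown", "posts": 40},
    {"client": "ExampleApp", "posts": 10},
]

# Keep only rows whose client label is not "unknown" (case-insensitive).
known = [row for row in rows if row["client"].strip().lower() != "unknown"]

# Report how much data the exclusion removes.
dropped_posts = sum(r["posts"] for r in rows) - sum(r["posts"] for r in known)
print(f"kept {len(known)} clients, dropped {dropped_posts} posts")
```

Reporting the number of dropped posts makes the cost of this choice visible, which matters when the excluded client is among the most used ones.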
Global analysis¶
In this section, we will begin analyzing the overall use of alt text across all clients, independent of specific client information. This will provide a broader understanding of the general trends and patterns in the use of alt text within the dataset.
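A global alt text percentage, independent of client, can be computed by summing over all clients before dividing. The sketch below uses made-up rows and hypothetical column names (`posts`, `with_alt`); it also shows that this figure generally differs from the plain average of the per-client percentages.

```python
# Hypothetical per-client totals.
rows = [
    {"client": "Web", "posts": 100, "with_alt": 30},
    {"client": "ExampleApp", "posts": 10, "with_alt": 0},
    {"client": "OtherApp", "posts": 90, "with_alt": 20},
]

total_posts = sum(r["posts"] for r in rows)
total_with_alt = sum(r["with_alt"] for r in rows)

# Global share: every post counts once, so large clients weigh more.
global_pct = 100 * total_with_alt / total_posts

# Unweighted mean of per-client percentages: every client counts once.
per_client_pct = [100 * r["with_alt"] / r["posts"] for r in rows]
mean_pct = sum(per_client_pct) / len(per_client_pct)

print(f"global: {global_pct:.2f}%  mean of clients: {mean_pct:.2f}%")
```

When a few clients publish most of the posts, the two figures can diverge considerably, so it is worth stating which one is being reported.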
Analyzing client data¶
To visualize the distribution of posts across clients, we will plot the data to review potential outliers or skewed distributions. This will help us understand how the total number of posts is spread across each client and identify any patterns or irregularities in the data.
Find the mode of posts per client¶
As the graph above shows, most clients have a single post. Therefore the mode, that is, the most frequent total number of posts per client, should be equal to one.
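Checking the mode can be sketched with the standard library. The list `posts_per_client` contains made-up values chosen so that a single post is the most common case, mirroring the expectation stated above.

```python
import statistics
from collections import Counter

# Hypothetical total number of posts for each client.
posts_per_client = [1, 1, 1, 1, 2, 2, 3, 7, 150, 1200]

# The mode is the most frequent value in the list.
mode_posts = statistics.mode(posts_per_client)

# Counter shows how many clients share each post count.
frequency = Counter(posts_per_client)

print(f"mode: {mode_posts}, clients with that count: {frequency[mode_posts]}")
```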
Grouping by client popularity¶
The observations on the most and least popular clients led us to divide the analysis into two groups and explore the use of alt text separately for each.
Alt text usage in top 5 clients¶
In the dataset, only five clients generated more than 1000 posts, accounting for more than 60% of the total posts. Hence, we will investigate the posts of these clients in more detail.
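Splitting clients at the 1000-post threshold and measuring the share of posts held by the popular group can be sketched as follows. The threshold comes from the text; the client names and counts in `posts_by_client` are hypothetical and do not reproduce the dataset's figures.

```python
THRESHOLD = 1000  # posts; clients above this value form the "popular" group

# Hypothetical total posts per client.
posts_by_client = {
    "ClientA": 5000,
    "ClientB": 3000,
    "ClientC": 1500,
    "ClientD": 400,
    "ClientE": 90,
    "ClientF": 10,
}

popular = {c: n for c, n in posts_by_client.items() if n > THRESHOLD}
others = {c: n for c, n in posts_by_client.items() if n <= THRESHOLD}

# Share of all posts that comes from the popular group.
popular_share = 100 * sum(popular.values()) / sum(posts_by_client.values())

print(f"{len(popular)} popular clients hold {popular_share:.1f}% of posts")
```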
Exploring alt text features of the top 5 clients¶
Web¶
Following what figure 6 brings to light, we will now focus in more detail on the characteristics of the most popular clients with regard to image descriptions.
The most popular client is “Web”, the Web interface integrated in Mastodon’s source code [10] by default. It is what users reach when they open the URL of their instance. Since it is accessed via browser, instance administrators can change the user interface to encourage alt text usage, for example by adding CSS snippets [11]. The administrators of mastodon.social do not seem to have adopted any practice to encourage alt text beyond the default Web interface settings.
It is possible to enable a warning notification before posting an image without any description from the Web interface, but only in Mastodon Glitch Edition, an experimental fork of Mastodon, with more features (image 2).
Official Mastodon apps¶
The third and fifth most popular clients are the official Mastodon applications for Android [13] and for iOS [14], respectively. In these cases, too, the default interface does not particularly incentivize the use of alt text. Nevertheless, the application settings make it possible to enable a warning notification before posting an image without a description (opt-in) (images 4 and 5).
Bots¶
Lastly, AboveMaidstoneBot and dlvr.it show very peculiar characteristics, strongly suggesting that both are automated software. We were unable to gather any information about the former, and we can only assume that it is a bot automating the publishing of some kind of content. The latter, instead, has an informative website [15], from which it appears to be software built specifically to enable cross-posting (the practice of publishing on one social media platform and automatically publishing the same content on another).
There are two possible explanations for the lack of image descriptions: either the software does not support the inclusion of alt text, or images originally posted on other platforms almost never have an image description. The second option is the more likely, as one post out of 5806 does have alt text, meaning that the software supported alt text posting at least once.
Note on Analytical Methods¶
Why Linear Regression, Logistic Regression, and KNN were not considered.
Several analytical approaches were initially considered to understand patterns in the dataset, including linear regression, logistic regression, and k-nearest neighbors (KNN). However, these methods were not pursued for the following reasons:
- Linear Regression: The correlation analysis between the percentage of posts with alt text and the total status count revealed a very weak relationship (Pearson correlation: -0.06, Spearman correlation: 0.03). This indicates a lack of linear association, which is a prerequisite for linear regression. Applying this method would not yield significant insights, and could lead to overfitting (not being able to generalize) or underfitting (trying to fit a linear model to nonlinear data).
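To make the two coefficients quoted above concrete, the sketch below implements Pearson and Spearman correlation with the standard library only. The helper functions `pearson`, `ranks`, and `spearman` are written for this illustration, and the sample values in `posts` and `alt_pct` are made up; they do not reproduce the figures reported for the dataset.

```python
import math


def pearson(xs, ys):
    """Pearson correlation: covariance divided by the product of spreads."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (spread_x * spread_y)


def ranks(values):
    """1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = average_rank
        i = j + 1
    return result


def spearman(xs, ys):
    """Spearman correlation: Pearson correlation computed on the ranks."""
    return pearson(ranks(xs), ranks(ys))


# Hypothetical data: monotonic but clearly not linear.
posts = [1, 2, 3, 4, 100]
alt_pct = [10.0, 20.0, 30.0, 40.0, 41.0]

print(f"Pearson:  {pearson(posts, alt_pct):.2f}")
print(f"Spearman: {spearman(posts, alt_pct):.2f}")
```

Pearson measures linear association, while Spearman measures monotonic association through ranks. When both are close to zero, as reported above, there is little evidence of either kind of relationship.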
Conclusion and future development¶
Summary¶
In conclusion, the results of our analysis are positive in relative terms, as alt text usage in the dataset proved to be considerably greater than on Twitter. At the same time, they show a lot of room for improvement in absolute terms, because only a minority of posts in the dataset had image descriptions. Furthermore, we found a strong inequality in the distribution of posts among clients, something we were not expecting, which forced us to adapt our approach and the flow of our analysis accordingly.
As for the Fediverse, even though its communities are supposedly more sensitive to ethical issues, this analysis clearly showed that the use of image descriptions is still too limited.
Challenges¶
While our initial expectations for the project included applying more advanced technical models, the reality of working with real-world data often brings unforeseen challenges that must first be addressed. During our analysis, we encountered limitations such as the lack of significant features for certain models and the need for extensive data cleaning and restructuring. These challenges highlighted an essential truth: data analysis is often less about immediately implementing complex models and more about solving foundational issues to ensure data quality and reliability.
Furthermore, our findings align with broader statistics, which indicate that much of the data on accessibility is either incomplete or underutilized. This underscores the pressing need for improved data collection practices and the prioritization of accessibility in digital content creation. By addressing these initial hurdles, future analyses can leverage more robust data to apply sophisticated techniques and generate actionable insights, ultimately contributing to a more inclusive digital landscape.
Future development¶
We believe that there is great room for expanding and deepening this analysis, not merely in terms of development, but chiefly in relation to the expansion of the dataset, in particular:
References¶
Dataset: Bohacek, S. (2024). Mastodon.social alt text use by client app. Kaggle. https://www.kaggle.com/datasets/fourtonfish/mastodon-social-alt-text-use-by-client-app
- Bin Zia, H., He, J., Raman, A., Castro, I., Sastry, N. & Tyson, G. (2023). Flocking to Mastodon: Tracking the Great Twitter Migration. https://doi.org/10.48550/arXiv.2302.14294
- Brady, E. & Bigham, J. P. (2014). How companies engage customers around accessibility on social media. In Proceedings of the 16th international ACM SIGACCESS conference on Computers & accessibility (ASSETS '14). Association for Computing Machinery, New York, NY, USA, 51–58. https://doi.org/10.1145/2661334.2661355
Data analysis and results¶
In this section, we undertake a comprehensive exploration of the dataset to uncover patterns, trends, and insights that can inform our understanding of the data and guide subsequent analysis of the mastodon data set. The primary focus is on performing Exploratory Data Analysis (EDA) to assess the dataset's analyis to identify relationships between variables, and detect potential anomalies. By applying statistical techniques and visualizations, we aim to analyze both individual variables and their interactions, uncovering meaningful clusters, trends, and dependencies. This stage also involves identifying feature importance, analyzing variability, and validating insights within the broader context of alt text usage on mastodon contributing to a deeper understanding of its role in promoting accessibility.
+Data analysis and results¶
In this section, we undertake a comprehensive exploration of the dataset to uncover patterns, trends, and insights that can inform our understanding of the data and guide subsequent analysis of the mastodon data set. The primary focus is on performing Exploratory Data Analysis (EDA) to assess the dataset's analyis to identify relationships between variables, and detect potential anomalies. By applying statistical techniques and visualizations, we aim to analyze both individual variables and their interactions, uncovering meaningful clusters, trends, and dependencies. This stage also involves identifying feature importance, analyzing variability, and validating insights within the broader context of alt text usage on mastodon contributing to a deeper understanding of its role in promoting accessibility.
Data exploration¶
All columns represent relevant data. To ensure clarity and usability, we will review each column individually, analyze its meaning, and assign it a more descriptive and explanatory name.
+Data exploration¶
All columns represent relevant data. To ensure clarity and usability, we will review each column individually, analyze its meaning, and assign it a more descriptive and explanatory name.
Data exploration
@@ -8377,7 +8378,7 @@ Refine the dataset for analysis
-Exclude posts from unknown client(s)¶
Based on our initial observation, we recognize that the third most used client is labelled as “unknown”. Since our project stongly focuses on the impact of specific clients on the use of alt text, we arbitrarily choose to drop (exclude) from the dataset posts that come from an undefined client.
+Exclude posts from unknown client(s)¶
Based on our initial observation, we recognize that the third most used client is labelled as “unknown”. Since our project stongly focuses on the impact of specific clients on the use of alt text, we arbitrarily choose to drop (exclude) from the dataset posts that come from an undefined client.
@@ -8402,7 +8403,7 @@ Exclude posts from unknown client(
@@ -8451,7 +8452,7 @@ Count total clients
@@ -8542,7 +8543,7 @@ Checking values
-Global analysis¶
In this section, we will begin analyzing the overall use of alt text across all clients, independent of specific client information. This will provide a broader understanding of the general trends and patterns in the use of alt text within the dataset.
+Global analysis¶
In this section, we will begin analyzing the overall use of alt text across all clients, independent of specific client information. This will provide a broader understanding of the general trends and patterns in the use of alt text within the dataset.
@@ -8676,7 +8677,7 @@ Global analysis
-Analyzing client data¶
To visualize the distribution of posts across clients, we will plot the data to review potential outliers or skewed distributions. This will help us understand how the total number of posts is spread across each client and identify any patterns or irregularities in the data.
+Analyzing client data¶
To visualize the distribution of posts across clients, we will plot the data to review potential outliers or skewed distributions. This will help us understand how the total number of posts is spread across each client and identify any patterns or irregularities in the data.
@@ -8843,7 +8844,7 @@ Analyzing client data
@@ -8992,7 +8993,7 @@ Correlation coefficients
-Find the mode of posts per client¶
As the graph above shows, most of the clients have one post. Therefore, the most frequent total number of posts clients have, the mode, should be equal to one.
+Find the mode of posts per client¶
As the graph above shows, most of the clients have one post. Therefore, the most frequent total number of posts clients have, the mode, should be equal to one.
@@ -9031,7 +9032,7 @@ Find the mode of posts per client
-Grouping by client popularity¶
The results of the observations on the most and least popular clients brought us to divide the analysis in two groups, and explore the use of alt text separately.
+Grouping by client popularity¶
The results of the observations on the most and least popular clients brought us to divide the analysis in two groups, and explore the use of alt text separately.
@@ -9160,7 +9161,7 @@ Grouping by client popularity
@@ -9296,7 +9297,7 @@ Client-specific alt text analysis
-Alt text usage in top 5 clients¶
In the dataset, only five clients generated more than 1000 posts, accounting for more than 60% of the total posts. Hence, we will be investingating the specific posts more in detail.
+Alt text usage in top 5 clients¶
In the dataset, only five clients generated more than 1000 posts, accounting for more than 60% of the total posts. Hence, we will be investingating the specific posts more in detail.
@@ -9497,7 +9498,7 @@ Alt text usage in top 5 clients
-Exploring alt text features of the top 5 clients¶
Web¶
Following what is brought to light by figure 6, we will now focus more in detail on the characteristics of the most popular clients for what concerns image description.
+Exploring alt text features of the top 5 clients¶Web¶
Following what is brought to light by figure 6, we will now focus more in detail on the characteristics of the most popular clients for what concerns image description.
The most popular client is “Web”, the Web interface integrated in Mastodon’s source code [10] by default. It is what users reach by opening the URL of their instance. Since it is served through the browser, instance administrators can change the user interface to encourage alt text usage, for example by adding CSS snippets [11]. The administrators of mastodon.social do not seem to have adopted any practice to encourage alt text beyond the default Web interface settings.
It is possible to enable a warning before posting an image without a description from the Web interface, but only in Mastodon Glitch Edition, an experimental fork of Mastodon with additional features (image 2).
Official Mastodon apps¶
The third and fifth most popular clients are the official Mastodon applications for Android [13] and for iOS [14], respectively. In these cases, too, the default interface does not particularly incentivize the use of alt text. Nevertheless, the application settings make it possible to enable a warning before posting an image without a description (opt-in) (images 4 and 5).
Bots¶
Lastly, AboveMaidstoneBot and dlvr.it show very peculiar characteristics, strongly suggesting that both are automated software. We were unable to gather any information about the former, and we can only assume that it is a bot automating the publication of some kind of content. The latter, instead, has an informative website [15], from which it appears to be software built specifically to enable cross-posting (the practice of publishing on one social media platform and automatically publishing the same content on another).
There are two possible explanations for the lack of image descriptions: either the software does not support the inclusion of alt text, or images originally posted on other platforms almost never have an image description. The second is the more likely, as one post out of 5806 does have alt text, meaning that the software supported alt text posting at least once.
Note on Analytical Methods¶
Why Linear Regression, Logistic Regression, and KNN were not considered.
Several analytical approaches were initially evaluated to understand patterns in the dataset, including linear regression, logistic regression, and k-nearest neighbors (KNN). However, these methods were ultimately not applied, for the following reasons:
- Linear Regression: The correlation analysis between the percentage of posts with alt text and the total status count revealed a very weak relationship (Pearson correlation: -0.06, Spearman correlation: 0.03). This indicates a lack of linear association, a prerequisite for linear regression. Applying this method would not yield significant insights, and could lead to overfitting (failing to generalize to new data) or underfitting (fitting a linear model to nonlinear data).
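For reference, the two coefficients mentioned above can be computed as in the sketch below. It uses only the standard library and illustrative inputs. The helper names are hypothetical and the code does not reproduce the report's figures. Spearman's coefficient is obtained as the Pearson correlation of the ranks, with tied values sharing their average rank.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    spread_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    spread_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (spread_x * spread_y)


def average_ranks(values):
    """1-based ranks; tied values receive the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def spearman(x, y):
    """Spearman rank correlation: Pearson correlation of the average ranks."""
    return pearson(average_ranks(x), average_ranks(y))


# Illustrative data: a monotonic but nonlinear relationship.
x = [1, 2, 3, 4, 5]
y = [1, 4, 9, 16, 25]
print(pearson(x, y), spearman(x, y))
```

Reporting both coefficients is useful because Pearson only captures linear association, while Spearman also detects monotonic relationships that are not linear. When both are close to zero, as in the report, neither kind of trend is present.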
Conclusion and future development¶
Summary¶
In conclusion, the results of our analysis are positive in relative terms, as alt text usage in the dataset proved to be considerably higher than on Twitter. At the same time, they show a lot of room for improvement in absolute terms, because only a minority of posts in the dataset had image descriptions. Furthermore, we found a strong inequality in the distribution of posts among clients, which we were not expecting and which forced us to adapt our approach and the flow of our analysis accordingly.
Even though the communities of the Fediverse are supposedly more sensitive to ethical issues, this analysis clearly showed that the use of image descriptions is still too limited.
Challenges¶
While our initial expectations for the project included applying more advanced technical models, the reality of working with real-world data often brings unforeseen challenges that must first be addressed. During our analysis, we encountered limitations such as the lack of significant features for certain models and the need for extensive data cleaning and restructuring. These challenges highlighted an essential truth: data analysis is often less about immediately implementing complex models and more about solving foundational issues to ensure data quality and reliability.
Furthermore, our findings align with broader statistics, which indicate that much of the data on accessibility is either incomplete or underutilized. This underscores the pressing need for improved data collection practices and the prioritization of accessibility in digital content creation. By addressing these initial hurdles, future analyses can leverage more robust data to apply sophisticated techniques and generate actionable insights, ultimately contributing to a more inclusive digital landscape.
Future development¶
We believe that there is ample room to expand and deepen this analysis, not merely in terms of development, but chiefly through the expansion of the dataset, in particular:
References¶
Dataset: Bohacek, S. (2024). Mastodon.social alt text use by client app. Kaggle. https://www.kaggle.com/datasets/fourtonfish/mastodon-social-alt-text-use-by-client-app
- Bin Zia, H., He, J., Raman, A., Castro, I., Sastry, N. & Tyson, G. (2023). Flocking to Mastodon: Tracking the Great Twitter Migration. https://doi.org/10.48550/arXiv.2302.14294
- Brady, E. & Bigham, J. P. (2014). How companies engage customers around accessibility on social media. In Proceedings of the 16th international ACM SIGACCESS conference on Computers & accessibility (ASSETS '14). Association for Computing Machinery, New York, NY, USA, 51–58. https://doi.org/10.1145/2661334.2661355
Exclude posts from unknown client(s)¶
Based on our initial observation, the third most used client is labelled “unknown”. Since our project strongly focuses on the impact of specific clients on the use of alt text, we made the arbitrary choice to drop (exclude) from the dataset the posts that come from an undefined client.
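A minimal sketch of this exclusion step, assuming each record is a dictionary with a `client` key (a hypothetical field name) and that the label to drop is the literal string "unknown". Matching case-insensitively and ignoring surrounding whitespace is an extra assumption made here for robustness.

```python
def drop_unknown_clients(rows, label="unknown"):
    """Return only the rows whose client name differs from the given label."""
    return [row for row in rows if row["client"].strip().lower() != label]


# Illustrative records only.
rows = [
    {"client": "Web", "posts": 120},
    {"client": "unknown", "posts": 80},
    {"client": "Unknown ", "posts": 3},
    {"client": "client_x", "posts": 1},
]
known_rows = drop_unknown_clients(rows)
print([row["client"] for row in known_rows])
```

The function returns a new list and leaves the original records untouched, so the excluded posts remain available if the choice needs to be revisited.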
Global analysis¶
In this section, we analyze the overall use of alt text across all clients, independent of specific client information. This provides a broader understanding of the general trends and patterns in the use of alt text within the dataset.
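One point worth making explicit is how a global percentage is computed. The sketch below, built on invented numbers and hypothetical helper names, contrasts the share of all posts with alt text against the plain average of per-client percentages. The two can differ widely when post counts are unevenly distributed.

```python
def overall_alt_share(clients):
    """Percentage of all posts that have alt text (weighted by post count)."""
    total_posts = sum(posts for posts, _ in clients)
    total_alt = sum(with_alt for _, with_alt in clients)
    return 100 * total_alt / total_posts


def mean_client_alt_share(clients):
    """Unweighted mean of the per-client alt text percentages."""
    return sum(100 * with_alt / posts for posts, with_alt in clients) / len(clients)


# Illustrative (posts, posts_with_alt) pairs: one large client, one tiny one.
clients = [(1000, 100), (10, 10)]
print(overall_alt_share(clients), mean_client_alt_share(clients))
```

For a statement about posts in general, the weighted figure is the relevant one. The unweighted mean instead describes the typical client, regardless of how much it posts.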
@@ -8843,7 +8844,7 @@ Analyzing client data
@@ -8992,7 +8993,7 @@ Correlation coefficients
-Find the mode of posts per client¶
As the graph above shows, most of the clients have one post. Therefore, the most frequent total number of posts clients have, the mode, should be equal to one.
+Find the mode of posts per client¶
As the graph above shows, most of the clients have one post. Therefore, the most frequent total number of posts clients have, the mode, should be equal to one.
@@ -9031,7 +9032,7 @@ Find the mode of posts per client
-Grouping by client popularity¶
The results of the observations on the most and least popular clients brought us to divide the analysis in two groups, and explore the use of alt text separately.
+Grouping by client popularity¶
The results of the observations on the most and least popular clients brought us to divide the analysis in two groups, and explore the use of alt text separately.
@@ -9160,7 +9161,7 @@ Grouping by client popularity
@@ -9296,7 +9297,7 @@ Client-specific alt text analysis
-Alt text usage in top 5 clients¶
In the dataset, only five clients generated more than 1000 posts, accounting for more than 60% of the total posts. Hence, we will be investingating the specific posts more in detail.
+Alt text usage in top 5 clients¶
In the dataset, only five clients generated more than 1000 posts, accounting for more than 60% of the total posts. Hence, we will be investingating the specific posts more in detail.
@@ -9497,7 +9498,7 @@ Alt text usage in top 5 clients
-Exploring alt text features of the top 5 clients¶
Web¶
Following what is brought to light by figure 6, we will now focus more in detail on the characteristics of the most popular clients for what concerns image description.
+Exploring alt text features of the top 5 clients¶Web¶
Following what is brought to light by figure 6, we will now focus more in detail on the characteristics of the most popular clients for what concerns image description.
The most popular client is “Web”, the Web interface integrated in Mastodon’s source code [10] by default. It is what users access to by accessing the URL of their instance. Being accessible via browser, instances administrators can change the user interface to encourage alt text usage, for example by adding CSS snippets [11]. The administrators of mastodon.social do not seem to have adopted any practice to encourage alt text in addition to the default Web interface settings.
It is possible to enable a warning notification before posting an image without any description from the Web interface, but only in Mastodon Glitch Edition, an experimental fork of Mastodon, with more features (image 2).
-Official Mastodon apps¶
The third and fifth most popular clients are the official Mastodon applications for Android [13] and for iOS [14], respecively. In these cases, too, the default interface does not particularly incentivize the use of alt text by default. Nevertheless, from the applications settings it is possible to enable a notification warning before posting an image without description (opt-in) (images 4 and 5).
+Official Mastodon apps¶
The third and fifth most popular clients are the official Mastodon applications for Android [13] and for iOS [14], respecively. In these cases, too, the default interface does not particularly incentivize the use of alt text by default. Nevertheless, from the applications settings it is possible to enable a notification warning before posting an image without description (opt-in) (images 4 and 5).
-Bots¶
Lastly, AboveMaidstoneBot and dlvr.it show very peculiar characteristics, strongly suggesting they are both automated software. We were unable to gather any information about the former, and we can only assume that it is probably a bot automating the publishing of some kind of content. The latter, instead, has an informative website [15] where it appears that it is a software built purposefully to enable cross-posting (the practice of publishing on one social media platform and automatically publishing the same content on another).
+Bots¶
Lastly, AboveMaidstoneBot and dlvr.it show very peculiar characteristics, strongly suggesting they are both automated software. We were unable to gather any information about the former, and we can only assume that it is probably a bot automating the publishing of some kind of content. The latter, instead, has an informative website [15] where it appears that it is a software built purposefully to enable cross-posting (the practice of publishing on one social media platform and automatically publishing the same content on another).
There are two possibilities explaining the lack of image descriptions: either the software does not support the inclusion of alt text, or images originally posted on other platforms almost never have an image description. The second option is the most likely, as only one post out of 5806 has alt text, meaning that the software supported alt text posting at least once.
@@ -9530,7 +9531,7 @@ Bots¶
Lastly, Abov
-Note on Analytical Methods¶
Why Linear Regression, Logistic Regression, and KNN were not considered.
+Note on Analytical Methods¶
Why Linear Regression, Logistic Regression, and KNN were not considered.
Several analytical approaches were initially considered to understand patterns in the dataset, including linear regression, logistic regression, and k-nearest neighbors (KNN). However, these methods were not considered for the following reasons:
- Linear Regression: The correlation analysis between the percentage of posts with alt text and the total status count revealed a very weak relationship (Pearson correlation: -0.06, Spearman correlation: 0.03). This indicates a lack of linear association, a prerequisite for linear regression. Applying this method would not yield significant insights or could lead to overfitting (not being able to generalize) or underfitting (trying to build a linear model in nonlinear data).
@@ -9547,11 +9548,11 @@ Note on Analytical Methods
-Conclusion and future development¶
Summary¶
In conclusion, the results of our analysis proved to be positive, as alt text usage in the dataset revealed to be strongly greater than in Twitter, but at the same time they are showing a lot of room for improvement for image descriptions in absolute terms, because only a minority of posts in the dataset had them. Furthermore, we found a strong inequality in the distribution of posts among clients, something we were not expecting, that forced us to adapt our approach and the flow of our analysis accordingly.
+Conclusion and future development¶Summary¶
In conclusion, the results of our analysis proved to be positive, as alt text usage in the dataset revealed to be strongly greater than in Twitter, but at the same time they are showing a lot of room for improvement for image descriptions in absolute terms, because only a minority of posts in the dataset had them. Furthermore, we found a strong inequality in the distribution of posts among clients, something we were not expecting, that forced us to adapt our approach and the flow of our analysis accordingly.
Concerning the environment of the Fediverse, even though they supposedly are more sensitive to ethical issues, this analysis showed without any doubt that the use of image descriptions is still too limited.
-Challenges¶
While our initial expectations for the project included applying more advanced technical models, the reality of working with real-world data often brings unforeseen challenges that must first be addressed. During our analysis, we encountered limitations such as the lack of significant features for certain models and the need for extensive data cleaning and restructuring. These challenges highlighted an essential truth: data analysis is often less about immediately implementing complex models and more about solving foundational issues to ensure data quality and reliability.
+Challenges¶
While our initial expectations for the project included applying more advanced technical models, the reality of working with real-world data often brings unforeseen challenges that must first be addressed. During our analysis, we encountered limitations such as the lack of significant features for certain models and the need for extensive data cleaning and restructuring. These challenges highlighted an essential truth: data analysis is often less about immediately implementing complex models and more about solving foundational issues to ensure data quality and reliability.
Furthermore, our findings align with broader statistics, which indicate that much of the data on accessibility is either incomplete or underutilized. This underscores the pressing need for improved data collection practices and the prioritization of accessibility in digital content creation. By addressing these initial hurdles, future analyses can leverage more robust data to apply sophisticated techniques and generate actionable insights, ultimately contributing to a more inclusive digital landscape.
-Future development¶
We believe that there is a great room for expansion and deepening of this analysis, not merely in terms of development, but chiefly in relation to the expansion of the dataset, in particular:
+Future development¶
We believe that there is a great room for expansion and deepening of this analysis, not merely in terms of development, but chiefly in relation to the expansion of the dataset, in particular:
@@ -9742,7 +9743,7 @@ Appendix¶
-References¶
Dataset: Bohacek, S. (2024). Mastodon.social alt text use by client app. Kaggle. https://www.kaggle.com/datasets/fourtonfish/mastodon-social-alt-text-use-by-client-app
+References¶
Dataset: Bohacek, S. (2024). Mastodon.social alt text use by client app. Kaggle. https://www.kaggle.com/datasets/fourtonfish/mastodon-social-alt-text-use-by-client-app
- Bin Zia, H., He, J., Raman, A., Castro, I., Sastry, N. & Tyson, G. (2023). Flocking to Mastodon: Tracking the Great Twitter Migration. https://doi.org/10.48550/arXiv.2302.14294
- Brady E. & Bigham, J.P. (2014). How companies engage customers around accessibility on social media. In Proceedings of the 16th international ACM SIGACCESS conference on Computers & accessibility (ASSETS '14). Association for Computing Machinery, New York, NY, USA, 51–58. https://doi.org/10.1145/2661334.2661355
Checking values
-Global analysis¶
In this section, we will begin analyzing the overall use of alt text across all clients, independent of specific client information. This will provide a broader understanding of the general trends and patterns in the use of alt text within the dataset.
+Global analysis¶
In this section, we will begin analyzing the overall use of alt text across all clients, independent of specific client information. This will provide a broader understanding of the general trends and patterns in the use of alt text within the dataset.
@@ -8676,7 +8677,7 @@ Global analysis
-Analyzing client data¶
To visualize the distribution of posts across clients, we will plot the data to review potential outliers or skewed distributions. This will help us understand how the total number of posts is spread across each client and identify any patterns or irregularities in the data.
+Analyzing client data¶
To visualize the distribution of posts across clients, we will plot the data to review potential outliers or skewed distributions. This will help us understand how the total number of posts is spread across each client and identify any patterns or irregularities in the data.
@@ -8843,7 +8844,7 @@ Analyzing client data
@@ -8992,7 +8993,7 @@ Correlation coefficients
-Find the mode of posts per client¶
As the graph above shows, most of the clients have one post. Therefore, the most frequent total number of posts clients have, the mode, should be equal to one.
+Find the mode of posts per client¶
As the graph above shows, most of the clients have one post. Therefore, the most frequent total number of posts clients have, the mode, should be equal to one.
@@ -9031,7 +9032,7 @@ Find the mode of posts per client
-Grouping by client popularity¶
The results of the observations on the most and least popular clients brought us to divide the analysis in two groups, and explore the use of alt text separately.
+Grouping by client popularity¶
The results of the observations on the most and least popular clients brought us to divide the analysis in two groups, and explore the use of alt text separately.
@@ -9160,7 +9161,7 @@ Grouping by client popularity
@@ -9296,7 +9297,7 @@ Client-specific alt text analysis
-Alt text usage in top 5 clients¶
In the dataset, only five clients generated more than 1000 posts, accounting for more than 60% of the total posts. Hence, we will be investingating the specific posts more in detail.
+Alt text usage in top 5 clients¶
In the dataset, only five clients generated more than 1000 posts, accounting for more than 60% of the total posts. Hence, we will be investingating the specific posts more in detail.
@@ -9497,7 +9498,7 @@ Alt text usage in top 5 clients
-Exploring alt text features of the top 5 clients¶
Web¶
Following what is brought to light by figure 6, we will now focus more in detail on the characteristics of the most popular clients for what concerns image description.
+Exploring alt text features of the top 5 clients¶Web¶
Following what is brought to light by figure 6, we will now focus more in detail on the characteristics of the most popular clients for what concerns image description.
The most popular client is “Web”, the Web interface integrated in Mastodon’s source code [10] by default. It is what users access to by accessing the URL of their instance. Being accessible via browser, instances administrators can change the user interface to encourage alt text usage, for example by adding CSS snippets [11]. The administrators of mastodon.social do not seem to have adopted any practice to encourage alt text in addition to the default Web interface settings.
It is possible to enable a warning notification before posting an image without any description from the Web interface, but only in Mastodon Glitch Edition, an experimental fork of Mastodon, with more features (image 2).
-Official Mastodon apps¶
The third and fifth most popular clients are the official Mastodon applications for Android [13] and for iOS [14], respecively. In these cases, too, the default interface does not particularly incentivize the use of alt text by default. Nevertheless, from the applications settings it is possible to enable a notification warning before posting an image without description (opt-in) (images 4 and 5).
+Official Mastodon apps¶
The third and fifth most popular clients are the official Mastodon applications for Android [13] and for iOS [14], respecively. In these cases, too, the default interface does not particularly incentivize the use of alt text by default. Nevertheless, from the applications settings it is possible to enable a notification warning before posting an image without description (opt-in) (images 4 and 5).
-Bots¶
Lastly, AboveMaidstoneBot and dlvr.it show very peculiar characteristics, strongly suggesting they are both automated software. We were unable to gather any information about the former, and we can only assume that it is probably a bot automating the publishing of some kind of content. The latter, instead, has an informative website [15] where it appears that it is a software built purposefully to enable cross-posting (the practice of publishing on one social media platform and automatically publishing the same content on another).
+Bots¶
Lastly, AboveMaidstoneBot and dlvr.it show very peculiar characteristics, strongly suggesting they are both automated software. We were unable to gather any information about the former, and we can only assume that it is probably a bot automating the publishing of some kind of content. The latter, instead, has an informative website [15] where it appears that it is a software built purposefully to enable cross-posting (the practice of publishing on one social media platform and automatically publishing the same content on another).
There are two possibilities explaining the lack of image descriptions: either the software does not support the inclusion of alt text, or images originally posted on other platforms almost never have an image description. The second option is the most likely, as only one post out of 5806 has alt text, meaning that the software supported alt text posting at least once.
@@ -9530,7 +9531,7 @@ Bots¶
Lastly, Abov
-Note on Analytical Methods¶
Why Linear Regression, Logistic Regression, and KNN were not considered.
+Note on Analytical Methods¶
Why Linear Regression, Logistic Regression, and KNN were not considered.
Several analytical approaches were initially considered to understand patterns in the dataset, including linear regression, logistic regression, and k-nearest neighbors (KNN). However, these methods were not considered for the following reasons:
- Linear Regression: The correlation analysis between the percentage of posts with alt text and the total status count revealed a very weak relationship (Pearson correlation: -0.06, Spearman correlation: 0.03). This indicates a lack of linear association, a prerequisite for linear regression. Applying this method would not yield significant insights or could lead to overfitting (not being able to generalize) or underfitting (trying to build a linear model in nonlinear data).
@@ -9547,11 +9548,11 @@ Note on Analytical Methods
-Conclusion and future development¶
Summary¶
In conclusion, the results of our analysis proved to be positive, as alt text usage in the dataset revealed to be strongly greater than in Twitter, but at the same time they are showing a lot of room for improvement for image descriptions in absolute terms, because only a minority of posts in the dataset had them. Furthermore, we found a strong inequality in the distribution of posts among clients, something we were not expecting, that forced us to adapt our approach and the flow of our analysis accordingly.
+Conclusion and future development¶Summary¶
In conclusion, the results of our analysis proved to be positive, as alt text usage in the dataset revealed to be strongly greater than in Twitter, but at the same time they are showing a lot of room for improvement for image descriptions in absolute terms, because only a minority of posts in the dataset had them. Furthermore, we found a strong inequality in the distribution of posts among clients, something we were not expecting, that forced us to adapt our approach and the flow of our analysis accordingly.
Concerning the environment of the Fediverse, even though they supposedly are more sensitive to ethical issues, this analysis showed without any doubt that the use of image descriptions is still too limited.
-Challenges¶
While our initial expectations for the project included applying more advanced technical models, the reality of working with real-world data often brings unforeseen challenges that must first be addressed. During our analysis, we encountered limitations such as the lack of significant features for certain models and the need for extensive data cleaning and restructuring. These challenges highlighted an essential truth: data analysis is often less about immediately implementing complex models and more about solving foundational issues to ensure data quality and reliability.
+Challenges¶
While our initial expectations for the project included applying more advanced technical models, the reality of working with real-world data often brings unforeseen challenges that must first be addressed. During our analysis, we encountered limitations such as the lack of significant features for certain models and the need for extensive data cleaning and restructuring. These challenges highlighted an essential truth: data analysis is often less about immediately implementing complex models and more about solving foundational issues to ensure data quality and reliability.
Furthermore, our findings align with broader statistics, which indicate that much of the data on accessibility is either incomplete or underutilized. This underscores the pressing need for improved data collection practices and the prioritization of accessibility in digital content creation. By addressing these initial hurdles, future analyses can leverage more robust data to apply sophisticated techniques and generate actionable insights, ultimately contributing to a more inclusive digital landscape.
-Future development¶
We believe that there is a great room for expansion and deepening of this analysis, not merely in terms of development, but chiefly in relation to the expansion of the dataset, in particular:
+Future development¶
We believe that there is a great room for expansion and deepening of this analysis, not merely in terms of development, but chiefly in relation to the expansion of the dataset, in particular:
@@ -9742,7 +9743,7 @@ Appendix¶
-References¶
Dataset: Bohacek, S. (2024). Mastodon.social alt text use by client app. Kaggle. https://www.kaggle.com/datasets/fourtonfish/mastodon-social-alt-text-use-by-client-app
+References¶
Dataset: Bohacek, S. (2024). Mastodon.social alt text use by client app. Kaggle. https://www.kaggle.com/datasets/fourtonfish/mastodon-social-alt-text-use-by-client-app
- Bin Zia, H., He, J., Raman, A., Castro, I., Sastry, N. & Tyson, G. (2023). Flocking to Mastodon: Tracking the Great Twitter Migration. https://doi.org/10.48550/arXiv.2302.14294
- Brady E. & Bigham, J.P. (2014). How companies engage customers around accessibility on social media. In Proceedings of the 16th international ACM SIGACCESS conference on Computers & accessibility (ASSETS '14). Association for Computing Machinery, New York, NY, USA, 51–58. https://doi.org/10.1145/2661334.2661355
Global analysis¶
In this section, we begin by analyzing the overall use of alt text across all clients, regardless of which client produced each post. This gives a broad view of the general trends and patterns of alt text use within the dataset.
Analyzing client data¶
To see how posts are distributed across clients, we plot the total number of posts per client and check for outliers or skew. This shows how the overall post count is spread among clients and helps identify patterns or irregularities in the data.
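As a minimal sketch of the kind of check described above, the snippet below buckets clients by their total post count on a roughly logarithmic scale and prints a text histogram. The client names and counts are hypothetical placeholders, not values from the dataset, and the bucket boundaries are an assumption chosen for illustration; the plots in the notebook may be built differently.

```python
from collections import Counter

# Hypothetical sample: total posts per client (NOT real dataset values).
posts_per_client = {
    "Web": 5200,
    "ClientA": 1300,
    "ClientB": 40,
    "ClientC": 3,
    "ClientD": 1,
    "ClientE": 1,
    "ClientF": 1,
}

# Bucket labels in display order (boundaries are an illustrative assumption).
BUCKETS = ["1", "2-10", "11-100", "101-1000", ">1000"]


def bucket(post_count):
    """Return the bucket label for a client's total post count."""
    if post_count <= 1:
        return "1"
    if post_count <= 10:
        return "2-10"
    if post_count <= 100:
        return "11-100"
    if post_count <= 1000:
        return "101-1000"
    return ">1000"


def distribution(counts):
    """Count how many clients fall into each bucket."""
    tally = Counter(bucket(n) for n in counts.values())
    return {label: tally.get(label, 0) for label in BUCKETS}


dist = distribution(posts_per_client)
for label, n_clients in dist.items():
    # One '#' per client makes a skewed distribution visible at a glance.
    print(f"{label:>9} | {'#' * n_clients} ({n_clients})")
```

A distribution in which most clients sit in the lowest bucket while a handful sit in the highest one is the kind of skew this section looks for.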
Find the mode of posts per client¶
As the graph above shows, most clients have only one post. The mode, that is, the most frequent total number of posts per client, should therefore be equal to one.
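The expectation above can be checked with a few lines of standard-library Python. This is only a sketch: the list `posts` holds hypothetical per-client totals, not the dataset values, and the median and mean are added here purely to show how far a few very active clients pull the average away from the typical client.

```python
from statistics import mean, median, mode

# Hypothetical per-client post totals (NOT real dataset values).
posts = [5200, 1300, 40, 3, 1, 1, 1]

most_frequent = mode(posts)    # the value that occurs most often
middle_value = median(posts)   # robust to the few very large clients
average = mean(posts)          # pulled upwards by the largest clients

print(f"mode={most_frequent}, median={middle_value}, mean={average:.1f}")
```

When the mode and median are small while the mean is several orders of magnitude larger, the distribution is heavily right-skewed, which matches what the plot suggests.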
Grouping by client popularity¶
Our observations on the most and least popular clients led us to split the analysis into two groups and explore the use of alt text in each separately.
Alt text usage in top 5 clients¶
In the dataset, only five clients generated more than 1000 posts each; together they account for more than 60% of the total posts. Hence, we will investigate their posts in more detail.
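The split by popularity and the share of posts held by the top clients can be sketched as follows. The records, the client names, and the tuple layout (name, total posts, posts with alt text) are hypothetical and chosen only for illustration; only the 1000-post threshold comes from the text.

```python
# Hypothetical records: (client name, total posts, posts with alt text).
# These are NOT values from the dataset.
clients = [
    ("ClientA", 4000, 1200),
    ("ClientB", 2000, 0),
    ("ClientC", 1500, 300),
    ("ClientD", 1200, 1),
    ("ClientE", 1100, 220),
    ("ClientF", 900, 450),
    ("ClientG", 600, 60),
    ("ClientH", 500, 50),
]

POPULARITY_THRESHOLD = 1000  # "more than 1000 posts", as stated in the text


def summarize(group):
    """Aggregate client count, post count and alt text percentage for a group."""
    posts = sum(total for _, total, _ in group)
    with_alt = sum(alt for _, _, alt in group)
    pct = 100 * with_alt / posts if posts else 0.0
    return {"clients": len(group), "posts": posts, "alt_text_pct": pct}


popular = [c for c in clients if c[1] > POPULARITY_THRESHOLD]
others = [c for c in clients if c[1] <= POPULARITY_THRESHOLD]

popular_summary = summarize(popular)
others_summary = summarize(others)

total_posts = sum(total for _, total, _ in clients)
popular_share = 100 * popular_summary["posts"] / total_posts

print(f"popular clients: {popular_summary}")
print(f"other clients:   {others_summary}")
print(f"share of posts from popular clients: {popular_share:.1f}%")
```

Computing the alt text percentage from summed post counts, rather than averaging per-client percentages, keeps clients with a single post from weighing as much as clients with thousands.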
Exploring alt text features of the top 5 clients¶
Web¶
Building on what figure 6 shows, we now look more closely at how the most popular clients handle image descriptions.
The most popular client is “Web”, the Web interface included by default in Mastodon’s source code [10]. It is what users reach when they visit the URL of their instance. Because it runs in the browser, instance administrators can modify the user interface to encourage alt text usage, for example by adding CSS snippets [11]. The administrators of mastodon.social do not seem to have adopted any such practice beyond the default Web interface settings.
It is possible to enable a warning before posting an image without a description from the Web interface, but only in Mastodon Glitch Edition, an experimental fork of Mastodon with additional features (image 2).
Official Mastodon apps¶
The third and fifth most popular clients are the official Mastodon applications for Android [13] and for iOS [14], respectively. In these cases, too, the default interface does not particularly encourage the use of alt text. Nevertheless, the application settings include an opt-in option to show a warning before posting an image without a description (images 4 and 5).
Bots¶
Lastly, AboveMaidstoneBot and dlvr.it show very peculiar characteristics, which strongly suggest that both are automated software. We were unable to find any information about the former, and can only assume that it is a bot automating the publication of some kind of content. The latter, instead, has an informative website [15], from which it appears to be software built specifically for cross-posting (the practice of publishing on one social media platform and automatically publishing the same content on another).
Two possibilities could explain the lack of image descriptions: either the software does not support alt text, or the images originally posted on other platforms almost never have a description. The second is the more likely, as one post out of 5806 does have alt text, meaning that the software supported alt text at least once.
Note on Analytical Methods¶
Why Linear Regression, Logistic Regression, and KNN were not used.
Several analytical approaches were initially evaluated as ways to understand patterns in the dataset, including linear regression, logistic regression, and k-nearest neighbors (KNN). However, these methods were not pursued, for the following reasons:
- Linear Regression: The correlation analysis between the percentage of posts with alt text and the total status count revealed a very weak relationship (Pearson correlation: -0.06, Spearman correlation: 0.03). This indicates the lack of a linear association, which linear regression relies on. Applying this method would therefore be unlikely to yield meaningful insights, and the resulting model could either overfit (fail to generalize) or underfit (force a linear model onto nonlinear data).
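For readers who want to see how the two coefficients mentioned above differ, here is a minimal standard-library sketch. The sample values are hypothetical and do not reproduce the -0.06 and 0.03 reported for the dataset; the function names are illustrative, and the notebook may well compute the coefficients with a library instead.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation: strength of the linear relationship between x and y."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    spread_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    spread_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (spread_x * spread_y)


def average_ranks(values):
    """Rank values from 1..n, giving tied values the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with the one at position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = shared_rank
        i = j + 1
    return ranks


def spearman(x, y):
    """Spearman correlation: Pearson correlation computed on the ranks."""
    return pearson(average_ranks(x), average_ranks(y))


# Hypothetical sample: total posts and alt text percentage per client.
total_posts = [1, 2, 3, 4, 100]
alt_text_pct = [10, 20, 30, 40, 50]

print(f"Pearson:  {pearson(total_posts, alt_text_pct):.2f}")
print(f"Spearman: {spearman(total_posts, alt_text_pct):.2f}")
```

In this toy sample the relationship is perfectly monotonic but not linear, so Spearman is 1 while Pearson is lower. In the dataset both coefficients are close to zero, which is why neither a linear nor a monotonic relationship could be assumed.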
Conclusion and future development¶
Summary¶
In conclusion, the results of our analysis are encouraging, as alt text usage in the dataset proved to be considerably higher than on Twitter. At the same time, they show a lot of room for improvement in absolute terms, because only a minority of posts in the dataset had image descriptions. Furthermore, we found a strong inequality in the distribution of posts among clients, which we were not expecting and which forced us to adapt our approach and the flow of our analysis accordingly.
As for the Fediverse, even though its communities are supposedly more sensitive to ethical issues, this analysis clearly showed that the use of image descriptions is still too limited.
Challenges¶
While our initial expectations for the project included applying more advanced technical models, the reality of working with real-world data often brings unforeseen challenges that must first be addressed. During our analysis, we encountered limitations such as the lack of significant features for certain models and the need for extensive data cleaning and restructuring. These challenges highlighted an essential truth: data analysis is often less about immediately implementing complex models and more about solving foundational issues to ensure data quality and reliability.
Furthermore, our findings align with broader statistics, which indicate that much of the data on accessibility is either incomplete or underutilized. This underscores the pressing need for improved data collection practices and the prioritization of accessibility in digital content creation. By addressing these initial hurdles, future analyses can leverage more robust data to apply sophisticated techniques and generate actionable insights, ultimately contributing to a more inclusive digital landscape.
Future development¶
We believe there is considerable room to expand and deepen this analysis, not merely in terms of development but chiefly by expanding the dataset, in particular:
References¶
Dataset: Bohacek, S. (2024). Mastodon.social alt text use by client app. Kaggle. https://www.kaggle.com/datasets/fourtonfish/mastodon-social-alt-text-use-by-client-app
- Bin Zia, H., He, J., Raman, A., Castro, I., Sastry, N. & Tyson, G. (2023). Flocking to Mastodon: Tracking the Great Twitter Migration. https://doi.org/10.48550/arXiv.2302.14294
- Brady, E. & Bigham, J.P. (2014). How companies engage customers around accessibility on social media. In Proceedings of the 16th international ACM SIGACCESS conference on Computers & accessibility (ASSETS '14). Association for Computing Machinery, New York, NY, USA, 51–58. https://doi.org/10.1145/2661334.2661355
Alt text usage in top 5 clients¶
In the dataset, only five clients generated more than 1000 posts, accounting for more than 60% of the total posts. Hence, we will be investingating the specific posts more in detail.
+Alt text usage in top 5 clients¶
In the dataset, only five clients generated more than 1000 posts, accounting for more than 60% of the total posts. Hence, we will be investingating the specific posts more in detail.
Exploring alt text features of the top 5 clients¶
Web¶
Following what figure 6 brings to light, we now focus in more detail on the characteristics of the most popular clients with regard to image descriptions.
+Exploring alt text features of the top 5 clients¶
Web¶
Following what figure 6 brings to light, we now focus in more detail on the characteristics of the most popular clients with regard to image descriptions.
The most popular client is “Web”, the Web interface integrated in Mastodon’s source code [10] by default. It is what users reach when they open the URL of their instance. Because it is served through the browser, instance administrators can change the user interface to encourage alt text usage, for example by adding CSS snippets [11]. The administrators of mastodon.social do not seem to have adopted any practice to encourage alt text beyond the default Web interface settings.
A warning before posting an image without a description can be enabled from the Web interface, but only in Mastodon Glitch Edition, an experimental fork of Mastodon with additional features (image 2).
-Official Mastodon apps¶
The third and fifth most popular clients are the official Mastodon applications for Android [13] and for iOS [14], respectively. In these cases, too, the default interface does not particularly incentivize the use of alt text. Nevertheless, the application settings offer an opt-in warning before posting an image without a description (images 4 and 5).
+Official Mastodon apps¶
The third and fifth most popular clients are the official Mastodon applications for Android [13] and for iOS [14], respectively. In these cases, too, the default interface does not particularly incentivize the use of alt text. Nevertheless, the application settings offer an opt-in warning before posting an image without a description (images 4 and 5).
-Bots¶
Lastly, AboveMaidstoneBot and dlvr.it show very peculiar characteristics, strongly suggesting that both are automated software. We were unable to gather any information about the former, and we can only assume that it is a bot automating the publication of some kind of content. The latter, instead, has an informative website [15], from which it appears to be software built specifically for cross-posting (the practice of publishing on one social media platform and automatically publishing the same content on another).
+Bots¶
Lastly, AboveMaidstoneBot and dlvr.it show very peculiar characteristics, strongly suggesting that both are automated software. We were unable to gather any information about the former, and we can only assume that it is a bot automating the publication of some kind of content. The latter, instead, has an informative website [15], from which it appears to be software built specifically for cross-posting (the practice of publishing on one social media platform and automatically publishing the same content on another).
There are two possible explanations for the lack of image descriptions: either the software does not support the inclusion of alt text, or images originally posted on other platforms almost never have an image description. The second explanation is the more likely one, as exactly one post out of 5806 has alt text, which shows that the software supported alt text at least once.
Note on Analytical Methods¶
Why Linear Regression, Logistic Regression, and KNN were not applied.
+Note on Analytical Methods¶
Why Linear Regression, Logistic Regression, and KNN were not applied.
Several analytical approaches were initially considered to understand patterns in the dataset, including linear regression, logistic regression, and k-nearest neighbors (KNN). However, these methods were ultimately not applied, for the following reasons:
- Linear Regression: The correlation analysis between the percentage of posts with alt text and the total status count revealed a very weak relationship (Pearson correlation: -0.06, Spearman correlation: 0.03). This indicates the lack of a linear association, which linear regression needs in order to be informative. Applying this method would not yield significant insights, and could lead to overfitting (a model that fails to generalize) or underfitting (fitting a linear model to data without a linear pattern). @@ -9547,11 +9548,11 @@
Note on Analytical Methods
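To make the two coefficients mentioned for linear regression concrete, the sketch below computes Pearson and Spearman correlations using only the Python standard library. It is an illustrative sketch, not the code used in the analysis: the function names and the toy data are assumptions, and in practice a library routine such as pandas `Series.corr` with `method="pearson"` or `method="spearman"` would typically be used instead.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation: covariance scaled by both standard deviations.
    Raises ZeroDivisionError if either input is constant."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    spread_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    spread_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (spread_x * spread_y)


def ranks(values):
    """1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = average_rank
        i = j + 1
    return result


def spearman(x, y):
    """Spearman correlation: Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))


# Toy data (not from the dataset): a monotone but nonlinear relationship.
x = [1, 2, 3, 4, 5]
y = [1, 4, 9, 16, 25]
print(f"Pearson:  {pearson(x, y):.3f}")
print(f"Spearman: {spearman(x, y):.3f}")
```

On the toy data Spearman reaches 1 while Pearson stays below it, because Spearman only measures monotonic association. When both coefficients are close to zero, as reported above, there is little evidence of either a linear or a monotonic relationship.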
-Conclusion and future development¶
Summary¶
In conclusion, the results of our analysis proved to be positive, as alt text usage in the dataset turned out to be considerably higher than on Twitter. At the same time, they show a lot of room for improvement in absolute terms, because only a minority of posts in the dataset had image descriptions. Furthermore, we found a strong inequality in the distribution of posts among clients, something we were not expecting, which forced us to adapt our approach and the flow of our analysis accordingly.
+Conclusion and future development¶
Summary¶
In conclusion, the results of our analysis proved to be positive, as alt text usage in the dataset turned out to be considerably higher than on Twitter. At the same time, they show a lot of room for improvement in absolute terms, because only a minority of posts in the dataset had image descriptions. Furthermore, we found a strong inequality in the distribution of posts among clients, something we were not expecting, which forced us to adapt our approach and the flow of our analysis accordingly.
Concerning the Fediverse environment, even though its communities are supposedly more sensitive to ethical issues, this analysis showed that, at least on mastodon.social, the use of image descriptions is still too limited.
-Challenges¶
While our initial expectations for the project included applying more advanced technical models, the reality of working with real-world data often brings unforeseen challenges that must first be addressed. During our analysis, we encountered limitations such as the lack of significant features for certain models and the need for extensive data cleaning and restructuring. These challenges highlighted an essential truth: data analysis is often less about immediately implementing complex models and more about solving foundational issues to ensure data quality and reliability.
+Challenges¶
While our initial expectations for the project included applying more advanced technical models, the reality of working with real-world data often brings unforeseen challenges that must first be addressed. During our analysis, we encountered limitations such as the lack of significant features for certain models and the need for extensive data cleaning and restructuring. These challenges highlighted an essential truth: data analysis is often less about immediately implementing complex models and more about solving foundational issues to ensure data quality and reliability.
Furthermore, our findings align with broader statistics, which indicate that much of the data on accessibility is either incomplete or underutilized. This underscores the pressing need for improved data collection practices and the prioritization of accessibility in digital content creation. By addressing these initial hurdles, future analyses can leverage more robust data to apply sophisticated techniques and generate actionable insights, ultimately contributing to a more inclusive digital landscape.
-Future development¶
We believe there is ample room to expand and deepen this analysis, not merely through further development but chiefly by expanding the dataset, in particular:
+Future development¶
We believe there is ample room to expand and deepen this analysis, not merely through further development but chiefly by expanding the dataset, in particular:
Appendix¶
References¶
Dataset: Bohacek, S. (2024). Mastodon.social alt text use by client app. Kaggle. https://www.kaggle.com/datasets/fourtonfish/mastodon-social-alt-text-use-by-client-app
+References¶
Dataset: Bohacek, S. (2024). Mastodon.social alt text use by client app. Kaggle. https://www.kaggle.com/datasets/fourtonfish/mastodon-social-alt-text-use-by-client-app
- Bin Zia, H., He, J., Raman, A., Castro, I., Sastry, N. & Tyson, G. (2023). Flocking to Mastodon: Tracking the Great Twitter Migration. https://doi.org/10.48550/arXiv.2302.14294
- Brady, E. & Bigham, J.P. (2014). How companies engage customers around accessibility on social media. In Proceedings of the 16th international ACM SIGACCESS conference on Computers & accessibility (ASSETS '14). Association for Computing Machinery, New York, NY, USA, 51–58. https://doi.org/10.1145/2661334.2661355