Why is it so hard to analyze social media images at scale?

26th January 2018

In a way, social media image analysis is getting harder simply because of the sheer quantity of images on social media. And it’s not just Instagram and Snapchat: visuals are inexorably taking over feeds that used to be text-based, like Twitter or the Facebook News Feed. So much so that logging on to Twitter without seeing the images turns each post into a riddle.


Yet at the global level, we still don’t really understand how, or why, most images spread. 

While social media analysis tools that parse and understand text are widespread and effective, machines often still struggle to identify what is depicted in images, much less understand the layers of meaning these images acquire in the context of social media conversations. 

As online conversations become more and more visual, brands and agencies are increasingly keen to intercept, understand, and participate in these conversations at scale. Yet because images are hard to analyze both quantitatively and in detail, brands often miss entire conversations.

(At Pulsar, we've built our own vertical image analysis and recognition tools, which can recognize many different features of images on social media.)

The SAGE Handbook of Social Media


Academics are beginning to dive into the matter. Together with Farida Vis and Simon Faulkner, Pulsar’s CEO Francesco D’Orazio (then VP of Product and Research) tackled some of these questions in a chapter of The SAGE Handbook of Social Media, analyzing different datasets and subjects with three different approaches.

One key part of the chapter is Francesco’s analysis of how the images of the death of Alan Kurdi, the three-year-old Syrian refugee who died in 2015, spread across social media and shifted the perception of the global debate around the refugee crisis.

The chapter argues that the study of visual social media may benefit from combining a range of methods that, together, can offer deep insight into a single dataset.

The main takeaways from the chapter are:

  • There is no ‘catch-all’ method for analyzing images on social media: relying only on quantitative network analysis, or only on close qualitative work, misses important aspects of an image’s subject matter and context;
  • Deep insight into social media images requires a range of methods, such as content analysis, network analysis, deep learning and online ethnography;
  • These methods can draw different insights from the same data and work in a mutually supportive way.

Now let’s dive into some of the case studies:

Large-scale image analysis: The Phototrails and Selfiecity projects

Developed by Lev Manovich and the Software Studies Initiative, the Phototrails data collection began in early 2011 and ended in 2012. City-specific data were extracted from Instagram geolocations and timestamps and visualized on the project website to reveal the ‘visual signatures’ and ‘rhythms’ of each city.

Although the dataset was made up of individual images, the research identified macro-level patterns rather than analyzing the pictorial content of those images. It tells us how intensely images are posted to Instagram at certain times in specific places, but it tells us little about what the images depict and why people created and uploaded them.
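To make the macro-level point concrete, here is a minimal Python sketch of a Phototrails-style ‘rhythm’: counting posts per city per hour of day using only location and timestamp metadata. It is not the project’s actual pipeline; the sample records are hypothetical, and it assumes timestamps are already in each city’s local time.

```python
from collections import Counter, defaultdict
from datetime import datetime

# Hypothetical sample records: (city, ISO-8601 local timestamp) pairs,
# standing in for the geolocation and timestamp metadata of Instagram posts.
POSTS = [
    ("Tokyo", "2012-03-01T08:15:00"),
    ("Tokyo", "2012-03-01T08:40:00"),
    ("Tokyo", "2012-03-01T21:05:00"),
    ("New York", "2012-03-01T12:30:00"),
    ("New York", "2012-03-01T12:45:00"),
]


def hourly_rhythm(posts):
    """Count posts per hour of day for each city.

    Only metadata is used: nothing here looks at what the images depict,
    which is exactly the limitation of the macro-level approach.
    """
    rhythm = defaultdict(Counter)
    for city, timestamp in posts:
        hour = datetime.fromisoformat(timestamp).hour
        rhythm[city][hour] += 1
    return rhythm


if __name__ == "__main__":
    for city, counts in sorted(hourly_rhythm(POSTS).items()):
        # One line per city: hour of day -> number of posts in that hour.
        print(city, dict(sorted(counts.items())))
```

The output shows when and where posting peaks, but two cities with identical rhythms could be sharing completely different kinds of pictures.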

Selfiecity, however, focused on a specific type of image: the selfie. The Selfiecity team narrowed a large dataset down to 640 selfies from five cities: Bangkok, Berlin, São Paulo, Moscow and New York. These were processed with facial analysis software to classify face size, pose, emotional expression, the presence of glasses and smiles, whether the eyes and mouth are open or closed, and gender and age.

They found that more selfies were taken by women and younger people, and that more people were identified as smiling in selfies from Bangkok and São Paulo than in those from Moscow. Overall, however, this approach keeps the discussion generic and does not address the question of how to reconcile the big data approach with close study of individual images.

Working with images at different scales

Other studies have grappled with both large image-based datasets and the specificities of individual images. One study, conducted in London after the 2011 riots, discussed a mix of screengrabs and cameraphone photos of TV news in relation to John Berger’s idea that making a photographic image involves the basic statement ‘I have decided that this is worth recording’. This study required a shift from a systematic analytical methodology to a relatively arbitrary choice of an image category, and of the images within it.

This involved a mix of methods for breaking down large datasets to enable close qualitative analysis of specific images, with the researchers adapting their approach to the content of each image. This suggests that, at the level of qualitative work, researchers must make intuitive choices about how to approach the analysis of images.

In-depth qualitative analysis of images

Several studies focus solely on the close reading of specific images. They offer good examples of how close interpretation of images can attend to the socio-cultural meanings of visual social media. Yet much of this work leaves out the context in which images were communicated and the scale of their circulation.


Case study combining these methodologies: the death of Alan Kurdi on Twitter

Alan Kurdi was a three-year-old Syrian refugee who died with other members of his family in a failed attempt to cross the Mediterranean from Turkey to Greece. The Turkish photojournalist Nilüfer Demir photographed Alan’s dead body after it washed up on Bodrum beach in Turkey, producing a series of striking photographs that showed the body lying at the meeting point between the sea and the beach, or held in the arms of a Turkish policeman. These photographs were first published by a Turkish news agency and circulated via mainstream and social media, triggering a significant on-the-ground response to the refugee crisis, including in the UK, the focus of the VSML report on which the case study was based.

Data, research questions and methods

Data was collected through Pulsar using historical data for 1-14 September 2015 from news, blogs, forums, Twitter and Tumblr, amounting to nearly 2.5 million posts in various languages. Google search data was also included in the report to get a sense of global search patterns. The report set out to explore three questions: Who was the child? Why did these particular images spread? Why did they trigger a large-scale response after so many refugees had already died?


Below is a graph of the spread of the images during the first three hours after they were published:

[Figure: Social media images spread]


Selected findings:

  • The network analysis shows that the diffusion of the story was consistently image-led, but that did not mean the same images were always being shared.
  • The mediation of Alan Kurdi’s death was deeply entwined with Twitter’s capacity to act as a catalyst, carrying emerging stories to relevant audiences: Twitter allowed the story to spread before the mainstream media picked it up.
  • The case is a particularly strong example of the ‘migration of images’, which has been a crucial aspect of images’ cultural function for millennia but has been accelerated by social media.
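The first finding, that diffusion was image-led without always being led by the same image, can be illustrated with a much simpler tally than the chapter’s network analysis. The Python sketch below bins hypothetical shares into 30-minute windows over the first three hours and reports the most-shared image in each window. The image labels and timings are invented for illustration, and it assumes each share has already been matched to an image identifier (for example, by near-duplicate detection).

```python
from collections import Counter, defaultdict

# Hypothetical shares: (minutes after the first post, image identifier).
# These values are made up; they are not data from the study.
SHARES = [
    (5, "image_A"), (12, "image_A"), (20, "image_A"),
    (35, "image_B"), (40, "image_A"), (50, "image_B"), (55, "image_B"),
    (70, "image_C"), (80, "image_C"), (95, "image_B"),
]


def dominant_image_per_window(shares, window_minutes=30, horizon_minutes=180):
    """Return {window_start_minute: (image_id, share_count)} giving the
    most-shared image in each time window before the horizon."""
    windows = defaultdict(Counter)
    for minute, image_id in shares:
        if 0 <= minute < horizon_minutes:
            # Align each share to the start of its time window.
            start = (minute // window_minutes) * window_minutes
            windows[start][image_id] += 1
    return {
        start: counts.most_common(1)[0]
        for start, counts in sorted(windows.items())
    }


if __name__ == "__main__":
    for start, (image_id, count) in dominant_image_per_window(SHARES).items():
        print(f"{start:>3}-{start + 29} min: {image_id} ({count} shares)")
```

In this toy data every window is dominated by some image, yet the leading image changes over time, which is the pattern the finding describes. A real analysis would also need the network structure (who shared from whom) to explain why the lead shifts.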

The full chapter ‘Analysing social media images’ can be read in The SAGE Handbook of Social Media (SAGE Publishing).