• Analysing Francis Bacon: Part 2 – First Steps

    “All painting is an accident. But it’s also not an accident, because one must select what part of the accident one chooses to preserve.”

    -Francis Bacon

    Initial Data Exploration

    The first part of any data analysis project is data collection. To capture Francis Bacon’s 500+ paintings and start my analysis, I needed to find a reliable source- thankfully, all of the artist’s known works are recorded in a ‘Catalogue Raisonné’ on the Francis Bacon estate’s website. The paintings are grouped by decade and each artwork comes with related details including the title, CR code (a unique painting identifier) and dimensions, as well as the materials used to create it. Further information varies from painting to painting.

    The format of the webpage that displays each painting is regular and consistent- an image of the painting at the top of the page, followed by the title and other details. Although this provided an opportunity to automatically cycle through each page and retrieve the necessary information (a technique called web scraping; a rough sketch is shown below), I decided not to do this for the initial exploration because I wanted to see as many individual paintings as I could firsthand.
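
    For completeness, this is roughly what such a scraper could look like in Python. It is only a sketch: the URL and the HTML selectors are placeholders and would need to be checked against the real catalogue pages before it would return anything useful.

    # A minimal web-scraping sketch using requests and BeautifulSoup.
    # The URL and class names below are hypothetical -- inspect the actual
    # pages on francis-bacon.com to find the right ones.
    import requests
    from bs4 import BeautifulSoup

    def scrape_painting(url):
        """Fetch a single catalogue page and pull out a few fields."""
        response = requests.get(url, timeout=10)
        response.raise_for_status()
        soup = BeautifulSoup(response.text, "html.parser")

        # Placeholder selectors
        title = soup.find("h1").get_text(strip=True)
        details = [li.get_text(strip=True) for li in soup.select(".artwork-details li")]
        return {"url": url, "title": title, "details": details}

    # Example call (hypothetical URL):
    # record = scrape_painting("https://www.francis-bacon.com/artworks/paintings/example-cr-code")
    # print(record)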

    For the initial data exploration, I recorded as much data as possible for the first 200 paintings of Bacon’s career in an Excel spreadsheet and used a free-to-use data visualisation tool called Flourish to produce the following plots.
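
    For anyone who prefers code to spreadsheets, the same kind of summary can be produced in Python with pandas. This is only a rough sketch- the file name ‘bacon_paintings.xlsx’ and the ‘year’ column are placeholders for however the data is actually saved.

    # A rough sketch of the same exploration in Python.
    # Assumes a spreadsheet with one row per painting and a "year" column.
    import pandas as pd

    paintings = pd.read_excel("bacon_paintings.xlsx")

    # Number of paintings catalogued per year (the data behind the bar chart)
    per_year = paintings["year"].value_counts().sort_index()
    print(per_year)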

    Visualising the data

    The bar chart above shows the number of paintings Bacon painted each year for the first 200 artworks he produced. We can see that there are two distinct periods of his career: the early period (1929 to 1936) where only 15 paintings are catalogued, and the ‘start’ of Bacon’s artistic career from 1944 onwards.

    Bacon used to insist that his professional artistic career began in 1944, but it is well known that he was painting much earlier than that. His first known paintings, ‘Watercolour’ and ‘Gouache’, were created during this early period and were directly inspired by Pablo Picasso, whose style is very apparent in these works. Bacon adopts Picasso’s signature abstract shapes combined with flat, muted colours- something starkly different from the more vivid and bold art seen later in his career.

    ‘Watercolour’, 1929

    ‘Gouache’, 1929

    Between these two periods of Bacon’s early career there is an eight-year gap, from 1936 to 1944, in which Bacon seemingly produced no paintings. The reason for this is unknown, but Bacon’s reputation for destroying his own works, as well as the start of WW2, may explain the inactivity.

    There is evidence that Bacon was producing and destroying many of his earlier artworks. A 1932 painting by fellow artist and admirer Roy de Maistre gives a glimpse of Francis Bacon’s studio and shows the faces of unknown paintings by Bacon stacked on top of one another.

    Roy de Maistre (1894-1968), ‘Francis Bacon’s studio’, 1932

    The dynamic scatter plot above shows how the size of Bacon’s paintings changed over time. The average size of a Bacon painting (based on the first 200 he created) is approximately 1.4 m by 1 m (in the art world, the height dimension comes before the width). We can see that the majority of his early paintings from 1929 to 1944 were below this average size, but later in his career he begins to use larger canvases- the largest being 2.02 m × 1.42 m.
    It is also interesting to see clustering around the 0.6 m × 0.5 m, 1.2 m × 1.5 m and 2 m × 1.4 m sizes, showing that Bacon had preferences for (or was commissioned to use) particular canvas sizes. We can also see from the plot that Bacon preferred portrait formats over landscape: only 4 of the 200 paintings shown in the scatter plot are landscape, where the width of the painting is greater than the height.
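
    These checks are easy to reproduce with pandas, assuming the same hypothetical spreadsheet with ‘height_m’ and ‘width_m’ columns (a sketch, not my exact workflow):

    import pandas as pd

    paintings = pd.read_excel("bacon_paintings.xlsx")

    # Average canvas size (height before width, as in the art world)
    print(paintings[["height_m", "width_m"]].mean())

    # How many of the paintings are landscape (wider than they are tall)?
    landscape = paintings[paintings["width_m"] > paintings["height_m"]]
    print(len(landscape), "landscape paintings")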

    The reason I felt it was important to record the dimensions of the paintings was that I wanted to explore whether the size of a painting affected its auction price. However, retrieving auction price information is difficult: firstly, many of these paintings were sold privately, so the sale price was never publicly disclosed; secondly, many of Bacon’s paintings were sold or donated decades ago, so auction price information simply is not available.

    From the data available, I was also able to visualise the most common materials Bacon used for his paintings in a heat map. The painting base (y-axis) is the material Bacon painted onto and the drawing material (x-axis) is the material he used to paint with. The deep red colour of the canvas-and-oil cell shows that Bacon had a clear preference for this combination, used in 181 of the 200 paintings. The second most common pairing was sand and canvas. Although sand might seem like an unusual material at first, some digging revealed that Bacon used sand to give the paint more texture- he wasn’t painting with sand itself, rather combining it with paint to enhance the image.
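
    A heat map like this can be built from the spreadsheet with a simple cross-tabulation. Again a sketch only: the ‘base’ and ‘material’ columns are assumed, and paintings listing several materials would need a little extra handling.

    import pandas as pd
    import matplotlib.pyplot as plt
    import seaborn as sns

    paintings = pd.read_excel("bacon_paintings.xlsx")

    # Count how often each base/material combination appears
    combos = pd.crosstab(paintings["base"], paintings["material"])

    sns.heatmap(combos, annot=True, fmt="d", cmap="Reds")
    plt.xlabel("Drawing material")
    plt.ylabel("Painting base")
    plt.show()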

    The final visualisation I created shows the current ownership of the paintings. The majority are currently owned by anonymous private entities, but some renowned institutions, such as the Tate gallery in the UK and the Museum of Modern Art in New York, own a few of Bacon’s works. An interesting discovery was that the Sainsbury Centre for Visual Arts in Norwich holds the largest collection of Bacon’s paintings by a single known entity, with 8 artworks.

    This initial data exploration was not meant to be an in-depth analysis. Instead, the aim was to give me a better understanding of what I was dealing with and how I could use the information for more complex investigations in the future. As expected, the data acted like a window through which I could see glimpses of Bacon’s life and mind, and it has only motivated me even more to study his paintings at a deeper level and try to answer questions like: Why are his paintings so powerful? Is there a method to the madness seen in his art? How are the prices of his paintings justified?

  • The Data Revolution in Sustainable Development

    In 2015, all UN Member States agreed on a 15-year plan to meet the Sustainable Development Goals (SDGs): 17 goals aimed at tackling urgent global issues in four key areas: climate change, poverty, human rights and gender inequality. In order to reach these goals by 2030, the United Nations also recognised the need to use big data as a foundation for better decision-making by governments. Leaders required more data to understand the poorest and most marginalised populations, for example, to reach the zero extreme poverty target laid out by the SDGs. In response, the UN has played a key role in facilitating the data revolution and its integration into development policy-making.

    Sustainable Development Goals adopted by UN member states in 2015

    With the massive amounts of digital information being generated each day, international authorities saw huge potential for “data exhaust” (passively generated data) to be used for the public good. When used responsibly, big data allows governments to gain a deeper understanding of their people and the environment. For example, survey data on income levels in less developed countries is notoriously unreliable. Thanks to the falling cost of technology, however, mobile phone payments have become increasingly widespread in these regions, and so more information about people’s spending patterns has become available. Transactional data, combined with satellite imagery, can be analysed using big data techniques to reveal the areas where people are most affected by extreme poverty, allowing governments to take focused action.

    PulseSatellite uses machine learning to analyse satellite images. One of its features allows it to identify rooftops (right) in hard-to-reach areas like slums.

    In addition, better access to technology has allowed all parts of society to be represented in some shape or form within datasets, which means that previously hidden issues can be revealed through analysis. The informal sector is generally not well represented in official statistics because it is difficult to identify and measure. Analysing unstructured data in the form of social media posts and local radio transcripts reveals a more accurate picture of this part of society than traditional survey methods, which can let hidden humanitarian crises go unnoticed.

    Despite the clear benefits presented by big data, it poses several major challenges. Although data has great potential to do good, if misused it could enable breaches of human rights. Without the correct measures in place, people in positions of power could exploit these technologies to harm ethnic or religious groups. And although most data may seem anonymous at first glance, digital footprints are unique, so combining multiple datasets could result in the re-identification of individuals, potentially putting them in harm’s way.

    Secondly, some countries do not have the technological infrastructure to benefit from big data. These include the Least Developed Countries (LDCs), Landlocked Developing Countries (LLDCs) and Small Island Developing States (SIDS). It is important to focus on developing these areas by providing them with the same access to tools that ‘data-rich’ countries have, in order to prevent big data implementation from widening inequality.

    Another major challenge posed by big data is that the techniques required to analyse the massive datasets generated each day are largely limited to the private sector. Social media, for example, is a great source of data, but the tools and expertise owned by companies like Facebook are used to reach corporate goals rather than global development ones. In response, the United Nations is playing a key role in facilitating discussions and partnerships with private entities, allowing more resources to be distributed through ‘data philanthropy’.

    Social media platforms including Facebook, Weibo, Reddit, Instagram and Twitter

    In 2014, the UN Secretary-General asked an Independent Expert Advisory Group to make recommendations on how to tackle these big data challenges. The proposed solutions included improving data literacy, implementing regulation without stunting innovation, and distributing technology more equally, focusing on the less developed areas first. The UN’s vision for a data revolution has already started to take shape within its own systems, as big data technologies are integrated into UN agencies, funds and initiatives such as Global Pulse.

    Five years after the SDGs were first agreed upon, the 2020 SDG report showed that, with the support of big data, progress was being made. For example, the global poverty rate fell to 8.2% in 2019 and, without factoring in the impact of COVID-19, was projected to fall further to 7.4% by 2021. But as governments face the worst global crisis in years with the coronavirus pandemic, they have never been more reliant on data, and the data revolution has only accelerated.

    Proportion of people living below $1.90 a day, between 2010 and 2021, forecast before and after COVID-19 (percentage) 1

    Sources:
    1 https://unstats.un.org/sdgs/report/2020/goal-01/
    https://www.undatarevolution.org/measuring-sustainable-development/
    https://www.unglobalpulse.org
    https://www.un.org/en/sections/issues-depth/big-data-sustainable-development/

  • Analysing Francis Bacon: Part 1 – An Ambitious Goal

    “I believe in deeply ordered chaos.”

    -Francis Bacon

    I have had an idea forming in my mind for about a year now. The first spark came after playing with random numbers using the programming language Python- perhaps there was a way I could use these random numbers to ‘paint’? The first artist that sprang to mind was the American artist Jackson Pollock, who embraced this idea of ‘controlled randomness’ to create his masterpieces. Surely it wouldn’t be too difficult to replicate one of his pieces on my computer. Colours can, after all, just be represented as RGB values from 0 to 255. So, generate enough of these random values, structure them in an array, and surely you’ve got yourself a Pollock painting, right?

    Jackson Pollock (1912-1956), ‘No 5.’, 1948

    An array of random RGB values

    Well, an hour later, I had written maybe 20 lines of inefficient code (I want to blame inexperience for this), the output of which looked more like TV static than an awe-inspiring piece of abstract art. At that point, I had already given up. So, like most of my Python projects, I put a pin in it, hoping I might return more determined to tackle the problem again.
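
    For reference, the whole experiment boils down to something like the sketch below (the image size is arbitrary, and numpy and Pillow are assumed)- and, unsurprisingly, the result looks like TV static rather than a Pollock.

    import numpy as np
    from PIL import Image

    # Fill an array with random RGB values and save it as an image
    height, width = 400, 600
    pixels = np.random.randint(0, 256, size=(height, width, 3), dtype=np.uint8)

    Image.fromarray(pixels, mode="RGB").save("not_a_pollock.png")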

    Fast forward to a few weeks ago, when I watched an incredible BBC documentary, ‘Francis Bacon: A Brush with Violence’. It is such a well-produced film that it would be difficult to find a viewer left uninspired. I, for one, found my ‘Pollock idea’ rekindled. Like Pollock, Francis Bacon also strongly believed in randomness and chaos, but in a controlled manner that resonates through his work. Bacon’s paintings, however, draw directly from the human image rather than pure abstract imagination, and so my initial idea began to take on a different form. Instead of creating everything from scratch (generating random numbers), I imagined how I could use samples of an artist’s work to train some sort of machine learning model that could spit out paintings in the same style. And who better to choose than Francis Bacon?

    Cecil Beaton (1904-1980), ‘Francis Bacon in his studio’, 1960

    Bacon created almost 600 known surviving paintings1 during a career spanning seven decades, which is more than enough image data to get going with. In addition, his distinctive style and use of vibrant colours is ideal, because it gives me a better sense of what a successful output should look like.

    The first step of the process is to understand the data. I have already collected some images of Bacon’s paintings and started to experiment with different image processing Python libraries to see what I can do to better understand his works before exploring machine learning possibilities. I will cover the initial image analysis in part 2.
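
    As a taste of what I mean, here is a small sketch using Pillow that pulls out the most common colours in an image- ‘painting.jpg’ is just a placeholder for any downloaded image of a Bacon work.

    from PIL import Image

    img = Image.open("painting.jpg").convert("RGB")

    # Reduce the palette so similar shades group together, then count pixels per colour
    reduced = img.quantize(colors=8).convert("RGB")
    counts = sorted(reduced.getcolors(maxcolors=256), reverse=True)

    for count, rgb in counts:
        print(rgb, count)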

    Not only do I want to use this project to improve my coding skills and upskill in image processing and machine learning, I also want to take this opportunity to learn more about the artist through data, and how each stage of Bacon’s life directly affected his paintings. And I think that is the magic of data- it reveals things invisible to the naked eye, but also leaves you with more questions than answers.

    1 https://www.francis-bacon.com/artworks/paintings/1990s

  • Defining Big Data with Big Vs

    If someone were to define ‘Big Data’ as any dataset above x bytes, the term would become obsolete within months. Technology is advancing at such a rapid rate that datasets are in turn getting larger and larger. We would have to start using comparative adjectives such as ‘Bigger Data’ and ‘Larger Data’, and this would defeat the point.

    In the modern technological age, we are no longer constrained to taking samples, because we can often collect all the data we need on an entire population. Using the size of the data alone to define ‘Big Data’ is therefore not enough. In 2001, Doug Laney (a former analyst at Gartner) helped outline some key characteristics to move towards a better understanding of what ‘Big Data’ actually is: the Big Vs.

    Volume

    How much data is being collected? How much data is being stored? These questions help gauge the volume of data. A 2012 study by the University of Oxford and IBM revealed that over half of the 1,144 data professionals surveyed judged datasets of between 1 Terabyte and 1 Petabyte to be ‘big’. More interestingly, about a third of the respondents said they frankly didn’t know.

    As a general statement, datasets so big that they cannot be collected, stored and analysed using traditional computing methods can be considered to meet the volume criterion.

    Variety

    Variety is a measure of how diverse the data is. This can refer to one of two things: the source of the data or its structure. The former is usually considered when collecting data. Take polling data, for example- it can be collected directly on the ground after voters cast their votes, or it can be collected from online surveys. Although the format of the data is different, i.e. raw paper forms versus online databases, the data represents the same thing.

    Secondly, variety can be derived from structure. Data can generally be categorised into three ‘buckets’: structured, semi-structured and unstructured. Let’s take the polling example from above again: data collected from online surveys will follow a consistent format and so would be considered semi-structured (there may still be free text that needs to be processed before it can be classed as structured). However, data collected directly on the ground with pen and paper may not follow such a standard format- people could write in their own words or maybe just dictate their answers out loud. This data would be considered unstructured. A toy illustration of the three buckets is sketched below.
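
    The example below uses a single made-up poll response; the field names are purely illustrative.

    # Structured: fixed fields with fixed types, ready for a database table
    structured = {"voter_id": 101, "age": 34, "choice": "Candidate A"}

    # Semi-structured: a consistent overall format, but with free text still to process
    semi_structured = {"voter_id": 102, "choice": "Candidate A", "comment": "Liked their housing policy"}

    # Unstructured: a transcribed answer with no predefined fields at all
    unstructured = "I think I'll probably go with A, mostly because of housing, to be honest."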

    Velocity

    The speed at which data needs to be collected, processed and analysed defines the final key characteristic of big data. Velocity and volume go hand in hand: the faster the data is generated, the more data there is. Likewise, the faster the data is collected and processed, the more there is to analyse, allowing actionable decisions to be made. High-speed data processing can be seen all around us, from the image processing in your smartphone camera to watching videos on your laptop. This is usually referred to as stream processing and is beneficial when data needs to be interpreted in real time.

    On the other side of the coin there is batch processing, which is usually a slower form of data flow but allows much larger volumes of data to be processed. A good example of this is financial data. There is no need for the data to flow continuously and quickly, because it is more important that the data comes in correctly. That’s why most financial institutions generate their data in chunks or ‘batches’: they can rely more on the data and it is also more cost effective. The contrast between the two approaches is sketched below.
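
    A toy contrast in Python- the generator simply stands in for any incoming data feed, and the numbers are made up.

    def records():
        for value in [4, 8, 15, 16, 23, 42]:
            yield value

    # Stream processing: handle each record the moment it arrives
    for value in records():
        print("processed immediately:", value)

    # Batch processing: collect a chunk first, then process it in one go
    batch = list(records())
    print("processed as one batch, total =", sum(batch))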

    Over the years, more Vs have been proposed, including Veracity, referring to the reliability of the data, and Visualisation, referring to how the data is presented in an interpretable manner. Although these characteristics definitely help add to the definition of big data, it goes to show that the term can be used in many ways. Data is, in fact, just a universal tool that allows us to better understand ourselves and the world we live in.

  • The 4 Pillars of OOP

    Object-oriented programming (OOP) is a framework used in programming languages, such as Java and Python, that is based on the concept of “objects”. The OOP framework is characterised by 4 features that are often referred to as “pillars” to emphasise their importance: Encapsulation, Abstraction, Inheritance and Polymorphism.

    Encapsulation

    Encapsulation describes the act of organising data and functions into distinct blocks. These blocks are called classes, and a single class can contain both data (data within a class are usually referred to as attributes) and functions (functions within a class are usually referred to as methods). By organising code into distinct classes, it becomes easier for a developer to identify and resolve problems because the program is, in a sense, modular. An “object” is an instance of a class- this means it has the same features defined within the class, i.e. it contains the same data and functions.
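
    A minimal sketch of encapsulation in Python (the Painting class and its fields are just an example):

    class Painting:
        def __init__(self, title, height_m, width_m):
            # Attributes: the data held by each object
            self.title = title
            self.height_m = height_m
            self.width_m = width_m

        def area(self):
            # Method: a function that works on the object's own data
            return self.height_m * self.width_m

    # An object is an instance of the class
    study = Painting("Study", 1.4, 1.0)
    print(study.area())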

    Abstraction

    Once encapsulation is achieved, the next step is to be able to interact with “objects” and use them without affecting the internal code of the class they came from. Abstraction describes the act of making only the essential features of an object accessible while keeping the main structure hidden away.
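
    A minimal sketch of abstraction in Python- the made-up TemperatureSensor class exposes one public method while its internal details stay hidden (by convention, behind a leading underscore):

    class TemperatureSensor:
        def __init__(self):
            self._raw_reading = 200      # internal detail, hidden away
            self._calibration = 0.25     # internal detail, hidden away

        def celsius(self):
            # The only feature the outside world needs to know about
            return self._raw_reading * self._calibration - 20

    sensor = TemperatureSensor()
    print(sensor.celsius())   # 30.0 -- no need to know how it was worked out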

    Inheritance

    Classes can often be very similar to each other, so Inheritance describes the action of bringing attributes and methods from one class into another without having to define them again. Classes that inherit from another class can also have additional attributes and methods added to them, while the inherited attributes and methods are used as they were originally defined. The original class is usually referred to as the parent class and the inheriting class as the child class.
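
    A minimal sketch of inheritance in Python, using made-up Artwork and Painting classes:

    class Artwork:                      # parent class
        def __init__(self, title):
            self.title = title

        def describe(self):
            return f"'{self.title}'"

    class Painting(Artwork):            # child class
        def __init__(self, title, medium):
            super().__init__(title)     # reuse the parent's set-up
            self.medium = medium        # extra attribute added by the child

    p = Painting("Head VI", "oil on canvas")
    print(p.describe(), "-", p.medium)  # describe() is inherited unchanged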

    Polymorphism

    Polymorphism is the ability of classes to define different versions of a method for their own purposes. It is similar to inheritance in that it uses the same relationship between two related classes, but polymorphism has the added benefit of being able to change (override) a method already defined in the parent class within the child class.
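
    A minimal sketch of polymorphism in Python, reusing the same made-up parent/child pair, where the child overrides a parent method:

    class Artwork:
        def describe(self):
            return "an artwork"

    class Painting(Artwork):
        def describe(self):                 # same method name, new behaviour
            return "a painting in oils"

    for piece in [Artwork(), Painting()]:
        print(piece.describe())             # each object answers in its own way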

  • How Big Data is helping environmental conservation

    The amount of available environmental data has exploded in the past five years thanks to improvements in data collection and big data. By combining seemingly disparate datasets, the accuracy of the global environmental picture has improved, giving scientists new insights into how best to conserve the environment.

    Satellites have been an integral driver of this data revolution in recent years. Compared with the 1970s, satellite imagery of the earth is now 20 times higher in quality, with images sent back to earth 60 times more frequently. Data storage capabilities have increased from less than 2 exabytes to 4.6 zettabytes thanks to cloud technologies. With this infrastructure in place, collaborations such as those between NASA and the European Space Agency have become possible. The two agencies partnered in 2018 to publish the most complete assessment of Antarctica’s Ice Sheet Mass Balance1. Using imagery from 24 satellite surveys, the study showed that between 1992 and 2017, Antarctic ice loss directly raised global sea levels by 7.6 millimetres. This is in contrast to the IPCC’s (Intergovernmental Panel on Climate Change) statement only a few years earlier that its model predictions showed future Antarctic ice loss making a negative contribution to global sea levels. According to the NASA/ESA study, at the rate of ice loss Antarctica is currently experiencing, projections for near-future sea level rise indicate stark changes in the environment.

    Another demonstration of Big Data in environmental conservation is the work of Wildlife Insights, a collaboration between a number of major conservation bodies, including the WWF, the Wildlife Conservation Society and Google. One of its main aims is to make use of the many millions of images produced by camera traps (small cameras that take pictures when their sensors detect movement) placed in the wild around the world. Historically it has been difficult to process these images and extract useful insights from them because of technological barriers. But with improvements in Big Data processing tools, as well as analysis techniques such as Artificial Intelligence and Machine Learning, camera trap image processing has given conservationists a better understanding of wild animal populations in jungles, forests and beyond. Hoping to connect these Big Data insights with decision makers, Wildlife Insights regularly liaises with local governments and land managers so that they can gain a better understanding of the wildlife in their area.2

    Whilst big data has had a major impact on environmental studies, it has also helped corporations better understand their operations and their potential contribution to environmental damage. Pirelli is an Italian tyre company that puts technology at the forefront of its business practices. Using HANA, a data management system from SAP, Pirelli has improved transparency in its processes, allowing defective tyres headed for landfill to be detected more efficiently. Increases in the speed of the data stream, thanks to the management system, have improved data collection, processing and distribution across the company, and it has in turn reaped the benefits by meeting waste reduction goals and increasing profits.3

    With massive amounts of data come many challenges in the form of processing, storage and analysis limitations, but without fail, this data gives us a more detailed understanding of the world we live in and the ways we can protect it. Big Data is already playing a vital role in environmental conservation, and as it proves more and more accurate at forecasting the future, decision makers must implement appropriate policies to tackle these issues and help conserve the environment.

    1 https://climate.nasa.gov/news/2749/ramp-up-in-antarctic-ice-loss-speeds-sea-level-rise/

    2 https://www.wildlifeinsights.org/about

    3 https://sustainablebrands.com/read/cleantech/what-does-big-data-mean-for-sustainability