Many people, political experts included, are quick to tell us that this election is different and much harder to call. This is true, but part of the reason it is different is that the world is different from what it was even four years ago. People gather, think, communicate, consume information and are influenced very differently. Online communities are very different to the captive TV audiences of the past; ideas and political statements no longer flow one way, from broadcaster to voter. Debates are taking place online, and opinions are challenged, altered and swayed in almost real time.
Poll-based forecasting was developed in a world where individuals, groups and demographics were more simply defined. Certain age groups, geographies and education levels voted consistently. Polling, voting data and commentary from previous elections provided the context and data to build a forecast. Phone a thousand people, apply some defined weighting, and out pops your predicted result.
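To make the "phone a thousand people, apply some defined weighting" step concrete, here is a minimal sketch of one common form of weighting, in which each demographic group's answers are rescaled to that group's share of the electorate. The group names, the sample and the population shares are all invented for illustration, and real pollsters use considerably more elaborate schemes.

```python
from collections import defaultdict

def weighted_poll_share(responses, population_shares):
    """Estimate support for a candidate by reweighting each demographic
    group to its share of the electorate (simple post-stratification)."""
    totals = defaultdict(int)      # respondents per group
    supporters = defaultdict(int)  # respondents per group backing the candidate
    for group, supports in responses:
        totals[group] += 1
        supporters[group] += int(supports)

    estimate = 0.0
    for group, share in population_shares.items():
        if totals[group] == 0:
            raise ValueError(f"no respondents in group {group!r}")
        # Support within the group, scaled by the group's real-world size.
        estimate += share * supporters[group] / totals[group]
    return estimate

# Hypothetical sample of 100 calls that over-represents under-40s.
responses = (
    [("under_40", True)] * 45 + [("under_40", False)] * 15
    + [("40_plus", True)] * 10 + [("40_plus", False)] * 30
)
# Assumed shares of the electorate (illustrative, not real figures).
population_shares = {"under_40": 0.4, "40_plus": 0.6}

raw = sum(supports for _, supports in responses) / len(responses)
weighted = weighted_poll_share(responses, population_shares)
print(f"raw: {raw:.2f}, weighted: {weighted:.2f}")
```

In this made-up sample the unweighted figure is about 55% and the weighted figure about 45%, which shows how much the result depends on knowing which groups exist and how large they really are.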
Traditional polling is not that easy anymore. Vast swathes of voters aren’t contactable by phone or interested in taking online surveys. People change their minds more readily. Long-standing party loyalty is waning, as are communities that existed around political ideologies. People read multiple news sources and are as likely to be influenced by a Tweet from the other side of the world as by their parents, peers or political commentary. So, do traditional polls now offer less value to forecasters and, if so, what can forecasters do about it?
If you were to design a perfect method of predicting how people would vote, you would want to ask every voter individually for their view. That would, of course, be more difficult than running an election campaign itself. But vast numbers of people now make their views and opinions public, and almost everyone makes up their mind based on publicly available information – so we have a lot of data to work with.
Collating all this information and reaching any kind of meaningful conclusion would have been impossible until just a few years ago. Modern analytics now allows us to blend lots of data from lots of different environments and use machine learning and AI techniques to convert this vast amount of messy data into insights we can understand and base informed decisions on.
Social media is an obvious source of such data. Looking at Twitter, Facebook, Instagram and Reddit, even Tinder, can give us a great deal of information on opinion and how people will most likely vote. Newspapers, blogs and media comment sections are also a rich source of information and, unlike most internet forums, can reflect the views of an older generation. However, the challenge of any prediction from data is to make sure that your data is representative, and this may not be possible using public data streams.
Many analyses of these information sources look for keywords, which only tell us how many people are Tweeting about a certain candidate, not whether they will actually vote for them. However, with correct training, AI can be used to spot positive and negative sentiment, whether a comment was an outlier, or even sarcasm and satire. In this way, AI builds a picture of the voting intention of large communities of social media users. This isn’t something that will happen in the distant future; there are already companies applying these techniques.
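The gap between counting keywords and reading sentiment can be shown with a deliberately tiny sketch. The candidate name, the posts and the two word lists below are all made up, and the word-list scorer is only a stand-in for a trained model: it knows nothing about negation, sarcasm or satire, which is exactly where the trained AI described above would be needed.

```python
import re

# Toy word lists, invented for illustration; a real system would use a trained model.
POSITIVE = {"great", "love", "support", "trust"}
NEGATIVE = {"terrible", "hate", "liar", "never"}

def mentions(posts, name):
    """Keyword approach: count the posts that mention the candidate at all."""
    return sum(name.lower() in post.lower() for post in posts)

def net_sentiment(posts, name):
    """Toy sentiment approach: within posts that mention the candidate,
    add 1 for each positive word and subtract 1 for each negative word."""
    score = 0
    for post in posts:
        text = post.lower()
        if name.lower() not in text:
            continue
        words = re.findall(r"[a-z']+", text)
        score += sum(w in POSITIVE for w in words)
        score -= sum(w in NEGATIVE for w in words)
    return score

# Hypothetical posts about a hypothetical candidate called Smith.
posts = [
    "I love what Smith said about schools",
    "Smith is a liar, I will never vote for him",
    "Terrible debate from Smith tonight",
    "Weather is great today",
]

print("mentions:", mentions(posts, "Smith"))
print("net sentiment:", net_sentiment(posts, "Smith"))
```

Here the candidate looks popular by mention count, with three of the four posts naming them, yet the net sentiment is negative.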
Social media obviously cannot provide a complete picture. Polls still have value because they ask a direct question about intention. Older voters, who use social media less, are still reasonably willing to answer the phone, so for this group phone surveys are particularly useful.
None of these data sources alone has all the answers, but together they build up a much more accurate picture. Where AI really offers something different from previous approaches is the relative ease with which it can bring lots of different messy data sources together and compare them. With the right training, AI can break down data into different demographics, and apply sentiment analysis to gain insight into how different groups will vote.
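One simple way to picture this blending is to combine, for each demographic group, the estimates from different sources, giving more weight to whichever source is less uncertain for that group. The sketch below uses inverse-variance weighting, which is a textbook technique chosen here purely for illustration, not a description of how any particular forecaster works. Every number in it is hypothetical, and attaching a reliable error figure to a social media estimate is itself an open problem.

```python
def blend(estimates):
    """Combine (estimate, standard_error) pairs using inverse-variance
    weights, so that less noisy sources count for more."""
    weights = [1.0 / se ** 2 for _, se in estimates]
    weighted_sum = sum(w * est for w, (est, _) in zip(weights, estimates))
    return weighted_sum / sum(weights)

# Hypothetical support estimates per group: (share, standard error).
# Phone polls are assumed more reliable for older voters,
# social media for younger ones.
sources = {
    "under_40": {"phone_poll": (0.60, 0.08), "social_media": (0.52, 0.04)},
    "40_plus": {"phone_poll": (0.40, 0.03), "social_media": (0.55, 0.09)},
}

blended = {
    group: blend(list(by_source.values()))
    for group, by_source in sources.items()
}
for group, share in blended.items():
    print(f"{group}: {share:.3f}")
```

With these invented figures, the blended estimate for each group lands closer to the source assumed to be more reliable for it: social media for the under-40s, phone polls for the over-40s.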
Historical data is also important. Polling, voting data and commentary from previous elections provide the context and the data used to train machine learning models and AI to recognize human behavior patterns, indicating how people express opinions and vote.
There are warnings and caveats, however. Firstly, machine learning models can only work within the context they are provided with. They will not be able to foretell a major political upset that could suddenly change a lot of minds – a candidate scandal, for example.
Secondly, getting it right is complicated. It requires data scientists who can develop the sophisticated models and train them to a point where they can start digesting publicly available information on their own and produce meaningful predictions.
But – and this would not have been true as recently as four years ago – it is possible. Data analytics, machine learning and AI can now be employed to blend lots of different data sources and develop more accurate predictions than traditional polls alone. Similar approaches to understanding human sentiment across large groups, based on public data, are already used in the finance, fraud prevention, retail and investment industries, so we are not talking science fiction.
Humans are complex and any computer model will contain uncertainties. But a well-trained AI model may now be able to comfortably out-predict traditional experts. It will very likely play a part in political forecasting in the future; how large a part depends on the appetite of forecasters. AI is being used to combine diverse data sources and derive valuable insight in all sorts of industries. There’s no reason polling shouldn’t be one of them.