Bayes and Girls

You have long hair, never miss the Great British Bake-off and are particularly good at multitasking. You also work in Parliament. So are you more likely to be a man or a woman?

Well, some pretty lazy stereotypes aside, we may believe that the initial description is more likely to fit a woman than a man. The cliches certainly push us into thinking that way.

But we should also consider the under-representation of women at all levels in British politics, a fact which has been highlighted in the news.

So, ignoring the introductory spiel, if we were simply asked for the chance that someone picked at random from Parliament is a guy or a doll, how could we go about answering? Well, let’s think in terms of the proportional representation of these two groups. The chance of picking a fella is certainly going to be larger. We can then reasonably go and place our stack of chips with the chaps.

What happens when we also consider the multitasking, baking-lover characteristics? These are things that stereotypically we would associate with the fairer sex so, we may reason, this person is more likely to be female. The result is a conflict between these two guesses.

How many men have long hair, like baking shows and can pat their head and rub their stomachs at the same time? Actually, probably quite a few. Maybe not proportionally as many as women but if we then include what we already know about the relative population sizes in Parliament, there may well be, in absolute terms, more men who fit this description. Simply by the prior fact that there are so many blokes in Parliament, this may well weigh the answer in their favour.

This example is a different take on a common illustration of decision heuristics, which can lead to cognitive bias. These are the rules of thumb that you and I use to take mental shortcuts, focusing on one aspect of a problem rather than the whole thing. It involves presupposition and is an example of where our intuition can lead us astray. The description of the person’s characteristics seduces us into imagining a woman but the baseline gender ratio, which is sneaked in afterwards, is probably a better indicator.

What this little example can show is how we can go about combining two separate bits of information to get an overall, rational answer. In statistics this can be done formally using something called Bayes’ theorem.

In general terms, Bayes’ theorem takes what we already know (called the prior) and combines this with the extra information or data we observe (called the likelihood). As a result, we then have an answer (called the posterior) that is influenced by both of these things. How much it is influenced by each depends on the strength and conviction of the prior belief or the data respectively. If they agree, then the answer is more certain than it would have been if we had used only one of them in isolation, and if they conflict then the answer reflects this too.
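For anyone who likes to see it written down, here is the usual statement of the theorem. The symbols are my own shorthand rather than anything from the example above: H is the thing we want to know about (say, "this person is a woman") and D is the data we observe (the description).

```latex
\[
  \underbrace{P(H \mid D)}_{\text{posterior}}
  \;=\;
  \frac{\overbrace{P(D \mid H)}^{\text{likelihood}} \;\; \overbrace{P(H)}^{\text{prior}}}{P(D)},
  \qquad
  P(D) = P(D \mid H)\,P(H) + P(D \mid \text{not } H)\,P(\text{not } H).
\]
```

The denominator simply adds up all the ways the description could have come about, so that the posterior probabilities for "woman" and "man" sum to one.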

In the British politics example above, the prior could be what we know about the proportion of men and women and the extra data is the description of the employee. Like a tug-of-war, these two bits of information pull us in different directions. For example, we cannot simply go along with the description in isolation and bank on a broad. The conclusion could be that, before hearing the characteristics, we are fairly sure we would get a gent at random and, even after hearing the profile, we may still think this is the most likely outcome, although we’re less sure about it.

Knowing that the mystery person has long hair moves us towards thinking that it is more likely to be a woman, compared to before we knew anything about their hair-do, but it’s just not enough to overpower what we already know about the parliamentary gender bias. That said, even though the flowing locks may not actually change our mind, they would introduce more doubt. Bayes’ theorem could help quantify this doubt.
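To put some numbers on the tug-of-war, here is a minimal sketch in Python. All three inputs are made up purely for illustration and are not real statistics: I assume 30% of people in Parliament are women, that half of the women fit the description, and that a quarter of the men do. The function name `posterior_woman` is likewise just a label of my own.

```python
def posterior_woman(prior_woman, p_desc_given_woman, p_desc_given_man):
    """Posterior probability of 'woman' given the description (Bayes' theorem)."""
    prior_man = 1.0 - prior_woman
    # Total probability of seeing the description, whoever it belongs to.
    evidence = prior_woman * p_desc_given_woman + prior_man * p_desc_given_man
    return prior_woman * p_desc_given_woman / evidence


# Illustrative assumptions only, not real figures.
prior = 0.30               # assumed share of women in Parliament
p_desc_if_woman = 0.50     # assumed chance a woman fits the description
p_desc_if_man = 0.25       # assumed chance a man fits the description

post = posterior_woman(prior, p_desc_if_woman, p_desc_if_man)
print(f"Before the description: P(woman) = {prior:.2f}")
print(f"After the description:  P(woman) = {post:.2f}")
```

With these made-up inputs the chance of a woman rises from 0.30 to about 0.46. The description pulls us towards a woman, but a man is still the better bet, only with rather less confidence than before.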

Bayes’ theorem has applications far and wide, including spam filtering, internet search engines and voice recognition software. Originally, its statistical fundamentals were thought a little shaky and so have been extensively discussed and argued over, but it is fair to say a lot of progress has been made and the theorem has attained acceptance in most fields. That said, it has some way to go before it’s nearly as popular as the Great British Bake-off.

Post by: Nathan Green

Prediction

Have you ever wondered what you would do if you could see into the future? It’s right up there with invisibility, flight and super-strength as the super-power that people would love to possess.

I don’t mean the type of prediction you get from suspect clairvoyants who read palms or tea leaves. I mean prediction with certainty, every time, guaranteed. The obvious thing would be to predict the winning lottery numbers and watch with knowing satisfaction as each ball drops into place.

Or perhaps, if sport is your thing, you could predict the Grand National winner or the football scores – just like Marty McFly in the Back to the Future films.

After enjoying the rewards for personal gain, maybe you could use your new superpower for good, like Bill Murray in Groundhog Day, saving kids from dangerous falls and helping stranded old ladies.

Or maybe you could simply use your predictive powers to plan that camping trip to coincide with sunshine for a change. This might even be the most spectacular achievement given the Great British weather and the infamous difficulty in predicting it.

This is clearly the stuff of fantasy but scientists look into the future all the time using mathematical models. The predictions are best guesses of what might happen. Predictions are useful for all kinds of things from students’ predicted A-level grades to the state of the economy.

The problem with predictions is that the further into the future we go, the more uncertain we are of what might happen. If you want to buy a season ticket for your local football team, you can be fairly sure how much it’ll cost in the coming season but who knows how much it’ll cost in ten years’ time?

With the recent cabinet reshuffle bringing more women to the fore, I wondered not just about the positions held by men and women but about equal pay. Research done in 2011 by the Chartered Management Institute predicts that men and women will not be earning equal pay until 2109 if current trends continue. In the 12-month period leading up to the results, salary increases of 2.1% and 2.4% were observed for men and women respectively.
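The post does not say exactly how the 2109 figure was arrived at, so what follows is only the simplest reading of "if current trends continue": both salaries keep growing at their current rates, compounding every year. The symbols are my own: M and W are the current average salaries for men and women, and g_M and g_W are their yearly growth rates.

```latex
\[
  W\,(1 + g_W)^{n} = M\,(1 + g_M)^{n}
  \quad\Longrightarrow\quad
  n = \frac{\ln\!\left(M / W\right)}{\ln\!\left(\dfrac{1 + g_W}{1 + g_M}\right)} .
\]
```

With growth rates as close together as 2.1% and 2.4%, the denominator is tiny, so the number of years n swings wildly if either rate changes by even a fraction of a percentage point.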

How certain should we be about these predictions?

A comparison was made between the £10,546 gender pay gap in 2011 and the £10,031 gap from the same study the previous year. However, if we work backwards and reduce the current wages by the 2.1% and 2.4% growth rates to get an estimate of last year’s pay gap, this doesn’t match up. By this method, the estimated previous pay gap is about £10,421, giving a smaller absolute difference between the two years. If the growth rates cannot even reproduce the previous year’s figure, how can predictions using the same rates be taken seriously all the way out to 2109?
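Here is a rough sketch of that back-of-the-envelope check in Python. The two average salaries are assumptions of mine, since the figures are not given above; I have chosen them so that their difference is the quoted £10,546 gap. The growth rates are the 2.1% and 2.4% mentioned earlier, and "reducing" a wage by its growth rate is taken to mean dividing by one plus the rate.

```python
import math

# Assumed 2011 average salaries (not stated in the post); picked so that
# their difference equals the quoted gap of 10,546 pounds.
men_2011 = 42_441.0
women_2011 = 31_895.0
growth_men = 0.021      # 2.1% rise for men over the year
growth_women = 0.024    # 2.4% rise for women over the year

gap_2011 = men_2011 - women_2011

# Undo one year of growth to estimate the previous year's salaries and gap.
men_2010 = men_2011 / (1 + growth_men)
women_2010 = women_2011 / (1 + growth_women)
gap_2010_estimate = men_2010 - women_2010

# Years until the salaries meet if both rates stayed fixed, compounding yearly.
years_to_parity = math.log(men_2011 / women_2011) / math.log(
    (1 + growth_women) / (1 + growth_men)
)
parity_year = 2011 + math.ceil(years_to_parity)

print(f"2011 gap: £{gap_2011:,.0f}")
print(f"Back-calculated gap for the previous year: £{gap_2010_estimate:,.0f}")
print(f"Projected year of equal pay: {parity_year}")
```

Under these assumptions the back-calculated gap comes out at about £10,421 rather than the reported £10,031, while the same two growth rates carried forward give 2109 as the year of equal pay.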

To emphasise this point further, a similar report in 2008 about equal pay for women being ‘several generations away’ found that women’s pay increased by 6.8% over the year, compared with 6.6% for the men. The associated statistics then show that women will not receive equal pay until 2195.

This very simple comparison highlights one of the problems with predictions: they are just a guess. In the case of using the same salary growth rates to predict way into the future, it should be made clear that these figures are uncertain at best and, at worst, simply a fantasy, just like having superpowers.

Post by: Nathan Green

Dirty Tricks: World Cup Sweepstake – an analysis of 2014’s dirtiest teams

It’s ironic that the Brasil 2014 World Cup was one of the “cleanest” World Cups for some time. Ironic, because this was the World Cup where the Uruguayan striker Luis Suarez had another bout of teething trouble and chowed down on an Italian shoulder. This hunger aberration aside, the players generally behaved themselves or, at least when they did trespass, perhaps the officials were prepared to be lenient, if indeed they witnessed any wrong-doing at all.

This all brings me to an important consideration that I’m sure has divided many offices – The World Cup Sweepstake. A national tradition. An unmissable ingredient to raise the pulse. The best way to waste a pound.

The competition winner was fairly obvious. As was the runner-up. But then we get into some uncertain waters. The alternative prize at stake in my competition was the dirtiest team. So how to work this one out? Just give it to Uruguay? Or the team with the most red cards? Or the team with the most straight red cards perhaps?

Below is a plot of the yellow and red card statistics for each team, with the number of matches played in the competition in blue. When a player is given two yellow cards in the same game (resulting in a sending-off), this is shown separately to those yellow cards that do not contribute to a sending-off, green and red bars respectively. We can see that Brasil not only wore yellow but weren’t shy of picking up a few bookings here and there too. The Netherlands and Costa Rica also said hello to yellow more than the rest. The teams to the left up to Honduras all had a man sent for an early bath at some stage too. So, is Brasil the dirtiest team?

[Figure WC1: yellow and red cards for each team, with the number of matches played]

What we need is a combined yellow and red card statistic. The plot below shows some candidates. Let’s assume that a yellow card is worth 1 point and a straight red card is worth 3 points so, for example, a team with one yellow and one straight red would score 4 points.

The question then is how much should a two-yellow-card-red-card (TYCRC) be worth?

The dark blue bars show the values if a TYCRC is simply worth the sum of two normal yellows, i.e. 2 points. The red bars are if it’s slightly worse but still not as bad as a straight red (2.5 points) and the green bars are if a TYCRC is worth the same as a straight red card (3 points). Purple is slightly worse still (4 points) and finally the light blue bars are if the TYCRC is worth the individual yellows plus the red card value (5 points). We see that in general Brasil are still naughty step contenders. Only when a TYCRC is worth 5 do Costa Rica edge ahead.

[Figure WC2: combined card scores for each team under the five TYCRC values]
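The scoring schemes described above can be sketched in a few lines of Python. The card counts below are invented for three hypothetical teams, not taken from the tournament, and the names `dirt_score` and `dirtiest_by_weight` are mine. The yellows counted here are only those that did not lead to a sending-off, as in the first plot.

```python
YELLOW_POINTS = 1.0
STRAIGHT_RED_POINTS = 3.0
TYCRC_WEIGHTS = [2.0, 2.5, 3.0, 4.0, 5.0]  # the five candidate values

# Hypothetical counts: (yellows, two-yellow reds, straight reds).
cards = {
    "Team A": (14, 0, 0),
    "Team B": (5, 2, 0),
    "Team C": (6, 0, 1),
}


def dirt_score(yellows, tycrcs, straight_reds, tycrc_points):
    """Combined card score for one team under a given TYCRC value."""
    return (
        yellows * YELLOW_POINTS
        + tycrcs * tycrc_points
        + straight_reds * STRAIGHT_RED_POINTS
    )


dirtiest_by_weight = {}
for weight in TYCRC_WEIGHTS:
    scores = {team: dirt_score(*counts, weight) for team, counts in cards.items()}
    dirtiest_by_weight[weight] = max(scores, key=scores.get)
    print(f"TYCRC worth {weight}: {scores} -> dirtiest: {dirtiest_by_weight[weight]}")
```

With these invented counts Team A tops the table for every value except 5 points, where Team B's two second-yellow dismissals finally push it ahead. The choice of weight can change the winner, which is exactly why it is worth trying several.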

But this still isn’t the whole picture. Obviously, if a team played more games then they are more likely to have picked up a booking or two along the way, so we should account for this. When we divide the numbers in the plot above by the number of games that each team played we get the plot below.

Now we find that when the TYCRC is worth two single yellows or just a bit more, Uruguay are on top. For any larger value, Honduras come steaming up on the inside to take pole position.
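Dividing by games played is a small step but it can reshuffle the table completely. A minimal sketch, again with invented totals for three hypothetical teams rather than the real tournament figures:

```python
# Hypothetical totals: (combined card points, matches played).
totals = {
    "Team A": (14.0, 7),   # went all the way, so had more time to collect cards
    "Team B": (9.0, 5),
    "Team C": (8.0, 3),    # out after the group stage
}

# Ranking on raw points, ignoring how many games each team played.
raw_ranking = sorted(totals, key=lambda team: totals[team][0], reverse=True)

# Ranking on points per game.
per_game = {team: points / matches for team, (points, matches) in totals.items()}
per_game_ranking = sorted(per_game, key=per_game.get, reverse=True)

print("Raw points:     ", raw_ranking)
print("Points per game:", per_game_ranking)
```

On raw points Team A looks the worst offender, but per game it is Team C, who only played three matches, that comes out on top.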

[Figure WC3: combined card scores per game played for each team]

So, what does this all mean? Well, personally, I would say that Brasil actually aren’t all that bad after all and that Uruguay are in fact the dirtiest team of the Brasil 2014 World Cup, even without taking into account the on-going antics of Luis the Chewey.

Post by: Nathan Green