Post-analysis of Scottish Independence Vote

Introduction

Over the past few weeks I have been doing some analysis of the odds around the Scottish Independence Referendum, so let’s now see how the predictions actually went.

The final average odds for a No were 1/5 (5-to-1 on), with 4/1 for a Yes vote. This seemed to go against the opinion polls at the time, so we analyse some of the trends around the bookies’ predictions, and see where they got things right, and where they misread the dynamics. We’ll see that they got most things right, but missed out a bit on the head v heart analysis. They did, though, predict the result more clearly than the opinion polls did, which perhaps highlights a stronger understanding of human nature, and that highly skilled punters, too, can spot the correct odds for the current market.

Largest moves towards Yes or No

The bookies predicted that Dundee was the favourite for the strongest Yes, and so it was. In second and third place were Clackmannanshire and Glasgow. Table 1 and Figure 1 outline the results, where it can be seen that the Top 3 for the largest Yes vote in percentage terms were actually Dundee, West Dunbartonshire and Glasgow, so the bookies got No 1 and No 3 correct. Clackmannanshire, which was the first to be announced and was 2nd favourite for the highest Yes vote, actually ended up in 12th place. In terms of change, Inverclyde, Renfrewshire, North Lanarkshire and South Ayrshire showed the greatest changes towards a Yes, while Clackmannanshire, Angus and Moray showed a stronger move towards No. The writing was on the wall straight after Clackmannanshire was announced, as it was a No majority for the 2nd favourite for a Yes.

If we assume a difference of anything between +3 and -3 places is roughly correct, the bookies managed to predict the following: Dundee City, Glasgow, East Ayrshire, Falkirk, South Lanarkshire, West Lothian, Midlothian, Argyll & Bute, Aberdeen City, Edinburgh, East Dunbartonshire, East Lothian, East Renfrewshire, Shetland Islands, Highland, Dumfries & Galloway and Scottish Borders. So the bookies placed more than half of the council areas (17 out of 32) roughly correctly.

They underestimated the strength of the Yes vote for West Dunbartonshire (-9), North Lanarkshire (-11), North Ayrshire (-12), Fife (-7) and South Ayrshire (-12), and underestimated the strength of the No vote for Comhairle nan Eilean Siar (+6), Stirling (+10), Perth & Kinross (+6) and Orkney Islands (+6). Orkney, which was predicted to be in 26th place, actually ended up with the strongest No vote.

Table 1: Actual result and predicted place from odds

Figure 1: Actual result and predicted place from odds (-ve shows stronger Yes than prediction, and +ve shows a stronger No than prediction).

So … who won Punters or Bookies? Winner: Bookies.

Head over heart?

One thing that sticks out is the vote in Aberdeenshire, which was No 9 in most likely to vote for a Yes, but ended up at No 24. What seems to have happened is that Aberdeenshire is a strong SNP area, but, perhaps, the head ruled the heart … where the relative affluence of the area overruled the strong independence focus. Angus and Moray too are strongly SNP, but have gone more with their head, as both areas are also fairly affluent. The bookies obviously predicted that the heart would rule the head in these areas, and that’s why they didn’t manage to predict them in the correct position.

For heart over head, Dundee, Glasgow, and East Ayrshire were predictable; it was other West Coast areas which showed the greatest heart over head movement, including South Ayrshire moving from one of the least likely (No 31) to somewhere nearer the middle of the table. Renfrewshire and Inverclyde, too, stormed through the odds with a massive heart over head turnaround.

East v West?

Scotland is changing as a nation, and the referendum vote perhaps highlights this in relation to the changes in the odds and the results. Table 2 shows a rough split between North/South, West and East and tallies up the scores for each. It can be seen that the West scores -77 in changes, where many of the regions in the West produced stronger Yes results than predicted, and the East scores a positive value of +69. The North/South also scores a positive value, with +8.

Table 2: East, West and North/South

There are generally different demographics between the east and west of Scotland, with a migration of population from west to east. Edinburgh, for example, now has a population of 487,500 (nearly 10% of the total population of Scotland), and is growing at a rate of 1% each year. It also has a fairly young population, with 23.8% aged 16 to 29 years (as opposed to an average of 18.3% over Scotland). Midlothian, West Lothian and East Lothian have also been growing at a fairly fast rate, possibly due to the effect of a strong economy in Edinburgh, as opposed to falling populations in the West of Scotland. The areas of Moray and Angus have also seen growth due to the success of Aberdeen as a major Oil & Gas hub.

The different demographic is also highlighted by life expectancy, with Glasgow having a male life expectancy of 71.9, as opposed to 77.2 in Edinburgh. This trend is not consistent, though, with areas such as East Dunbartonshire appearing 26 out of 32, and having an average male life expectancy of 79.4.

So who won in changing perception … East or West? East Coast.

Odds variation

The odds over the last few weeks did not reflect the opinion polls, especially in the last week or so, where No stayed around 1/4 and Yes started drifting out to 3/1, even as the polls showed the gap between Yes and No narrowing. Figure 2 shows the variation of the odds for a Yes vote. It can be seen that there was a good deal of variation over August and September, but the bookies ended up with a fairly consistent consensus.

As seen in Figure 3, the high point of the campaign (in terms of the Yes vote odds) was 7 Sept, where the average odds came in to 2.78, followed by a drift out to 4.39 by 13 Sept, and a slight drift in to 3.91, but the last two days showed a drift out. Generally the odds didn’t quite mirror the opinion polls, which tended to narrow the gap over the week before the vote:

18 Sept: 4.52 (Out)
17 Sept: 4.23 (Out)
16 Sept: 3.91 (In)
15 Sept: 4.03 (In)
14 Sept: 4.19 (In)
13 Sept: 4.39 (Out)
12 Sept: 4.06 (Out)
11 Sept: 3.49 (Out)
10 Sept: 2.78 (In)
09 Sept: 2.85 (Out)
08 Sept: 2.82 (Out)
07 Sept: 2.78 (In)

Figure 2: Variation of No odds

Figure 3: Yes odds for 30 days before poll

Key breakpoints

While the betting odds were fairly robust around opinion polls, there were three key break points that drove the odds:

  • Break point 1 (Darling wins). A key change took place around the 5 August debate, where the odds for a Yes vote had been dropping before the debate, but after it, the average odds for a Yes vote moved steeply up (dropping one point in the days before the debate, and then rising 1.2 over the four days after it):

9 Aug 5.75
8 Aug 5.43
7 Aug 5.05
6 Aug 5.05
5 Aug 4.54
4 Aug 4.39
3 Aug 5.5

  • Break point 2 (Salmond wins). On the day before the second debate there was a peak in the Yes vote odds of 6.3 (22 August 2014), and this has since fallen to nearly 4 (31 August 2014). The largest changes in the odds have thus occurred around the debate points, with 5 August 2014 having 31 changes (where generally the No vote odds have drifted out) and nine changes on 23 August 2014. Also, typically, there is an increasing rate of change of the odds as we move closer to the vote.
  • Break point 3 (Opinion polls come together, odds drift out). In the last three days before the vote, the opinion polls were showing a coming together, with one or two showing the Yes vote winning, but the odds started to drift out for a Yes, with some bookies (Betfair) actually paying out early, and others suspending betting. The last two data points on Figure 3 show this trend.

So … who won … Opinion Polls or Bookies? Winner: Bookies.

% Share of the Vote

The opinion polls were putting the difference between Yes and No within just a few points in the last few days. The bookies, though, were still predicting 45-50% for a Yes vote, and this is where it ended up. In the end the odds for 45-50% were 2.1 (which is almost an even-money bet). It can be seen from Figure 4 that the prediction leaned towards the lower end, as the odds for 40 to 45% are much lower than for 50-55%:

40% or less 6.8
40 to 45% 3.5
45 – 50% 2.1
50 – 55% 5.1

Figure 4: Percentage share for Yes vote
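
One way to read these odds is to turn each decimal price into an implied probability (1 divided by the odds). The short Python sketch below does this for the four bands listed above. It assumes these four bands make up the whole market, which may not be the case (no band above 55% is listed here), and the function name is purely illustrative. The amount by which the raw probabilities add up to more than 1 is the bookmaker’s margin.

```python
from typing import Dict, Tuple

# Decimal odds for the Yes share bands quoted above.
YES_SHARE_ODDS = {
    "40% or less": 6.8,
    "40 to 45%": 3.5,
    "45 to 50%": 2.1,
    "50 to 55%": 5.1,
}


def implied_probabilities(odds: Dict[str, float]) -> Tuple[Dict[str, float], float]:
    """Return normalised implied probabilities and the raw (un-normalised) total."""
    # Each decimal price implies a probability of 1 / price.
    raw = {band: 1.0 / price for band, price in odds.items()}
    raw_total = sum(raw.values())
    # Divide by the raw total so the listed bands sum to 1 (an assumption:
    # it treats the listed bands as the complete market).
    normalised = {band: p / raw_total for band, p in raw.items()}
    return normalised, raw_total


if __name__ == "__main__":
    probs, raw_total = implied_probabilities(YES_SHARE_ODDS)
    for band, p in probs.items():
        print(f"{band:<12} {p:.1%}")
    print(f"Raw total of implied probabilities: {raw_total:.3f}")
```

On these figures the 45 to 50% band comes out as the clear favourite, with 40 to 45% next, which matches the lean towards the lower end described above.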

So … who won … Opinion Polls or Bookies? Winner: Bookies.

Prediction on turn-out

The best bet of all was actually related to how well the Scottish electorate did in turning out. The actual turn-out was 83.9%, which few could have predicted, and the bookies got this right in terms of the favourite at 7/4, but it was an excellent bet, as the actual turn-out was nearly nine percentage points above the lower end of the band:

>75% 7/4
70% – 75% 12/5
60% – 65% 5/2
65% – 70% 14/5
55%- 60% 23/5
55% 23/5

So … who won … Opinion Polls or Bookies? Winner: Bookies.

Wayback

Do you want to see the odds the day before the election? Well we can look in the Wayback engine here.

Conclusions

What was surprising in the run-up to the vote was that the bookies were not reflecting the trends of the opinion polls. While the opinion polls were getting closer, the odds for a Yes were drifting out. One of them was going to be wrong, and it looks like the bookies were predicting the result more accurately, and not going with the trends in the opinion polls. This perhaps shows that bookies understand the dynamics of human nature better than targeted sampling does.

The split between east and west cannot be ignored in terms of the prediction of how they would vote, with the West showing strong movement towards Yes, and the East, generally, away from it. Perhaps the risks around the finance sector and Oil & Gas on the East Coast operated more for the head than the heart? In the west, there seems to have been a strong heart pull.

Which was the best bet? Ans: the Scottish People. The bookies gave odds of 7/4 for a turn-out over 75%, and the final turn-out ended up at 83.9%.

Further Analysis of Betting Odds for Scottish Independence Referendum

Introduction

With just one day to go to the vote, this analysis outlines the current odds, and may give some pointers to the outcome. The data used in this analysis looks back at the daily odds for a Yes vote from 23 bookmakers over the last five months (1 April – 17 September 2014).

Even on the day before the vote, there is a great deal of change, with the average decimal odds for a Yes vote at 4.3, having drifted out from 3.91 (16 September 2014). The movement isn’t quite consistent, as 14 bookmakers have odds which are lengthening for a Yes vote, and six have odds which are reducing. For the No vote, the odds sit at 1.22, having drifted in from 1.28 (16 September 2014). A typical bet is 1.22 (2/9) for No and 4 (3/1) for Yes. So at this point there is still a fairly wide gap in the odds.

Figure 1 outlines the odds for the past five months. It can be seen that the odds for a No vote are approximately where they were at the start of May 2014, with a large drift out at the start of September 2014, followed by a drift back in over the last few weeks.

Figure 1: Odds on a Yes vote (using decimal odds)

Outline of odds

In the independence poll, there are only two horses in the race, so there is either a Yes or a No bet. Odds are normally defined as a fraction which gives the return, so Evens is 1/1, where for every £1 bet you will get £1 back in addition to your stake (so you get £2). If the odds are 2/1 (2-to-1 against), you get £2 back plus your stake (so you will get £3 on a win). For 1/2 (or 2-to-1 on), you win half your stake, so you’ll get £1.50 back on a £1 bet. These types of odds are known as fractional odds, where the value defines the fraction for your payback. The fraction, though, does not show your stake coming back to you, so decimal odds are used to represent this: they define a value which is multiplied by the stake to give the total returned (basically just the fractional odds plus 1, represented as a decimal value).

The fractional odds value of Evens gives a decimal odds value of 2 (where you get £2 back for a £1 stake), and 2/1 (2-to-1 against) gives 3.0, while 1/2 (2-to-1 on) is 1.5. In terms of roulette, Evens would define the odds for a bet of Red against Black (as each is equally probable). In roulette, though, the odds are slightly biased against the player for a Red v Black bet, as the 0 changes the odds in favour of the casino. For betting, overall, bookmakers try to analyse the correct odds so that theirs are attractive against those of other bookmakers (if they want to take the bets). If they take too much of a risk, they will lose, so their odds around the independence vote should be fairly representative of the demand around bets, and the current sentiment around the debate.
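
The conversion between the two formats is simple enough to sketch in a few lines of Python. This is only an illustration of the "fraction plus 1" rule described above; the function names are my own, and it assumes odds are written either as "a/b" or as the word "Evens".

```python
from fractions import Fraction


def fractional_to_decimal(odds: str) -> float:
    """Convert fractional odds such as '2/1' into decimal odds (fraction + 1)."""
    text = odds.strip().lower()
    if text == "evens":
        return 2.0
    # Fraction accepts strings of the form 'numerator/denominator'.
    return float(Fraction(text) + 1)


def total_return(stake: float, odds: str) -> float:
    """Total paid back on a winning bet, including the original stake."""
    return stake * fractional_to_decimal(odds)


if __name__ == "__main__":
    for odds in ("Evens", "2/1", "1/2", "2/9", "3/1"):
        decimal = fractional_to_decimal(odds)
        print(f"{odds:>5} -> decimal {decimal:.2f}, "
              f"a stake of 1.00 returns {total_return(1, odds):.2f}")
```

Using exact fractions avoids small rounding slips on prices such as 2/9, which works out at about 1.22 in decimal odds.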

Percentage share of the vote and turnout

Figure 2 outlines the decimal odds for the percentage share of the Yes vote, and it perhaps reflects the closeness of the vote, with close to an Evens bet for a share between 45 and 50%, with 3.5 for 40% to 45% and 5.1 for 50% to 55%:

40% or less 6.8
40 to 45% 3.5
45 – 50% 2.1 (approx. Evens)
50 – 55% 5.1

In terms of the turnout, the favourites are over 85% and between 80 and 85%:

Over 85% 2.62
80 – 85% 3
75 – 80% 3.5
70 – 75% 5.5
65 – 70% 17

Figure 2: Percentage share for Yes vote (17 Sept 2014)

The Trend

The trend for the last 30 days is shown in Figure 3. The key turn-around in the odds for the Yes vote happened after 22 August, when the odds were as high as 6.27. They generally slipped over the weeks after that, hitting a plateau from 7 Sept until 10 Sept 2014 (down to 2.78), rose back up to 4.39 over the next four days, and have started to come back down over the past four days. The past day, though, has seen them rise again, perhaps reflecting the small lead in the polls for a No vote.

Figure 3: Yes odds for the past 30 days

Figure 4 shows the variation of the odds over the 30 days for the Yes vote odds, and highlights the drift out of the Yes vote around 8 Sept 2014.

Figure 4: Yes Vote odds for 23 bookmakers over August 2014

Geographical trends

In terms of betting on the place that is most likely to have the strongest Yes vote, Dundee has been out in front for many months, with current odds of it having the largest Yes vote at 1.5 (1/2), quite a long way in front of Clackmannanshire (5/1) and Glasgow (8/1). Table 1 outlines the split of the Top 10 most likely to vote Yes (in terms of betting odds), and the Top 10 least likely to vote Yes. Of the areas which are the strongest, it is difficult to generalise, but the Highland areas and the West of Scotland are strongest, whereas the South of Scotland, the East of Scotland, and the Northern Isles are the least likely (based on the odds).

Overall the major population areas on the east coast of Scotland, apart from Dundee, are generally in the bottom half of the geographical split, with Edinburgh (23rd out of 32) and Aberdeen (19th out of 32):

  • Dundee (1st out of 32).
  • Glasgow (3rd out of 32).
  • Stirling (12th of 32).
  • Perth (17th out of 32).
  • Aberdeen (19th out of 32).
  • Inverness (22nd out of 32).
  • Edinburgh (23rd out of 32).

Table 1: Geographical split on most likely to have the strongest (and weakest) Yes vote

Rank  Top 10 to vote Yes        Rank  Top 10 least likely to vote Yes
1     Dundee (most likely)      23    Edinburgh
2     Clackmannanshire          24    Renfrewshire
3     Glasgow                   25    East Lothian
4     Na h-Eileanan Siar        26    Orkney
5     Angus                     27    Shetland
6     Moray                     28    Scottish Borders
7     East Ayrshire             29    East Dunbartonshire
8     Falkirk                   30    East Renfrewshire
9     Aberdeenshire             31    South Ayrshire
10    Highland                  32    Dumfries and Galloway (least likely)

The key changes

A key change took place around the 5 August debate, where the odds for a Yes vote had been dropping before the debate, but after it, the average odds for a Yes vote moved steeply up (dropping one point in the days before the debate, and then rising 1.2 over the four days after it):

9 Aug 5.75
8 Aug 5.43
7 Aug 5.05
6 Aug 5.05
5 Aug 4.54
4 Aug 4.39
3 Aug 5.5
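
The "one point down, then 1.2 up" description can be checked against the listed averages with a small Python sketch. The helper name is hypothetical, and it simply takes the difference between two of the daily averages quoted above.

```python
# Average decimal odds for a Yes vote around the 5 August 2014 debate,
# copied from the list above.
YES_ODDS = [
    ("3 Aug", 5.5),
    ("4 Aug", 4.39),
    ("5 Aug", 4.54),
    ("6 Aug", 5.05),
    ("7 Aug", 5.05),
    ("8 Aug", 5.43),
    ("9 Aug", 5.75),
]


def change_between(series, start, end):
    """Difference in average odds between two dates (positive means drifting out)."""
    lookup = dict(series)
    return round(lookup[end] - lookup[start], 2)


if __name__ == "__main__":
    before = change_between(YES_ODDS, "3 Aug", "5 Aug")
    after = change_between(YES_ODDS, "5 Aug", "9 Aug")
    print(f"Change up to the debate: {before:+.2f}")
    print(f"Change in the four days after: {after:+.2f}")
```

A negative value means the Yes odds were coming in, and a positive value means they were drifting out.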

On the day before the second debate there was a peak in the Yes vote odds of 6.3 (22 August 2014), and this has since fallen to nearly 4 (31 August 2014). The largest changes in the odds have thus occurred around the debate points, with 5 August 2014 having 31 changes (where generally the No vote odds have drifted out) and nine changes on 23 August 2014. Also, typically, there is an increasing rate of change of the odds as we move closer to the vote.

Conclusions

There is a strange dynamic going on in the betting market, as the polls are saying it is close, but the bookmakers are not reflecting that. Overall there’s only one thing that really matters, and that is the vote tomorrow, and opinion polls and bookmaker odds can only make a best guess based on a generalised feeling of the nation. This vote may perhaps show a new dynamic, where it is difficult to generalise feelings and analysis purely by opinion polls or general sentiment. It has been a vote in which most of the printed media outlets have backed the No campaign, but, in these days of social media, they are not necessarily the main outlets for information.

Key observations:

  • Some bookmakers, especially the spread betting ones, have rapid changes in odds, while others are static (with one bookmaker making changes in the odds once a month).
  • The largest number of changes in a single day, over the past five months, was 31 and occurred on 5 August 2014, which was the date of the debate between Alex Salmond and Alistair Darling.
  • Dundee, Glasgow and Clackmannanshire come out way in front for the place most likely to vote Yes, with Dumfries and Galloway the least.
  • The odds are erring on the side of a Yes vote from 45% to 50%.

One disappointing factor is that Betfair has already started to pay out on a No vote, as they reckon it is 78% certain (which is nowhere near a certainty). In fact, that’s roughly the odds of not pulling out the Ace of Spades from four cards holding each of the aces: in one in four turns we will pull out the Ace of Spades. So I do maths and not politics, but I do understand that 78% is nowhere near certain.

Well only tomorrow will actually define what really has happened … the bookmakers and punters can only speculate.

Note: This is a non-political analysis, and is purely focused on analysing open-source data related to bookmaker odds. It is inspired by the usage of big data analysis, and how this identifies trends.

The Architects of the Future – Creating the Virtual Infrastructure for our Lives

Introduction

As we stand at the beginning of a new semester, our computer science students stand in a place to become the architects of the future. It is within the Internet and the Cloud that we are building a new World, and one which does not differentiate by class or nationality, as it is completely inclusive for everyone in the World.

Few technologies have ever managed to make such an impact on our lives, and it is our new Computer Science students who will build these systems. In time, too, the Internet will improve our health and social care, and will deliver education to every single person, also providing everyone with a voice and a platform to showcase their talents (I appreciate that it can go the other way, and that it can provide a barrier to these things too, but our new architectures have the chance to improve things, and not see national or physical barriers getting in their way).

As students start their career in computing science, they must think clearly about their future, and position themselves to pick up the skills that are required to re-architect the most amazing infrastructure that mankind has ever created: The Internet, and within the Internet we are building the most amazing building ever: The Cloud. As part of this, there are so many new rock-star careers being created to build it properly, including in the great growth areas of Cloud Computing, Big Data, Cyber Security, and e-Health (Figure 1).

Figure 1: Four great careers of this new generation: Cloud Computing, Big Data, Cyber Security and e-Health

Changing landscape

We are in a phase of saying goodbye to the desktop PC, which has served us well for the last 30-odd years. The future, though, is to continue its legacy by building clustered computing environments, using heterogeneous servers, and to move away from stand-alone systems to ones which build into a generalised computing infrastructure.

This new infrastructure will allow us to run our desktops within a cloud infrastructure, rather than on a physical computer. This has many advantages for companies, especially in updating desktops within their cloud, rather than on physical hosts.

So we are in a phase where we are re-architecting our information infrastructure, where we have moved through a phase of stand-alone computers (in the 1980s), onto hosts congregated around physical servers and gaining access to software components (DLLs), and now onto running our computing infrastructure in a centralised way, where we use terminals to access the resources (thin computing), and where applications are built by binding to Web services (Figure 2). This change is reflected in a new software architecture: SOA (Service-Oriented Architecture), where software applications are created by binding them to services which run in the Cloud.

Figure 2: Re-architecting

Hello Clustering

The demand for processing power and data storage is becoming a key driver, with the spend on servers increasing 6% over the past year (and over 4% in Q1 of 2014 alone). It is thought that 17% of this spend is for Big Data/Cloud applications. Overall the Intel x86 architecture has the lead, with nearly 80% of the market revenue. The growth in servers is also reflected in an increase of 19% for Microsoft Windows and 15% for Linux.

HP (35.7%), Dell (15.1%) and Oracle (7% – gained from their acquisition of Sun Microsystems) all recorded increases between 7 and 8%, while IBM (22.2%) saw their revenue drop by 15.5%, mainly due to them selling off their lower-end x86 server architecture technology to Lenovo. Cisco Systems, while only holding 4.8% of the market, have shown a 63.1% increase in server turnover.

Advanced Teaching Cloud

To create the architects of the future, we need ways to support them, where they can learn in a safe environment, and where they can learn the limits of what is possible. We thus need new virtualised environments where students can learn about all of the elements that create these new buildings, and understand how to design, build and look after them.

At Edinburgh Napier, we’ve been building our own training cloud, as we have found that running virtualised infrastructures has many advantages, and we can create real-life information environments (Figures 3 and 4). These environments can range from fully-defined systems, which connect to the Internet, to ones which are fully sandboxed, in which students can analyse how malware spreads and use advanced security tools.

Figure 3: Advantages of using the Cloud for training

Figure 4: DFET Virtual Training environment

Conclusions

What we are seeing at the current time is a re-architecting of information systems: from physical hosts congregated around physical servers, we are moving to the point where virtual hosts congregate around virtual servers, running in a Cloud infrastructure. This is a massive change, and it is the clustered servers which provide a resource for all the hosts and servers to share the same clustered environment, so that everything is controlled by software, and where hardware does not limit any of the virtual hosts.

For those graduating, studying or entering computing science, there have never been such great opportunities, and it is to you that we look to rebuild this amazing infrastructure, and create something that has benefits for every person in the World. At one time knowledge was locked away within privileged cities and countries, but not any more: the Internet has enabled knowledge for all, no matter their background, location or financial status.

So there was fire … the wheel … the transistor … and Cloud Computing!

Your Whole Life on a Postage Stamp

Introduction

SanDisk have just created an SD card with 512 GB of memory, and it is expected that 2 TB will be achieved from the format. As someone who has over 1TB in both my Microsoft OneDrive and Dropbox, but only 8GB on my university account, it seems that my corporate storage systems are not keeping up with the latest trend in the scale-up of cloud-based storage. It should be remembered that security in the Cloud is not really an issue, as it is possible to encrypt data before storing it in a cloud-based data bucket. So while data storage capacity has increased 1000-fold over the last 10 years, my corporate storage has increased by a factor of eight.

Intel created the first DRAM (dynamic random-access memory) chip in 1970. It was named the 1103 and could hold 1 kilobit (1,024 bits) of data. DRAM chips are made up of small capacitors which are charged up (for a binary 1), or discharged (for a binary 0). As they rely on the charging and discharging of capacitors, they tend to be slower than the static version, SRAM (static random-access memory), which toggles the state of a pair of transistors. As SRAM needs more space than DRAM, DRAM has often been used to create larger memory chips than the equivalent SRAM ones. Both SRAM and DRAM lose their contents when the power is taken away (volatile memory), so storage systems use non-volatile memory to preserve the data when the power is taken away.

Life on a postage stamp

The IBM PC, released in 1981, only had around 1MB of memory and a 30MB hard disk. Now we are looking at 2TB on a card the size of a postage stamp (where the area is mainly taken up with the physical layout of the card and connector). One of the key growth areas of computing is likely to be in-memory computing, where it is possible to store all the data you need locally in memory, and have no real need for connections to the Internet. With 2TB of data storage, you could probably hold all the data you are ever going to need.

So let’s look at Bob’s footprint over his 80 years on the planet:

  • Email. Bob sends 300 emails a day, and receives 400, each with 1,000 characters (1KB), so that’s 700KB each day, and 255MB a year. Over 80 years this generates 20GB of emails. Footprint: 1%.
  • Photos. Bob takes five photos every day, totalling 1.5MB. This is 547MB each year, and over 80 years it creates 43GB. Footprint: 2.15%.
  • Documents. Bob creates 10 documents each day, with an average of 1MB per document, which creates 3.6GB over a year, and 292GB over 80 years. Footprint: 14.6%.
  • Wikipedia. Bob loves Wikipedia, which takes around 10GB of space. Footprint: 0.5%.
  • Social media. Bob wants to save every post that he has made to Facebook, Twitter, and other social media sites. This is an average of 30MB of data each day, which gives 10.9GB of data each year, and 876GB of data over 80 years. Footprint: 43.8%.
  • Video. Bob takes five videos each week, totalling 100MB. That’s 5.2GB each year, and 416GB over 80 years. Footprint: 20.8%.

So that is around 83% of the SD card used and we have stored the whole of Bob’s life … every email, photo, media post … in fact everything on the size of a postage stamp.
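
The tally can be recomputed with a few lines of Python. This is a rough sketch with stated assumptions: decimal units (1GB = 1,000MB, 1TB = 1,000GB), a 2TB card, 365 days and 52 weeks in a year, and the photo and video figures treated as 1.5MB per day and 100MB per week in total, which is what the yearly figures in the list correspond to. The names used are purely illustrative.

```python
CARD_GB = 2000          # 2TB card in decimal units
YEARS = 80
DAYS_PER_YEAR = 365
WEEKS_PER_YEAR = 52

# Yearly volume in MB for each recurring source, based on the list above.
YEARLY_MB = {
    "email": 700 * 0.001 * DAYS_PER_YEAR,   # 700 emails a day at about 1KB each
    "photos": 1.5 * DAYS_PER_YEAR,          # assumed 1.5MB of photos a day in total
    "documents": 10 * 1.0 * DAYS_PER_YEAR,  # 10 documents a day at 1MB each
    "social media": 30 * DAYS_PER_YEAR,     # 30MB of posts a day
    "video": 100 * WEEKS_PER_YEAR,          # assumed 100MB of video a week in total
}

# One-off items, in GB.
ONE_OFF_GB = {"wikipedia": 10}


def lifetime_gb():
    """Lifetime storage per source in GB over YEARS years."""
    totals = {name: mb * YEARS / 1000 for name, mb in YEARLY_MB.items()}
    totals.update(ONE_OFF_GB)
    return totals


def card_share(totals):
    """Fraction of the card used by all sources together."""
    return sum(totals.values()) / CARD_GB


if __name__ == "__main__":
    totals = lifetime_gb()
    for name, gb in totals.items():
        print(f"{name:<13} {gb:8.1f} GB  {gb / CARD_GB:6.2%}")
    print(f"Total share of card: {card_share(totals):.1%}")
```

Under these assumptions the total comes to a little over 80% of the card, so the whole of Bob’s footprint still fits with room to spare.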

Why don’t corporate systems keep up with the Cloud?

Corporations are still struggling with the Cloud, and knowing how they should use public and private cloud, and how to create a single entity which keeps some things locally, but can burst into public cloud spaces (Figure 1). The problems with the scalability have continually been around performance, resilience, and security.

  • Security. Security can be solved by creating an encryption layer for all the data which leaves the corporate infrastructure and is then stored in a public cloud.
  • Resilience. With resilience, the main cloud providers such as Amazon AWS and Microsoft Azure have shown near 100% up-time over the past few years, with very few problems with outages. The last major outage for Amazon Web Services (AWS) happened in August 2013, lasted nearly an hour, and was due to issues in their North Virginia datacenter. It mainly affected Amazon.com, but it also affected many companies who had built their own business in the Cloud, such as Vine and Instagram. It is estimated that Amazon lost as much as $1,100 in net sales per second (to put this into context, a five-minute outage in August 2013 cost Google $545,000).
  • Performance. Performance has always been an issue, especially where network connections are slow, or become busy over certain time periods. This issue is reducing as high-speed network connections provide fast response times, especially where the content is placed at the edge of the public cloud.

Figure 1: Public, private and hybrid clouds

Conclusions

Corporations are playing catch-up with the Cloud, and many are in the process of understanding how they can create a single Cloud infrastructure, which integrates their private cloud infrastructure with a public one. In order to do this effectively they need to understand issues around security, performance and resilience. When this is done, then perhaps I can get more than 8GB for my storage.

We now have the opportunity to store all the data that we need in-memory, and have no need to store it in remote databases. Increasingly, applications may store all the data they need in-memory, and avoid fetching it over relatively slow network connections.

Is your Digital Shadow Exempt from the 4th and 5th Amendment Rights?

Introduction

George Orwell wrote in 1984: “Big Brother is Watching You”, and while there is little evidence of large-scale government surveillance within the UK, there are, though, increased opportunities for governments around the world to snoop and gather evidence on citizens. Luckily we have acts such as the DPA (Data Protection Act) which protect us from those who aim to gain access to data held within protected infrastructures, but the ease of access to this data increases the opportunities for spying. The flip side of the DPA is RIPA (Regulation of Investigatory Powers Act), which is a law which allows law enforcement agencies access to data on citizens for their Internet records. In the US the PRISM program provides an easier mechanism to access cloud-based records, especially as it has now gained access to the nine major Internet companies, including Microsoft, Google, Facebook and Apple.

With the growth of the Cloud, social media and mobile devices, we are all leaving digital shadows of our activities, whether it be Twitter posts, Facebook activity, mobile phone records, and so on, and these shadows are difficult to erase. Figure 1 outlines some of the traces that we leave, and many of these data sources are open source.

Figure 1: Big Data traces

Law Enforcement Requests

Last week it was revealed that, in 2007, Yahoo refused a demand from the NSA for bulk email metadata, but subsequently lost its fight both in the Foreign Intelligence Surveillance Court (FISC) and in an appeal to the Foreign Intelligence Surveillance Court of Review. Yahoo then, in 2008, finally ended its resistance to the NSA’s PRISM program when it faced a $250,000 a day fine if it didn’t comply. In fact it was one of the first of nine major Internet companies which were forced to comply with these requests, and which include Microsoft, Google, Facebook, YouTube, Skype, AOL and Apple.

Yahoo have since fought to unseal the case documents in order to provide some transparency around the data collection programs, and around the fact that the FISA Court has approved nearly every data request [Link]. Several companies, such as Yahoo, have been criticised for allowing data to be released, but the newly released records show that Yahoo doggedly tried to fight against it.

Yahoo fought back on the grounds of the Fourth Amendment, which prohibits unreasonable searches and seizures, and requires that any search is sanctioned by a judicial warrant and supported by probable cause. Yahoo felt that the PRISM requests were too broad in their scope, and thus violated the Constitution.

Working across borders

One of the greatest challenges for investigators is working across national boundaries, and UK law enforcement agencies often struggle to gain access to digital information which is held within US-based Cloud infrastructures. Government departments in the US seem, though, to have a much stronger ability to obtain information from Microsoft that is held within its Dublin-based Cloud infrastructure.

In the UK, RIPA defines how law enforcement agencies can gain access to digital information on citizens, with the support of a warrant, whereas, in the USA, the PATRIOT Act gives law enforcement agencies a much wider scope to obtain information on individuals, if it is relevant to counter-terrorism or counter-intelligence investigations. At the most extreme end, Section 215 of the USA PATRIOT Act allows the FBI to seek an order from the Foreign Intelligence Surveillance Court for information related to international terrorism or espionage. This allows the US authorities to access personal data stored within the EU by US-based companies, which completely disregards UK and EU legislation on data protection. Alongside these are National Security Letters (NSLs), which are requests from the FBI to organisations, and which should not relate to ordinary criminal, civil or administrative matters.

Google published in its transparency report that, in 2013, it received 53,356 requests for data affecting 85,148 accounts [2]. Table 1 outlines the requests within the USA, and the percentages of these that were accepted, along with the number of accounts affected. It can be seen that the acceptance rate of the requests ranges from 75% to 100%. For 2013, Microsoft received a total of 35,083 requests related to 58,676 user accounts, with a rejection rate of 3.4% [1] (although 17.85% of the requests resulted in no data being found). It can thus be seen that the majority of requests are accepted, and go forward to a disclosure to the requester.

Table 1: Google requests for personal information (2013, USA)

Type                     Requests   Requests accepted   User accounts affected
Other Court Orders            689                 75%                    1,588
Search Warrant              2,537                 81%                    4,180
Emergency Disclosures         153                 78%                      217
Subpoena                    7,044                 84%                   11,999
Pen Register Order            140                 90%                      259
Wiretap Order                  11                100%                       11
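
The per-type percentages in Table 1 can be combined into a single request-weighted acceptance rate. The short Python sketch below does this using only the figures from the table; the function and field names are my own, and since the published percentages are rounded, the number of accepted requests it works out is only an estimate.

```python
# Figures copied from Table 1 (Google, 2013, USA):
# (request type, requests, fraction accepted, user accounts affected)
TABLE_1 = [
    ("Other Court Orders", 689, 0.75, 1588),
    ("Search Warrant", 2537, 0.81, 4180),
    ("Emergency Disclosures", 153, 0.78, 217),
    ("Subpoena", 7044, 0.84, 11999),
    ("Pen Register Order", 140, 0.90, 259),
    ("Wiretap Order", 11, 1.00, 11),
]


def summarise(rows):
    """Return totals and the request-weighted acceptance rate."""
    total_requests = sum(requests for _, requests, _, _ in rows)
    total_accounts = sum(accounts for _, _, _, accounts in rows)
    # Estimated, because the published percentages are rounded.
    estimated_accepted = sum(requests * rate for _, requests, rate, _ in rows)
    return {
        "total_requests": total_requests,
        "total_accounts": total_accounts,
        "acceptance_rate": estimated_accepted / total_requests,
        "accounts_per_request": total_accounts / total_requests,
    }


summary = summarise(TABLE_1)
print(f"Requests: {summary['total_requests']}")
print(f"Accounts affected: {summary['total_accounts']}")
print(f"Weighted acceptance rate: {summary['acceptance_rate']:.1%}")
print(f"Accounts per request: {summary['accounts_per_request']:.2f}")
```

Subpoenas make up around two-thirds of the requests in the table, so the weighted rate sits close to the subpoena figure, at roughly 83%.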

Protecting data

Users should always encrypt sensitive data that they store in cloud infrastructures, but they must be able to hand over their encryption key when required. This was highlighted in 2014 when Christopher Wilson, from Tyne and Wear, was jailed after he refused to hand over the passwords for encrypted data in an investigation into an attack on the websites of Northumbria Police and the Serious Organised Crime Agency. He handed over 50 passwords, but none of these worked, so a judge ordered him to provide the correct one. After failing to do this, he received a jail sentence of six months.

In 2012, Syed Hussain and three other men were jailed for discussing an attack on a TA headquarters using a home-made bomb mounted on a remotely controlled toy car. Syed, who admitted having terrorist sympathies, was jailed for an additional four months for failing to hand over a password for a USB stick.

In the UK, citizens have the right to silence (in the US, the equivalent is the Fifth Amendment right against self-incrimination), but there is an exception to this for encryption keys, which is covered by Section 49 of RIPA. The failure to reveal encryption keys can often be seen as a sign that someone has something to hide.
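
To see why a forgotten password matters so much here, it helps to remember that the encryption key is often derived from a passphrase rather than stored anywhere. The Python sketch below is only an illustration of that idea using PBKDF2 from the standard library; the passphrases, salt size and iteration count are my own assumptions, and the cipher which would actually use the key (AES, for example, from a third-party library) is not shown.

```python
import hashlib
import hmac
import os


def derive_key(passphrase: str, salt: bytes, iterations: int = 200_000) -> bytes:
    """Derive a 32-byte key from a passphrase with PBKDF2-HMAC-SHA256."""
    return hashlib.pbkdf2_hmac(
        "sha256", passphrase.encode("utf-8"), salt, iterations, dklen=32
    )


# The salt is stored alongside the encrypted data; it is not a secret.
salt = os.urandom(16)

# Hypothetical passphrases, for illustration only.
key = derive_key("correct horse battery staple", salt)
same_key = derive_key("correct horse battery staple", salt)
wrong_key = derive_key("correct horse battery stapel", salt)

# The same passphrase and salt always give back the same key ...
print(hmac.compare_digest(key, same_key))
# ... but a passphrase that is wrong by a single character gives a different one.
print(hmac.compare_digest(key, wrong_key))
```

In a scheme like this, handing over the key really means remembering the passphrase, as nothing else can recreate the key.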

Who protects you?

As the Big Nine Internet companies finally gave in to the PRISM program, there is a general worry about the scope of the PATRIOT Act in the US. Thus the EFF (Electronic Frontier Foundation) have awarded gold stars to organisations for the following [3]:

Star 1: Requires a warrant for content.
Star 2: Tells users about government data requests.
Star 3: Publishes transparency reports.
Star 4: Publishes law enforcement guidelines.
Star 5: Fights for user privacy rights in courts.
Star 6: Fights for user privacy rights in Congress.

The six-star companies include Dropbox, Google, Microsoft, Twitter, and Yahoo, while Amazon gains two stars (stars 1 and 5), as does AT&T (stars 3 and 4).

Getting rid of your digital shadow

Our digital footprint on the Internet is being traced on a continual basis. Some of this is information that we readily provide to the Internet, but there is also a whole lot of information that is logged without us having control of it. Within the Google Cloud, there is information on our locations, our Web history and the Apps we install, all of which can be used to build up a picture of our activities.

Figure 2 shows an example of where I travelled from Edinburgh to Aberdeen for a Xmas Schools Cyber lecture. It shows the complete journey, archived from the location tracking on an Android phone. You can see it also contains a timeline, so that the speed over various parts of the journey can be calculated. Both Apple and Android phones, by default, gather this information and store it within cloud-based systems, giving a mine of information for investigators.
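
As a rough illustration of how speed can be calculated from such a timeline, the Python sketch below takes a list of timestamped positions and works out the average speed over each leg using the haversine (great-circle) distance. The timeline is made up for the example (approximate city-centre coordinates and invented times, not my actual journey), and as the distance is in a straight line, the speeds will be lower than the real road speeds.

```python
from datetime import datetime
from math import asin, cos, radians, sin, sqrt

EARTH_RADIUS_KM = 6371.0


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km between two latitude/longitude points."""
    phi1, phi2 = radians(lat1), radians(lat2)
    dphi = radians(lat2 - lat1)
    dlambda = radians(lon2 - lon1)
    a = sin(dphi / 2) ** 2 + cos(phi1) * cos(phi2) * sin(dlambda / 2) ** 2
    return 2 * EARTH_RADIUS_KM * asin(sqrt(a))


def leg_speeds_kmh(points):
    """Average speed (km/h) between each pair of consecutive points."""
    speeds = []
    for (t1, lat1, lon1), (t2, lat2, lon2) in zip(points, points[1:]):
        hours = (t2 - t1).total_seconds() / 3600
        if hours <= 0:
            continue  # skip duplicate or out-of-order timestamps
        speeds.append(haversine_km(lat1, lon1, lat2, lon2) / hours)
    return speeds


# Hypothetical timeline: (timestamp, latitude, longitude)
timeline = [
    (datetime(2013, 12, 10, 8, 0), 55.9533, -3.1883),    # Edinburgh
    (datetime(2013, 12, 10, 9, 15), 56.4620, -2.9707),   # Dundee
    (datetime(2013, 12, 10, 10, 30), 57.1497, -2.0943),  # Aberdeen
]

for speed in leg_speeds_kmh(timeline):
    print(f"{speed:.0f} km/h")
```

With location points logged every few minutes, rather than the three used here, the same calculation gives a fairly detailed picture of where someone stopped and how fast they travelled.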

With the increasing usage of cloud infrastructures, it is extremely difficult to actually remove the full traces of a digital footprint, especially with back-up systems storing deleted files, and with disk systems still containing fragments of files even though the files have been deleted. In fact, a disk can contain a fragment of a file for years after the file was deleted.

Figure 2: Example location information in Google Cloud

Conclusions

We are increasingly creating a long digital shadow in the Cloud, and this information is typically stored within the cloud infrastructures created by US-based companies, such as Google, Microsoft and Apple. It has been seen that these companies often must comply with the PATRIOT Act, even when it overrules European data protection laws. Law enforcement agencies in the UK often struggle to gain access to US-based information, and thus agencies within the US have an advantage over their UK-based equivalents.

In terms of spying on citizens, there are increasing opportunities for this to happen, but our citizens should be protected by data protection acts. The major change happens around criminal investigations, where RIPA allows law enforcement to gather cloud-based data, typically held by ISPs (as these tend to be UK-based). Law enforcement agencies have a much greater challenge in gaining access to information held by US-based companies.

In terms of protecting information in Cloud-based systems, encryption is the best solution, but there is no guarantee that a law enforcement agency will not come along and demand the keys to decrypt the information. For many, with so many passwords, this might be difficult to comply with, and innocent citizens could be imprisoned for just forgetting their password, or for not knowing where they have kept their secret keys.

Yahoo, having been criticised for being one of the first Internet companies to comply with requests related to PRISM, have now been shown to have actually fought against the request. They have also fought to release the documents around the fine, in order that there is some transparency around it. For the major Internet companies, such as Microsoft, Google, Facebook and Apple, there is a strong focus on user trust, and they are keen to build this trust by showing users that they will fight on their behalf against PRISM requests.

References

[1] http://www.microsoft.com/about/corporatecitizenship/en-us/reporting/transparency/
[2] http://www.google.com/transparencyreport/userdatarequests/
[3] https://www.eff.org/who-has-your-back-2014

Gardai Email Data Leakage – When Blind Carbon Copy Saves the Day

Introduction

Few users on corporate systems can say that they have never regretted sending an email to a distribution list which said something that was meant for just one person. Luckily many email systems will default to replying only to the originator of the email, but mistakes can still happen, and many users receive emails which reveal a little too much information on the distribution list. Personally I have seen many emails which were sent with a To: list that included many email addresses which should have been kept private.

Forgetting to Blind Copy

Yesterday, the Gardai had to apologise for a data breach which released over 1,700 email addresses, and which was blamed on an ‘administrative error’. A particular problem occurs when an email is to be sent to a contact list, and the addresses are supposed to go on the blind carbon copy (BCC:) list, but by mistake end up on the carbon copy (CC:) list. Most users understand the difference, and know that a BCC: version does not release the full distribution list. Care must still be taken, though. If the person in the To: field does not know that the email also went to a BCC list, and one of the users on that list then replies to them, it can cause some embarrassment, and raise questions about the reason for the BCC distribution (a person on the BCC list is meant to know that it’s a secret distribution). Often the method used is to make the sender of the email both the sender and the receiver, with all the others on the BCC: list.

The data breach happened within Dublin North Central Gardaí when they sent out their community policing information bulletin to their distribution list, but did not hide the recipients’ addresses from each other. This is seen as a breach of Ireland’s data protection laws related to the leak of personal data. While they tried to recall it, the mechanisms for recall often do not work once the email has left the local system, as the user is often prompted as to whether they want to delete it or not. Like it or not, most users actually view the email, even though the sender has tried to recall it.

Also, the recall can actually make the breach worse, especially for recipients on non-Microsoft Exchange servers, as the To: field of the recall will also include the email addresses of the users who were sent the email in the first place, doubling the impact. Luckily there wasn’t any sensitive information in the newsletter, but the email distribution list included the email addresses of others in the community.

Conclusion

This is an unfortunate mistake, and apart from user training, there are many solutions to this. The best solution is to avoid BCC altogether, and use an emailing system which takes a list, and then sends the email to each address on the list, and then checks the outgoing emails so that there are no other possibilities for data leakage.
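
A minimal sketch of this approach, using Python’s standard email package, is shown below. It builds one message per address, so that each recipient only ever sees their own address, and then checks each outgoing message before it would be sent. The function names and the example.org addresses are assumptions for illustration, and the actual sending (with smtplib, for example) is left out.

```python
from email.message import EmailMessage
from email.utils import getaddresses


def build_individual_messages(sender, recipients, subject, body):
    """Create one message per recipient, each with a single To: address."""
    messages = []
    for address in recipients:
        msg = EmailMessage()
        msg["From"] = sender
        msg["To"] = address
        msg["Subject"] = subject
        msg.set_content(body)
        messages.append(msg)
    return messages


def is_safe_to_send(msg, intended):
    """Check that the only address visible in the headers is the intended one."""
    headers = (
        msg.get_all("To", []) + msg.get_all("Cc", []) + msg.get_all("Bcc", [])
    )
    return [addr for _, addr in getaddresses(headers)] == [intended]


# Hypothetical distribution list for a community bulletin.
mailing_list = ["alice@example.org", "bob@example.org", "carol@example.org"]
outgoing = build_individual_messages(
    "bulletin@example.org", mailing_list, "Community bulletin", "This month..."
)

for address, msg in zip(mailing_list, outgoing):
    print(address, is_safe_to_send(msg, address))
```

The cost is one message per recipient rather than a single message, but for a list of this size that is a small price for removing the CC:/BCC: mistake completely.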

Along with this, in sensitive environments, back-end email systems should check the number of users in the CC: or To: fields to make sure there are not too many set up. Mistakes can also be avoided by buffering emails, by taking the client off-line, and then getting others to check the email before it is sent.
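
The back-end check could be as simple as counting the visible addresses before a message is released. The Python sketch below holds back any message with more than a set number of addresses in its To: and CC: fields; the limit of 10 and the function names are assumptions for illustration, and would need to be tuned for each organisation.

```python
from email.message import EmailMessage
from email.utils import getaddresses

MAX_VISIBLE_RECIPIENTS = 10  # hypothetical policy threshold


def visible_recipients(msg):
    """Addresses which every recipient of the message will be able to see."""
    headers = msg.get_all("To", []) + msg.get_all("Cc", [])
    return [addr for _, addr in getaddresses(headers) if addr]


def should_hold_for_review(msg, limit=MAX_VISIBLE_RECIPIENTS):
    """True if the message exposes more addresses than the policy allows."""
    return len(visible_recipients(msg)) > limit


# A bulletin mistakenly addressed with the whole list in the To: field.
bulletin = EmailMessage()
bulletin["From"] = "bulletin@example.org"
bulletin["To"] = ", ".join(f"resident{i}@example.org" for i in range(25))
bulletin["Subject"] = "Community bulletin"
bulletin.set_content("This month...")

print(len(visible_recipients(bulletin)), should_hold_for_review(bulletin))
```

A message which fails the check does not need to be rejected; it can be held in a queue until someone else has reviewed it, which matches the buffering approach.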

The data breach was unfortunate, especially as the newsletter was a key mechanism in engaging with the local community. It would be hoped that there is not a knee-jerk reaction to this in the Gardai, and that the engagement continues, but with future breaches avoided through improved procedures.

Few users can say that they have never made a mistake in sending an email, so this must be acknowledged in this case, as mistakes happen. The key thing is that organisations need to put safeguards in place, in order to protect themselves, especially against large-scale data breaches.

As much as possible, users need to be careful when sending to large-scale distribution lists (such as across a whole organisation), as a single Reply All can result in a great deal of embarrassment, along with a great deal of annoyance.
