# Triage on Digital Systems

As disks grow in capacity and the number of electronic devices increases, imaging storage becomes a growing problem, as it can take many hours to archive a single disk. There is thus a need to sample disk systems quickly, in order to determine whether a storage system holds any contraband material. There can also be a problem when computer systems are switched off, as evidence may be lost in the process of powering off. A mobile phone, for example, could lose its evidence if it is switched off, especially as it might not be possible to switch it back on. The method of sampling a computer system for possible contraband is known as triage.

Simson Garfinkel thus proposed a method of sampling disk systems in order to determine whether they contain contraband content. For this, he sampled 512-byte sectors, computed the hash signature of each, and matched it against the hash signatures of 512-byte sectors from known contraband content. If a hash signature matches a contraband one, there is a high probability that the rest of the contraband exists (or did exist) on the disk system.
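The sampling step described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not Garfinkel's actual tooling: the function name `sample_sectors`, the use of SHA-256 as the hash, and the `known_hashes` set of hex digests are all hypothetical choices, and the target is assumed to be a raw image file that can be opened for reading.

```python
import hashlib
import random

SECTOR_SIZE = 512  # bytes per sampled sector, as in the text


def sample_sectors(image_path, known_hashes, num_samples, seed=None):
    """Hash randomly chosen sectors and return the byte offsets that match.

    known_hashes is assumed to be a set of SHA-256 hex digests of
    512-byte sectors taken from known contraband files.
    """
    rng = random.Random(seed)
    matches = []
    with open(image_path, "rb") as image:
        image.seek(0, 2)  # move to the end of the image to find its size
        total_sectors = image.tell() // SECTOR_SIZE
        # Sample without replacement so that no sector is read twice.
        count = min(num_samples, total_sectors)
        for index in rng.sample(range(total_sectors), count):
            image.seek(index * SECTOR_SIZE)
            digest = hashlib.sha256(image.read(SECTOR_SIZE)).hexdigest()
            if digest in known_hashes:
                matches.append(index * SECTOR_SIZE)
    return sorted(matches)
```

A call such as `sample_sectors("disk.img", known_hashes, 20000)` would return the offsets of any sampled sectors whose hash appears in the set; an empty list means none of the sampled sectors matched.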

### Probability Theory for Sampling the Disk

The key thing we need to understand is the probability of finding contraband material for a given number of samples and a given contraband file size. For this, let’s take a simple example. Let’s say there are 20 red balls and 5 black balls in a bag, where the black balls represent the contraband. Let’s see if we can determine the chances of finding a black ball as each ball is drawn.

First round
The probability of not drawing a black ball will be:

P(B) = 20 / 25 = 0.8

We thus have a 20% chance of finding a black ball. Now we put the red ball aside, and we now have 19 red balls and 5 black balls.

Second round
The probability of not drawing a black ball in this turn too will be:

P(B) = 20/25 * 19/24 = 0.633

We thus have a 37% chance of finding a black ball in the first two goes. Now we put the red ball aside, and we now have 18 red balls and 5 black balls.

Third round
The probability of not drawing a black ball in this turn too will be:

P(B) = 20/25 * 19/24 * 18/23 = 0.496

We thus have a 50.4% chance of finding a black ball in the first three goes. Now we put the red ball aside, and we now have 17 red balls and 5 black balls.

Fourth round
The probability of not drawing a black ball in this turn too will be:

P(B) = 20/25 * 19/24 * 18/23 * 17/22 = 0.383

We thus have a 61.7% chance of finding a black ball in the first four goes. Now we put the red ball aside, and we now have 16 red balls and 5 black balls.

Fifth round
The probability of not drawing a black ball in this turn too will be:

P(B) = 20/25 * 19/24 * 18/23 * 17/22 * 16/21 = 0.292

We thus have a 70.8% chance of finding a black ball in the first five goes. Now we put the red ball aside, and we now have 15 red balls and 5 black balls.

Sixth round
The probability of not drawing a black ball in this turn too will be:

P(B) = 20/25 * 19/24 * 18/23 * 17/22 * 16/21 * 15/20 = 0.219

We thus have a 78% chance of finding a black ball in the first six goes. By this stage, if no black balls have been drawn, we have nearly 80% confidence that there are no black balls in the bag. Now we put the red ball aside, and we now have 14 red balls and 5 black balls.

Seventh round
The probability of not drawing a black ball in this turn too will be:

P(B) = 20/25 * 19/24 * 18/23 * 17/22 * 16/21 * 15/20 * 14/19 = 0.161

We thus have an 84% chance of finding a black ball in the first seven goes. By this stage, if no black balls have been drawn, we have about 84% confidence that there are no black balls in the bag. Now we put the red ball aside, and we now have 13 red balls and 5 black balls.

Eighth round
The probability of not drawing a black ball in this turn too will be:

P(B) = 20/25 * 19/24 * 18/23 * 17/22 * 16/21 * 15/20 * 14/19 * 13/18 = 0.116

We thus have an 88% chance of finding a black ball in the first eight goes. By this stage, if no black balls have been drawn, we have close to 90% confidence that there are no black balls in the bag.
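The round-by-round products above can be reproduced with a short Python sketch. The function name `miss_probability` is just an illustrative choice; it multiplies the same fractions as the worked example, using exact rational arithmetic so that no rounding creeps in until the values are printed.

```python
from fractions import Fraction


def miss_probability(red, black, draws):
    """Probability that `draws` balls taken without replacement are all red."""
    if draws > red:
        return Fraction(0)  # more draws than red balls: a black ball is certain
    p = Fraction(1)
    for i in range(draws):
        # After i red balls are set aside, red - i of the remaining
        # red + black - i balls are red.
        p *= Fraction(red - i, red + black - i)
    return p


for draws in range(1, 9):
    miss = float(miss_probability(20, 5, draws))
    print(f"round {draws}: miss = {miss:.3f}, find = {1 - miss:.1%}")
```

Changing the arguments shows how quickly the chance of a miss falls as the proportion of black balls rises, or as more balls are drawn.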

### Confidence levels

Let’s say now that we are happy with a confidence of close to 90% that there are no black balls in the bag, and can end. In this way we only need to draw eight balls from a bag with 20 red and 5 black balls to be about 88% confident that we’ll find at least one black ball (a ninth draw takes this above 90%). In the same way we can sample a disk. The following provides a calculation for the sample of a disk:

http://www.asecuritysite.com/coding/tri

For example, for a 500GB disk, a contraband file size of 200MB, and 20,000 samples, we end up with a 99.967% chance of finding the contraband, if it exists.
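The same bag-of-balls reasoning can be applied to sectors on a disk. The sketch below is not the calculator at the link above; it is a minimal illustration that assumes 512-byte sectors, decimal units (1 GB = 10^9 bytes, 1 MB = 10^6 bytes), sampling without replacement, and contraband that is fully present on the disk. The name `detection_probability` is hypothetical.

```python
import math

SECTOR_SIZE = 512  # assumed sector size in bytes


def detection_probability(disk_bytes, contraband_bytes, samples):
    """Chance that at least one sampled sector belongs to the contraband."""
    total = disk_bytes // SECTOR_SIZE       # sectors on the disk
    bad = contraband_bytes // SECTOR_SIZE   # sectors holding contraband
    if samples > total - bad:
        return 1.0  # more samples than clean sectors: a hit is certain
    # Sum the logs of (total - bad - i) / (total - i), the per-draw miss
    # chances, to avoid underflow when multiplying many small factors.
    log_miss = sum(math.log1p(-bad / (total - i)) for i in range(samples))
    return 1.0 - math.exp(log_miss)


p = detection_probability(500 * 10**9, 200 * 10**6, 20_000)
print(f"{p:.3%}")
```

Under these assumptions the result should land close to the 99.967% quoted above. Because the number of samples is tiny compared with the number of sectors, the simpler with-replacement estimate `1 - (1 - 0.0004) ** 20000` gives almost the same value.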