As disk systems grow in capacity, and the number of electronic devices increases, imaging disk storage becomes a growing problem, as it can take many hours to archive a single disk. There is thus a need for quick sampling of disk systems in order to determine whether there is any contraband material on a storage system. Along with this, there can be a problem when computer systems are switched off, as evidence may be lost in the process of powering off. A mobile phone, for example, could lose its evidence if it is switched off, especially as it might not be possible to switch it back on. The method of sampling a computer system for possible contraband is known as triage.

Simon Garfinkel thus proposed a method of sampling disk systems in order to determine whether they contain contraband content. For this, he sampled 512-byte sectors, computed the hash signature of each, and then matched these against the hash signatures of the 512-byte sectors of known contraband content. If a sampled hash signature matches a contraband one, there is a high probability that the rest of the contraband exists (or did exist) on the disk system.
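The sampling step can be sketched in Python as below. This is a minimal sketch under stated assumptions, not Garfinkel's own implementation: it assumes a raw disk image file, uses SHA-1 as the hash purely for illustration, and assumes contraband files start on a sector boundary so that their sectors line up with disk sectors. The helper names `sector_hashes` and `sample_image` are hypothetical.

```python
import hashlib
import os
import random

SECTOR_SIZE = 512  # bytes per sector, as in the sampling method described above


def sector_hashes(data: bytes) -> set[str]:
    """Hash each full 512-byte sector of a known contraband file.

    Assumption: SHA-1 is used for illustration; any strong hash would do.
    A trailing partial sector is ignored.
    """
    return {
        hashlib.sha1(data[i:i + SECTOR_SIZE]).hexdigest()
        for i in range(0, len(data) - SECTOR_SIZE + 1, SECTOR_SIZE)
    }


def sample_image(path: str, samples: int, known: set[str], seed=None) -> list[int]:
    """Hypothetical helper: read `samples` randomly chosen sectors from a raw
    disk image and return the indices of sectors whose hash is in `known`.

    Assumes contraband files are sector-aligned on the disk.
    """
    rng = random.Random(seed)
    total = os.path.getsize(path) // SECTOR_SIZE
    hits = []
    with open(path, "rb") as image:
        # Sample sector indices without replacement
        for index in rng.sample(range(total), min(samples, total)):
            image.seek(index * SECTOR_SIZE)
            digest = hashlib.sha1(image.read(SECTOR_SIZE)).hexdigest()
            if digest in known:
                hits.append(index)
    return sorted(hits)
```

In practice, blocks that are common to many files (such as runs of zeros) would need to be removed from `known`, otherwise they would produce false matches.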

### Probability Theory for Sampling the Disk

The key thing we need to understand is the probability of finding contraband material for a given number of samples and a given contraband file size. For this, let’s take a simple example. Let’s say there are 20 red balls and 5 black balls in a bag, where the black balls represent the contraband. Let’s see if we can determine the chances of finding a black ball as each ball is drawn, with drawn balls put aside rather than returned to the bag.

**First round**

The probability of not drawing a black ball will be:

P(B) = 20 / 25 = 0.8

We thus have a 20% chance of finding a black ball. If a red ball is drawn, we put it aside, leaving 19 red balls and 5 black balls.

**Second round**

The probability of not drawing a black ball in either of the first two draws is:

P(B) = 20/25 * 19/24 = 0.633

We thus have a 36.7% chance of finding a black ball in the first two goes. Again we put the red ball aside, leaving 18 red balls and 5 black balls.
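Each round multiplies in one more factor, so the pattern can be written in a general form. With R red balls, K black balls (symbols introduced here for illustration) and n draws without replacement:

$$
P(\text{no black ball in } n \text{ draws}) = \prod_{i=0}^{n-1} \frac{R-i}{R+K-i} = \frac{\binom{R}{n}}{\binom{R+K}{n}}
$$

For R = 20, K = 5 and n = 2 this gives 190/300 = 0.633, and the chance of finding at least one black ball is one minus this value.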

**Third round**

The probability of not drawing a black ball in any of the first three draws is:

P(B) = 20/25 * 19/24 * 18/23 = 0.496

We thus have a 50.4% chance of finding a black ball in the first three goes. Again we put the red ball aside, leaving 17 red balls and 5 black balls.

**Fourth round**

The probability of not drawing a black ball in any of the first four draws is:

P(B) = 20/25 * 19/24 * 18/23 * 17/22 = 0.383

We thus have a 61.7% chance of finding a black ball in the first four goes. Again we put the red ball aside, leaving 16 red balls and 5 black balls.

**Fifth round**

The probability of not drawing a black ball in any of the first five draws is:

P(B) = 20/25 * 19/24 * 18/23 * 17/22 * 16/21 = 0.292

We thus have a 70.8% chance of finding a black ball in the first five goes. Again we put the red ball aside, leaving 15 red balls and 5 black balls.

**Sixth round**

The probability of not drawing a black ball in any of the first six draws is:

P(B) = 20/25 * 19/24 * 18/23 * 17/22 * 16/21 * 15/20 = 0.219

We thus have a 78.1% chance of finding a black ball in the first six goes. Put the other way round, if no black ball has been drawn by this stage, we can be about 78% confident that the bag does not contain the five black balls. Again we put the red ball aside, leaving 14 red balls and 5 black balls.
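As a cross-check on the six-draw figure, a short simulation can repeat the experiment many times and count how often no black ball turns up. This is a sketch that assumes draws without replacement, as in the rounds above; the function name `estimate_miss_probability`, the trial count and the seed are illustrative choices.

```python
import random


def estimate_miss_probability(red: int, black: int, draws: int,
                              trials: int = 100_000, seed: int = 1) -> float:
    """Monte Carlo estimate of P(no black ball in `draws` draws without replacement)."""
    rng = random.Random(seed)  # fixed seed so the estimate is repeatable
    bag = ["red"] * red + ["black"] * black
    misses = 0
    for _ in range(trials):
        # random.sample draws without replacement, like setting each ball aside
        if "black" not in rng.sample(bag, draws):
            misses += 1
    return misses / trials


# Should land near the exact value of about 0.219
print(f"{estimate_miss_probability(20, 5, 6):.3f}")
```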

**Seventh round**

The probability of not drawing a black ball in any of the first seven draws is:

P(B) = 20/25 * 19/24 * 18/23 * 17/22 * 16/21 * 15/20 * 14/19 = 0.161

We thus have an 83.9% chance of finding a black ball in the first seven goes. If no black ball has been drawn by this stage, we can be about 84% confident that the bag does not contain the five black balls. Again we put the red ball aside, leaving 13 red balls and 5 black balls.

**Eighth round**

The probability of not drawing a black ball in any of the first eight draws is:

P(B) = 20/25 * 19/24 * 18/23 * 17/22 * 16/21 * 15/20 * 14/19 * 13/18 = 0.116

We thus have an 88.4% chance of finding a black ball in the first eight goes. If no black ball has been drawn by this stage, we can be about 88% confident that the bag does not contain the five black balls, which is close to, but just short of, 90%.
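The rounds above, and the number of draws needed to reach a chosen confidence, can be computed exactly rather than by hand. This is a minimal sketch using Python's `fractions.Fraction` to avoid rounding; the function names `miss_probabilities` and `draws_needed` are hypothetical, and the draws are assumed to be without replacement, as in the text.

```python
from fractions import Fraction


def miss_probabilities(red: int, black: int, rounds: int) -> list[Fraction]:
    """Exact P(no black ball in the first n draws) for n = 1..rounds,
    drawing without replacement (each red ball drawn is put aside)."""
    if rounds > red + black:
        raise ValueError("cannot draw more balls than the bag holds")
    probs = []
    p = Fraction(1)
    for i in range(rounds):
        # i red balls have already been set aside
        p *= Fraction(red - i, red + black - i)
        probs.append(p)
    return probs


def draws_needed(red: int, black: int, confidence: float) -> int:
    """Smallest number of draws giving P(at least one black ball) >= confidence."""
    if black == 0:
        raise ValueError("no black balls to find")
    for n, miss in enumerate(miss_probabilities(red, black, red), start=1):
        if 1 - miss >= confidence:
            return n
    # Once every red ball has been drawn, the next draw must be black.
    return red + 1


for n, p in enumerate(miss_probabilities(20, 5, 8), start=1):
    print(f"round {n}: P(B) = {float(p):.3f}, chance of finding = {1 - float(p):.1%}")
print(draws_needed(20, 5, 0.90))
```

For the bag above, the miss probabilities come out as 0.800, 0.633, 0.496, 0.383, 0.292, 0.219, 0.161 and 0.116, and `draws_needed(20, 5, 0.90)` returns 9, since eight draws reach only 88.4%.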

### Confidence levels

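Let’s say now that we are happy with a 90% confidence level. Eight draws fall just short of this, but a ninth draw brings the probability of missing every black ball down to 20/25 * 19/24 * … * 12/17 = 0.082, a 91.8% chance of finding one. So we only need to draw nine balls from a bag with 20 red and 5 black balls to be 90% confident of finding at least one black ball if they are there or, if none appears, to be 90% confident that they are not. In the same way we can sample a disk. The following provides a calculation for sampling a disk: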

http://www.asecuritysite.com/coding/tri

For example, for a 500GB disk, a contraband file size of 200MB and 20,000 samples, we end up with a 99.967% probability of finding the contraband, if it exists.
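The disk figure can be reproduced with the same without-replacement product, treating sectors as balls: N sectors on the disk, M of them belonging to the contraband file. This is a sketch, not the code behind the calculator linked above; it assumes 512-byte sectors, decimal units (1GB = 10^9 bytes, 1MB = 10^6 bytes) and a contraband file occupying whole, distinct sectors. The name `triage_hit_probability` is hypothetical.

```python
import math


def triage_hit_probability(disk_bytes: int, file_bytes: int, samples: int,
                           sector: int = 512) -> float:
    """P(at least one randomly sampled sector lies inside the contraband file).

    Sectors are sampled without replacement, so the miss probability is
    prod_{i=0}^{samples-1} (N - M - i) / (N - i), with N sectors on the disk
    and M contraband sectors. Assumes 512-byte sectors by default.
    """
    n_total = disk_bytes // sector
    m_bad = file_bytes // sector
    if samples > n_total - m_bad:
        return 1.0  # more samples than clean sectors: a hit is guaranteed
    # Each factor is 1 - M/(N - i); log1p/expm1 keep precision near 1
    log_miss = sum(math.log1p(-m_bad / (n_total - i)) for i in range(samples))
    return -math.expm1(log_miss)


# Decimal units assumed: 500GB disk, 200MB contraband file, 20,000 samples
p = triage_hit_probability(500 * 10**9, 200 * 10**6, 20_000)
print(f"{p:.3%}")
```

With these assumptions the result is about 0.99967, in line with the figure above; with binary units (GiB and MiB) the contraband fraction is slightly smaller and the result drops to about 99.96%. Setting the disk to 25 sectors with 5 contraband sectors and 8 samples gives back the 88.4% from the ball example.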