Tuesday, May 29, 2012

EDRM Processing - Metrics


One of the biggest challenges in dealing with electronic data is estimating the volume of a project when all that is known is the total number of gigabytes (GB) to process. Because the overall volume has a significant impact on the project as a whole, it is important to understand the factors that drive that estimate.



Means of Measuring


Pages

In many cases, the overall review time and cost for a project can be determined from the total number of pages that will be reviewed and eventually produced. The more you know about the collection, the better this estimate becomes. If you can break the total volume down into email data, application data, and non-printable data, you will get a more accurate estimate than one based on volume alone.
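
As a rough illustration of why the breakdown matters, the sketch below compares a per-type page estimate with a volume-only estimate. It uses the median figures from the Industry Benchmark Survey table in this post; the 10 GB / 5 GB collection and the function names are hypothetical, and treating one image as one page follows the table's footnote.

```python
# Median benchmark values taken from the Industry Benchmark Survey table.
MEDIAN_FILES_PER_GB = {"email": 22_572, "app": 15_791}
MEDIAN_IMAGES_PER_FILE = {"email": 4, "app": 10}
MEDIAN_IMAGES_PER_GB = 47_213  # volume-only benchmark


def estimate_pages_by_type(gb_by_type):
    """Estimate the page (image) count from GB split by data type."""
    total = 0
    for data_type, gb in gb_by_type.items():
        files = gb * MEDIAN_FILES_PER_GB[data_type]
        total += files * MEDIAN_IMAGES_PER_FILE[data_type]
    return round(total)


def estimate_pages_by_volume(total_gb):
    """Estimate the page (image) count from total GB alone."""
    return round(total_gb * MEDIAN_IMAGES_PER_GB)


# Hypothetical collection: 10 GB of email and 5 GB of application files.
collection = {"email": 10, "app": 5}
by_type = estimate_pages_by_type(collection)
by_volume = estimate_pages_by_volume(sum(collection.values()))
print(f"By type:   {by_type:,} pages")
print(f"By volume: {by_volume:,} pages")
```

The two figures differ widely for the same 15 GB, which is the point of the paragraph above: the mix of data types changes the estimate considerably.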

Number of Documents

Another important driver of the effort required for document review is the number of documents that will be reviewed, so estimating it is valuable. Although there are quick ways to count the documents in a collection, it is harder to quickly identify how many documents the culling process will remove.

Culling Rate

The amount of deduplication can vary greatly depending on the nature of the data (backups, live data, or a combination), the scope of the deduplication (within or across custodians), and the custodians' retention habits.
Searching/filtering is another factor to consider when estimating the overall volume that will be delivered for review. Depending on the number of terms and the nature of the documents, the results can vary greatly.

Non-Printable Files

Non-printable files are documents that, in general, will not be delivered or reviewed. It is therefore important to exclude them from the document, GB, and page estimates in order to yield more accurate results.
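
The culling factors described above can be combined into a single back-of-the-envelope calculation. The sketch below is only an illustration: it assumes each percentage is the share of documents removed at that step and that the steps are independent of one another, neither of which the survey states. The starting count of 200,000 documents and the function name are hypothetical; the rates are the median values from the Culling Rate Percentages rows of the benchmark table.

```python
def documents_after_culling(total_docs, removal_percentages):
    """Apply each removal percentage in turn and return the documents left.

    Assumes every percentage is the share removed at that step and that
    the steps are independent (an assumption, not part of the survey).
    """
    remaining = total_docs
    for percent_removed in removal_percentages:
        remaining = remaining * (100 - percent_removed) / 100
    return round(remaining)


# Median removal rates from the benchmark table, as whole percentages.
median_rates = {
    "non-printable files": 5,
    "deduplication": 21,
    "searching/filtering": 61,
}

remaining = documents_after_culling(200_000, median_rates.values())
print(f"Documents left for review: {remaining:,}")
```

With the median rates, roughly 29% of the hypothetical collection reaches review; substituting the high or low rates shows how far that share can move.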

Industry Benchmark Survey

The table below lists some industry averages that can be used as guidance when estimating a document collection:

                                      Value
Benchmark                             High      Median    Low

Images [1] per GB                     78,671    47,213    18,534
Images per file (email)               11        4         2
Images per file (app files)           63        10        3
Files per GB (email)                  36,530    22,572    9,934
Files per GB (app files)              20,305    15,791    7,553
GB per custodian (email)              5         2         1
GB per custodian (app files)          4         1         0

Culling Rate Percentages
Deduplication                         51%       21%       6%
Searching/Filtering                   64%       61%       23%
Non-printable files                   22%       5%        2%

Processing Speeds
Process time per GB native            117       33        11
Process time per GB image             35        32        23
Process time to first deliverable     53        35        21
Process time by file type             4         3         2
Process time by file type             6         4         3
Process time by file type             2         3         2

Quality
First pass quality yield % [2]        57%       78%       73%
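
One way to use these benchmarks is to bracket an estimate between the low and high columns instead of relying on a single number. The sketch below does this for email, starting from a custodian count. The 10-custodian matter and the function name are hypothetical, and pairing low with low and high with high is an assumption that gives a deliberately wide range.

```python
# Email benchmarks from the survey table, keyed by column.
GB_PER_CUSTODIAN_EMAIL = {"low": 1, "median": 2, "high": 5}
FILES_PER_GB_EMAIL = {"low": 9_934, "median": 22_572, "high": 36_530}


def email_file_range(custodians):
    """Return low/median/high email file-count estimates for a custodian count."""
    return {
        level: custodians * GB_PER_CUSTODIAN_EMAIL[level] * FILES_PER_GB_EMAIL[level]
        for level in ("low", "median", "high")
    }


estimate = email_file_range(10)  # hypothetical matter with 10 custodians
for level, files in estimate.items():
    print(f"{level:>6}: {files:,} email files")
```

The spread between the low and high results is large, so figures like these are best treated as planning guidance until actual collection data is available.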


Paper-to-Electronic Estimate Conversion Table

Boxes of Documents    Approximate Total Pages    Size (Megabytes, Gigabytes, Terabytes)
1                     2,500                      50 Megabytes
10                    25,000                     500 Megabytes
20                    50,000                     1 Gigabyte
100                   250,000                    5 Gigabytes
200                   500,000                    10 Gigabytes
300                   750,000                    15 Gigabytes
400                   1,000,000                  20 Gigabytes
500                   1,250,000                  25 Gigabytes
1,000                 2,500,000                  50 Gigabytes
2,000                 5,000,000                  100 Gigabytes
5,000                 12,500,000                 250 Gigabytes
10,000                25,000,000                 500 Gigabytes
20,000                50,000,000                 1 Terabyte
40,000                100,000,000                2 Terabytes
60,000                150,000,000                3 Terabytes
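
Every row of the conversion table follows the same ratios: about 2,500 pages and 50 megabytes per box, with 1,000 megabytes counted as a gigabyte and 1,000 gigabytes as a terabyte. A minimal sketch of that arithmetic, for box counts the table does not list, might look like this (the function name is hypothetical):

```python
PAGES_PER_BOX = 2_500  # ratio implied by every row of the table
MB_PER_BOX = 50        # ratio implied by every row of the table


def boxes_to_estimate(boxes):
    """Convert a box count to (pages, (size, unit)) using the table's ratios."""
    pages = boxes * PAGES_PER_BOX
    megabytes = boxes * MB_PER_BOX
    if megabytes >= 1_000_000:
        size = (megabytes / 1_000_000, "TB")
    elif megabytes >= 1_000:
        size = (megabytes / 1_000, "GB")
    else:
        size = (megabytes, "MB")
    return pages, size


for boxes in (10, 300, 40_000):
    pages, (amount, unit) = boxes_to_estimate(boxes)
    print(f"{boxes:,} boxes: about {pages:,} pages, {amount:g} {unit}")
```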


Footnotes

  1. Images are counted one per page, so a 4-page multi-page TIFF counts as 4 images.
  2. The percentage of data that runs through without intervention or exception handling.


EDRM Processing - Metrics. Retrieved May 29, 2012.
