Ever since yesterday morning, when I read that Microsoft reported that breaches involving data loss are down, I’ve been puzzled, because all of the data I’ve seen this year suggest that the number of breach reports is up. I think I’ve figured out how they arrived at what I see as an erroneous statement.

Here’s the relevant section of their report to get you oriented:

The information in this section was generated from worldwide data security breach reports from news media outlets and other information sources that volunteers have recorded in the Data Loss Database (DataLossDB) at http://datalossdb.org. (For more information about the DataLossDB and breach types, see Security Breach Trends in the Reference Guide section.)

Figure 12. Security breach incidents by incident type, 1H08-1H10

As in recent periods, the first six months of 2010 saw a decline in the total number of incidents reported. This downward trend may be related to the overall decline in worldwide economic activity over the same time period.


But comparing the first half of 2010 to any previous half-year is only valid if: (1) the reports for each half-year were compiled during that half-year, (2) the statistics were based on the same number of sources for each half-year, and (3) DataLossDB.org has kept up with 2010 reports. None of those conditions appears to have been met.

The DataLossDB.org database is backfilled on an ongoing basis as the volunteers discover reports that they had either missed or that had not been revealed at the time. As a result, totals for past half-years continue to climb. All things being equal (which they’re not), past years have been documented more completely than recent years, so comparing current figures with past statistics yields a spurious decrease for the current year.

To demonstrate the problem: the database currently shows 762 incidents for 2008, but in April 2009, the 2008 total stood at 562. Had we checked even earlier, at the end of 2008, we would have seen a much lower total still. Similarly, the database shows a total of 609 incidents for 2009, but the yearly total was nowhere near that when 2009 ended and was probably more on the order of 275 or so. Any attempt to compare the current year to past years is confounded by past years being more complete, and each year or half-year may be more complete than the one following it.
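For anyone who wants to see the arithmetic, here is a rough sketch using only the figures quoted above. The 275 is my own ballpark guess for the end-of-2009 total, not a recorded count, and the function and variable names are made up purely for illustration.

```python
# Totals quoted in this post. "earlier" is what the database showed at an
# earlier look; "current" is what it shows now, after backfilling.
snapshots = {
    2008: {"earlier": 562, "current": 762},  # earlier look: April 2009
    2009: {"earlier": 275, "current": 609},  # earlier look: end of 2009 (rough estimate)
}


def backfill_growth(earlier, current):
    """Fractional growth in a year's recorded total between two looks."""
    return (current - earlier) / earlier


def percent_change(old, new):
    """Percent change from old to new; negative means an apparent decline."""
    return 100.0 * (new - old) / old


for year, counts in snapshots.items():
    growth = backfill_growth(counts["earlier"], counts["current"])
    print(f"{year}: recorded total grew {growth:.0%} through backfilling")

# Mismatched comparison: a fresh 2009 count against a backfilled 2008 count.
naive = percent_change(snapshots[2008]["current"], snapshots[2009]["earlier"])

# Closer to like-for-like: both years as the database shows them today.
settled = percent_change(snapshots[2008]["current"], snapshots[2009]["current"])

print(f"2009 vs 2008, fresh against backfilled: {naive:.1f}%")
print(f"2009 vs 2008, both as currently recorded: {settled:.1f}%")
```

On these numbers, the 2008 total grew by roughly a third after April 2009, and the 2009 total has more than doubled since the year ended. Setting the fresh 2009 count against the backfilled 2008 count suggests a drop of about 64 percent, while the same comparison on today's totals shows a drop of about 20 percent. Most of the apparent decline comes from timing.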

So where are we really in 2010? Right now DataLossDB.org shows 290 breaches recorded this year for U.S. and non-U.S. breaches combined, while the counter for the Identity Theft Resource Center currently shows 537 incidents for U.S. breaches that might lead to ID theft. The two organizations use different criteria and sources for inclusion, and ITRC does not backfill its chronologies for previous years: if a breach that occurred in 2009 is first made public this year, they would record it in their 2010 statistics, while DataLossDB.org would backfill it in their 2009 statistics. There are other reasons that their statistics are so discrepant for this year, but they are not relevant to my main point, which is why I think Microsoft’s trend analysis was confounded.

None of the above is intended as criticism of OSF/DataLossDB.org, which provides yeoman service to researchers and security professionals. I just think that researchers need to be aware of how the database is updated before trying to make trend statements.

All sources that I’ve seen or read seem to agree that there were fewer new breach incidents reported in 2009 than in 2008.   Whether the decrease in 2009 is due to an actual decrease in incidents or an increased failure to detect, increased  failure to report, or some other factor is unclear, but I see no credible evidence that breach reports of new incidents are down in 2010 compared to 2009. Indeed, ITRC’s breach counter has already topped the 2009 total and may come close to the 2008 total or even exceed it.

Of course, there are those who argue that the number of incidents isn’t really important and that we should be looking at those cases linked to fraud or some other harm, and I won’t debate that here. And some statements about particular kinds of attacks or findings may well hold up. My point is simply that what appeared to be a happy decline in the semi-annual number of breach reports was seriously confounded by the way in which the database grows over time.

Sorry, Microsoft.

We will now return to our regularly scheduled program….

