Drek on the Internet
The reader may be familiar with statistics on the proportion of email believed to be unsolicited spam; the currently accepted value is 90%, I believe.
I've compiled a similar little statistic myself, on the amount of junk material referenced on websites; that is, things like ads that get loaded even though they aren't what the viewer is visiting the site to see. My number is a lower bound for the set of websites I visit, and is the proportion of URLs blocked by PithHelmet. For each URL encountered in a page that Safari would normally load, PithHelmet steps in and applies a set of user-specified rules to determine whether the URL looks like junk or should really be loaded.
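To make the idea of rule-based blocking concrete, here is a minimal sketch in Python. It is not PithHelmet's actual rule format or matching logic, which I haven't described here; the patterns in HOST_RULES and PATH_RULES and the function names are made up for illustration. It only shows the general shape: test each URL against a list of patterns, and keep a tally of how many were blocked.

```python
import re
from urllib.parse import urlsplit

# Hypothetical rules, for illustration only; real blockers ship far larger lists.
HOST_RULES = [
    re.compile(r"^ad[sv]?\d*\."),   # hosts such as ads.example.com or ad2.example.com
]
PATH_RULES = [
    re.compile(r"/banners?/"),      # paths such as /banner/ or /banners/
    re.compile(r"/ads?/"),          # paths such as /ad/ or /ads/
]


def looks_like_junk(url):
    """Return True if any rule matches the URL's host or path."""
    parts = urlsplit(url)
    host = parts.hostname or ""
    if any(rule.search(host) for rule in HOST_RULES):
        return True
    return any(rule.search(parts.path) for rule in PATH_RULES)


def blocked_fraction(urls):
    """Fraction of the given URLs that the rules would block."""
    urls = list(urls)
    if not urls:
        return 0.0
    blocked = sum(1 for url in urls if looks_like_junk(url))
    return blocked / len(urls)
```

Fed the list of URLs a page references, blocked_fraction gives the same kind of number I quote in this post. Any ad that slips past the rules is counted as wanted content, which is why the result is only a lower bound.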
The value I get from looking at the numbers stored on my other machine [1] (which has been browsing the internet since before this one was built) shows that PithHelmet has blocked just over 4% of the 783,000 URLs it has examined. This suggests that at least 4% of all content displayed, at least on the sites I visit, is stuff I never wanted to see anyway. The real value may be higher, since not all ads on the pages I visit are blocked, only most of them. It's still not nearly as bad as the commonly held numbers for spam email, but it's not a good trend to see. What I'd be curious about is: how does it break down by size of data? Especially with all those horrible Flash ads, I've seen pages that spent far more time loading the ads than the page itself. I fear that the signal-to-noise ratio may be much lower when examined that way, but I don't have a handy way to check.
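For a rough feel for what these percentages mean in absolute terms, the arithmetic can be sketched as below. The 4% figure is rounded down from "just over 4%", and the second entry is the 10.5% of 84,000 from the footnote. The standard error treats every URL as an independent coin flip, which is an assumption of mine and surely too generous, since URLs from the same page or site tend to be blocked together.

```python
import math


def estimated_blocked(fraction, total_urls):
    """Approximate number of blocked URLs implied by a fraction and a total."""
    return round(fraction * total_urls)


def standard_error(fraction, total_urls):
    """Binomial standard error of a proportion, assuming independent URLs."""
    return math.sqrt(fraction * (1.0 - fraction) / total_urls)


# (fraction blocked, URLs examined), as quoted in this post.
machines = {
    "other machine": (0.04, 783_000),
    "this machine": (0.105, 84_000),
}

for name, (fraction, total) in machines.items():
    blocked = estimated_blocked(fraction, total)
    error = standard_error(fraction, total)
    print(f"{name}: about {blocked} blocked, +/- {100 * error:.2f} percentage points")
```

Under that independence assumption both error bars come out well under a percentage point, so the gap between the two machines would not be explained by sample size alone. The real uncertainty is larger than this sketch suggests, because the URLs are not independent.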
-
[1] Interestingly, the fraction is far higher on this computer: 10.5% out of 84,000. The statistics here are much poorer, obviously, but I'm wondering how I've been browsing differently that I've been hitting so many more ads.