Every antivirus or security suite product promises to protect you from a horde of security risks and annoyances.

But do they work? When evaluating these products for review, I put their claims to the test in many different ways.

Each review reports the results of my tests, as well as my hands-on experience with the product.

This article will dig deeper, explaining just how my tests work.

Of course, not every test is appropriate for every product. Many antivirus utilities include protection against phishing, but some don’t. Most suites include spam filtering, but some omit this feature, and some antivirus products add it as a bonus. Whatever features a given product offers, I put them to the test.


Testing Real-Time Antivirus

Every full-powered antivirus tool includes an on-demand scanner to seek out and destroy existing malware infestations and a real-time monitor to fend off new attacks.
In the past, I’ve actually maintained a collection of malware-infested virtual machines to test each product’s ability to remove existing malware.

Advances in malware coding made testing with live malware too dangerous, but I can still exercise each product’s real-time protection.

Each year in early spring, when most security vendors have finished their yearly update cycle, I gather a new collection of malware samples for this test.
I start with a feed of the latest malware-hosting URLs, download hundreds of samples, and winnow them down to a manageable number.

I analyze each sample using various tools that I coded myself.
Some of the samples detect when they’re running in a virtual machine and refrain from malicious activity; I simply don’t use those.
I look for a variety of different types, and for samples that make changes to the file system and Registry. With some effort, I pare the collection down to about 30, and record exactly what system changes each sample makes.

To test a product’s malware-blocking abilities, I open the folder of samples. Real-time protection in some products kicks in immediately, wiping out known malware.
If necessary to trigger real-time protection, I single-click each sample, or copy the collection to a new folder.
I take note of how many samples the antivirus eliminates on sight.

Next, I launch each remaining sample and note whether the antivirus detects it. I record the total percentage detected, regardless of when detection happened.

Detection of a malware attack isn’t sufficient; the antivirus must actually prevent the attack.

A small program I wrote checks the system to determine whether the malware managed to make any Registry changes or install any of its files.
In the case of executable files, it also checks whether any of those processes are actually running.

And as soon as measurement is complete, I shut down the virtual machine.

If a product prevents installation of all executable traces by a malware sample, it earns 8, 9, or 10 points, depending on how well it prevented cluttering the system with non-executable traces.

Detecting malware but failing to prevent installation of executable components gets half-credit, 5 points.

Finally, if, despite the antivirus’s attempt at protection, one or more malware processes is actually running, that’s worth a mere 3 points.

The average of all these scores becomes the product’s final malware-blocking score.
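The scoring rules above can be sketched in code. This is a hypothetical illustration, not the author's actual tool: the function names, the trace categories, and the exact cutoff between 9 and 8 points for leftover clutter are my own assumptions; only the 10/9/8, 5, and 3 point values come from the text.

```python
# Hypothetical sketch of the malware-blocking scoring described above.
# The clutter_ratio cutoffs for 9 vs. 8 points are illustrative guesses.

def sample_score(executables_installed, process_running, clutter_ratio):
    """Score one sample on the 3-to-10 scale described in the text.

    executables_installed: did any executable trace land on disk?
    process_running: is a malware process actually running?
    clutter_ratio: fraction of non-executable traces (files, Registry)
    the product failed to prevent, from 0.0 (none) to 1.0 (all).
    """
    if process_running:
        return 3        # a malware process is live despite protection
    if executables_installed:
        return 5        # half credit: executables on disk, none running
    # No executable traces: 8, 9, or 10 depending on leftover clutter.
    if clutter_ratio == 0:
        return 10
    return 9 if clutter_ratio < 0.5 else 8

def blocking_score(sample_results):
    """Average all per-sample scores into the final rating."""
    return sum(sample_results) / len(sample_results)
```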

Testing Malicious URL Blocking

The best time to annihilate malware is before it ever reaches your computer. Many antivirus products integrate with your browsers and steer them away from known malware-hosting URLs.
If protection doesn’t kick in at that level, there’s always an opportunity to wipe out the malware payload during or immediately after download.

While my basic malware-blocking test uses the same set of samples for a season, the malware-hosting URLs I use to test Web-based protection are different every time.
I get a feed of the very newest malicious URLs from London-based MRG-Effitas and typically use URLs that are no more than a day old.

Using a small utility I wrote for the purpose, I go down the list, launching each URL in turn.
I discard any that don’t actually point to a malware download, and any that return error messages.

For the rest, I note whether the antivirus prevents access to the URL, wipes out the download, or does nothing.

After recording the result, my utility jumps to the next URL in the list that isn’t at the same domain.
I do skip any files larger than 5MB, and I also skip files that have already appeared in the same test.
I keep at it until I’ve accumulated data for at least 100 verified malware-hosting URLs.

The score in this test is simply the percentage of URLs for which the antivirus prevented downloading malware, whether by cutting off access to the URL completely or by wiping out the downloaded file.
Scores vary widely, but the very best security tools manage 90 percent or more.
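The bookkeeping for this test can be sketched as follows. The real utility launches each URL in a browser; in this simplified version the per-URL outcome comes from a caller-supplied function, which is an assumption of mine, as are all the names used here.

```python
# Simplified sketch of the malicious-URL test loop: skip consecutive
# same-domain URLs, oversized files, and repeats, then tally outcomes
# until enough verified malware-hosting URLs have been tested.

from urllib.parse import urlparse

def run_url_test(urls, check_url, needed=100, size_limit=5 * 2**20):
    """Tally outcomes for at least `needed` verified malicious URLs.

    check_url(url) -> (outcome, size, file_id), where outcome is one of
    'blocked', 'wiped', 'missed', or 'invalid' (error / not malware).
    """
    seen_files = set()
    tallies = {'blocked': 0, 'wiped': 0, 'missed': 0}
    last_domain = None
    for url in urls:
        domain = urlparse(url).netloc
        if domain == last_domain:          # skip URLs at the same domain
            continue
        outcome, size, file_id = check_url(url)
        if outcome == 'invalid' or size > size_limit or file_id in seen_files:
            continue                       # not a verified, fresh sample
        last_domain = domain
        seen_files.add(file_id)
        tallies[outcome] += 1
        if sum(tallies.values()) >= needed:
            break
    total = sum(tallies.values())
    protected = tallies['blocked'] + tallies['wiped']
    return tallies, (100.0 * protected / total if total else 0.0)
```

The score is the combined percentage of URLs blocked outright or wiped at download, matching the text's description.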

Testing Phishing Detection

Why resort to elaborate data-stealing Trojans, when you can just trick people into giving up their passwords? That’s the mindset of malefactors who create and manage phishing websites.

These fraudulent sites mimic banks and other sensitive sites.
If you enter your login credentials, you’ve just given away the keys to the kingdom.

And phishing is platform-independent; it works on any operating system that supports browsing the Web.

These fake websites typically get blacklisted not long after their creation, so for testing I use only the very newest phishing URLs.
I gather these from phishing-oriented websites, selecting those that have been reported as frauds but not yet verified.

This forces security programs to use real-time analysis rather than relying on simple-minded blacklists.

Symantec’s Norton Security has long been an outstanding detector of such frauds.
Since the actual URLs used differ in every test, I report results as the difference between a product’s detection rate and Norton’s.
I also compare the detection rate with that of the phishing protection built into Chrome, Firefox, and Internet Explorer.
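The relative reporting scheme is simple enough to express directly; this is a minimal sketch under my own assumed data layout, not the author's reporting tool.

```python
# Sketch of reporting phishing detection relative to the baselines:
# Norton, plus the protection built into each browser.

def relative_detection(product_rate, norton_rate, browser_rates):
    """Express detection as percentage-point differences from baselines.

    All rates are percentages of verified phishing URLs detected;
    browser_rates maps browser name to its detection rate.
    """
    return {
        'vs_norton': product_rate - norton_rate,
        'vs_browsers': {name: product_rate - rate
                        for name, rate in browser_rates.items()},
    }
```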

I use five computers (most of them virtual machines) for this test, one protected by Norton, one by the product under testing, and one each using the three browsers alone. Using a small utility I wrote, I launch each URL in the five browsers.
If any of the five returns an error message, I discard that URL.
If the resulting page doesn’t actively attempt to imitate another site, or doesn’t attempt to capture username and password data, I discard it.

For the rest, I record whether or not each product detected the fraud.

In many cases, the product under testing can’t even do as well as the protection built into one or more of the browsers. Only a very few products come close to matching Norton’s detection rate.

Testing Spam Filtering

These days many email accounts have the spam vacuumed out of them by the email provider, or by a utility running on the email server.

But if your email account isn’t pre-filtered, you may find you’re drowning in ads for male enhancements and come-ons from fake Nigerian princes. Most security suites include antispam as an option, but only a few are as accurate as the best standalone antispam programs.

For antispam testing, I maintain a real-world email account that receives tons of spam along with plenty of valid email.

A tweak to the email server feeds the incoming mail into eight identical accounts. When it’s time to run a test, I simply let the email client download all the pent-up mail from the next account in line.

Depending on how many spam filters I’ve been testing, there can be as many as 15,000 messages piled up. Once the email download finishes, I discard any messages that are more than 30 days old.

That still leaves anywhere from 3,000 to 5,000 messages, which is plenty.

The next step is a bit tedious.
I sort the messages in the Inbox into subfolders, one for valid personal mail, one for newsletters and other valid bulk mail, and one for undeniable spam.

Anything that’s not a clear match for those three categories goes in the trash.
I repeat that process for the spam folder and then start crunching numbers.

I do look at the percentage of spam caught by the filter, but I also watch very closely for valid mail discarded along with the spam.

Deleting spam messages that got past the filter can be tedious, but you’d be really upset if you missed a business opportunity because the email got mis-marked as spam.
I start deducting points if the filter tosses even 1 percent of valid mail, or if it misses 5 percent of undeniable spam.
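The accuracy check can be sketched like this. Only the 1 percent and 5 percent thresholds come from the text; everything else, including the function name and the idea of a simple flag, is my own illustrative framing.

```python
# Illustrative sketch of the antispam accuracy thresholds described
# above; only the 1% and 5% cutoffs come from the article.

def spam_filter_rating(valid_total, valid_blocked, spam_total, spam_missed):
    """Return (valid_loss_pct, spam_miss_pct, deductions_apply)."""
    valid_loss = 100.0 * valid_blocked / valid_total
    spam_miss = 100.0 * spam_missed / spam_total
    # Deductions begin once the filter tosses 1 percent of valid mail
    # or misses 5 percent of undeniable spam.
    deductions_apply = valid_loss >= 1.0 or spam_miss >= 5.0
    return valid_loss, spam_miss, deductions_apply
```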

One more thing.

During the initial download of email, I time how long it takes to download 1,000 messages and compare that with the time when no spam filter is present.

A spam filter that significantly slows downloading mail could be an annoyance, especially for people who only check email every few days.

Testing Security Suite Performance

When your security suite is busily watching for malware attacks, defending against network intrusions, preventing your browser from visiting dangerous websites, and so on, it’s clearly using some of your system’s CPU and other resources to do its job.
Some years ago, security suites got the reputation for sucking up so much of your system resources that your own computer use was affected.

Things are a lot better these days, but I still run some simple tests to get an insight into each suite’s effect on system performance.

Security software needs to load as early in the boot process as possible, lest it find malware already in control.

But users don’t want to wait around any longer than necessary to start using Windows after a reboot. My test script runs immediately after boot and starts asking Windows to report the CPU usage level once per second.

After 10 seconds in a row with CPU usage no more than 5 percent, it declares the system ready for use.
Subtracting the start of the boot process (as reported by Windows), I know how long the boot process took.
I run many repetitions of this test and compare the average with that of many repetitions when no suite was present.
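The ready-for-use logic described above can be sketched as follows. To keep the sketch self-contained, the once-per-second CPU readings are abstracted into an iterable rather than queried from Windows; the names are my own.

```python
# Sketch of the boot-test "ready for use" rule: one CPU reading per
# second, and the system counts as ready after 10 consecutive readings
# at 5 percent or less.

def seconds_until_ready(cpu_samples, quiet_needed=10, threshold=5.0):
    """Return the 1-based second at which the quiet streak completed,
    or None if the samples ran out first."""
    streak = 0
    for second, usage in enumerate(cpu_samples, start=1):
        streak = streak + 1 if usage <= threshold else 0
        if streak >= quiet_needed:
            return second
    return None
```

Boot time is then that moment minus the boot start reported by Windows, averaged over many runs with and without the suite.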

In truth, you probably reboot no more than once per day.

A security suite that slowed everyday file operations might have a more significant impact on your activities.

To check for that kind of slowdown, I time a script that moves and copies a large collection of large-to-huge files between drives.

Averaging several runs with no suite and several runs with the security suite active, I can determine just how much the suite slowed these file activities.
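A minimal timing-harness sketch, under the assumption that each file operation can be wrapped in a callable; the real scripts and their file sets are not published here.

```python
# Sketch of the slowdown measurement: average several timed runs of a
# file operation, then compare with-suite against baseline times.

import time

def average_runtime(operation, runs=5):
    """Mean wall-clock time of operation() over several runs."""
    totals = []
    for _ in range(runs):
        start = time.perf_counter()
        operation()
        totals.append(time.perf_counter() - start)
    return sum(totals) / runs

def slowdown_percent(baseline_seconds, with_suite_seconds):
    """How much longer the operation took with the suite active."""
    return 100.0 * (with_suite_seconds - baseline_seconds) / baseline_seconds
```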

A similar script measures the suite’s effect on a script that zips and unzips the same file collection.

The average slowdown in these three tests by the suites with the very lightest touch can be as low as 1 percent.

At the other end of the spectrum, a very few suites average 25 percent, or even more. You might actually notice the impact of the more heavy-handed suites.

Testing Firewall Protection

It’s not as easy to quantify a firewall’s success, because different vendors have different ideas about just what a firewall should do.

Even so, there are a number of tests I can apply to most of them.

Typically a firewall has two jobs: protecting the computer from outside attack and ensuring that programs don’t misuse the network connection.

To test protection against attack, I use a physical computer that connects through the router’s DMZ port.

This gives the effect of a computer connected directly to the Internet.

That’s important for testing, because a computer that’s connected through a router is effectively invisible to the Internet at large.
I hit the test system with port scans and other Web-based tests.
In most cases I find that the firewall completely hides the test system from these attacks, putting all ports in stealth mode.

The built-in Windows firewall handles stealthing all ports, so this test is just a baseline.

But even here, there are different opinions. Kaspersky’s designers don’t see any value in stealthing ports as long as the ports are closed and the firewall actively prevents attack.

Program control in the earliest personal firewalls was extremely hands-on.

Every time an unknown program tried to access the network, the firewall popped up a query asking the user whether or not to allow access.

This approach isn’t very effective, since the user generally has no idea what action is correct. Most will just allow everything. Others will click Block every time, until they break some important program; after that they allow everything.
I perform a hands-on check of this functionality using a tiny browser I wrote myself, one that will always qualify as an unknown program.

Some malicious programs attempt to get around this kind of simple program control by manipulating or masquerading as trusted programs. When I encounter an old-school firewall, I test its skills using utilities called leak tests.

These programs use the same techniques to evade program control, but without any malicious payload.
I do find fewer and fewer leak tests that still work under modern Windows versions.

At the other end of the spectrum, the best firewalls automatically configure network permissions for known good programs, eliminate known bad programs, and step up surveillance on unknowns.
If an unknown program attempts a suspicious connection, the firewall kicks in at that point to stop it.

Software isn’t and can’t be perfect, so the bad guys work hard to find security holes in popular operating systems, browsers, and applications.

They devise exploits to compromise system security using any vulnerabilities they find. Naturally the maker of the exploited product issues a security patch as soon as possible, but until you actually apply that patch, you’re vulnerable.

The smartest firewalls intercept these exploit attacks at the network level, so they never even reach your computer.

Even for those that don’t scan at the network level, in many cases the antivirus component wipes out the exploit’s malware payload.
I use the CORE Impact penetration tool to hit each test system with about 30 recent exploits and record how well the security product fended them off.

Finally, I run a sanity check to see whether a malware coder could easily disable security protection.
I look for an on/off switch in the Registry and test whether I can use it to turn off protection (though it’s been years since I found a product vulnerable to this attack).
I attempt to terminate security processes using Task Manager.

And I check whether it’s possible to stop or disable the product’s essential Windows services.

Testing Parental Control

Parental control and monitoring covers a wide variety of programs and features.

The typical parental control utility keeps kids away from unsavory sites, monitors their Internet usage, and lets parents determine when and for how long the kids are allowed to use the Internet each day. Other features range from limiting chat contacts to patrolling Facebook posts for risky topics.

I always perform a sanity check to make sure the content filter actually works.

As it turns out, finding porn sites for testing is a snap. Just about any URL composed of a size adjective and the name of a normally covered body part is already a porn site.
Very few products fail this test.

I use a tiny browser that I wrote myself to verify that content filtering is browser independent.
I issue a three-word network command (no, I’m not publishing it here) that disables some simple-minded content filters.

And I check whether I can evade the filter by using a secure anonymizing proxy website.

Imposing time limits on the children’s computer or Internet use is only effective if the kids can’t interfere with timekeeping.
I verify that the time-scheduling feature works, then try evading it by resetting the system date and time.

The best products don’t rely on the system clock for their date and time.

After that, it’s simply a matter of testing the features that the program claims to have.
If it promises the ability to block use of specific programs, I engage that feature and try to break it by moving, copying, or renaming the program.
If it says it strips out bad words from email or instant messaging, I add a random word to the block list and verify that it doesn’t get sent.
If it claims it can limit instant messaging contacts, I set up a conversation between two of my accounts and then ban one of them. Whatever control or monitoring power the program promises, I do my best to put it to the test.

Interpreting Antivirus Lab Tests

I don’t have the resources to run the kind of exhaustive antivirus tests performed by independent labs around the world, so I pay close attention to their findings.
I follow two labs that issue certifications and five labs that release scored test results on a regular basis, using their results to help inform my reviews.

ICSA Labs and West Coast Labs offer a wide variety of security certification tests.
I specifically follow their certifications for malware detection and for malware removal.
Security vendors pay to have their products tested, and the process includes help from the labs to fix any problems preventing certification. What I’m looking at here is the fact that the lab found the product significant enough to test, and the vendor was willing to pay for testing.

Virus Bulletin has been putting antivirus programs to the test for as long as I can remember, and its list of programs tested is much larger than that of most labs.
I look specifically at the RAP (Reactive And Proactive) test, which runs every month.

This two-part test checks each product’s ability to detect recent malware samples (the reactive part) and to detect newer samples without a chance to update malware signatures (the proactive part). Products receive a score from 0 to 100 percent.

Based in Magdeburg, Germany, the AV-Test Institute continuously puts antivirus programs through a variety of tests.

The one I focus on is a three-part test that awards up to 6 points in each of three categories: Protection, Performance, and Usability.

To reach certification, a product must earn a total of 10 points with no zeroes.

The very best products take home a perfect 18 points in this test.

To test protection, the researchers expose each product to AV-Test’s reference set of over 100,000 samples, and to several thousand extremely widespread samples. Products get credit for preventing the infestation at any stage, be it blocking access to the malware-hosting URL, detecting the malware using signatures, or preventing the malware from running.

The best products often reach 100 percent success in this test.

Performance is important—if the antivirus noticeably puts a drag on system performance, some users will turn it off.

AV-Test’s researchers measure the difference in time required to perform 13 common system actions with and without the security product present.

Among these actions are downloading files from the Internet, copying files both locally and across the network, and running common programs.

Averaging multiple runs, they can identify just how much impact each product has.

The Usability test isn’t necessarily what you’d think.
It has nothing to do with ease of use or user interface design. Rather, it measures the usability problems that occur when an antivirus program erroneously flags a legitimate program or website as malicious or suspicious. Researchers actively install and run an ever-changing collection of popular programs, noting any odd behavior by the antivirus.

A separate scan-only test checks to make sure the antivirus doesn’t identify any of over 600,000 legitimate files as malware.

I gather results from five of the many tests regularly released by AV-Comparatives, which is based in Austria and works closely with the University of Innsbruck.
Security tools that pass a test receive Standard certification; those that fail are designated as merely Tested.
If a program goes above and beyond the necessary minimum, it can earn Advanced or Advanced+ certification.

AV-Comparatives’ file detection test is a simple, static test that checks each antivirus against about 100,000 malware samples, with a false-positives test to ensure accuracy.

The retrospective test attempts to measure a product’s ability to detect zero-day malware by forcing it to use old antivirus signatures; in this test, any samples not caught on sight are allowed to run, in case behavioral detection catches them.

And the performance test, much like AV-Test’s, measures any impact on system performance.

I consider AV-Comparatives’ dynamic whole-product test to be the most significant.

This test aims to simulate as closely as possible an actual user’s experience, allowing all components of the security product to take action against the malware.

Finally, the remediation test starts with a collection of malware that all tested products are known to detect and challenges the security products to restore an infested system, completely removing the malware.

Where AV-Test and AV-Comparatives typically include 20 to 24 products in testing, Simon Edwards Labs generally reports on no more than 10.

That’s in large part because of the nature of this lab’s test. Researchers capture real-world malware-hosting websites and use a replay technique so that each product encounters precisely the same drive-by download or other Web-based attack.
It’s extremely realistic, but arduous.

A program that totally blocks one of these attacks earns three points.
If it took action after the attack began but managed to remove all executable traces, that’s worth two points.

And if it merely terminated the attack, without full cleanup, it still gets one point.
In the unfortunate event that the malware runs free on the test system, the product under testing loses five points.

Because of this, some products have actually scored below zero.
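The point values above can be encoded directly. This is my own encoding of the outcomes; the lab publishes only the points themselves, and the outcome labels here are assumptions.

```python
# Sketch of the per-incident scoring described above: +3 for a total
# block, +2 for post-attack cleanup of all executables, +1 for merely
# terminating the attack, and -5 for a full compromise.

POINTS = {
    'blocked': 3,       # attack totally blocked
    'neutralized': 2,   # acted after the attack began, removed executables
    'terminated': 1,    # stopped the attack but left traces behind
    'compromised': -5,  # malware ran free on the test system
}

def protection_score(outcomes):
    """Sum the points for a list of incident outcomes; can go below zero."""
    return sum(POINTS[o] for o in outcomes)
```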

In a separate test, the researchers evaluate how well each product refrains from erroneously identifying valid software as malicious, weighting the results based on each valid program’s prevalence, and on how much of an impact the false positive identification would have.

They combine the results of these two tests and certify products at one of five levels: AAA, AA, A, B, and C.

For some time I’ve used a feed of samples supplied by MRG-Effitas in my hands-on malicious URL blocking test.

This lab also releases quarterly results for two particular tests that I follow.

The 360 Assessment & Certification test simulates real-world protection against current malware, similar to the dynamic real-world test used by AV-Comparatives.

A product that completely prevents any infestation by the sample set receives Level 1 certification. Level 2 certification means that at least some of the malware samples planted files and other traces on the test system, but these traces were eliminated by the time of the next reboot.

The Online Banking Certification very specifically tests for protection against financial malware and botnets.

Coming up with an overall summary of lab results isn’t easy, since the labs don’t all test the same collection of programs.
I’ve devised a system that normalizes each lab’s scores to a value from 0 to 10. My aggregate lab results chart reports the average of these scores, the number of labs testing, and the number of certifications received.
If just one lab includes a product in testing, I consider that to be insufficient information for an aggregate score.
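The aggregation idea can be sketched as a simple rescale-and-average. The per-lab scale bounds here are placeholders; the actual mapping used in the reviews is not published, so treat this as an assumption-laden illustration.

```python
# Hypothetical sketch of the aggregate lab score: normalize each lab's
# native score to a 0-10 scale, then average, requiring results from
# at least two labs.

def normalize(score, lab_min, lab_max):
    """Map a lab's native score onto a 0-10 scale."""
    return 10.0 * (score - lab_min) / (lab_max - lab_min)

def aggregate(lab_scores):
    """Average normalized scores across labs.

    lab_scores: list of (score, lab_min, lab_max) tuples, one per lab
    that tested the product.
    """
    if len(lab_scores) < 2:
        return None     # one lab alone is insufficient information
    normalized = [normalize(s, lo, hi) for s, lo, hi in lab_scores]
    return sum(normalized) / len(normalized)
```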

Image courtesy of Flickr User DaveBleasdale.