By Marcus Ranum, email@example.com
Security practitioners have plenty of reasons to be extra glum during the holiday season: it’s the time of year when a breach can have maximum impact. Avoiding downtime and disruption is critical, and your systems are going to be processing more credit cards than at any other time of the year.
Let’s not dwell on that, because unless you’ve been living under a rock for the last decade, you will have seen plenty of examples of what can go wrong. A better way to think of it is that the holidays are a particularly good opportunity to show how good your systems awareness has become. You’re going to be potentially dealing with unusual and interesting loads, and anything that can be done to increase your awareness of systems behavior improves your chance of being able to quickly diagnose and repair any problem. It’s not just a security problem, either, though other system problems can sometimes manifest as what appear to be attacks in progress.
In 2002, I was involved in a full-on incident response for a major web e-tailer that had a series of mysterious server and database hangs during the busiest shopping day of the year. It took a tremendous amount of hard work from a team of security geeks to determine that an obscure bug was causing the site’s shopping cart allocator to fail because of a hard-coded limit. Admittedly, there’s no security tool that would proactively detect something like that, but the signs that the system was suffering performance degradation at a certain load level were clear. Unfortunately, nobody was looking.
’Tis the season to look for anomalies and, from a security standpoint, that means continuous monitoring of your critical assets. First, establish a baseline of how the system behaves under normal load, so that over time you build an approximate idea of what “normal” looks like. That trains your eye to detect abnormal loads or events. If you know, for example, that your site typically has 12,000 active shopping carts at a time and that your customer base tends to be in the U.S. and Canada, you’ll know instantly that something interesting is happening if you suddenly see 30,000 active shopping carts and 10,000 of them are from Eastern Europe. The point is that if you aren’t continually monitoring your system, you don’t know what “normal” looks like; you’re left with “appears to work OK” and “does not work.” Those two states don’t convey enough information for holiday operations.
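To make the shopping-cart example concrete, here is a minimal sketch of a baseline check in Python. Everything in it is an assumption for illustration: the function names, the three-standard-deviation threshold, the 10-point tolerance on regional share, and the sample numbers are hypothetical, and a real site would pull these figures from its own monitoring data.

```python
from statistics import mean, stdev


def is_anomalous(history, current, threshold=3.0):
    """Return True if `current` lies more than `threshold` standard
    deviations away from the mean of `history` (the baseline samples)."""
    if len(history) < 2:
        raise ValueError("need at least two baseline samples")
    mu = mean(history)
    sigma = stdev(history)
    if sigma == 0:
        # A perfectly flat baseline: any change at all is a deviation.
        return current != mu
    return abs(current - mu) / sigma > threshold


def region_shift(baseline_share, current_counts, tolerance=0.10):
    """Return the regions whose share of current activity differs from the
    baseline share by more than `tolerance` (a fraction, so 0.10 = 10 points).
    The result maps region -> (expected_share, observed_share)."""
    total = sum(current_counts.values())
    if total == 0:
        return {}
    shifted = {}
    for region in set(baseline_share) | set(current_counts):
        observed = current_counts.get(region, 0) / total
        expected = baseline_share.get(region, 0.0)
        if abs(observed - expected) > tolerance:
            shifted[region] = (expected, observed)
    return shifted


# Hypothetical samples of "active shopping carts" taken under normal load.
cart_history = [11800, 12100, 12050, 11900, 12200, 12000]

# Hypothetical baseline: where the customer base normally comes from.
normal_regions = {"US/Canada": 0.95, "Other": 0.05}
carts_by_region = {"US/Canada": 19000, "Eastern Europe": 10000, "Other": 1000}

if is_anomalous(cart_history, sum(carts_by_region.values())):
    print("cart volume is outside the baseline")
    for region, (expected, observed) in region_shift(
        normal_regions, carts_by_region
    ).items():
        print(f"{region}: expected {expected:.0%}, observed {observed:.0%}")
```

A fixed threshold like this is crude, and holiday traffic will legitimately push volume above an off-season baseline, so the baseline itself probably needs to account for time of day and season. The sketch is only meant to show that “normal” can be written down and checked automatically.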
Most of us have gotten the message regarding configuration and patch management around the holidays: make sure it works, don’t mess with it, and have emergency plans in case you need to add capacity. Unfortunately, attackers might have different plans; you still need to keep an ear to the ground regarding system vulnerabilities. In the old days you had to closely monitor the security mailing lists and have a good idea which pieces of critical software you had exposed, in case someone published an ill-timed proof-of-concept exploit against something on your site.
Nowadays, fortunately, that kind of vulnerability management can be automated: tools can identify what flaws there might be in your software as it’s installed, and can quickly flag any new vulnerabilities as soon as they’re published. Depending on how complex and heterogeneous your mix of networks and applications happens to be, that can be a huge time savings, because it amounts to getting a prioritized punch-list that you can fix (and track to closure) instead of having to manage your own vulnerability research and assessment process. Automating your workflow and prioritization process is the key to keeping things from slipping through the cracks.
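The core of such a punch-list is a join between what you have installed and what has been published as vulnerable, sorted so the most urgent items come first. The sketch below illustrates that idea only; it is not how any particular product works. The record types, field names, advisory identifiers, package name, and the ordering rule (internet-facing hosts first, then highest severity) are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Installed:
    """One piece of software on one host (hypothetical inventory record)."""
    host: str
    package: str
    version: str
    internet_facing: bool


@dataclass(frozen=True)
class Advisory:
    """One published vulnerability (hypothetical advisory record)."""
    advisory_id: str
    package: str
    affected_versions: frozenset
    severity: float  # assumed scale: 0.0 (low) to 10.0 (critical)


def punch_list(inventory, advisories):
    """Match installed software against advisories and return a list of
    (Installed, Advisory) pairs: exposed hosts first, then by severity."""
    findings = [
        (item, adv)
        for item in inventory
        for adv in advisories
        if adv.package == item.package
        and item.version in adv.affected_versions
    ]
    findings.sort(
        key=lambda f: (not f[0].internet_facing, -f[1].severity, f[0].host)
    )
    return findings


inventory = [
    Installed("db01", "shopcart", "2.1", internet_facing=False),
    Installed("web01", "shopcart", "2.1", internet_facing=True),
    Installed("web02", "shopcart", "2.3", internet_facing=True),
]
advisories = [
    Advisory("ADV-0001", "shopcart", frozenset({"2.0", "2.1"}), 7.5),
]

for item, adv in punch_list(inventory, advisories):
    print(f"{item.host}: {item.package} {item.version} -> {adv.advisory_id}")
```

Matching on exact version strings is the simplest possible rule; real advisories usually describe version ranges, so a production tool would need proper version comparison.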
Just today I was in a meeting with the CSO of a large bank who said something that floored me: “Until recently, I didn’t think system log/analysis was particularly useful, since it was backward-looking and I wanted our security practice to be proactive.” Well, I’m glad that he’d finally come to his senses, because system log collection (and analysis) is critical to establishing your baseline of “normal” behavior, which allows you to get proactive when things begin deviating from your expectations. For retailers, however, the value of the backward-looking component is also very high. In the event that you have an incident, being able to figure out what happened will frequently depend on your ability to analyze your system logs and determine:
- Duration of a break-in;
- Actions taken by the attacker;
- Customer data that was exfiltrated (if any); and
- Customer data that was present on compromised systems during the time they were compromised.
If you are under any kind of regulation for breach notification (e.g., you handle credit cards, customer information, or transactional data including home addresses), the data collected by your continuous monitoring may save you very large amounts of money. One of my acquaintances in the industry has been able to document savings of more than $3m per year on breach notification and remedies alone, simply by using the data his team collects as part of their monitoring effort to reduce (generally eliminate, in fact) the need for breach notifications to customers. He manages a vast point-of-sale network at his company, and he maintains netflow traces of traffic within branch networks and between the branches and the central office.
In the event that a point-of-sale terminal at one of the branches is compromised, they can look back through the system execution and firewall audit logs to determine when and how the breach occurred, then retrieve all the flows from that system within that time window and conclusively argue whether or not the attack was able to spread horizontally or reach any customer data on the servers at the central office. Having that level of data at your fingertips is a normal byproduct of a continuous monitoring program: you want to collect and retain defensible data regarding past events, while establishing the baseline of normalcy against which you can search for deviations.
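That look-back amounts to two filters over retained flow records: first by source host and time window, then by destination network. The following is a minimal sketch under stated assumptions. The `Flow` record is a hypothetical simplification rather than the actual netflow format, and the addresses, ports, dates, and the central-office network are invented for illustration.

```python
from dataclasses import dataclass
from datetime import datetime
from ipaddress import ip_address, ip_network


@dataclass(frozen=True)
class Flow:
    """A simplified, hypothetical flow record."""
    start: datetime
    src: str
    dst: str
    dst_port: int
    bytes_sent: int


def flows_from_host(flows, host, window_start, window_end):
    """Flows originated by `host` that started inside the window (inclusive)."""
    return [
        f for f in flows
        if f.src == host and window_start <= f.start <= window_end
    ]


def reached_networks(flows, protected_networks):
    """Flows whose destination falls inside any of the protected networks,
    given as CIDR strings such as "10.0.0.0/24"."""
    nets = [ip_network(n) for n in protected_networks]
    return [f for f in flows if any(ip_address(f.dst) in n for n in nets)]


# Hypothetical data: one compromised terminal, one central-office network.
compromised = "10.1.5.20"
central_office = ["10.0.0.0/24"]
window = (datetime(2024, 11, 30), datetime(2024, 12, 2))

retained = [
    Flow(datetime(2024, 12, 1, 12, 0), "10.1.5.20", "10.0.0.8", 1433, 5000),
    Flow(datetime(2024, 12, 1, 12, 5), "10.1.5.20", "10.1.5.1", 53, 120),
    Flow(datetime(2024, 11, 1, 9, 0), "10.1.5.20", "10.0.0.8", 1433, 800),
]

suspect = flows_from_host(retained, compromised, *window)
for f in reached_networks(suspect, central_office):
    print(f"{f.start} {f.src} -> {f.dst}:{f.dst_port} ({f.bytes_sent} bytes)")
```

An empty result only supports the argument that nothing reached the central office if flow collection was complete for the whole window, which is one more reason to verify what you retain before you need it.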
It’s a bit Grinch-like to be reminding people that systems/network security is especially important during the holidays. But, perhaps, you can use the oncoming holidays as an opportunity to assess whether your processes are all they need to be. A good assessment should cover:
- Are we logging enough? Do we keep the logs that we need in order to determine what happened in a breach? Consider network, web server, firewall, and application logs.
- Are our incident response procedures in place and understood by all the critical responders?
- Are our backup and business resumption processes functioning and effective? How long would it take to get our key servers back online if we suffered a destructive attack?
- How long does it take for us to identify a vulnerability in critical software, assess its impact to our operations, and react/patch/upgrade an affected system?
- Do we have the contacts and support that we may need in the event that we have to go into a full-blown incident response that is outside of our capacity to handle in-house?
Even if you don’t have the time or the staff depth to run an incident response drill, you can learn a lot by asking the questions above, and considering your posture. In today’s “get more done with less” environment it’s a truism that no systems administration/network management/security team has all the resources that they want. The trick is figuring out if you can make do with what you’ve got.