18 Jun Make it Harder to Hide: 3 Techniques for Conducting Threat Hunting at Scale
He began a security presentation with a picture of the City of San Francisco at night. The image was cropped perfectly to one million pixels and took up the entire screen. Every pixel represented an IP address on a network. That is obviously a large network, and it illustrates the scale of the environment in which enterprise threat hunters are tasked with finding adversaries.
That analogy is how Vernon Habersetzer opened up his RSA Conference session on threat hunting. He leads the hunt team for Walmart, a company that, he later noted, has an IT infrastructure three times the size of his analogy.
Looking for threats in a network of that magnitude might seem like an impossible task. And it may well be impossible if your threat hunting is centered exclusively on indicators of compromise (IOCs).
Why? Because IOCs get threat hunters too focused on granular threat detection, says Mr. Habersetzer. Even with attack frameworks, like the one championed by MITRE, hunt teams end up writing detection rules against specific tactics, techniques and procedures (TTPs). That is an unending task that never quite catches all the issues.
To be clear, his presentation was not saying detection rules or IOCs aren’t important. Instead, he’s suggesting threat hunting teams should take a step back. In other words, in addition to addressing TTPs, strive to look at the whole picture from an entirely different perspective.
>>> Related content: 7 Simple but Effective Threat Hunting Tips from a Veteran Threat Hunter
Mathematics and Probability as Threat Hunting Tools
The law of large numbers is a theorem within probability theory that can provide that perspective. According to Wikipedia, the source Mr. Habersetzer cited in his presentation, the theorem says:
“The average of the results obtained from a large number of trials should be close to the expected value and will tend to become closer as more trials are performed.”
The flip of a coin is the quintessential example, he explained. The average chance of it landing on either side is 50-50. That might not happen initially, but if you perform the toss many times it always works out to that average.
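That convergence is easy to see in code. Below is a minimal simulation (not from the presentation) showing the fraction of heads drifting toward 0.5 as the number of tosses grows:

```python
import random

def heads_fraction(n_tosses, seed=0):
    """Simulate n_tosses fair coin flips and return the fraction of heads."""
    rng = random.Random(seed)  # seeded for reproducibility
    heads = sum(rng.randint(0, 1) for _ in range(n_tosses))
    return heads / n_tosses

# Small runs can stray from 50-50; large runs settle near it.
for n in (10, 1_000, 100_000):
    print(n, heads_fraction(n))
```

The same intuition underpins the hunting techniques below: the more events you observe, the more reliably "normal" defines itself, and the more sharply anything abnormal stands out.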
In his presentation, he modified this theorem for security professionals:
“The average of the results obtained from a large number of ~~trials~~ events should be close to the expected value (benign) and will tend to become closer as more ~~trials are performed~~ events are analyzed.”
If most of the transactions on the average corporate network are benign, then threat hunting should focus on the anomalies. Since most attacks introduce something new into the environment, security can use the law of large numbers theorem to narrow down a massive amount of activity on a large network to just those transactions that seem out of place.
Mr. Habersetzer brought the image of the city – glowing from all of the lights at night – back up on the screen. He isolated a single dot of light in the upper left-hand corner of the screen. That one light, he said, signifies a group of 200 domain controllers. If just one of those machines starts using a different protocol, or talking to a different segment of the network, or runs a new process, then it can be identified with this theorem even amid that vast array of city lights.
This concept works for virtually any network anomaly – and he offered these examples:
- a .zip file leaving via Server Message Block (SMB) protocol;
- a strange binary copied (1.exe); or
- a remote desktop protocol session (RDP) from a development workstation.
If you are able to ferret out the anomalies for closer scrutiny, you are effectively making it harder for adversaries to hide, regardless of the size of the network.
3 Techniques for Conducting Threat Hunting at Scale
Most organizations already have the data sources they need to perform threat hunting this way, according to Mr. Habersetzer. For example, most have proxy logs, full packets, NetFlow, Zeek logs (formerly known as Bro) and centralized endpoint logs, among others.
All it takes is joining these data sources with an asset description table and then crafting queries to group events by asset, time, count and unique source or destination IPs for any given set of behaviors or characteristics.
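As a concrete sketch of that join-and-group pattern, the snippet below builds a tiny in-memory SQLite database (the table and column names are invented for illustration; real log schemas will differ) and runs one such query:

```python
import sqlite3

con = sqlite3.connect(":memory:")
# Hypothetical schemas standing in for real log and asset tables.
con.executescript("""
CREATE TABLE events (src_ip TEXT, dst_ip TEXT, behavior TEXT, hour INTEGER);
CREATE TABLE assets (ip TEXT PRIMARY KEY, asset_type TEXT);
""")
con.executemany("INSERT INTO assets VALUES (?, ?)", [
    ("10.0.0.1", "domain_controller"),
    ("10.0.0.2", "domain_controller"),
    ("10.0.1.5", "pos_terminal"),
])
con.executemany("INSERT INTO events VALUES (?, ?, ?, ?)", [
    ("10.0.0.1", "10.9.9.9", "ldap_query", 9),
    ("10.0.0.2", "10.9.9.9", "ldap_query", 9),
    ("10.0.1.5", "203.0.113.7", "smb_transfer", 9),
])

# Join events to the asset description table, then group by asset type,
# behavior, and time bucket, counting events and distinct destinations.
rows = con.execute("""
    SELECT a.asset_type, e.behavior, e.hour,
           COUNT(*) AS event_count,
           COUNT(DISTINCT e.dst_ip) AS distinct_dsts
    FROM events e
    JOIN assets a ON a.ip = e.src_ip
    GROUP BY a.asset_type, e.behavior, e.hour
""").fetchall()
for row in rows:
    print(row)
```

Each of the three techniques that follow is a variation on this query shape, differing mainly in which log is joined and which column is grouped.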
If there’s prerequisite legwork involved, it’s that asset tagging is required in two of his three threat hunting methods. This means describing the host for each IP address. An asset management solution does make this easier, but it can also be performed manually, he noted.
A good way to get started is to identify assets, such as domain controllers, point-of-sale (POS) machines, or any other group of devices on a network that are critical to your organization. Sources for identifying the types of hosts include Active Directory, Domain Name Systems (DNS), vulnerability management solutions and even internal wikis.
His organization runs scripts that scrape some of these sources hourly. This provides a starting point for his threat hunting efforts. To that end, here are the three techniques he outlined.
1) Find outliers exhibiting a TTP.
The first of Mr. Habersetzer’s threat hunting techniques doesn’t require asset tagging. All that’s needed is a table of proxy logs showing IP source, domain and method. Using the SQL functions of COUNT, DISTINCT, and GROUP BY – and WHERE for the selected TTP – the query will filter the data set down to just the anomalies exhibiting that TTP.
In his presentation, the count of IP addresses exhibiting that TTP narrows the set. He then runs a WHERE query from that view to filter for domains never seen before. The result is a single machine exhibiting C2-style behavior that is accessing a domain none of Walmart’s other 2.5 million employees have accessed in the last six months.
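A possible shape for that query, again in SQLite (the tables, the POST-request predicate standing in for a real TTP, and the six-month domain history table are all illustrative assumptions):

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE proxy_log (src_ip TEXT, domain TEXT, method TEXT);
CREATE TABLE domain_history (domain TEXT);  -- domains seen previously
""")
con.executemany("INSERT INTO proxy_log VALUES (?, ?, ?)", [
    ("10.1.1.1", "cdn.example.com", "GET"),
    ("10.1.1.2", "cdn.example.com", "GET"),
    ("10.1.1.3", "beacon.badhost.example", "POST"),  # new domain, C2-style
])
con.executemany("INSERT INTO domain_history VALUES (?)",
                [("cdn.example.com",)])

# WHERE selects the TTP (here, a stand-in predicate on the method),
# the NOT IN subquery keeps only never-before-seen domains, and
# GROUP BY collapses the log to one row per source/domain pair.
rows = con.execute("""
    SELECT src_ip, domain, COUNT(*) AS hits
    FROM proxy_log
    WHERE method = 'POST'
      AND domain NOT IN (SELECT domain FROM domain_history)
    GROUP BY src_ip, domain
""").fetchall()
print(rows)
```

On real proxy logs the WHERE clause would encode whatever TTP is being hunted, and the history table would be built from the preceding months of traffic.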
A video of his presentation is embedded below, and this example starts at a time stamp of about 16:56.
2) Identify outlier artifacts by asset type.
His second technique involved selecting an artifact from a NetFlow log. This artifact could be protocols, IP addresses, or running files, for example. He used protocols in his presentation – and joined a tagged asset table with the log table.
Next, he selects the asset type – using the domain controller to stick with his earlier analogy – and then orders the count of distinct IP sources. The subsequent list shows 150 domain controllers using protocols including RPC, KERBEROS, LDAP and SSL – but just one or two are using SMB or DNS.
These are the anomalies, and it could be two machines or a single machine using both protocols. He doesn’t know at that moment, but that’s the point: he’s not looking for a specific TTP; instead, he’s looking for what’s unusual from a protocol perspective.
It could be a compromised machine, or it could be someone in the IT shop troubleshooting a machine. The goal here is to identify the outliers and then take a closer look. This sort of filtering is using the law of large numbers theorem to make it “hard for the adversary” to target critical infrastructure with a new protocol.
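This technique can be sketched as follows (a minimal illustration with three tagged domain controllers rather than 150; all names are invented): counting distinct sources per protocol for one asset type and sorting ascending floats the outliers to the top.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE netflow (src_ip TEXT, protocol TEXT);
CREATE TABLE assets (ip TEXT, asset_type TEXT);
""")
# Three tagged domain controllers; real environments would have many more.
con.executemany("INSERT INTO assets VALUES (?, 'domain_controller')",
                [("10.2.0.1",), ("10.2.0.2",), ("10.2.0.3",)])
con.executemany("INSERT INTO netflow VALUES (?, ?)", [
    ("10.2.0.1", "LDAP"), ("10.2.0.2", "LDAP"), ("10.2.0.3", "LDAP"),
    ("10.2.0.1", "KERBEROS"), ("10.2.0.2", "KERBEROS"), ("10.2.0.3", "KERBEROS"),
    ("10.2.0.3", "SMB"),  # only one DC is speaking SMB
])

# Join the tagged asset table to the flow log, count distinct sources
# per protocol for the chosen asset type, and sort ascending so that
# protocols used by only one or two hosts surface first.
rows = con.execute("""
    SELECT n.protocol, COUNT(DISTINCT n.src_ip) AS hosts
    FROM netflow n
    JOIN assets a ON a.ip = n.src_ip
    WHERE a.asset_type = 'domain_controller'
    GROUP BY n.protocol
    ORDER BY hosts ASC
""").fetchall()
print(rows)
```

The top of the list is the hunting queue: whichever protocol only one or two domain controllers are using gets a closer look.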
This example starts at a time stamp in his presentation of about 20:40.
3) Filter for unusual characteristics by asset type.
His final technique is to define a characteristic and then look for outliers among tagged assets by type. In the presentation, he illustrates a packet log (full packets) in a spreadsheet. He applies what he called a dummy header of “IP address exists” as a reference point for all entries. This is what provides a total count from which to filter anomalies.
Next, he chooses a traffic characteristic such as HTTP content type, file types, byte counts, or traffic over non-standard ports. There are hundreds of choices here, but the idea is to look for things “relevant to the threat landscape.”
In the example from his presentation, he turns to the domain controllers again, filtering the joined table down by distinct IP sources to find three that are exhibiting traffic characteristics the other domain controllers are not:
- HTTP direct to IP request;
- SSL self-signed certificate; and
- host header contains a port.
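One way to sketch this technique (an illustrative assumption of how the tables might look, with characteristics pre-extracted from full packets): the dummy “IP address exists” row marks every host, supplying the total count that each real characteristic is compared against.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE packets (src_ip TEXT, characteristic TEXT);
CREATE TABLE assets (ip TEXT, asset_type TEXT);
""")
con.executemany("INSERT INTO assets VALUES (?, 'domain_controller')",
                [("10.3.0.1",), ("10.3.0.2",), ("10.3.0.3",)])
# 'ip_exists' is the dummy header: every host gets one, establishing
# the reference total. The other rows flag unusual traffic traits.
con.executemany("INSERT INTO packets VALUES (?, ?)", [
    ("10.3.0.1", "ip_exists"), ("10.3.0.2", "ip_exists"), ("10.3.0.3", "ip_exists"),
    ("10.3.0.2", "http_direct_to_ip"),
    ("10.3.0.3", "ssl_self_signed"),
])

# Keep only characteristics exhibited by fewer hosts than the asset
# type's total -- i.e., traits that are not universal, hence outliers.
rows = con.execute("""
    SELECT p.characteristic, COUNT(DISTINCT p.src_ip) AS hosts
    FROM packets p
    JOIN assets a ON a.ip = p.src_ip
    WHERE a.asset_type = 'domain_controller'
    GROUP BY p.characteristic
    HAVING COUNT(DISTINCT p.src_ip) < (SELECT COUNT(DISTINCT ip) FROM assets
                                       WHERE asset_type = 'domain_controller')
""").fetchall()
print(sorted(rows))
```

Swapping the WHERE clause to a different asset type is how the wider follow-up search across other tagged groups would work.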
This particular approach is interesting, Mr. Habersetzer said, “because you find things you didn’t even know to look for.” More importantly, when you find something interesting, you run a wider search for those characteristics in larger groups of assets you’ve tagged – like Exchange servers, ATM networks, or DNS.
This example starts at a time stamp in his presentation of about 22:38.
>>> Related content: 6 Ways Modern Threat Detection Keeps the Enterprise Ahead of Cybersecurity Trends
Proof of Concept and Threat Hunting Tips
While the examples he used in his presentation were fictional for instructional purposes, he provided real examples of anomalies he’s found in his work at Walmart. These have included unusual protocols, C2 activity, a running packet utility and custom malware that wouldn’t have triggered a signature detection engine.
In closing his session, he offered threat hunters two tips to keep in mind when employing these techniques. First, threat hunters should experiment with the timing of the data set. He suggests starting with 24 hours. Sometimes shorter periods give things the appearance of an anomaly because it’s too early to see the trend. Greater lengths of time – say six months – provide longer context, but go too long and you may miss timely events.
Second, not all anomalies are malicious. Threat hunting can and does turn up odd but benign activity – misconfigured machines or an executable left running after troubleshooting. “In threat hunting, there’s always some noise,” he said.
* * *
The full session runs a little more than 40 minutes including a question and answer period: Threat Hunting Using 16th-Century Math and Sesame Street. In addition, Vernon Habersetzer can be found on Twitter and LinkedIn.
If you enjoyed this post, you might also like:
How Enhanced Network Metadata Resolution Facilitates Network Threat Hunting