Machine Learning for Cyber Security
A research project in which I developed a new supervised learning algorithm for detecting cybersecurity threats
Problem
Advanced persistent threats (APTs) are stealthy cyber attacks that pose a security risk to many sectors, including national defense, finance, manufacturing, and energy. These attacks are hard to detect because they use customized incursion techniques against specific targets, can remain in a system for a very long time, and are methodically designed to bypass conventional security measures. Dynamic Information Flow Tracking (DIFT) is a widely used method for detecting APTs, but it requires substantial memory and computation resources to be effective. The question this project addressed was therefore: how can DIFT be made more resource-efficient without trading off accuracy? Concretely, we wanted to model a DIFT-based defense mechanism against APTs that:
- Captures the trade-off between detection accuracy and resource efficiency.
- Accounts for the rate of false negatives.
Approach
- Model the interaction between DIFT and the APT as a stochastic non-zero-sum game (from game theory) with imperfect and incomplete information, and formulate its solution as a Nash equilibrium.
- Use input convex neural networks (ICNNs) to approximate the Nash equilibrium of the game in the presence of uncertainty (false negatives).
Game
The modeled game has two players, the attacker (A) and the defender (D).
- A’s goal is to evade detection and reach its target. Therefore, A gets rewarded for reaching its target and penalized for getting detected or dropping out of the system.
- D’s goal is to dynamically place traps on A’s path in a way that maximizes the probability of detection and minimizes system overhead. Therefore, D gets rewarded for detecting A or if A drops out and gets penalized if A is able to reach its target without detection. D also incurs a constant resource cost for using system resources.
The game takes place on an information flow graph (IFG), whose nodes represent system entities such as processes and files and whose edges represent information flows between them.
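To make the setup concrete, here is a minimal sketch of how the game state and per-play payoffs could be represented. The graph, node names, and reward/cost values are illustrative placeholders, not the parameters used in the actual study.

```python
# Toy information flow graph: node -> reachable neighbor nodes
IFG = {
    "entry": ["proc_1", "proc_2"],
    "proc_1": ["file_a", "target"],
    "proc_2": ["file_a"],
    "file_a": ["target"],
    "target": [],
}

# Possible outcomes of one play of the game
DETECTED, REACHED_TARGET, DROPPED_OUT = "detected", "reached_target", "dropped_out"

def attacker_payoff(outcome, reward=10.0, penalty=-10.0):
    """A is rewarded for reaching its target, penalized if it is detected
    or drops out of the system. Values are illustrative."""
    return reward if outcome == REACHED_TARGET else penalty

def defender_payoff(outcome, num_traps, reward=10.0, penalty=-10.0, trap_cost=0.5):
    """D is rewarded for detection or attacker drop-out, penalized if A
    reaches the target undetected, and always pays a resource cost for
    the traps it places."""
    base = reward if outcome in (DETECTED, DROPPED_OUT) else penalty
    return base - trap_cost * num_traps
```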
Solution Concept
We proved that the above game:
- terminates in a finite number of steps
- has a Nash equilibrium
- allows the transition probabilities to be computed with complexity linear in the number of stages and edges
The key challenge is to find the Nash equilibrium (NE) of this game when the transition probabilities are unknown. They are unknown because the rate of false negatives in the DIFT system is unknown. Without fully known transition probabilities, the players' utility (reward) functions cannot be computed exactly, which makes solving for the NE directly difficult. We therefore use supervised learning to approximate the NE.
Supervised Learning
The intuition behind using supervised learning is to train two neural networks, one for A and one for D, that predict each player's utility (reward). Training uses an alternating optimization method: each player's strategy is updated by fixing the opponent's strategy and maximizing that player's utility function. Once the two neural networks converge, the resulting utilities of each player are the utilities at the approximate NE.
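Below is a rough sketch of what such an alternating training loop could look like in PyTorch. The simulator, strategy representation, network sizes, and learning rates are all hypothetical stand-ins; in the real pipeline the training data comes from simulated plays of the DIFT-vs-APT game.

```python
import torch
import torch.nn as nn

STRATEGY_DIM = 8  # illustrative size of a flattened strategy vector

def simulate_utilities(strategy_a, strategy_d):
    """Placeholder for the game simulator: returns empirical utilities for
    (A, D) given both strategies. Purely illustrative dynamics."""
    u_a = -(strategy_a - strategy_d).pow(2).sum(dim=-1, keepdim=True)
    u_d = -0.5 * u_a - 0.1  # the real game is non-zero-sum
    return u_a, u_d

def make_utility_net():
    return nn.Sequential(nn.Linear(2 * STRATEGY_DIM, 64), nn.ReLU(), nn.Linear(64, 1))

net_a, net_d = make_utility_net(), make_utility_net()   # utility predictors
strategy_a = torch.rand(1, STRATEGY_DIM, requires_grad=True)
strategy_d = torch.rand(1, STRATEGY_DIM, requires_grad=True)

opt_nets = torch.optim.Adam(list(net_a.parameters()) + list(net_d.parameters()), lr=1e-3)
opt_a = torch.optim.Adam([strategy_a], lr=1e-2)
opt_d = torch.optim.Adam([strategy_d], lr=1e-2)

for step in range(500):
    # 1) Supervised step: fit both networks to the simulated utilities.
    u_a, u_d = simulate_utilities(strategy_a.detach(), strategy_d.detach())
    x = torch.cat([strategy_a.detach(), strategy_d.detach()], dim=-1)
    loss = nn.functional.mse_loss(net_a(x), u_a) + nn.functional.mse_loss(net_d(x), u_d)
    opt_nets.zero_grad()
    loss.backward()
    opt_nets.step()

    # 2) Update A's strategy: fix D and maximize A's predicted utility.
    pred_a = net_a(torch.cat([strategy_a, strategy_d.detach()], dim=-1))
    opt_a.zero_grad()
    (-pred_a.mean()).backward()
    opt_a.step()

    # 3) Update D's strategy: fix A and maximize D's predicted utility.
    pred_d = net_d(torch.cat([strategy_a.detach(), strategy_d], dim=-1))
    opt_d.zero_grad()
    (-pred_d.mean()).backward()
    opt_d.step()
```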
Why use input convex neural networks?
To solve for the NE, we use a specific type of neural network known as a partially input convex neural network (PICNN). PICNNs constrain their parameters so that the output of the network is a convex function of a subset of the inputs. We use them because the utility functions of A and D are non-concave in the transition probabilities, so a standard neural network may converge only to a local NE. To avoid this, we use PICNNs to obtain a convex relaxation of the utility functions.
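As a rough illustration of the convexity constraint, here is a simplified two-layer network in the spirit of a PICNN, written in PyTorch. This is not the architecture used in the paper; the layer sizes, parameterization, and variable names are assumptions. It only shows the ingredients that make the output convex in the chosen inputs: affine first-layer terms, convex non-decreasing activations, and non-negative weights on the convex-path activations.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class SimplePICNN(nn.Module):
    """Toy partially input convex network: the output is convex in
    `x_convex` (e.g., the unknown transition probabilities) but may depend
    arbitrarily on `x_free` (e.g., the player strategies)."""

    def __init__(self, convex_dim, free_dim, hidden=32):
        super().__init__()
        self.u1 = nn.Linear(free_dim, hidden)    # unconstrained path over the free inputs
        self.y1 = nn.Linear(convex_dim, hidden)  # affine in x_convex (no constraint needed here)
        # Weights applied to the convex-path activations must stay non-negative;
        # a softplus re-parameterization enforces this during training.
        self.wz2_raw = nn.Parameter(0.1 * torch.randn(1, hidden))
        self.y2 = nn.Linear(convex_dim, 1)       # affine skip term, always convexity-preserving

    def forward(self, x_convex, x_free):
        u = torch.relu(self.u1(x_free))          # context features from the free inputs
        z1 = torch.relu(self.y1(x_convex) + u)   # convex in x_convex for fixed x_free
        return F.linear(z1, F.softplus(self.wz2_raw)) + self.y2(x_convex)
```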
Case Studies
To demonstrate the performance of the algorithm, we conducted two simulation studies: (i) a randomly generated information flow graph (IFG) and (ii) an IFG constructed from ScreenGrab attack data obtained with the Refinable Attack Investigation System (RAIN).
We conducted a sensitivity analysis to validate that the converged strategies correspond to a Nash equilibrium of the game: we perturbed each player's strategy while keeping the other player's strategy fixed and checked whether the perturbation could yield that player a higher payoff.
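Conceptually, the check looks something like the sketch below. The function names, perturbation scale, and tolerance are illustrative, and a real implementation would also project the perturbed strategies back onto valid probability distributions.

```python
import numpy as np

def is_approximate_equilibrium(utility_a, utility_d, strat_a, strat_d,
                               num_perturbations=100, scale=0.05, tol=1e-3):
    """Perturb one player's strategy at a time, holding the other fixed, and
    check that neither player can unilaterally improve their payoff by more
    than `tol`. `utility_a` / `utility_d` are assumed to be callables that
    return each player's payoff for a strategy pair; names are illustrative."""
    base_a = utility_a(strat_a, strat_d)
    base_d = utility_d(strat_a, strat_d)
    rng = np.random.default_rng(0)
    for _ in range(num_perturbations):
        # NOTE: a real implementation would re-normalize so the perturbed
        # strategies remain valid probability distributions.
        pert_a = strat_a + scale * rng.standard_normal(strat_a.shape)
        pert_d = strat_d + scale * rng.standard_normal(strat_d.shape)
        if utility_a(pert_a, strat_d) > base_a + tol:  # A improved unilaterally
            return False
        if utility_d(strat_a, pert_d) > base_d + tol:  # D improved unilaterally
            return False
    return True
```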
Results
Case 1: Random Graph
Case 2: ScreenGrab Attack
The sensitivity analysis shows that, for most random perturbations, the payoff obtained from the perturbed strategy is less than the payoff of the strategy returned by the neural network. In the random-graph experiment, only two perturbations resulted in a higher payoff for A and none for D; in the ScreenGrab experiment, no perturbation resulted in a higher payoff for either player. If the PICNNs have indeed converged to an NE, then no player can unilaterally obtain a higher payoff while the opponent's strategy remains unchanged. These results therefore support the claim that the proposed approach learns an approximate equilibrium, since for most perturbations the utility from the perturbed strategies is less than or equal to the utility learned by the PICNN (the red line in the plots).
Future Work
Some avenues for future work include:
- Characterizing the convex approximation factor for the payoff functions of both players, to provide a measure for how close the PICNN NE is to the true NE.
- Analyzing the trade-off between obtaining a good convex approximation and the accuracy of the partially input convex neural networks; this will require further experimentation to determine the most effective approach.
- Investigating whether the supervised learning approach can be applied to other classes of games with unknown transition probabilities.
Further Reading
This work was published in IEEE CDC 2019 proceedings and can be found here.