Using Type Systems for Taint Analysis for Android Apps

Posted on Tue 28 April 2020 in Post

What is Taint Analysis?

Taint analysis is the process of detecting applications that collect users' sensitive information and leak it to external parties, regardless of whether the developers did so maliciously or accidentally. There are two methods for this analysis: dynamic and static. Dynamic analysis monitors an application while it is running. Because the analysis is performed alongside the application, the application runs slower. Another major shortcoming of the dynamic approach is that it only detects a leak of sensitive data (taint) if the leak actually happens during the run being analysed, which means it will miss the taint in some applications. The alternative is static taint analysis, which is what I will talk about here.
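To make the shortcoming of the dynamic approach concrete, here is a small sketch of a leak that hides behind a rarely taken branch. Everything in it is invented for illustration: `readDeviceId` stands in for a sensitive source, the `sink` list stands in for an untrusted destination such as a network request, and `rareCondition` is a hypothetical trigger. A monitor that only ever observes the common path would see nothing leave the app.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical example: all names are made up for illustration.
class BranchLeakDemo {
    // Stands in for an untrusted sink such as a network request or a log file.
    static final List<String> sink = new ArrayList<>();

    // Stands in for a sensitive source such as the device ID.
    static String readDeviceId() {
        return "device-1234";
    }

    // The sensitive value only reaches the sink when rareCondition is true.
    static void handle(boolean rareCondition) {
        String id = readDeviceId();
        if (rareCondition) {
            sink.add("https://example.com/?id=" + id);
        }
    }

    public static void main(String[] args) {
        handle(false); // the path a dynamic monitor is most likely to observe
        System.out.println("after common path: " + sink.size() + " leak(s)");
        handle(true);  // the path that actually leaks
        System.out.println("after rare path: " + sink.size() + " leak(s)");
    }
}
```

A static analysis reads both branches of `handle`, so it can report the flow from `readDeviceId` to `sink` without the rare condition ever occurring at runtime.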

Static taint analysis involves looking at the program code itself without running the program. This has several benefits over the dynamic approach: a potentially malicious app never has to be run, the analysis can consider all of the app's code (rather than only the branches that happen to execute), and the app can be run afterwards without any slowdown. Static analysis is well suited to Android apps because the .apk file they are distributed in can be decompiled into Java source code, which can then be analysed along with its libraries and classes. Wei Huang, Yao Dong, Ana Milanova and Julian Dolby, the authors of the paper referenced below, present two systems, DFlow and DroidInfer, to demonstrate a method of performing static taint analysis using a type system.

Type Systems

A type system in programming is the idea that variables and pieces of data have a type, such as integer, floating point or string, and that the type determines which operations and conversions can take place. For example, an integer can be changed into a float, and a float can be multiplied by an integer, but an integer cannot be turned into a string by simple type conversion (it requires a more substantial algorithm). Type systems are often used to detect bugs during compilation, where the compiler checks whether operations and conversions make sense. The system proposed in the paper analyses the source code of a program in a similar fashion to a compiler.
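As a small illustration of these rules in Java (the language Android apps are decompiled into), the sketch below shows the conversions mentioned above. The class and method names are made up for the example. The commented-out assignment is the kind of line a compiler rejects before the program ever runs.

```java
// Hypothetical example: class and method names are made up for illustration.
class TypeConversionDemo {
    // An int can be converted to a double implicitly (a widening conversion).
    static double widen(int n) {
        return n;
    }

    // A double can be multiplied by an int; the int is promoted to double first.
    static double scale(double factor, int n) {
        return factor * n;
    }

    // An int cannot simply be assigned to a String:
    //     String s = n;   // rejected by the compiler: incompatible types
    // Producing the text form needs a real conversion routine.
    static String render(int n) {
        return String.valueOf(n);
    }

    public static void main(String[] args) {
        System.out.println(widen(3));
        System.out.println(scale(1.5, 2));
        System.out.println(render(42));
    }
}
```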

Analysing Taint with a Type System

The technique proposed in the paper applies two types to variables in the source code: tainted and safe (some variables may receive neither type and are ignored). A tainted variable is one that contains sensitive information. A variable becomes tainted when it is assigned a value from a source that is considered sensitive, such as the GPS location or the device ID. If a variable takes a value from a tainted variable, it becomes tainted as well. For example, suppose a variable storing the device ID is typed tainted, and a second variable, a URL string, has the device ID appended to it; the URL variable is then typed tainted too. The tainted type therefore propagates forwards through the code, in the same direction as the program's execution.

A safe variable is one whose value is eventually sent to an "untrusted sink", such as a log file or an HTTP request. The name reflects the requirement that anything reaching the sink must be safe to release. When a safe variable takes its value from another variable, that other variable becomes safe as well. For example, loading a URL string for an HTTP request makes the URL string safe. If the URL string was built from several other strings and variables before the request, all of those become safe too. The safe type therefore propagates backwards, from the sink towards the variable declarations, in the opposite direction to the tainted type.

The rules of the type system state that the same variable cannot be both tainted and safe. So when the type system is run over the source code, a type error occurs wherever a value flows from a tainted variable into a safe variable: data from a sensitive source is flowing into a variable that eventually reaches an untrusted sink, and the program is flagged as tainted. A type error in the type system therefore means the program is tainted. This is how the system proposed in the paper works. The only input the user has to provide is the list of sensitive data sources and untrusted data sinks.
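The reasoning above can be sketched as a small reachability check. This is a heavily simplified illustration and not the inference algorithm from the paper: it treats the program as a list of "value flows from one variable to another" edges, spreads one label forwards from the sources and the other backwards from the sinks, and reports every variable that ends up needing both. The class `TwoLabelFlowCheck`, its methods and the variable names are all assumptions made for the example.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Simplified, hypothetical sketch; not the paper's inference algorithm.
class TwoLabelFlowCheck {
    private final Map<String, List<String>> forward = new HashMap<>();
    private final Map<String, List<String>> backward = new HashMap<>();

    // Records that a value flows from one variable into another,
    // for example through an assignment or a string append.
    void addFlow(String from, String to) {
        forward.computeIfAbsent(from, k -> new ArrayList<>()).add(to);
        backward.computeIfAbsent(to, k -> new ArrayList<>()).add(from);
    }

    // Collects every variable reachable from the start set along the given edges.
    private static Set<String> reach(Set<String> start, Map<String, List<String>> edges) {
        Set<String> seen = new HashSet<>(start);
        Deque<String> work = new ArrayDeque<>(start);
        while (!work.isEmpty()) {
            String v = work.pop();
            for (String next : edges.getOrDefault(v, List.of())) {
                if (seen.add(next)) {
                    work.push(next);
                }
            }
        }
        return seen;
    }

    // Variables that would need both labels at once, i.e. the "type errors".
    Set<String> conflicts(Set<String> sources, Set<String> sinks) {
        Set<String> fromSource = reach(sources, forward); // label spreading forwards
        Set<String> toSink = reach(sinks, backward);      // label spreading backwards
        Set<String> both = new TreeSet<>(fromSource);
        both.retainAll(toSink);
        return both;
    }

    public static void main(String[] args) {
        TwoLabelFlowCheck app = new TwoLabelFlowCheck();
        app.addFlow("deviceId", "url");    // the device ID is appended to the URL
        app.addFlow("baseUrl", "url");     // the URL also contains a harmless base string
        app.addFlow("url", "httpRequest"); // the URL is sent in an HTTP request
        System.out.println(app.conflicts(Set.of("deviceId"), Set.of("httpRequest")));
    }
}
```

An empty result corresponds to a program that type-checks. Any variable in the result lies on a path from a flagged source to a flagged sink, which is the situation the type error is meant to capture. As in the description above, the only inputs are the sets of sources and sinks.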

The Results

In the paper, the authors compared their taint analysis system against other systems that were already available. The results showed that the new system was similarly accurate when it flagged apps as tainted, but missed significantly fewer tainted data flows.

Reference

"Scalable and precise taint analysis for Android", Wei Huang, Yao Dong, Ana Milanova and Julian Dolby https://dl.acm.org/doi/10.1145/2771783.2771803