Searching for Sensitive Information in Mobile Apps

Aug 24, 2022 | Mobix news

This article is intended for those who analyze the security of applications, as well as developers interested in proper data storage and security in general. We will talk about what to pay attention to when looking for and analyzing sensitive information in an application, how to search for it, and how it can be properly stored. 

The term sensitive information (sensitive data) refers to any information that could allow an attacker to build an attack vector against the user. This can be, for example, authentication data (passwords, cookies, etc.), personal information (phone number, full name, passport or contact details), a session ID returned by the application's server side, various tokens from third-party services, etc.

Searching for Sensitive Information 

It is necessary to search for various sensitive information in a fairly large number of sources and formats. You should definitely pay attention to: 

  • Application files. There can be a lot of them. Nowadays, platforms actively use on-device storage in order to work faster.
  • Databases. This is a special case of application files, but you have to search there using a completely different approach. 
  • System Log. You need to make sure that nothing gets into it, even though only a limited number of applications can read it. 
  • Network activity. It is necessary to look at what data goes between an application and its server part and identify sensitive information, potentially interesting for attackers. 
  • Interprocess communication. You need to check how an application communicates with the outside world, what components are available externally, what goes there, and what an application sends to other applications/services. 
  • Deeplink and Applink as a special case of interprocess communication. 
  • Application code and resources. You need to look for various secrets left in decompiled / disassembled application code. 

Taken together, this amounts to a large volume of data. It comes in different formats, and each of them must be analyzed properly. Consider, for example, the plist format. You can search it in the classic way by keywords, for example, by the word key, as is usually done. With this approach, you can get a large number of hits that have nothing to do with sensitive information and drown in parsing them. In doing so, it is very easy to lose sight of something important.

But the main difficulty is not so much the initial search for the data itself as validating that it is not stored anywhere else. For example, we find out that after authentication the server sends us a Cookie value to be used further in the application, and we want to make sure that this value does not occur anywhere in the data in cleartext. At this point, you need to search all the data you have collected again for this value.

Here is a good recent example from our practice. It clearly demonstrates how this approach helps during analysis and how it can be used to identify the root cause of an issue. In one application, the value of a user-defined PIN code was stored locally inside the application sandbox as a SHA-512 hash. Although the PIN was not stored in cleartext, the hash was computed over a five-digit numeric PIN without a salt, so there are only 100,000 possible inputs. If you search Google for the value of this hash, it will be among the first results. That is, on the one hand, the value is not stored in the open, but on the other hand, the storage is not safe. The information was only formally protected.
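To illustrate why an unsalted hash of a short numeric PIN is only formally protected, here is a minimal sketch that recovers the PIN by trying every candidate. It assumes CryptoKit is available (iOS 13 / macOS 10.15 or later); the function names and the sample PIN are hypothetical and not taken from the analyzed application.

```swift
import CryptoKit
import Foundation

/// Hex-encoded SHA-512 digest of a UTF-8 string.
func sha512Hex(_ text: String) -> String {
    SHA512.hash(data: Data(text.utf8))
        .map { String(format: "%02x", $0) }
        .joined()
}

/// Tries every five-digit PIN (00000 through 99999) and returns the one whose
/// unsalted SHA-512 digest matches `targetHex`, or nil if none matches.
func recoverPin(fromUnsaltedSHA512 targetHex: String) -> String? {
    let target = targetHex.lowercased()
    for candidate in 0..<100_000 {
        let pin = String(format: "%05d", candidate)
        if sha512Hex(pin) == target {
            return pin
        }
    }
    return nil
}

// Stand-in for a hash value found in the application sandbox (hypothetical PIN).
let storedHash = sha512Hex("04821")
print(recoverPin(fromUnsaltedSHA512: storedHash) ?? "not found")
```

The whole search space is 100,000 hash computations, which takes well under a second on any modern device.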

After analyzing a number of applications that have stored sensitive user information in this way, we believe it is necessary to search not only for the original information, but also for derivatives of it. The process of searching for such data can be carried out in several stages: 

  1. After an initial analysis of all collected data, the sensitive information processed or stored in the application is determined. 
  2. For each item in this list of sensitive data, derivatives are calculated: the MD5, SHA-1, SHA-256 and SHA-512 hashes and the Base64 encoding. 
  3. After that, you should go through all the collected data again and check whether the sensitive value or any of its derivatives occurs anywhere. 

That way, if sensitive information or a derivative of it shows up somewhere, it will be possible to trace it and show where the defect came from and what caused it.
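The staged search can be sketched as follows. This is an illustration under stated assumptions, not a complete tool: it assumes CryptoKit (iOS 13 / macOS 10.15 or later), the helper names `derivatives(of:)` and `findLeaks(of:in:)` are hypothetical, and the collected data is reduced to a single string for brevity.

```swift
import CryptoKit
import Foundation

/// Lowercase hex encoding of a digest or any other byte sequence.
func hex<D: Sequence>(_ bytes: D) -> String where D.Element == UInt8 {
    bytes.map { String(format: "%02x", $0) }.joined()
}

/// The value itself plus its MD5, SHA-1, SHA-256, SHA-512 and Base64 forms.
func derivatives(of value: String) -> [String: String] {
    let data = Data(value.utf8)
    return [
        "plain": value,
        "md5": hex(Insecure.MD5.hash(data: data)),
        "sha1": hex(Insecure.SHA1.hash(data: data)),
        "sha256": hex(SHA256.hash(data: data)),
        "sha512": hex(SHA512.hash(data: data)),
        "base64": data.base64EncodedString(),
    ]
}

/// Names of the forms of `secret` that occur in `haystack`.
/// Hex digests are matched case-insensitively; plain and Base64 values exactly.
func findLeaks(of secret: String, in haystack: String) -> [String] {
    let lowered = haystack.lowercased()
    return derivatives(of: secret).compactMap { entry -> String? in
        let isHexDigest = entry.key != "plain" && entry.key != "base64"
        let found = isHexDigest ? lowered.contains(entry.value) : haystack.contains(entry.value)
        return found ? entry.key : nil
    }.sorted()
}

// Hypothetical log line that contains the SHA-1 of a session ID.
let sessionID = "abc"
let logLine = "analytics: sid=A9993E364706816ABA3E25717850C26C9CD0D89E"
print(findLeaks(of: sessionID, in: logLine))
```

In a real analysis, the same check would be run over every collected file, database dump, log and network capture for each value identified as sensitive.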

This approach will uncover more complex chains and patterns in the storage and use of data in an application and help identify interesting vulnerabilities during analysis. And this data can be further applied to business logic analysis and other vulnerability testing. 

Here is another good example of how retrieval of secondary information can help. The analyzed application used a database encrypted with the popular SQLCipher library. Since the process of creating or opening such a database has long been known, you can intercept it at runtime and look at the password value.

What to do with this information next? It’s very simple. If an application uses a password to encrypt a database, that password must be stored somewhere. Next, you need to try to find its value and its derivatives in all application data. If you are lucky, a trivial study of the source code will be enough. But in the general case, the password value may be hidden somewhere much deeper, and then the search approach described above can be very helpful. We immediately get a whole chain that fully characterizes the vulnerability and the attack vector: the application uses an encrypted database, a specific password is used for encryption, and the password is stored in the source code. This process is relevant for any information that has been identified as sensitive.

Another example is related to the search for session ID values. In several applications, when a user authenticated, the session ID value ended up in the system log via a third-party library or was stored in cached network requests in the file system. You can’t find that with a regular keyword search, but the proposed search approach makes it possible.

So don’t forget to analyze the data collected during the analysis, and do it in several passes so as not to miss anything. 

Expanding the Search 

After finding sensitive information and analyzing it and its derivatives, you can try to dig a little deeper. More precisely, you can work out how to detect the vast number of different formats and variations of session headers, as well as the different formats and types of encryption keys that may be left in the source code or application resources. They may also have arrived over the network or come from other places. All of this suggests entropy analysis of the information.

There is a repository called Gibberish Detector, which detects character sequences that do not look like regular words. It is a simple model based on a Markov chain. By learning from an arbitrary English text, the model learns how often letters follow each other. On this basis, it scores the input data by how likely its letter sequences are according to the learned frequencies.

For ease of use, there is a repository that implements the functionality of this model in the form of a Python library. All you need to do is to get the strings of interest and send them to the input of the trained model. For example, you can take all strings longer than 16 characters. At the output you will get some value. The bigger it is, the more likely it is not words, but some random sequence of characters. And these are worth paying attention to, because they often turn out to be hashed values of interesting data, or even just values of tokens or keys. 
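The idea behind such a model can be sketched in a few lines. The code below is a simplified stand-in written for illustration, not the Gibberish Detector itself or its Python wrapper: the `BigramModel` type, its alphabet, the add-one smoothing and the tiny training text are all assumptions. Its score follows the convention described above, where a bigger value means the string looks less like words.

```swift
import Foundation

/// Simplified character-bigram model: learns how often one letter follows another.
struct BigramModel {
    private static let alphabet = Array("abcdefghijklmnopqrstuvwxyz ")
    private let counts: [[Double]]

    init(trainingText: String) {
        let size = BigramModel.alphabet.count
        // Every pair starts at 1 (add-one smoothing) so unseen pairs keep a non-zero probability.
        var table = Array(repeating: Array(repeating: 1.0, count: size), count: size)
        let indices = BigramModel.normalize(trainingText)
        for (previous, next) in zip(indices, indices.dropFirst()) {
            table[previous][next] += 1
        }
        counts = table
    }

    /// Maps text to alphabet indices, dropping characters outside the alphabet.
    private static func normalize(_ text: String) -> [Int] {
        text.lowercased().compactMap { alphabet.firstIndex(of: $0) }
    }

    /// Average negative log-probability per letter transition; higher means less word-like.
    func score(_ text: String) -> Double {
        let indices = BigramModel.normalize(text)
        guard indices.count > 1 else { return 0 }
        var total = 0.0
        for (previous, next) in zip(indices, indices.dropFirst()) {
            let rowSum = counts[previous].reduce(0, +)
            total -= log(counts[previous][next] / rowSum)
        }
        return total / Double(indices.count - 1)
    }
}

// Toy training text; a real model would be trained on a large English corpus.
let trainingText = String(repeating: "the quick brown fox jumps over the lazy dog ", count: 50)
let model = BigramModel(trainingText: trainingText)
print(model.score("lazy dog"), model.score("zxqjvkw"))
```

With this toy training text, the second score comes out higher than the first, because none of the letter pairs in the second string were seen during training.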

The second option, which needs no model, is based on calculating Shannon entropy, also known as information entropy, for each text fragment longer than the same 16 characters (or any other number). The algorithm is described in this article, and an implementation can be found in a tool called TruffleHog. It lets you search source repositories for secrets down to the specific commit, showing when and by whom the value was added. If you analyze applications with source code available to you, we recommend that you consider using this tool on a regular basis.
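As a rough illustration of the entropy-based option, the sketch below computes Shannon entropy per character and keeps only long, high-entropy fragments. It is not TruffleHog's implementation: the function names, the separator set and the 3.5-bit threshold are assumptions chosen for the example.

```swift
import Foundation

/// Shannon entropy of a string, in bits per character.
func shannonEntropy(_ text: String) -> Double {
    guard !text.isEmpty else { return 0 }
    var counts: [Character: Int] = [:]
    for character in text {
        counts[character, default: 0] += 1
    }
    let length = Double(text.count)
    var entropy = 0.0
    for count in counts.values {
        let probability = Double(count) / length
        entropy -= probability * log2(probability)
    }
    return entropy
}

/// Splits text into fragments and keeps those longer than `minLength`
/// whose entropy is at least `threshold` bits per character.
func highEntropyFragments(in text: String, minLength: Int = 16, threshold: Double = 3.5) -> [String] {
    let separators: Set<Character> = ["\"", "'", "=", ":", ",", ";"]
    return text
        .split(whereSeparator: { $0.isWhitespace || separators.contains($0) })
        .map { String($0) }
        .filter { $0.count > minLength && shannonEntropy($0) >= threshold }
}

// Hypothetical configuration line with a random-looking token.
print(highEntropyFragments(in: "token=kJ8mQ2xL9vB4nR7tY1wZ5cF3 pad=aaaaaaaaaaaaaaaaaaaa"))
```

Long natural-language identifiers can also exceed the threshold, so the threshold usually has to be tuned per character set and the results reviewed by hand.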

Such findings are quite often false positives, and they have to be additionally filtered and processed. But sometimes you can find really important things there that you might miss during the initial analysis.

How to Store 

There are different options for storing sensitive information with encryption, such as generating and storing keys in the Secure Enclave on iOS or the Android Keystore on Android. In this article, we will look at how this can be implemented using a key stretching algorithm, that is, obtaining a strong encryption key from information provided by the user (e.g., a password), with examples for iOS.

While analyzing how a key is created from a password and how user data is encrypted with it, we will also go over the terms used in cryptography. But before we get into the code and encryption, let us emphasize that it is not a good idea to copy what is written here directly. Rather, this material lets you structure the information and find a starting point for your own secure implementation.

Creating a Key 

A very common mistake with any encryption is to use a password as a key. What if the user chooses a very simple or predictable password? How can we make sure that the key used for encryption is random and strong enough, i.e., has sufficient entropy, while the user still only has to remember a password and enter it every time they log in to the app or device?

The solution is to use key stretching algorithms. They let you obtain an encryption key from a fairly simple password by applying a hash function to it many times along with a salt. The salt is a sequence of random data. A common mistake is to leave the salt out of the algorithm. The salt makes precomputed attacks impractical; without it, it is much easier to recover the key. Moreover, without the salt, two identical passwords will have the same hash value and, consequently, the same final encryption key.

Another mistake is to use a predictable random number generator when generating salt. An example is the rand() function in C, which can be accessed from Swift or Objective-C. The result of this function can be very predictable. To create a sufficiently random salt, it is recommended to use the SecRandomCopyBytes function to generate a cryptographically secure sequence of random numbers. 

To use the code from the following examples, add this line to the bridging header:

#import <CommonCrypto/CommonCrypto.h>

Below is the code that creates the salt:

var salt = Data(count: 8)
let saltLength = salt.count  // read before the closure; salt must not be accessed inside it
salt.withUnsafeMutableBytes { (saltBuffer: UnsafeMutableRawBufferPointer) -> Void in
    let saltStatus = SecRandomCopyBytes(kSecRandomDefault, saltLength, saltBuffer.baseAddress!)
    //...
}

From here on in the text, we will add to this code bit by bit, bringing it to a complete form.

PBKDF2 

Now let’s proceed with the key strengthening procedure. To do this, we will use the Password-Based Key Derivation Function (PBKDF2): 

  • PBKDF2 applies the strengthening function over many iterations to obtain the key. About ten thousand iterations used to be typical; current guidance, such as the OWASP Password Storage Cheat Sheet, recommends considerably more. 
  • Increasing the number of iterations increases the time required for a successful brute force attack. 
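
// `password` is the String entered by the user.
var setupSuccess = true
var key = Data(repeating: 0, count: kCCKeySizeAES256)
var salt = Data(count: 8)
// Lengths are read up front: a Data value must not be accessed
// inside its own withUnsafeMutableBytes closure.
let keyLength = key.count
let saltLength = salt.count

salt.withUnsafeMutableBytes { (saltBuffer: UnsafeMutableRawBufferPointer) -> Void in
    let saltStatus = SecRandomCopyBytes(kSecRandomDefault, saltLength, saltBuffer.baseAddress!)
    if saltStatus == errSecSuccess
    {
        let passwordData = password.data(using: String.Encoding.utf8)!
        let saltBytes = saltBuffer.bindMemory(to: UInt8.self).baseAddress!
        key.withUnsafeMutableBytes { (keyBuffer: UnsafeMutableRawBufferPointer) -> Void in
            let keyBytes = keyBuffer.bindMemory(to: UInt8.self).baseAddress
            let derivationStatus = CCKeyDerivationPBKDF(CCPBKDFAlgorithm(kCCPBKDF2),
                                                        password,
                                                        passwordData.count,
                                                        saltBytes,
                                                        saltLength,
                                                        CCPseudoRandomAlgorithm(kCCPRFHmacAlgSHA512),
                                                        14271,
                                                        keyBytes,
                                                        keyLength)
            if derivationStatus != Int32(kCCSuccess)
            {
                setupSuccess = false
            }
        }
    }
    else
    {
        setupSuccess = false
    }
}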
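The role of the salt can be checked with a small standalone sketch. It wraps CCKeyDerivationPBKDF in a hypothetical helper named `deriveKey(password:salt:rounds:)` and uses two fixed salts purely for demonstration; in real code, the salt should come from SecRandomCopyBytes.

```swift
import CommonCrypto
import Foundation

/// Derives a 32-byte key from a password with PBKDF2-HMAC-SHA512.
/// Returns nil if CommonCrypto reports an error.
func deriveKey(password: String, salt: Data, rounds: UInt32 = 14271) -> Data? {
    // UTF-8 bytes of the password without the trailing NUL terminator.
    let passwordBytes = Array(password.utf8CString.dropLast())
    let saltBytes = [UInt8](salt)
    let keyLength = kCCKeySizeAES256
    var key = [UInt8](repeating: 0, count: keyLength)
    let status = CCKeyDerivationPBKDF(CCPBKDFAlgorithm(kCCPBKDF2),
                                      passwordBytes, passwordBytes.count,
                                      saltBytes, saltBytes.count,
                                      CCPseudoRandomAlgorithm(kCCPRFHmacAlgSHA512),
                                      rounds,
                                      &key, keyLength)
    return status == Int32(kCCSuccess) ? Data(key) : nil
}

// Fixed demo salts, for illustration only.
let demoSaltA = Data([0, 1, 2, 3, 4, 5, 6, 7])
let demoSaltB = Data([7, 6, 5, 4, 3, 2, 1, 0])

let keyA = deriveKey(password: "123456", salt: demoSaltA)
let keyAAgain = deriveKey(password: "123456", salt: demoSaltA)
let keyB = deriveKey(password: "123456", salt: demoSaltB)

print("same password, same salt, same key:", keyA == keyAAgain)
print("same password, other salt, same key:", keyA == keyB)
```

The same password and salt always reproduce the same key, which is what makes decryption possible later, while a different salt yields an unrelated key for the same password.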

Modes and Initialization Vector 

Block encryption algorithms work with blocks of a fixed length. If the message we want to encrypt is longer than one block, it is simply divided into parts. Because of this, an improperly configured algorithm can leak information. For example, if two blocks contain the same plaintext, they produce the same ciphertext. How the blocks are processed and linked to each other is determined by the mode of operation:

  • Electronic Code Book – ECB 
  • Cipher Block Chaining – CBC 
  • Cipher Feedback – CFB 
  • Output Feedback – OFB 
  • Counter Mode – CM, CTR 

ECB mode is the simplest option, where all blocks are encrypted independently of each other. 

 

[Figure: ECB mode encryption]

This is exactly the case where blocks do not depend on each other and are encrypted separately. Many libraries apply this mode by default: if we simply specify AES when configuring encryption without any additional parameters, this is the option that will be used (Java's Cipher.getInstance("AES") typically behaves this way, whereas CommonCrypto's CCCrypt uses CBC unless kCCOptionECBMode is set). For this reason, it is strongly recommended to specify the block chaining mode explicitly.

CBC mode is a mode of symmetric encryption that uses a feedback mechanism. This means that during encryption all blocks are linked and depend on each other, so identical plaintext blocks no longer produce identical ciphertext blocks.

 

[Figure: CBC mode encryption]

This is the mode that is most often recommended. We will not consider the other modes, as they differ from each other only in the way blocks are chained, which does not matter for our purposes. The main thing is to understand the difference between ECB and the other modes.

But there is one more problem: the first block remains the same in either mode. If the message to be encrypted starts the same as the other message, the initial encrypted text (the first block) will be the same in both cases. This will let an attacker know that the text in these blocks is the same.  

In order to avoid such problems, the concept of initialization vector (IV) is introduced.  

An initialization vector (IV) is a random value that is used together with the key to encrypt data. Using a fresh IV for every message prevents identical first blocks from producing identical ciphertext.
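The difference between the modes and the effect of the IV can be observed directly. The sketch below is a demonstration under assumptions, not production code: `aesEncrypt` is a hypothetical wrapper around CCCrypt, and the key and IVs are fixed values chosen only so that the comparison is reproducible.

```swift
import CommonCrypto
import Foundation

/// Encrypts with AES and PKCS7 padding via CCCrypt. CBC is used unless `ecb` is true.
/// `iv` must be 16 bytes long; it is ignored in ECB mode.
func aesEncrypt(_ plaintext: [UInt8], key: [UInt8], iv: [UInt8], ecb: Bool) -> [UInt8]? {
    var options = CCOptions(kCCOptionPKCS7Padding)
    if ecb {
        options |= CCOptions(kCCOptionECBMode)
    }
    let capacity = plaintext.count + kCCBlockSizeAES128
    var output = [UInt8](repeating: 0, count: capacity)
    var bytesWritten = 0
    let status = CCCrypt(CCOperation(kCCEncrypt), CCAlgorithm(kCCAlgorithmAES), options,
                         key, key.count, iv,
                         plaintext, plaintext.count,
                         &output, capacity, &bytesWritten)
    guard status == Int32(kCCSuccess) else { return nil }
    return Array(output.prefix(bytesWritten))
}

let demoKey = [UInt8](repeating: 0x2a, count: kCCKeySizeAES256)   // demo key only
let zeroIV = [UInt8](repeating: 0x00, count: kCCBlockSizeAES128)
let otherIV = [UInt8](repeating: 0x01, count: kCCBlockSizeAES128)
let message = [UInt8](repeating: 0x41, count: 32)                 // two identical 16-byte blocks

if let ecb = aesEncrypt(message, key: demoKey, iv: zeroIV, ecb: true),
   let cbc = aesEncrypt(message, key: demoKey, iv: zeroIV, ecb: false),
   let cbcOtherIV = aesEncrypt(message, key: demoKey, iv: otherIV, ecb: false) {
    print("ECB: blocks 1 and 2 equal:", ecb[0..<16] == ecb[16..<32])
    print("CBC: blocks 1 and 2 equal:", cbc[0..<16] == cbc[16..<32])
    print("CBC: first block equal for both IVs:", cbc[0..<16] == cbcOtherIV[0..<16])
}
```

For this input, the ECB comparison is expected to report equal blocks, while both CBC comparisons are expected to report a difference.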

It is recommended to use the SecRandomCopyBytes function to generate the initialization vector: 

var iv = Data(count: kCCBlockSizeAES128)
iv.withUnsafeMutableBytes { (ivBuffer: UnsafeMutableRawBufferPointer) -> Void in
    let ivStatus = SecRandomCopyBytes(kSecRandomDefault, kCCBlockSizeAES128, ivBuffer.baseAddress!)
    if ivStatus != errSecSuccess
    {
        setupSuccess = false
    }
}

Padding 

Block encryption algorithms work with plaintext messages whose length must be a multiple of the length of one block. If this condition is not met, then the necessary number of bits, called padding, are added to the message. 

This parameter specifies how the last, shorter block should be padded. There are various options, but PKCS7 is the preferred method of padding the plaintext block. It sets the value of each appended byte to the number of bytes appended. For example, a block of 12 bytes is padded with four bytes [04, 04, 04, 04] to the standard block size of 16 bytes. A block of 15 bytes is padded with one byte [01]. If the last block is exactly 16 bytes, a whole new block of sixteen bytes, each with the value 16 (0x10), is added.
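The padding rule is simple enough to write out by hand. The following sketch is for illustration only, since CCCrypt applies PKCS7 padding itself when kCCOptionPKCS7Padding is set; the function names are hypothetical, and the block size is assumed to be at most 255 bytes.

```swift
import Foundation

/// Appends PKCS7 padding: N bytes, each with the value N, where N is 1...blockSize.
func pkcs7Pad(_ data: [UInt8], blockSize: Int = 16) -> [UInt8] {
    let padLength = blockSize - data.count % blockSize   // a full extra block when already aligned
    return data + [UInt8](repeating: UInt8(padLength), count: padLength)
}

/// Removes PKCS7 padding, or returns nil if the padding is malformed.
func pkcs7Unpad(_ data: [UInt8], blockSize: Int = 16) -> [UInt8]? {
    guard let last = data.last, data.count % blockSize == 0 else { return nil }
    let padLength = Int(last)
    guard (1...blockSize).contains(padLength), padLength <= data.count else { return nil }
    guard data.suffix(padLength).allSatisfy({ $0 == last }) else { return nil }
    return Array(data.dropLast(padLength))
}

let twelveBytes = [UInt8](repeating: 0xAA, count: 12)
print(pkcs7Pad(twelveBytes).suffix(4))   // the four padding bytes
```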

Note that some authors recommend not using padding at all because of the Padding Oracle Attack. So, it makes sense to consider whether to use it or not. 

Encryption and Decryption 

So, it’s time to tie everything together and perform key strengthening, encryption and decryption. Since we use a key strengthening algorithm, we don’t need to store the key anywhere. Every time we need it, we generate it from the user's data. For example, you can save a randomly generated value to the Keychain (necessarily with the correct access control attributes), protecting it with biometrics. The value will then be accessible only after the user has passed the biometric check. The resulting value should be passed as input to the PBKDF2 function to generate the key. As a result, the user will not need to enter a password/PIN every time; it will be enough to provide a fingerprint or face. This scheme, of course, has its drawbacks, but it is quite good. Alternatively, you could use the Secure Enclave in the same way.
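Storing such a value in the Keychain behind a biometric check could look roughly like the sketch below. The service name, the function names, and the choice of the `.biometryCurrentSet` flag with `kSecAttrAccessibleWhenPasscodeSetThisDeviceOnly` are assumptions made for this example; the right attributes depend on the application's threat model.

```swift
import Foundation
import Security

// Hypothetical service identifier for this example.
let keychainService = "com.example.app.encryption"

/// Stores `secret` so that reading it back requires a biometric check.
func storeBiometricProtectedSecret(_ secret: Data, account: String) -> OSStatus {
    var error: Unmanaged<CFError>?
    guard let accessControl = SecAccessControlCreateWithFlags(
        kCFAllocatorDefault,
        kSecAttrAccessibleWhenPasscodeSetThisDeviceOnly,
        .biometryCurrentSet,
        &error) else {
        return errSecParam
    }
    let query: [String: Any] = [
        kSecClass as String: kSecClassGenericPassword,
        kSecAttrService as String: keychainService,
        kSecAttrAccount as String: account,
        kSecValueData as String: secret,
        kSecAttrAccessControl as String: accessControl,
    ]
    return SecItemAdd(query as CFDictionary, nil)
}

/// Reads the secret back; the system asks for biometric confirmation first.
/// Returns nil if the item is missing or the check fails.
func loadBiometricProtectedSecret(account: String) -> Data? {
    let query: [String: Any] = [
        kSecClass as String: kSecClassGenericPassword,
        kSecAttrService as String: keychainService,
        kSecAttrAccount as String: account,
        kSecReturnData as String: true,
        kSecMatchLimit as String: kSecMatchLimitOne,
    ]
    var item: CFTypeRef?
    guard SecItemCopyMatching(query as CFDictionary, &item) == errSecSuccess else {
        return nil
    }
    return item as? Data
}
```

The data returned by `loadBiometricProtectedSecret` would then serve as the PBKDF2 input in place of a typed password.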

For encryption and decryption we use the CCCrypt function with kCCEncrypt or kCCDecrypt. Since a block cipher is used, the message must be padded if its length is not a multiple of the block size. The kCCOptionPKCS7Padding option sets the padding type to PKCS7:

Encrypt 

class func encryptData(_ clearTextData: Data, withPassword password: String) -> Dictionary<String, Data>
{
    var setupSuccess = true
    var outDictionary = Dictionary<String, Data>()
    var key = Data(repeating: 0, count: kCCKeySizeAES256)
    var salt = Data(count: 8)
    // Lengths are read up front: a Data value must not be accessed
    // inside its own withUnsafeMutableBytes closure.
    let keyLength = key.count
    let saltLength = salt.count

    // Generate a random salt and derive the key from the password with PBKDF2.
    salt.withUnsafeMutableBytes { (saltBuffer: UnsafeMutableRawBufferPointer) -> Void in
        let saltStatus = SecRandomCopyBytes(kSecRandomDefault, saltLength, saltBuffer.baseAddress!)
        if saltStatus == errSecSuccess
        {
            let passwordData = password.data(using: String.Encoding.utf8)!
            let saltBytes = saltBuffer.bindMemory(to: UInt8.self).baseAddress!
            key.withUnsafeMutableBytes { (keyBuffer: UnsafeMutableRawBufferPointer) -> Void in
                let keyBytes = keyBuffer.bindMemory(to: UInt8.self).baseAddress
                let derivationStatus = CCKeyDerivationPBKDF(CCPBKDFAlgorithm(kCCPBKDF2), password, passwordData.count, saltBytes, saltLength, CCPseudoRandomAlgorithm(kCCPRFHmacAlgSHA512), 14271, keyBytes, keyLength)
                if derivationStatus != Int32(kCCSuccess)
                {
                    setupSuccess = false
                }
            }
        }
        else
        {
            setupSuccess = false
        }
    }

    // Generate a random IV that is one AES block long.
    var iv = Data(count: kCCBlockSizeAES128)
    iv.withUnsafeMutableBytes { (ivBuffer: UnsafeMutableRawBufferPointer) -> Void in
        let ivStatus = SecRandomCopyBytes(kSecRandomDefault, kCCBlockSizeAES128, ivBuffer.baseAddress!)
        if ivStatus != errSecSuccess
        {
            setupSuccess = false
        }
    }

    if setupSuccess
    {
        var numberOfBytesEncrypted: size_t = 0
        let size = clearTextData.count + kCCBlockSizeAES128
        var encrypted = Data(count: size)
        let cryptStatus = iv.withUnsafeBytes { (ivBytes: UnsafeRawBufferPointer) -> CCCryptorStatus in
            encrypted.withUnsafeMutableBytes { (encryptedBytes: UnsafeMutableRawBufferPointer) -> CCCryptorStatus in
                clearTextData.withUnsafeBytes { (clearTextBytes: UnsafeRawBufferPointer) -> CCCryptorStatus in
                    key.withUnsafeBytes { (keyBytes: UnsafeRawBufferPointer) -> CCCryptorStatus in
                        // CCCrypt uses CBC unless kCCOptionECBMode is set. kCCModeCBC must not be
                        // added here: it has the same numeric value as kCCOptionECBMode.
                        CCCrypt(CCOperation(kCCEncrypt),
                                CCAlgorithm(kCCAlgorithmAES),
                                CCOptions(kCCOptionPKCS7Padding),
                                keyBytes.baseAddress,
                                keyLength,
                                ivBytes.baseAddress,
                                clearTextBytes.baseAddress,
                                clearTextData.count,
                                encryptedBytes.baseAddress,
                                size,
                                &numberOfBytesEncrypted)
                    }
                }
            }
        }
        if cryptStatus == Int32(kCCSuccess)
        {
            encrypted.count = numberOfBytesEncrypted
            outDictionary["EncryptionData"] = encrypted
            outDictionary["EncryptionIV"] = iv
            outDictionary["EncryptionSalt"] = salt
        }
    }

    return outDictionary
}

And the decryption function: 

Decrypt 

class func decryp(fromDictionary dictionary: Dictionary<String, Data>, withPassword password: String) -> Data
{
    // Return empty data instead of crashing when one of the pieces is missing.
    guard let encrypted = dictionary["EncryptionData"],
          let iv = dictionary["EncryptionIV"],
          let salt = dictionary["EncryptionSalt"]
    else
    {
        return Data()
    }

    var setupSuccess = true
    var key = Data(repeating: 0, count: kCCKeySizeAES256)
    let keyLength = key.count

    // Re-derive the same key from the password and the stored salt.
    salt.withUnsafeBytes { (saltBuffer: UnsafeRawBufferPointer) -> Void in
        let passwordData = password.data(using: String.Encoding.utf8)!
        let saltBytes = saltBuffer.bindMemory(to: UInt8.self).baseAddress
        key.withUnsafeMutableBytes { (keyBuffer: UnsafeMutableRawBufferPointer) -> Void in
            let keyBytes = keyBuffer.bindMemory(to: UInt8.self).baseAddress
            let derivationStatus = CCKeyDerivationPBKDF(CCPBKDFAlgorithm(kCCPBKDF2), password, passwordData.count, saltBytes, salt.count, CCPseudoRandomAlgorithm(kCCPRFHmacAlgSHA512), 14271, keyBytes, keyLength)
            if derivationStatus != Int32(kCCSuccess)
            {
                setupSuccess = false
            }
        }
    }

    var decryptSuccess = false
    let size = encrypted.count + kCCBlockSizeAES128
    var clearTextData = Data(count: size)
    if setupSuccess
    {
        var numberOfBytesDecrypted: size_t = 0
        let cryptStatus = iv.withUnsafeBytes { (ivBytes: UnsafeRawBufferPointer) -> CCCryptorStatus in
            clearTextData.withUnsafeMutableBytes { (clearTextBytes: UnsafeMutableRawBufferPointer) -> CCCryptorStatus in
                encrypted.withUnsafeBytes { (encryptedBytes: UnsafeRawBufferPointer) -> CCCryptorStatus in
                    key.withUnsafeBytes { (keyBytes: UnsafeRawBufferPointer) -> CCCryptorStatus in
                        // Same options as in encryptData: PKCS7 padding, CBC by default.
                        CCCrypt(CCOperation(kCCDecrypt),
                                CCAlgorithm(kCCAlgorithmAES),
                                CCOptions(kCCOptionPKCS7Padding),
                                keyBytes.baseAddress,
                                keyLength,
                                ivBytes.baseAddress,
                                encryptedBytes.baseAddress,
                                encrypted.count,
                                clearTextBytes.baseAddress,
                                size,
                                &numberOfBytesDecrypted)
                    }
                }
            }
        }
        if cryptStatus == Int32(kCCSuccess)
        {
            clearTextData.count = numberOfBytesDecrypted
            decryptSuccess = true
        }
    }

    return decryptSuccess ? clearTextData : Data()
}

 

To verify that these functions work and that the encryption/decryption is correct, you can use a simple example: 

Example 

class func encryptionTest()
{
    let clearTextData = "some clear text to encrypt".data(using: String.Encoding.utf8)!
    let dictionary = encryptData(clearTextData, withPassword: "123456")
    let decrypted = decryp(fromDictionary: dictionary, withPassword: "123456")
    let decryptedString = String(data: decrypted, encoding: String.Encoding.utf8)
    print("decrypted cleartext result - ", decryptedString ?? "Error: Could not convert data to string")
}

In this example, we package all the necessary information and return it as a dictionary, so that all the pieces can later be used to successfully decrypt the data. This requires storing IV and salt either in Keychain or on a server. 

Conclusion 

Data stored and processed in a mobile app should be treated with great care and attention. Apps run on the user’s device, that is, in an adverse environment. Besides, a mobile app can be treated as another version of the frontend. We do not store a user’s password in the browser’s Local Storage (at least, we should not). So why would we allow ourselves to do it in a mobile app?

Unfortunately, problems with the storage of sensitive information are still at the top of the list in terms of prevalence. In practice, we encounter new cases almost every day.  With this article, we would like to help developers and security analysts understand how to look for such problems, what to look for, and most importantly, how to try to fix it and do it right.