Microsoft Excel Workbook Encryption
24 Nov 2021 · Tags: MicrosoftOffice, encryption, security, PowerShell
Summary
I recently found myself considering how a non-technical user of Microsoft Windows in a business environment might go about encrypting text within a file. Since Microsoft Office has a rather ubiquitous presence in the workplace I decided to look into what it had to offer in terms of producing encrypted files. For the purpose of this post I’m focusing specifically on Microsoft Excel workbook files.
XLS Workbooks
Microsoft Office 2003 was the last version of Microsoft Office to favour file formats that were based on the OLE Compound File Binary Format. The file extension for Excel workbooks under this format was XLS.
At the time of writing (November 2021) it’s still possible to save a workbook as an XLS; indeed, Excel 2021 and Excel for Microsoft 365 (Version 2109, Build 16.0.14430.20314) provide the option to save a workbook using two variations of this format:
- Excel 97-2003 Workbook (*.xls)
- Microsoft Excel 5.0/95 Workbook (*.xls)
Saving a workbook using one of the two variations above will produce a file that’s faithful to the respective implementation of the format, including adherence to its password protection specification. So in the case of Excel 97-2003 Workbook (*.xls), protection is achieved using RC4 encryption and MD5 hashing, whereas Microsoft Excel 5.0/95 Workbook (*.xls) uses XOR obfuscation. For further details see Microsoft Office encryption evolution: from Office 97 to Office 2019.
To demonstrate these differences I created two password protected XLS files in Excel 2021:
- Excel_97-2003_Workbook.xls, saved using the file format Excel 97-2003 Workbook (*.xls).
- Microsoft_Excel_5.0_95_Workbook.xls, saved using the file format Microsoft Excel 5.0/95 Workbook (*.xls).
Within each file I launched the Visual Basic Editor (Alt+F11), opened the Immediate Window (Ctrl+G) and obtained the workbook’s password protection related properties:
Debug.Print ActiveWorkbook.Name
Excel_97-2003_Workbook.xls
Debug.Print Application.Application + ", Version: " + Application.Version + ", Build: " + Trim(Str(Application.Build))
Microsoft Excel, Version: 16.0, Build: 14326
Debug.Print ActiveWorkbook.PasswordEncryptionAlgorithm
RC4
Debug.Print ActiveWorkbook.PasswordEncryptionFileProperties
False
Debug.Print ActiveWorkbook.PasswordEncryptionKeyLength
128
Debug.Print ActiveWorkbook.PasswordEncryptionProvider
Microsoft Enhanced Cryptographic Provider v1.0
Debug.Print ActiveWorkbook.Name
Microsoft_Excel_5.0_95_Workbook.xls
Debug.Print Application.Application + ", Version: " + Application.Version + ", Build: " + Trim(Str(Application.Build))
Microsoft Excel, Version: 16.0, Build: 14326
Debug.Print ActiveWorkbook.PasswordEncryptionAlgorithm
OfficeXor
Debug.Print ActiveWorkbook.PasswordEncryptionFileProperties
False
Debug.Print ActiveWorkbook.PasswordEncryptionKeyLength
-1
Debug.Print ActiveWorkbook.PasswordEncryptionProvider
Office
I also repeated the above using Excel for Microsoft 365 (Version 2109, Build 16.0.14430.20314) and the result was exactly the same except for the output of the version command, which was: Microsoft Excel, Version: 16.0, Build: 14430.
An alternative way to obtain this information is to parse the file, which is something you’d need to do if you don’t have a copy of Excel and/or you don’t know a workbook’s password and therefore can’t launch VBA. I’m not going to cover this in detail but one such way is using the Python script office2john.py, which is intended to be run from the command line and retrieves the password hash contained within a Microsoft Office file so it can be fed into the password cracking tool John the Ripper (specifically the community-enhanced, “jumbo” version). When I ran this script against Microsoft_Excel_5.0_95_Workbook.xls the output helpfully included “Excel 95 XOR obfuscation detected”. The output returned for Excel_97-2003_Workbook.xls wasn’t quite so helpful, but by stepping through the code with some strategically placed print statements (aka printf() debugging), I was able to determine the encryption algorithm, key length and the encryption provider.
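As a small illustration of what parsing the file can mean at its simplest, the sketch below only looks at the first few bytes of a file to guess its container format. It relies on two well-known signatures: OLE Compound File Binary files (which includes XLS, and also the wrapper that holds the EncryptionInfo stream of a password protected XLSX) start with D0 CF 11 E0 A1 B1 1A E1, while an unprotected XLSX is a ZIP archive starting with PK\x03\x04. The function name sniff_container and its return labels are made up for this example; it is not how office2john.py works and it says nothing about which encryption scheme is in use.

```python
# Well-known file signatures (magic bytes).
OLE_CFB_MAGIC = bytes.fromhex("D0CF11E0A1B11AE1")  # OLE Compound File Binary
ZIP_MAGIC = b"PK\x03\x04"                          # ZIP local file header


def sniff_container(path):
    """Return a rough label for the container format of an Office file.

    Hypothetical helper: it only inspects the leading bytes of the file.
    """
    with open(path, "rb") as handle:
        header = handle.read(8)
    if header.startswith(OLE_CFB_MAGIC):
        # XLS workbooks, and the wrapper around password protected XLSX files.
        return "ole-cfb"
    if header.startswith(ZIP_MAGIC):
        # Unprotected Office Open XML files such as a plain XLSX.
        return "zip"
    return "unknown"


if __name__ == "__main__":
    import sys

    for name in sys.argv[1:]:
        print(name, sniff_container(name))
```

A result of "ole-cfb" for a file with an .xlsx extension is a hint that the workbook is password protected, but the streams inside the container would still need to be read to confirm it.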
XLSX Workbooks
Since the release of Microsoft Office 2007, the default file formats in Microsoft Office have been based on Office Open XML. This led to the introduction of a new password protection specification using AES encryption and SHA hashing. For Microsoft Excel workbooks, the adoption of Office Open XML resulted in the creation of a new file extension, XLSX.
VBA can be used to obtain the password protection related settings of an XLSX file in exactly the same way as an XLS file. Here’s an example of what’s returned against the XLSX workbook that was created in Excel 2021:
Debug.Print ActiveWorkbook.Name
Excel_Workbook.xlsx
Debug.Print Application.Application + ", Version: " + Application.Version + ", Build: " + Trim(Str(Application.Build))
Microsoft Excel, Version: 16.0, Build: 14326
Debug.Print ActiveWorkbook.PasswordEncryptionAlgorithm
Debug.Print ActiveWorkbook.PasswordEncryptionFileProperties
True
Debug.Print ActiveWorkbook.PasswordEncryptionKeyLength
256
Debug.Print ActiveWorkbook.PasswordEncryptionProvider
I also repeated the above using Excel for Microsoft 365 (Version 2109, Build 16.0.14430.20314) and the result was exactly the same except for the output of the version command, which was: Microsoft Excel, Version: 16.0, Build: 14430.
You may have noticed that in the output above the PasswordEncryptionAlgorithm and PasswordEncryptionProvider properties are empty. These properties don’t appear to be populated in XLSX files. Regarding PasswordEncryptionAlgorithm, its absence is no great loss because we know it’s going to be AES. Regarding PasswordEncryptionProvider, I’ve struggled to find a proper definition of this property but for the purpose of this blog post I don’t think it’s something we need to be concerned with.
What is of concern is that none of the password protection related properties exposed in VBA cover hashing, which is a problem when dealing with XLSX files because the hashing implementation isn’t static; it’s subject to change between Office versions, so assumptions cannot be made. The aspects of hashing that I was specifically interested in obtaining were the algorithm and the number of iterations performed (hashing the password + salt and then iterating over itself n times).
In pursuit of answers I stumbled upon the ExcelTable README, which helped tip me off that Office 2007 uses something called the Standard encryption method and subsequent versions use the Agile encryption method. I decided to confine my research to the Agile method only; to find out more I consulted the [MS-OFFCRYPTO]: Office Document Cryptography Structure documentation, the latest version of which was published on 2021-10-05, see: [MS-OFFCRYPTO]-211005.pdf, which contains the following:
- Section 1.3.3 Encryption explains that one of the mechanisms for creating password-protected documents is ECMA-376 document encryption with Agile encryption, which uses an XML EncryptionInfo structure.
- Section 2.3.4.10 \EncryptionInfo Stream (Agile Encryption) shows that the first four bytes of the EncryptionInfo stream are occupied by EncryptionVersionInfo, followed by another four bytes occupied by Reserved, followed by XmlEncryptionDescriptor, which is an XML element.
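To make the stream layout from section 2.3.4.10 more concrete, here is a minimal Python sketch that splits an EncryptionInfo stream into its three parts and pulls out the attributes discussed in this post. It assumes the raw bytes of the stream have already been extracted from the OLE container (for example with a third-party library such as olefile, not shown here). The function name parse_agile_encryption_info and the sample stream are made up for illustration; the sample is hand-built rather than taken from a real workbook, and the version values 4.4, the reserved value 0x40 and the two XML namespaces reflect my understanding of the specification rather than anything verified here.

```python
import struct
import xml.etree.ElementTree as ET

# XML namespaces that, as far as I can tell, Agile encryption descriptors use.
NS = {
    "e": "http://schemas.microsoft.com/office/2006/encryption",
    "p": "http://schemas.microsoft.com/office/2006/keyEncryptor/password",
}


def parse_agile_encryption_info(stream):
    """Split an EncryptionInfo stream (bytes) into header fields and settings."""
    # EncryptionVersionInfo: two little-endian 16-bit integers (major, minor).
    major, minor = struct.unpack_from("<HH", stream, 0)
    # Reserved: one little-endian 32-bit integer.
    (reserved,) = struct.unpack_from("<I", stream, 4)
    # XmlEncryptionDescriptor: everything after the first eight bytes.
    root = ET.fromstring(stream[8:])
    key_data = root.find("e:keyData", NS)
    encrypted_key = root.find("e:keyEncryptors/e:keyEncryptor/p:encryptedKey", NS)
    names = ("cipherAlgorithm", "keyBits", "hashAlgorithm")
    return {
        "version": (major, minor),
        "reserved": reserved,
        "keyData": {n: key_data.get(n) for n in names},
        "encryptedKey": {n: encrypted_key.get(n) for n in names + ("spinCount",)},
    }


# Hand-built sample (NOT from a real file); most attributes are omitted.
SAMPLE_XML = (
    '<encryption xmlns="http://schemas.microsoft.com/office/2006/encryption" '
    'xmlns:p="http://schemas.microsoft.com/office/2006/keyEncryptor/password">'
    '<keyData cipherAlgorithm="AES" keyBits="256" hashAlgorithm="SHA512"/>'
    "<keyEncryptors><keyEncryptor>"
    '<p:encryptedKey spinCount="100000" cipherAlgorithm="AES" '
    'keyBits="256" hashAlgorithm="SHA512"/>'
    "</keyEncryptor></keyEncryptors></encryption>"
)
SAMPLE_STREAM = struct.pack("<HHI", 4, 4, 0x40) + SAMPLE_XML.encode("utf-8")

if __name__ == "__main__":
    print(parse_agile_encryption_info(SAMPLE_STREAM))
```

The function returns attribute values as strings, exactly as they appear in the XML, so keyBits and spinCount need converting with int() before doing any arithmetic with them.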
Equipped with this information, I proceeded to create a password protected XLSX file in Excel 2021 and inspected its EncryptionInfo stream using PowerShell as follows:
I then retrieved only the values that were of interest to me from the XML:
I repeated the above using Excel for Microsoft 365 (Version 2109, Build 16.0.14430.20314) and the result was exactly the same.
If you’re wondering why the property names cipherAlgorithm, keyBits and hashAlgorithm appear twice in the XML then bear with me, I will attempt to explain this a bit later. What I will explain now is the purpose of spinCount, because it’s perhaps not immediately obvious from the name: it refers to the number of hash iterations performed.
In addition to Microsoft Office Pro Plus 2021 and Excel for Microsoft 365 (Version 2109, Build 16.0.14430.20314), I also happened to have access to the Pro Plus editions of Office 2010, 2013 and 2016, so I decided to extract the password protection settings from an XLSX workbook created in each of these versions for comparison purposes:
$xml.encryption.keyData | Select-Object cipherAlgorithm, keyBits, hashAlgorithm
Version | cipherAlgorithm | keyBits | hashAlgorithm |
---|---|---|---|
Excel 2010 | AES | 128 | SHA1 |
Excel 2013 | AES | 256 | SHA512 |
Excel 2016 | AES | 256 | SHA512 |
Excel 2021 | AES | 256 | SHA512 |
Excel for Microsoft 365 (Version 2109, Build 16.0.14430.20314) | AES | 256 | SHA512 |
$xml.encryption.keyEncryptors.keyEncryptor.encryptedKey | Select-Object cipherAlgorithm, keyBits, hashAlgorithm, spinCount
Version | cipherAlgorithm | keyBits | hashAlgorithm | spinCount |
---|---|---|---|---|
Excel 2010 | AES | 128 | SHA1 | 100000 |
Excel 2013 | AES | 256 | SHA512 | 100000 |
Excel 2016 | AES | 256 | SHA512 | 100000 |
Excel 2021 | AES | 256 | SHA512 | 100000 |
Excel for Microsoft 365 (Version 2109, Build 16.0.14430.20314) | AES | 256 | SHA512 | 100000 |
Regarding the XML containing two instances of the property names cipherAlgorithm, keyBits and hashAlgorithm, Chris Morgan has written a blog post on Default Encryption Settings and Behaviors for OneNote 2013 (Office 365) which explains that a user’s password is used to generate a key (referred to as the user key) which encrypts an intermediate key (produced from a random array of bytes) and it’s the intermediate key that’s responsible for encrypting the data. Knowing that there are two layers of encryption goes some way to explaining the repetition of these property names.
Here’s my attempt at trying to identify the role of each property based on Chris Morgan’s breakdown of the encryption process using a sample OneNote 2013 file:
- A user key is derived from the password. This is achieved by combining the password with a salt and producing a hash using $xml.encryption.keyEncryptors.keyEncryptor.encryptedKey.hashAlgorithm. The salted hash is then iteratively hashed x times according to the value of $xml.encryption.keyEncryptors.keyEncryptor.encryptedKey.spinCount. Chris explains that “…the hash function’s final output is truncated to match the keyBits attribute’s value”; I’m going to assume that’s the keyBits under $xml.encryption.keyEncryptors.keyEncryptor.encryptedKey rather than under $xml.encryption.keyData.
- An intermediate key is produced from a random array of bytes and keyBits under $xml.encryption.keyData is used to specify its size. Chris doesn’t mention whether the hashAlgorithm under $xml.encryption.keyData is used in the production of the intermediate key.
- The intermediate key is encrypted with the user key and the cipherAlgorithm under $xml.encryption.keyEncryptors.keyEncryptor.encryptedKey.
- The intermediate key and the cipherAlgorithm under $xml.encryption.keyData are used to encrypt the data.
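The first step in the list above (salt, hash, iterate spinCount times, truncate to keyBits) can be sketched in Python with nothing but hashlib. This is only my reading of how MS-OFFCRYPTO describes key derivation for Agile encryption and should be treated as an assumption throughout: the password being encoded as UTF-16LE, the salt being placed before the password, each iteration being prefixed with a 4-byte little-endian counter, and a final hash with a fixed block key are details that neither this post nor Chris Morgan’s breakdown confirms. The function name derive_user_key is made up for the example, and the case where the hash output is shorter than the requested key is deliberately left unhandled.

```python
import hashlib

# Block key that, as I read the specification, is mixed in when deriving the
# key that protects the intermediate key. Treat this value as an assumption.
BLOCK_KEY_ENCRYPTED_KEY_VALUE = bytes.fromhex("146e0be7abacd0d6")


def derive_user_key(password, salt, spin_count, key_bits, hash_name="SHA512",
                    block_key=BLOCK_KEY_ENCRYPTED_KEY_VALUE):
    """Derive a user key from a password (sketch, not a verified implementation)."""
    algorithm = hash_name.lower()  # the XML says "SHA512", hashlib wants "sha512"

    def digest_of(data):
        return hashlib.new(algorithm, data).digest()

    # Initial salted hash of the password (assumed to be UTF-16LE encoded).
    digest = digest_of(salt + password.encode("utf-16-le"))
    # Iterate spin_count times, prefixing each round with its counter.
    for iteration in range(spin_count):
        digest = digest_of(iteration.to_bytes(4, "little") + digest)
    # Final round with the block key, then truncate to the requested key size.
    digest = digest_of(digest + block_key)
    key_length = key_bits // 8
    if len(digest) < key_length:
        raise ValueError("hash output is shorter than the requested key")
    return digest[:key_length]


if __name__ == "__main__":
    # All-zero salt purely for demonstration; real files carry a random salt.
    key = derive_user_key("correct horse", bytes(16), 100000, 256, "SHA512")
    print(len(key), "byte key:", key.hex())
```

With a spinCount of 100000 every password guess costs 100000 extra hash operations, which is the whole point of the setting: it slows down tools such as John the Ripper without noticeably delaying a legitimate user.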
When each occurrence of cipherAlgorithm, keyBits and hashAlgorithm contains the same value there’s really no need to try and understand the precise purpose that each serves. However, there may be occasions when the values differ; read on for a few such examples…
I wanted to know what effect (if any) manipulating an XLSX workbook in Excel 2021 would have on a file that was created in Excel 2010, so I conducted a few tests:
1) Action: Opened and closed the file.
Outcome: No change in encryption settings.
2) Action: Added some text to a cell and saved the file.
Outcome: No change in encryption settings.
3) Action: Changed the existing password.
Outcome: See below.
4) Action: Removed the existing password and then added a new password.
Outcome: See below.
I repeated the above using Excel for Microsoft 365 (Version 2109, Build 16.0.14430.20314) and the result was exactly the same.
The examples above illustrate an important point for system administrators: just because you’ve deployed the latest version of Microsoft Office in your environment, it doesn’t mean that all existing password protected files will automatically begin to use the AES and SHA capabilities it has to offer.
Excel for the Web
Excel for the web is a browser based version of the product that’s available free of charge with a Microsoft account or through a paid subscription using Microsoft 365.
I first began performing research for this blog post in early October 2021 and at that time the Microsoft Support article entitled Differences between using a workbook in the browser and in Excel stated that: “Workbooks that are protected (encrypted with password protection) cannot be viewed in a browser window. To edit, open the workbook in Excel on the desktop.” Contrary to this claim, I found that password protected workbooks opened just fine using Excel for the web, but I guess this limitation must have existed at some point in the product’s history. Anyhow, the article has since been updated and as of 2021-11-23 it states: “Workbooks which are protected (encrypted with password protection) can be viewed and edited in Excel for the web.”
Whilst opening existing password protected workbooks in Excel for the web is not a problem, it’s not possible to modify a password or create a new password protected workbook.