

Encryption can be done at different layers in a traditional data management software/hardware stack. Choosing to encrypt at a given layer comes with different advantages and disadvantages.

Application-level encryption. This is the most secure and most flexible approach. The application has ultimate control over what is encrypted and can precisely reflect the requirements of the user. However, writing applications to do this is hard. This is also not an option for customers of existing applications that do not support encryption.

Database-level encryption. Similar to application-level encryption in terms of its properties. Most database vendors offer some form of encryption. However, there can be performance issues. One example is that indexes cannot be encrypted.

Filesystem-level encryption. This option offers high performance, application transparency, and is typically easy to deploy. However, it is unable to model some application-level policies. For instance, multi-tenant applications might want to encrypt based on the end user. A database might want different encryption settings for each column stored within a single file.

Disk-level encryption. Easy to deploy and high performance, but also quite inflexible. Only really protects against physical theft.

HDFS-level encryption fits between database-level and filesystem-level encryption in this stack. HDFS encryption is able to provide good performance, and existing Hadoop applications are able to run transparently on encrypted data. HDFS also has more context than traditional filesystems when it comes to making policy decisions.

HDFS-level encryption also prevents attacks at the filesystem level and below (so-called "OS-level attacks"). The operating system and disk only interact with encrypted bytes, since the data is already encrypted by HDFS.

For transparent encryption, we introduce a new abstraction to HDFS: the encryption zone. An encryption zone is a special directory whose contents will be transparently encrypted upon write and transparently decrypted upon read. Each encryption zone is associated with a single encryption zone key, which is specified when the zone is created. Each file within an encryption zone has its own unique data encryption key (DEK). HDFS never handles a DEK directly; instead, it only ever handles an encrypted data encryption key (EDEK). Clients decrypt an EDEK, and then use the resulting DEK to read and write data. HDFS datanodes simply see a stream of encrypted bytes.

A very important use case of encryption is to "switch it on" and ensure all files across the entire filesystem are encrypted. To support this strong guarantee without losing the flexibility of using different encryption zone keys in different parts of the filesystem, HDFS allows nested encryption zones. After an encryption zone is created (e.g. on the root directory /), a user can create more encryption zones on its descendant directories. The EDEK of a file will be generated using the encryption zone key from the closest ancestor encryption zone.

A new cluster service is required to manage encryption keys: the Hadoop Key Management Server (KMS). In the context of HDFS encryption, the KMS performs three basic responsibilities:
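
As a rough illustration of the closest-ancestor rule for nested encryption zones described above, the following Python sketch resolves which encryption zone key would apply to a given path. The function name `closest_zone_key`, the `zones` dictionary, and the key names are hypothetical and are not part of any HDFS API; HDFS performs this lookup internally.

```python
import posixpath


def closest_zone_key(path, zones):
    """Return the key name of the nearest ancestor encryption zone, or None.

    `zones` maps encryption zone root directories to key names.
    Both the mapping and the key names are made up for illustration.
    """
    current = posixpath.normpath(path)
    while True:
        if current in zones:
            return zones[current]
        parent = posixpath.dirname(current)
        if parent == current:
            # Reached the top of the tree without finding a zone.
            return None
        current = parent


# Hypothetical layout: a zone on the root directory and a nested zone below it.
zones = {"/": "root_key", "/home/alice": "alice_key"}

print(closest_zone_key("/home/alice/report.csv", zones))  # alice_key
print(closest_zone_key("/tmp/scratch.dat", zones))        # root_key
```

The walk moves upward one directory at a time, so a nested zone always takes precedence over a zone on one of its ancestors, and a path outside every zone yields `None`.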
