Secret scrubbing in CI/CD platform logs is a misfeature

Many CI/CD workflows require sensitive data to perform their tasks, and as security-conscious individuals we all want to prevent those secrets from being exposed. Following that logic, one would think that implementing log scrubbing to prevent secrets from accidentally being exposed would be useful and good.

But here's the thing: it doesn't actually prevent secrets from being exposed, and preventing that is actually impossible. At some point while implementing an improvement to a workload, a developer is going to need to check the secret. Maybe they just need to see if the value is properly set, or ensure it's formatted correctly for the intended usage, or some other reason entirely. The reason doesn't matter. What matters is that they are going to need to see it, and as a result they're going to expose it, and no implementation of scrubbing will prevent them from doing so.

Now, without scrubbing it's as simple as printing the value, confirming it, then deleting the log statement and moving on. This is obviously less than ideal: the value was pushed to the logs, and given how most CI/CD platforms work, it will stay exposed for as long as those logs are kept, which is usually forever.

echo "${SENSITIVE_VALUE}"

Naturally our security-conscious team members say 'well, that's the worst situation imaginable' and ask the operations team to do something to prevent this from occurring, resulting in a product manager getting their team together to brainstorm ideas. Which naturally results in scrubbing the logs! It's simple and effective! What more could we ask for?

Well, you see... you shouldn't be telling your CI/CD system your sensitive values to begin with, and in order to implement such a feature you absolutely have to. Your sensitive values should be stored securely in a secret manager and only read when needed.
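
To make that concrete, here's a minimal sketch of reading a value at the point of use instead of handing it to the platform. It assumes AWS Secrets Manager and a job that already holds credentials to call it; the secret id is made up, and any secret manager with a CLI or SDK works the same way.

  # Fetch the secret inside the job, at the moment it's needed; the CI/CD platform
  # never learns the value, so there's nothing for it to scrub (or leak).
  # "my-app/deploy-token" is a hypothetical id; swap in your own store and id.
  DEPLOY_TOKEN="$(aws secretsmanager get-secret-value \
    --secret-id my-app/deploy-token \
    --query SecretString \
    --output text)"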

Now this simple feature, in order to prevent secret exposure, requires you to... expose your secrets... oops.

Even worse, by implementing such a feature you've made everyone's life harder downstream when it comes to maintaining the workloads run by the platform, and you've started a war between development and security: developers will keep finding inventive ways to get around the functionality security has put in place to prevent sensitive data leaks.

I, as a developer, need to confirm the value is being made available properly to my workload, and I tried simply printing it only to discover the scrubbing feature. What does a developer do in this situation? Well, it's simple: obfuscation! Let's base64 this baby.

echo "${SENSITVE_VALUE}" | base64

Problem solved! Well, guess what: I've still exposed the secret in an easily reversible manner, and on top of that I've made it harder to tell at a glance that I've done so. I've also had to burn additional mental effort to figure this out, but as a result I discover the issue with the workload and I'm happy.
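
To make 'easily reversible' concrete: anyone who can read the log doesn't need the variable at all, just the line that leaked.

  # Paste the encoded string from the log; the secret pops right back out.
  echo "PASTE_THE_ENCODED_STRING_FROM_THE_LOG" | base64 -d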

Then what happens? Well, our security team gets wind of this workaround to their simple and effective process and decides to improve the scrubbing mechanism (which, I remind you, requires them to expose the secrets in the first place) by scrubbing the base64-encoded values too! Developers discover this and switch to hex encoding, only to find the security team is one step ahead of them and has taken care of many of the common binary encoding formats.
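
For what it's worth, the hex variant is just as trivial; a quick sketch using xxd, though od or hexdump would do equally well:

  # Hex instead of base64: sails straight past a scrubber that only matches the raw or base64 form.
  echo "${SENSITIVE_VALUE}" | xxd -p
  # ...and reversing it is one flag away:
  echo "PASTE_THE_HEX_FROM_THE_LOG" | xxd -r -p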

Easy enough, the developer thinks: I'll just change the base64 output by adding a salt! The prefix shifts the block alignment of the encoding, so the encoded secret no longer appears as the substring the scrubber is matching. That'll handle this nonsense, and voila!

echo "DERP ${SENSITVE_VALUE}" | base64

At this point I hope everyone sees where I'm going with this: we've started a war between the developers building and maintaining workloads and the security team trying to prevent sensitive values from leaking. A war the security team will never win; I can still get past all the bits and bobs GitHub has implemented to protect sensitive values without breaking a sweat.

The next stage is AST processing to identify references to the secret value and scrub the entire output of the offending line. Which then leads developers to hide those references behind scripts so they can't be directly analyzed, and so on and so forth. Now your company and team are wasting orders of magnitude more time and effort on a cat-and-mouse game that will never be won and will only keep growing in complexity.
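
Here's a sketch of the kind of indirection I mean, with a made-up helper script checked into the repository. The point is only that the workflow line producing output never mentions the secret, so analysis of the workflow definition has nothing to latch onto.

  #!/bin/sh
  # debug.sh -- hypothetical helper living in the repo, outside the workflow file
  printf '%s\n' "$1" | base64

  # ...and in the workflow step, the secret is now one hop removed from the output:
  ./debug.sh "${SENSITIVE_VALUE}"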

Want to know the real solution here? It's a form of defense in depth, and it involves multiple stages, but they're things you're likely already doing.

  1. Development environments, with a focus in this case on local-first functionality (more on that in a minute).
  2. Peer review and automated linters. This process helps prevent the log statement from ever reaching more sensitive environments like staging and production. The linters match the behavior of the scrubbers but don't impede the developer's workflow or ability to debug issues (see the sketch after this list).
  3. Security controls on the logs themselves; since sensitive data can show up in them, they are inherently sensitive.
  4. Automatic deletion of logs after a period of time, plus the ability to delete them on demand.
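
To make the linter idea from step 2 concrete, here's a minimal sketch of a pre-commit style check. It assumes your secret-bearing variables follow a grep-able naming convention; the patterns and the hook itself are illustrative, not a description of any particular platform's feature.

  # Refuse to commit newly added lines that print anything secret-looking.
  # This runs before the change ever reaches a shared environment, so nothing needs scrubbing later.
  if git diff --cached -U0 | grep -nE '^\+.*echo .*(SECRET|TOKEN|PASSWORD|SENSITIVE)'; then
    echo "Refusing to commit: this looks like it prints a secret" >&2
    exit 1
  fi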

These steps are more effective at preventing sensitive values from being exposed, and at managing access to the logs, without starting a competing-functionality war between the various teams within a company. Eg daemon has (or plans to have) all of these. We already automatically delete workload logs after a period of time, and we'll implement linting tools that handle common exposure patterns on the platform without impeding development.

Finally, eg is built with local-first development as a guiding principle for our platform, allowing developers to rapidly iterate on and test their work. We've also avoided starting a race-to-the-bottom war between the two groups within the company, and instead translated the effort into a cooperative task between the teams, with the tools necessary to handle exposures quickly and gracefully when they do happen. And by doing all of this, we've made it so one less system has to be directly aware of your sensitive values.

Now your company can focus on its product and not on the tension between teams created by a well-intentioned misfeature.

P.S. Oh, did I mention that automated scrubbing can expose secrets via oracle-style attacks? Let's say your company has a secret, and for the sake of this argument it's not a particularly good one: '176c3335-0af1-4ca9-ab64-8420a90322e2'. And I, as the nefarious human that I am, have access to your workload definitions because you're contracting this work out, or because you used my very useful public workflow. Your company is doing most of the right things: I only have access to the development environment and read/pull-request permissions on the source forge application (say... GitHub...). I suspect you're sharing secrets between environments. How can I confirm this? Well, thankfully you've implemented log scrubbing. All I need to do is:

echo "checkpoint 176c3335-0af1-4ca9-ab64-8420a90322e2"

and let the workload run in the sensitive environment:

echo "checkpoint ********"

Thanks for the confirmation, log scrubber, you've been of great assistance. And before you say 'well, don't share secrets between environments, duh!': the oracle works as long as the attacker can see the scrubbed output, and that's all they need to confirm they've found live credentials. Scrubbing doesn't stop exfiltration either: encrypting the secrets before emitting them to the logs bypasses any check the scrubber could make.
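
A sketch of that last point, assuming openssl is available in the job image; the passphrase here is a stand-in for a key only the attacker would hold.

  # Encrypt before emitting: the scrubber sees only ciphertext it has no pattern for.
  echo "${SENSITIVE_VALUE}" | openssl enc -aes-256-cbc -pbkdf2 -pass pass:not-the-real-key -base64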

Eg helps mitigate these types of supply chain attacks in a few ways.

  1. eg runs everything inside a container and is local-first by default, which limits the opportunities for exfiltration.
  2. We highly restrict which secrets get injected into environment variables. As of writing we only inject a single token, used to clone your repository, and we have plans for that to disappear as well.
  3. Because we don't inject secrets, an attacker has to know where your secrets are stored, know the IDs to look them up, and have the necessary credentials to read them, all of which can be finely controlled in eg.
  4. We don't make logs public. They're sensitive information, remember?
  5. Explicit dependency management, which allows us to roll out workload protections rapidly, system-wide, when supply chains are compromised.