Category: Security

Securing Docker Containers with a Risk-Based Approach

Embracing Pragmatism in Container Security

In the world of container orchestration, securing thousands of Docker containers is no small feat. But with a pragmatic approach and a keen understanding of risk assessment, it’s possible to create a secure environment that keeps pace with the rapid deployment of services.

The Risk Matrix: A Tool for Prioritization

At the heart of our security strategy is a Risk Matrix, a critical tool that helps us assess and prioritize vulnerabilities. The matrix classifies potential security threats based on the severity of their consequences and the likelihood of their occurrence. By focusing on Critical and High Common Vulnerabilities and Exposures (CVEs), we use this matrix to identify which issues in our Kubernetes clusters need immediate attention.

Risk Matrix – Use the following to deduce an action/outcome.

Likelihood: The Critical Dimension for SMART Security

To ensure our security measures are Specific, Measurable, Achievable, Relevant, and Time-bound (SMART), we add another dimension to the matrix: Likelihood. This dimension helps us pinpoint high-risk items that require swift action, balancing the need for security with the practicalities of our day-to-day operations.
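To make the matrix actionable, it helps to encode it. Below is a minimal sketch in TypeScript; the 1–5 scales, threshold values, and action names are illustrative assumptions, not prescriptions – tune them to your own risk appetite.

```typescript
// Minimal sketch of a severity x likelihood risk matrix (hypothetical scales).
type Severity = 1 | 2 | 3 | 4 | 5;   // 5 = Critical consequence
type Likelihood = 1 | 2 | 3 | 4 | 5; // 5 = Almost certain

type Action = "accept" | "monitor" | "mitigate" | "act-immediately";

// Map the combined score to an action/outcome. The thresholds here are
// illustrative, not a recommendation; adjust them to your risk appetite.
function riskAction(severity: Severity, likelihood: Likelihood): Action {
  const score = severity * likelihood; // 1..25
  if (score >= 20) return "act-immediately"; // e.g. Critical CVE, exploit in the wild
  if (score >= 12) return "mitigate";        // schedule a fix in the current sprint
  if (score >= 6) return "monitor";          // track, re-assess on new intelligence
  return "accept";                           // document the decision and move on
}

console.log(riskAction(5, 4)); // "act-immediately"
console.log(riskAction(3, 2)); // "monitor"
```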

DevSecOps: Tactical Solutions for the Security-Minded

As we journey towards a DevSecOps culture, we often rely on tactical solutions to reinforce security, especially if the organization is not yet mature in TOGAF Security practices. These solutions are about integrating security into every step of the development process, ensuring that security is not an afterthought but a fundamental component of our container management strategy.

Container Base Images

Often, you might find Critical and High CVEs that are not under your control but come from a 3rd-party base image; take Cert-Manager and External-DNS, the backbone of many Kubernetes clusters in the wild, as prime examples. These images rely on Google’s GoLang images, which in turn use a base image from jammy-tiny-stack. You see where I am going here? Many 3rd-party images can lead you down a rabbit hole.

Remember, the goal is to MANAGE RISK, not eradicate risk; the latter is futile and leads to impractical security management. Look at ways to mitigate risks by reducing public service ingress footprints or improving North/South and East/West firewall solutions such as Calico Cloud. This allows you to contain security threats if a network segment is breached.

False Positives
Many reported CVEs contradict their stated severity ratings, so always weigh the actual risk and likelihood. As the Docker Official Images FAQ explains, not every CVE is removed from the images, but the maintainers take CVEs seriously and try to ensure that images contain the most up-to-date packages available within a reasonable time frame. For many of the Official Images, a security analyzer like Docker Scout or Clair might still show CVEs, which can happen for a variety of reasons:

  • The CVE has not been addressed in that particular image
    • Upstream maintainers don’t consider a particular CVE to be a vulnerability that needs fixing, so it won’t be fixed.
      • e.g., CVE-2005-2541 is considered a High severity vulnerability, but in Debian is considered “intended behavior,” making it a feature, not a bug.
    • The OS security team only has so much available time and has to deprioritize some security fixes over others. This could be because the threat is considered low or because the fix is too intrusive to backport to the version in “stable”.
      • e.g., CVE-2017-15804 is considered a High severity vulnerability, but in Debian it is marked as a “Minor issue” in Stretch and no fix is available.
    • Vulnerabilities may not have an available patch, so even though they’ve been identified, there is no current solution.
  • The listed CVE is a false positive
    • To provide stability, most OS distributions take the fix for a security flaw out of the most recent version of the upstream software package and apply that fix to an older version of the package (known as backporting).
      • e.g., CVE-2020-8169 shows that curl is flawed in versions 7.62.0 through 7.70.0 and so is fixed in 7.71.0. The version that has the fix applied in Debian Buster is 7.64.0-4+deb10u2 (see security-tracker.debian.org and DSA-4881-1).
    • The binary or library is not vulnerable because the vulnerable code is never executed. Security scanners assume that if a dependency has a vulnerability, then any binary or library using that dependency is also vulnerable. This catches real vulnerabilities, but the simple approach also leads to many false positives. It can be improved with tools that detect whether the vulnerable functions are actually used, such as govulncheck for Go-based binaries (a triage sketch follows this list).
      • e.g., CVE-2023-28642 is a vulnerability in runc below version 1.1.5, but it shows up when scanning the gosu 1.16 binary since runc 1.1.0 is a dependency. Running govulncheck against gosu shows that it does not use any vulnerable runc functions.
    • Security scanners can’t reliably check for CVEs, so they use heuristics to determine whether an image is vulnerable. Those heuristics fail to take some factors into account:
      • Is the image affected by the CVE at all? It might not be possible to trigger the vulnerability with this image.
      • If the image is not supported by the security scanner, it uses the wrong checks to determine whether a fix is included.
        • e.g., for RPM-based OS images, the Red Hat package database is used to map CVEs to package versions. This causes severe mismatches on other RPM-based distros.
        • This can also hide CVEs that actually do affect a given image.
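To make such triage decisions repeatable, it helps to encode accepted exceptions as an explicit allowlist that your pipeline applies to scanner output. Below is a minimal sketch; the `Finding` shape and the sample entries are hypothetical stand-ins for whatever JSON your scanner (Docker Scout, Clair, Trivy, etc.) actually emits.

```typescript
// Hypothetical, simplified shape of a scanner finding; real scanners
// each have their own JSON schema.
interface Finding {
  cve: string;
  severity: "LOW" | "MEDIUM" | "HIGH" | "CRITICAL";
  pkg: string;
}

// Documented exceptions: each entry carries a reason, so accepted risk
// stays visible instead of being silently ignored.
const allowlist: Record<string, string> = {
  "CVE-2005-2541": "Debian: intended behaviour, will not be fixed",
  "CVE-2023-28642": "govulncheck: vulnerable runc functions never called by gosu",
};

// Keep only findings that are both serious and not an accepted exception.
function triage(findings: Finding[]): Finding[] {
  return findings.filter(
    (f) =>
      (f.severity === "HIGH" || f.severity === "CRITICAL") &&
      !(f.cve in allowlist)
  );
}

const actionable = triage([
  { cve: "CVE-2023-28642", severity: "HIGH", pkg: "runc" },
  { cve: "CVE-2020-8169", severity: "HIGH", pkg: "curl" },
]);
console.log(actionable); // only CVE-2020-8169 remains
```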

Conclusion

By combining a risk-based approach with practical solutions and an eye toward DevSecOps principles, we’re creating a robust security framework that’s both pragmatic and effective. It’s about understanding the risks, prioritizing them intelligently, and taking decisive action to secure our digital landscape.

The TOGAF® Series Guide focuses on integrating risk and security within an enterprise architecture. It provides guidance for security practitioners and enterprise architects on incorporating security and risk management into the TOGAF® framework. This includes aligning with standards like ISO/IEC 27001 for information security management and ISO 31000 for risk management principles. The guide emphasizes the importance of understanding risk in the context of achieving business objectives and promotes a balanced approach to managing both negative consequences and seizing positive opportunities. It highlights the need for a systematic approach, embedding security early in the system development lifecycle and ensuring continuous risk and security management throughout the enterprise architecture.

Sometimes, security has to be driven from the bottom up. Ideally, it should be driven from the top down, but we are all responsible for security; if you own many platforms and compute runtimes in the cloud, you must ensure you manage risk under your watch. Otherwise, it is only a matter of time before you get pwned, something I have witnessed repeatedly.

The Real World:
1. Secure your containers from the bottom up in your CI/CD pipelines with tools like Snyk.
2. Secure your containers from the top down in your cloud infrastructure with tools like Azure Defender – Container Security.
3. Look at ways to enforce the above through governance and policies; this means you REDUCE the likelihood of a threat occurring from both sides of the enterprise.
4. Ensure firewall policies are in place to segment your network so that a breach in one area cannot fan out into other network segments. Focus initially on North/South traffic (ingress/egress) and then on East/West traffic (traversing your network segments and domains).

There is a plethora of other risk management strategies, from penetration testing and honeypots to SIEM; ultimately, you can all make a difference no matter where you sit in the technology chart.

Principles
The underlying ingredient for establishing a vision for your organisation in the TOGAF framework is defining the principles of the enterprise. In my view, protecting customer data is not just a legal obligation; it’s a fundamental aspect of building trust and ensuring the longevity of an enterprise.

Establishing a TOGAF principle to protect customer data during the Vision stage of enterprise architecture development is crucial because it sets the tone for the entire organization’s approach to cybersecurity. It ensures that data protection is not an afterthought but a core driver of the enterprise’s strategic direction, technology choices, and operational processes. With cyber threats evolving rapidly, embedding a principle of customer data protection early on ensures that security measures are integrated throughout the enterprise from the ground up, leading to a more resilient and responsible business.

Principle 1: Protect our Customer Data



Sources:
• Docker Official Images FAQ: https://github.com/docker-library/faq
• TOGAF® Series Guide – Integrating Risk and Security: https://pubs.opengroup.org/togaf-standard/integrating-risk-and-security/integrating-risk-and-security_0.html
• Calico Cloud: https://www.tigera.io/tigera-products/calico-cloud/
• Snyk: https://snyk.io/

OAuth 2.0 – Authorization Code with PKCE vs Implicit Grant

A lot of organisations are still using the Implicit Flow for authorization when their client applications are browser-based, e.g. ReactJS/NodeJS. The problem with this workflow is that it was designed when browsers had far fewer capabilities.

Implicit Grant

The Implicit Grant flow leverages a redirect with the access token in the URL.

If someone gains access to your browser history and your token has not yet expired, they can gain full access to your resources.

Because the token appears in the URL on a redirect, a simple script can find these tokens in your browser history. Step 6 of the Implicit Grant flow is where the issue occurs: that redirect is what records the token in your history.

If your clients are using modern browsers that support CORS via JavaScript, the solution is to use a flow where step 6 is not an HTTP GET redirect (302). Ideally, we want an HTTP POST.
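To make the exposure concrete, here is a hypothetical sketch of how a SPA typically reads the token after an Implicit Grant redirect; the callback URL and parameter names follow the standard fragment convention, but treat this as illustration rather than any vendor’s SDK.

```typescript
// After an Implicit Grant redirect, the access token arrives in the URL
// fragment, e.g. https://app.example.com/callback#access_token=eyJ...&expires_in=3600
// The full URL, fragment included, can end up in the browser history.
function tokenFromImplicitRedirect(): string | null {
  const params = new URLSearchParams(window.location.hash.slice(1)); // drop '#'
  return params.get("access_token");
}

const token = tokenFromImplicitRedirect();
if (token) {
  // Anything that can read the history or the current URL can read the token too.
  sessionStorage.setItem("access_token", token);
}
```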

Authorization Workflow with PKCE

Proof Key for Code Exchange – the PKCE extension prevents an attack where the authorization code is intercepted and exchanged for an access token by a malicious client. It gives the authorization server a way to verify that the client instance exchanging the authorization code is the same one that initiated the flow.

Secondly, instead of an HTTP GET, an HTTP POST is used to exchange the code for a token, so the exchange is not recorded in the browser history at all. The token travels in the HTTP body, not the URL.

Notice in the flow below that the token is requested via an HTTP POST carrying the client ID, the code verifier (v), and the authorization code (a) at step 7, with verification at step 8 (see the sketch after the list).

  1. The user clicks Login within the native/mobile application.
  2. Auth0’s SDK creates a cryptographically-random code_verifier and from this generates a code_challenge.
  3. Auth0’s SDK redirects the user to the Auth0 Authorization Server (/authorize endpoint) along with the code_challenge.
  4. Your Auth0 Authorization Server redirects the user to the login and authorization prompt.
  5. The user authenticates using one of the configured login options and may see a consent page listing the permissions Auth0 will give to the mobile application.
  6. Your Auth0 Authorization Server stores the code_challenge and redirects the user back to the application with an authorization code.
  7. Auth0’s SDK sends this code and the code_verifier (created in step 2) to the Auth0 Authorization Server (/oauth/token endpoint).
  8. Your Auth0 Authorization Server verifies the code_challenge and code_verifier.
  9. Your Auth0 Authorization Server responds with an ID Token and Access Token (and optionally, a Refresh Token).
  10. Your application can use the Access Token to call an API to access information about the user.
  11. The API responds with requested data.
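As a rough sketch of steps 2 and 7 in a browser, using the standard Web Crypto API: the /oauth/token path, YOUR_DOMAIN, YOUR_CLIENT_ID, and the redirect URI below are placeholders – substitute your authorization server’s actual values.

```typescript
// Step 2: create a cryptographically random code_verifier and derive the
// code_challenge as BASE64URL(SHA-256(code_verifier)).
function base64UrlEncode(bytes: Uint8Array): string {
  return btoa(String.fromCharCode(...bytes))
    .replace(/\+/g, "-")
    .replace(/\//g, "_")
    .replace(/=+$/, "");
}

function createCodeVerifier(): string {
  const random = crypto.getRandomValues(new Uint8Array(32));
  return base64UrlEncode(random);
}

async function createCodeChallenge(verifier: string): Promise<string> {
  const digest = await crypto.subtle.digest(
    "SHA-256",
    new TextEncoder().encode(verifier)
  );
  return base64UrlEncode(new Uint8Array(digest));
}

// Step 7: exchange the authorization code for tokens with an HTTP POST.
// The token never appears in a URL, so it is never written to history.
async function exchangeCode(code: string, verifier: string) {
  const response = await fetch("https://YOUR_DOMAIN/oauth/token", {
    method: "POST",
    headers: { "Content-Type": "application/x-www-form-urlencoded" },
    body: new URLSearchParams({
      grant_type: "authorization_code",
      client_id: "YOUR_CLIENT_ID",
      code_verifier: verifier,
      code,
      redirect_uri: "https://app.example.com/callback",
    }),
  });
  return response.json(); // { access_token, id_token, ... }
}
```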

So… I should just use the Authorization Workflow with PKCE? Not so fast. If you have a large customer base using older browsers that do not support CORS via JavaScript, you might be stuck with the Implicit Grant.

Another consideration is that the token endpoint on your Authorization Server of choice MUST support CORS for this to come together; not every major vendor supports it yet.

Figure 2: Authorization code grant in a SPA application

However, if you can influence your customers and client browsers to use later versions, and security is a big item on your list, this might be the best case for putting forward an upgrade plan.

In Summary

  • Does your Authorization Server support CORS?
  • Can your clients use modern browsers that support CORS?

If the answer is yes to both, then there is no need to use the Implicit Grant with OAuth 2.0.

Okta and Auth0 are among the identity providers that support PKCE in SPA clients.

Note: Microsoft Identity V2.0 does not currently support the Auth Code workflow with SPAs. This may change in the future as it is a new product and MS is investing in V2.
