Author: Romiko Derbynew

Demystifying .NET Core Memory Leaks: A Debugging Adventure with dotnet-dump

It has been a while since I wrote about memory dump analysis; the last post on the subject was back in 2011. Let’s get stuck into the dark arts.

First and foremost, .NET Framework is very different from .NET Core, right down to App Domains and how the MSIL is executed. Understanding this is crucial before you kick off a clrstack or dumpdomain. Make sure you understand the architecture of what you are debugging, from ASP.NET to console apps. dumpdomain caught me off guard, as in the past you would use it to track down the loaded assemblies and decompile them back to source via the PDB files.

| Feature | ASP.NET Core | ASP.NET Framework |
| --- | --- | --- |
| Cross-Platform Support | Runs on Windows, Linux, and macOS. | Primarily runs on Windows. |
| Hosting | Can be hosted on Kestrel, IIS, HTTP.sys, Nginx, Apache, and Docker. | Typically hosted on IIS. |
| Performance | Optimized for high performance and scalability. | Good performance, but generally not as optimized as ASP.NET Core. |
| Application Model | Unified model for MVC and Web API. | Separate models for MVC and Web API. |
| Configuration | Uses a lightweight, file-based configuration system (appsettings.json). | Uses web.config for configuration. |
| Dependency Injection | Built-in support for dependency injection. | Requires third-party libraries for dependency injection. |
| App Domains | Uses a single app model and does not support app domains. | Supports app domains for isolation between applications. |
| Runtime Compilation | Supports runtime compilation of Razor views (optional). | Supports runtime compilation of ASPX pages. |
| Modular HTTP Pipeline | Highly modular and configurable HTTP request pipeline. | Fixed HTTP request pipeline defined by Global.asax and web.config. |
| Package Management | Uses NuGet for package management, with an emphasis on minimal dependencies. | Also uses NuGet but tends to have more complex dependency trees. |
| Framework Versions | Applications target a specific version of .NET Core, which is bundled with the app. | Applications target a version of the .NET Framework installed on the server. |
| Update Frequency | Rapid release cycle with frequent updates and new features. | Slower release cycle, tied to Windows updates. |
| Side-by-Side Deployment | Supports running multiple versions of the app or .NET Core side-by-side. | Does not support running multiple versions of the framework side-by-side for the same application. |
| Open Source | Entire platform is open-source. | Only a portion of the platform is open-source. |

So we embark on a quest to uncover hidden memory leaks that lurk within the depths of .NET Core apps, armed with the mighty dotnet-dump utility. This tale of debugging prowess will guide you through collecting and analyzing dump files, uncovering the secrets of memory leaks, and ultimately conquering these elusive beasts.

Preparing for the Hunt: Installing dotnet-dump

Our journey begins with the acquisition of the dotnet-dump tool, a valiant ally in our quest. This tool is a part of the .NET diagnostics toolkit, designed to collect and analyze dumps without requiring native debuggers. It’s a lifesaver on platforms like Alpine Linux, where traditional tools shy away.

To invite dotnet-dump into your arsenal, you have two paths:

  1. The Global Tool Approach: Unleash the command dotnet tool install --global dotnet-dump into your terminal and watch as the latest version of the dotnet-dump NuGet package is summoned.
  2. The Direct Download: Navigate to the mystical lands of the .NET website and download the tool executable that matches your platform’s essence.
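
Whichever path you choose, a quick sanity check is worthwhile. A minimal sketch, assuming the global-tool route and that the .NET SDK is already on your PATH:

dotnet tool install --global dotnet-dump
dotnet tool list --global

After the second command, dotnet-dump should appear in the list of installed tools.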

The First Step: Collecting the Memory Dump

With dotnet-dump by your side, it’s time to collect a memory dump from the process that has been bewitched by the memory leak. Invoke dotnet-dump collect --process-id <PID>, where <PID> is the identifier of the cursed process. This incantation captures the essence of the process’s memory, storing it in a file for later analysis.
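
If you do not know the PID off-hand, dotnet-dump can list the .NET processes it can see. A minimal sketch (the PID is a placeholder taken from the listing):

dotnet-dump ps
dotnet-dump collect --process-id 4807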

The Analytical Ritual: Unveiling the Mysteries of the Dump

Now, the real magic begins. Use dotnet-dump analyze <dump_path> to enter an interactive realm where the secrets of the dump file are yours to discover. This enchanted shell accepts various SOS commands, granting you the power to scrutinize the managed heap, reveal the relationships between objects, and formulate theories about the source of the memory leak.

Common Spells and Incantations:

  • clrstack: Summons a stack trace of managed code, revealing the paths through which the code ventured.
  • dumpheap -stat: Unveils the statistics of the objects residing in the managed heap, highlighting the most common culprits.
  • gcroot <address>: Traces the lineage of an object back to its roots, uncovering why it remains in memory.
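
Chained together, a typical analyze session looks roughly like the sketch below. The addresses are placeholders and the # comments are annotations for the reader, not part of the commands; the walkthrough later in this post shows real output.

dotnet-dump analyze ./core_20190430_185145
> dumpheap -stat            # which types dominate the managed heap?
> dumpheap -mt <MT>         # list the instances behind a suspicious method table
> gcroot <address>          # why is one of those instances still rooted?
> exit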

The Final Confrontation: Identifying the Memory Leak

Armed with knowledge and insight from the dotnet-dump analysis, you’re now ready to face the memory leak head-on. By examining the relationships between objects and understanding their roots, you can pinpoint the source of the leak in your code.

Remember, the key to vanquishing memory leaks is patience and perseverance. With dotnet-dump as your guide, you’re well-equipped to navigate the complexities of .NET Core memory management and emerge victorious.

Examine managed memory usage

Before you start collecting diagnostic data to help root cause this scenario, make sure you’re actually seeing a memory leak (growth in memory usage). You can use the dotnet-counters tool to confirm that.

Open a console window and navigate to the directory where you downloaded and unzipped the sample debug target. Run the target:

dotnet run

From a separate console, find the process ID:

dotnet-counters ps

The output should be similar to:

4807 DiagnosticScena /home/user/git/samples/core/diagnostics/DiagnosticScenarios/bin/Debug/netcoreapp3.0/DiagnosticScenarios

Now, check managed memory usage with the dotnet-counters tool. The --refresh-interval specifies the number of seconds between refreshes:

dotnet-counters monitor --refresh-interval 1 -p 4807

The live output should be similar to:

Press p to pause, r to resume, q to quit.
Status: Running

[System.Runtime]
# of Assemblies Loaded 118
% Time in GC (since last GC) 0
Allocation Rate (Bytes / sec) 37,896
CPU Usage (%) 0
Exceptions / sec 0
GC Heap Size (MB) 4
Gen 0 GC / sec 0
Gen 0 Size (B) 0
Gen 1 GC / sec 0
Gen 1 Size (B) 0
Gen 2 GC / sec 0
Gen 2 Size (B) 0
LOH Size (B) 0
Monitor Lock Contention Count / sec 0
Number of Active Timers 1
ThreadPool Completed Work Items / sec 10
ThreadPool Queue Length 0
ThreadPool Threads Count 1
Working Set (MB) 83

Focusing on this line:

    GC Heap Size (MB)                                  4

You can see that the managed heap memory is 4 MB right after startup.

Now, go to the URL https://localhost:5001/api/diagscenario/memleak/20000.

Observe that the memory usage has grown to 30 MB.

GC Heap Size (MB)                                 30


By watching the memory usage, you can safely say that memory is growing or leaking. The next step is to collect the right data for memory analysis.
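
If you only want to watch the heap-related counters, dotnet-counters can be pointed at a narrower set. A minimal sketch, assuming the System.Runtime counter names used by .NET Core 3.x:

dotnet-counters monitor -p 4807 --refresh-interval 1 --counters System.Runtime[gc-heap-size,gen-2-gc-count,alloc-rate]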

Generate memory dump

When analyzing possible memory leaks, you need access to the app’s memory heap to analyze the memory contents. Looking at relationships between objects, you create theories as to why memory isn’t being freed. A common diagnostic data source is a memory dump on Windows or the equivalent core dump on Linux. To generate a dump of a .NET application, you can use the dotnet-dump tool.

Using the sample debug target previously started, run the following command to generate a Linux core dump:

dotnet-dump collect -p 4807

The result is a core dump located in the same folder.

Writing minidump with heap to ./core_20190430_185145
Complete

For a comparison over time, let the original process continue running after collecting the first dump and collect a second dump the same way. You would then have two dumps over a period of time that you can compare to see where the memory usage is growing.
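
For example, a hedged sketch of the two-dump approach (the -o switch names the output file, and the endpoint is the sample app’s leak endpoint used earlier):

dotnet-dump collect -p 4807 -o ./core_baseline
# ...exercise the app again, e.g. hit https://localhost:5001/api/diagscenario/memleak/20000...
dotnet-dump collect -p 4807 -o ./core_after_load

Comparing the dumpheap -stat output from the two dumps then shows which types are growing.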

Restart the failed process

Once the dump is collected, you should have sufficient information to diagnose the failed process. If the failed process is running on a production server, it’s now the ideal time for short-term remediation by restarting the process.

In this tutorial, you’re now done with the Sample debug target and you can close it. Navigate to the terminal that started the server, and press Ctrl+C.

Analyze the core dump

Now that you have a core dump generated, use the dotnet-dump tool to analyze the dump:

dotnet-dump analyze core_20190430_185145

Where core_20190430_185145 is the name of the core dump you want to analyze.

If you see an error complaining that libdl.so cannot be found, you may have to install the libc6-dev package. For more information, see Prerequisites for .NET on Linux.
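
On Debian or Ubuntu based systems that usually comes down to something like the following (a hedged example; package names differ on other distros):

sudo apt-get update
sudo apt-get install -y libc6-dev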

You’ll be presented with a prompt where you can enter SOS commands. Commonly, the first thing you want to look at is the overall state of the managed heap:

> dumpheap -stat

Statistics:
MT Count TotalSize Class Name
...
00007f6c1eeefba8 576 59904 System.Reflection.RuntimeMethodInfo
00007f6c1dc021c8 1749 95696 System.SByte[]
00000000008c9db0 3847 116080 Free
00007f6c1e784a18 175 128640 System.Char[]
00007f6c1dbf5510 217 133504 System.Object[]
00007f6c1dc014c0 467 416464 System.Byte[]
00007f6c21625038 6 4063376 testwebapi.Controllers.Customer[]
00007f6c20a67498 200000 4800000 testwebapi.Controllers.Customer
00007f6c1dc00f90 206770 19494060 System.String
Total 428516 objects

Here you can see that most objects are either String or Customer objects.

You can use the dumpheap command again with the method table (MT) to get a list of all the String instances:

> dumpheap -mt 00007f6c1dc00f90

Address MT Size
...
00007f6ad09421f8 00007faddaa50f90 94
...
00007f6ad0965b20 00007f6c1dc00f90 80
00007f6ad0965c10 00007f6c1dc00f90 80
00007f6ad0965d00 00007f6c1dc00f90 80
00007f6ad0965df0 00007f6c1dc00f90 80
00007f6ad0965ee0 00007f6c1dc00f90 80

Statistics:
MT Count TotalSize Class Name
00007f6c1dc00f90 206770 19494060 System.String
Total 206770 objects

You can now use the gcroot command on a System.String instance to see how and why the object is rooted:

> gcroot 00007f6ad09421f8

Thread 3f68:
00007F6795BB58A0 00007F6C1D7D0745 System.Diagnostics.Tracing.CounterGroup.PollForValues() [/_/src/System.Private.CoreLib/shared/System/Diagnostics/Tracing/CounterGroup.cs @ 260]
rbx: (interior)
-> 00007F6BDFFFF038 System.Object[]
-> 00007F69D0033570 testwebapi.Controllers.Processor
-> 00007F69D0033588 testwebapi.Controllers.CustomerCache
-> 00007F69D00335A0 System.Collections.Generic.List`1[[testwebapi.Controllers.Customer, DiagnosticScenarios]]
-> 00007F6C000148A0 testwebapi.Controllers.Customer[]
-> 00007F6AD0942258 testwebapi.Controllers.Customer
-> 00007F6AD09421F8 System.String

HandleTable:
00007F6C98BB15F8 (pinned handle)
-> 00007F6BDFFFF038 System.Object[]
-> 00007F69D0033570 testwebapi.Controllers.Processor
-> 00007F69D0033588 testwebapi.Controllers.CustomerCache
-> 00007F69D00335A0 System.Collections.Generic.List`1[[testwebapi.Controllers.Customer, DiagnosticScenarios]]
-> 00007F6C000148A0 testwebapi.Controllers.Customer[]
-> 00007F6AD0942258 testwebapi.Controllers.Customer
-> 00007F6AD09421F8 System.String

Found 2 roots.

You can see that the String is directly held by the Customer object and indirectly held by a CustomerCache object.

You can continue dumping out objects to see that most String objects follow a similar pattern. At this point, the investigation provided sufficient information to identify the root cause in your code.

This general procedure allows you to identify the source of major memory leaks.

Epilogue: Cleaning Up After the Battle

With the memory leak defeated and peace restored to your application, take a moment to clean up the battlefield. Dispose of the dump files that served you well, and consider restarting your application to ensure it runs free of the burdens of the past.

Embark on this journey with confidence, for with dotnet-dump and the wisdom contained within this guide, you are more than capable of uncovering and addressing the memory leaks that challenge the stability and performance of your .NET Core applications. Happy debugging!

Sources:
https://learn.microsoft.com/en-us/dotnet/core/diagnostics/debug-memory-leak

Securing Docker Containers with a Risk-Based Approach

Embracing Pragmatism in Container Security

In the world of container orchestration, securing thousands of Docker containers is no small feat. But with a pragmatic approach and a keen understanding of risk assessment, it’s possible to create a secure environment that keeps pace with the rapid deployment of services.

The Risk Matrix: A Tool for Prioritization

At the heart of our security strategy is a Risk Matrix, a critical tool that helps us assess and prioritize vulnerabilities. The matrix classifies potential security threats based on the severity of their consequences and the likelihood of their occurrence. By focusing on Critical and High Common Vulnerabilities and Exposures (CVEs), we use this matrix to identify which issues in our Kubernetes clusters need immediate attention.

Risk Matrix – Use the following to deduce an action/outcome.

Likelihood: The Critical Dimension for SMART Security

To ensure our security measures are Specific, Measurable, Achievable, Relevant, and Time-bound (SMART), we add another dimension to the matrix: Likelihood. This dimension helps us pinpoint high-risk items that require swift action, balancing the need for security with the practicalities of our day-to-day operations.

DevSecOps: Tactical Solutions for the Security-Minded

As we journey towards a DevSecOps culture, we often rely on tactical solutions to reinforce security, especially if the organization is not yet mature in TOGAF Security practices. These solutions are about integrating security into every step of the development process, ensuring that security is not an afterthought but a fundamental component of our container management strategy.

Container Base Images

Often, you might find Critical and High CVEs that are not under your control but are due to a 3rd-party base image; take Cert-Manager and External-DNS as prime examples, the backbone of many Kubernetes clusters in the wild. These images rely on Google’s Golang images, which in turn use a base image from jammy-tiny-stack. You see where I am going here? Many 3rd-party images can lead you down a rabbit hole.

Remember, the goal is to MANAGE RISK, not eradicate risk; the latter is futile and leads to impractical security management. Look at ways to mitigate risks by reducing public service ingress footprints or improving North/South and East/West firewall solutions such as Calico Cloud. This allows you to contain security threats if a network segment is breached.

False Positives
Many CVE severity ratings overstate the real-world risk, so always weigh the severity against the likelihood. The docker-library FAQ (cited below) explains it well:
Though not every CVE is removed from the images, we take CVEs seriously and try to ensure that images contain the most up-to-date packages available within a reasonable time frame. For many of the Official Images, a security analyzer like Docker Scout or Clair might show CVEs, which can happen for a variety of reasons:

  • The CVE has not been addressed in that particular image
    • Upstream maintainers don’t consider a particular CVE to be a vulnerability that needs fixing, so it won’t be fixed.
      • e.g., CVE-2005-2541 is considered a High severity vulnerability, but in Debian is considered “intended behavior,” making it a feature, not a bug.
    • The OS Security team only has so much available time and has to deprioritize some security fixes over others. This could be because the threat is considered low or because the fix is too intrusive to backport to the version in “stable”. e.g., CVE-2017-15804 is considered a High severity vulnerability, but in Debian it is marked as a “Minor issue” in Stretch and no fix is available.
    • Vulnerabilities may not have an available patch, and so even though they’ve been identified, there is no current solution.
  • The listed CVE is a false positive
    • In order to provide stability, most OS distributions take the fix for a security flaw out of the most recent version of the upstream software package and apply that fix to an older version of the package (known as backporting). e.g., CVE-2020-8169 shows that curl is flawed in versions 7.62.0 through 7.70.0 and so is fixed in 7.71.0. The version that has the fix applied in Debian Buster is 7.64.0-4+deb10u2 (see security-tracker.debian.org and DSA-4881-1).
    • The binary or library is not vulnerable because the vulnerable code is never executed. Security solutions assume that if a dependency has a vulnerability, then the binary or library using the dependency is also vulnerable. This correctly reports vulnerabilities, but the simple approach can also lead to many false positives. It can be improved by using other tools to detect whether the vulnerable functions are actually used; govulncheck is one such tool made for Go-based binaries (see the scanning sketch after this list). e.g., CVE-2023-28642 is a vulnerability in runc before version 1.1.5 but shows up when scanning the gosu 1.16 binary, since runc 1.1.0 is a dependency. Running govulncheck against gosu shows that it does not use any vulnerable runc functions.
    Security scanners can’t reliably check for CVEs, so they use heuristics to determine whether an image is vulnerable. Those heuristics fail to take some factors into account:
    • Is the image affected by the CVE at all? It might not be possible to trigger the vulnerability at all with this image.
    • If the image is not supported by the security scanner, it uses wrong checks to determine whether a fix is included.
      • e.g., For RPM-based OS images, the Red Hat package database is used to map CVEs to package versions. This causes severe mismatches on other RPM-based distros.
      • This also leads to not showing CVEs which actually affect a given image.
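
As a practical starting point, the sketch below scans a base image for the severities we care about and then checks whether vulnerable Go code is actually reachable. This is illustrative only: the image name is a placeholder, and it assumes Docker Scout and govulncheck are installed.

# Scan a 3rd-party base image for Critical/High CVEs (image name is a placeholder)
docker scout cves --only-severity critical,high myregistry.example.com/base-image:latest

# For Go-based projects, check whether the flagged functions are actually called
govulncheck ./...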

Conclusion

By combining a risk-based approach with practical solutions and an eye toward DevSecOps principles, we’re creating a robust security framework that’s both pragmatic and effective. It’s about understanding the risks, prioritizing them intelligently, and taking decisive action to secure our digital landscape.

The TOGAF® Series Guide focuses on integrating risk and security within an enterprise architecture. It provides guidance for security practitioners and enterprise architects on incorporating security and risk management into the TOGAF® framework. This includes aligning with standards like ISO/IEC 27001 for information security management and ISO 31000 for risk management principles. The guide emphasizes the importance of understanding risk in the context of achieving business objectives and promotes a balanced approach to managing both negative consequences and seizing positive opportunities. It highlights the need for a systematic approach, embedding security early in the system development lifecycle and ensuring continuous risk and security management throughout the enterprise architecture.

Sometimes, security has to be driven from the bottom up. Ideally, it should be driven from the top down, but we are all responsible for security; if you own many platforms and compute runtimes in the cloud, you must ensure you manage risk under your watch. Otherwise, it is only a matter of time before you get pwned, something I have witnessed repeatedly.

The Real World:
1. Secure your containers from the bottom up in your CI/CD pipelines with tools like Snyk.
2. Secure your containers from the top down in your cloud infrastructure with tools like Azure Defender – Container Security.
3. Look at ways to enforce the above through governance and policies; this means you REDUCE the likelihood of a threat occurring from both sides of the enterprise.
4. Ensure firewall policies are in place to segment your network so that a breach in one area will not fan out into other network segments. This means you must focus initially on North/South traffic (ingress/egress) and then on East/West traffic (traversing your network segments and domains).

There is a plethora of other risk-management strategies, from penetration testing and honeypots to SIEM. Ultimately, you can all make a difference no matter where in the technology chart you sit.

Principles
The underlying ingredient for establishing a vision for your organisation in the TOGAF framework is defining the principles of the enterprise. In my view, protecting customer data is not just a legal obligation; it’s a fundamental aspect of building trust and ensuring the longevity of an enterprise.

Establishing a TOGAF principle to protect customer data during the Vision stage of enterprise architecture development is crucial because it sets the tone for the entire organization’s approach to cybersecurity. It ensures that data protection is not an afterthought but a core driver of the enterprise’s strategic direction, technology choices, and operational processes. With cyber threats evolving rapidly, embedding a principle of customer data protection early on ensures that security measures are integrated throughout the enterprise from the ground up, leading to a more resilient and responsible business.

Principle 1: Protect our Customer Data



Sources:
GitHub – docker-library/faq: Frequently Asked Questions

https://pubs.opengroup.org/togaf-standard/integrating-risk-and-security/integrating-risk-and-security_0.html

https://www.tigera.io/tigera-products/calico-cloud/

https://snyk.io/

GeekOut – Get vCores for Kubernetes

Often you will be working out licensing costs, and more often than not you will need to know the number of vCores. As a baseline, use the following script, which reads a CSV of clusters and sums the cores across every node pool.
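
The script expects a CSV with ClusterName, Subscription, and ResourceGroup columns; a hypothetical example (the values are made up):

ClusterName,Subscription,ResourceGroup
aks-prod-01,Contoso-Production,rg-aks-prod
aks-dev-01,Contoso-NonProd,rg-aks-dev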

Get-AKSVCores.ps1
<#
.SYNOPSIS
Calculates total vCores for each Azure Kubernetes Service (AKS) cluster listed in a CSV file.

.DESCRIPTION
This script imports a CSV file containing AKS cluster information, iterates through each cluster, and calculates the total number of virtual cores (vCores) based on the node pools associated with each cluster. It requires Azure CLI and assumes that the user has the necessary permissions to access the AKS clusters and VM sizes.

.PARAMETER CsvFilePath
Full path to the CSV file containing AKS cluster information. The CSV file should have columns named 'ClusterName', 'Subscription', and 'ResourceGroup'.

.PARAMETER VmLocation
Azure region to get VM sizes for the calculation. Default is 'Australia East'.

.PARAMETER PerformAzureLogin
Indicates whether the script should perform Azure login. Set to $true if Azure login is required within the script; otherwise, $false. Default is $false.

.EXAMPLE
.\Get-AKSVCores.ps1 -CsvFilePath "C:\path\to\aks_clusters.csv" -VmLocation "Australia East" -PerformAzureLogin $true

This example runs the script with the specified CSV file path, VM location, and performs Azure login within the script.

.INPUTS
CSV file

.OUTPUTS
Console output of each AKS cluster's name, subscription, resource group, and total vCores.

.NOTES
Version: 1.0
Author: Romiko Derbynew
Creation Date: 2024-01-22
Purpose/Change: Get Total VCores for Clusters
#>

param(
    [Parameter(Mandatory = $true)]
    [string]$CsvFilePath,

    [Parameter(Mandatory = $false)]
    [string]$VmLocation = "Australia East",

    [Parameter(Mandatory = $false)]
    [bool]$PerformAzureLogin = $true
)

# Azure login if required
if ($PerformAzureLogin) {
    az login
}

# Import the CSV file
$aksClusters = Import-Csv -Path $CsvFilePath

Write-Host "ClusterName,Subscription,ResourceGroup,TotalVCores"

# Iterate through each AKS cluster
foreach ($cluster in $aksClusters) {
    # Set the current subscription
    az account set --subscription $cluster.Subscription

    # Resource group that hosts the cluster
    $resourceGroup = $cluster.ResourceGroup

    # Get the node pools for the AKS cluster
    $nodePools = az aks nodepool list --resource-group $resourceGroup --cluster-name $cluster.ClusterName --query "[].{name: name, count: count, vmSize: vmSize}" | ConvertFrom-Json

    $totalVCores = 0

    # Iterate through each node pool and calculate total vCores
    foreach ($nodePool in $nodePools) {
        # Look up the number of cores for this node pool's VM size
        $vmSizeDetails = az vm list-sizes --location $VmLocation --query "[?name=='$($nodePool.vmSize)'].{numberOfCores: numberOfCores}" | ConvertFrom-Json
        $vCores = $vmSizeDetails.numberOfCores

        # Add this node pool's contribution (cores per node * node count)
        $totalVCores += $vCores * $nodePool.count
    }

    # Output the total vCores for the cluster
    Write-Host "$($cluster.ClusterName),$($cluster.Subscription),$($cluster.ResourceGroup),$totalVCores"
}

Embracing Microservices Architecture with the TOGAF® Framework

Introduction to Microservices Architecture (MSA) in the TOGAF® Framework

In the ever-evolving digital landscape, the TOGAF® Standard, developed by The Open Group, offers a comprehensive approach for managing and governing Microservices Architecture (MSA) within an enterprise. This guide is dedicated to understanding MSA within the context of the TOGAF® framework, providing insights into the creation and management of MSA and its alignment with business and IT cultures​​.

What is Microservices Architecture (MSA)?

MSA is a style of architecture where systems or applications are composed of independent and self-contained services. Unlike a product framework or platform, MSA is a strategy for building large, distributed systems. Each microservice in MSA is developed, deployed, and operated independently, focusing on a single business function and is self-contained, encapsulating all necessary IT resources. The key characteristics of MSA include service independence, single responsibility, and self-containment​​.

The Role of MSA in Enterprise Architecture

MSA plays a crucial role in simplifying business operations and enhancing interoperability within the business. This architecture style is especially beneficial in dynamic market environments where companies seek to manage complexity and enhance agility. The adoption of MSA leads to better system availability and scalability, two crucial drivers in modern business environments​​.

Aligning MSA with TOGAF® Standards

The TOGAF® Standard, with its comprehensive view of enterprise architecture, is well-suited to support MSA. It encompasses all business activities, capabilities, information, technology, and governance of the enterprise. The Preliminary Phase of TOGAF® focuses on determining the architecture’s scope and principles, which are essential for MSA development. This phase addresses the skills, capabilities, and governance required for MSA and ensures alignment with the overall enterprise architecture​​.

Implementing MSA in an Enterprise

Enterprises adopting MSA should integrate it with their architecture principles, acknowledging the benefits of resilience, scalability, and reliability. Whether adapting a legacy system or launching a new development, the implications for the organization and architecture governance are pivotal. The decision to adopt MSA principles should be consistent with the enterprise’s overall architectural direction​​.

Practical Examples
TOGAF does not dictate which tools to use. However, I have found it very useful to couple Domain-Driven Design with Event Storming on a Miro board, where you can get all stakeholders together and nut out the various domains, subdomains, and events.

https://www.eventstorming.com/

Event Storming – Business Process Example – source Lucidcharts

Within each domain, you can start working on ensuring data is independent as well, with patterns such as Strangler-Fig or IPC.

Extract a service from a monolith

After you identify the ideal service candidate, you must identify a way for both microservice and monolithic modules to coexist. One way to manage this coexistence is to introduce an inter-process communication (IPC) adapter, which can help the modules work together. Over time, the microservice takes on the load and eliminates the monolithic component. This incremental process reduces the risk of moving from the monolithic application to the new microservice because you can detect bugs or performance issues in a gradual fashion.

The following diagram shows how to implement the IPC approach:

An IPC approach is implemented to help modules work together.

Figure 2. An IPC adapter coordinates communication between the monolithic application and a microservices module.

In figure 2, module Z is the service candidate that you want to extract from the monolithic application. Modules X and Y are dependent upon module Z. Microservice modules X and Y use an IPC adapter in the monolithic application to communicate with module Z through a REST API.

The next document in this series, Interservice communication in a microservices setup, describes the Strangler Fig pattern and how to deconstruct a service from the monolith.

Learn more about these patterns here – https://cloud.google.com/architecture/microservices-architecture-refactoring-monoliths

Conclusion

When integrated with the TOGAF® framework, Microservices Architecture (MSA) provides a strong and adaptable method for handling intricate, distributed architectures. Implementing MSA enables businesses to boost their agility, scalability, and resilience, thereby increasing their capacity to adjust to shifting market trends.

During the vision phase, establish your fundamental principles.

Identify key business areas to concentrate on, such as E-Commerce or online services.

Utilize domain-driven design (code patterns) and event storming (practical approach) to delineate domains and subdomains, using this framework as a business reference model to establish the groundwork of your software architecture.

Develop migration strategies like IPC Adapters/Strangler Fig patterns for database decoupling.

In the Technology phase of the ADM, plan for container orchestration tools, for example, Kubernetes.

Subsequently, pass the project to Solution Architects to address the remaining non-functional requirements from observability to security during step F of the ADM. This enables them to define distinct work packages adhering to SMART principles.

TIP: When migrating databases, sometimes the legacy database remains the primary data store and the microservice database is the secondary until a full migration is complete; do not underestimate tools like CDC to assist.

Change Data Capture (CDC) is an approach used by microservices for tracking changes made to the data in a database. It enables microservices to be notified of any modifications in the data so that they can be updated accordingly. This real-time mechanism saves a lot of time that would otherwise be spent on regular database scans. In this blog post, we will explore how CDC can be used with microservices and provide some practical use cases and examples.

Source: Rany ElHousieny, PhDᴬᴮᴰ

References

https://pubs.opengroup.org/togaf-standard/guides/microservices-architecture.html#:~:text=moving%20business%20environment.-,2%20Microservices%20Architecture%20Defined,for%20building%20large%20distributed%20systems.

https://www.lucidchart.com/blog/ddd-event-storming

https://waqasahmeddev.medium.com/how-to-migrate-to-microservices-with-the-strangler-pattern-64f6144ae4db

https://cloud.google.com/architecture/microservices-architecture-refactoring-monoliths

https://learn.microsoft.com/en-us/azure/architecture/patterns/strangler-fig

Navigating the TOGAF Government Reference Model (GRM)

Hey Tech Gurus!

Today, let’s decode the Government Reference Model (GRM) from the TOGAF Series Guide. This model is a game-changer for public sector organizations, aiming to standardize the maze of public sector business architecture.

What is the GRM? The GRM is an exhaustive, mutually exclusive framework designed for the public sector. It categorizes various government departments and provides a unified language to describe their business architecture. It’s split across sectors like Defense and Security, Health and Wellbeing, Education, and more.

Objective and Overview The GRM aims to provide a standard reference model template adaptable across different architectural approaches. It’s all about enabling collaboration between architecture service providers and fostering the Business Architecture profession.

Breaking Down the GRM The GRM is structured into three levels:

  • Level 1: Sectors defining business areas of the government.
  • Level 2: Functions detailing what the government does at an aggregated level.
  • Level 3: Services, further refining government functions at a component level.

Why does GRM matter? For tech folks in the public sector, the GRM is a toolkit to plan and execute effective transformational changes. It’s about understanding the big picture of public services and aligning technology to strategic objectives.

GRM and TOGAF ADM The GRM aligns with Phase B: Business Architecture of the TOGAF ADM (Architecture Development Method). It provides a pattern for accelerating the development of reference models within Business Architecture.

In a Nutshell, GRM is a breakthrough in organizing and understanding the complex ecosystem of public sector services. It’s about bringing consistency, collaboration, and clarity to how we view public sector architecture.

So, next time you’re navigating the complex world of public sector IT, remember that the GRM is your compass!

References
https://pubs.opengroup.org/togaf-standard/reference-models/government-reference-model.html

Untangling TOGAF’s C-MDM (Master Data): A Friendly Guide

Hey Tech Friends,

Let’s decode the TOGAF® Series Guide: Information Architecture – Customer Master Data Management (C-MDM). This document isn’t just about mastering data; it’s a journey into the heart of harmonizing customer data across an organization. The C stands for Customer, and MDM is all about mastering the enterprise’s data.

The Core Idea: C-MDM is all about streamlining and enhancing how an organization manages its customer data. It’s like giving every customer information a VIP treatment, ensuring it’s accurate, accessible, and secure.

Generic Description of the Capabilities of the Organization

Sources: https://pubs.opengroup.org/togaf-standard/master-data-management/index.html (inspired by Michael Porter’s value chain)

Why It Matters: In our tech-driven world, customer data is gold. But it’s not just about having data; it’s about making it work efficiently. C-MDM is the toolkit for ensuring this data is managed smartly, reducing duplication, and enhancing access to this vital resource.

The TOGAF Twist: The guide integrates C-MDM within TOGAF’s Architecture Development Method (ADM). This means it’s not just a standalone concept but a part of the larger enterprise architecture landscape. It’s like having a detailed map for your journey in data management, ensuring every step aligns with the broader organizational goals.

Key Components:

  1. Information Architecture Capability: Think of this as the foundation. It’s about understanding and handling the complexity of data across the organization.
  2. Data Management Capabilities: This is where things get practical. It involves managing the lifecycle of data – from its creation to its retirement.
  3. C-MDM Capability: The star of the show. This section delves into managing customer data as a valuable asset, focusing on quality, availability, and security.
  4. Process and Methodology: Here, the guide adapts TOGAF ADM for C-MDM, offering a structured yet flexible approach to manage customer data.
  5. Reference Models: These models provide a clear picture of what C-MDM entails, including the scope of customer data and detailed business functions.
  6. Integration Methodologies: It’s about fitting C-MDM into the existing IT landscape, ensuring smooth integration and operation.

What’s in It for Tech Gurus? As a tech enthusiast, this guide offers a deep dive into managing customer data with precision. It’s not just about handling data; it’s about transforming it into an asset that drives business value.

 C-MDM capability

Sources: https://pubs.opengroup.org/togaf-standard/master-data-management/index.html

So, whether you’re an enterprise architect, data manager, or just a tech aficionado, this guide is your compass in navigating the complex world of customer data management. It’s about making data not just big, but smart and efficient.

Happy Data Managing!

PS: Fostering a culture of data-driven decisions at all levels of your organisation, from value streams in the Business Domain to Observability in the Technology Domain, will allow your stakeholders and teams to make better strategic and tactical decisions. Invest wisely here and ensure insights are accessible to all key stakeholders – those stakeholders that have the influence and vested interest. This is where AI will revolutionise data-driven decisions; instead of looking at reports, you can “converse” with AI about your data in a customised reference vector DB.

References:

AI Chatbots to make Data-Driven Decisions

Sources: https://pubs.opengroup.org/togaf-standard/master-data-management/index.html

Demystifying TOGAF’s Guide to Enabling Enterprise Agility for Tech Enthusiasts

Hey Tech Wizards!

If you’ve ever wondered how enterprise architecture (EA) can be agile, you’re in for a treat. Let’s dive into the TOGAF® Series Guide on Enabling Enterprise Agility. This guide is not just about making EA more flexible; it’s about integrating agility into the very fabric of enterprise architecture.

First things first, agility in this context is all about being responsive to change, prioritizing value, and being practical. It’s about empowering teams, focusing on customer needs, and continuously improving. This isn’t just theory; it’s about applying these principles to real-life EA.

Agility at Different Levels of Architecture
Source: https://pubs.opengroup.org/togaf-standard/guides/enabling-enterprise-agility/

The guide stresses the importance of Enterprise Architecture in providing a structured yet adaptable framework for change. It’s about understanding and managing complexity, supporting continuous change, and minimizing risks.

One of the core concepts here is the TOGAF Architecture Development Method (ADM). Contrary to popular belief, the ADM isn’t a rigid, waterfall process. It’s flexible and can be adapted for agility. The ADM doesn’t dictate a sequential process or specific phase durations; it’s a reference model defining what needs to be done to deliver structured and rational solutions.

The guide introduces a model with three levels of detail for partitioning architecture development: Enterprise Strategic Architecture, Segment Architecture, and Capability Architecture. Each level has its specific focus and detail, allowing for more manageable and responsive architecture development.

Transition Architectures play a crucial role in Agile environments. They are architecturally significant states, often including several capability increments, providing roadmaps to desired outcomes. They are key to managing risk and understanding incremental states of delivery, especially when implemented through Agile sprints.

The guide also talks about a hierarchy of ADM cycles, emphasizing that ADM phases need not proceed in sequence. This flexibility allows for concurrent work on different segments and capabilities, aligning with Agile principles.

Key takeaways for the tech-savvy:

  • Enterprise Architecture and Agility can coexist and complement each other.
  • The TOGAF ADM is a flexible framework that supports Agile methodologies.
  • Architecture can be developed iteratively, with different levels of detail enabling agility.
  • Transition Architectures are essential in managing risk and implementing Agile principles in EA.
  • The hierarchy of ADM cycles allows for concurrent development across different architecture levels.

In short, this TOGAF Series Guide is a treasure trove for tech enthusiasts looking to merge EA with Agile principles. It’s about bringing structure and flexibility together, paving the way for a more responsive and value-driven approach to enterprise architecture. Happy architecting!

Sources:

https://pubs.opengroup.org/togaf-standard/guides/enabling-enterprise-agility/

Demystifying Business Models in TOGAF for the Technical Guru

Hey Techies,

Are you ready to level up your understanding of business models within the TOGAF framework? Perfect, because today we’re slicing through the complexity and serving up some easy-to-digest insights into how business models can supercharge your architecture endeavors.

Let’s kick off with the basics: a business model is essentially a blueprint for how an organization operates. It’s the behind-the-scenes rationale that shows us how a company creates, delivers, and captures value. Now, why does that matter to you, the tech-savvy mastermind? Because understanding this blueprint is crucial for aligning IT projects with business strategy – and we all know how vital that alignment is for success.

Source: Business Model Generation, Alexander Osterwalder, Yves Pigneur, 2010


Diving into the TOGAF Series Guide, we find that business models are not just about creating a common language for the C-suite but also about setting the stage for innovation and strategic execution. They’re like a high-level visual snapshot of the business – depicting the current state and future aspirations.

But here’s the kicker: while a business model paints the bigger picture, it’s the Business Architecture that adds the fine details. Think of the business model as the sketch of a grand painting, and Business Architecture is the process of bringing that sketch to life with color and texture. It breaks down the business into digestible chunks – capabilities, value streams, organization structures – so that you can see how everything fits together and where IT can play a starring role.

Now, let’s talk about the TOGAF ADM (Architecture Development Method) because that’s where the magic happens. During Phase B: Business Architecture, you’ll use the business model to craft a set of architecture blueprints that outline what the business needs to transform into and how to get there. This is where your technical prowess meets business savvy, as you help define the scope and dive into the details of what’s needed for that transformation.

But what about innovation, you ask? The guide shows us that business model innovation is about steering the ship through the rough seas of change. Whether it’s rethinking customer segments, value propositions, or even cost structures, business models provide the structure for ideation and the testing ground for new strategies.

For example, take a retail business (relatable, right?). Say they’re moving from a brick-and-mortar focus to an online shopping haven. The business model helps leaders visualize this shift and understand the implications across the business. And for you, the tech expert, it’s about understanding those changes to help plot the IT roadmap, identify capability gaps, and ensure that the technology architecture supports this new direction.

So, there you have it – a quick tour through the world of business models in TOGAF. Whether you’re a Platform Manager, Solutions Architect, or any tech role in between, grasping the concept of business models is like finding the Rosetta Stone for enterprise architecture. It helps you translate business strategy into IT action, ensuring that your technical expertise is not just impressive, but impactful.

Remember, as technical people, we’re not just about the bits and bytes; we’re about shaping the business through technology. So, embrace the business model – it’s your secret weapon for making IT integral to business success.

And that’s a wrap on our friendly tech blog! Stay curious, keep learning, and let’s continue to bridge the gap between business and technology. Cheers to innovation and alignment!

P.S. Don’t forget, it’s not about changing the entire business model on a whim; it’s about making informed, strategic adjustments that keep the company agile and ahead of the game. Keep innovating, my friends!

References

https://pubs.opengroup.org/togaf-standard/business-architecture/business-models.html

Integrating TOGAF and Agile Development: A Symbiotic Approach for Effective Architecture

In the rapidly evolving world of software development, misconceptions often arise about the compatibility of different methodologies. A common misbelief is that TOGAF, a comprehensive framework for enterprise architecture, is inherently slow and rigid, akin to waterfall models. However, this overlooks TOGAF’s inherent flexibility and its potential synergy with Agile development practices.

In backlog grooming sessions, developers often prioritize creating a Minimum Viable Product (MVP) that may not align with established Business Architecture and Standards. For instance, they might opt for a custom authentication method instead of using standard protocols like OpenID/SAML and Code Authorization Flow with PKCE. To mitigate this, integrating architectural decisions and evaluations into backlog grooming and sprint planning, possibly extending to Scrum of Scrums, is crucial. This approach can significantly save time and effort by encouraging early collaboration and input from various teams, ensuring adherence to standards and a more cohesive project development phase.

TOGAF, with its structured approach in the Architecture Development Method (ADM), offers a solid foundation for long-term strategic planning. It ensures that all aspects of enterprise architecture are considered, from business strategy to technology infrastructure. Contrary to the notion of it being a static, waterfall-like process, TOGAF can be adapted to fit into Agile’s iterative and incremental model.

Agile, known for its flexibility and rapid response to change, complements TOGAF by injecting speed and adaptability into the architectural planning and execution process. The key lies in integrating Agile sprints within the phases of the ADM. This allows for continuous feedback and iterative development, ensuring that the architecture remains aligned with business needs and can adapt to changing requirements.

The synergy between TOGAF and Agile fosters a holistic approach to software development. It combines the strategic, big-picture perspective of TOGAF with the tactical, fast-paced nature of Agile. This integrated approach enables organizations to be both strategically aligned and agile in execution, ensuring that their architecture is not only robust but also responsive to the dynamic nature of business and technology.

In essence, TOGAF and Agile are not mutually exclusive but can be powerful allies in delivering effective and adaptable enterprise solutions. By understanding and leveraging the strengths of each, organizations can enhance their architectural practices, leading to more successful and sustainable outcomes.

E-Commerce Viewpoint

In an e-commerce setting, integrating Agile sprints within the TOGAF ADM cycle can be exemplified as follows:

  1. Preliminary Phase: Define the scope and vision for the e-commerce project, focusing on key objectives and stakeholders.
  2. Architecture Vision (Phase A): Develop a high-level vision of the desired architecture. An Agile sprint can be used to quickly prototype a customer-facing feature, like a new user interface for the shopping cart.
  3. Business Architecture (Phase B): Detail the business strategy, governance, and processes. Sprints can focus on evolving business requirements, like integrating a new payment gateway.
  4. Information Systems Architectures (Phase C): Define data and application architecture. Agile sprints could focus on implementing a recommendation system for products.
  5. Technology Architecture (Phase D): Establish the technology infrastructure. Sprints might involve deploying cloud services for scalability.
  6. Opportunities & Solutions (Phase E): Identify and evaluate opportunities and solutions. Use sprints to experiment with different solutions like chatbots for customer service.
  7. Migration Planning (Phase F): Plan the move from the current to the future state. Agile methodologies can be used to incrementally implement changes.
  8. Implementation Governance (Phase G): Ensure the architecture is being implemented as planned. Sprints can be used for continuous integration and deployment processes.
  9. Architecture Change Management (Phase H): Manage changes to the new architecture. Agile sprints allow for quick adaptations to customer feedback or market trends.

This approach ensures that the strategic framework of TOGAF and the iterative, responsive nature of Agile work in tandem, driving the e-commerce project towards success with both long-term vision and short-term adaptability.

Agile Board Example

For the use case of integrating a shopping cart with a rewards program from an airline partner, here’s an example of Agile backlog items:

  1. User Story: As a customer, I want to link my airline rewards account with my shopping profile so that I can earn miles on my purchases.
    • Tasks:
      • Design UI/UX for account linking process.
      • Develop API integration with the airline’s rewards system.
  2. User Story: As a user, I want to see how many miles I will earn for each purchase.
    • Tasks:
      • Implement a system to calculate miles earned per purchase.
      • Update the shopping cart UI to display potential rewards.
  3. User Story: As a customer, I want to redeem my miles for discounts on products.
    • Tasks:
      • Create functionality to convert miles into store credits.
      • Integrate this feature into the checkout process.
  4. User Story: As a system administrator, I need a dashboard to monitor the integration and track transactions.
    • Tasks:
      • Develop a dashboard showing real-time data of linked accounts and transactions.
      • Implement reporting tools for transaction analysis.
  5. User Story: As a customer, I want to securely unlink my airline rewards account when needed.
    • Tasks:
      • Develop a secure process for unlinking accounts.
      • Ensure all customer data related to the rewards program is appropriately handled.
  6. User Story: As a marketing manager, I want to create promotions exclusive to customers with linked airline rewards accounts.
    • Tasks:
      • Develop a feature to create and manage exclusive promotions.
      • Integrate promotion visibility based on account link status.

These backlog items can be broken down into smaller tasks and tackled in sprints, allowing for iterative development and continuous feedback while still addressing requirements at the enterprise level, e.g. a reusable Rewards module that can be consumed across multiple brands within the enterprise, addressing Business Architecture in a holistic fashion.

The approach described doesn’t necessarily have to follow a linear, waterfall methodology. It can be more interactive, with different stages addressed flexibly as the Product Owner deems appropriate, such as when defining new Epics.

Consider these examples:

Firstly, the core concept of the rewards program – should it span multiple brands for wider reusability and align with the Business Architecture, or should it concentrate on a single brand? This is where enterprise thinking matters: building solutions for business units in your organisation with a common goal. All too often there are silos within an organisation, and this can be mitigated to a certain extent with an ADM framework such as TOGAF.

Secondly, the choice of hosting environment for the compute runtime is crucial. Options range from VMs, Kubernetes, Azure Container Instances, and AWS ECS to micro-kernels (used in high-frequency trading solutions). Consulting the Technology Architecture phase will guide you in allocating the software runtime to the most suitable compute platform.

Your choice of tooling is totally up to you; UML can often be restrictive due to the skills of an agile Squad focussed on rapid development; you can adapt, bin tools like UML, and opt for tools such as the C4 Model.

I hope this helps you bring some level of Architecture Governance to your organisation – no matter how big or small, and yes, you can leverage these principles in a start-up.

Sources:
https://pubs.opengroup.org/togaf-standard/
https://c4model.com/

The Cloud Chronicles: CAF Landing Zones and Levels Unveiled

Ladies and gentlemen, tech voyagers, and cloud explorers, fasten your seatbelts as we take you on a whimsical journey through the Cloud Adoption Framework (CAF) Landing Zones and Levels.

The Cloud Castle and Its Many Quirks
Imagine the cloud as a majestic castle in the digital skies. To conquer this castle effectively, you need more than just a map; you need a well-organized treasure hunt! That’s where the CAF steps in – it’s like the guidebook to the cloud’s hidden treasures.

Level 0: Core Platform Automation
Welcome to the cloud’s backstage – the place where the real magic happens, but you rarely see it. Level 0 is like the control room of a rock concert; it’s essential but hidden behind the scenes. Here, you’ll find the launchpad with storage accounts, Key Vault, RBAC, and more. It’s where Terraform state files are managed, subscriptions are created, and credentials are rotated. It’s basically the cloud’s secret lair.

Level 1: Core Platform Governance
Up we go to the governance level – it’s like the castle’s council chamber. Here, you’ll find Azure management groups and policies, the rule-makers of the kingdom. They’re like the architects of the castle, designing its layout and enforcing the laws. You’ll also meet the GitOps services creating pipelines and summoning Virtual Networks and compute nodes for DevOps self-hosted agents. It’s where the cloud’s rule-makers and enforcers gather.

Level 2: Core Platform Connectivity
This level is like the kingdom’s bustling market square. Here, you deal with virtual networking components, from classic Virtual Network-based Hubs to Azure Virtual WANs and ExpressRoute connections. It’s like managing the kingdom’s complex highway system. There are also additional identity and management subscription services to keep things running smoothly. It’s the kingdom’s backstage crew, making sure everything runs smoothly.

Level 3: Application Landing Zones (Vending Machine)
Level 3 is where applications come to life – it’s the cloud’s vending machine. It’s where application teams get their subscriptions for different environments – Development, Test, UAT, DR, you name it. This level is like the cloud’s automated snack bar. It also handles privileged infrastructure services, supporting the application platform. Think of it as the royal kitchen, providing ingredients for the culinary masters in Level 4.

Level 4: Applications Landing Zone
Welcome to the cloud’s gourmet restaurant! Here, you’ll find the application configurations delegated to application teams. It’s where Azure Kubernetes Services Cluster, API Management services, and other delicious offerings are prepared. This level is like the cloud’s Michelin-star restaurant, where each team creates their own cloud delicacies.

The following picture illustrates the split between Level 3 and Level 4:

How It All Operates
In this grand castle, deployments are like a well-choreographed ballet. There are pipelines for each level, each with its own unique role:

Level 0 and 1 are the castle’s gatekeepers, ensuring the foundation is solid and the rules are clear.
Level 2 springs into action when new regional hubs or connectivity needs arise – they’re like the kingdom’s travel agents.
Level 3 steps in when a new service needs to be served up – they’re the cloud’s maître d’.
Level 4, the gourmet kitchen, is always bustling with activity as application teams whip up their cloud creations.

Azure Subscription Vending Machine


The Cloud Comedy: Bringing It All Together
In this cloud comedy, we’ve explored the whimsical world of CAF Landing Zones and Levels. It’s like a magical castle with different floors, each with its own quirks and responsibilities. As you journey through the cloud, remember that while it may seem complex, it’s also an adventure filled with opportunities for innovation and transformation.

So, whether you’re the cloud wizard behind the scenes or the master chef creating cloud delicacies, embrace the cloud with a twinkle in your eye. You’ll find that conquering the cloud castle can be an enchanting and delightful experience!

TIPS:

Use Azure Container Instances to spin up Azure DevOps Agents when deploying subscriptions and low-level resources instead of VMs and VM Scale Sets!
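
A hedged sketch of that tip, assuming you have already built a self-hosted agent image following the Azure DevOps container-agent guidance (the AZP_* variable names come from that pattern; the registry, organisation, pool, and token values are placeholders):

az container create `
  --resource-group rg-platform-automation `
  --name devops-agent-01 `
  --image myregistry.azurecr.io/azp-agent:latest `
  --cpu 2 --memory 4 `
  --environment-variables AZP_URL=https://dev.azure.com/contoso AZP_POOL=launchpad-agents `
  --secure-environment-variables AZP_TOKEN=<personal-access-token>

The idea is that the container registers itself in the agent pool for the duration of the deployment and can be deleted afterwards, so the Level 0/1 automation does not need long-lived VMs or VM Scale Sets.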

Use Managed Identity where you can, and only use Service Principals if you cannot find a solution with Managed Identity.