Azure Streaming Analytics – Beware – JSON Engine

After investigating an issue with Azure Streamiung Analytics, we discovered it cannot deserialise JSON that have the same property names but differ in case e.g.

{
  "Proxy": "abc",
  "proxy": "def"
}

If you send the above payload to a Streaming Analytics Job, it will fail.

Source ‘<unknown_location>’ had 1 occurrences of kind ‘InputDeserializerError.InvalidData’ between processing times ‘2020-03-30T00:19:27.8689879Z’ and ‘2020-03-30T00:19:27.8689879Z’. Could not deserialize the input event(s) from resource ‘Partition: [8], Offset: [1], SequenceNumber: [1]’ as Json. Some possible reasons: 1) Malformed events 2) Input source configured with incorrect serialization format

We opened a ticket with Microsoft. This was the response.

“Hi Romiko,

Thank you for being patience with us. I had further discussion with our ASA PG and here’s our findings.

Findings

ASA unfortunately does not support case sensitive column. We understand it is possible for json documents to add to have two columns that differ only in case and that some libraries support it. However there hasn’t been a compelling use case to support it. We will update the documentation as well. 

We are sorry for the inconvenience. If you have any questions or concerns, please feel free to reach out to me. I will be happy to assist you.”

Indeed other libraries do support this, such as powershell, c#, python etc.

C#

using Newtonsoft.Json;

data = “{‘name’: ‘Ashley’, ‘Name’: ‘Romiko’}”

dynamic message = JsonConvert.DeserializeObject(data);

var text = JsonConvert.SerializeObject(message);

I have built the following tool on github as a workaround for streaming data to elastic. https://github.com/Romiko/EventHub-CheckMalformedEvents

A significant reason why Microsoft should support it – is the Elastic Common Schema. (ECS), a new specification that provides a consistent and customizable way to structure your data in Elasticsearch, facilitating the analysis of data from diverse sources. With ECS, analytics content such as dashboards and machine learning jobs can be applied more broadly, searches can be crafted more narrowly, and field names are easier to remember.

When introducing a new schema, there is always dealing with existing/custom data. Elastic have an ingenious way to solve this. All fields in ECS are lower case. So your existing data can be guarnteed to not conflict if you use an UpperCase.

Let us reference Elastic’s advice

https://www.elastic.co/guide/en/ecs/current/ecs-custom-fields-in-ecs.html

image.png

Elastic, who deal with Big Data all the time recommend using Proxy vs proxy to ensure migrations to ECS is a vaiable/conflic free solution.

Conclusion

If you are migrating huge amounts of data to Elastic Common Schema (ECS), Do not use Azure Streaming Analytics. You will get bitten and will have to make other comprimises which will signifcantly reduce your flexbiliy when dealing with conflicting fields in Elastic Indices.

Debugging Azure Event Hubs and Stream Analytics Jobs

When you are dealing with millions of events per day (Json format). You need a debugging tool to deal with events that do no behave as expected.

Recently we had an issue where an Azure Streaming analytics job was in a degraded state. A colleague eventually found the issue to be the output of the Azure Streaming Analytics Job.

The error message was very misleading.

[11:36:35] Source 'EventHub' had 76 occurrences of kind 'InputDeserializerError.TypeConversionError' between processing times '2020-03-24T00:31:36.1109029Z' and '2020-03-24T00:36:35.9676583Z'. Could not deserialize the input event(s) from resource 'Partition: [11], Offset: [86672449297304], SequenceNumber: [137530194]' as Json. Some possible reasons: 1) Malformed events 2) Input source configured with incorrect serialization format\r\n"

The source of the issue was CosmosDB, we need to increase the RU’s. However the error seemed to indicate a serialization issue.

We developed a tool that could subscribe to events at exactly the same time of the error, using the sequence number and partition.

We also wanted to be able to use the tool for a large number of events +- 1 Million per hour.

Please click link to the EventHub .Net client. This tool is optimised to use as little memory as possible and leverage asynchronous file writes for the an optimal event subscription experience (Console app of course).

Have purposely avoided the newton soft library for the final file write to improve the performance.

The output will be a json array of events.

The next time you need to be able to subscribe to event hubs to diagnose an issue with a particular event, I would recommend using this tool to get the events you are interested in analysing.

Thank you.

Troubleshooting Azure Event Hubs and Azure Streaming Analytics

When you are dealing with millions of events per day (Json format). You need a debugging tool to deal with events that do no behave as expected.

Recently we had an issue where Azure Streaming analytics was in a degraded state. A colleague eventually found the issue to be the output of the Azure Streaming Analytics Job.

The error message was very misleading.

[11:36:35] Source 'EventHub' had 76 occurrences of kind 'InputDeserializerError.TypeConversionError' between processing times '2020-03-24T00:31:36.1109029Z' and '2020-03-24T00:36:35.9676583Z'. Could not deserialize the input event(s) from resource 'Partition: [11], Offset: [86672449297304], SequenceNumber: [137530194]' as Json. Some possible reasons: 1) Malformed events 2) Input source configured with incorrect serialization format\r\n"

The source of the issue was CosmosDB, we need to increase the RU’s. However the error seemed to indicate a serialization issue.

We developed a tool that could subscribe to events at exactly the same time of the error, using the sequence number and partition.

We also wanted to be able to use the tool for a large number of events +- 1 Million per hour.

Please click link to the EventHub .Net client. This tool is optimised to use as little memory as possible and leverage asynchronous file writes for the an optimal event subscription experience (Console app of course).

Have purposely avoided the newton soft library for the final file write to improve the performance.

The output will be a json array of events.

using System;
using System.IO;
using System.Text;
using System.Threading;
using System.Threading.Tasks;
using Azure.Messaging.EventHubs.Consumer;
using Microsoft.Extensions.Configuration;
using Newtonsoft.Json;

namespace CheckMalformedEvents
{
    class GetMalformedEvents
    {
        private static string partitionId;
        private static IConfigurationRoot configuration;
        private static string connectionString;
        private static string consumergroup;
        private static EventHubConsumerClient eventHubClient;
        private static EventPosition startingPosition;
        private static DateTimeOffset processingEnqueueEndTimeUTC;

        static void Main(string[] args)

        {
            Init();
            ShowIntro();

            try
            {
                GetEvents(eventHubClient, startingPosition, processingEnqueueEndTimeUTC).Wait();
            }
            catch (AggregateException e)
            {
                Console.WriteLine($"{e.Message}");

            }
            catch (Exception e)
            {
                Console.WriteLine($"{e.Message}");
            }
        }

        private static void Init()
        {
            var builder = new ConfigurationBuilder()
                            .SetBasePath(Directory.GetCurrentDirectory())
                            .AddJsonFile("appsettings.json", optional: true, reloadOnChange: true);

            configuration = builder.Build();

            connectionString = configuration.GetConnectionString("eventhub");
            consumergroup = configuration.GetConnectionString("consumergroup");
            eventHubClient = new EventHubConsumerClient(consumergroup, connectionString);

            partitionId = configuration["partitionId"];
            if (long.TryParse(configuration["SequenceNumber"], out long sequenceNumber) == false)
                throw new ArgumentException("Invalid SequenceNumber");

            processingEnqueueEndTimeUTC = DateTimeOffset.Parse(configuration["ProcessingEnqueueEndTimeUTC"]);
            startingPosition = EventPosition.FromSequenceNumber(sequenceNumber);
        }

        private static void ShowIntro()
        {
            Console.WriteLine("This tool is used to troubleshoot malformed messages in an Azure EventHub");
            Console.WriteLine("Sample Error Message to troubleshoot - First get the errors from the Streaming Analytics Jobs Input blade.\r\n");
        }

        private static async Task<CancellationTokenSource> GetEvents(EventHubConsumerClient eventHubClient, EventPosition startingPosition, DateTimeOffset endEnqueueTime)
        {
            var cancellationSource = new CancellationTokenSource();
            if (int.TryParse(configuration["TerminateAfterSeconds"], out int TerminateAfterSeconds) == false)
                throw new ArgumentException("Invalid TerminateAfterSeconds");

            cancellationSource.CancelAfter(TimeSpan.FromSeconds(TerminateAfterSeconds));
            string path = Path.Combine(Directory.GetCurrentDirectory(), $"{Path.GetRandomFileName()}.json");

            int count = 0;
            byte[] encodedText;
            using FileStream sourceStream = new FileStream(path, FileMode.Append, FileAccess.Write, FileShare.Write, bufferSize: 4096, useAsync: true);
            {
                encodedText = Encoding.Unicode.GetBytes("{\r\n\"events\": [" + Environment.NewLine);
                await sourceStream.WriteAsync(encodedText, 0, encodedText.Length);
                encodedText = Encoding.Unicode.GetBytes("");
                await foreach (PartitionEvent receivedEvent in eventHubClient.ReadEventsFromPartitionAsync(partitionId, startingPosition, cancellationSource.Token))
                {
                    if (encodedText.Length > 0)
                        await sourceStream.WriteAsync(encodedText, 0, encodedText.Length);

                    count++;
                    using var sr = new StreamReader(receivedEvent.Data.BodyAsStream);
                    var data = sr.ReadToEnd();
                    var partition = receivedEvent.Data.PartitionKey;
                    var offset = receivedEvent.Data.Offset;
                    var sequence = receivedEvent.Data.SequenceNumber;

                    try
                    {
                        IsEventValidJson(count, receivedEvent, data, partition, offset, sequence);
                    }
                    catch (Exception ex)
                    {
                        Console.WriteLine($"Serialization issue Partition: { partition}, Offset: {offset}, Sequence Number: { sequence }");
                        Console.WriteLine(ex.Message);
                    }

                    if (receivedEvent.Data.EnqueuedTime > endEnqueueTime)
                    {
                        Console.WriteLine($"Last Message EnqueueTime: {receivedEvent.Data.EnqueuedTime:o}, Offset: {receivedEvent.Data.Offset}, Sequence: {receivedEvent.Data.SequenceNumber}");
                        Console.WriteLine($"Total Events Streamed: {count}");
                        Console.WriteLine($"-----------");
                        break;
                    }

                    encodedText = Encoding.Unicode.GetBytes(data + "," + Environment.NewLine);
                    await sourceStream.WriteAsync(encodedText, 0, encodedText.Length);
                }
                encodedText = await FinaliseFile(encodedText, sourceStream);
            }

            Console.WriteLine($"\r\n Output located at: {path}");
            return cancellationSource;
        }

        private static async Task<byte[]> FinaliseFile(byte[] encodedText, FileStream sourceStream)
        {
            await sourceStream.WriteAsync(encodedText, 0, encodedText.Length - 6); //Remove ,\r\n on last line
            encodedText = Encoding.Unicode.GetBytes("]\r\n}" + Environment.NewLine);
            await sourceStream.WriteAsync(encodedText, 0, encodedText.Length);
            return encodedText;
        }

        private static void IsEventValidJson(int count, PartitionEvent receivedEvent, string data, string partition, long offset, long sequence)
        {
            dynamic message = JsonConvert.DeserializeObject(data);
            message.AzureEventHubsPartition = partition;
            message.AzureEventHubsOffset = offset;
            message.AzureEventHubsSequence = sequence;
            message.AzureEnqueuedTime = receivedEvent.Data.EnqueuedTime.ToString("o");

            if (count == 0)
                Console.WriteLine($"First Message EnqueueTime: {message.AzureEnqueuedTime}, Offset: {message.AzureEventHubsOffset}, Sequence: {message.AzureEventHubsSequence}");
        }
    }
}

The next time you need to be able to subscribe to event hubs to diagnose an issue with a particular event, I would recommend using this tool to get the events you are interested in analysing.

Thank you.

What is Devops – Part 1

Patrick Debois from Belgium is the actual culprit to blame for the term Devops, he wanted more synergy between developers and operations back in 2007.

Fast-forward a few years and now we have “Devops” everywhere we go. If you using the coolest tools in town such as Kubernetes, Azure Devops Pipelines, Jenkins, Grafana etc – then you probably reckon that you are heavy into Devops. This can not be further from the truth.

The fact is that Devops is more about a set of patterns and practices within a culture that nurtures shared responsibilities across all teams during the software development life-cycle.

Put it this way, if you only have 1 dude in your team that is “doing Devops”, then you may want to consider if you are really implementing Devops or one of it’s anti-patterns. Ultimately you need to invest in everyone within the SDLC teams to get on board with the cultural shift.

If we cannot get the majority of engineers involved in the SDLC to share responsibilities, then we have failed at our objectives regarding Devops, even if we using the latest cool tools from Prometheus to AKS/GKE. In a recent project that I was engaged in there was only 1 devops dude, when he fell ill nobody from any of the other engineering teams could perform his duties. Despite the fact that confluence has numerous playbooks and “How To’s”. Why?

It comes down to people, process & culture. All of which can be remedied with strong technical leadership and encouraging your engineers to work with the process and tools in their daily routine. Hence why I encourage developers that are hosting their code on Kubernetes to use Minikube on their laptops.

If there is any advice that I can provide teams that want to implement Devops – Focus on People then Process and finally the Tools.

In order to setup the transition for success – we will discuss in the next part of this series the pillars of Devops.

Installing Kubernetes – The Hard Way – Visual Guide

This is a visual guide to compliment the process of setting up your own Kubernetes Cluster on Google Cloud. This is a visual guide to Kelsey Hightower GIT project called Kubernetes The Hard Way. It can be challenging to remember all the steps a long the way, I found having a visual guide like this valuable to refreshing my memory.

Provision the network in Google Cloud

VPC

Provision Network

Firewall Rules

External IP Address

Provision Controllers and Workers – Compute Instances

Controller and Worker Instances

Workers will have pod CIDR

10.200.0.0/24

10.200.1.0/24

10.200.2.0/24

Provision a CA and TLS Certificates

Certificate Authority

Client & Server Certificates

Kubelet Client Certificates

Controller Manager Client Certificates

Kube Proxy Client Certificates

Scheduler Client Certificates

Kubernetes API Server Certificate

Reference https://github.com/kelseyhightower/kubernetes-the-hard-way/blob/master/docs/04-certificate-authority.md

Service Account Key Pair

Certificate Distribution – Compute Instances

Generating Kubernetes Configuration Files for Authentication

Generating the Data Encryption Config and Key

Bootstrapping etcd cluster

Use TMUX set synchronize-panes on to run on multiple instances at same time. Saves time!

Notice where are using TMUX in a Windows Ubuntu

Linux Subsystem and running commands in parallel to save a lot of time.

The only manual command is actually ssh into each controller, once in, we activate tmux synchronize feature. So what you type in one panel will duplicate to all others.

Bootstrapping the Control Pane (services)

Bootstrapping the Control Pane (LB + Health)

Required Nginx as Google health checks does not support https

Bootstrapping the Control Pane (Cluster Roles)

Bootstrapping the Worker Nodes

Configure kubectl remote access

Provisioning Network Routes

DNS Cluster Add-On

First Pod deployed to cluster – using CoreDNS

Smoke Test

Once you have completed the install of your kubernetes cluster, ensure you tear it down after some time to ensure you do not get billed for the 6 compute instances, load balancer and public statis ip address.

A big thank you to Kelsey for setting up a really comprehensive instruction guide.

Minikube + CloudCode + VSCode – WindDevelopment Environment

As a developer you can deploy your docker containers to a local Kubernetes cluster on your laptop using minikube. You can then use Google Cloud Code extension for Visual Studio Code.

You can then make real time changes to your code and the app will deploy in the background automatically.

  1. Install kubectl – https://kubernetes.io/docs/tasks/tools/install-kubectl/
  2. Install minikube – https://kubernetes.io/docs/tasks/tools/install-minikube/
    1. For Windows users, I recommend the Chocolaty approach
  3. Configure Google Cloud Code to use minikube.
  4. Deploy your application to your local minikube cluster in Visual Studio Code
  5. Ensure you add your container registry in the .vscode\launch.json file – See Appendix

Ensure you running Visual Studio Code as Administrator.

Once deployed, you can make changes to your code and it will automatically be deployed to the cluster.

Quick Start – Create minikube Cluster in Windows (Hyper-V) and deploy a simple web server.

minikube start --vm-driver=hyperv
kubectl create deployment hello-minikube --image=k8s.gcr.io/echoserver:1.10
kubectl expose deployment hello-minikube --type=NodePort --port=8080
kubectl get pod
minikube service hello-minikube --url
minikube dashboard

Grab the output from minikube service hello-minikube –url and browse your web app/service.

Appendix

Starting the Cluster and deploying a default container.

VS Code Deployment

  • Setup your Container Registry in the .vscode\launch.json
  • Click Cloud Code on the bottom tray
  • Click “Run on Kubernetes”
  • Open a separate command prompt as administrator

.vscode\launch.json

{
    "version": "0.2.0",
    "configurations": [
        {
            "name": "Run on Kubernetes",
            "type": "cloudcode.kubernetes",
            "request": "launch",
            "skaffoldConfig": "${workspaceFolder}\\skaffold.yaml",
            "watch": true,
            "cleanUp": true,
            "portForward": true,
            "imageRegistry": "romikocontainerregistry/minikube"
        },
        {
            "podSelector": {
                "app": "node-hello-world"
            },
            "type": "cloudcode",
            "language": "Node",
            "request": "attach",
            "debugPort": 9229,
            "localRoot": "${workspaceFolder}",
            "remoteRoot": "/hello-world",
            "name": "Debug on Kubernetes"
        }
    ]
}
minikube dashboard

We can see our new service is being deployed by VSCode Cloud Code extension. Whenever we make changes to the code, it will automatically deploy.

minikube service nodejs-hello-world-external --url

The above command will give us the url to browse the web app.

If I now change the text for Hello, world! It will automatically deploy. Just refresh the browser 🙂

Here in the status bar we can see deployments as we update code.

Debugging

Once you have deployed your app to Minikube, you can then kick off debugging. This is pretty awesome. Basically your development environment is now a full Kubernetes stack with attached debugging proving a seamless experience.

Check out https://cloud.google.com/code/docs/vscode/debug#nodejs for more information.

You will notice in the launch.json file we setup the debugger port etc. Below I am using port 9229. So all I need to do is start the app with

CMD [“node”, “–inspect=9229”, “app.js”]

or in the launch.json set the “args”: [“–inspect=9229”]. Only supported in launch request type.

Also ensure the Pod Selector is correct. You can use the pod name or label. You can confirm the pod name using the minikube dashboard.

http://127.0.0.1:61668/api/v1/namespaces/kubernetes-dashboard/services/http:kubernetes-dashboard:/proxy/#/pod?namespace=default

{
    "version": "0.2.0",
    "configurations": [
        {
            "name": "Run on Kubernetes",
            "type": "cloudcode.kubernetes",
            "request": "launch",
            "skaffoldConfig": "${workspaceFolder}\\skaffold.yaml",
            "watch": true,
            "cleanUp": true,
            "portForward": true,
            "imageRegistry": "dccausbcontainerregistry/minikube",
            "args": ["--inspect=9229"]
        },
        {
            "name": "Debug on Kubernetes",
            "podSelector": {
                "app": "nodejs-hello-world"
            },
            "type": "cloudcode",
            "language": "Node",
            "request": "attach",
            "debugPort": 9229,
            "localRoot": "${workspaceFolder}",
            "remoteRoot": "/hello-world"
        }
    ]
}

Now we can do the following

  1. Click Run on Kubernetes
  2. Set a Break Point
  3. Click Debug on Kubernetes

TIPS

  • Run command prompt, powershell and vscode as Administrator
  • Use Choco for Windows installs
  • If you going to reboot/sleep/shutdown your machine. Please run:
minikube stop

If you do not, you risk corrupting hyper-v and you will get all sorts of issues.

AzureAD Identity – Implicit Grant between a SPA and API Gateway

Overview

This article will discuss a minimum complex design if you have several SPA Apps or other Apps that call a global API . This design is future proof for when you later need to introduce an API gateway, since an API gateway uses a similar concept of 1 access token for all API’s.

If you have a new in house client app that requires AAD integration for Authentication and Authorization and your apps are all exposed within your organisation network only.

Anyone in the organisation can create apps, however it is important to adhere to some conventions, expecially if you want to reduce the costs around onboarding new apps and the administrative overhead.

If you can affort it, I would recommend https://www.okta.com/ , currently AAD still has a far higher OPEX cost associated with the day to day operations regarding Identity Management versus Azure AD. So lets get started with the Implicit Grant Flow for SPA Apps. Hopefully Microsoft will support the Code Authorization with PKCE for SPA’s soon.

Diagram showing the Implciti Grant Flow in an AAD environment

Design

  • App registrations need to be created per environment (Dev, QA, UAT, Prod).
  • All physical API’s you host are linked to 1 App Registration in AAD (Same Audience)
  • All Clients have their own App registration using the API Audience
  • API exposes a default scope – Later when you implement an API gateway SCOPE logic can reside on the gateway. This keeps the design future proof.

This is for SPA apps that use Implicit Grant Flow. AAD currently only supports this flow for SPA apps; otherwise Code Authorization Flow with PKCE is superior, which OKTA supports 🙂

  1. Users are added to Azure AD Groups
  2. 1 Global API App Registration per environment (Global API)
  3. Application Roles are ONLY created in the Global API APp registration
  4. AD Groups are assigned to Application Roles

Diagram Illustration the AAD Relationship in this design. We only support 1 Audience (The Global API).


Diagram illustrating the AAD implementation Stategy regarding Users, Groups, Apps and Application Roles (API)

Implicit Grant SPA ->Global API

Prerquisite – You already have a Global API App Registration that exposes an API in AAD.

  • Go to the Azure Portal
  • Click “App Registrations”Click “New Registration”
  • Provide a name that adheres to the convention – Department <env> <AppName>SPA
  • Once the app is created
  • Go to the Branding Blade and ensure the Domain is verified and select your domaiun name.
  • Add any custom branding e.g. LogoClick Authentication Blade
  • Enable Implicit Grant Flow by ticking – ID Token, Access Token
  • Record the APP ID by clicking Overview and copying the Application (client) ID

If you do not want a consent dialog to show when the user uses the app. You need to add your shiny new app to the Global API app as a trusted application. Each environment has a GLOBAL API e.g. Department <envcode> API

Navigate to Department<envcode> API

Click “Expose an API

”Click “Add a client Application”

There is another method to add API permissions via the SPA App instead of trusted clients, however this required Global Tenant Admin grants.

  • Add the API’s in the API Permissions blade then
  • Click the Grant/revoke admin consent for XYZ Company Limited button, and then select Yes when you are asked if you want to grant consent for the requested permissions for all account in the tenant. You need to contact an Azure AD tenant admin to do this.

You are now ready to use your new APP for Authentication.Update your code base to use a supported OAuth2.0 and OpenID library e.g. MSAL.JS
https://docs.microsoft.com/en-us/azure/active-directory/develop/reference-v2-librariesEnsure you are compliant e.g. Tokens are being verified

  • All Access Tokens provided by the Global API are valid for 1 hour. You will need to refresh the token for a new one. The library will do this for you.
  • Ensure all Tokens are Verified on the SPA side (ID Token and Access Token)
  • NEVER send the ID Token to the API. Only the Access Token (Bearer)
  • AVOID storing the access token in an insecure storage location

Global API Manifest (Global API App Registration)

  • Create your Global API
  • Ensure the Branding is correct
  • Expose an API e.g. default
  • Add Trusted Client Apps like the SPA above
  • Edit the Manfiest to include any Application or User Roles. Application Roles are great for service to service communication which is not covered in this article.

Here is where you define roles in the app manifest on the Global API App.

"appRoles": [
{
"allowedMemberTypes": [
"User"
],
"description": "Standard users of MyApp1 SPA.",
"displayName": "MyApp1-Payroll",
"id": "12343b9d-2e21-4ac3-b3cd-f0227588fc03",
"isEnabled": true,
"lang": null,
"origin": "Application",
"value": "MyApp1-Payroll"
},
...

Your API will implement RBAC based on the App Roles defined in the Global API Manifest. In AAD Portal, add additional App Roles

[Authorize(Roles = "Manager")]
public class ManagerController : Controller
{
}
 
[Authorize(Roles = "Administrator")]
public class AdministrationController : Controller
{
}
 
public class HomeController : Controller
{
}
 
//Policy based
[Authorize(Policy = "AtLeast21")]
public class AlcoholPurchaseController : Controller
{
    public IActionResult Index() => View();
}

Ensure you are using Microsoft Identity V2!

This design will provide the least amount of administrative overhead for apps that need to be publish internally within the organisation and provides a decent level of security coverage.

Creating a Cloud Architecture Roadmap

Image result for cloud architecture jpg

Overview

When a product has been proved to be a success and has just come out of a MVP (Minimal Viable Product) or MMP (Minimal Marketable Product) state, usually a lot of corners would have been cut in order to get a product out and act on the valuable feedback. So inevitably there will be technical debt to take care of.

What is important is having a technical vision that will reduce costs and provide value/impact/scaleable/resilient/reliable which can then be communicated to all stakeholders.

A lot of cost savings can be made when scaling out by putting together a Cloud Architecture Roadmap. The roadmap can then be communicate with your stakeholders, development teams and most importantly finance. It will provide a high level “map” of where you are now and where you want to be at some point in the future.

A roadmap is every changing, just like when my wife and I go travelling around the world. We will have a roadmap of where want to go for a year but are open to making changes half way through the trip e.g. An earthquake hits a country we planned to visit etc. The same is true in IT, sometimes budgets are cut or a budget surplus needs to be consumed, such events can affect your roadmap.

It is something that you want to review on a regular schedule. Most importantly you want to communicate the roadmap and get feedback from others.

Feedback from other engineers and stakeholders is crucial – they may spot something that you did not or provide some better alternative solutions.

Decomposition

The first stage is to decompose your ideas. Below is a list that helps get me started in the right direction. This is by no means an exhausted list, it will differ based on your industry.

Component Description Example
Application Run-timeWhere apps are hostedAzure Kubernetes
Persistent StorageNon-Volatile DataFile Store
Block Store
Object Store
CDN
Message
Database
Cache
Backup/RecoveryBackup/Redundant SolutionsManaged Services
Azure OMS
Recovery Vaults
Volume Images
GEO Redundancy
Data/IOTConnected Devices / SensorsStreaming Analytics
Event Hubs
AI/Machine Learning
GatewayHow services are accessedAzure Front Door, NGIX, Application Gateway, WAF, Kubernetes Ingress Controllers
Hybrid ConnectivityOn-Premise Access
Cross Cloud
Express Route
Jumpboxes
VPN
Citrix
Source ControlWhere code lives
Build – CI/CD
Github, Bitbucket
Azure Devops, Octopus Deploy, Jenkins
Certificate ManagementSSL CertificatesAzure Key Vault
SSL Offloading strategies
Secret ManagementStore sensitive configurationPuppet (Hiera), Azure Keyvault, Lastpass, 1Password
Mobile Device ManagementGoogle Play
AppStore
G-Suite Enterprise MDM etc

Once you have an idea of all your components. The next step is to breakdown your road-map into milestones that will ultimately assist in reaching your final/target state. Which of course will not be final in a few years time 😉 or even months!

Sample Roadmap

Below is a link to a google slide presentation that you can use for your roadmap.

https://docs.google.com/presentation/d/1Hvw46vcWJyEW5b7o4Xet7jrrZ17Q0PVzQxJBzzmcn2U/edit?usp=sharing

Architecture Decisions – Keep a record

Image result for decisions

There are several decision we make every day some conscious and many sub conscious. We have a bit more control over the conscious decisions we make in the work place from an Architecture perspective.

If you are into Development, Devops, Site reliability, Technical Product Owner, Architect or event a contract/consultant; you will be contributing to significant decision regarding engineering.

  1. What Database Technology should we use?
  2. What Search Technology will we use that can scale and do we leverage eventual consistency?
  3. What Container Orchestration or Micro-Service platform shall we use?

When making a decision in 2016, the decision may have been perfectly valid for the technology choices for that time. Fast forward to 2019 and if faced with the exact same decision your solution may be entirely different.

This is absolutely normal and this why it is important to have a “journal” where you outline the key reasons/rationale for a significant architecture decision.

It lays the foundation to effectively communicate with stakeholders and to “sell’ your decisions to others; even better to collaborate with others in a manner that is constructive to evaluating feedback and adjusting key decisions.

I keep a journal of decisions and use a powershell inspired naming convention of Verb-Noun. Secondly I will look at what is trending in the marketplace to use as a guide post. So for a logging/Tracking/Metrics stack, I might start off with reference materials.

https://12factor.net/logs – Generalized Concepts around the stack

https://docs.microsoft.com/en-us/azure/architecture/patterns/ – More specific towards the technology we will use

This allows me to keep on track with what the industry is doing and forces me to keep up to date with best practices.

Below is sample Decision Record that I use. I hope you may find it useful. I often use them when on-boarding consultants/contractors or new members of the team. It is a great way for them to gain insights into the past and where we going.

In the next blog post, I will discuss formulating an Architecture Roadmap and how to effectively communicate your vision with key stakeholders. Until then, happy decisions and do not be surprised when you sometimes convince yourself out of a bad decision that you made 😉

Now…How do I tell my wife we should do this at home when buying the next sofa?

TITLE (Verb-Description-# e.g. Choose-MetricsTracingLoggingStack)

Status

Context

<what is the issue that we’re seeing that is motivating this decision or change.>

Constraints

<what boundaries are in place e.g. cost, technology knowledge/resources at hand>

Decision

<what is the change/transformation that we’re actually proposing or doing.>

Consequences

OAuth 2.0 – Authorization Code with PKCE vs Implicit Grant

Image result for authorization

A lot of organisations are still using the Implicit Flow for authorization when their client applications are browser based e.g. ReactJS/NodeJS. The problem with this workflow is that it was designed when browsers had a lot less capabilities.

Implicit Grant

Implicit Grant flow leverages a redirect with the access token in the url.

If someone gains access to your browser history and your token has not yet expired. They can then gain full access to your resources.

As we can see above, this is in the url on a redirect. So a simple script can find these tokens in your browser history. Step 6 below in the Implicit grant is where the issue occurs and the token is recorded in your history.

If your clients are using modern browsers that support CORS via javascript. Then the solution is to use a flow where step 6 is not an HTTP Get Redirect (302). Ideally we want an http post.

Authorization Workflow with PKCE

Proof Key for Code Exchange – The PKCE extension prevents an attack where the authorization code is intercepted and exchanged for an access token by a malicious client, by providing the authorization server with a way to verify the same client instance that exchanges the authorization code is the same one that initiated the flow.

Secondly instead of an HTTP GET, an HTTP POST is used to send the token over the wire. Thus the exchange is not recorded in the browser history at all. The token is sent as a payload in the HTTP data section and not the URL.

Notice below the token is requested via an HTTP POST (ClientID, (v), (a) on step 8.

  1. The user clicks Login within the native/mobile application.
  2. Auth0’s SDK creates a cryptographically-random code_verifier and from this generates a code_challenge.
  3. Auth0’s SDK redirects the user to the Auth0 Authorization Server (/authorize endpoint) along with the code_challenge.
  4. Your Auth0 Authorization Server redirects the user to the login and authorization prompt.
  5. The user authenticates using one of the configured login options and may see a consent page listing the permissions Auth0 will give to the mobile application.
  6. Your Auth0 Authorization Server stores the code_challenge and redirects the user back to the application with an authorization code.
  7. Auth0’s SDK sends this code and the code_verifier (created in step 2) to the Auth0 Authorization Server (/oauth/token endpoint).
  8. Your Auth0 Authorization Server verifies the code_challenge and code_verifier.
  9. Your Auth0 Authorization Server responds with an ID Token and Access Token (and optionally, a Refresh Token).
  10. Your application can use the Access Token to call an API to access information about the user.
  11. The API responds with requested data.

So… I should just use Authorization Workflow with PKCE? Not so fast. If you have a large customer base that are using older browsers that do not support CORS via javascript, you might be stuck with implicit grant.

Another consideration is that the token endpoint on your Authorization Server of choice MUST support CORS for the trick to come together; not every major vendor supports it yet.

Figure 2: Authorization code grant in a SPA application

However, if you can influence you customers and client browsers to use later versions and security is a big item on your list. This might be the best case to put forward an upgrade plan.

In Summary

  • Does your Authorization Server supprot CORS?
  • Can your clients use modern browsers that support CORS?

If the answer is yes to both, then there is no need to use Implicit Grant with OAuth 2.0

OKTA ands Auth0 are some of the ID providers that support PKCE in SPA clients.

Note: Microsoft Identity V2.0 does not currently support Auth Code Workflow with SPA. This may change in the future as it is a new product and MS are investing in V2.

image2019-9-12_9-2-23.png