Creating a Cloud Architecture Roadmap

Overview

When a product has been proven a success and has just come out of an MVP (Minimum Viable Product) or MMP (Minimal Marketable Product) state, a lot of corners will usually have been cut in order to get the product out and act on valuable feedback. So inevitably there will be technical debt to take care of.

What is important is having a technical vision that will reduce costs and provide value and impact while staying scalable, resilient and reliable, and which can then be communicated to all stakeholders.

A lot of cost savings can be made when scaling out by putting together a Cloud Architecture Roadmap. The roadmap can then be communicated to your stakeholders, development teams and, most importantly, finance. It will provide a high level “map” of where you are now and where you want to be at some point in the future.

A roadmap is ever changing, just like when my wife and I go travelling around the world. We will have a roadmap of where we want to go for a year but are open to making changes halfway through the trip, e.g. an earthquake hits a country we planned to visit. The same is true in IT; sometimes budgets are cut or a budget surplus needs to be consumed, and such events can affect your roadmap.

It is something that you want to review on a regular schedule. Most importantly, you want to communicate the roadmap and get feedback from others.

Feedback from other engineers and stakeholders is crucial – they may spot something that you did not or provide some better alternative solutions.

Decomposition

The first stage is to decompose your ideas. Below is a list that helps get me started in the right direction. This is by no means an exhaustive list; it will differ based on your industry.

Component (description): example technologies

• Application Run-time (where apps are hosted): Azure Kubernetes
• Persistent Storage (non-volatile data): File Store, Block Store, Object Store, CDN, Message, Database, Cache
• Backup/Recovery (backup/redundant solutions): Managed Services, Azure OMS, Recovery Vaults, Volume Images, GEO Redundancy
• Data/IoT (connected devices/sensors): Streaming Analytics, Event Hubs, AI/Machine Learning
• Gateway (how services are accessed): Azure Front Door, NGINX, Application Gateway, WAF, Kubernetes Ingress Controllers
• Hybrid Connectivity (on-premise access, cross cloud): Express Route, Jumpboxes, VPN, Citrix
• Source Control (where code lives): GitHub, Bitbucket
• Build – CI/CD: Azure DevOps, Octopus Deploy, Jenkins
• Certificate Management (SSL certificates): Azure Key Vault, SSL offloading strategies
• Secret Management (store sensitive configuration): Puppet (Hiera), Azure Key Vault, LastPass, 1Password
• Mobile Device Management: Google Play, AppStore, G-Suite Enterprise MDM etc.

Once you have an idea of all your components, the next step is to break down your roadmap into milestones that will ultimately assist in reaching your final/target state, which of course will not be final in a few years' time 😉 or even months!
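
As a purely hypothetical illustration (the component names come from the list above), the milestones for a small SaaS product might look something like:

Milestone 1 – Consolidate workloads into a single VNET, baseline current costs and tag all resources.
Milestone 2 – Containerise the core services and move them to a managed Kubernetes runtime.
Milestone 3 – Introduce a WAF-enabled gateway, centralised secret management and geo-redundant backup/recovery.
Target state – Fully automated, idempotent infrastructure-as-code deployments across all environments.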

Sample Roadmap

Below is a link to a Google Slides presentation that you can use for your roadmap.

https://docs.google.com/presentation/d/1Hvw46vcWJyEW5b7o4Xet7jrrZ17Q0PVzQxJBzzmcn2U/edit?usp=sharing

Architecture Decisions – Keep a record

There are several decisions we make every day, some conscious and many subconscious. We have a bit more control over the conscious decisions we make in the workplace from an architecture perspective.

If you work in development, DevOps, site reliability or architecture, as a technical product owner, or even as a contractor/consultant, you will be contributing to significant engineering decisions, for example:

  1. What Database Technology should we use?
  2. What Search Technology will we use that can scale and do we leverage eventual consistency?
  3. What Container Orchestration or Micro-Service platform shall we use?

A decision made in 2016 may have been perfectly valid for the technology choices available at that time. Fast forward to 2019 and, if faced with the exact same decision, your solution may be entirely different.

This is absolutely normal, and this is why it is important to have a “journal” where you outline the key reasons/rationale for a significant architecture decision.

It lays the foundation to effectively communicate with stakeholders and to “sell” your decisions to others; even better, to collaborate with others in a manner that is constructive to evaluating feedback and adjusting key decisions.

I keep a journal of decisions and use a PowerShell-inspired naming convention of Verb-Noun. Secondly, I will look at what is trending in the marketplace to use as a guide post. So for a logging/tracing/metrics stack, I might start off with reference materials:

https://12factor.net/logs – Generalized Concepts around the stack

https://docs.microsoft.com/en-us/azure/architecture/patterns/ – More specific towards the technology we will use

This allows me to keep on track with what the industry is doing and forces me to keep up to date with best practices.

Below is a sample Decision Record that I use. I hope you find it useful. I often use them when on-boarding consultants/contractors or new members of the team. It is a great way for them to gain insights into the past and where we are going.

In the next blog post, I will discuss formulating an Architecture Roadmap and how to effectively communicate your vision with key stakeholders. Until then, happy decisions, and do not be surprised when you sometimes talk yourself out of a bad decision 😉

Now…How do I tell my wife we should do this at home when buying the next sofa?

TITLE (Verb-Description-# e.g. Choose-MetricsTracingLoggingStack)

Status

<proposed, accepted, deprecated or superseded>

Context

<what is the issue that we’re seeing that is motivating this decision or change.>

Constraints

<what boundaries are in place e.g. cost, technology knowledge/resources at hand>

Decision

<what is the change/transformation that we’re actually proposing or doing.>

Consequences

<what becomes easier or more difficult to do because of this change>

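For illustration, a hypothetical completed record (the technology choices are placeholders, not recommendations) might read:

TITLE: Choose-MetricsTracingLoggingStack

Status: Accepted

Context: Logs are scattered across individual VMs and are lost when instances are recycled, which makes production incidents hard to diagnose.

Constraints: Limited budget for third-party SaaS; the team already has Azure and Kubernetes experience.

Decision: Treat logs as event streams (per https://12factor.net/logs) and ship them to a centralised, managed logging service.

Consequences: Faster incident diagnosis and consistent dashboards, at the cost of an ingestion bill and an on-boarding exercise for the team.
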
OAuth 2.0 – Authorization Code with PKCE vs Implicit Grant

A lot of organisations are still using the Implicit Flow for authorization when their client applications are browser based, e.g. ReactJS/NodeJS. The problem with this workflow is that it was designed when browsers had far fewer capabilities.

Implicit Grant

The Implicit Grant flow leverages a redirect with the access token in the URL.

If someone gains access to your browser history and your token has not yet expired, they can then gain full access to your resources.

Because the token travels in the URL on a redirect, a simple script can find these tokens in your browser history. Step 6 of the Implicit Grant flow is where the issue occurs and the token is recorded in your history.

If your clients are using modern browsers that support CORS via JavaScript, then the solution is to use a flow where step 6 is not an HTTP GET redirect (302). Ideally we want an HTTP POST.

Authorization Workflow with PKCE

Proof Key for Code Exchange – The PKCE extension prevents an attack where the authorization code is intercepted and exchanged for an access token by a malicious client, by providing the authorization server with a way to verify the same client instance that exchanges the authorization code is the same one that initiated the flow.

Secondly, instead of an HTTP GET, an HTTP POST is used to send the token over the wire. Thus the exchange is not recorded in the browser history at all. The token is sent as a payload in the HTTP body and not in the URL.
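
As a rough sketch of what happens in steps 2 and 7 of the flow below, the code_verifier/code_challenge pair can be generated along these lines (assuming openssl is available; in practice your identity provider's SDK does this for you):

# Hypothetical sketch - real SDKs (e.g. Auth0's) generate these values for you
code_verifier=$(openssl rand -base64 60 | tr -d '\n' | tr '+/' '-_' | tr -d '=')    # cryptographically random, URL-safe string
code_challenge=$(printf '%s' "$code_verifier" | openssl dgst -sha256 -binary | openssl base64 | tr -d '\n' | tr '+/' '-_' | tr -d '=')    # S256 hash of the verifier
echo "code_verifier:  $code_verifier"     # kept by the client and sent only on the token request
echo "code_challenge: $code_challenge"    # sent on the /authorize request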

Notice in the steps below that the token is requested via an HTTP POST containing the client ID, code_verifier and authorization code (steps 7 and 8).

  1. The user clicks Login within the native/mobile application.
  2. Auth0’s SDK creates a cryptographically-random code_verifier and from this generates a code_challenge.
  3. Auth0’s SDK redirects the user to the Auth0 Authorization Server (/authorize endpoint) along with the code_challenge.
  4. Your Auth0 Authorization Server redirects the user to the login and authorization prompt.
  5. The user authenticates using one of the configured login options and may see a consent page listing the permissions Auth0 will give to the mobile application.
  6. Your Auth0 Authorization Server stores the code_challenge and redirects the user back to the application with an authorization code.
  7. Auth0’s SDK sends this code and the code_verifier (created in step 2) to the Auth0 Authorization Server (/oauth/token endpoint).
  8. Your Auth0 Authorization Server verifies the code_challenge and code_verifier.
  9. Your Auth0 Authorization Server responds with an ID Token and Access Token (and optionally, a Refresh Token).
  10. Your application can use the Access Token to call an API to access information about the user.
  11. The API responds with requested data.

So… should I just use the Authorization Code flow with PKCE? Not so fast. If you have a large customer base using older browsers that do not support CORS via JavaScript, you might be stuck with the Implicit Grant.

Another consideration is that the token endpoint on your Authorization Server of choice MUST support CORS for the trick to come together; not every major vendor supports it yet.

Figure 2: Authorization code grant in a SPA application

However, if you can influence your customers and client browsers to use later versions, and security is a big item on your list, this might be the best case to put forward for an upgrade plan.

In Summary

  • Does your Authorization Server support CORS?
  • Can your clients use modern browsers that support CORS?

If the answer is yes to both, then there is no need to use the Implicit Grant with OAuth 2.0.

Okta and Auth0 are some of the identity providers that support PKCE in SPA clients.

Note: Microsoft Identity Platform v2.0 does not currently support the Authorization Code flow for SPAs. This may change in the future as it is a new product and Microsoft is investing in v2.

Automate your Azure Kubernetes Upgrades – AKS

Recently, a security patch was released by Microsoft. We wanted to ensure we have a predictable upgrade path.

Below is a Bash script that leverages the Azure CLI to control the upgrade process.

It will:
* Detect the upgradable versions
* Automatically select the latest upgradable version

Tested on Ubuntu 18.

#!/usr/bin/env bash

set -e
echo "------------------------------------------------------------------------------------------------------------------"
echo "When you upgrade an AKS cluster, Kubernetes minor versions cannot be skipped."
echo "For example, upgrades between 1.12.x -> 1.13.x or 1.13.x -> 1.14.x are allowed, however 1.12.x -> 1.14.x is not."
echo "To upgrade, from 1.12.x -> 1.14.x, first upgrade from 1.12.x -> 1.13.x, then upgrade from 1.13.x -> 1.14.x."
echo "------------------------------------------------------------------------------------------------------------------"

while ! [[ "$env" =~ ^(sb|dv|ut|pd)$ ]]
do
  echo "Please specifiy environment [sb, dv,ut,pd]?"
  read -r env
done

case $env in

  dv)
    az account set --subscription 'RangerRom DEV'
    subscriptionid=$(az account show --subscription 'RangerRom DEV' --query id | sed  's/\"//g')
    ;;

  sb)
    az account set --subscription 'RangerRom SANDBOX'
    subscriptionid=$(az account show --subscription 'RangerRom SANDBOX' --query id | sed  's/\"//g')
    ;;

  ut)
    az account set --subscription 'RangerRom TEST'
    subscriptionid=$(az account show --subscription 'RangerRom TEST' --query id | sed  's/\"//g')
    ;;

  pd)
    az account set --subscription 'RangerRom PROD'
    subscriptionid=$(az account show --subscription 'RangerRom PROD' --query id | sed  's/\"//g')
    ;;
  *)
    echo "environment not found"
    exit
    ;;
esac

env="dccau${env}"

az aks get-credentials --resource-group "${env}-k8s-rg" --name "${env}-k8s-cluster" --overwrite-existing

echo "Getting the upgrade versions available for a managed AKS: ${env}-k8s-cluster."
az aks get-upgrades --resource-group "${env}-k8s-rg" --name "${env}-k8s-cluster" --output table

echo "Detecting the next minor version to upgrade to."
versionToUpgradeTo=$(az aks get-upgrades --resource-group "${env}-k8s-rg" --name "${env}-k8s-cluster" | grep "kubernetesVersion" | cut -d'"' -f4 | sort -V | tail -n1) # sort -V orders by version number rather than lexically
echo "Upgrading to version $versionToUpgradeTo"

az aks upgrade --resource-group "${env}-k8s-rg" --name "${env}-k8s-cluster" --kubernetes-version $versionToUpgradeTo
echo "Upgrade complete. Please run this again if you need to upgrade to the next minor version."

Python – Virtual Environments

A lot of people learn coding by starting with the traditional “hello world” application. I am intrigued that not a lot of time goes into discussing the coding environment setup.

When I have the luxury of working on a greenfield project, I will set the expectation of spending at least 3 weeks getting the “process” right.

Week 1 – Setup the Agile/Kanban board and plan PBI’s around CI/CD, Infrastructure as code.

Week 2 – Development environment setup

Week 3 – Fully automated deployment of a “hello world” application to the cloud, encompassing automated builds, gated releases, centralised containers (microservices) etc.

Coming back to the development environment: this alone can increase developer productivity by ensuring developers are set up correctly. Otherwise they may spend hours trying to resolve shared package library conflicts.

pyenv-virtualenv

pyenv-virtualenv is a pyenv plugin that provides features to manage virtualenvs and conda environments for Python on UNIX-like systems.

pyenv lets you easily switch between multiple versions of Python. It’s simple, unobtrusive, and follows the UNIX tradition of single-purpose tools that do one thing well.

This project was forked from rbenv and ruby-build, and modified for Python.

It allows us to have complete isolation between projects, so you can focus on code and not package hell. Remember the DLL hell days?

# Create Environment
$ pyenv virtualenv 3.7.3 data-collector && pyenv activate data-collector

# Do some Coding!

# Exit virtual environment
(data-collector)$ pyenv deactivate

Now when we need to work on another project, we do not need to worry about which data-collector package dependencies are installed; we can just switch to a new environment.

$ pyenv virtualenv 3.7.3 libor-bank-rates && pyenv activate libor-bank-rates

# Do some Coding!

(libor-bank-rates)$ pyenv deactivate

So use pyenv-virtualenv to auto-activate your environments as you work from one project to the next.

$ pyenv virtualenv 3.7.3 data-collector
$ pyenv virtualenv 3.7.3 libor-bank-rates
$ cd ~/data-collector
$ pyenv local data-collector
(data-collector)$ cd ~/libor-bank-rates
$ pyenv local libor-bank-rates
(libor-bank-rates)$ pyenv versions
  system
  3.7.3
  3.7.3/envs/data-collector
  3.7.3/envs/libor-bank-rates
  data-collector
* libor-bank-rates (set by /home/romiko/libor-bank-rates/.python-version)

Also check out Conda.

Conda is an open source package management system and environment management system that runs on Windows, macOS and Linux. Conda quickly installs, runs and updates packages and their dependencies. Conda easily creates, saves, loads and switches between environments on your local computer. It was created for Python programs, but it can package and distribute software for any language

# Create virtual environment
$ conda create --name rangerrom python=3.7.3

# Activate virtual environment
$ conda activate rangerrom

# Exit virtual environment
(rangerrom)$ conda deactivate

Conda will install the version of Python if it isn’t installed. You do not need to run conda install python=3.7.3 first. It has full support for managing virtual environments.
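
If you need to reproduce a Conda environment on another machine, a quick sketch is to export the environment definition and recreate it from the file:

# Capture the environment definition
(rangerrom)$ conda env export > environment.yml

# Recreate it elsewhere
$ conda env create -f environment.yml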

Summary

So the next time you decide you need to whip up some new scripts, have a good think about how you want the environment to be set up and how package management and dependencies should be handled, before writing the infamous “hello world”.

Check out https://realpython.com for awesome tips, tricks and inspiration.

Automate the Deployment of Azure Kubernetes Services Cluster + Application Gateway Ingress Controller

This post will demonstrate how to deploy an AKS cluster using Advanced Networking. We will then deploy an Application Gateway Ingress Controller. Essentially this will install a dedicated ingress pod that fully manages the Application Gateway.

This means all entries in the Application Gateway are 100% managed by AKS. If you manually add an entry to the AG, it will be removed by the AKS Ingress Controller.

Overview

Considerations

  • Decided to dedicate an entire /16 IP range to the AKS cluster for simplicity, e.g. 10.69.0.0/16.
  • Leverage AKS with Advanced Networking (CNI).
  • CNI enables the use of an Application Gateway with WAF v2.
  • SSL offloading is configured. The actual private key (PEM, Base64 encoded) is stored in the default namespace in AKS. Whenever you deploy a new application, just copy the key to the new namespace (the --export flag is deprecated, so the script below uses a workaround). The AG Ingress Controller will automatically be configured with the SSL certificate.
  • We will apply RBAC rules so AKS can manage the Application Gateway and VMSS scale set.
  • RBAC to access the container registry.

By using an Application Gateway, we can leverage additional benefits such as the Web Application Firewall (v2) and OWASP 3.0 firewall detection/prevention rules. Microsoft has totally refactored the AG WAF v2 technology stack; it is much faster to provision and can deal with much larger amounts of traffic now.

By combining load balancing with a WAF, we get the best of both worlds. If you have heavy traffic, it might be good to first run a performance test before making a final decision on the AG + AKS stack.

Environment Setup + Tools

We are using the AKS VMSS preview feature. Azure Virtual Machine Scale Sets have been around for a long time and are in fact used by Microsoft Service Fabric. It makes total sense for AKS to leverage this auto-scaling architecture.

Due to the preview status of Container Services and VMSS + AKS, we will use the Azure CLI.

You can use the Windows Subsystem for Linux (Ubuntu) or a native Ubuntu shell.

Run the following code to setup your bash environment.

#!/bin/bash
echo "Updating system..."
sudo apt-get update
sudo apt-get upgrade

echo "Installing AzureCLI"
curl -sL https://aka.ms/InstallAzureCLIDeb | sudo bash

echo "Installing helm for AKS Admin"
curl -LO https://git.io/get_helm.sh
chmod 700 get_helm.sh
./get_helm.sh
helm init --service-account tiller --history-max 200

Helm is a client-side tool used to provide configuration settings to AKS. Tiller is the server-side component that runs on AKS and applies the configuration settings supplied by the Helm client.

Create a config folder with 2 files.

Replace THECERTIFICATECHAIN with the contents of your Base64 encoded .cer certificate chain. The script will replace THEPRIVATEKEY when you paste your private key. Future namespaces or apps will be able to find this in the default namespace, so it is a one-time operation,

e.g. (kubectl get secret rangerrom-tls --namespace=default ….).

apiVersion: v1
kind: Secret
metadata:
  name: rangerrom-tls
type: kubernetes.io/tls
data:
  tls.crt: THECERTIFICATECHAIN
  tls.key: THEPRIVATEKEY

rangerrom-tls.yml

apiVersion: v1
kind: ServiceAccount
metadata:
  name: tiller
  namespace: kube-system
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: tiller
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: cluster-admin
subjects:
  - kind: ServiceAccount
    name: tiller
    namespace: kube-system

tiller-rbac.yml

Azure Prerequisites

Ensure you have:
* A VNET in a resource group – ${env}-network-rg
* A subnet named ${env}-aks-cluster-subnet matching the IP rules (see the example below if these do not exist yet)

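If the VNET and subnets do not exist yet, they can be created along these lines (a sketch only; the address prefixes are hypothetical and the flag names may differ slightly between Azure CLI versions):

#!/bin/bash
# Hypothetical address space - align this with your own IP plan (e.g. a /16 per cluster)
az group create --location australiaeast --name "${env}-network-rg"
az network vnet create \
    --resource-group "${env}-network-rg" \
    --name "${env}-network" \
    --address-prefixes 10.69.0.0/16 \
    --subnet-name "${env}-aks-cluster-subnet" \
    --subnet-prefixes 10.69.0.0/20
az network vnet subnet create \
    --resource-group "${env}-network-rg" \
    --vnet-name "${env}-network" \
    --name "${env}-aks-application-gateway-subnet" \
    --address-prefixes 10.69.16.0/24
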
Install AKS into existing VNET

#!/bin/bash
aksversion='1.13.7'
while ! [[ "$env" =~ ^(sb|dv|ut|pd)$ ]] 
do
  echo "Please specifiy environment [sb, dv,ut,pd]?"
  read -r env
done 

case $env in

  dv)
    servicecidr="10.66.64.0/18"
    dnsserver="10.66.64.10"
    az account set --subscription 'RangerRom DEV'
    subscriptionid=$(az account show --subscription 'RangerRom DEV' --query id | sed  's/\"//g')
    ;;

  sb)
    servicecidr="10.69.64.0/18"
    dnsserver="10.69.64.10"
    az account set --subscription 'RangerRom SANDBOX'
    subscriptionid=$(az account show --subscription 'RangerRom SANDBOX' --query id | sed  's/\"//g')
    ;;

  ut)
    servicecidr="10.70.64.0/18"
    dnsserver="10.70.64.10"
    az account set --subscription 'RangerRom TEST'
    subscriptionid=$(az account show --subscription 'RangerRom TEST' --query id | sed  's/\"//g')
    ;;

  pd)
    servicecidr="10.68.64.0/18"
    dnsserver="10.68.64.10"
    az account set --subscription 'RangerRom PROD'
    subscriptionid=$(az account show --subscription 'RangerRom PROD' --query id | sed  's/\"//g')
    ;;  
  *)
    echo "environment not found"
    exit
    ;;
esac

env="rrau${env}"
location="australiaeast"

az group create --location $location --name "${env}-aks-rg"
sleep 5

az feature register -n VMSSPreview --namespace Microsoft.ContainerService
az provider register -n Microsoft.ContainerService

az aks create \
    --resource-group "${env}-aks-rg" \
    --name "${env}-aks-cluster" \
    --enable-vmss \
    --node-count 2 \
    --kubernetes-version $aksversion \
    --generate-ssh-keys \
    --network-plugin azure \
    --service-cidr $servicecidr \
    --dns-service-ip $dnsserver \
    --vnet-subnet-id "/subscriptions/${subscriptionid}/resourceGroups/${env}-network-rg/providers/Microsoft.Network/virtualNetworks/${env}-network/subnets/${env}-aks-cluster-subnet"



clusterprincipalid=$(az ad sp list --display-name ${env}-aks-cluster --query [0].objectId)
resourceGroupid=$(az group show --name ${env}-network-rg --query 'id')
echo "Configuring cluster to owner ${resourceGroupid}"
cmd="az role assignment create --role Contributor --assignee $clusterprincipalid --scope $resourceGroupid"
eval $cmd


echo "Configuring AKS Cluster with Tiller"
az aks get-credentials --resource-group "${env}-aks-rg" --name "${env}-aks-cluster" --overwrite-existing
kubectl apply -f ./config/tiller-rbac.yml
helm init --service-account tiller

while ! [[ ${#privatekey} -gt 2000 ]]
do
  echo "Please provide TLS Private Key - BASE64 Encoded PEM?"
  read -r privatekey
done

kubectl create namespace scpi-${env}
cat ./config/rangerrom-tls.yml  | sed "s/THEPRIVATEKEY/$privatekey/" > temptls.yml
kubectl apply -f temptls.yml -n default
#The Flag --export is going to be deprecated - Below is workaround.
kubectl get secret rangerrom-tls --namespace=default -o yaml | \
   sed '/^.*creationTimestamp:/d' |\
   sed '/^.*namespace:/d' |\
   sed '/^.*resourceVersion:/d' |\
   sed '/^.*selfLink:/d' |\
   sed '/^.*uid:/d' |\
   kubectl apply --namespace=scpi-${env} -f -

rm -f ./temptls.yml
privatekey=""

echo "Setup container registry permissions"
az acr create -n "${env}containerregistry" -g "${env}-common-rg" --sku Premium
containerid=$(az acr show -n ${env}containerregistry --query id)
principalidaks=$(az ad sp list --all --query "([?contains(to_string(displayName),'"${env}-aks-cluster"')].objectId)[0]")
cmd1="az role assignment create --role acrpull --assignee $principalidaks --scope $containerid"
eval $cmd1

echo "Setup container registry permissions - Centralised"
containerid=$(az acr show --subscription 'RangerRom PROD' -n rraupdcontainerregistry --query id)
cmd2="az role assignment create --role acrpull --assignee $principalidaks --scope $containerid"
eval $cmd2

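To confirm the ACR pull permissions actually landed, you can list the role assignments for the AKS service principal (same variables and eval pattern as above):

cmd="az role assignment list --assignee $principalidaks --scope $containerid --output table"
eval $cmd
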
Install and Configure AKS to control the Ingress

#!/bin/bash
while ! [[ "$env" =~ ^(sb|dv|ut|pd)$ ]] 
do
  echo "Ensure you have owner permissions on the subscription before you continue."
  echo "Please specifiy environment [sb, dv,ut,pd]?"
  read -r env
done 

case $env in

  dv)
    az account set --subscription 'RangerRom DEV'
    ;;

  sb)
    az account set --subscription 'RangerRom SANDBOX'
    ;;

  ut)
    az account set --subscription 'RangerRom TEST'
    ;;

  pd)
    az account set --subscription 'RangerRom PROD'
    ;;  
  *)
    echo "Invalid Environment"
    exit
    ;;
esac

env="rrau${env}"

ipAddressName="${env}-aks-application-gateway-ip"
resourcegroup="MC_${env}-aks-rg_${env}-aks-cluster_australiaeast"
gatewayname="${env}-aks-application-gateway"
location="australiaeast"
vnet="${env}-network"
subnet="${env}-aks-application-gateway-subnet"

az network public-ip create \
  --resource-group $resourcegroup \
  --name $ipAddressName \
  --allocation-method Static \
  --sku Standard

sleep 20

subnetid=$(az network vnet subnet show -g "${env}-network-rg" -n "${env}-aks-application-gateway-subnet" --vnet-name ${vnet} --query id)
cmd="az network application-gateway create --name $gatewayname \
                                      --resource-group $resourcegroup \
                                      --capacity 2  \
                                      --sku "WAF_v2" \
                                      --subnet $subnetid \
                                      --http-settings-cookie-based-affinity Disabled \
                                      --location $location \
                                      --frontend-port 80 \
                                      --public-ip-address $ipAddressName"
eval $cmd
az network application-gateway waf-config set -g $resourcegroup --gateway-name $gatewayname \
                            --enabled true --firewall-mode Detection --rule-set-version 3.0

#Setup AAD POD Identity to manage application gateway
az identity create -g $resourcegroup -n "${env}-aks-aad_pod_identity"
sleep 20

principalid=$(az identity show -g $resourcegroup -n "${env}-aks-aad_pod_identity" --query 'principalId')
appgatewayid=$(az network application-gateway show -g $resourcegroup -n $gatewayname --query 'id')

echo "Assign Role so AKS can manage the Application Gateway"

echo "Configuring Create Role for identity - $principalid - for gateway"
cmd="az role assignment create --role Contributor --assignee $principalid --scope $appgatewayid"
eval $cmd

resourceGroupid=$(az group show --name $resourcegroup --query 'id')

echo "Configuring Read Role for identity - $principalid - for gateway resourcegroup"
cmd="az role assignment create --role Reader --assignee $principalid --scope $resourceGroupid"
eval $cmd

az identity show -g $resourcegroup -n "${env}-aks-aad_pod_identity"
echo "Please use the azure identity details above to configure AKS via Help for the AG Ingress Controller"
echo "Careful with copy and paste. Hidden characters can affect the values!"


echo "Ingress Controller for Azure Application Gateway"
az aks get-credentials --resource-group "${env}-aks-rg" --name "${env}-aks-cluster"
kubectl create -f https://raw.githubusercontent.com/Azure/aad-pod-identity/master/deploy/infra/deployment-rbac.yaml
helm repo add application-gateway-kubernetes-ingress https://appgwingress.blob.core.windows.net/ingress-azure-helm-package/
helm repo update

subscriptionid=$(az account show --query id | sed  's/\"//g')
appGatewayResourceId=$(az network application-gateway show -g $resourcegroup -n $gatewayname --query resourceGroup  | sed  's/\"//g')
identityClientid=$(az identity show -g $resourcegroup -n "${env}-aks-aad_pod_identity" --query clientId  | sed  's/\"//g')
aksfqdn=$(az aks show --resource-group "${env}-aks-rg" --name "${env}-aks-cluster" --query fqdn  | sed  's/\"//g')

cmd="helm upgrade ingress-azure application-gateway-kubernetes-ingress/ingress-azure \
     --install \
     --namespace default \
     --debug \
     --set appgw.name=$gatewayname \
     --set appgw.resourceGroup=$appGatewayResourceId \
     --set appgw.subscriptionId=$subscriptionid \
     --set appgw.shared=false \
     --set armAuth.type=aadPodIdentity \
     --set armAuth.identityResourceID=/subscriptions/$subscriptionid/resourcegroups/$appGatewayResourceId/providers/Microsoft.ManagedIdentity/userAssignedIdentities/$env-aks-aad_pod_identity \
     --set armAuth.identityClientID=$identityClientid \
     --set rbac.enabled=true \
     --set verbosityLevel=3 \
     --set aksClusterConfiguration.apiServerAddress=$aksfqdn"
eval $cmd
kubectl get pods

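With the controller running, any Ingress resource annotated for the Application Gateway will be translated into AG listeners and rules. As a hypothetical example (the app name, host and backend service are placeholders; the TLS secret is the one copied into the namespace earlier), an application could be exposed like this:

kubectl apply -n scpi-${env} -f - <<EOF
apiVersion: extensions/v1beta1
kind: Ingress
metadata:
  name: myapp-ingress
  annotations:
    kubernetes.io/ingress.class: azure/application-gateway
spec:
  tls:
    - secretName: rangerrom-tls
      hosts:
        - myapp.rangerrom.com
  rules:
    - host: myapp.rangerrom.com
      http:
        paths:
          - backend:
              serviceName: myapp-service
              servicePort: 80
EOF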

Conclusion

This post should provide a guide to setting up your infrastructure as code. By adopting a rock-solid naming convention, you can use fully automated scripts to deploy your environments. The above scripts for AKS and the AG are also idempotent, so they can be run on a scheduled basis, e.g. from Azure DevOps.

Writing a Singleton Class in .NET

Usually you will register your classes in your Dependency Container of choice.

However, if you really need to do it manually, here is sample code that always ensures there is only one taxi instance (the poor business is not going to do too well; definitely not a scalable business).


using System;
using System.Collections.Generic;

namespace Patterns.Singleton
{
    class TaxiSchedule
    {
        static void Main()
        {
            var t1 = Taxi.GetTaxi();
            var t2 = Taxi.GetTaxi();
            var t3 = Taxi.GetTaxi();
            var t4 = Taxi.GetTaxi();

            if (t1 == t2 && t2 == t3 && t3 == t4)
                Console.WriteLine("They are the same!\n");

            var taxi = Taxi.GetTaxi();
            for (int i = 0; i < 24; i++)
                Console.WriteLine("Wake Up: " + taxi.NextDriver.Name);

            Console.ReadKey();
        }
    }

    /// <summary>
    /// Singleton Taxi
    /// </summary>
    sealed class Taxi
    {
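        // Eagerly-created single instance; static initialisation in .NET is thread-safe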
        static readonly Taxi instance = new Taxi();
        IList<Driver> drivers;
        Random random = new Random();
        int currentDriver = 0;

        private Taxi()
        {
            drivers = new List<Driver>
                {
                  new Driver{ Name = "Joe" },
                  new Driver{ Name = "Bob" },
                  new Driver{ Name = "Harry" },
                  new Driver{ Name = "Ford" },
                };
        }

        public static Taxi GetTaxi()
        {
            return instance;
        }

        public Driver NextDriver
        {
            get
            {
                if (currentDriver >= drivers.Count)
                    currentDriver = 0;
                currentDriver++;
                return drivers[currentDriver -1];
            }
        }
    }

    class Driver
    {
        public string Name { get; set; }
    }
}

Query Azure AppInsights with Powershell

In order to query AppInsights using PowerShell, you will need your AppInsights AppId and API key.

The important consideration is to ensure your JSON is valid, so always run it through a parser and use the correct escape characters for both JSON and PowerShell. Have a look at the string in $queryData.

The following code will query AppInsights and generate CSV files based on the batch size. It also uses paging by leveraging:

| serialize | extend rn = row_number()

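Unescaped, the paging portion of the query in $queryData boils down to something like this (simplified; the real query also extracts the tenant/transaction fields before summarising):

traces
| where message contains "Max Retry Count reached" and message contains "MessageService"
| serialize
| extend rn = row_number()
| where rn > 0 and rn <= 10000    // first batch; each subsequent batch shifts this window by the batch size
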
Happy DevOps Reporting 🙂

param (
[Parameter(Mandatory = $true)]
[string] $AppInsightsId,

[Parameter(Mandatory = $true)]
[string] $apiKey,

[Parameter(Mandatory = $false)]
[string]
$Timespan = "P7D",

[Parameter(Mandatory = $false)]
[int]
$batchSize = "10000",

[Parameter(Mandatory = $false)]
[string]
$OutputFolder = "C:\Output\",

[Parameter(Mandatory = $false)]
[string]$logFileName = "AppQuery.log",

[Parameter(Mandatory = $false)]
[string]$logFolder = "C:\Logs\"
)

Add-Type -AssemblyName System.Web

Import-Module .\Helpers.psm1 -Force -ErrorAction Stop
Import-Module .\Shared.Logging.psm1 -Force -Global
Import-Module .\Security.Helpers.psm1 -Force -Global

function prepareFileHeader($filenumber, $columnNames) {
    $csvString = ""
    ForEach ($Property in $columnNames)
    {
        $csvString += "$($Property.Name),"
    }
    $csvString = $csvString.Substring(0, $csvString.Length - 1)
    $file = Join-Path $OutputFolder "batch-$i.csv"
    $csvString | Out-File -filepath $file -Encoding utf8
    $csvString = $null
}

function writeRecordsToFile($records) {
    ForEach ($record in $records)
    {
        $csvString = ""
        foreach ($cell in $record) {
            $csvString += "$cell,"
        }
        $csvString = $csvString.Substring(0, $csvString.Length - 1)
        $file = Join-Path $OutputFolder "batch-$i.csv"
        $csvString | Out-File -filepath $file -Encoding utf8 -Append -NoClobber
        $csvString = $null
    }
}

$logFilePath = PrepareToLog $logFolder $logFileName

try {
$url = "https://api.applicationinsights.io/v1/apps/$AppInsightsId/query"
$headers = @{"Content-Type" = "application/json"}
$headers.add("x-api-key", $apiKey)
$queryString = "?timespan=$Timespan"
$fullUrl = $url + $queryString

$queryTotalMessageCount = "traces\r\n | where message contains \`"Max Retry Count reached\`" and message contains \`"MessageService\`"\r\n | summarize count()"
$queryTotalMessageCountBody = "{
`"query`": `"$queryTotalMessageCount`"
}"
$resultCount = Invoke-WebRequest -Uri $fullUrl -Headers $headers -Method POST -Body $queryTotalMessageCountBody -ErrorAction Continue
$totalObject = ConvertFrom-Json $resultCount.Content

$totalRecords = $totalObject.tables.rows[0]
$pages = [math]::ceiling($totalRecords/$batchSize)
$startRow = 0
$endRow = $batchSize

Write-Host "Total Files: $pages for Batch Size: $batchSize"
For ($i=1; $i -le $pages; $i++) {
Write-Host "Processing File: C:\batch-$i.csv"
$queryData = "traces\r\n | extend TenantId = extract(\`"Tenant Id \\\\[[a-z0-9A-Z-.]*\\\\]\`", 0, message) | extend UniqueTransactionId = extract(\`"\\\\[[a-z0-9A-Z-. _\\\\^]*\\\\]\`",0 ,extract(\`"Message Transaction \\\\[[a-z0-9A-Z-._]*\\\\]\`", 0, message))\r\n | extend TransactionId = trim_start(\`"\\\\[\`", tostring(split(UniqueTransactionId, \`"_\`") [0]))\r\n | extend TransactionDateTicks = tostring(split(UniqueTransactionId, \`"_\`") [1])\r\n | extend PrincipalId = trim_end(\`"\\\\]\`", tostring(split(UniqueTransactionId, \`"_\`") [2]))\r\n | where message contains \`"Max Retry Count reached\`" and message contains \`"MessageService\`"\r\n | project TransactionId, TransactionDateTicks, PrincipalId, TenantId\r\n | summarize ErrorCount = count(TransactionId) by TransactionId, TransactionDateTicks, PrincipalId, TenantId\r\n | serialize | extend rn = row_number()\r\n | where rn > $startRow and rn <= $endRow"
$queryBody = "{
`"query`": `"$queryData`"
}"
$result = Invoke-WebRequest -Uri $fullUrl -Headers $headers -Method POST -Body $queryBody -ErrorAction Continue
$data = ConvertFrom-Json $result.Content
$startRow += $batchSize
$endRow += $batchSize

if($i -eq 1) {
$columnNames = $data.tables.columns | select name
}

prepareFileHeader $i $columnNames
writeRecordsToFile $data.tables.rows
}
} catch {
LogErrorMessage -msg $error[0] -filePath $logFilePath -fatal $true
}

ARM – Modular Templates – Reference resources already created

Hi,

I noticed the Microsoft documentation related to the following function is a little bit vague.

reference(resourceName or resourceIdentifier, [apiVersion], [‘Full’])

The second issue I see a lot of people having is how to reference a resource already created in ARM and get some of that object's properties, e.g. the FQDN of a public IP that already exists.

The clue to solving this issue, so that ARM Template B can reference a resource created in ARM Template A, can be found here:

By using the reference function, you implicitly declare that one resource depends on another resource if the referenced resource is provisioned within same template and you refer to the resource by its name (not resource ID). You don’t need to also use the dependsOn property. The function isn’t evaluated until the referenced resource has completed deployment.

Or use linked templates (linked templates are a huge rework and you need to host the files on the net). Let's see if we can do it via resourceId.

Therefore if we do reference a resource by resourceId, we will remove the implicit “depends on”, allowing ARM Template B to use a resource created in a totally different ARM template.

A great example might be the FQDN on an IP Address.

Imagine ARM Template A creates the IP Address


"resources": [
{
"apiVersion": "[variables('publicIPApiVersion')]",
"type": "Microsoft.Network/publicIPAddresses",
"name": "[variables('AppPublicIPName')]",
"location": "[variables('computeLocation')]",
"properties": {
"dnsSettings": {
"domainNameLabel": "[variables('AppDnsName')]"
},
"publicIPAllocationMethod": "Dynamic"
},
"tags": {
"resourceType": "Service Fabric",
"scaleSetName": "[parameters('scaleSetName')]"
}
}]

Now imagine we need to get the FQDN of the IP Address in ARM Template B.

What we are going to do is try this:

reference(resourceIdentifier, [apiVersion]) becomes
reference(resourceId(…), [apiVersion])

Here is an example where ARM template B references a resource in A and gets a property.


"managementEndpoint": "[concat('https://',reference(resourceId('Microsoft.Network/publicIPAddresses/',variables('AppPublicIPName')), variables('publicIPApiVersion')).dnsSettings.fqdn,':',variables('nodeTypePrimaryServiceFabricHttpPort'))]",

The important thing here is to ensure you always include the API Version. This pattern is a very powerful way to create smaller and more modular ARM templates.

Note: In the above pattern, you do not need to define dependsOn in ARM Template B, as we are explicitly defining a reference to an existing resource. ARM Template B is not responsible for creating the public IP. If you need it, you run ARM Template A.

So if you need a reference to existing resources use the above. If you need a reference to resources created in the SAME ARM template use:

reference(resourceName)

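For the same-template case, using the variable from Template A above, that would look something like (a one-line sketch):

"[reference(variables('AppPublicIPName')).dnsSettings.fqdn]"
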
Cheers

Service Fabric – Upgrading VMSS Disks, Operating System on Primary Node Type

How do you upgrade the existing Data Disk on a primary Node Type Virtual Machine ScaleSet in Service Fabric?

How do you upgrade the existing Operating System on a primary Node Type VMSS in Service Fabric?

How do you move the Data Disk on a primary Node Type VMSS in Service Fabric?

How do you monitor the status during the upgrade, so you know exactly how many seed nodes have migrated over to the new scale set?

Note – We successfully increased the SKU size as well; however, this is not supported by Microsoft. Just increase your SKU in ARM and later, after the successful transfer to the new VMSS, run Update-AzureRmServiceFabricDurability.

Considerations

  • You have knowledge to use ARM to deploy an Azure Load Balancer
  • You have knowledge to use ARM to deploy a VMSS Scale Set
  • Service Fabric Durability Tier/Reliability Tier must be at least Silver
  • Keep the original Azure DNS name on the Load Balancer that is used to connect to the Service Fabric Endpoint. Very Important to write it down as a backup
  • You will need to reduce the TTL of all your DNS settings so that downtime during the upgrade is limited to roughly the TTL value, e.g. 10 minutes. (Ensure you have access to your primary DNS provider to do this.)
  • Prepare an ARM template to add the new Azure Load Balancer that the new VMSS scaleset will attach to (Backend Pool)
  • Prepare an ARM template to add the new VMSS to an existing Service Fabric primary Node Type
  • Deploy the new Azure Load Balancer + Virtual Machine Scale Set to the Service Fabric Primary node
  • Run the RemoveScaleSetFromClusterController.ps1 – Run this script on the NEW node in the NEW VMSS. This script will monitor and facilitate moving the Primary Node Type to the new VMSS for you.  It will show you the status of the Seed nodes moving from the original Primary Node Type to the new VMSS.
  • When it has completed, the last step will be to update DNS.
  • Run MoveDNSToNewPublicIPController.ps1

ARM Templates

You will need only 2 templates: one to deploy a new Azure Load Balancer and one to deploy the new VMSS scale set into the existing Service Fabric cluster.

You will also need a PowerShell script that will run as a custom script extension.

Custom Script – prepare_sf_vm.ps1


$disks = Get-Disk | Where partitionstyle -eq 'raw' | sort number

$letters = 70..89 | ForEach-Object { [char]$_ }
$count = 0
$label = "datadisk"

foreach ($disk in $disks) {
    $driveLetter = $letters[$count].ToString()
    $disk |
        Initialize-Disk -PartitionStyle GPT -PassThru |
        New-Partition -UseMaximumSize -DriveLetter $driveLetter |
        Format-Volume -FileSystem NTFS -NewFileSystemLabel "$label$count" -Confirm:$false -Force
    $count++
}

# Disable Windows Update
Set-ItemProperty -Path 'HKLM:\SOFTWARE\Policies\Microsoft\Windows\WindowsUpdate\AU' -Name NoAutoUpdate -Value 1

 

Load Balancer – azuredeploy_servicefabric_loadbalancer.json

Use your particular Load Balancer ARM templates. No need to attach a backend pool, as this will be done by the VMSS script below.

Service Fabric attach new VMSS – azuredeploy_add_new_VMSS_to_nodeType.json

Create your own VMSS scale set that you attach to Service Fabric. The important aspects are the following:

nodeTypeRef (to attach the VMSS to the existing primary node type)
dataPath (to use a new disk for data)
dataDisk (to add a new managed physical disk)

We use F:\ onwards, as D: is reserved for temp storage and E: is reserved for a CD-ROM in Azure VMs.


{
                                "name": "[concat('ServiceFabricNodeVmExt',variables('vmNodeType0Name'))]",
                                "properties": {
                                    "type": "ServiceFabricNode",
                                    "autoUpgradeMinorVersion": true,
                                    "protectedSettings": {
                                        "StorageAccountKey1": "[listKeys(resourceId('Microsoft.Storage/storageAccounts', variables('supportLogStorageAccountName')),'2015-05-01-preview').key1]",
                                        "StorageAccountKey2": "[listKeys(resourceId('Microsoft.Storage/storageAccounts', variables('supportLogStorageAccountName')),'2015-05-01-preview').key2]"
                                    },
                                    "publisher": "Microsoft.Azure.ServiceFabric",
                                    "settings": {
                                        "clusterEndpoint": "[parameters('existingClusterConnectionEndpoint')]",
                                        "nodeTypeRef": "[parameters('existingNodeTypeName')]",
                                        "dataPath": "F:\\\\SvcFab",
                                        "durabilityLevel": "Silver",
                                        "enableParallelJobs": true,
                                        "nicPrefixOverride": "[variables('subnet0Prefix')]",
                                        "certificate": {
                                            "thumbprint": "[parameters('certificateThumbprint')]",
                                            "x509StoreName": "[parameters('certificateStoreValue')]"
                                        }
                                    },
                                    "typeHandlerVersion": "1.0"
                                }
                            },
....
.......
.........
"storageProfile": {
                        "imageReference": {
                            "publisher": "[parameters('vmImagePublisher')]",
                            "offer": "[parameters('vmImageOffer')]",
                            "sku": "2016-Datacenter-with-Containers",
                            "version": "[parameters('vmImageVersion')]"
                        },
                        "osDisk": {
                            "managedDisk": {
                                "storageAccountType": "[parameters('storageAccountType')]"
                            },
                            "caching": "ReadWrite",
                            "createOption": "FromImage"
                        },
                        "dataDisks": [
                            {
                                "managedDisk": {
                                    "storageAccountType": "[parameters('storageAccountType')]"
                                },
                                "lun": 0,
                                "createOption": "Empty",
                                "diskSizeGB": "[parameters('dataDiskSize')]",
                                "caching": "None"
                            }
                        ]
                    }

...
....
.....
 "virtualMachineProfile": {
                    "extensionProfile": {
                        "extensions": [
                            {
                                "name": "PrepareDataDisk",
                                "properties": {
                                    "publisher": "Microsoft.Compute",
                                    "type": "CustomScriptExtension",
                                    "typeHandlerVersion": "1.8",
                                    "autoUpgradeMinorVersion": true,
                                    "settings": {
                                    "fileUris": [
                                        "[variables('vmssSetupScriptUrl')]"
                                    ],
                                    "commandToExecute": "[concat('powershell -ExecutionPolicy Unrestricted -File prepare_sf_vm.ps1 ')]"
                                    }
                                }
                            },


 

Once you have the new VMSS scale set attached to the existing node type, you should see the extra nodes in Service Fabric. The next step is to disable and remove the existing VMSS scale set. This is an online operation, so you should be fine. However, later we will need to update DNS for the cluster endpoint. This is important so that the PowerShell admin tools can still connect to the Service Fabric cluster.

RemoveScaleSetFromClusterController.ps1

Remote into one of the NEW VMSS virtual machines and run the following command. It will make dead sure that your seed nodes migrate over. It can take a long time (the Microsoft docs say it takes a long time, but how long?). It depends; for a cluster with 5 seed nodes, it took nearly 4 hours! So be patient and update the loop timeout to match your environment; increase the timeout if you have more than 5 seed nodes. My general rule is to allow 45 minutes per seed node transfer.


#Requires -Version 5.0
#Requires -RunAsAdministrator



param (
    [Parameter(Mandatory = $true)]
    [string]
    $subscriptionName,

    [Parameter(Mandatory = $true)]
    [string] 
    $scaleSetToDisable,

    [Parameter(Mandatory = $true)]
    [string]
    $scaleSetToEnable,

    [Parameter(Mandatory = $true)]
    [string] 
    $resourceGroupName
)

Install-Module AzureRM.Compute -Force

Import-Module ServiceFabric -Force
Import-Module AzureRM.Compute -Force

function Disable-InternetExplorerESC {
    $AdminKey = "HKLM:\SOFTWARE\Microsoft\Active Setup\Installed Components\{A509B1A7-37EF-4b3f-8CFC-4F3A74704073}"
    $UserKey = "HKLM:\SOFTWARE\Microsoft\Active Setup\Installed Components\{A509B1A8-37EF-4b3f-8CFC-4F3A74704073}"
    Set-ItemProperty -Path $AdminKey -Name "IsInstalled" -Value 0
    Set-ItemProperty -Path $UserKey -Name "IsInstalled" -Value 0
    Stop-Process -Name Explorer
    Write-Host "IE Enhanced Security Configuration (ESC) has been disabled." -ForegroundColor Green
}

function Enable-InternetExplorerESC {
    $AdminKey = "HKLM:\SOFTWARE\Microsoft\Active Setup\Installed Components\{A509B1A7-37EF-4b3f-8CFC-4F3A74704073}"
    $UserKey = "HKLM:\SOFTWARE\Microsoft\Active Setup\Installed Components\{A509B1A8-37EF-4b3f-8CFC-4F3A74704073}"
    Set-ItemProperty -Path $AdminKey -Name "IsInstalled" -Value 1
    Set-ItemProperty -Path $UserKey -Name "IsInstalled" -Value 1
    Stop-Process -Name Explorer
    Write-Host "IE Enhanced Security Configuration (ESC) has been enabled." -ForegroundColor Green
}

$ErrorActionPreference = "Stop"

Disable-InternetExplorerESC

Login-AzureRmAccount -SubscriptionName $subscriptionName

Write-Host "Before you continue:  Ensure IE Enhanced Security is off."
Write-Host "Before you continue:  Ensure your new scaleset is ALREADY added to the Service Fabric Cluster"
Pause

try {
    Connect-ServiceFabricCluster
    Get-ServiceFabricClusterHealth
} catch {
    Write-Error "Please run this script from one of the new nodes in the cluster."
}

Write-Host "Please do not continue unless the Cluster is healthy and both Scale Sets are present in the SFCluster."
Pause

$nodesToDisable = Get-ServiceFabricNode | Where NodeName -match "_($scaleSetToDisable)_\d+"
$OldSeedCount = ( $nodesToDisable | Where IsSeedNode -eq  $true | Measure-Object).Count
$nodesToEnable = Get-ServiceFabricNode | Where NodeName -match "_($scaleSetToEnable)_\d+"

if($OldSeedCount -eq 0){
    Write-Error "Node Seed count must be greater than zero."
    exit
}

if($nodesToDisable.Count -eq 0){
    Write-Error "No nodes to disable found."
    exit
}

if($nodesToEnable.Count -eq 0){
    Write-Error "No nodes to enable found."
    exit
}

If (-not ($nodesToEnable.Count -ge $OldSeedCount)) {
    Write-Error "The new VM Scale Set must have at least $OldSeedCount nodes in order for the Seed Nodes to migrate over."
    exit
}

Write-Host "Disabling nodes in VMSS $scaleSetToDisable. Are you sure?"
Pause

foreach($node in $nodesToDisable){
    Disable-ServiceFabricNode -NodeName $node.NodeName -Intent RemoveNode -Force
}

Write-Host "Checking node status..."
$loopTimeout = 360
$loopWait = 60
$oldNodesDeactivated = $false
$newSeedNodesReady = $false

while ($loopTimeout -ne 0) {
    Get-Date -Format o
    Write-Host
    Write-Host "Nodes To Remove"

    foreach($nodeToDisable in $nodesToDisable) {
        $state = Get-ServiceFabricNode -NodeName $nodeToDisable.NodeName
        $msg = "{0} NodeDeactivationInfo: {1} IsSeedNode: {2} NodeStatus {3}" -f $nodeToDisable.NodeName, $state.NodeDeactivationInfo.Status, $state.IsSeedNode, $state.NodeStatus
        Write-Host $msg
    }

    $oldNodesDeactivated = ($nodesToDisable |  Where-Object { ($_.NodeStatus -eq [System.Fabric.Query.NodeStatus]::Disabled) -and ($_.NodeDeactivationInfo.Status -eq "Completed") } | Measure-Object).Count -eq $nodesToDisable.Count

    Write-Host
    Write-Host "Nodes To Add Status"

    foreach($nodeToEnable in $nodesToEnable) {
        $state = Get-ServiceFabricNode -NodeName $nodeToEnable.NodeName
        $msg = "{0} IsSeedNode: {1}, NodeStatus: {2}" -f $nodeToEnable.NodeName, $state.IsSeedNode, $state.NodeStatus
        Write-Host $msg
    }
    $newSeedNodesReady = ($nodesToEnable |  Where-Object { ($_.NodeStatus -eq [System.Fabric.Query.NodeStatus]::Up) -and $_.IsSeedNode} | Measure-Object).Count -ge $OldSeedCount
    if($oldNodesDeactivated -and $newSeedNodesReady) {
        break
    }
    $loopTimeout -= 1
    Start-Sleep $loopWait
}

if (-not ($oldNodesDeactivated)) {
    Write-Error "A node failed to deactivate within the time period specified."
    exit
}

$loopTimeout = 180
while ($loopTimeout -ne 0) {
    Write-Host
    Write-Host "Nodes To Add Status"

    foreach($nodeToEnable in $nodesToEnable) {
        $state = Get-ServiceFabricNode -NodeName $nodeToEnable.NodeName
        $msg = "{0} IsSeedNode: {1}, NodeStatus: {2}" -f $nodeToEnable.NodeName, $state.IsSeedNode, $state.NodeStatus
        Write-Host $msg
    }
    $newSeedNodesReady = ($nodesToEnable |  Where-Object { ($_.NodeStatus -eq [System.Fabric.Query.NodeStatus]::Up) -and $_.IsSeedNode} | Measure-Object).Count -ge $OldSeedCount
    if($newSeedNodesReady) {
        break
    }
    $loopTimeout -= 1
    Start-Sleep $loopWait
}

$NewSeedNodes = Get-ServiceFabricNode | Where-Object {($_.NodeName -match "_($scaleSetToEnable)_\d+") -and ($_.IsSeedNode -eq $True)}
Write-Host "New Seed Nodes are:"
$NewSeedNodes | Select NodeName
$NewSeedNodesCount = ($NewSeedNodes  | Measure-Object).Count

if($NewSeedNodesCount -ge $OldSeedCount) {
    Write-Host "Removing the scale set $scaleSetToDisable"
    Remove-AzureRmVmss -ResourceGroupName $ResourceGroupName -VMScaleSetName $scaleSetToDisable -Force
    Write-Host "Removed scale set $scaleSetToDisable"

    Write-Host "Removing Node State for old nodes"
    $nodesToDisable | Remove-ServiceFabricNodeState -Force
    Write-Host "Done"

    Get-ServiceFabricClusterHealth
    Get-ServiceFabricNode
} else {
    Write-Host "New Seed Nodes do not match the minimum requirements $NewSeedNodesCount."
    Write-Host "Manually run  Remove-AzureRmVmss"
    Write-Host "Then Manually run  Remove-ServiceFabricNodeState"
    Get-ServiceFabricClusterHealth
    Get-ServiceFabricNode
}

Enable-InternetExplorerESC

This script is extremely useful; you can see the progress of the transfer of seed nodes and the disabling of the existing primary node types.

You know it is successful when the old nodes have ZERO seed nodes. All SEED nodes must transfer over to the new nodes, and all nodes in the old scale set should show IsSeedNode as false by the end of the script execution.

MoveDNSToNewPublicIPController.ps1

Lastly, you MUST update DNS to use the original CNAME. This script can help with that: it detaches the original internal Azure CNAME from the old public IP and moves it to your new public IP attached to the new load balancer.




param (
        [Parameter(Mandatory = $true)]
        [string]
        $subscriptionName,

        [Parameter(Mandatory = $true)]
        [string]
        $oldLoadBalancerName,

        [Parameter(Mandatory = $true)]
        [string]
        $resourceGroupName,

        [Parameter(Mandatory = $true)]
        [string]
        $oldPublicIpName,

        [Parameter(Mandatory = $true)]
        [string]
        $newPublicIpName
)

    Install-Module AzureRM.Network -Force
    Import-Module AzureRM.Network -Force

    $ErrorActionPreference = "Stop"
    Login-AzureRmAccount -SubscriptionName $subscriptionName

    Write-Host "Are you sure you want to do this. There will be brief connectivty downtime?"
    Pause

    $oldprimaryPublicIP = Get-AzureRmPublicIpAddress -Name $oldPublicIpName -ResourceGroupName $resourceGroupName
    $primaryDNSName = $oldprimaryPublicIP.DnsSettings.DomainNameLabel
    $primaryDNSFqdn = $oldprimaryPublicIP.DnsSettings.Fqdn
    
    if($primaryDNSName.Length -gt 0 -and $primaryDNSFqdn.Length -gt 0) {
        Write-Host "Found the Primary DNS Name" $primaryDNSName
        Write-Host "Found the Primary DNS FQDN" $primaryDNSFqdn
    } else {
        Write-Error "Could not find the DNS attached to Old IP $oldprimaryPublicIP"
        Exit
    }
    
        Write-Host "Moving the Azure DNS Names to the new Public IP"
    $PublicIP = Get-AzureRmPublicIpAddress -Name $newPublicIpName -ResourceGroupName $resourceGroupName
    $PublicIP.DnsSettings.DomainNameLabel = $primaryDNSName
    $PublicIP.DnsSettings.Fqdn = $primaryDNSFqdn
    Set-AzureRmPublicIpAddress -PublicIpAddress $PublicIP

    Get-AzureRmPublicIpAddress -Name $newPublicIpName -ResourceGroupName $resourceGroupName
    Write-Host "Transfer Done"

    Write-Host "Removing Load Balancer related to old Primary NodeType."
    Write-Host "Are you sure?"
    Pause

    Remove-AzureRmLoadBalancer -Name $oldLoadBalancerName -ResourceGroupName $resourceGroupName -Force
    Remove-AzureRmPublicIpAddress -Name $oldPublicIpName -ResourceGroupName $resourceGroupName -Force

    Write-Host "Done"

Summary

In this article you followed the process to:

  • Configure ARM to add a new VMSS with a new operating system and data disk
  • Add a new Virtual Machine Scale Set to an existing Service Fabric node type
  • Run a PowerShell controller script to monitor the outcome of the VMSS transfer
  • Transfer the original management DNS CNAME to the new public IP address

Conclusion

This project requires a lot of testing for your environment; allocate at least a few days to test the entire process before you try it out on your production services.

HTH