A simplified Azure Landing Zones alternative

Since before the plague a number of reference architectures for Azure Landing Zones have emerged. From a Microsoft perspective it seems to have started with the North Star project, which eventually became Azure Landing Zones (Enterprise-Scale) - Reference Implementation (first commit May 2020) using ARM templates. A Terraform version – Azure landing zones Terraform module – and a Bicep version – Azure Landing Zones (ALZ) - Bicep – soon followed.

The current guidance is to use Azure Verified Modules (AVM) to deploy an Azure Landing Zones implementation.

To monitor your Azure platform, the official recommendation seems to be to deploy an additional project: Azure Monitor Baseline Alerts (AMBA).

Complexity

All the reference implementations above suffer from the authors’ incessant need to continuously add more stuff. The implementations have very large and daunting code bases, which means that they are almost impossible to get a grip on - let alone understand how to extend.

To remedy these challenges we introduce a simplified implementation which should allow platform teams to much more easily reason about and understand what they are trying to build.

The simplified version can be found at Azure Landing Zones Demo.

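To compare the complexity and maintainability of the solutions mentioned, we can use cloc to get an overall idea of the number of files and lines of code in each implementation. We will only count infrastructure as code and scripts: JSON, HCL, Bicep, YAML, Bash, and PowerShell. The --force-lang option makes cloc count .bicep files with its Standard ML counter, which is why Standard ML appears in the language filter: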

cloc --include-lang=JSON,PowerShell,Bourne\ Shell,HCL,Standard\ ML,YAML --force-lang="Standard ML,bicep" [path]

Language	ARM	Terraform	Bicep	AVM	AMBA	Simplified
Bicep	420	-	11,692	172,920	103,483	1,150
HCL	-	9,142	-	-	-	-
JSON	119,990	41,072	75,722	593,406	738,156	1,328
YAML	740	801	3,063	17,225	18,120	886
PowerShell	5,431	685	1,439	12,912	2,030	455
Bash	-	406	13	331	-	-
SUM	126,581	52,106	91,929	796,794	861,789	3,819
Files	441	442	690	2,475	3,066	82

Assuming you prefer Terraform, you need to inherit, support, understand, and reason about at least 52,106 lines of code across 442 files! Then you need to extend the code with your own requirements. This is going to be really hard even with a reasonably sized team (4-6 people).

Worst case scenario: You have deployed the original Enterprise Scale version using the Portal Experience [read: ClickOps] and added Baseline Alerts. You now need to somehow reverse engineer your setup into Infrastructure as Code while trying to support, understand, and reason about an alert framework consisting of 861,789 lines of code across 3,066 files! This is not hard. This is completely impossible regardless of team size.

Compare this to the simplified version with 3,819 lines of code across 82 files.

Which version would you rather start with?

What does simplified mean here?

To quote the docs:

The conceptual architecture is greatly simplified compared to the official one, as we empower DevOps teams to build and run their own thing.

We do not want to manage network from a centralized perspective. All applications will be deployed as islands with no inter-network connectivity.

We adopt a Zero Trust approach where identity and encryption trumps and often replaces Network Security.

We do not require nor encourage the use of Azure Private Link.

We allow most services to have Public Network Access: Enabled because we rely on enforcing Entra ID authentication and HTTPS/TLS 1.2+.

Online Landing Zones

These are the most important landing zones - all newer applications should be deployed here - even if data resides on-premises.

Connection to on-premises resources should be managed using zero-trust approaches with resources like:

Azure Relay

Azure Service Bus

Azure API Management

Azure Arc

Corp Landing Zones

Corp landing zones should exclusively be used for lift-and-shift scenarios (and avoided all together if possible). This is reserved for applications which do not support modern authentication and relies on Kerberos (Windows Active Directory).

– Azure Landing Zones Demo

Comparing policy-driven governance to verified modules

Using Azure Policy we supply a number of policies for popular resources: Web Apps, Blob Storage, Key Vault, and SQL.

Having deployed these policies we enforce the following security defaults on storage accounts:

HTTPS only (supportsHttpsTrafficOnly)
TLS 1.2 (minimumTlsVersion)
Disallow blob public access (allowBlobPublicAccess)
Disallow cross tenant replication (allowCrossTenantReplication)
Disallow shared key access (allowSharedKeyAccess)
Default to OAuth (defaultToOAuthAuthentication)
Enable Defender for Storage

NB: We use the modify and deployIfNotExists policy effects to ensure that issues with existing storage accounts are automatically remediated.

NBB: Security relies on zero trust principles of identity-based security (disabling keys) and encryption in transit (HTTPS/TLS 1.2).
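To make the modify effect concrete, here is a minimal sketch of what a custom policy definition for just one of the settings above (HTTPS only) could look like in Bicep. It is an illustration under stated assumptions, not the definition shipped in the demo repository: the resource name and display name are hypothetical, the management group scope is assumed, and the API version condition mirrors how comparable built-in policies guard the operation.

```bicep
// Hypothetical sketch: not the project's actual policy definition.
targetScope = 'managementGroup' // assumed deployment scope

// Built-in "Storage Account Contributor" role, used by the remediation identity.
var storageAccountContributor = '/providers/Microsoft.Authorization/roleDefinitions/17d1049b-9a84-46fb-8f53-869881c3d3ab'

resource enforceStorageHttps 'Microsoft.Authorization/policyDefinitions@2021-06-01' = {
  name: 'enforce-storage-https-only' // hypothetical name
  properties: {
    displayName: 'Enforce HTTPS only on storage accounts'
    policyType: 'Custom'
    mode: 'Indexed'
    policyRule: {
      if: {
        allOf: [
          {
            field: 'type'
            equals: 'Microsoft.Storage/storageAccounts'
          }
          {
            // Match accounts where secure transfer is not switched on.
            field: 'Microsoft.Storage/storageAccounts/supportsHttpsTrafficOnly'
            notEquals: true
          }
        ]
      }
      then: {
        effect: 'modify'
        details: {
          roleDefinitionIds: [
            storageAccountContributor
          ]
          operations: [
            {
              // Only rewrite requests made with an API version that knows the property.
              condition: '[greaterOrEquals(requestContext().apiVersion, \'2019-04-01\')]'
              operation: 'addOrReplace'
              field: 'Microsoft.Storage/storageAccounts/supportsHttpsTrafficOnly'
              value: true
            }
          ]
        }
      }
    }
  }
}
```

The definition only takes effect once it is assigned with a managed identity that holds the listed role. New and updated accounts are then corrected at request time, and existing ones through a remediation task.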

Having done this, a storage account can be deployed with a very simple Bicep template:

param location string = resourceGroup().location
param storageAccountName string

resource storageAccount 'Microsoft.Storage/storageAccounts@2023-05-01' = {
  name: storageAccountName
  location: location
  kind: 'StorageV2'
  sku: {
    name: 'Standard_LRS'
  }
  properties: {}
}

or using Azure CLI:

az storage account create -n storage42 -g group -l swedencentral --sku Standard_LRS

The policies ensure that the platform enforces a reasonable set of security defaults, relieving developers from the task.
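For contrast, this is roughly what every team would have to write, and keep consistent, if the platform did not enforce the defaults. It is a sketch rather than a recommended template: the property names are the ones listed above, and Defender for Storage is left out because it is configured through a separate resource.

```bicep
param location string = resourceGroup().location
param storageAccountName string

resource storageAccount 'Microsoft.Storage/storageAccounts@2023-05-01' = {
  name: storageAccountName
  location: location
  kind: 'StorageV2'
  sku: {
    name: 'Standard_LRS'
  }
  properties: {
    supportsHttpsTrafficOnly: true     // HTTPS only
    minimumTlsVersion: 'TLS1_2'        // TLS 1.2
    allowBlobPublicAccess: false       // no anonymous blob access
    allowCrossTenantReplication: false // no cross tenant replication
    allowSharedKeyAccess: false        // identity-based access only
    defaultToOAuthAuthentication: true // default to OAuth
  }
}
```

With the policies in place, the empty properties block in the short template ends up with the same effective configuration.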

Compare the 12 lines of code in Bicep above to the Azure Verified Module version which contains 3,531 lines of Bicep across 29 files (738 lines in the root file).

Yes, the official module can do more stuff (mostly YAGNI), but we must ask the question: Which implementation would you rather reason about and support going forward?

The same principles apply to web apps, key vaults, and SQL. This can be extended quite easily, but we deliberately want to keep the reference implementation simple. Pull requests are welcome, though.

What about the corporate network?

Cloud applications should never be connected to the on-premises network on the network layer. Doing so adds an unnecessary dependency and makes things less secure. Even for lift and shift of legacy applications, where a connection to the on-premises network seems like the only option, there are often more secure alternatives like Microsoft Entra Domain Services. If all else fails and you must connect to on-premises with IPv4, this will be equal parts expensive and complex while relying on your organisation’s existing network setup. Because of this we do not want to provide or mandate a reference architecture. This must be done bespoke every time.

Once again, we still recommend not connecting the corporate network at all and relying on Azure Relay and Azure Service Bus instead.
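As an illustration of how little infrastructure the Azure Relay route needs on the Azure side, here is a minimal sketch of a Relay namespace with a single hybrid connection. It is an assumption-laden example, not part of the demo repository: the parameter name and the hybrid connection name are hypothetical, and the on-premises listener that connects outbound to the relay is not shown.

```bicep
param location string = resourceGroup().location
param relayNamespaceName string // hypothetical parameter

resource relayNamespace 'Microsoft.Relay/namespaces@2021-11-01' = {
  name: relayNamespaceName
  location: location
  sku: {
    name: 'Standard'
    tier: 'Standard'
  }
}

resource hybridConnection 'Microsoft.Relay/namespaces/hybridConnections@2021-11-01' = {
  parent: relayNamespace
  name: 'onprem-api' // hypothetical connection name
  properties: {
    // Senders must authenticate before they can reach the on-premises listener.
    requiresClientAuthorization: true
  }
}
```

Because the on-premises listener dials out to the relay, no inbound firewall openings, VPN, or ExpressRoute circuit are involved.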

Conclusion

We hope this project can serve as a reminder that often less is more and getting started should never require you to deploy almost a million lines of code you don’t understand.

Check out Azure Landing Zones Demo and let us know what you think using Issues, Stars, and Pull Requests.