VMware Configuration Server for ASR becomes unresponsive

Overview

During one of my recent projects I worked with Azure Site Recovery (ASR) to migrate servers from a VMware environment into Azure. As you may know, this requires several Azure and on-premises components to work together properly:

  • Azure Site Recovery Vault
  • Azure storage account associated with the ASR configuration
  • Configuration server for VMware running on-premises
  • vCenter connection and permissions

All of these parts need to work without issues for a smooth migration to Azure. In my case the setup consisted of a single ASR vault and a single on-premises VMware configuration server, which had worked reliably for weeks and had already helped me migrate a number of servers into Azure.

Issue

From one day to the next, the configuration server started to report issues with the Process Server, the Master Target server and the vCenter connection. The connection from the configuration server to the ASR vault was fine and reported as “Connected”, and the Azure portal showed a current time as the last heartbeat. However, when I selected the configuration server under <ASR_Vault> -> Site Recovery Infrastructure -> Configuration Servers -> <Configuration_Server>, the connections to all three associated components were marked with a red X, and their “Last Heartbeat” showed an old date and time.

Investigation

Of course, my first idea was the usual Windows solution: “Have you tried turning it off and on again?” I did, but the status did not change. The next obvious step was to upgrade the configuration server, because the “Agent Version” in the Azure portal showed “update available”. So I downloaded the new version and ran the installer. The upgrade went through without any issue, but it did not solve the problem. It was time to dig deeper. In the end it took almost two days of checking and testing all the components, such as IIS, the network and so on. I could go into detail about everything I tried to find the solution, but it would only make this article boring, and you might close the page before reaching the most important part: the solution.

Solution

After two days of work on the configuration server, I finally found out that the service had a problem accessing a file under C:\ProgramData\ASR\…. This folder is actually a folder link to the installation folder of the configuration server components, so the same files can be reached through two different paths. In my case, permission inheritance on the C:\ProgramData\ASR folder link was turned off and permissions were assigned to individual users only. Because the service was accessing the file through this path, the wrong permissions were applied. After I turned permission inheritance back on for that folder link, the service started without errors, and the configuration server came back online in Azure and reported a healthy state again.
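If you run into a similar problem, it can help to check quickly whether inheritance is disabled on the folder. The Python sketch below is a hypothetical helper, not part of ASR, and the function names are made up for illustration. It runs the Windows `icacls` tool and treats the absence of any entry carrying the `(I)` flag (which `icacls` uses to mark inherited entries) as a sign that inheritance is turned off. This is only a heuristic. The same change can also be made from the command line with `icacls <path> /inheritance:e`.

```python
"""Hypothetical helper: check whether a folder's ACL contains inherited entries.

Assumption: icacls marks inherited ACEs with the "(I)" flag. If no entry
carries that flag, inheritance is most likely disabled on the folder.
"""
import subprocess
import sys


def inherited_entries(icacls_output: str) -> list[str]:
    """Return the ACE lines from icacls output that carry the (I) flag."""
    entries = []
    for line in icacls_output.splitlines():
        line = line.strip()
        # Skip blank lines and the trailing summary line printed by icacls.
        if not line or line.startswith("Successfully processed"):
            continue
        if "(I)" in line:
            entries.append(line)
    return entries


def inheritance_looks_disabled(icacls_output: str) -> bool:
    """Heuristic: no inherited ACE means inheritance is probably turned off."""
    return not inherited_entries(icacls_output)


def check_folder(path: str) -> bool:
    """Run icacls on `path` (Windows only) and apply the heuristic above."""
    result = subprocess.run(
        ["icacls", path], capture_output=True, text=True, check=True
    )
    return inheritance_looks_disabled(result.stdout)


# Only call icacls when actually running on Windows.
if __name__ == "__main__" and sys.platform == "win32":
    target = sys.argv[1] if len(sys.argv) > 1 else r"C:\ProgramData\ASR"
    if check_folder(target):
        print(f"{target}: no inherited ACEs, inheritance is probably disabled")
        print(f'To re-enable it: icacls "{target}" /inheritance:e')
    else:
        print(f"{target}: inherited ACEs present")
```

Keeping the parsing separate from the `icacls` call means the heuristic can be tried against saved output on any machine, while `check_folder` only runs on the Windows host.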

This was actually the first time I saw that folder links like this let you grant different access to the same files depending on the path used, so I think I will start using them in other cases where it does not break anything.
