Troubleshooting

SRM Test Recovery fails with the error “Failed to create snapshots of replica devices. Timed out (300 seconds) while waiting for SRA to complete ‘testFailoverStart’ command”

I have received multiple support cases for this issue, so I thought I would write a short article on it to keep readers informed.

When we run a test recovery, the plan can fail at the step where writable snapshots of the replica devices are created, with the following error:

“Error – Failed to create snapshots of replica devices. Timed out (300 seconds) while waiting for SRA to complete ‘testFailoverStart’ command.”

 

We have seen this error in SRM environments irrespective of the SRM or SRA version in use.

The screenshot below shows a sample recovery plan report, exported after the plan failed to run.

In this scenario, the SRA communicates with the storage array to take a snapshot of the replicated LUN. If this takes longer than the allowed timeout, the SRA command is terminated immediately and the recovery plan fails.
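
To make the failure mode concrete, here is a minimal Python model of it. This is not SRM or SRA code: simulated_snapshot and run_sra_command are hypothetical names, the "snapshot" is just a sleep loop, and times are scaled down (0.3 s stands in for 300 s) so the sketch runs quickly. It assumes only what the error message tells us: the caller waits up to a fixed timeout and, if the command is still running, abandons it and reports the failure. Python threads cannot be killed, so the model cancels cooperatively through an Event.

```python
import threading
import time


def simulated_snapshot(duration_s: float, cancel: threading.Event) -> None:
    """Hypothetical stand-in for the array-side work of testFailoverStart.

    Sleeps in small steps so it can notice a cancellation request.
    """
    deadline = time.monotonic() + duration_s
    while time.monotonic() < deadline:
        if cancel.is_set():
            return  # abandoned: the caller has already given up
        time.sleep(0.01)


def run_sra_command(name: str, duration_s: float, command_timeout_s: float) -> str:
    """Wait up to command_timeout_s for the simulated command, then give up."""
    cancel = threading.Event()
    worker = threading.Thread(target=simulated_snapshot, args=(duration_s, cancel))
    worker.start()
    worker.join(timeout=command_timeout_s)
    if worker.is_alive():
        # Timeout reached: ask the worker to stop and report the failure.
        cancel.set()
        worker.join()
        return (
            "Failed to create snapshots of replica devices. "
            f"Timed out ({command_timeout_s} seconds) while waiting for SRA "
            f"to complete '{name}' command."
        )
    return f"'{name}' completed"


if __name__ == "__main__":
    # Scaled down: 0.45 s of work stands in for a hypothetical 450-second snapshot.
    print(run_sra_command("testFailoverStart", 0.45, 0.3))  # default-like timeout
    print(run_sra_command("testFailoverStart", 0.45, 0.6))  # raised timeout
```

With the scaled 0.3 s timeout, the first call should return the timeout error, while the same amount of work should finish once the timeout is doubled. The storage work itself is unchanged; only the time the caller is willing to wait differs.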

 

So how do we fix it?

The SRM advanced settings include a parameter called storage.commandTimeout, which is set to 300 seconds by default. The commands the SRA sends to the storage, and the time those commands take to complete, vary from one storage array to another, so the default may not be long enough for some arrays.

This parameter can be found under Home > Site Recovery > Sites > (Protected or Recovery site) > Manage > Advanced Settings > Storage. It needs to be changed on both sites, one site at a time.

 

Based on my experience, I recommend increasing the timeout value to 600 seconds and then restarting the SRM service on both sites.

Re-run the plan. The test recovery should now complete successfully.

 

I hope this article was helpful. Watch out for more.

 

Ritesh Shenoy
Hey, my name is Ritesh Shenoy, and I work as a Tech Support Engineer for VMware. I had the idea of blogging about the issues I handle on a daily basis, which would ideally help others in their daily work.
