Upgrading SRM 6.x to 6.x.x fails with the error “Failed to start service. VMware vCenter Site Recovery Manager service failed to start. Check that all required Windows services are running”
These days I have been working mostly on "wild card" products such as Site Recovery Manager (SRM) and vSphere Replication (VR), so most of the posts here relate to SRM and VR.
I would like to share a recent experience with a customer whose upgrade from SRM 6.1.1 to 6.1.2 kept failing. The installer registered the upgraded service with the PSC, but right at the end, when it was about to start the service, we hit the error quoted in the title.
As any good engineer would, I went straight to the logs in the path below. A backtrace was logged every time I tried to restart the service.
C:\ProgramData\VMware\VMware Site Recovery Manager\Logs\
--> Panic: FAILURE: "Deserialize failed for data item (persistence id: ##global##_pvmi.protected-vm-8094): std::exception 'class boost::archive::archive_exception' "input stream error"" @ d:/build/ob/bora-3884620/srm/src/jobs/jobs.cpp:304
--> [backtrace begin] product: VMware vCenter Site Recovery Manager, version: 6.1.1, build: build-3884620, tag: -
--> backtrace vmacore.dll[0x001C568A]
--> backtrace vmacore.dll[0x0005CA8F]
--> backtrace vmacore.dll[0x0005DBDE]
--> backtrace vmacore.dll[0x001D7405]
--> backtrace vmacore.dll[0x001D74FD]
--> backtrace vmacore.dll[0x0004D83C]
--> backtrace dr-jobs.dll[0x00035DB7]
--> backtrace MSVCR90.dll[0x00074830]
--> backtrace MSVCR90.dll[0x00043B3C]
--> backtrace ntdll.dll[0x00050C51]
--> backtrace dr-jobs.dll[0x0000390C]
--> backtrace dr-jobs.dll[0x00005408]
--> backtrace dr-recovery.dll[0x0016F153]
--> backtrace dr-recovery.dll[0x0016AC81]
--> backtrace dr-recovery.dll[0x002B0488]
--> backtrace dr-recovery.dll[0x002AEF33]
--> backtrace dr-recovery.dll[0x002B545E]
--> backtrace dr-recovery.dll[0x00031A19]
--> backtrace functional.dll[0x00028089]
--> backtrace vmacore.dll[0x00159CCE]
--> backtrace vmacore.dll[0x0015D53F]
--> backtrace vmacore.dll[0x0015EA91]
--> backtrace vmacore.dll[0x001607C5]
--> backtrace vmacore.dll[0x00065FEB]
--> backtrace vmacore.dll[0x0015BC50]
--> backtrace vmacore.dll[0x001D2A5B]
--> backtrace MSVCR90.dll[0x00002FDF]
--> backtrace MSVCR90.dll[0x00003080]
--> backtrace kernel32.dll[0x0001652D]
--> backtrace ntdll.dll[0x0002C541]
--> [backtrace end]
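If the log is long, the panic line can be pulled out with a short script instead of scrolling through every backtrace. The following is only a minimal Python sketch and was not part of the original troubleshooting: the function name find_failed_items and the sample string are illustrative, and the regular expression assumes the panic text has the same shape as the line shown above.

```python
import re

# Captures the persistence id from a "Deserialize failed" panic line.
# Assumption: the message format matches the excerpt in this post.
PANIC_RE = re.compile(
    r"Deserialize failed for data item \(persistence id: ([^)]+)\)"
)


def find_failed_items(log_text):
    """Return the persistence ids named in deserialize panics.

    Ids are returned in order of first appearance, without duplicates,
    because the same panic repeats on every service restart.
    """
    seen = []
    for match in PANIC_RE.finditer(log_text):
        pid = match.group(1).strip()
        if pid not in seen:
            seen.append(pid)
    return seen


# Illustrative sample modelled on the panic line from the SRM log.
sample = (
    'Panic: FAILURE: "Deserialize failed for data item '
    '(persistence id: ##global##_pvmi.protected-vm-8094): std::exception"'
)
print(find_failed_items(sample))
```

To run it against a real log, read the file contents into a string and pass that to find_failed_items.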
After some research I traced the problem to corrupted records in the pdj_dataitem table of the SRM database.
In my case the customer was using the embedded PostgreSQL database, so I followed the procedure below to remove the stale records.
Note: Before proceeding, please consider the following points.
- Take a backup of the database if it is external to SRM.
- For the embedded database, take a snapshot before you delete any records.
- Because the upgrade halts while starting the service, do not close the upgrade installer window. Minimize it, perform the task below, then maximize the installer window and click Retry.
I opened a command prompt and changed to the path below.
C:\ProgramData\VMware\VMware Site Recovery Manager Embedded Database\bin
I connected to the database using this command:
psql -p 5678 -U user_name -d Databasename --> These connection details can be retrieved from the ODBC configuration
Once connected, I ran the query below:
select * from pdj_dataitem; --> This returned a couple of records
Based on the records returned, I deleted them by their db_id:
delete from pdj_dataitem where db_id=<value retrieved>;
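When more than one record has to go, it may be safer to keep a copy of the rows and delete them inside a single transaction. The sketch below is one possible way to prepare that, not the procedure I ran: it only generates SQL text to paste into psql. It assumes the db_id values are integers, which is not confirmed above, and the function name build_cleanup_sql, the backup table name pdj_dataitem_backup, and the ids 101 and 102 are all hypothetical.

```python
def build_cleanup_sql(db_ids, table="pdj_dataitem", backup="pdj_dataitem_backup"):
    """Build a PostgreSQL script that copies the given rows to a backup
    table and then deletes them, all inside one transaction.

    Assumption: db_id is an integer column. int() rejects anything else,
    which also keeps stray text out of the generated SQL.
    """
    ids = sorted({int(i) for i in db_ids})
    if not ids:
        raise ValueError("no db_id values given")
    id_list = ", ".join(str(i) for i in ids)
    return "\n".join([
        "BEGIN;",
        # Keep a copy of the rows so they can be restored if needed.
        f"CREATE TABLE {backup} AS SELECT * FROM {table} WHERE db_id IN ({id_list});",
        f"DELETE FROM {table} WHERE db_id IN ({id_list});",
        "COMMIT;",
    ])


# Hypothetical ids; use the values returned by the select query.
print(build_cleanup_sql([101, 102]))
```

This copy is a convenience only and does not replace the database backup or snapshot recommended in the note above.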
Since the upgrade installer was still open, I clicked Retry to start the service again. The service started successfully and SRM was upgraded to 6.1.2.
I hope this article was helpful. Watch out for more.