MSDB in suspect state

JitheshJithesh
3 min read

We were unable to connect to SQL server through MMS (Microsoft Management Studio). Users were getting below warning:

Cannot display policy health state at the server level, because the user doesn’t have permission. Permission to access the msdb database is required for this feature to work correctly.

Then an error

Database MSDB cannot be opened. It has been marked SUSPECT by recovery. See the SQL Server errorlog for more information. (Microsoft SQL Server, Error: 926)

SQL server logs showed

An error occurred during recovery, preventing the database ‘msdb‘ from restarting. Diagnose the recovery errors and fix them, or restore from a known good backup.

Database msdb was shutdown due to error 9001 in routine. Restart for non-snapshot databases will be attempted after al connection to the database are aborted.

The log for database ‘msdb‘ is not available. Check the operating system error log for related error message. Resolve any errors and restart database.

SQLServerLogMgr::LogWriter. Operating system error 170(The requested resource in use) encountered.

Error: 17053, Severity:16, State: 1

Since SQL log mentioned about operating system error, went ahead and checked event viewer logs and found NTFS file system warning around the same time.

Event ID: 140

The system failed to flush data to transaction log. Corruption may occur in VolumeId:<>, DeviceName:<> {{Device Busy}} They device is currently busy.

Observations

  1. SQL server error 9001 and underlying OS errors 170 indicates the files on disk are being used or locked by some other program. If you encounter a similar scenario, address the underlying OS error. In this case work with your system administrator to ensure that the database and log files are’’t locked by other programs.

  2. Event viewer ID 140 indicates the volume or file is read-only and the operation failed

  3. Volume has become read-only due to something happened around the time.

  4. This has caused SQL server not able to flush transaction logs, which caused msdb to shutdown and an unsuccessful restart

Solution

First thing is to check for the file system errors and fix it.

If problem is with D: drive
> chkdsk D: 
> chkdsk D: /f

In case of WSFC cluster you need to stop the cluster from failover cluster manager before running it on shared disks.

Next step is to run check CHECKDB on MSDB and fix any issues. Below T-SQL command check the integrity of the MSDB database.

USE Master;
      GO
        ALTER Database MSDB
        SET SINGLE_USER WITH ROLLBACK IMMEDIATE;
        GO
        DBCC CHECKDB (msdb, NOINDEX);
        GO
        ALTER DATABASE MSDB
        SET-MULTI_USER;
        GO

Then run the below command for a “soft recovery“ with no data loss.

USE Master;
      GO
        ALTER Database MSDB
        SET SINGLE_USER WITH ROLLBACK IMMEDIATE;
        GO
        DBCC CHECKDB (msdb, REPAIR_REBUILD);
        GO
        ALTER DATABASE MSDB
        SET-MULTI_USER;
        GO

Above steps recovered MSDB from its suspect state and users were able to reconnect to the databases.

References

https://www.dell.com/support/kbdoc/en-uk/000200606/microsoft-sql-server-msdb-database-recovery

https://learn.microsoft.com/en-us/sql/relational-databases/errors-events/mssqlserver-17053-database-engine-error?view=sql-server-ver17#example-with-sql-server-error-9001-and-underlying-os-errors-170

0
Subscribe to my newsletter

Read articles from Jithesh directly inside your inbox. Subscribe to the newsletter, and don't miss out.

Written by

Jithesh
Jithesh