Do you use Linux machines in your data center? If so, it's probably a good idea to regularly check the health of the SSD drives used on those machines. Here's how to do it.
If your data center makes use of Linux machines, one of the administrative tasks you'll want to undertake is regularly checking the health of the SSD drives used on those machines. Why? Because, even though solid state drives will dramatically outlast rotating platter drives, they do have a finite lifespan. The last thing you want to do is fall victim to that particular end of days.
How do you check the health of those drives? As with everything in Linux, there are options. Although a GUI solution exists (GNOME Disks), I highly recommend going with a command line tool for this task. Why? Most of the time, your Linux servers won't include a GUI, whereas a command line tool can be run from the terminal after you secure shell into the remote server.
The tool in question is smartctl. With this command, you can get a quick glimpse of your SSD's health. Of course, how much mileage you get from the command will depend upon the make and model of SSD you employ. Unfortunately, the S.M.A.R.T. (Self-Monitoring, Analysis, and Reporting Technology) tools aren't always up to date with every SSD.
Because of this, you cannot be certain of the number of times your SSD chips have been written to. Even with that in mind, you can get a good estimate of the wear and tear on your drives.
I will be demonstrating on Ubuntu. The required package is found in the standard repositories of most distributions, so adjust the installation command to fit your distribution of choice.
The smartctl utility is part of the smartmontools package. On Ubuntu, this can be installed with a single command, sudo apt-get install smartmontools -y.
Do note, the above command will also install libgsasl7, libkyotocabinet16v5, libmailutils5, libntlm0, mailutils, mailutils-common, and postfix.
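For distributions other than Ubuntu, the change is mostly a matter of swapping the package manager. The sketch below uses a hypothetical helper, install_cmd, that only prints the command for a few common package managers rather than running it. It assumes the package is named smartmontools on each of them, which is worth confirming against your distribution's repositories.

```shell
# install_cmd is a hypothetical helper, not part of smartmontools.
# It prints (does not run) an install command for the given package manager,
# assuming the package is called "smartmontools" on each platform.
install_cmd() {
  case "$1" in
    apt-get) echo "sudo apt-get install -y smartmontools" ;;
    dnf)     echo "sudo dnf install -y smartmontools" ;;
    pacman)  echo "sudo pacman -S --noconfirm smartmontools" ;;
    *)       echo "unknown package manager: $1" >&2; return 1 ;;
  esac
}

# Print the command for Debian/Ubuntu and for Fedora-style systems.
install_cmd apt-get
install_cmd dnf
```

On Ubuntu, adding the --no-install-recommends option to apt-get may avoid the mail-related extras listed above, assuming they are pulled in as recommended packages rather than hard dependencies.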
To use the smartctl tool, the first thing you will want to do is gather information about the drive, which is done via the command sudo smartctl -i /dev/sdX (where sdX is the name of your drive).
The above command will print out the details associated with your drive.
In my case, the output reports that the drive is in the smartctl database, so the information should be up to date.
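If you want to script that database check, one approach is to look for the relevant line in the output of smartctl -i. The sketch below is a minimal example: in_smartctl_db is a hypothetical helper, and the sample text is an illustrative stand-in for real output, whose exact wording may differ between smartmontools versions.

```shell
# in_smartctl_db is a hypothetical helper: it reads `smartctl -i` output on
# stdin and succeeds only if the drive is reported as being in the database.
in_smartctl_db() {
  grep -q 'Device is: *In smartctl database'
}

# Illustrative sample only; real output comes from: sudo smartctl -i /dev/sdX
sample_info='Model Family:     Example SSD family
Device Model:     Example SSD 500GB
Device is:        In smartctl database [for details use: -P show]'

if printf '%s\n' "$sample_info" | in_smartctl_db; then
  echo "drive is in the smartctl database"
else
  echo "drive is NOT in the smartctl database"
fi
```

To use it against a real drive, pipe the output of sudo smartctl -i /dev/sdX into the function instead of the sample text.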
Let's run a short test on the drive. These tests will actually give you the most accurate data on your drive, so it's important to use these included tools. Issue the command sudo smartctl -t short -a /dev/sdX (where sdX is the name of your drive).
This will immediately report some bits of information.
I recommend you run a short and a long test weekly (or monthly) on your drives. To run a long test, the command is sudo smartctl -t long -a /dev/sdX.
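For a regular schedule, the two self-test commands can be wrapped in a small script. The following is only a sketch: start_selftest is a hypothetical wrapper around smartctl's -t option, /dev/sda is a placeholder device name, and by default the function prints the command instead of running it so you can check it first.

```shell
# start_selftest is a hypothetical wrapper around `smartctl -t`.
# By default it only prints the command (DRY_RUN=1); set DRY_RUN=0 to run it.
start_selftest() {
  kind="$1"
  device="$2"
  case "$kind" in
    short|long) ;;
    *) echo "usage: start_selftest short|long /dev/sdX" >&2; return 2 ;;
  esac
  if [ -z "$device" ]; then
    echo "usage: start_selftest short|long /dev/sdX" >&2
    return 2
  fi
  if [ "${DRY_RUN:-1}" = "1" ]; then
    echo "sudo smartctl -t $kind $device"
  else
    sudo smartctl -t "$kind" "$device"
  fi
}

# /dev/sda is a placeholder; substitute the drive you want to test.
start_selftest short /dev/sda
start_selftest long /dev/sda
```

A script like this could be called from a weekly or monthly cron job; the schedule is a matter of preference.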
One of the first things you should see is the result of the SMART overall health self-assessment test. That should say PASSED. If not, you know right away there's something wrong with your SSD.
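If you only want that verdict, smartctl -H prints the health assessment on its own. The sketch below shows one way to pull the result out of the output: health_result is a hypothetical helper, and the sample line assumes the wording smartctl uses for SATA drives, so treat it as illustrative.

```shell
# health_result is a hypothetical helper: it reads `smartctl -H` (or -a)
# output on stdin and prints the overall health verdict.
health_result() {
  awk -F': ' '/overall-health self-assessment test result/ { print $2 }'
}

# Illustrative sample line; real output comes from: sudo smartctl -H /dev/sdX
sample_health='SMART overall-health self-assessment test result: PASSED'

result=$(printf '%s\n' "$sample_health" | health_result)
if [ "$result" = "PASSED" ]; then
  echo "overall health: PASSED"
else
  echo "overall health check did not pass: ${result:-no result found}"
fi
```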
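The short test is a quick check that typically covers the drive's electronics and the read performance of a small portion of the drive.
The long test runs everything included with the short test, while adding a read scan of the entire drive, which is why it takes considerably longer.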
The short test takes approximately two minutes to complete, whereas the long test will require between 20 and 60 minutes (depending upon your hardware). To view the results of the test, issue the command sudo smartctl -a /dev/sdX (where sdX is the name of the drive tested).
The command will print out the test results and all of the information you need to verify the health of your SSD.
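If you only care about the self-test log, smartctl -l selftest prints that section by itself. As a rough sketch, the hypothetical helper latest_selftest_ok below checks whether the most recent entry finished cleanly. The sample log is illustrative, and the column layout is an assumption based on typical SATA output.

```shell
# latest_selftest_ok is a hypothetical helper: it reads `smartctl -l selftest`
# output on stdin and succeeds if entry #1 (the most recent test) completed
# without error.
latest_selftest_ok() {
  grep -E '^# *1 ' | grep -q 'Completed without error'
}

# Illustrative sample; real output comes from: sudo smartctl -l selftest /dev/sdX
sample_log='Num  Test_Description    Status                  Remaining  LifeTime(hours)  LBA_of_first_error
# 1  Short offline       Completed without error       00%      4123         -
# 2  Extended offline    Completed without error       00%      4100         -'

if printf '%s\n' "$sample_log" | latest_selftest_ok; then
  echo "latest self-test completed without error"
else
  echo "latest self-test reported a problem or has not finished"
fi
```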
Beyond the self-test log, there are two values in the output to be examined:
It is important to look at the VALUE and WORST columns. As you can see, my Samsung SSD is currently at 99 for Wear_Leveling_Count, which indicates a very healthy drive.
One thing to keep in mind is that different manufacturers will report different data with smartctl. For example, I have older Intel and Kingston SSDs attached to the same machine. Both of these drives report similar (and more comprehensive) data. However, neither reports the Wear_Leveling_Count. Why? These are both older drives and do not report ID 177 (Wear_Leveling_Count). In that case, your best bet is to run both the short and long tests and verify the health of your drives via those reports.
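To check for a given attribute without reading the whole table, you can filter the output of smartctl -A by ID. The sketch below is one way to do that: attr_summary is a hypothetical helper, the sample table is illustrative rather than taken from a real drive, and the column positions are an assumption based on the usual attribute table layout. It also covers the case where a drive does not report the ID at all.

```shell
# attr_summary is a hypothetical helper: it reads `smartctl -A` output on stdin
# and prints the VALUE, WORST and THRESH columns for one attribute ID, or a
# note if the drive does not report that ID.
attr_summary() {
  awk -v id="$1" '
    $1 == id { printf "%s value=%s worst=%s thresh=%s\n", $2, $4, $5, $6; found = 1 }
    END      { if (!found) print "ID " id " not reported" }
  '
}

# Illustrative sample; real output comes from: sudo smartctl -A /dev/sdX
sample_attrs='ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  5 Reallocated_Sector_Ct   0x0033   100   100   010    Pre-fail  Always       -       0
177 Wear_Leveling_Count     0x0013   099   099   000    Pre-fail  Always       -       5'

printf '%s\n' "$sample_attrs" | attr_summary 177   # attribute present
printf '%s\n' "$sample_attrs" | attr_summary 233   # attribute absent in this sample
```

On drives that report it, the normalized value commonly starts near 100 and falls toward the threshold as the drive wears, which is why both VALUE and WORST matter.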
First off, it's easy to misinterpret the reported data. Because of this, you must know the make and model of the drive you are testing. Once you have that information in hand, you can research any anomalies with reported data.
Second, it is crucial to make use of the testing tools. Although you can run a command like smartctl -A /dev/sdX, you don't get the added benefit of the testing results. Make sure to regularly run the short and long tests to get the most up-to-date information you can on your SSD drives.