Hard Drive Exercises
These exercises require a ZIP file of hard drive data collected from the Backblaze storage racks in 2016.
Annualized Failure Rate
Calculate the annualized failure rate of all drives in 2016.
You can calculate the annualized drive failure rate as:
FAILURES / DRIVE_DAYS * 365
Where:
FAILURES is the total number of drive failures (a drive failure occurs when the failure column is 1)
DRIVE_DAYS is the total number of drive days: the number of drives in the system, summed over every day of the year (equivalently, the number of data rows in all CSV files)
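The counting and the formula can be sketched in Python as follows. This is a minimal sketch, assuming each CSV file has a header row with a failure column and one row per drive per day; the data/2016-*.csv location is hypothetical. As a worked example, 1 failure over 730 drive days gives 1 / 730 * 365 = 0.5, an annualized failure rate of 50%.

```python
import csv
import glob


def count_failures_and_drive_days(rows):
    """Return (FAILURES, DRIVE_DAYS) for an iterable of CSV row dicts.

    Every row is one drive on one day, so DRIVE_DAYS is simply the row count.
    """
    failures = 0
    drive_days = 0
    for row in rows:
        drive_days += 1
        if row["failure"] == "1":
            failures += 1
    return failures, drive_days


def annualized_failure_rate(failures, drive_days):
    """FAILURES / DRIVE_DAYS * 365, returned as a fraction (0.02 means 2%)."""
    if drive_days <= 0:
        raise ValueError("drive_days must be positive")
    return failures / drive_days * 365


def iter_csv_rows(paths):
    """Yield one dict per data row across all CSV files (headers are consumed by DictReader)."""
    for path in paths:
        with open(path, newline="") as f:
            yield from csv.DictReader(f)


if __name__ == "__main__":
    # Hypothetical location of the extracted 2016 daily CSV files.
    paths = sorted(glob.glob("data/2016-*.csv"))
    if paths:
        failures, drive_days = count_failures_and_drive_days(iter_csv_rows(paths))
        print(f"{failures} failures over {drive_days} drive days")
        print(f"AFR: {annualized_failure_rate(failures, drive_days):.2%}")
    else:
        print("No CSV files found under data/")
```

Streaming rows through a generator keeps memory use flat, which matters with roughly a gigabyte of CSV data.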
Top 10 Models
What were the 10 most common hard drive models in Backblaze's system in 2016?
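One way to approach this is sketched below. "Most common" could mean either the number of distinct drives or the number of drive days per model; this sketch assumes distinct drives, identified by serial_number. The data/2016-*.csv path is hypothetical.

```python
import csv
import glob
from collections import defaultdict


def top_models(rows, n=10):
    """Return the n (model, drive_count) pairs with the most distinct serial numbers.

    `rows` is any iterable of dicts with 'serial_number' and 'model' keys,
    for example the rows produced by csv.DictReader.
    """
    serials_by_model = defaultdict(set)
    for row in rows:
        serials_by_model[row["model"]].add(row["serial_number"])
    counts = {model: len(serials) for model, serials in serials_by_model.items()}
    # Sort by drive count descending, breaking ties by model name for stable output.
    return sorted(counts.items(), key=lambda item: (-item[1], item[0]))[:n]


def iter_csv_rows(paths):
    """Yield one dict per data row across all CSV files."""
    for path in paths:
        with open(path, newline="") as f:
            yield from csv.DictReader(f)


if __name__ == "__main__":
    paths = sorted(glob.glob("data/2016-*.csv"))  # hypothetical location
    for rank, (model, count) in enumerate(top_models(iter_csv_rows(paths)), start=1):
        print(f"{rank:2d}. {model}: {count} drives")
```

Counting rows per model instead of distinct serial numbers would rank models by drive days, which favors models that were in service for the whole year.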
Failure Rate Per Drive Model
Calculate the annualized failure rate for each drive model.
Only include models with at least 4TB of storage and 10,000 drive days in 2016.
Note that Hitachi changed its name to HGST.
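A per-model version of the calculation, with both filters applied, might look like the sketch below. Several choices here are assumptions: "4TB" is read as 4 * 10**12 bytes (decimal terabytes, as drive vendors use), the hypothetical normalize_model helper folds model names starting with "Hitachi " into the "HGST " prefix in case the same model appears under both names, and each model's capacity is taken as the largest value seen in its rows.

```python
import csv
import glob
from collections import defaultdict

MIN_CAPACITY_BYTES = 4 * 10**12  # assumption: "4TB" means 4 decimal terabytes
MIN_DRIVE_DAYS = 10_000


def normalize_model(model):
    """Hypothetical helper: report 'Hitachi ...' model names under the HGST name."""
    if model.startswith("Hitachi "):
        return "HGST " + model[len("Hitachi "):]
    return model


def afr_by_model(rows):
    """Return {model: annualized failure rate} for models that pass both filters."""
    drive_days = defaultdict(int)
    failures = defaultdict(int)
    capacity = {}
    for row in rows:
        model = normalize_model(row["model"].strip())
        drive_days[model] += 1
        if row["failure"] == "1":
            failures[model] += 1
        # Keep the largest capacity reported for each model, in case some rows
        # carry a placeholder such as -1 (an assumption about the data).
        capacity[model] = max(capacity.get(model, 0), int(row["capacity_bytes"]))
    return {
        model: failures[model] / days * 365
        for model, days in drive_days.items()
        if capacity[model] >= MIN_CAPACITY_BYTES and days >= MIN_DRIVE_DAYS
    }


def iter_csv_rows(paths):
    """Yield one dict per data row across all CSV files."""
    for path in paths:
        with open(path, newline="") as f:
            yield from csv.DictReader(f)


if __name__ == "__main__":
    paths = sorted(glob.glob("data/2016-*.csv"))  # hypothetical location
    rates = afr_by_model(iter_csv_rows(paths))
    for model, afr in sorted(rates.items(), key=lambda item: item[1], reverse=True):
        print(f"{model}: {afr:.2%}")
```

Computing drive days and failures in a single pass means the filters can be applied afterwards without re-reading the files.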
Read & Truncate Original CSV Files
Download the four 2016 Q1 through Q4 data ZIP files from the Backblaze Hard Drive Test Data page.
Create a Python script that:
Opens these ZIP files
Reads the CSV files within them
Writes the CSV files back to disk with the same names, keeping only the second through fifth columns (serial_number, model, capacity_bytes, failure)
Each CSV file should end up being about 3.1MB in size, for a total of about 1.1GB of CSV files.
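The steps above can be sketched with the standard zipfile and csv modules. This is one possible script, not a reference solution. The archive glob pattern (*2016*.zip in the current directory), the output directory name truncated, UTF-8 decoding, "\n" line endings, and skipping __MACOSX/ metadata entries are all assumptions.

```python
import csv
import io
import zipfile
from pathlib import Path

# Second through fifth columns: serial_number, model, capacity_bytes, failure.
KEEP = slice(1, 5)


def truncate_csv(src, dst):
    """Copy CSV text from src to dst, keeping only the columns selected by KEEP."""
    reader = csv.reader(src)
    # "\n" line endings are a choice here; csv.writer defaults to "\r\n".
    writer = csv.writer(dst, lineterminator="\n")
    for row in reader:
        writer.writerow(row[KEEP])


def truncate_zip(zip_path, out_dir):
    """Write a truncated copy of every CSV inside zip_path to out_dir, same file name."""
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    written = []
    with zipfile.ZipFile(zip_path) as zf:
        for info in zf.infolist():
            name = info.filename
            # Skip folders, non-CSV entries and macOS metadata, if any.
            if info.is_dir() or not name.endswith(".csv") or name.startswith("__MACOSX/"):
                continue
            out_path = out_dir / Path(name).name
            # Decode the archived bytes as text (UTF-8 is an assumption).
            with io.TextIOWrapper(zf.open(info), encoding="utf-8", newline="") as src, \
                    open(out_path, "w", newline="", encoding="utf-8") as dst:
                truncate_csv(src, dst)
            written.append(out_path)
    return written


if __name__ == "__main__":
    # Hypothetical: the four 2016 archives sit in the current directory.
    for zip_path in sorted(Path(".").glob("*2016*.zip")):
        files = truncate_zip(zip_path, "truncated")
        print(f"{zip_path}: wrote {len(files)} CSV files")
```

Reading each CSV straight out of the archive avoids extracting the full-width files to disk first; only the truncated copies are written.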