Table of Contents

Introduction

The first section references where we got this data, and the second shows how to manipulate the source data into something Python likes better. This dataset was then uploaded online for public access, so you can skip these sections and jump straight to plotting the time series data (which will reload the now "cleaned" data).

Data Source

Data was downloaded from the Case Western Reserve Bearing Data Center, which performed seeded fault testing and shared the data. This data can be downloaded here. An overview of the test procedure is here.

For simplicity we will look at 4 data files: Normal (baseline), Fault_007, Fault_014, and Fault_021.

All of these were recorded with the motor running at approximately 1772 RPM. The numbers in the fault file names signify the fault diameter in thousandths of an inch. For example, Fault_014 is a fan end bearing fault in the inner race measuring 0.014" in diameter.

In the appendix I show how to pull the data of interest out of the source .MAT files. But I've uploaded the processed data to a CSV file which can be loaded directly online, so we can skip those earlier steps (they've been included for reference).

Plot Time Series Data

First let's plot the full dataset, but render it as an SVG because it is so big.
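
Assuming df_accel is the cleaned dataframe loaded from the CSV (time index in seconds, one column per dataset), a minimal sketch of that plot might look like this:

import plotly.express as px

fig = px.line(df_accel,
              labels={"Time": "Time (s)",
                      "value": "Acceleration (g)",
                      "variable": "Data Set"},
              title="Comparison of Bearing Acceleration Data")
fig.show(renderer='svg', width=900, height=450)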

Now let's get the full benefit of Plotly and render the middle 1 second of the file as an interactive element. We'll also subtract 4.5 seconds from the time values so that these middle plots start at 0 and look good.
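
Here's a minimal sketch of that slice-and-shift, again assuming df_accel with a time index in seconds:

# slice out the middle 1 second (4.5 s to 5.5 s of the 10 s file)
df_mid = df_accel[(df_accel.index >= 4.5) & (df_accel.index <= 5.5)].copy()
# shift the time index so the plot starts at 0
df_mid.index = df_mid.index - 4.5
fig = px.line(df_mid,
              labels={"Time": "Time (s)",
                      "value": "Acceleration (g)",
                      "variable": "Data Set"},
              title="Middle 1 Second of Bearing Acceleration Data")
fig.show()  # default interactive renderer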

Analysis in the Time Domain

Easy Calculations of Peak, RMS and Crest Factor

The peak is found after first taking the absolute value, ensuring we don't ignore large negative values.

$$peak=max(\left | a \right |)$$

Then the RMS is a simple square root of the mean of all the values squared.

$$rms = \sqrt{(\frac{1}{n})\sum_{i=1}^{n}(a_{i})^{2}}$$

The crest factor is equal to the peak divided by the RMS. $$crest = \frac{peak}{rms}$$

Notice how close the standard deviation is to the RMS; let's quantify the percentage error between the two.
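
A minimal sketch of these calculations on df_accel (the variable names here are mine, not necessarily the original notebook's):

import numpy as np
import pandas as pd

peak = df_accel.abs().max()                 # peak of the absolute value
rms = np.sqrt((df_accel ** 2).mean())       # square root of the mean of the squares
crest = peak / rms                          # crest factor
std = df_accel.std()                        # standard deviation, for comparison
pct_error = 100 * (std - rms).abs() / rms   # percent error between std and RMS
print(pd.concat([peak, rms, crest, std, pct_error], axis=1,
                keys=['Peak', 'RMS', 'Crest Factor', 'Std Dev', '% Error']))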

Calculate for Pure Sine Wave
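
For a pure sine wave of amplitude A, the peak is A, the RMS is A/√2 ≈ 0.707A, and the crest factor is therefore √2 ≈ 1.414. A quick numerical check of those closed-form values:

import numpy as np

t = np.linspace(0, 1, 10000, endpoint=False)
a = 2.0 * np.sin(2 * np.pi * 10 * t)   # 10 Hz sine wave, amplitude 2
peak = np.max(np.abs(a))               # -> 2.0
rms = np.sqrt(np.mean(a ** 2))         # -> 2/sqrt(2), about 1.414
print(peak, rms, peak / rms)           # crest factor -> sqrt(2), about 1.414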

Integration to Velocity & Displacement

Now that we've done the "easy" calculations, let's integrate the acceleration data to get velocity and displacement. The following functions define how these calculations are done, which includes applying filters and a windowing function.
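
A minimal sketch of such a function: taper with a Tukey window, band-pass with a zero-phase Butterworth filter, then integrate. The function name and default settings (5 Hz high-pass, 1000 Hz low-pass, 2nd order, 5% taper) are my assumptions, not necessarily the original notebook's values:

import numpy as np
from scipy import signal, integrate

def integrate_signal(a, fs, highpass=5.0, lowpass=1000.0, order=2, taper=0.05):
    # taper the ends with a Tukey window to suppress edge effects
    win = signal.windows.tukey(len(a), alpha=taper)
    # zero-phase Butterworth band-pass to remove drift and high-frequency noise
    sos = signal.butter(order, [highpass, lowpass], btype='bandpass',
                        fs=fs, output='sos')
    filtered = signal.sosfiltfilt(sos, a * win)
    # cumulative trapezoidal integration (same length as the input)
    return integrate.cumulative_trapezoid(filtered, dx=1/fs, initial=0)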

First Compare Simple Sine Wave

To demonstrate the need for filters and a taper, let's compute the integral with and without applying them first.
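
Using the integrate_signal sketch above, a small DC offset is enough to show the problem:

fs = 1000
t = np.arange(0, 10, 1/fs)
a = np.sin(2 * np.pi * 10 * t) + 0.01                         # sine with a small offset
raw = integrate.cumulative_trapezoid(a, dx=1/fs, initial=0)   # drifts away from 0
clean = integrate_signal(a, fs, highpass=5, lowpass=100)      # stays centered on 0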

Calculate for Bearing Data

Now let's bring these numpy arrays back into pandas dataframes for both velocity and displacement and add time back in. In this step we'll also apply a scale factor to convert units to in/s and in. Our units are currently in g's, where 1 g = 9.81 m/s². So to convert to inches we need to first get to meters, then to inches: $$meters/sec^{2} = g \cdot 9.81$$ $$inch = meters \cdot 39.3701$$ $$inch/sec^{2} = g \cdot 9.81 \cdot 39.3701 = g \cdot 386.2205$$
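
A sketch of that step, assuming df_accel (in g's) and the integrate_signal helper above:

import pandas as pd

G_TO_IN_S2 = 9.81 * 39.3701   # about 386.2205 (in/s^2 per g)
fs = 12000                    # sample rate of the bearing data

df_vel = pd.DataFrame({col: integrate_signal(df_accel[col].to_numpy() * G_TO_IN_S2, fs)
                       for col in df_accel.columns}, index=df_accel.index)   # in/s
df_displ = pd.DataFrame({col: integrate_signal(df_vel[col].to_numpy(), fs)
                         for col in df_vel.columns}, index=df_vel.index)     # in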

Plot Integrated Velocity & Displacement

First let's plot the full time range to make sure we don't have any funny business with the start and end of the signal. If there are issues, you'll need to adjust the taper (closer to 1), filter order (lower) and filter frequencies (bring the highpass up and the lowpass down).

The following can be used to plot the full signal, but this may take a few moments.

fig = px.line(df_vel,
              labels={"Time": "Time (s)",
                      "value": "Velocity (in/s)",
                      "variable": "Data Set"},
              title="Comparison of Bearing Velocity Data")
fig.show(renderer='svg', width=900, height=450)

fig = px.line(df_displ,
              labels={"Time": "Time (s)",
                      "value": "Displacement (in)",
                      "variable": "Data Set"},
              title="Comparison of Bearing Displacement Data")
fig.show(renderer='svg', width=900, height=450)

Looks good! Now let's plot a more interactive version in the middle section.

Calculating RMS of Velocity and Displacement

Now that we've calculated the integrated time series of velocity and displacement we can easily calculate the RMS of these. Note that we will focus on the middle 8 seconds to ignore the first and last second because of some weird artifacts that can occur during integration.
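
A sketch of that calculation, keeping only the middle 8 seconds of the 10 second signals:

mid_vel = df_vel[(df_vel.index >= 1) & (df_vel.index <= 9)]
rms_vel = np.sqrt((mid_vel ** 2).mean())
mid_displ = df_displ[(df_displ.index >= 1) & (df_displ.index <= 9)]
rms_displ = np.sqrt((mid_displ ** 2).mean())
print(rms_vel, rms_displ)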

Analysis in the Frequency Domain

FFT Function

I don't know why, but the scipy FFT function isn't as clean to interface with as the PSD one. So this function will take in a dataframe whose index is time and return a dataframe whose columns are the real FFT results. I added the ability to return a dataframe of the phase response, but... take that with a grain of salt.
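
A minimal sketch of such a function (the phase option omitted), assuming a dataframe with a uniformly spaced time index:

import numpy as np
import pandas as pd
from scipy import fft

def get_fft(df):
    n = len(df)
    dt = df.index[1] - df.index[0]    # uniform sample period from the time index
    freqs = fft.rfftfreq(n, d=dt)
    out = pd.DataFrame(index=pd.Index(freqs, name='Frequency (Hz)'))
    for col in df.columns:
        spec = fft.rfft(df[col].to_numpy())
        out[col] = 2 * np.abs(spec) / n   # single-sided amplitude (strictly, the 0 Hz bin shouldn't be doubled)
    return out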

Compare FFT and PSD Results for Simple Sine Wave

Compare FFT and PSD of Bearing Data

FFT of Bearing Data

Calculate PSD

The trouble with the FFT is how many points we need to present, and the peak we're finding doesn't seem to have any correlation to the actual conditions; it's too dependent on chance in where the FFT frequencies lie. A PSD, on the other hand, normalizes the amplitude to a frequency bin, allowing us to widen that bin so we can find the energy density.

In the following function we loop through and create several FFTs of length nFFT, then average them together before normalizing to the bin width. I like bin widths of 1 Hz, which can be accomplished by setting nFFT to the sample rate.
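
scipy's signal.welch does essentially this segment-averaged, bin-normalized calculation for us; here is a sketch with nperseg set to the sample rate for 1 Hz bins (the Hann window choice is discussed later):

from scipy import signal

fs = 12000
psd = pd.DataFrame()
for col in df_accel.columns:
    f, p = signal.welch(df_accel[col], fs=fs, window='hann', nperseg=fs)
    psd[col] = p   # units of g^2/Hz
psd.index = pd.Index(f, name='Frequency (Hz)')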

Peak from PSD

Show How PSD Relates to RMS

The nice thing about a PSD (in addition to the easy control of the bin width) is that the area directly relates to the RMS level in the time domain. The equation is as follows.

$$ g_{RMS}=\sqrt{\int PSD(f)\,df} $$

Let's demonstrate by quickly taking the PSD just calculated, integrating it, taking the square root, and comparing to the values we calculated in the time domain.
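
A sketch, integrating each column of the PSD over frequency with the trapezoidal rule:

grms = np.sqrt(np.trapz(psd, x=psd.index, axis=0))   # one value per dataset
print(pd.Series(grms, index=psd.columns))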

That is pretty spot on! This also lets us conveniently plot how the RMS builds over frequency.

Calculate Velocity & Displacement from PSD

Another nice thing about the PSD is how easy it is to calculate a velocity and displacement PSD from the acceleration one. We showed you can calculate velocity and displacement in the time domain, but remember how finicky that calculation was with respect to the filters and window? It was also relatively heavy from a computation perspective. If we have to compute the PSD anyways, the added computation to get velocity and displacement is trivial. Remember we need to do a unit conversion here too, taking special care to remember that we are in g^2.

$$Velocity\hspace{3mm}PSD = \frac{Acceleration\hspace{3mm}PSD}{\omega^{2}}=\frac{Acceleration\hspace{3mm}PSD}{(2\pi f)^{2}}$$
$$Displacement\hspace{3mm}PSD = \frac{Acceleration\hspace{3mm}PSD}{\omega^{4}}=\frac{Acceleration\hspace{3mm}PSD}{(2\pi f)^{4}}$$

PSD

The following will first calculate the PSD for velocity and displacement, and then plot it.
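
A sketch of that conversion, reusing the G_TO_IN_S2 factor from earlier and dropping the 0 Hz bin to avoid dividing by zero. Because the PSD is in g^2/Hz, the unit conversion factor gets squared:

f = psd.index.to_numpy()[1:]    # skip the 0 Hz bin
omega = 2 * np.pi * f
psd_vel = (psd.iloc[1:] * G_TO_IN_S2 ** 2).div(omega ** 2, axis=0)    # (in/s)^2/Hz
psd_displ = (psd.iloc[1:] * G_TO_IN_S2 ** 2).div(omega ** 4, axis=0)  # in^2/Hz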

RMS from PSD

Now that we have the PSD, calculating the RMS is as easy as before. But there is a big caveat... we need to pay attention to the low frequency "stuff" like we did when integrating in the time domain. To illustrate, let's first calculate the RMS over the entire PSD and compare to what we calculated in the time domain.

You can see that for the two waveforms with a lot of lower frequency content, the result is WAY off compared to what we did in the time domain. But remember that in the time domain we applied a pretty aggressive 5 Hz high pass filter, so let's compare what the RMS value is if we skip all data at or below 5 Hz. This lets us get within +/- 10% on all metrics - not bad!
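
A sketch of that cutoff, using the velocity PSD as the example:

mask = psd_vel.index > 5    # skip everything at or below 5 Hz
rms_vel_psd = np.sqrt(np.trapz(psd_vel[mask], x=psd_vel.index[mask], axis=0))
print(pd.Series(rms_vel_psd, index=psd_vel.columns))   # compare against rms_vel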

So which approach is more correct? Well... it depends. When you go the time domain route, it can be a little easier to see when your integration is bad because it is "running away" from 0, so you can adjust the filters and repeat. But this can be computationally heavy. When you do it with the PSD you can readily see how the frequency content is impacting your results, but the results aren't necessarily as intuitive to understand. So both approaches are right! It's good to do this check between the two as I've done. One note for those paying especially close attention to detail... I used a Hann window in the PSD calculation after some trial and error to find good agreement between the two different calculation techniques. At first I was using a rectangular window, which has terrible frequency leakage that made my integrated results especially bad.

Build Octave Spaced PSD

You can see from above how easy it is to calculate the RMS based on the PSD. This lets us pick bin sizes that are logarithmically spaced for convenient plotting.

Build Frequency Ranges

So first we'll use a function to build log spaced octave frequency ranges. The function lets you pass in what spacing you'd like; for example, 1/3 octave is often used.
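
A minimal sketch of such a helper; the name and the start/stop defaults are my own assumptions:

def octave_bands(f_start=1.0, f_stop=6000.0, octave=1/3):
    # build (low, high) frequency pairs, each spanning the given octave fraction
    bands = []
    f_low = f_start
    while f_low < f_stop:
        f_high = f_low * 2 ** octave
        bands.append((f_low, f_high))
        f_low = f_high
    return np.array(bands)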

Now that we have the lower and upper ranges defined we can find the RMS within each frequency bin. This function requires that the PSD being passed in is linearly spaced.
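
A sketch of that function, integrating the linearly spaced PSD within each (low, high) pair:

def rms_in_bands(psd_df, bands):
    f = psd_df.index.to_numpy()
    rows, labels = [], []
    for f_low, f_high in bands:
        m = (f >= f_low) & (f < f_high)
        if m.sum() > 1:
            rows.append(np.sqrt(np.trapz(psd_df[m], x=f[m], axis=0)))
        else:
            rows.append(np.zeros(psd_df.shape[1]))   # empty band, filled with 0s
        labels.append(f"{f_low:.1f}-{f_high:.1f} Hz")
    return pd.DataFrame(rows, columns=psd_df.columns, index=labels)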

Compare PSD of Linear and Log Spaced Bins

Now we can plot the two PSDs together to see how they look on one plot, which requires Plotly's graph objects and some helper functions. Note that some frequency ranges may contain 0s, so we will drop those so they don't mess up our pretty plot.

Plot Cumulative RMS of Linear & Log Spaced PSDs

Remember how spot on our RMS calculation was from the linear spaced PSD? Let's see how it looks with the log spaced one.

Find RMS within Arbitrary Range

Now that we have verified the cumulative RMS is spot on, let's build a dataframe with the RMS value within each range, pulling the lower and upper frequency of each range into its label.

Reorganize the dataframe so it will be easy to combine with other stats later.

Calculate Pseudo Velocity Shock Spectrum & Peak

There is some funny business going on at the lower frequencies that isn't going to be very relevant to us. So let's ignore everything under 20 Hz, and then find the peak pseudo velocity and corresponding frequency.
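
Assuming pvss is a dataframe of pseudo velocity indexed by natural frequency, that step might look like:

pvss = pvss[pvss.index >= 20]   # ignore everything under 20 Hz
peak_pv = pvss.max()            # peak pseudo velocity per dataset
peak_freq = pvss.idxmax()       # frequency at which that peak occurs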

Compare Stats

Table

First we can do a simple table to compare: combine it all together, transpose it, then print the dataframe.
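
A sketch of that combination; the per-metric dataframe names here are hypothetical stand-ins for the results of the earlier steps:

df_stats = pd.concat([df_rms_stats, df_band_rms, df_pvss_peak])   # hypothetical names
print(df_stats.T)   # datasets as rows, one column per metric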

Plot Each Variable with a Drop Down

Now we can expose a way to have a bar chart of each metric, but let you choose in a dropdown which metric to display. This is from a nice person on Stack Overflow - amazing.

Plot All Stats in One Dashboard

The great thing about Plotly and dataframes is how easy it is to generate SWEET plots. In this final example, I am going to generate a simple dashboard to show each plot separately in one view. It first requires us to restructure the dataframe to have just three columns: value, stat, and dataset. Then generating the facet plot is pretty easy!
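
A sketch of that restructuring and facet plot, assuming df_stats from the table step (stats as rows, datasets as columns):

df_long = (df_stats
           .reset_index()
           .rename(columns={'index': 'stat'})
           .melt(id_vars='stat', var_name='dataset', value_name='value'))
fig = px.bar(df_long, x='dataset', y='value',
             facet_col='stat', facet_col_wrap=4)
fig.update_yaxes(matches=None, showticklabels=True)   # let each subplot scale independently
fig.show()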

Appendix

Installation

You can get most of what you need with the free Anaconda Individual Edition, which includes core libraries like pandas, numpy, and scipy. To download, visit the Anaconda Individual Edition page.

There are several different interfaces for writing your code but this was all generated in Jupyter which lends itself well to documenting a test/analysis report. It also allows for easy export to HTML for sharing interactive reports.

Modules

There will be some libraries you need that aren't included. To install, type the following in Jupyter, replacing plotly (which you'll need) with whatever library you're hoping to install.

!pip install plotly

Plotting in Jupyter

Matplotlib

Matplotlib is the default plotting library and is built into pandas: you can simply add .plot() to the end of your dataframe to plot it. These plots won't be interactive though, and the default size is too small, so the following shows how to make them bigger if you'd like to use this approach.

import matplotlib.pyplot as plt
plt.rcParams["figure.figsize"] = (20,10)
df_accel.plot()
df_vel.plot()
df_displ.plot()

Plotly (My Favorite)

Save Plot to Chart Studio

You will need to replace the api_key with your own account information. I didn't end up going this route but wanted to show how this can be done.

import chart_studio
chart_studio.tools.set_credentials_file(username='your_user_name', api_key='your_api_key')
import chart_studio.plotly as py
py.plot(fig, filename = 'your_filename', auto_open=True)

Save Plot as HTML and PNG

fig.write_html("C:/Users/shanly/bearing-time-series.html",full_html=False,include_plotlyjs='cdn')
fig.write_image("C:/Users/shanly/bearing-time-series.png")
fig.write_image("C:/Users/shanly/bearing-time-series.svg")

Manipulating .MAT Files from Case Western into Pandas CSV

The data is saved as .mat files, so we need to load them and convert to a handy DataFrame. This is a bit convoluted because of how they named the variables within each .mat file. We are only going to grab the "Drive End" accelerometer data, which has "DE" in the variable name.

import scipy.io
import pandas as pd
import numpy as np

def load_mat(name, var):
    # load the .mat file and pull out the named variable
    dic = scipy.io.loadmat("C:/Users/shanly/Desktop/enDAQ/Marketing-Content/What-to-Monitor/"+name+".mat")
    # the variable comes back as an Nx1 array; unpack it into a list of rows
    con_list = [[element for element in upperElement] for upperElement in dic[var]]
    df = pd.DataFrame(con_list)
    return df

Normal = load_mat('Normal','X098_DE_time')
Fault_007 = load_mat('Fault_007','X279_DE_time')
Fault_014 = load_mat('Fault_014','X275_DE_time')
Fault_021 = load_mat('Fault_021','X271_DE_time')

#Merge into 1 Dataframe
df = pd.concat([Fault_021,Fault_014,Fault_007,Normal],axis=1)

#Truncate to first 120000 points
df=df.loc[:119999]

# Create Column Names
df.columns = ['Fault_021','Fault_014','Fault_007','Normal']

# Add Column of Time (12,000 samples per second)
df['Time'] = np.arange(120000) / 12000
df = df.set_index('Time')
print(df.head())
df.reset_index().to_csv(path_or_buf="C:/Users/shanly/Desktop/enDAQ/Marketing-Content/What-to-Monitor/Bearing_Data.csv",index=False)