The first section references where we got this data, and the second shows how to manipulate the source data into something Python likes better. This dataset was then loaded online for public acces. So you can skip these sections and jump straight to plotting the time series data (which will reload the now "cleaned" data).
Data was downloaded from the Case Western Reserve Bearing Data Center where they did some seeded fault testing and shared the data. This data can be downloaded here. An overview of the procedure is here.
For simplicity we will look at 4 data files:
All of these were when the motor was running at approximately 1772 RPM. The numbers in the fault file names signify the fault diameter in thousands of an inch. For example Fault_014 is a fan end bearing fault in the inner race measuring 0.014" in diameter.
In the appendix I show you how to pull out the data of interest from the source .MAT files. But I've uploaded the processed data into a CSV file which can be loaded directly from online. So we can skip those earlier steps (but they've been included for reference).
import pandas as pd
df = pd.read_csv("https://info.endaq.com/hubfs/Plots/bearing_data.csv")
df = df.set_index('Time')
fs = len(df)/(df.index[-1]-df.index[0])
print(fs)
df
12000.0
Fault_021 | Fault_014 | Fault_007 | Normal | |
---|---|---|---|---|
Time | ||||
0.000000 | -0.105351 | -0.074395 | 0.053116 | 0.046104 |
0.000083 | 0.132888 | 0.056365 | 0.116628 | -0.037134 |
0.000167 | -0.056535 | 0.201257 | 0.083654 | -0.089496 |
0.000250 | -0.193178 | -0.024528 | -0.026477 | -0.084906 |
0.000333 | 0.064879 | -0.072284 | 0.045319 | -0.038594 |
... | ... | ... | ... | ... |
9.999667 | 0.095754 | 0.145055 | -0.098923 | 0.064254 |
9.999750 | -0.123083 | 0.092263 | -0.067573 | 0.070721 |
9.999833 | -0.036508 | -0.168120 | 0.005685 | 0.103265 |
9.999917 | 0.097006 | -0.035898 | 0.093400 | 0.124335 |
10.000000 | -0.008762 | 0.165846 | 0.130923 | 0.114947 |
120000 rows × 4 columns
First let's plot the full dataset but render it as a SVG because it is so big.
import plotly.express as px
fig = px.line(df,
labels={
"Time": "Time (s)",
"value": "Acceleration (g)",
"variable": "Data Set"
},
title="Comparison of Bearing Acceleration Data")
fig.write_image("C:/Users/shanly/bearing-time-series.svg")
fig.show(renderer='svg',width=900, height=450)
Now let's get the full benefit of Plotly and render a the middle 1 second of the file as an interactive element. We'll also subtract 4.5 seconds from the time domain data so that these middle plots start at 0 and look good.
plot_start = 3.5
plot_end = 4.001
fig = px.line(df.loc[plot_start:plot_end],
labels={
"Time": "Time (s)",
"value": "Acceleration (g)",
"variable": "Data Set"
},
title="Interactive Comparison of Bearing Acceleration Data")
fig.write_html("C:/Users/shanly/Desktop/enDAQ/Marketing-Content/What-to-Monitor/bearing-time-series.html",full_html=False,include_plotlyjs='cdn')
fig.show()
The peak will be applied after first finding the absolute value to ensure we don't ignore large negative values.
$$peak=max(\left | a \right |)$$Then the RMS is a simple square root of the mean of all the values squared.
$$rms = \sqrt{(\frac{1}{n})\sum_{i=1}^{n}(a_{i})^{2}}$$The crest factor is equal to the peak divided by the RMS. $$crest = \frac{peak}{rms}$$
def calc_rms(df):
rms = df.copy()**2
rms = rms.mean()**0.5
return rms
def stat_calc(df):
df_stats = pd.concat([df.abs().max(),calc_rms(df)],axis=1)
df_stats.columns = ['Acceleration Peak (g)','Acceleration RMS (g)']
df_stats['Crest Factor'] = df_stats['Acceleration Peak (g)'] / df_stats['Acceleration RMS (g)']
df_stats['Standard Deviation (g)'] = df.std()
df_stats.index.name = 'Data Set'
return df_stats
df_stats = stat_calc(df)
df_stats.round(3)
Acceleration Peak (g) | Acceleration RMS (g) | Crest Factor | Standard Deviation (g) | |
---|---|---|---|---|
Data Set | ||||
Fault_021 | 1.038 | 0.199 | 5.222 | 0.198 |
Fault_014 | 1.339 | 0.158 | 8.484 | 0.158 |
Fault_007 | 0.650 | 0.121 | 5.361 | 0.121 |
Normal | 0.269 | 0.066 | 4.081 | 0.065 |
Notice how close standard deviation is to RMS, let's quantify what percentage error exists between the two.
(df_stats['Acceleration RMS (g)']-df_stats['Standard Deviation (g)'])/df_stats['Acceleration RMS (g)']*100
Data Set Fault_021 0.189715 Fault_014 0.014539 Fault_007 0.029208 Normal 1.338447 dtype: float64
import numpy as np
def build_sine_wave(fs,length,offset):
x = np.arange(0, length, 1/fs)
y = np.sin(x*7*2*np.pi)+offset
name = str(fs)+' Hz, '+str(offset) + 'g Bias'
df = pd.DataFrame({'Time (s)':x,
name:y})
return df.set_index('Time (s)')
df_sine = pd.concat([build_sine_wave(100,2.01,0),
build_sine_wave(29,2.01,0),
build_sine_wave(29,2.01,1),
build_sine_wave(15,2.01,0),
],axis=1)
import plotly.graph_objects as go
fig = go.Figure()
for c in df_sine.columns:
df_t = df_sine[c].dropna()
fig.add_trace(go.Scatter(
x=df_t.index,
y=df_t,
mode='lines+markers',
name=c
))
fig.update_layout(
title="Compare 1g 7 Hz Sine Wave Sampled at Different Rates",
xaxis_title="Time (s)",
yaxis_title="Acceleration (g)",
legend_title="Sample Frequency and DC Bias",
)
fig.write_html("C:/Users/shanly/Desktop/enDAQ/Marketing-Content/What-to-Monitor/pure-sine-wave.html",full_html=False,include_plotlyjs='cdn')
fig.show()
stat_calc(df_sine).round(3)
Acceleration Peak (g) | Acceleration RMS (g) | Crest Factor | Standard Deviation (g) | |
---|---|---|---|---|
Data Set | ||||
100 Hz, 0g Bias | 1.000 | 0.705 | 1.418 | 0.707 |
29 Hz, 0g Bias | 0.999 | 0.701 | 1.424 | 0.707 |
29 Hz, 1g Bias | 1.999 | 1.221 | 1.636 | 0.707 |
15 Hz, 0g Bias | 0.995 | 0.696 | 1.430 | 0.707 |
Now that we did the "easy" calculations let's get the integral of the acceleration data for velocity & displacement. The following functions define how these calculations are done which include doing some filters and applying a windowing function.
import numpy as np
import scipy.integrate
import scipy.signal
import itertools
def _highpass(array, fs, cutoff=[1.,2000], axis=-1,filt_order=3):
if cutoff[1]>=fs/2:
cutoff[1]=fs/2.1
if cutoff[0]==0:
return array
"""Apply a highpass filter to an array."""
array = np.moveaxis(array, axis, -1)
sos_coeffs = scipy.signal.butter(
N=filt_order, Wn=cutoff, btype='bandpass', fs=fs, output='sos',
)
init_state = scipy.signal.sosfilt_zi(sos_coeffs)
for _ in range(2):
init_fwd = init_state * array[(Ellipsis, 0) + ((None,)*init_state.ndim)]
init_fwd = np.moveaxis(init_fwd, array.ndim-1, 0)
array, _zo = scipy.signal.sosfilt(sos_coeffs, array, axis=-1, zi=init_fwd)
array = array[..., ::-1]
return np.moveaxis(array, -1, axis)
def _integrate(array, dt, axis=-1, cutoff=[1.,2000],filt_order=3,alpha=1):
"""Integrate data over an axis."""
window = scipy.signal.tukey(len(array),alpha=alpha)
array = np.transpose(window*np.transpose(array))
result = scipy.integrate.cumtrapz(array, dx=dt, initial=0, axis=axis)
result = _highpass(result, 1/dt, cutoff=cutoff, axis=axis,filt_order=filt_order)
return result
def iter_integrals(array, dt, axis=-1, cutoff=[1.,2000],filt_order=3,alpha=1):
"""Iterate over conditioned integrals of the given original data."""
array = _highpass(array, fs=1/dt, cutoff=cutoff,axis=axis,filt_order=filt_order)
while True:
array.setflags(write=False) # should NOT mutate shared data
yield array
array.setflags(write=True) # array will be replaced below -> now ok to edit
array = _integrate(array, dt, axis=axis, cutoff=cutoff,filt_order=filt_order,alpha=alpha)
To demonstrate the need for filters & a taper, let's compute the integral with and without applying those first.
df_sine = build_sine_wave(100,2.41,0)
df_sine.columns = ['Pure']
df_sine['Noise'] = np.random.normal(0, .5, df_sine['Pure'].shape) + df_sine['Pure']
df_sine = df_sine[['Pure','Noise']].dropna()
fig = px.line(df_sine,
labels={
"Time": "Time (s)",
"value": "Acceleration (g)",
"variable": "Data Set"
},
title="Comparison of 1g 7 Hz Sine Wave With & Without Noise")
fig.write_html("C:/Users/shanly/Desktop/enDAQ/Marketing-Content/What-to-Monitor/pure-sine-noise-time-series.html",full_html=False,include_plotlyjs='cdn')
fig.show()
accel, vel, displ = itertools.islice(iter_integrals(df_sine.to_numpy(),
dt=1/100,
cutoff=[0,100/3],
filt_order=3,
alpha=.0, #apply a taper at start and end, 1 is a hanning, 0 is a rectangular
axis=0), 3)
def build_df(df,array,scale):
df[df.columns] = array*scale
return df
df_accel_sine = build_df(df_sine.copy(),accel,1.0)
df_vel_sine = build_df(df_sine.copy(),vel,386.2205)
df_displ_sine = build_df(df_sine.copy(),displ,386.2205)
accel, vel, displ = itertools.islice(iter_integrals(df_sine.to_numpy(),
dt=1/100,
cutoff=[0,100/3],
filt_order=8,
alpha=.2, #apply a taper at start and end, 1 is a hanning, 0 is a rectangular
axis=0), 3)
df_accel_sine2 = build_df(df_sine.copy(),accel,1.0)
df_vel_sine2 = build_df(df_sine.copy(),vel,386.2205)
df_displ_sine2 = build_df(df_sine.copy(),displ,386.2205)
accel, vel, displ = itertools.islice(iter_integrals(df_sine.to_numpy(),
dt=1/100,
cutoff=[3,100/3],
filt_order=8,
alpha=.2, #apply a taper at start and end, 1 is a hanning, 0 is a rectangular
axis=0), 3)
df_accel_sine3 = build_df(df_sine.copy(),accel,1.0)
df_vel_sine3 = build_df(df_sine.copy(),vel,386.2205)
df_displ_sine3 = build_df(df_sine.copy(),displ,386.2205)
df_vel_sine_combo = pd.concat([df_vel_sine,df_vel_sine2,df_vel_sine3],axis=1)
df_vel_sine_combo.columns = ['Pure','Noise','Pure with a Taper','Noise with a Taper','Pure with Filters & Taper','Noise with Filters & Taper']
fig = px.line(df_vel_sine_combo,
labels={
"Time": "Time (s)",
"value": "Velocity (in/s)",
"variable": "Data Set"
},
title="Comparison of Pure Sine Wave Integrated Velocity Data")
fig.write_html("C:/Users/shanly/Desktop/enDAQ/Marketing-Content/What-to-Monitor/pure-sine-velocity-time-series.html",full_html=False,include_plotlyjs='cdn')
fig.show()
df_displ_sine_combo = pd.concat([df_displ_sine,df_displ_sine2,df_displ_sine3],axis=1)
df_displ_sine_combo.columns = ['Pure','Noise','Pure with a Taper','Noise with a Taper','Pure with Filters & Taper','Noise with Filters & Taper']
fig = px.line(df_displ_sine_combo,
labels={
"Time": "Time (s)",
"value": "Displacement (in)",
"variable": "Data Set"
},
title="Comparison of Pure Sine Wave Integrated Displacement Data")
fig.write_html("C:/Users/shanly/Desktop/enDAQ/Marketing-Content/What-to-Monitor/pure-sine-displacement-time-series.html",full_html=False,include_plotlyjs='cdn')
fig.show()
Now let's bring these numpy arrays back into a pandas dataframe for both velocity and displacement and add time back in. In this step we'll also apply a scaler to convert units to in/s and in. Our units are currently in g's which equals 9.81 m/s^2. So to convert to inches we need to first get to meters, then to inches: $$meters/sec2 = g*9.81$$ $$inch = meters*39.3701$$ $$inch/sec2=g*9.81*39.3701=g*386.2205$$
accel, vel, displ = itertools.islice(iter_integrals(df.to_numpy(),
dt=1/fs,
cutoff=[5,fs/3],
filt_order=5,
alpha=.1, #apply a taper at start and end, 1 is a hanning, 0 is a rectangular
axis=0), 3)
def build_df(df,array,scale):
df[df.columns] = array*scale
return df
df_accel = build_df(df.copy(),accel,1.0)
df_vel = build_df(df.copy(),vel,386.2205)
df_displ = build_df(df.copy(),displ,386.2205)
First let's plot the full time range to make sure we don't have any funny business with the start and end of the signal. If there are issues, you'll need to adjust the taper (closer to 1), filter order (lower) and filter frequencies (bring the highpass up and the lowpass down).
The following can be used to plot the full signal, but this may take a few moments.
fig = px.line(df_vel,
labels={
"Time": "Time (s)",
"value": "Velocity (in/s)",
"variable": "Data Set"
},
title="Comparison of Bearing Velocity Data")
fig.show(renderer='svg',width=900, height=450)
fig = px.line(df_displ,
labels={
"Time": "Time (s)",
"value": "Displacement (in)",
"variable": "Data Set"
},
title="Comparison of Bearing Displacement Data")
fig.show(renderer='svg',width=900, height=450)
Looks good! Now let's plot a more interactive version in the middle section.
fig = px.line(df_vel.loc[plot_start:plot_end],
labels={
"Time": "Time (s)",
"value": "Velocity (in/s)",
"variable": "Data Set"
},
title="Comparison of Bearing Velocity Data")
fig.write_html("C:/Users/shanly/Desktop/enDAQ/Marketing-Content/What-to-Monitor/bearing-velocity-time-series.html",full_html=False,include_plotlyjs='cdn')
fig.show()
fig = px.line(df_displ.loc[plot_start:plot_end],
labels={
"Time": "Time (s)",
"value": "Displacement (in)",
"variable": "Data Set"
},
title="Comparison of Bearing Displacement Data")
fig.write_html("C:/Users/shanly/Desktop/enDAQ/Marketing-Content/What-to-Monitor/bearing-displacement-time-series.html",full_html=False,include_plotlyjs='cdn')
fig.show()
Now that we've calculated the integrated time series of velocity and displacement we can easily calculate the RMS of these. Note that we will focus on the middle 8 seconds to ignore the first and last second because of some weird artifacts that can occur during integration.
df_stats_2 = pd.concat([calc_rms(df_vel[-4:4]),calc_rms(df_displ[-4:4])],axis=1)
df_stats_2.columns = ['Velocity RMS (in/s)','Displacement RMS (in)']
df_stats_2
Velocity RMS (in/s) | Displacement RMS (in) | |
---|---|---|
Fault_021 | 0.007972 | 0.000054 |
Fault_014 | 0.008430 | 0.000010 |
Fault_007 | 0.006666 | 0.000011 |
Normal | 0.026376 | 0.000083 |
I don't know why but the scipy FFT function isn't as clean to interface with as the PSD one. So this function will take in a dataframe with the index values being time and return a dataframe with the column values being the real FFT results. I added the ability to return a dataframe of the phase response but... take that with a grain of salt.
from scipy.fft import fft, fftfreq
def get_fft(df):
N=len(df)
fs = len(df)/(df.index[-1]-df.index[0])
x_plot= fftfreq(N, 1/fs)[:N//2]
df_fft = pd.DataFrame()
df_phase = pd.DataFrame()
for name in df.columns:
yf = fft(df[name].values)
y_plot= 2.0/N * np.abs(yf[0:N//2])
'''
phase = np.unwrap(2 * np.angle(yf)) / 2 * 180/np.pi
df_phase = pd.concat([df_phase,
pd.DataFrame({'Frequency (Hz)':x_plot[1:],
name:phase[1:n]}).set_index('Frequency (Hz)')],axis=1)
'''
df_fft = pd.concat([df_fft,
pd.DataFrame({'Frequency (Hz)':x_plot[1:],
name:y_plot[1:]}).set_index('Frequency (Hz)')],axis=1)
return df_fft
def build_two_sine_wave(fs,length):
x = np.arange(0, length, 1/fs)
y = np.sin(x*7*2*np.pi)+np.sin(x*13*2*np.pi)
name = str(length) + ' s'
df = pd.DataFrame({'Time (s)':x,
name:y})
return df.set_index('Time (s)')
two_sine = build_two_sine_wave(100,10.0)
print(calc_rms(two_sine))
fig = px.line(two_sine,
labels={
"value": "Acceleration (g)",
"variable": "Data Set"
},
title="Time History of a Signal with Two Sine Tones (7 & 13 Hz)")
fig.update_layout(showlegend=False)
fig.write_html("C:/Users/shanly/Desktop/enDAQ/Marketing-Content/What-to-Monitor/time-two-tone-sine-waves.html",full_html=False,include_plotlyjs='cdn')
fig.show()
10.0 s 1.0 dtype: float64
fig = px.line(get_fft(build_two_sine_wave(100,10.0)),
labels={
"value": "Acceleration (g)",
"variable": "Data Set"
},
title="FFT of a Signal with Two Sine Tones")
fig.update_layout(showlegend=False)
fig.write_html("C:/Users/shanly/Desktop/enDAQ/Marketing-Content/What-to-Monitor/fft-simple-sine-waves.html",full_html=False,include_plotlyjs='cdn')
fig.show()
df_sine_fft = pd.concat([
get_fft(build_two_sine_wave(100,1.1)),
get_fft(build_two_sine_wave(100,5.5)),
get_fft(build_two_sine_wave(100,10.5)),
],axis=1)
fig = go.Figure()
for c in df_sine_fft.columns:
df_t = df_sine_fft[c].dropna()
fig.add_trace(go.Scatter(
x=df_t.index,
y=df_t,
#mode='lines+markers',
name=c
))
fig.update_layout(
title="Compare FFT of a Signal with Two Sine Tones for Different Lengths",
xaxis_title="Frequency (Hz)",
yaxis_title="Acceleration (g)",
legend_title="Recording Length",
)
#fig.update_yaxes(type="log")
#fig.update_yaxes(range=[-3, 0])
fig.write_html("C:/Users/shanly/Desktop/enDAQ/Marketing-Content/What-to-Monitor/pure-sine-wave-fft-leakage.html",full_html=False,include_plotlyjs='cdn')
fig.show()
import scipy.signal as signal
def get_psd(df,bin_width):
fs = len(df)/(df.index[-1]-df.index[0])
f, psd = signal.welch(df.to_numpy(),
fs=fs,
nperseg=fs/bin_width,
window='hanning',
axis=0
)
df_psd = pd.DataFrame(psd,columns=df.columns)
df_psd.columns
df_psd['Frequency (Hz)'] = f
df_psd = df_psd.set_index('Frequency (Hz)')
return df_psd[1:] #drop the first value because it makes the plots look bad and is effectively 0
df_sine_psd = pd.concat([
get_psd(build_two_sine_wave(100,1.1),1),
get_psd(build_two_sine_wave(100,5.5),1),
get_psd(build_two_sine_wave(100,10.5),1),
],axis=1)
fig = go.Figure()
for c in df_sine_psd.columns:
df_t = df_sine_psd[c].dropna()
fig.add_trace(go.Scatter(
x=df_t.index,
y=df_t,
#mode='lines+markers',
name=c
))
fig.update_layout(
title="Compare PSD of a Signal with Two Sine Tones for Different Lengths",
xaxis_title="Frequency (Hz)",
yaxis_title="Acceleration (g^2/Hz)",
legend_title="Recording Length",
)
#fig.update_yaxes(type="log")
fig.write_html("C:/Users/shanly/Desktop/enDAQ/Marketing-Content/What-to-Monitor/pure-sine-wave-psd.html",full_html=False,include_plotlyjs='cdn')
fig.show()
psd_explained = df_sine_psd.copy()
psd_explained['PSD Value (g^2/Hz)'] = psd_explained['10.5 s']
psd_explained['Cumulative Sum (g^2/Hz)'] = psd_explained['10.5 s'].cumsum()
psd_explained = psd_explained[['PSD Value (g^2/Hz)','Cumulative Sum (g^2/Hz)']].dropna()
psd_explained['Multiply By Bin (g^2)'] = psd_explained['Cumulative Sum (g^2/Hz)']*(psd_explained.index[1]-psd_explained.index[0])
psd_explained['Square Root (g)'] = psd_explained['Multiply By Bin (g^2)']**0.5
psd_explained.index = np.around(psd_explained.index)
psd_explained[5:9].round(3)
PSD Value (g^2/Hz) | Cumulative Sum (g^2/Hz) | Multiply By Bin (g^2) | Square Root (g) | |
---|---|---|---|---|
Frequency (Hz) | ||||
5.0 | 0.000 | 0.000 | 0.000 | 0.000 |
6.0 | 0.083 | 0.083 | 0.083 | 0.289 |
7.0 | 0.333 | 0.416 | 0.417 | 0.645 |
8.0 | 0.083 | 0.500 | 0.500 | 0.707 |
9.0 | 0.000 | 0.500 | 0.500 | 0.707 |
df_fft = pd.concat([
get_fft(df[['Normal']][0:.125]),
get_fft(df[['Normal']]),
],axis=1)
df_fft.columns=['0.125 Second','10 Seconds']
fig = go.Figure()
for c in df_fft.columns:
df_t = df_fft[c].dropna()
fig.add_trace(go.Scatter(
x=df_t[20:1000].index,
y=df_t[20:1000],
#mode='lines+markers',
name=c
))
fig.update_layout(
title="Compare FFT of Normal Bearing for Different Lengths",
xaxis_title="Frequency (Hz)",
yaxis_title="Acceleration (g)",
legend_title="Recording Length",
)
fig.update_yaxes(type="log")
fig.update_xaxes(type="log")
fig.write_html("C:/Users/shanly/Desktop/enDAQ/Marketing-Content/What-to-Monitor/bearing-fft-compare.html",full_html=False,include_plotlyjs='cdn')
fig.show()
df_psd = pd.concat([
get_psd(df[['Normal']][0:.125],16),
get_psd(df[['Normal']],16),
],axis=1)
df_psd.columns=['0.125 Second','10 Seconds']
fig = go.Figure()
for c in df_psd.columns:
df_t = df_psd[c].dropna()
fig.add_trace(go.Scatter(
x=df_t[20:1000].index,
y=df_t[20:1000],
#mode='lines+markers',
name=c
))
fig.update_layout(
title="Compare PSD of Normal Bearing for Different Lengths",
xaxis_title="Frequency (Hz)",
yaxis_title="Acceleration (g^2/Hz)",
legend_title="Recording Length",
)
fig.update_yaxes(type="log")
fig.update_xaxes(type="log")
fig.write_html("C:/Users/shanly/Desktop/enDAQ/Marketing-Content/What-to-Monitor/bearing-psd-compare.html",full_html=False,include_plotlyjs='cdn')
fig.show()
fig = px.line(get_fft(df),
labels={
"value": "Acceleration (g)",
"variable": "Data Set"
},
title="FFT Comparison of Bearing Data")
fig.write_image("C:/Users/shanly/Desktop/enDAQ/Marketing-Content/What-to-Monitor/bearing-fft.svg")
fig.show(renderer='svg',width=900, height=450)
The trouble with the FFT is how many points we need to present and that this peak we're finding doesn't seem to have any correlation to the actual conditions. It's too dependent on a bit of chance with where the FFT frequencies lie. A PSD on the other hand normalizes the amplitude to a frequency bin allowing us to widen this bin so we can find the energy density.
In the following function we loop through and create several FFTs defined by length nFFT, then average them together before normalizing to the bin width. I like doing bin widths of 1 Hz which could be accomplished by setting nFFT to the sample rate.
df_psd = get_psd(df,1)
fig = px.line(df_psd,
labels={
"value": "Acceleration (g^2/Hz)",
"variable": "Data Set"
},
log_x=True,
log_y=True,
title="PSD Comparison of Bearing Data")
fig.write_html("C:/Users/shanly/Desktop/enDAQ/Marketing-Content/What-to-Monitor/bearing-psd-normal-faulty.html",full_html=False,include_plotlyjs='cdn')
fig.show()
df_psd_peaks = pd.concat([df_psd.idxmax(),df_psd.max()],axis=1)
df_psd_peaks.columns = ['Peak Frequency (Hz)','Peak Amplitude (g^2/Hz)']
df_psd_peaks
Peak Frequency (Hz) | Peak Amplitude (g^2/Hz) | |
---|---|---|
Fault_021 | 3316.0 | 0.001796 |
Fault_014 | 3237.0 | 0.001713 |
Fault_007 | 1051.0 | 0.000600 |
Normal | 90.0 | 0.000473 |
The nice thing about a PSD (in addition to the easy control of the bin width) is that the area directly relates to the RMS level in the time domain. The equation is as follows.
$$ g_{RMS}=\sqrt{\int PSD(f),df} $$Let's demonstrate by quickly using the PSD just calculated, integrating, and taking the square root and compare to the values we calculated from the time domain.
def rms_from_psd(df_psd):
d_f = df_psd.index[1] - df_psd.index[0]
df_rms = df_psd.copy()
df_rms = df_rms*d_f
df_rms = df_rms.cumsum()
return(df_rms**0.5)
df_rms = rms_from_psd(df_psd)
rms_compare = pd.concat([df_rms.iloc[-1],df_stats['Acceleration RMS (g)']],axis=1)
rms_compare.columns = ['From PSD','From Time']
rms_compare['Error %'] = (rms_compare['From PSD'] - rms_compare['From Time'])/rms_compare['From Time']*100
rms_compare.round(3)
From PSD | From Time | Error % | |
---|---|---|---|
Data Set | |||
Fault_021 | 0.199 | 0.199 | -0.121 |
Fault_014 | 0.157 | 0.158 | -0.348 |
Fault_007 | 0.122 | 0.121 | 0.558 |
Normal | 0.065 | 0.066 | -1.603 |
That is pretty spot on! This also let's us conveniently plot over frequency how the RMS builds.
fig = px.line(df_rms,
labels={
"value": "Acceleration (g RMS)",
"variable": "Data Set"
},
log_x=True,
title="RMS of PSD of Bearing Data")
fig.write_html("C:/Users/shanly/Desktop/enDAQ/Marketing-Content/What-to-Monitor/bearing-rms-from-psd.html",full_html=False,include_plotlyjs='cdn')
fig.show()
Another nice thing about the PSD is how easy it is to calculate a velocity and displacement PSD from the acceleration one. We showed you can calculate velocity and displacement in the time domain but remember how finicky the calculation was to how we applied the filters and window? It also was relatively heavy from a computation perspective. If we have to get the PSD anyways, the added computation to get velocity and displacement is trivial. Remember we need to do a unit conversion here too and take special care to remember that we are in g^2.
$$Velocity\hspace{3mm}PSD = \frac{Acceleration\hspace{3mm}PSD}{w^{2}}=\frac{Acceleration\hspace{3mm}PSD}{(2*f*\pi)^{2}}$$$$Displacement\hspace{3mm}PSD = \frac{Acceleration\hspace{3mm}PSD}{w^{4}}=\frac{Acceleration\hspace{3mm}PSD}{(2*f*\pi)^{4}}$$The following will first calculate the PSD for velocity and displacement, and then plot it.
df_vel_psd = df_psd.copy()
df_disp_psd = df_psd.copy()
for x in df_vel.columns:
df_vel_psd[x] = df_vel_psd[x]/((df_vel_psd.index*2*np.pi)**2)*(386.2205**2)
df_disp_psd[x] = df_disp_psd[x]/((df_disp_psd.index*2*np.pi)**4)*(386.2205**2)
fig = px.line(df_vel_psd,
labels={
"value": "Velocity ((in/s)^2/Hz)",
"variable": "Data Set"
},
log_x=True,
log_y=True,
title="Velocity PSD Comparison of Bearing Data")
fig.write_html("C:/Users/shanly/Desktop/enDAQ/Marketing-Content/What-to-Monitor/bearing-velocity-psd.html",full_html=False,include_plotlyjs='cdn')
fig.show()
fig = px.line(df_disp_psd,
labels={
"value": "Displacement (in^2/Hz)",
"variable": "Data Set"
},
log_x=True,
log_y=True,
title="Displacement PSD Comparison of Bearing Data")
fig.write_html("C:/Users/shanly/Desktop/enDAQ/Marketing-Content/What-to-Monitor/bearing-displacement-psd.html",full_html=False,include_plotlyjs='cdn')
fig.show()
Now that we have the PSD, calculating the RMS is easy as before. But there is a big caveat... we need to pay attention to the low frequency "stuff" like we did when integrating in the time domain. To illustrate let's first calculate the RMS and compare to what we calculated in the time domain for the entire PSD.
df_vel_rms = rms_from_psd(df_vel_psd)
df_disp_rms = rms_from_psd(df_disp_psd)
rms_compare = pd.concat([df_vel_rms.iloc[-1],
df_stats_2['Velocity RMS (in/s)'],
df_disp_rms.iloc[-1],
df_stats_2['Displacement RMS (in)']],axis=1)
rms_compare.columns = ['Vel From PSD','Vel From Time','Disp From PSD','Disp From Time']
rms_compare['Vel Error %'] = (rms_compare['Vel From PSD'] - rms_compare['Vel From Time'])/rms_compare['Vel From Time']*100
rms_compare['Disp Error %'] = (rms_compare['Disp From PSD'] - rms_compare['Disp From Time'])/rms_compare['Disp From Time']*100
rms_compare
Vel From PSD | Vel From Time | Disp From PSD | Disp From Time | Vel Error % | Disp Error % | |
---|---|---|---|---|---|---|
Fault_021 | 0.048174 | 0.007972 | 0.007035 | 0.000054 | 504.305078 | 12871.556753 |
Fault_014 | 0.009365 | 0.008430 | 0.000381 | 0.000010 | 11.083578 | 3876.364938 |
Fault_007 | 0.007507 | 0.006666 | 0.000457 | 0.000011 | 12.615499 | 4234.235476 |
Normal | 0.044487 | 0.026376 | 0.005306 | 0.000083 | 68.661191 | 6307.600574 |
You can see that the comparison of the two waveforms that had a lot of lower frequency content is WAY off compared to what we did in the time domain. But remember in the time domain we applied a pretty aggressive 5 Hz high pass filter, so let's compare what the RMS value is if we skip all data at or below 5 Hz. This let's us get within +/- 10% with all metrics - not bad!
df_vel_rms = rms_from_psd(df_vel_psd.iloc[6:])
df_disp_rms = rms_from_psd(df_disp_psd.iloc[6:])
rms_compare = pd.concat([df_vel_rms.iloc[-1],
df_stats_2['Velocity RMS (in/s)'],
df_disp_rms.iloc[-1],
df_stats_2['Displacement RMS (in)']],axis=1)
rms_compare.columns = ['Vel From PSD','Vel From Time','Disp From PSD','Disp From Time']
rms_compare['Vel Error %'] = (rms_compare['Vel From PSD'] - rms_compare['Vel From Time'])/rms_compare['Vel From Time']*100
rms_compare['Disp Error %'] = (rms_compare['Disp From PSD'] - rms_compare['Disp From Time'])/rms_compare['Disp From Time']*100
rms_compare
Vel From PSD | Vel From Time | Disp From PSD | Disp From Time | Vel Error % | Disp Error % | |
---|---|---|---|---|---|---|
Fault_021 | 0.008591 | 0.007972 | 0.000054 | 0.000054 | 7.763564 | -0.515379 |
Fault_014 | 0.009036 | 0.008430 | 0.000007 | 0.000010 | 7.177371 | -26.278486 |
Fault_007 | 0.006911 | 0.006666 | 0.000010 | 0.000011 | 3.675197 | -5.301851 |
Normal | 0.027228 | 0.026376 | 0.000087 | 0.000083 | 3.228005 | 4.772056 |
So which approach is more correct? Well... it depends. When you go the time domain route, it can be a little easier to see your integration is bad when it is "running away" from 0. So you can adjust the filters and repeat. But this can be computationally heavy. When you do it with the PSD you can readily see how the frequency content is impacting your results. But the results aren't necessarily as intuitive to understand. So both approaches are right! It's good to do this check between the two as I've done. One note for those that are paying especially close attention to detail... I used a hanning window with the PSD calcuation after some trial and error to find good agreement with the two different calculation techniques. At first I was using a rectangular window which has terrible frequency leakage that resulted in my integrated results being especially bad.
You can see from above how easy it is to calculate the RMS based of the PSD. This lets us pick bin sizes that are logarithmically spaced for convenient plotting.
So first we'll use a function to build a log spaced octave frequency ranges. The function let's you pass in what spacing you'd like, for example 1/3 octave is often used.
def build_oct(start,stop,space):
a = [start]
while a[-1] < stop:
a = np.append(a,[a[-1]*(2**space)])
df = pd.DataFrame(a,columns=['Frequency (Hz)'])
df['Lower'] = df['Frequency (Hz)']/(2**(space/2))
df['Upper'] = df['Frequency (Hz)']*(2**(space/2))
df['Upper'][df['Upper']>stop]=stop
df['Frequency (Hz)'][df['Frequency (Hz)']>stop]=stop
df = df.set_index('Frequency (Hz)')
return df[df['Lower']<stop]
df_oct = build_oct(1,fs/2,1/3)
df_oct
Lower | Upper | |
---|---|---|
Frequency (Hz) | ||
1.000000 | 0.890899 | 1.122462 |
1.259921 | 1.122462 | 1.414214 |
1.587401 | 1.414214 | 1.781797 |
2.000000 | 1.781797 | 2.244924 |
2.519842 | 2.244924 | 2.828427 |
3.174802 | 2.828427 | 3.563595 |
4.000000 | 3.563595 | 4.489848 |
5.039684 | 4.489848 | 5.656854 |
6.349604 | 5.656854 | 7.127190 |
8.000000 | 7.127190 | 8.979696 |
10.079368 | 8.979696 | 11.313708 |
12.699208 | 11.313708 | 14.254379 |
16.000000 | 14.254379 | 17.959393 |
20.158737 | 17.959393 | 22.627417 |
25.398417 | 22.627417 | 28.508759 |
32.000000 | 28.508759 | 35.918786 |
40.317474 | 35.918786 | 45.254834 |
50.796834 | 45.254834 | 57.017518 |
64.000000 | 57.017518 | 71.837571 |
80.634947 | 71.837571 | 90.509668 |
101.593667 | 90.509668 | 114.035036 |
128.000000 | 114.035036 | 143.675142 |
161.269894 | 143.675142 | 181.019336 |
203.187335 | 181.019336 | 228.070072 |
256.000000 | 228.070072 | 287.350284 |
322.539789 | 287.350284 | 362.038672 |
406.374669 | 362.038672 | 456.140144 |
512.000000 | 456.140144 | 574.700569 |
645.079578 | 574.700569 | 724.077344 |
812.749339 | 724.077344 | 912.280287 |
1024.000000 | 912.280287 | 1149.401137 |
1290.159155 | 1149.401137 | 1448.154688 |
1625.498677 | 1448.154688 | 1824.560575 |
2048.000000 | 1824.560575 | 2298.802275 |
2580.318310 | 2298.802275 | 2896.309376 |
3250.997354 | 2896.309376 | 3649.121150 |
4096.000000 | 3649.121150 | 4597.604550 |
5160.636620 | 4597.604550 | 5792.618751 |
6000.000000 | 5792.618751 | 6000.000000 |
Now that we have the lower and upper ranges defined we can find the RMS within each frequency bin. This function requires that the PSD being passed in is linearly spaced.
def add_oct_psd(df_psd,df_oct):
df_oct_psd = df_oct.drop(['Lower','Upper'], axis=1).copy()
for name in df_psd.columns:
df_oct_psd[name]=np.nan
df_oct_rms = df_oct_psd.copy()
f_step = df_psd.index[1]-df_psd.index[0]
for i in range(df_oct.shape[0]):
f_l = df_oct['Lower'].iloc[i]
f_u = df_oct['Upper'].iloc[i]
for j in df_psd.columns:
d_t = df_psd[j][(df_psd.index>=f_l) & (df_psd.index<f_u)]
d_t = d_t*f_step
df_oct_psd[j].iloc[i] = d_t.sum()/(f_u-f_l)
df_oct_rms[j].iloc[i] = d_t.sum()
return df_oct_psd, df_oct_rms.cumsum()**0.5
df_oct_psd,df_oct_rms = add_oct_psd(df_psd,df_oct)
Now we can plot the two PSDs together to see how they look on one plot together which requires Plotly's graph objects and some helper functions. Note that there may be some frequency ranges that have 0s so we will drop those so they don't mess up our pretty plot.
df_oct_psd = df_oct_psd.loc[~(df_oct_psd==0).all(axis=1)] #drop all rows that have a 0
import plotly.graph_objects as go
fig = go.Figure()
def add_trace(df,col,name):
fig.add_trace(go.Scatter(
x=df.index,
y=df[col],
name=name+" - "+col
))
def add_full_df(df,name):
for x in df.columns:
add_trace(df,x,name)
add_full_df(df_psd,'Linear Bins')
add_full_df(df_oct_psd,'Logarithmic Bins')
fig.update_yaxes(type="log")
fig.update_xaxes(type="log")
fig.update_layout(
title="Compare PSDs",
xaxis_title="Frequency (Hz)",
yaxis_title="Acceleration (g^2/Hz)",
legend_title="Legend Title",
)
fig.write_html("C:/Users/shanly/Desktop/enDAQ/Marketing-Content/What-to-Monitor/log-psd.html",full_html=False,include_plotlyjs='cdn')
fig.show()
fig = px.line(df_oct_psd,
labels={
"value": "Acceleration (g^2/Hz)",
"variable": "Data Set"
},
log_x=True,
log_y=True,
title="1/3 Octave PSD Comparison of Bearing Data")
fig.write_html("C:/Users/shanly/Desktop/enDAQ/Marketing-Content/What-to-Monitor/1-3-octave-psd-bearing.html",full_html=False,include_plotlyjs='cdn')
fig.show()
Remember how spot on our RMS calculation was from the linear spaced PSD, let's see how it looks with the log spaced.
fig = go.Figure()
add_full_df(df_rms,'Linear Bins')
add_full_df(df_oct_rms,'Logarithmic Bins')
fig.update_xaxes(type="log")
fig.update_layout(
title="Compare Cumulative RMS from PSDs",
xaxis_title="Frequency (Hz)",
yaxis_title="Acceleration RMS (g)",
legend_title="Legend Title",
)
fig.show()
fig = px.line(df_oct_rms,
labels={
"value": "Acceleration RMS (g)",
"variable": "Data Set"
},
log_x=True,
title="1/3 Octave Cumulative RMS from PSD")
fig.write_html("C:/Users/shanly/Desktop/enDAQ/Marketing-Content/What-to-Monitor/1-3-octave-rms-from-psd-bearing.html",full_html=False,include_plotlyjs='cdn')
fig.show()
Now that we have verified the cumulative RMS is spot on, let's build a dataframe that has values for the RMS within each range and add appropriate labels to pull the upper and lower frequency range into the label.
df_rms_ranges = pd.DataFrame({'Lower':[1,65,300],
'Upper':[65,300,6000]})
df_rms_ranges['Frequency (Hz)'] = (df_rms_ranges['Lower']+df_rms_ranges['Upper'])/2
df_rms_ranges = df_rms_ranges.set_index('Frequency (Hz)')
df_rms_ranges
Lower | Upper | |
---|---|---|
Frequency (Hz) | ||
33.0 | 1 | 65 |
182.5 | 65 | 300 |
3150.0 | 300 | 6000 |
df_arb_psd,df_arb_rms = add_oct_psd(df_psd,df_rms_ranges)
fig = go.Figure()
add_full_df(df_psd,'Linear Bins')
add_full_df(df_arb_psd,'Arbitrary Bins')
fig.update_yaxes(type="log")
fig.update_xaxes(type="log")
fig.update_layout(
title="Compare PSDs",
xaxis_title="Frequency (Hz)",
yaxis_title="Acceleration (g^2/Hz)",
legend_title="Legend Title",
)
fig.show()
fig = go.Figure()
add_full_df(df_rms,'Linear Bins')
add_full_df(df_arb_rms,'Arbitrary Bins')
fig.update_xaxes(type="log")
fig.update_layout(
title="Compare Cumulative RMS from PSDs",
xaxis_title="Frequency (Hz)",
yaxis_title="Acceleration RMS (g)",
legend_title="Legend Title",
)
fig.show()
df_arb_rms_bins = df_arb_rms.diff()
df_arb_rms_bins.iloc[0] = df_arb_rms.iloc[0]
df_arb_rms_bins['Frequency Range (Hz)'] = ' '
for i in range(df_arb_rms_bins.shape[0]):
ind=df_arb_rms.index[i]
df_arb_rms_bins.loc[df_arb_rms_bins.index==ind,'Frequency Range (Hz)'] = str(df_rms_ranges['Lower'].iloc[i].round(1)) + ' to ' + str(df_rms_ranges['Upper'].iloc[i].round(1))
df_arb_rms_bins = df_arb_rms_bins.set_index('Frequency Range (Hz)')
df_arb_rms_bins
Fault_021 | Fault_014 | Fault_007 | Normal | |
---|---|---|---|---|
Frequency Range (Hz) | ||||
1 to 65 | 0.001990 | 0.001450 | 0.000981 | 0.010137 |
65 to 300 | 0.005665 | 0.007912 | 0.009952 | 0.031013 |
300 to 6000 | 0.190864 | 0.147873 | 0.111051 | 0.023736 |
fig = px.bar(df_arb_rms_bins,
labels={
"value": "Acceleration RMS (g)",
"variable": "Data Set"
},
barmode='group',
title="Comparison of RMS per Frequency Bin of Bearing Data")
fig.write_html("C:/Users/shanly/Desktop/enDAQ/Marketing-Content/What-to-Monitor/grms-vs-frequency.html",full_html=False,include_plotlyjs='cdn')
fig.show()
Reorganize the dataframe so it will be easy to combine with other stats later.
df_arb_rms_bins = df_arb_rms_bins.T
df_arb_rms_bins.columns = ['RMS (g) from ' + str(col) for col in df_arb_rms_bins.columns]
df_arb_rms_bins
RMS (g) from 1 to 65 | RMS (g) from 65 to 300 | RMS (g) from 300 to 6000 | |
---|---|---|---|
Fault_021 | 0.001990 | 0.005665 | 0.190864 |
Fault_014 | 0.001450 | 0.007912 | 0.147873 |
Fault_007 | 0.000981 | 0.009952 | 0.111051 |
Normal | 0.010137 | 0.031013 | 0.023736 |
from collections import namedtuple
import warnings
import numpy as np
import scipy.signal
def rel_displ(accel, omega, dt=1, damp=0, axis=-1):
"""Calculate the relative velocity for a SDOF system."""
# Generate the transfer function
# H(s) = L{z(t)}(s) / L{y"(t)}(s) = (1/s²)(Z(s)/Y(s))
# for the PDE
# z" + (2ζω)z' + (ω^2)z = -y"
with warnings.catch_warnings():
warnings.simplefilter('ignore', scipy.signal.BadCoefficients)
tf = scipy.signal.TransferFunction(
[-1],
[1, 2*damp*omega, omega**2],
).to_discrete(dt=dt)
return scipy.signal.lfilter(tf.num, tf.den, accel, axis=axis)
def pseudo_velocity(accel, freqs, dt=1, damp=0, two_sided=False, axis=-1):
"""The pseudo velocity of an acceleration signal."""
freqs = np.asarray(freqs)
omega = 2*np.pi*freqs
accel = np.moveaxis(accel, axis, -1)
results = np.empty((2,) + freqs.shape + accel.shape[:-1], dtype=np.float)
for i_nd in np.ndindex(freqs.shape):
rd = rel_displ(accel, omega[i_nd], dt, damp=damp)
results[(0,)+i_nd] = -omega[i_nd]*rd.min(axis=-1)
results[(1,)+i_nd] = omega[i_nd]*rd.max(axis=-1)
# Move any frequency axes in place of the specified acceleration axis
results = np.moveaxis(
results,
np.arange(1, omega.ndim+1),
np.arange(1, omega.ndim+1) + (axis % accel.ndim),
)
if not two_sided:
return np.maximum(results[0], results[1])
return namedtuple('PseudoVelocityResults', 'neg pos')(*results)
#freqs = np.linspace(1,fs/2,num=int(fs/2)) #linear spaced 1 Hz bins, but runs MUCH faster if you use log spaced
def build_freqs(start,stop,space):
a = [start]
while a[-1] < stop:
a.append(a[-1]*(2**space))
return a
freqs = build_freqs(1,fs/2,1/24)
pvss = pseudo_velocity(df.to_numpy(), freqs, dt=1/fs, damp=0.05, two_sided=False, axis=0)*386.2205
df_pvss = pd.DataFrame(pvss,columns=df.columns)
df_pvss['Frequency (Hz)'] = freqs
df_pvss = df_pvss.set_index('Frequency (Hz)')
df_pvss
Fault_021 | Fault_014 | Fault_007 | Normal | |
---|---|---|---|---|
Frequency (Hz) | ||||
1.000000 | 1.334636 | 0.301645 | 0.323765 | 1.131910 |
1.029302 | 1.296414 | 0.293319 | 0.314458 | 1.093981 |
1.059463 | 1.259462 | 0.285196 | 0.305718 | 1.057886 |
1.090508 | 1.222877 | 0.277300 | 0.297020 | 1.023294 |
1.122462 | 1.187459 | 0.269625 | 0.288614 | 0.989471 |
... | ... | ... | ... | ... |
5467.504043 | 0.017717 | 0.023767 | 0.011359 | 0.003191 |
5627.714140 | 0.016556 | 0.022969 | 0.011240 | 0.003091 |
5792.618751 | 0.017035 | 0.022290 | 0.011149 | 0.002996 |
5962.355437 | 0.017013 | 0.021681 | 0.010950 | 0.002907 |
6137.065787 | 0.016392 | 0.021054 | 0.010585 | 0.002825 |
303 rows × 4 columns
fig = px.line(df_pvss,
labels={
"value": "Peak Pseudo Velocity (in/s)",
"variable": "Data Set"
},
log_x=True,
log_y=True,
title="PVSS Comparison of Bearing Data")
fig.write_html("C:/Users/shanly/Desktop/enDAQ/Marketing-Content/What-to-Monitor/bearing-pvss.html",full_html=False,include_plotlyjs='cdn')
fig.show()
There is some funny business going on with lower frequencies that aren't going to be very relevant to us. So let's ignore everything under 20 Hz, and then find the peak pseudo velocity and corresponding frequency.
df_pvss_stats = pd.concat([df_pvss[20:].max(),df_pvss[20:].idxmax()],axis=1)
df_pvss_stats.columns=['Peak PV (in/s)','Peak PV (Hz)']
df_pvss_stats
Peak PV (in/s) | Peak PV (Hz) | |
---|---|---|
Fault_021 | 0.126597 | 3250.997354 |
Fault_014 | 0.155990 | 512.000000 |
Fault_007 | 0.085494 | 161.269894 |
Normal | 0.407828 | 90.509668 |
df_stats_final = pd.concat([df_stats,df_stats_2,df_psd_peaks['Peak Frequency (Hz)'],df_arb_rms_bins,df_pvss_stats],axis=1)
df_stats_final = df_stats_final.T #transpose
df_stats_final = df_stats_final.iloc[:, ::-1] #Reverse order of columns
df_stats_final
Data Set | Normal | Fault_007 | Fault_014 | Fault_021 |
---|---|---|---|---|
Acceleration Peak (g) | 0.269114 | 0.650390 | 1.338628 | 1.037862 |
Acceleration RMS (g) | 0.065943 | 0.121308 | 0.157784 | 0.198760 |
Crest Factor | 4.081014 | 5.361487 | 8.483951 | 5.221685 |
Standard Deviation (g) | 0.065060 | 0.121272 | 0.157761 | 0.198383 |
Velocity RMS (in/s) | 0.026376 | 0.006666 | 0.008430 | 0.007972 |
Displacement RMS (in) | 0.000083 | 0.000011 | 0.000010 | 0.000054 |
Peak Frequency (Hz) | 90.000000 | 1051.000000 | 3237.000000 | 3316.000000 |
RMS (g) from 1 to 65 | 0.010137 | 0.000981 | 0.001450 | 0.001990 |
RMS (g) from 65 to 300 | 0.031013 | 0.009952 | 0.007912 | 0.005665 |
RMS (g) from 300 to 6000 | 0.023736 | 0.111051 | 0.147873 | 0.190864 |
Peak PV (in/s) | 0.407828 | 0.085494 | 0.155990 | 0.126597 |
Peak PV (Hz) | 90.509668 | 161.269894 | 512.000000 | 3250.997354 |
Now we can expose a way to have a bar chart of each metric, but let you chose in a dropdown which metric to display. This is from a nice person on stack overflow - amazing.
df_plots = df_stats_final.T
fig = px.bar(df_plots, y=df_plots.columns[0])
updatemenu = []
buttons = []
# button with one option for each dataframe
for col in df_plots.columns:
buttons.append(dict(method='restyle',
label=col,
visible=True,
args=[{'y':[df_plots[col]],
'x':[df_plots.index],
'type':'bar'}, [0]],
)
)
# some adjustments to the updatemenus
updatemenu = []
your_menu = dict()
updatemenu.append(your_menu)
updatemenu[0]['buttons'] = buttons
updatemenu[0]['direction'] = 'down'
updatemenu[0]['showactive'] = True
# add dropdown menus to the figure
fig.update_layout(showlegend=False, updatemenus=updatemenu)
fig.update_yaxes(title_text='')
fig.update_xaxes(title_text='Data Set')
fig.show()
The great thing about plotly and dataframes is how easy it is to generate SWEET plots. In this final example, I am going to generate a simple dashboard to show each plot separately in one view. It first requires us to restructure the dataframe to have just three columns: value, stat, dataset. Then generating the facet plot is pretty easy!
values = []
data_set = []
stat = []
x = 0
for i in range(df_stats_final.shape[0]):
for col in df_stats_final.columns:
values.append(df_stats_final[col].iloc[i])
data_set.append(col)
stat.append(df_stats_final.index[i])
df_stats_collapsed = pd.DataFrame(values,columns=['Values'])
df_stats_collapsed['Stat'] = stat
df_stats_collapsed['Data Set'] = data_set
df_stats_collapsed
Values | Stat | Data Set | |
---|---|---|---|
0 | 0.269114 | Acceleration Peak (g) | Normal |
1 | 0.650390 | Acceleration Peak (g) | Fault_007 |
2 | 1.338628 | Acceleration Peak (g) | Fault_014 |
3 | 1.037862 | Acceleration Peak (g) | Fault_021 |
4 | 0.065943 | Acceleration RMS (g) | Normal |
5 | 0.121308 | Acceleration RMS (g) | Fault_007 |
6 | 0.157784 | Acceleration RMS (g) | Fault_014 |
7 | 0.198760 | Acceleration RMS (g) | Fault_021 |
8 | 4.081014 | Crest Factor | Normal |
9 | 5.361487 | Crest Factor | Fault_007 |
10 | 8.483951 | Crest Factor | Fault_014 |
11 | 5.221685 | Crest Factor | Fault_021 |
12 | 0.065060 | Standard Deviation (g) | Normal |
13 | 0.121272 | Standard Deviation (g) | Fault_007 |
14 | 0.157761 | Standard Deviation (g) | Fault_014 |
15 | 0.198383 | Standard Deviation (g) | Fault_021 |
16 | 0.026376 | Velocity RMS (in/s) | Normal |
17 | 0.006666 | Velocity RMS (in/s) | Fault_007 |
18 | 0.008430 | Velocity RMS (in/s) | Fault_014 |
19 | 0.007972 | Velocity RMS (in/s) | Fault_021 |
20 | 0.000083 | Displacement RMS (in) | Normal |
21 | 0.000011 | Displacement RMS (in) | Fault_007 |
22 | 0.000010 | Displacement RMS (in) | Fault_014 |
23 | 0.000054 | Displacement RMS (in) | Fault_021 |
24 | 90.000000 | Peak Frequency (Hz) | Normal |
25 | 1051.000000 | Peak Frequency (Hz) | Fault_007 |
26 | 3237.000000 | Peak Frequency (Hz) | Fault_014 |
27 | 3316.000000 | Peak Frequency (Hz) | Fault_021 |
28 | 0.010137 | RMS (g) from 1 to 65 | Normal |
29 | 0.000981 | RMS (g) from 1 to 65 | Fault_007 |
30 | 0.001450 | RMS (g) from 1 to 65 | Fault_014 |
31 | 0.001990 | RMS (g) from 1 to 65 | Fault_021 |
32 | 0.031013 | RMS (g) from 65 to 300 | Normal |
33 | 0.009952 | RMS (g) from 65 to 300 | Fault_007 |
34 | 0.007912 | RMS (g) from 65 to 300 | Fault_014 |
35 | 0.005665 | RMS (g) from 65 to 300 | Fault_021 |
36 | 0.023736 | RMS (g) from 300 to 6000 | Normal |
37 | 0.111051 | RMS (g) from 300 to 6000 | Fault_007 |
38 | 0.147873 | RMS (g) from 300 to 6000 | Fault_014 |
39 | 0.190864 | RMS (g) from 300 to 6000 | Fault_021 |
40 | 0.407828 | Peak PV (in/s) | Normal |
41 | 0.085494 | Peak PV (in/s) | Fault_007 |
42 | 0.155990 | Peak PV (in/s) | Fault_014 |
43 | 0.126597 | Peak PV (in/s) | Fault_021 |
44 | 90.509668 | Peak PV (Hz) | Normal |
45 | 161.269894 | Peak PV (Hz) | Fault_007 |
46 | 512.000000 | Peak PV (Hz) | Fault_014 |
47 | 3250.997354 | Peak PV (Hz) | Fault_021 |
fig = px.bar(df_stats_collapsed, x="Data Set", y="Values", facet_col="Stat",facet_col_wrap = 4)
fig.update_yaxes(matches=None)
fig.update_yaxes(visible=False)
fig.write_html("C:/Users/shanly/Desktop/enDAQ/Marketing-Content/What-to-Monitor/bearing-dashboard.html",full_html=False,include_plotlyjs='cdn')
fig.show()
You can get most of what you need with the free anaconda edition that will include some of the core libraries like pandas, numpy and scipy. To download visit the anaconda individual edition page.
There are several different interfaces for writing your code but this was all generated in Jupyter which lends itself well to documenting a test/analysis report. It also allows for easy export to HTML for sharing interactive reports.
There will be some libraries that aren't included that you may need. To install type the following in Jupyter and replace plotly
(which you'll need) with whatever library name you're hoping to install.
!pip install plotly
Matplotlib is the default plotting library and built into pandas. You can simply add .plot() to the end of your dataframe to plot it. These won't be interactive though and the default size is too small. So the following shows how to make the plot bigger if you'd like to use this.
import matplotlib.pyplot as plt
plt.rcParams["figure.figsize"] = (20,10)
df_accel.plot()
df_vel.plot()
df_displ.plot()
You will need to replace the api_key with your own account information. I didn't end up going this route but wanted to show how this can be done.
import chart_studio
chart_studio.tools.set_credentials_file(username='your_user_name', api_key='your_api_key')
import chart_studio.plotly as py
py.plot(fig, filename = 'your_filename', auto_open=True)
fig.write_html("C:/Users/shanly/bearing-time-series.html",full_html=False,include_plotlyjs='cdn')
fig.write_image("C:/Users/shanly/bearing-time-series.png")
fig.write_image("C:/Users/shanly/bearing-time-series.svg")
The data is saved as .mat files so we need to load them and convert to a handy DataFrame. This is a bit convoluted based on how they named the variables within each .mat file. We are going to only be getting the "Drive End" accelerometer data which has "DE" in the variable name.
import scipy.io
import pandas as pd
import numpy as np
def load_mat(name,var):
dic = scipy.io.loadmat("C:/Users/shanly/Desktop/enDAQ/Marketing-Content/What-to-Monitor/"+name+".mat")
con_list = [[element for element in upperElement] for upperElement in dic[var]]
df = pd.DataFrame(con_list)
return df
Normal = load_mat('Normal','X098_DE_time')
Fault_007 = load_mat('Fault_007','X279_DE_time')
Fault_014 = load_mat('Fault_014','X275_DE_time')
Fault_021 = load_mat('Fault_021','X271_DE_time')
#Merge into 1 Dataframe
df = pd.concat([Fault_021,Fault_014,Fault_007,Normal],axis=1)
#Truncate to first 120000 points
df=df.loc[:119999]
# Create Column Names
df.columns = ['Fault_021','Fault_014','Fault_007','Normal']
# Add Column of Time
df['Time'] = np.linspace(0,120000,num=120000)/12000
df = df.set_index('Time')
print(df.head())
df.reset_index().to_csv(path_or_buf="C:/Users/shanly/Desktop/enDAQ/Marketing-Content/What-to-Monitor/Bearing_Data.csv",index=False)