For a more detailed explanation of the meaning of various flags, see the data documentation.
In the S5 data set, there are 18 data quality categories, as well as 5 injection categories, each represented as a 1 Hz time series. Let's print out a list of the S5 data quality channel names from the LIGO data file.
In the same directory where the data file you
downloaded in Step One,
create a file named dq.py
. Inside
dq.py
, place the following code:
import numpy as np
import matplotlib.pyplot as plt
import h5py
filename = 'H-H1_LOSC_4_V1-815411200-4096.hdf5'
dataFile = h5py.File(filename, 'r')
gpsStart = dataFile['meta']['GPSstart'].value
dqInfo = dataFile['quality']['simple']
bitnameList = dqInfo['DQShortnames'].value
nbits = len(bitnameList)
for bit in range(nbits):
print bit, bitnameList[bit]
Try running this code in the Python interpreter, either
with run dq
if you are using Canopy or iPython, or
import dq
and reload(dq)
if you are not.
You should see a list of data quality category names. For an
explanation of each category, see the data set
documentation.
All the data quality categories are combined into a bit mask, and
stored as an array with 4096 entries (one entry for each second of data).
In the LIGO data file, this is
called the DQmask
, and you can extract it from the
LIGO data file by adding this code to dq.py
:
qmask = dqInfo['DQmask'].value
Each sample in this bit mask encodes all 18 data quality
categories as a different digit in a binary number.
A one in a particular digit means the corresponding
flag is "passed", so the data is "good" at that category level, and
a zero means the flag is off, so the data is "bad" at that
category level. For example, a DQmask value of
000000001000100011
in
S5 data would mean that the
detector data is available (DATA), and that the data
passes the CBCHIGH_CAT1
, CBCLOW_CAT1
,
and BURST_CAT1
criteria, but fails all other
data quality conditions.
This is a compact way to store a lot of information, but to work with data quality, we'll want to put things in a more convienient form. Let's try to unpack some of this data quality information.
In most cases, you will not want to keep track of
every data quality category, but only a few values that
are important for your search. For example, to search for
transient gravitational wave sources, such as supernovae or
compact object mergers, you may
want to keep track of the DATA
category,
as well as all of the BURST
categories. As an example,
let's try unpacking just the BURST_CAT1 and DATA categories.
Try adding this to dq.py
:
Now, the variable sci will be 0 everywhere the data fails the
sci = (qmask >> 0) & 1
burst1 = (qmask >> 9) & 1
DATA
category, and 1 everywhere the data passes the
DATA
category.
Similarly, since BURST_CAT1
is stored in bit 9, the burst1 variable will correspond to this category.
In a typical case, you only want to use data where all of the
categories of interest are "passed", signaling relatively clean data.
To accomplish this, just take the logical, element-wise AND of
all the channels you need:
To confirm that goodData_1hz = sci & burst1
goodData_1hz
is on when both the
DATA and BURST_CAT1 categories are passed, we can make a plot of
these three DQ channels. Add the following to dq.py
plt.plot(goodData_1hz + 4, label='Good_Data')
plt.plot(burst1 + 2, label='BURST_CAT1')
plt.plot(sci, label='DATA')
plt.axis([0, 4096, -1, 8])
plt.legend(loc=1)
plt.xlabel('Time (s)')
When you run dq.py
, you should see a plot of these
three channels that looks like this:
dummy = np.zeros(goodData_1hz.shape)
masked_dummy = np.ma.masked_array(dummy, np.logical_not(goodData_1hz) )
segments = np.ma.flatnotmasked_contiguous(masked_dummy)
segList = [(int(seg.start+gpsStart), int(seg.stop+gpsStart)) for seg in segments]
We'll see examples of how to use segment lists in the next step.
You can see all the code described in this tutorial as dq.py here.