For a more detailed explanation of the meaning of various flags, see the data documentation.
In the S5 data set, there are 18 data quality categories, as well as 5 injection categories, each represented as a 1 Hz time series. Let's print out a list of the S5 data quality channel names from the LIGO data file.
In the same directory as the data file you downloaded in Step One, create a file named dq.py and place the following code in it:
import numpy as np
import matplotlib.pyplot as plt
import h5py

filename = 'H-H1_LOSC_4_V1-815411200-4096.hdf5'
dataFile = h5py.File(filename, 'r')
gpsStart = dataFile['meta']['GPSstart'][()]
dqInfo = dataFile['quality']['simple']
bitnameList = dqInfo['DQShortnames'][()]
nbits = len(bitnameList)

for bit in range(nbits):
    print(bit, bitnameList[bit])

Try running this code in the Python interpreter, either with run dq if you are using Canopy or IPython, or with import dq (and importlib.reload(dq) to re-run after edits) if you are not. You should see a list of data quality category names. For an explanation of each category, see the data set documentation.
All the data quality categories are combined into a bit mask, and
stored as an array with 4096 entries (one entry for each second of data).
In the LIGO data file, this bit mask is stored as DQmask, and you can extract it by adding this code to dq.py:

qmask = dqInfo['DQmask'][()]

Each sample in this bit mask encodes all 18 data quality categories as a different digit in a binary number. A one in a particular digit means the corresponding flag is "passed", so the data is "good" at that category level, and a zero means the flag is off, so the data is "bad" at that category level. For example, a DQmask value of 000000001000100011 in S5 data (bits 0, 1, 5, and 9 set) would mean that the detector data is available (DATA) and that the data passes the CBCHIGH_CAT1, CBCLOW_CAT1, and BURST_CAT1 criteria, but fails all other data quality conditions.
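As a quick sanity check on this encoding, the shift-and-mask arithmetic can be tried directly on the example value above (bit 10 is used here only as an example of a failing category):

```python
# Example DQmask value from the text, written as a Python binary literal
mask = 0b000000001000100011

# Shift right by the bit number, then AND with 1 to read one category flag
data_ok   = (mask >> 0) & 1   # DATA is stored in bit 0
burst1_ok = (mask >> 9) & 1   # BURST_CAT1 is stored in bit 9
other_ok  = (mask >> 10) & 1  # bit 10, a category that fails in this example

print(data_ok, burst1_ok, other_ok)  # -> 1 1 0
```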
This is a compact way to store a lot of information, but to work with data quality, we'll want to put things in a more convenient form. Let's try to unpack some of this data quality information.
In most cases, you will not want to keep track of
every data quality category, but only a few values that
are important for your search. For example, to search for
transient gravitational wave sources, such as supernovae or
compact object mergers, you may
want to keep track of the DATA category,
as well as all of the BURST
categories. As an example,
let's try unpacking just the BURST_CAT1 and DATA categories.
Try adding this to dq.py:

sci = (qmask >> 0) & 1
burst1 = (qmask >> 9) & 1

Now, the variable sci will be 0 everywhere the data fails the DATA category, and 1 everywhere the data passes the DATA category. Since the BURST_CAT1 flag is stored in bit 9, the burst1 variable will correspond to this category.
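To see what this unpacking does, here is a minimal sketch on a hand-made three-sample qmask (the values are invented for illustration; a real S5 mask has 4096 samples):

```python
import numpy as np

# Toy bit mask: second 0 fails everything, second 1 passes only DATA (bit 0),
# and second 2 passes both DATA and BURST_CAT1 (bits 0 and 9)
qmask = np.array([0b0, 0b1, 0b1000000001])

sci    = (qmask >> 0) & 1  # DATA flag for each second
burst1 = (qmask >> 9) & 1  # BURST_CAT1 flag for each second

print(sci)     # -> [0 1 1]
print(burst1)  # -> [0 0 1]
```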
In a typical case, you only want to use data where all of the
categories of interest are "passed", signaling relatively clean data.
To accomplish this, just take the logical, element-wise AND of
all the channels you need:
goodData_1hz = sci & burst1

To confirm that goodData_1hz is on only when both the DATA and BURST_CAT1 categories are passed, we can make a plot of these three DQ channels. Add the following to dq.py:
plt.plot(goodData_1hz + 4, label='Good_Data')
plt.plot(burst1 + 2, label='BURST_CAT1')
plt.plot(sci, label='DATA')
plt.axis([0, 4096, -1, 8])
plt.legend(loc=1)
plt.xlabel('Time (s)')
plt.show()

When you run dq.py, you should see a plot of these three channels that looks like this:
To use a quality channel as a list of segments, with each segment given by (start, stop) GPS times, we can find the contiguous stretches where goodData_1hz is on:

dummy = np.zeros(goodData_1hz.shape)
masked_dummy = np.ma.masked_array(dummy, np.logical_not(goodData_1hz))
segments = np.ma.flatnotmasked_contiguous(masked_dummy)
segList = [(int(seg.start + gpsStart), int(seg.stop + gpsStart)) for seg in segments]

We'll see examples of how to use segment lists in the next step.
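As a sketch of how this segment conversion behaves, here is the same recipe applied to a short made-up quality channel (the array values are invented; gpsStart is taken from the tutorial's file name):

```python
import numpy as np

# Made-up 1 Hz quality channel: good data in seconds 2-4 and 7-8
goodData_1hz = np.array([0, 0, 1, 1, 1, 0, 0, 1, 1, 0])
gpsStart = 815411200  # GPS start time of the example data file

# Mask out the bad seconds, then find the contiguous unmasked runs
dummy = np.zeros(goodData_1hz.shape)
masked_dummy = np.ma.masked_array(dummy, np.logical_not(goodData_1hz))
segments = np.ma.flatnotmasked_contiguous(masked_dummy)

# Convert the slices to (start, stop) GPS-time pairs
segList = [(int(seg.start + gpsStart), int(seg.stop + gpsStart)) for seg in segments]
print(segList)
# -> [(815411202, 815411205), (815411207, 815411209)]
```

Note that each stop time is exclusive, following Python slice conventions: the first segment covers seconds 2, 3, and 4 of the file.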
You can see all the code described in this tutorial as dq.py here.