Tutorial Step 3: Working with Data Quality

In this tutorial, we will take a look at how data quality information is stored in LIGO data files. If you are not already comfortable with using Python to read a LIGO data file, you may want to take a look at Step 2 of this tutorial.
What's Data Quality?
Understanding data quality is very important when working with LIGO data. Before performing any analysis, use this tutorial and the data documentation to identify the appropriate times for your analysis.
Times which fail the DATA category flag are represented by NaNs and should never be analyzed. Times which fail any CAT1 level flags have severe problems, and also should not be searched for astrophysical sources.

In addition to the main data output of the LIGO detectors (the "strain" channel), there are hundreds of other data channels that are recorded to monitor the state of both the instruments and the external environment. Some of these auxiliary channels are used to create data quality flags that mark times when the strain data is likely to be corrupted by instrumental artifacts. The data quality flags are organized into categories by how severe an impact they may have on a given type of search. The categories for each type of search are defined differently, but in general, a lower data quality category indicates a more severe problem. So, for example, a CBCLOW Category 1 flag means that a stretch of data is strongly corrupted and cannot be used to search for low-mass compact binary coalescence (CBC) signals, while a CBCLOW Category 4 flag indicates a less significant problem with the data.
For a more detailed explanation of the meaning of various flags, see the data documentation.
How is data quality information stored?

Data quality information is stored in LIGO data files as a 1 Hz time series for each category. Notice that this is a different sampling rate than the 4096 Hz rate used to store strain data. So, for example, the first sample in a data quality channel applies to the first 4096 samples in the corresponding strain channel.
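To make the 1 Hz to 4096 Hz correspondence concrete, here is a minimal sketch (assuming the 4096 Hz strain rate stated above) that maps a data quality sample index to the range of strain samples it describes:

```python
# Each 1 Hz data quality sample covers one second of 4096 Hz strain data.
fs_strain = 4096    # strain sampling rate in Hz
dq_index = 5        # the sixth data quality sample (second 5 of the file)

# The strain samples described by this data quality sample
first_strain_sample = dq_index * fs_strain
last_strain_sample = first_strain_sample + fs_strain - 1

print(first_strain_sample, last_strain_sample)  # -> 20480 24575
```

So a single "bad" second in a data quality channel rules out 4096 consecutive strain samples.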
In the S5 data set, there are 18 data quality categories, as well as 5 injection categories, each represented as a 1 Hz time series. Let's print out a list of the S5 data quality channel names from the LIGO data file.
In the same directory where you saved the data file you downloaded in Step One, create a file named dq.py. Try running this code in the Python interpreter.
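A minimal sketch of such a script is shown below. It assumes the data quality channel names are stored in a dataset named quality/simple/DQShortnames inside the HDF5 file, and the file name used here is a hypothetical S5 file name; substitute the name of the file you actually downloaded.

```python
import h5py

def list_dq_channels(filename):
    """Return the data quality channel names stored in a LIGO HDF5 file."""
    with h5py.File(filename, 'r') as dataFile:
        # Assumed dataset path for the channel name list
        names = dataFile['quality']['simple']['DQShortnames'][()]
    # h5py returns byte strings; decode them for printing
    return [n.decode() if isinstance(n, bytes) else str(n) for n in names]

if __name__ == '__main__':
    # Hypothetical file name; substitute the file you downloaded in Step One
    for name in list_dq_channels('H-H1_LOSC_4_V1-815411200-4096.hdf5'):
        print(name)
```

For an S5 file, this should print the 18 data quality channel names described above.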
All the data quality categories are combined into a bit mask and stored as an array with 4096 entries (one entry for each second of data). In the LIGO data file, this is the DQmask channel.

Each sample in this bit mask encodes all 18 data quality categories as a different digit in a binary number. A one in a particular digit means the corresponding flag is "passed", so the data is "good" at that category level, and a zero means the flag is off, so the data is "bad" at that category level. For example, a DQmask value with ones in only its two lowest binary digits means that only the two lowest-numbered categories are passed for that second.
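As a sketch of how one digit of the bit mask can be read out, the example below uses a toy three-sample DQmask and two hypothetical bit positions (in a real file, the order of the DQShortnames list tells you which bit belongs to which category):

```python
import numpy as np

# Toy DQmask array (one integer per second); a real S5 file has 4096 entries
dqmask = np.array([0b101, 0b111, 0b001])

DATA_BIT = 0          # assumed bit position for the DATA category
CBCLOW_CAT1_BIT = 2   # assumed bit position for CBCLOW CAT1

# Shift the mask and keep the lowest bit to get a 0/1 series per category
data_passed = (dqmask >> DATA_BIT) & 1
cat1_passed = (dqmask >> CBCLOW_CAT1_BIT) & 1

print(data_passed)  # -> [1 1 1]
print(cat1_passed)  # -> [1 1 0]
```

Each resulting array is a 1 Hz series of ones and zeros for a single category, which is much easier to work with than the packed mask.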
This is a compact way to store a lot of information, but to work with data quality, we'll want to put things in a more convenient form. Let's try to unpack some of this data quality information.
In most cases, you will not want to keep track of every data quality category, but only a few values that are important for your search. For example, to search for transient gravitational wave sources, such as supernovae or compact object mergers, you may want to keep track of only the categories defined for that type of search, such as the CBCLOW categories described above.
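Once a single category has been unpacked into a 1 Hz series of ones and zeros, a common next step is to find the contiguous stretches of "good" data. The sketch below uses a toy eight-second series; the segment-finding trick (padding with zeros and looking for transitions) is one straightforward way to do this:

```python
import numpy as np

# Toy 1 Hz series for a single category: 1 = flag passed ("good" second)
good = np.array([1, 1, 0, 0, 1, 1, 1, 0])

# Pad with zeros and look at transitions to find contiguous good segments
edges = np.diff(np.concatenate(([0], good, [0])))
starts = np.where(edges == 1)[0]   # second where each good segment begins
ends = np.where(edges == -1)[0]    # second just past the end of each segment

for s, e in zip(starts, ends):
    print('Good segment: seconds {0}-{1}'.format(s, e))
```

For the toy series above, this finds two good segments, seconds 0-2 and 4-7. The resulting segment lists are what you would use to select which stretches of strain data to analyze.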
When you run dq.py, you should see a plot of the data quality information.
Here are some things to notice in the plot:
You can see all the code described in this tutorial as dq.py here.