O2 Data Set Technical Details

*Some of the links on this page are to internal pages that require ligo.org credentials. If you need to gain access to the information there, please contact us at gwosc@igwn.org.

GWOSC data downsampling and repackaging

GWOSC builds files from standard LIGO and Virgo h(t) frames. We have chosen to repackage our data to make it more accessible to casual users both within the LIGO Virgo Collaboration and outside.

  • Data are made available both as frame files (GWF) and HDF5 (HDF). The GWF frame format is a standard within the GW community, but may be unfamiliar to people in other fields. HDF5 is a popular format, easily readable in many languages, including Python, MATLAB, Mathematica, and C.
  • The channel names used to collect data from the original files are: "H1:DCH-CLEAN_STRAIN_C02" for H1, "L1:DCH-CLEAN_STRAIN_C02" for L1 and "V1:Hrec_hoft_V1O2Repro2A_16384Hz" for V1. The frame types are: "H1_CLEANED_HOFT_C02" for H1, "L1_CLEANED_HOFT_C02" for L1 and "V1O2Repro2A" for V1. Note that these channel names and frame types refer to the files used internally by LIGO and Virgo. The channel names in the files available to the public are listed in the section "GWF Channel Names". A homogeneous channel name style was adopted for the three detectors.
  • The strain data are made available both at 16384 Hz and 4096 Hz sample rates. Users should choose which sampling rate is most appropriate for their search. The data quality (DQ) is less well studied above 2 kHz, and the strain calibration is valid only up to 5 kHz. Of course, the down-sampled data are smaller, reducing both the download time and storage requirements.
  • In the 4096 Hz data set, the use of an anti-aliasing filter corrupts the data near the Nyquist frequency. For studies involving frequencies of around 1700 Hz or above, the 16384 Hz data should be used instead.
  • The down-sampling is done using the python package scipy, with the method scipy.signal.decimate.
  • Advanced LIGO and Advanced Virgo data are not calibrated or valid below 10 Hz or above 5 kHz. In most searches for astrophysical sources, data below 20 Hz are not used because the noise is too high. The calibrated data come with a set of files that quantify the uncertainty in the calibration, in both magnitude and phase, as a function of frequency. These files, with associated documentation, are available at this link.
  • Our hdf5/frame files have fixed duration (4096 seconds) and boundaries. Tutorial 4 presents a user API to get the data and load it into python, giving users access to a list of data segments.
  • We provide Timelines and My Sources to aid the user in finding data (including DQ and HWinj info) from a particular time, instead of segDB queries. From Timeline, you can see multiple DQ and Injection flags, zoom in, and download segments.
  • The data quality (DQ) and hardware injections (HW) are summarized in 1 Hz vectors, in both the hdf5 and frame files. See the bit mask definition and tutorials for details. The bit mask definition is equivalent for the files sampled at 16 kHz and at 4 kHz.
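
The down-sampling step mentioned above can be sketched as follows. This is a minimal illustration on synthetic data, not the GWOSC production code: the filter settings are the scipy defaults plus zero_phase=True, which is an assumption, since the exact arguments used to build the 4096 Hz files are not stated on this page.

```python
import numpy as np
from scipy.signal import decimate

FS_FULL = 16384               # original sample rate in Hz
FS_DOWN = 4096                # target sample rate in Hz
FACTOR = FS_FULL // FS_DOWN   # decimation factor (4)

# Synthetic stand-in for strain data: 4 seconds of a 100 Hz sinusoid.
t = np.arange(4 * FS_FULL) / FS_FULL
strain_16k = np.sin(2 * np.pi * 100.0 * t)

# decimate() low-pass filters the data (anti-aliasing) and then keeps every
# FACTOR-th sample. The default filter is used here as an assumption; it is
# not necessarily the filter used for the released files.
strain_4k = decimate(strain_16k, FACTOR, zero_phase=True)

# Nyquist frequency of the down-sampled data; content close to this
# frequency is affected by the anti-aliasing filter.
nyquist_4k = FS_DOWN / 2

print(len(strain_16k), len(strain_4k), nyquist_4k)
```

The anti-aliasing filter is the reason the 4096 Hz data should not be used near the 2048 Hz Nyquist frequency.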

Noise Subtraction

After data collection, several independently-measured terrestrial contributions to the detector noise were subtracted from the LIGO data at both sites. This subtraction removed calibration lines and 60 Hz AC power mains harmonics from both LIGO data streams and laser pointing noise from only the LIGO Hanford data stream. The subtraction of laser pointing noise reduced noise levels for frequencies up to 1 kHz. For reference, see:

  • "Improving the sensitivity of Advanced LIGO using noise subtraction" arXiv:1809.05348

GWF Channel Names

The O2 4 kHz and 16 kHz GWF files (ending with extension .gwf) have new channel names that differ from the standard names used in the S5, S6 and O1 4 kHz GWF files.


Channel names found inside GWF files

O2 (4 kHz sample rate)         | O2 (16 kHz sample rate)
{ifo}:GWOSC-4KHZ_R1_STRAIN     | {ifo}:GWOSC-16KHZ_R1_STRAIN
{ifo}:GWOSC-4KHZ_R1_DQMASK     | {ifo}:GWOSC-16KHZ_R1_DQMASK
{ifo}:GWOSC-4KHZ_R1_INJMASK    | {ifo}:GWOSC-16KHZ_R1_INJMASK

NOTES:

  • {ifo} is a placeholder for either H1, L1 or V1, e.g., H1:GWOSC-16KHZ_R1_STRAIN, L1:GWOSC-16KHZ_R1_STRAIN or V1:GWOSC-16KHZ_R1_STRAIN.
  • The _R1_ substring in the O2 channel names represents the revision number of the channel name.
  • The HDF5 group names for O2 are the same as those used for S5, S6 and O1. However, a new meta channel attribute has been added to the HDF5 structure that captures the associated channel name found in the GWF files, as outlined in the table above.
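
The naming pattern in the table above can be reproduced with a small helper. This is only a sketch: the function name gwf_channel_name and its arguments are hypothetical and are not part of any GWOSC software.

```python
VALID_IFOS = ("H1", "L1", "V1")
RATE_LABELS = {4096: "4KHZ", 16384: "16KHZ"}   # sample rate in Hz -> label
VALID_KINDS = ("STRAIN", "DQMASK", "INJMASK")


def gwf_channel_name(ifo, sample_rate, kind="STRAIN", revision=1):
    """Build an O2 GWF channel name such as H1:GWOSC-16KHZ_R1_STRAIN.

    Hypothetical helper that follows the pattern
    {ifo}:GWOSC-{rate}_R{revision}_{kind} shown in the table.
    """
    if ifo not in VALID_IFOS:
        raise ValueError(f"unknown detector: {ifo!r}")
    if sample_rate not in RATE_LABELS:
        raise ValueError(f"unsupported sample rate: {sample_rate!r}")
    if kind not in VALID_KINDS:
        raise ValueError(f"unknown channel kind: {kind!r}")
    return f"{ifo}:GWOSC-{RATE_LABELS[sample_rate]}_R{revision}_{kind}"


print(gwf_channel_name("H1", 16384))
print(gwf_channel_name("V1", 4096, "DQMASK"))
```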

Notes about the DATA flag

See the Defining the DATA Flag page.

O2 Hardware Injections

The O2 data contain hardware injections, which appear as simulated gravitational wave signals in the data. During O2, the Burst and CBC groups removed hardware injections at various DQ category levels. However, the GWOSC data do not mask out times that contain injections; instead, lists of those times are provided. See the O2 Hardware Injection page for details. Segment lists of hardware injections may include times when data are not publicly available. Details of these injections are not included in the documentation.
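
Because injections are left in the strain data, a candidate event time has to be checked against the injection segment lists. A minimal sketch of such a check is given below. The helper in_segments is hypothetical, the segment values are made up for illustration, and segments are assumed to be sorted, non-overlapping [start, stop) GPS intervals.

```python
from bisect import bisect_right


def in_segments(gps_time, segments):
    """Return True if gps_time lies inside any [start, stop) segment.

    segments must be sorted by start time and must not overlap.
    """
    starts = [start for start, _ in segments]
    # Index of the last segment whose start is <= gps_time (or -1 if none).
    i = bisect_right(starts, gps_time) - 1
    return i >= 0 and gps_time < segments[i][1]


# Made-up injection segments and candidate times, for illustration only.
injection_segments = [(1000, 1010), (2000, 2032)]
candidates = [995.0, 1004.5, 2031.9, 2032.0]

# Candidates that coincide with a hardware injection.
flagged = [t for t in candidates if in_segments(t, injection_segments)]
print(flagged)
```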

Data Quality

Data quality categories, or flags, are defined by each analysis group: Compact Binary Coalescence (CBC), Burst, Continuous Waves (CW) and Stochastic. This is because periods of noisy data will affect each type of analysis differently. For each flag, GWOSC data files contain a corresponding 1 Hz time-series that marks times that pass the flag as a "1" (good data), and times that fail the flag as a "0" (bad data). A full list of O2 data quality categories can be seen on the O2 data quality definitions page for the 16 kHz release; the data quality categories for the 4 kHz files are identical and can be found here. The details of each category are described in the references linked below. However, as a rough guide:

  • DATA (Data Available): Failing this level indicates that GW strain data are not publicly available because the instruments were not operating in an acceptable condition. For O2, this is equivalent to Category 1.
  • CAT1 (Category 1): Failing a data quality check at this category indicates a critical issue with a key detector component not operating in its nominal configuration. Since these times indicate a major known problem, they are identical for each data analysis group. Data from times that fail CAT1 flags are not available.
  • CAT2 (Category 2): Failing a data quality check at this category indicates times when there is a known, understood physical coupling to the gravitational wave channel. This might include times of high seismic activity.
  • CAT3 (Category 3): Failing a data quality check at this category indicates times when there is statistical coupling to the gravitational wave channel which is not fully understood.

In general, data quality levels are defined in a cumulative way: a time which fails a given category automatically fails all higher categories. For example, if the only known problem with a given time fails a burst category 2 flag, then the data is said to pass DATA and BURST_CAT1, but fails BURST_CAT2 and BURST_CAT3. However, the different analysis groups are independent: if a time fails BURST_CAT2, it can still pass CBC_CAT2.
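
The cumulative behaviour can be illustrated by unpacking a DQ bit mask into per-flag pass/fail values. This is a sketch only: the bit positions in DQ_BITS are an assumption made for this example and should be checked against the bit mask definition page, and flag_passes is a hypothetical helper.

```python
import numpy as np

# ASSUMED bit positions, for illustration; confirm them against the
# bit mask definition page before using them on real files.
DQ_BITS = {
    "DATA": 0,
    "CBC_CAT1": 1, "CBC_CAT2": 2, "CBC_CAT3": 3,
    "BURST_CAT1": 4, "BURST_CAT2": 5, "BURST_CAT3": 6,
}


def flag_passes(dq_mask, flag):
    """Return a boolean array: True where the given flag's bit is set."""
    return ((np.asarray(dq_mask) >> DQ_BITS[flag]) & 1).astype(bool)


# Three made-up seconds of a 1 Hz DQ mask:
#   127 -> every flag passes
#    31 -> DATA, all CBC flags and BURST_CAT1 pass; BURST_CAT2/3 fail
#     0 -> no data available
dq_mask = np.array([127, 31, 0])

results = {name: flag_passes(dq_mask, name) for name in DQ_BITS}
for name, passed in results.items():
    print(f"{name:11s} {passed.tolist()}")
```

The middle sample corresponds to the example in the text: it fails BURST_CAT2 and BURST_CAT3 while still passing CBC_CAT2.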

These graduated categories of quality allow a data pipeline to adjust its behavior depending on the data quality. For example, a pipeline might run the numerical search (template matching) against all the data segments that pass CAT1, but ignore any candidate events from data that do not pass CAT3. This strategy allows long sections of data to be used, increasing search efficiency.
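
A minimal sketch of this strategy is shown below, using made-up 1 Hz flag vectors and candidate times. The helper flag_to_segments and the variable names are hypothetical; a real pipeline would read the flags from the data files.

```python
import numpy as np


def flag_to_segments(flag, gps_start):
    """Convert a 1 Hz boolean series into a list of [start, stop) segments."""
    flag = np.asarray(flag, dtype=bool)
    # Pad with False so that runs touching either end are closed properly.
    padded = np.concatenate(([False], flag, [False])).astype(np.int8)
    edges = np.diff(padded)
    starts = np.flatnonzero(edges == 1) + gps_start
    stops = np.flatnonzero(edges == -1) + gps_start
    return list(zip(starts.tolist(), stops.tolist()))


gps_start = 100  # made-up GPS time of the first sample
cat1 = np.array([1, 1, 1, 1, 0, 0, 1, 1, 1, 1], dtype=bool)
cat3 = np.array([1, 1, 0, 1, 0, 0, 1, 1, 1, 0], dtype=bool)

# Step 1: run the search over every stretch of data that passes CAT1.
search_segments = flag_to_segments(cat1, gps_start)

# Step 2: discard candidates that fall in a second failing CAT3.
candidates = [101.3, 102.5, 108.0, 109.7]  # made-up candidate GPS times
kept = [t for t in candidates if cat3[int(t) - gps_start]]

print(search_segments)
print(kept)
```

Searching the longer CAT1 segments and vetoing afterwards avoids splitting the data into many short stretches.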

Note to LVC members: Conventionally, hardware injections are vetoed by CAT flags so that searches do not see them. However, GWOSC strain data provide h(t) at these times; a search with GWOSC data will therefore find many chirps, which should be compared with the lists of injections (see above).

For information on how to use data quality information:

  • Characterization of Transient Noise in Advanced LIGO Relevant to Gravitational Wave Signal GW150914 CQG, or arXiv
  • Effects of Data Quality Vetoes on a Search for Compact Binary Coalescences in Advanced LIGO's First Observing Run arXiv.
  • Step 3 and Step 4 of the introductory tutorial show how to apply data quality flags
  • Data Quality definitions for the O2 data set (the link refers to the 16 kHz files; the data quality categories for the files at 4 kHz are identical and can be found here).
  • Plot and download segment lists from Timeline

MD5 Check Sums