Gravitational Wave Open Science Center

O3a Data Set Technical Details

*Some of the links on this page are to internal pages that require ligo.org credentials. If you need to gain access to the information there, please contact us at gwosc@igwn.org.

GWOSC data downsampling and repackaging

GWOSC builds files from standard LIGO and Virgo h(t) frames. We have chosen to create a repackaged version of our data to make it more accessible to casual users.
  • Data are made available both as frame files (GWF) and as HDF5 files (HDF). The GWF frame format is a standard within the GW community, but may be unfamiliar to people in other fields. HDF5 is a popular format that is easily readable in many languages, including Python, MATLAB, Mathematica, and C.

  • The channel names used to collect data from the original files are: "H1:DCS-CALIB_STRAIN_CLEAN-SUB60HZ_C01" for H1, "L1:DCS-CALIB_STRAIN_CLEAN-SUB60HZ_C01" for L1 and "V1:Hrec_hoft_16384Hz" for V1. The frame types are: "H1_HOFT_CLEAN_SUB60HZ_C01" for H1, "L1_HOFT_CLEAN_SUB60HZ_C01" for L1 and "V1Online" for V1. The last two weeks of O3a Virgo data were reprocessed (see this Note), so starting from GPS time 1252540000 the frame type used is "V1O3Repro1A" and the strain channel is "V1:Hrec_hoft_V1O3ARepro1A_16384Hz". Note that these channel names and frame types refer to the files used internally by LIGO and Virgo. The channel names in the files available to the public are listed in the section "GWF Channel Names"; the public files use a homogeneous channel name style for the three detectors.

  • The strain data are made available both at 16384 Hz and 4096 Hz sample rates. Users should choose which sampling rate is most appropriate for their search. The data quality (DQ) is less well studied above 2 kHz, and the strain calibration is valid only up to 5 kHz. Of course, the down-sampled dataset is smaller, reducing both the download time and storage requirements.

  • In the 4096 Hz data set, the use of an anti-aliasing filter corrupts the data near the Nyquist frequency. For studies involving frequencies of around 1700 Hz or above, the 16384 Hz data should be used instead.

  • The down-sampling is done using the Python package scipy, with the function scipy.signal.decimate.

  • The detector strain h(t) is only calibrated between 10 Hz and 5000 Hz for Advanced LIGO and between 20 Hz and 2000 Hz for Advanced Virgo. In addition, data between 49.5 Hz and 50.5 Hz must not be used for Virgo, because of effects in the calibration related to the main power lines. In most searches for astrophysical sources, data below 20 Hz are not used because the noise is too high.

  • Our HDF5 and frame files have a fixed duration (4096 seconds) and fixed boundaries.

  • We provide Timelines and My Sources to aid the user in finding data (including DQ and HWinj info) from a particular time, instead of segDB queries. From Timeline, you can see multiple DQ and Injection flags, zoom in, and download segments.

  • The data quality (DQ) and hardware injections (HW) are summarized in 1 Hz vectors, in both the HDF5 and frame files. See the bit mask definition for details. The bit mask definition is the same for the files sampled at 16 kHz and at 4 kHz. This tutorial shows how to work with the DQ bit mask (when following the tutorial, use the flags for the O3a run from the bit mask definition).

  • Meta-data on each file includes an estimate of the Binary Neutron Star (BNS) Range, as seen on the O3a_4KHZ_R1 and O3a_16KHZ_R1 archive, with the "includes statistics of each file" option. Starting with O3a, the PSD is calculated as the median average of overlapping periodograms, whereas in previous runs Welch's average periodogram method was used.
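The difference between the median and mean (Welch) averages mentioned in the last item can be illustrated with a minimal sketch. It uses synthetic white noise with one artificial transient, not real strain data, and the segment length and overlap are assumptions chosen for the example, not the settings used to produce the GWOSC meta-data.

```python
import numpy as np
from scipy import signal

fs = 4096                                  # sample rate in Hz (4 kHz data set)
rng = np.random.default_rng(0)
strain = rng.normal(size=64 * fs)          # 64 s of synthetic white noise, NOT real data
strain[10 * fs:10 * fs + 64] += 50.0       # short, loud artificial transient ("glitch")

# 4 s segments with 50% overlap: assumed values for this sketch only.
kwargs = dict(fs=fs, nperseg=4 * fs, noverlap=2 * fs)
freqs, psd_mean = signal.welch(strain, average="mean", **kwargs)
_, psd_median = signal.welch(strain, average="median", **kwargs)

# For unit-variance white noise the one-sided PSD is 2 / fs.
expected = 2.0 / fs
band = (freqs > 1) & (freqs < 50)          # low-frequency band where the glitch has power
ratio_mean = psd_mean[band].mean() / expected
ratio_median = psd_median[band].mean() / expected
print(f"mean average / expected:   {ratio_mean:.2f}")
print(f"median average / expected: {ratio_median:.2f}")
```

In this toy case a single contaminated segment pulls the mean average well above the true noise level, while the median average stays close to it.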

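The down-sampling step from 16384 Hz to 4096 Hz described in the list above can be sketched as follows. The input is a synthetic two-tone signal, and the filter settings are scipy's defaults; the source does not state which settings were used to build the GWOSC files, so this is an illustration of the technique rather than a reproduction of the production procedure.

```python
import numpy as np
from scipy import signal

fs_in, fs_out = 16384, 4096
q = fs_in // fs_out                          # integer decimation factor (4)

t = np.arange(8 * fs_in) / fs_in             # 8 s of synthetic data
low = np.sin(2 * np.pi * 100 * t)            # 100 Hz tone, well below the new Nyquist
high = np.sin(2 * np.pi * 3000 * t)          # 3 kHz tone, above the new Nyquist (2048 Hz)
x = low + high

# scipy defaults: order-8 Chebyshev type I IIR filter applied with zero phase.
# These are assumptions here, not necessarily the GWOSC settings.
y = signal.decimate(x, q)

# Naive decimation without a filter, for comparison: the 3 kHz tone aliases.
naive = x[::q]

def rms(a):
    return float(np.sqrt(np.mean(np.square(a))))

low_out = low[::q]                           # the 100 Hz tone sampled at 4096 Hz
core = slice(fs_out, -fs_out)                # ignore 1 s at each edge (filter transients)
err_filtered = rms((y - low_out)[core])
err_naive = rms((naive - low_out)[core])
print(err_filtered, err_naive)
```

The filtered output keeps the 100 Hz tone and suppresses the 3 kHz tone, whereas the unfiltered output retains an aliased copy of it. The same anti-aliasing filter is what distorts the 4096 Hz data close to the Nyquist frequency.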
Noise Subtraction

After data collection, several independently-measured terrestrial contributions to the detector noise were subtracted from the LIGO data at both sites. This subtraction removed calibration lines and 60 Hz AC power mains harmonics from both LIGO data streams. Additional noise contributions due to non-stationary couplings of the power mains were also subtracted. The Virgo online strain data production also performed broadband noise subtraction during O3a. The subtracted noise included frequency noise of the input laser, noise introduced by controlling the displacement of the beam splitter, and amplitude noise of the 56 MHz modulation frequency. The reprocessing of Virgo data between 14 September 2019 and 1 October 2019 with an improved noise subtraction resulted in a BNS range increase of 3 Mpc.

For reference, see:
  • "Improving the sensitivity of Advanced LIGO using noise subtraction" arXiv:1809.05348
  • "Machine-learning non-stationary noise out of gravitational-wave detectors" arXiv:1911.09083
  • "Online h(t) reconstruction for Virgo O3 data: start of O3" Virgo Tech. Rep.
  • "Reprocessing of h(t) for the last two weeks of O3a" Virgo Tech. Rep.


GWF Channel Names

The O3a 4 kHz and 16 kHz GWF files (ending with extension .gwf) use the channel names in the table below:


Channel names found inside GWF files

O3a (4 kHz sample rate)          O3a (16 kHz sample rate)
{ifo}:GWOSC-4KHZ_R1_STRAIN       {ifo}:GWOSC-16KHZ_R1_STRAIN
{ifo}:GWOSC-4KHZ_R1_DQMASK       {ifo}:GWOSC-16KHZ_R1_DQMASK
{ifo}:GWOSC-4KHZ_R1_INJMASK      {ifo}:GWOSC-16KHZ_R1_INJMASK

NOTES:
  • {ifo} is a placeholder for H1, L1 or V1, e.g., H1:GWOSC-16KHZ_R1_STRAIN, L1:GWOSC-16KHZ_R1_STRAIN or V1:GWOSC-16KHZ_R1_STRAIN.
  • The _R1_ substring in the O3a channel names represents the revision number of the channel name.
  • The HDF5 group names for O3a are the same as those used for S5, S6, O1 and O2. However, a new meta-data attribute has been added to the HDF5 structure; it records the associated channel name found in the GWF files, as listed in the table above.
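
As a small illustration of the naming pattern in the table above, the following sketch builds the three channel names for a given detector and sample rate. The function name gwosc_o3a_channels is hypothetical and is not part of any GWOSC tool.

```python
def gwosc_o3a_channels(ifo: str, rate: str = "4KHZ") -> dict:
    """Return the O3a GWF channel names for one detector and sample rate.

    Hypothetical helper: it only fills in the {ifo}:GWOSC-{rate}_R1_{kind}
    pattern from the table above.
    """
    if ifo not in ("H1", "L1", "V1"):
        raise ValueError(f"unknown detector: {ifo!r}")
    if rate not in ("4KHZ", "16KHZ"):
        raise ValueError(f"unknown sample rate label: {rate!r}")
    return {
        kind: f"{ifo}:GWOSC-{rate}_R1_{kind}"
        for kind in ("STRAIN", "DQMASK", "INJMASK")
    }


channels = gwosc_o3a_channels("H1", "16KHZ")
print(channels["STRAIN"])
```

The resulting string is what a frame reader expects as its channel argument when opening a .gwf file (for example, the channel argument of gwpy's TimeSeries.read).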

Notes about the DATA flag

See the Defining the DATA Flag page.

O3a Hardware Injections

The O3a data contain hardware injections that appear as (simulated) gravitational wave signals in the data. During O3a, the Burst and CBC groups removed hardware injections at various DQ category levels. However, the GWOSC data do not mask out times that contain injections; instead, lists of the injections are provided. See the O3a Hardware Injection page for details.

Segment lists of hardware injections may include times when data are not publicly available. Details of these injections are not included in the documentation.
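
Because injections are left in the strain data, candidate events have to be compared against the injection segment lists. A minimal sketch of that comparison is given below; the helper name, the segments and the candidate times are all hypothetical values chosen for illustration, not entries from the actual O3a lists.

```python
from bisect import bisect_right


def in_any_segment(gps, segments):
    """Return True if `gps` lies inside any half-open [start, end) segment.

    `segments` must be sorted by start time and must not overlap.
    """
    starts = [start for start, _ in segments]
    i = bisect_right(starts, gps) - 1        # last segment starting at or before gps
    return i >= 0 and gps < segments[i][1]


# Hypothetical injection segments and candidate times (GPS seconds).
injection_segments = [(1240000000, 1240000010), (1240030000, 1240030032)]
candidates = [1239995000.5, 1240000004.2, 1240030031.9, 1240030032.0]

not_injections = [t for t in candidates if not in_any_segment(t, injection_segments)]
print(not_injections)
```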

Data Quality

Data quality categories, or flags, are defined by each analysis group: Compact Binary Coalescence (CBC), Burst, Continuous Waves (CW) and Stochastic. This is because periods of noisy data will affect each type of analysis differently. For each flag, GWOSC data files contain a corresponding 1 Hz time-series that marks times that pass the flag as a "1" (good data), and times that fail the flag as a "0" (bad data). A full list of O3a data quality categories can be seen on the O3a data quality definitions page for the 16 kHz release; the data quality categories for the 4 kHz files are identical and can be found here. Data quality is described in these categories:
  • DATA (Data Available): Failing this level indicates that GW strain data are not publicly available because the instruments were not operating in an acceptable condition. For O3a, DATA is equivalent to Category 1.
  • CAT1 (Category 1), CAT2 (Category 2), CAT3 (Category 3): See the O2/O3 LIGO Detector Characterization Paper.

In general, data quality levels are defined in a cumulative way: a time which fails a given category automatically fails all higher categories. For example, if the only known problem at a given time fails a Burst category 2 flag, then the data are said to pass DATA and BURST_CAT1, but fail BURST_CAT2 and BURST_CAT3. However, the different analysis groups are independent: a time that fails BURST_CAT2 may still pass CBC_CAT2.
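
The cumulative behaviour can be illustrated with a short sketch that decodes a 1 Hz bit mask. The bit positions in BITS are an assumption made for this example and should be checked against the bit mask definition page; the mask values are synthetic.

```python
import numpy as np

# ASSUMED bit positions, for illustration only; consult the bit mask
# definition page for the actual O3a assignments.
BITS = {
    "DATA": 0,
    "CBC_CAT1": 1, "CBC_CAT2": 2, "CBC_CAT3": 3,
    "BURST_CAT1": 4, "BURST_CAT2": 5, "BURST_CAT3": 6,
}


def passes(dqmask, flag):
    """Boolean array that is True where each 1 Hz sample passes `flag`."""
    return ((np.asarray(dqmask) >> BITS[flag]) & 1) == 1


# Three synthetic seconds: all flags pass, only a BURST_CAT2 problem, no data.
dqmask = np.array([0b1111111, 0b0011111, 0b0000000])

for flag in ("DATA", "BURST_CAT1", "BURST_CAT2", "BURST_CAT3", "CBC_CAT2"):
    print(flag, passes(dqmask, flag))
```

In the second sample the time passes DATA, BURST_CAT1 and CBC_CAT2 but fails BURST_CAT2 and BURST_CAT3, matching the example in the text.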

These graduated categories of quality allow a data pipeline to adjust its behavior depending on the data quality. An example is running the numerical search (template matching) against all the data segments that pass CAT1, but ignoring any candidate events from data that do not pass CAT3. This strategy allows long sections of data to be used, increasing search efficiency.
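
The strategy described above can be sketched in a few lines: build analysis segments from the times that pass CAT1, then discard candidates that fall in seconds failing CAT3. The masks, the file start time and the candidate times are synthetic, and mask_to_segments is a hypothetical helper, not a GWOSC function.

```python
import numpy as np


def mask_to_segments(good, gps_start):
    """Convert a 1 Hz boolean array into [start, end) GPS segments of
    consecutive True samples."""
    padded = np.concatenate(([False], np.asarray(good, dtype=bool), [False]))
    edges = np.flatnonzero(padded[1:] != padded[:-1])   # rising and falling edges
    return [(gps_start + int(a), gps_start + int(b))
            for a, b in zip(edges[::2], edges[1::2])]


gps_start = 1240000000                                  # hypothetical start time
cat1 = np.array([1, 1, 1, 0, 1, 1, 1, 1], dtype=bool)   # synthetic 1 Hz CAT1 flag
cat3 = np.array([1, 0, 1, 0, 1, 1, 0, 1], dtype=bool)   # synthetic 1 Hz CAT3 flag

# Step 1: run the search on every stretch of data that passes CAT1.
analysis_segments = mask_to_segments(cat1, gps_start)

# Step 2: keep only candidates whose second also passes CAT3.
candidates = [gps_start + 0.5, gps_start + 1.5, gps_start + 6.2, gps_start + 7.9]
kept = [t for t in candidates if cat3[int(t - gps_start)]]

print(analysis_segments)
print(kept)
```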

Note to LVC members: Conventionally, hardware injections are vetoed by CAT flags so that searches do not see them. However, GWOSC strain data provide h(t) at these times. A search with GWOSC data will therefore find many chirps that must be compared with the lists of injections (see above).

For information on how to use data quality information:



MD5 Check Sums