O3 Data Set Technical Details

Please Read This First!

*Some of the links on this page are to internal pages that require ligo.org credentials. If you need to gain access to the information there, please contact us at gwosc@igwn.org.

GWOSC data downsampling and repackaging

GWOSC builds files from standard LIGO and Virgo h(t) frames. We have chosen to create a repackaged version of our data to make it more accessible to casual users.

  • Data are made available both as frame files (GWF) and HDF5 (HDF). The GWF frame format is a standard within the GW community, but may be unfamiliar to people in other fields. HDF5 is a popular format, easily readable in many languages, including Python, MATLAB, Mathematica, and C.
  • The channel names used to collect data from the original files are: "H1:DCS-CALIB_STRAIN_CLEAN-SUB60HZ_C01" for H1, "L1:DCS-CALIB_STRAIN_CLEAN-SUB60HZ_C01" for L1 and "V1:Hrec_hoft_16384Hz" for V1. The frame types are: "H1_HOFT_CLEAN_SUB60HZ_C01" for H1, "L1_HOFT_CLEAN_SUB60HZ_C01" for L1 and "V1Online" for V1. The last two weeks of O3a Virgo data were reprocessed (see this Note), so starting from GPS time 1252540000 the frame type used is "V1O3Repro1A" and the strain channel is "V1:Hrec_hoft_V1O3ARepro1A_16384Hz". Note that these channel names and frame types refer to the files used internally by LIGO and Virgo. The channel names in the files available to the public are listed in the section "GWF Channel Names"; they follow a homogeneous naming style for the three detectors.
  • The strain data are made available both at 16384 Hz and 4096 Hz sample rates. Users should choose which sampling rate is most appropriate for their search. The data quality (DQ) is less well studied above 2 kHz, and the strain calibration is valid only up to 5 kHz. Of course, the down-sampled dataset is smaller, reducing both the download time and storage requirements.
  • In the 4096 Hz data set, the use of an anti-aliasing filter corrupts the data near the Nyquist frequency. For studies involving frequencies of around 1700 Hz or above, the 16384 Hz data should be used instead.
  • The down-sampling is done with the Python package SciPy, using the function scipy.signal.decimate.
  • The detector strain h(t) is only calibrated between 10 Hz and 5000 Hz for Advanced LIGO and between 20 Hz and 2000 Hz for Advanced Virgo. In addition, Advanced Virgo data between 49.5 Hz and 50.5 Hz are characterised by a large increase in calibration errors because of effects related to the main power lines (details can be found in this paper). Due to this increased systematic error, data in this narrow frequency band were considered to be uninformative for source-parameter estimation (see Appendix E of the GWTC-3 Catalog paper for relevant methods).

    In most searches for astrophysical sources, data below 20 Hz are not used because the noise is too high. Check the following papers for details about the calibration and calibration uncertainties for LIGO during O3a, for LIGO during O3b and for Virgo during O3.

    Files containing the uncertainty in the calibration, both magnitude and phase, as a function of frequency, with associated documentation, are available at this link.

  • Our HDF5/frame files have a fixed duration (4096 seconds) and fixed boundaries. Before downsampling the data from 16384 Hz to 4096 Hz, a padding of 8 seconds is requested for each file to avoid border effects. However, these data are not always available, so in some cases tiny border effects could still be present in the 4096 Hz data.
  • We provide Timelines and My Sources to aid the user in finding data (including DQ and HWinj info) from a particular time, instead of segDB queries. From Timeline, you can see multiple DQ and Injection flags, zoom in, and download segments.
  • The data quality (DQ) and hardware injections (HW) are summarized in 1 Hz vectors, in both the HDF5 and frame files. See the bit mask definition for details. The bit mask definition is the same for the files sampled at 16 kHz and at 4 kHz, for O3a and O3b. This tutorial shows how to work with the DQ bit mask (in the tutorial, use the flags for the O3a run from the bit mask definition).
  • Meta-data on each file includes an estimate of the Binary Neutron Star (BNS) Range, as seen on the O3a_4KHZ_R1, O3a_16KHZ_R1, O3b_4KHZ_R1 and O3b_16KHZ_R1 archives, with the "includes statistics of each file" option. Starting with O3a, the PSD is calculated as the median average of overlapping periodograms, whereas in previous runs Welch's average periodogram method was used.
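The down-sampling, padding, median PSD and calibrated-band points above can be illustrated with a short sketch. It is not the GWOSC production code: the arguments GWOSC passes to scipy.signal.decimate are not stated here, so the SciPy defaults are assumed, and the PSD segment length and 50% overlap are illustrative choices. The helper names (downsample_with_padding, median_psd, calibrated_mask) are hypothetical, and white noise stands in for real strain.

```python
import numpy as np
from scipy import signal

FS_FULL = 16384   # Hz, sample rate of the original strain
FS_DOWN = 4096    # Hz, sample rate of the down-sampled strain
PAD_SECONDS = 8   # padding requested on each side of a file

# Calibrated bands quoted in the text, in Hz.
CALIBRATED_BAND_HZ = {"H1": (10.0, 5000.0), "L1": (10.0, 5000.0), "V1": (20.0, 2000.0)}
VIRGO_MAINS_BAND_HZ = (49.5, 50.5)


def downsample_with_padding(padded, fs_in=FS_FULL, fs_out=FS_DOWN, pad=PAD_SECONDS):
    """Decimate a segment carrying `pad` seconds of extra data on each side,
    then discard the padding so filter edge effects fall outside the result."""
    q = fs_in // fs_out                # integer decimation factor (4 here)
    low = signal.decimate(padded, q)   # assumed SciPy defaults: zero-phase IIR anti-aliasing
    trim = pad * fs_out                # padding length at the output rate
    return low[trim:len(low) - trim]


def median_psd(strain, fs=FS_DOWN, seg_seconds=4):
    """PSD as the median average of overlapping periodograms.
    Segment length and 50% overlap are assumptions for illustration."""
    nperseg = seg_seconds * fs
    return signal.welch(strain, fs=fs, nperseg=nperseg,
                        noverlap=nperseg // 2, average="median")


def calibrated_mask(freqs, ifo):
    """Boolean mask selecting frequency bins inside the calibrated band."""
    lo, hi = CALIBRATED_BAND_HZ[ifo]
    mask = (freqs >= lo) & (freqs <= hi)
    if ifo == "V1":
        # Exclude the narrow band with increased calibration errors.
        mains_lo, mains_hi = VIRGO_MAINS_BAND_HZ
        mask &= ~((freqs >= mains_lo) & (freqs <= mains_hi))
    return mask


# Synthetic white noise standing in for 16 s of strain plus padding.
rng = np.random.default_rng(0)
duration = 16
padded = rng.standard_normal((duration + 2 * PAD_SECONDS) * FS_FULL)
strain_4k = downsample_with_padding(padded)
freqs, psd = median_psd(strain_4k)
usable = calibrated_mask(freqs, "V1")
```

When the 8 seconds of padding are unavailable, the same decimation would have to run on the unpadded segment, which is where the small border effects mentioned above can appear.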

Noise Subtraction

After data collection, several independently-measured terrestrial contributions to the detector noise were subtracted from the LIGO data at both sites. This subtraction removed calibration lines and 60 Hz AC power mains harmonics from both LIGO data streams. Additional noise contributions due to non-stationary couplings of the power mains were also subtracted.

The Virgo online strain data production also performed broadband noise subtraction during O3. The subtracted noise included frequency noise of the input laser, noise introduced by controlling the displacement of the beam splitter, and amplitude noise of the 56 MHz modulation frequency. The reprocessing of Virgo data between 14 September 2019 and 1 October 2019 with an improved noise subtraction resulted in a BNS range increase of 3 Mpc.

For reference, see:

  • "Improving the sensitivity of Advanced LIGO using noise subtraction" arXiv:1809.05348
  • "Machine-learning non-stationary noise out of gravitational-wave detectors" arXiv:1911.09083
  • "Online h(t) reconstruction for Virgo O3 data: start of O3" Virgo Tech. Rep.
  • "Reprocessing of h(t) for the last two weeks of O3a" Virgo Tech. Rep.

GWF Channel Names

The O3 4 kHz and 16 kHz GWF files (with the extension .gwf) use the channel names in the table below:

Channel names found inside GWF files

O3a and O3b (4096 samples per second)    O3a and O3b (16384 samples per second)
{ifo}:GWOSC-4KHZ_R1_STRAIN               {ifo}:GWOSC-16KHZ_R1_STRAIN
{ifo}:GWOSC-4KHZ_R1_DQMASK               {ifo}:GWOSC-16KHZ_R1_DQMASK
{ifo}:GWOSC-4KHZ_R1_INJMASK              {ifo}:GWOSC-16KHZ_R1_INJMASK

NOTES:

  • {ifo} is a placeholder for one of H1, L1 or V1, e.g., H1:GWOSC-16KHZ_R1_STRAIN, L1:GWOSC-16KHZ_R1_STRAIN or V1:GWOSC-16KHZ_R1_STRAIN.
  • The _R1_ substring in the O3a and O3b channel names represents the revision number of the named channel.
  • The HDF5 group names for O3 are the same as those used for S5, S6, O1 and O2. However, a new meta channel attribute has been added to the HDF5 structure; it records the associated channel name found in the GWF files, as outlined in the table above.
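As a small illustration of the naming pattern in the table, the channel name for a given detector can be assembled from its parts. The function name gwosc_channel and its keyword arguments are hypothetical; only the resulting strings come from the table above.

```python
IFOS = ("H1", "L1", "V1")
RATES = ("4KHZ", "16KHZ")
KINDS = ("STRAIN", "DQMASK", "INJMASK")


def gwosc_channel(ifo, kind="STRAIN", rate="4KHZ", revision=1):
    """Build a public GWF channel name such as 'H1:GWOSC-16KHZ_R1_STRAIN'."""
    if ifo not in IFOS:
        raise ValueError(f"unknown detector: {ifo!r}")
    if rate not in RATES:
        raise ValueError(f"unknown sample-rate label: {rate!r}")
    if kind not in KINDS:
        raise ValueError(f"unknown channel kind: {kind!r}")
    return f"{ifo}:GWOSC-{rate}_R{revision}_{kind}"


# All three channels stored in a 16 kHz Livingston file.
l1_channels = [gwosc_channel("L1", kind, "16KHZ") for kind in KINDS]
```

The resulting string is what a GWF reader needs, together with the file path, to select the strain or mask channel.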

Notes about the DATA flag

See the Defining the DATA Flag page.

O3 Hardware Injections

The O3 data contain hardware injections that appear as (simulated) gravitational wave signals in the data.

However, GWOSC data do not mask out times that contain injections; instead, lists of the injections are provided. See the O3 Hardware Injection page for details.

Segment lists of hardware injections may include times when data are not publicly available. Details of these injections are not included in the documentation.
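Because injections are left in the strain, candidate times from a search need to be compared against the injection segment lists. The sketch below shows one way to do that comparison. The segment values are invented for illustration, the half-open [start, stop) convention is an assumption, and the helper names are hypothetical; real segments should be taken from the O3 Hardware Injection page or Timeline.

```python
def overlapping_injection(t, injection_segments):
    """Return the first (start, stop) segment containing GPS time t, else None.
    Segments are assumed to be half-open: start <= t < stop."""
    for start, stop in injection_segments:
        if start <= t < stop:
            return (start, stop)
    return None


def split_candidates(candidate_gps, injection_segments):
    """Separate candidates that coincide with an injection from the rest."""
    unflagged, injected = [], []
    for t in candidate_gps:
        if overlapping_injection(t, injection_segments) is None:
            unflagged.append(t)
        else:
            injected.append(t)
    return unflagged, injected


# Invented segments and candidate times, for illustration only.
segments = [(1000000100, 1000000110), (1000000500, 1000000560)]
candidates = [1000000050.0, 1000000105.5, 1000000560.0]
unflagged, injected = split_candidates(candidates, segments)
```

A linear scan is enough for short lists; for long segment lists, sorting the segments and using bisection would be the natural refinement.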

Data Quality

Data quality categories, or flags, are defined by each analysis group: Compact Binary Coalescence (CBC), Burst, Continuous Waves (CW) and Stochastic. This is because periods of noisy data will affect each type of analysis differently.

For each flag, GWOSC data files contain a corresponding 1 Hz time-series that marks times that pass the flag as a "1" (good data), and times that fail the flag as a "0" (bad data). A full list of O3 data quality categories can be seen on the O3a data quality definitions page, which refers to the 16 kHz release; the data quality categories for O3b and for the 4 kHz files are identical. Data quality is described in these categories:

In general, data quality levels are defined in a cumulative way: a time which fails a given category automatically fails all higher categories. For example, if the only known problem at a given time causes it to fail a burst category 2 flag, then the data are said to pass DATA and BURST_CAT1, but fail BURST_CAT2 and BURST_CAT3. However, the different analysis groups are independent: if a time fails BURST_CAT2, it may still pass CBC_CAT2.
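The cumulative rule can be made concrete by decoding one sample of the DQ bit mask. The bit positions below are an assumption made for this sketch and are not taken from this page; the bit mask definition page is the authoritative source. The helper names are hypothetical.

```python
# ASSUMED bit positions, for illustration only; check the bit mask definition.
ASSUMED_BITS = {
    "DATA": 0,
    "CBC_CAT1": 1, "CBC_CAT2": 2, "CBC_CAT3": 3,
    "BURST_CAT1": 4, "BURST_CAT2": 5, "BURST_CAT3": 6,
}


def passes(mask_value, flag, bits=ASSUMED_BITS):
    """True if the bit for `flag` is set (1 = good data) in one mask sample."""
    return bool((int(mask_value) >> bits[flag]) & 1)


def passing_flags(mask_value, bits=ASSUMED_BITS):
    """Names of all flags that one mask sample passes."""
    return [name for name in bits if passes(mask_value, name, bits)]


def is_cumulative(mask_value, group, bits=ASSUMED_BITS):
    """True if, within one analysis group, nothing passes after a lower
    category has failed (DATA, then CAT1, CAT2, CAT3)."""
    chain = ["DATA"] + [f"{group}_CAT{n}" for n in (1, 2, 3)]
    states = [passes(mask_value, name, bits) for name in chain]
    return all(earlier or not later for earlier, later in zip(states, states[1:]))


# The example from the text: the only problem is a failed burst category 2 flag.
example = sum(1 << ASSUMED_BITS[name]
              for name in ("DATA", "CBC_CAT1", "CBC_CAT2", "CBC_CAT3", "BURST_CAT1"))
```

Here the sample passes DATA and BURST_CAT1, fails BURST_CAT2 and BURST_CAT3, and still passes every CBC category, which shows the independence of the analysis groups.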

These graduated categories of quality allow a data pipeline to adjust its behavior depending on the data quality. An example is running the numerical search (template matching) against all the data segments that pass CAT1, but ignoring any candidate events from data that do not pass CAT3. This strategy allows long sections of data to be used, increasing search efficiency.
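A minimal sketch of that strategy, assuming the CAT1 and CAT3 flags have already been extracted as 1 Hz pass/fail vectors: contiguous CAT1-passing stretches become the search segments, and candidates are kept only if the second they fall in passes CAT3. The function names, the flag values and the GPS start time are invented for illustration.

```python
import numpy as np


def passing_segments(flag, gps_start):
    """Turn a 1 Hz pass/fail vector into half-open [start, stop) GPS segments."""
    flag = np.asarray(flag, dtype=bool)
    padded = np.concatenate(([False], flag, [False]))
    # Indices where the state changes: even entries open a segment, odd ones close it.
    edges = np.flatnonzero(padded[1:] != padded[:-1])
    return [(gps_start + int(a), gps_start + int(b))
            for a, b in zip(edges[::2], edges[1::2])]


def keep_candidates(candidate_gps, flag, gps_start):
    """Keep candidates whose 1-second bin passes the given flag."""
    flag = np.asarray(flag, dtype=bool)
    kept = []
    for t in candidate_gps:
        i = int(np.floor(t - gps_start))
        if 0 <= i < len(flag) and flag[i]:
            kept.append(t)
    return kept


# Invented flag vectors covering 8 seconds from an arbitrary GPS start.
gps0 = 1000000000
cat1 = [1, 1, 1, 0, 1, 1, 1, 1]
cat3 = [1, 0, 1, 0, 1, 1, 0, 1]
search_segments = passing_segments(cat1, gps0)
candidates = [gps0 + 0.5, gps0 + 1.2, gps0 + 6.9, gps0 + 7.1]
survivors = keep_candidates(candidates, cat3, gps0)
```

Searching the longer CAT1 segments while vetoing only at the candidate stage is what lets the pipeline use long sections of data.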

Note to LVC members: Conventionally, hardware injections are vetoed by CAT flags so that searches do not see them. However, GWOSC strain data provide h(t) at these times. A search with GWOSC data will therefore find many chirps that must be compared with the lists of injections (see above).

For information on how to use data quality information:

  • LIGO Detector Characterization in the Second and Third Observing Runs: arXiv:2101.11673
  • Virgo Detector Characterization and Data Quality during the O3 run: arXiv:2205.0155
  • Data Quality Vetoes Definitions: LIGO and Virgo
  • Step 3 and Step 4 of the introductory tutorial show how to apply data quality flags
  • Data Quality definitions for the O3 data set (the link refers to the 16 kHz files of O3a; the data quality categories for the files at 4 kHz and for O3b are identical).
  • Plot and download segment lists from Timeline for O3a and for O3b

MD5 Check Sums