Gravitational Wave Open Science Center

O3GK Data Set Technical Details

*Some of the links on this page are to internal pages that require ligo.org credentials. If you need to gain access to the information there, please contact us at gwosc@igwn.org.

GWOSC data downsampling and repackaging

GWOSC builds files from standard GEO 600 and KAGRA h(t) frames. We have chosen to create a repackaged version of our data to make it more accessible to casual users.
  • Data are made available both as frame files (GWF) and as HDF5 files (HDF). The GWF frame format is a standard within the GW community, but may be unfamiliar to people in other fields. HDF5 is a popular format, easily readable in many languages, including Python, MATLAB, Mathematica, and C.

  • The channel names used to collect data from the original files are: "G1:DER_DATA_HD_CLEAN" for G1 and "K1:DAC-STRAIN_C20" for K1. The frame types are: "G1_RDS_C02_L3" for G1 and "K1_HOFT_C20" for K1. Note that these channel names and frame types refer to the files used internally by GEO 600 and KAGRA. The channel names in the files available to the public can be found in the section "GWF Channel Names". It was decided to use a homogeneous channel name style for the five detectors.

  • The strain data are made available both at 16384 Hz and 4096 Hz sample rates. Users should choose which sampling rate is most appropriate for their search. The data quality (DQ) is less well studied above 2 kHz. KAGRA data are not calibrated or valid below 30 Hz or above 1500 Hz, and the data sampled at 4096 Hz are not valid above about 1500 Hz. GEO 600 data are not calibrated or valid below 40 Hz or above 6000 Hz, and the data sampled at 4096 Hz are not valid above about 1620 Hz. Of course, the down-sampled dataset is smaller, reducing both the download time and storage requirements.

  • In the 4096 Hz data set, the use of an anti-aliasing filter corrupts the data near the Nyquist frequency. For studies involving frequencies of around 1500 Hz or above for KAGRA (and 1600 Hz or above for GEO 600), the 16384 Hz data should be used instead.

  • The down-sampling is done using the Python package SciPy, with the function scipy.signal.decimate.

  • In most searches for astrophysical sources, data below 20 Hz are not used because the noise is too high.

  • Our HDF5 and frame files have a fixed duration (4096 seconds) and fixed boundaries. Before the data are downsampled from 16384 Hz to 4096 Hz, a padding of 8 seconds is requested for each file to avoid border effects. However, the padding data are not always available, so in some cases tiny border effects could still be present in the 4096 Hz data.

  • We provide Timelines and My Sources to aid the user in finding data (including DQ and HWinj info) from a particular time, instead of segDB queries. From Timeline, you can see multiple DQ and Injection flags, zoom in, and download segments.

  • The data quality (DQ) and hardware injection (HW) information is summarized in 1 Hz vectors, in both the HDF5 and frame files. See the bit mask definition for details. For O3GK, the bit mask definition is the same for the files sampled at 16 kHz and at 4 kHz. This tutorial shows how to work with the DQ bit mask (when following the tutorial, use the flags for the O3GK run from the bit mask definition).

  • Meta-data on each file includes an estimate of the Binary Neutron Star (BNS) Range, as seen on the O3GK_4KHZ_R1 and O3GK_16KHZ_R1 archives, with the "includes statistics of each file" option. The PSD is calculated as the median average of overlapping periodograms.
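
The fixed file boundaries and the 8-second padding mentioned in the list above can be illustrated with a little bookkeeping. The following is only a minimal sketch under stated assumptions: it assumes that file start times fall on integer multiples of 4096 s of GPS time and that the padding is requested on both sides of a file and trimmed after decimation, neither of which is stated above. The function names are hypothetical, and the down-sampling call itself (scipy.signal.decimate, per the text) is not reproduced here.

```python
FILE_DURATION = 4096   # seconds per file (from the text)
PAD = 8                # seconds of padding (from the text); both sides is an ASSUMPTION
RATE_IN = 16384        # Hz, original sample rate
RATE_OUT = 4096        # Hz, down-sampled rate


def file_start(gps):
    """Start of the file containing `gps`.

    ASSUMPTION: file boundaries are integer multiples of 4096 s.
    """
    return (int(gps) // FILE_DURATION) * FILE_DURATION


def padded_request(start):
    """GPS span to read at 16384 Hz so that both file edges have PAD seconds of context."""
    return start - PAD, start + FILE_DURATION + PAD


def trim_slice():
    """Slice that drops the padding from the decimated (4096 Hz) array."""
    return slice(PAD * RATE_OUT, (PAD + FILE_DURATION) * RATE_OUT)


start = file_start(1270000000)      # arbitrary illustrative GPS time
lo, hi = padded_request(start)
factor = RATE_IN // RATE_OUT        # decimation factor
n_in = (hi - lo) * RATE_IN          # samples read, padding included
n_out = n_in // factor              # samples after decimation, padding included
keep = trim_slice()
print(start, (lo, hi), factor, n_out, keep.stop - keep.start)
```

When the padding data are missing, the request would have to be clipped to what is available, which is the situation in which small border effects can remain.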

Noise Subtraction

After data collection, a couple of independently-measured technical contributions to the detector noise were subtracted from the GEO 600 data. These sources include the laser amplitude fluctuations and noise from signals used for the longitudinal control of the signal recycling mirror and those used for the auto-alignment of the Michelson mirrors. In addition, the broadband noise contributions due to bilinear couplings from the SR longitudinal control were also removed from the data.

For reference, see:

GWF Channel Names

The O3GK 4 kHz and 16 kHz GWF files (ending with the extension .gwf) use the channel names in the table below:


Channel names found inside GWF files

O3GK (4 kHz sample rate)         O3GK (16 kHz sample rate)
{ifo}:GWOSC-4KHZ_R1_STRAIN       {ifo}:GWOSC-16KHZ_R1_STRAIN
{ifo}:GWOSC-4KHZ_R1_DQMASK       {ifo}:GWOSC-16KHZ_R1_DQMASK
{ifo}:GWOSC-4KHZ_R1_INJMASK      {ifo}:GWOSC-16KHZ_R1_INJMASK

NOTES:
  • {ifo} is a placeholder for either G1 or K1, e.g., G1:GWOSC-16KHZ_R1_STRAIN or K1:GWOSC-16KHZ_R1_STRAIN.
  • The _R1_ substring in the O3GK channel names represents the revision number of the named channel.
  • The HDF5 group names for O3 are the same as those used for S5, S6, O1 and O2. However, a new meta channel attribute has been added to the HDF5 structure that captures the associated channel name found in the GWF files outlined in the table above.
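
The naming pattern in the table above can be written as a small helper. This is a sketch for illustration only; the function name and its argument names are hypothetical and are not part of any GWOSC tool.

```python
DETECTORS = ("G1", "K1")
RATES = ("4KHZ", "16KHZ")
KINDS = ("STRAIN", "DQMASK", "INJMASK")


def channel_name(ifo, rate, kind, revision=1):
    """Build a public GWF channel name following the table's pattern.

    Example pattern: {ifo}:GWOSC-{rate}_R{revision}_{kind}
    """
    if ifo not in DETECTORS:
        raise ValueError(f"unknown detector: {ifo!r}")
    if rate not in RATES:
        raise ValueError(f"unknown sample-rate label: {rate!r}")
    if kind not in KINDS:
        raise ValueError(f"unknown channel kind: {kind!r}")
    return f"{ifo}:GWOSC-{rate}_R{revision}_{kind}"


# List every channel name covered by the table, for both detectors.
all_channels = [channel_name(i, r, k) for i in DETECTORS for r in RATES for k in KINDS]
for name in all_channels:
    print(name)
```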

Notes about the DATA flag

See the Defining the DATA Flag page.

O3 Hardware Injections

Hardware injections appear as (simulated) gravitational wave signals in the data. GWOSC does not mask out data that contain injections; instead, lists of the injection times are provided. See the O3 Hardware Injection page for details.

Segment lists of hardware injections may include times when data are not publicly available. Details of these injections are not included in the documentation.

The O3GK data do not contain any hardware injections for either G1 or K1.

Data Quality

Data quality categories, or flags, are defined by each analysis group: Compact Binary Coalescence (CBC), Burst, Continuous Waves (CW) and Stochastic. This is because periods of noisy data affect each type of analysis differently. For each flag, GWOSC data files contain a corresponding 1 Hz time series that marks times that pass the flag as "1" (good data) and times that fail the flag as "0" (bad data). A full list of O3GK data quality categories can be seen on the O3GK data quality definitions page; the link refers to the 16 kHz release, and the categories for the 4 kHz files are identical. Data quality is described in these categories:
  • DATA (Data Available): Failing this level indicates that GW strain data are not publicly available because the instruments were not operating in an acceptable condition. For O3, DATA is equivalent to Category 1.
  • CAT1 (Category 1), CAT2 (Category 2), CAT3 (Category 3): See the O2/O3 LIGO Detector Characterization Paper.
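
As an illustration of the 1 Hz pass/fail series described above, the sketch below unpacks one flag from a DQ bit mask and converts it into segments of good data. The bit positions in ASSUMED_BITS are an assumption made for this example and should be checked against the O3GK bit mask definition; the mask values are made up, and the function names are hypothetical.

```python
# ASSUMED bit positions, for illustration only; check the O3GK bit mask definition.
ASSUMED_BITS = {
    "DATA": 0,
    "CBC_CAT1": 1, "CBC_CAT2": 2, "CBC_CAT3": 3,
    "BURST_CAT1": 4, "BURST_CAT2": 5, "BURST_CAT3": 6,
}


def flag_series(dqmask, flag):
    """Return the 1 Hz series (1 = pass, 0 = fail) for one flag."""
    bit = ASSUMED_BITS[flag]
    return [(value >> bit) & 1 for value in dqmask]


def good_segments(series, gps_start):
    """Return half-open [start, end) GPS segments where the series equals 1."""
    segments = []
    seg_start = None
    for i, value in enumerate(series):
        if value and seg_start is None:
            seg_start = gps_start + i            # a good stretch begins
        elif not value and seg_start is not None:
            segments.append((seg_start, gps_start + i))
            seg_start = None                     # the good stretch ended
    if seg_start is not None:                    # still good at the end of the vector
        segments.append((seg_start, gps_start + len(series)))
    return segments


dqmask = [127, 127, 1, 0, 127, 127, 127]         # made-up mask values, one per second
burst2 = flag_series(dqmask, "BURST_CAT2")
segs = good_segments(burst2, gps_start=1000)     # arbitrary start time
print(burst2, segs)
```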

In general, data quality levels are defined in a cumulative way: a time that fails a given category automatically fails all higher categories. For example, if the only known problem with a given time fails a burst category 2 flag, then the data are said to pass DATA and BURST_CAT1, but fail BURST_CAT2 and BURST_CAT3. However, the different analysis groups are independent: if something fails BURST_CAT2, it may still pass CBC_CAT2.
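
The cumulative rule can be modelled in a few lines. This is a toy sketch of the logic only, not how the flags are actually produced; the function and its arguments are hypothetical, and only the CBC and Burst groups are included for brevity.

```python
GROUPS = ("CBC", "BURST")
LEVELS = (1, 2, 3)


def evaluate(data_ok, lowest_failed):
    """Return pass/fail for every flag at a single time.

    data_ok       -- whether the DATA flag passes
    lowest_failed -- dict mapping a group name to the lowest category it
                     fails (1, 2 or 3); groups with no known problem are omitted
    """
    result = {"DATA": data_ok}
    for group in GROUPS:
        failed_at = lowest_failed.get(group)
        for level in LEVELS:
            # Failing a category implies failing every higher one in that group,
            # and failing DATA implies failing everything.
            passes = data_ok and (failed_at is None or level < failed_at)
            result[f"{group}_CAT{level}"] = passes
    return result


# The example from the text: the only known problem fails a burst category 2 flag.
flags = evaluate(data_ok=True, lowest_failed={"BURST": 2})
for name, passed in flags.items():
    print(name, "pass" if passed else "fail")
```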

These graduated categories of quality allow a data pipeline to adjust its behavior depending on the data quality. An example is running the numerical search (template matching) against all the data segments that pass CAT1, but ignoring any candidate events from data that do not pass CAT3. This strategy allows long sections of data to be used, increasing search efficiency.
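
A pipeline following this strategy might look roughly like the sketch below: the search runs over every second that passes CAT1, and candidates are kept only if they fall in a second that also passes CAT3. The series, candidate times, and function names are made up for illustration and do not describe any specific search pipeline.

```python
import math


def searched_seconds(cat1):
    """Indices of the 1 Hz samples that pass CAT1 and are therefore searched."""
    return [i for i, ok in enumerate(cat1) if ok]


def keep_candidates(candidate_times, cat3, gps_start):
    """Keep candidates whose containing second passes CAT3."""
    kept = []
    for t in candidate_times:
        idx = math.floor(t - gps_start)   # index of the 1 Hz sample containing t
        if 0 <= idx < len(cat3) and cat3[idx]:
            kept.append(t)
    return kept


gps_start = 0                              # arbitrary reference time
cat1 = [1, 1, 1, 1, 0, 1]                  # made-up CAT1 series
cat3 = [1, 0, 1, 1, 0, 1]                  # made-up CAT3 series
candidates = [0.5, 1.2, 3.9, 5.0]          # made-up candidate times (seconds)

searched = searched_seconds(cat1)
kept = keep_candidates(candidates, cat3, gps_start)
print(len(searched), kept)
```

The candidate at 1.2 s is searched, because that second passes CAT1, but it is discarded afterwards, because the same second fails CAT3.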

For information on how to use data quality information:

  • Step 3 and Step 4 of the introductory tutorial show how to apply data quality flags
  • Data Quality definitions for the O3GK data set (the link refers to the 16 kHz files of O3GK; the data quality categories for the files at 4 kHz are identical)
  • Plot and download segment lists from Timeline for O3GK


MD5 Check Sums