Tutorial: Getting Data Automatically
In this tutorial, we'll fetch LIGO data associated with specific times, using code and web services rather than clicking through web pages.
Automatic data discovery
Let us assume we want to investigate a number of astrophysical events, each precisely timed, and we want to run a search in the minutes around each event. Suppose we have a GPS time for each event. As an example, we can take the times of the 22 gamma-ray bursts that happened during LIGO's S5 run, as listed here, taken from this paper. Here are the first few, written as a Python dictionary:
Obviously we could make a dictionary like this in many ways: the data model is an 'event' as a pair: (GPStime, Name), and we have a list of N of these. Even though the list above has only 22 events, let us try to build the software for N >> 1.
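As a minimal sketch of this data model, the events can be held in a dictionary and turned into a time-ordered list of (GPStime, Name) pairs. The names and GPS times below are hypothetical placeholders chosen for illustration; they are not the real S5 GRB values.

```python
# Hypothetical placeholder events: neither the names nor the GPS times
# are real GRB data. Substitute the real (Name -> GPStime) dictionary.
events = {
    "EVENT_A": 815300000,
    "EVENT_B": 816000000,
    "EVENT_C": 815650000,
}


def as_pairs(event_dict):
    """Return a time-ordered list of (GPStime, Name) pairs."""
    return sorted((gps, name) for name, gps in event_dict.items())


pairs = as_pairs(events)
for gps, name in pairs:
    print(gps, name)
```

Keeping the events as plain pairs means the same loop works unchanged whether the list has 22 entries or many thousands.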
Fetching Timeline data: Is There Good Data?
First we want to find out which detectors were operating when, or more specifically, when they were operating with good data quality. You can code against this with the JSON version of the Timeline tool. Here is code to make this decision. It asks for the duty cycle of the CAT2 data for H1 in the 128 seconds surrounding the given time:
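The following is only a sketch of that decision, not the Timeline tool's own code. It assumes the JSON response is an object with a 'segments' key holding non-overlapping [start, stop] GPS pairs during which the flag is on. The function names are hypothetical, and the Timeline URL is left as a parameter because its exact form is not given here.

```python
import json
import urllib.request


def duty_cycle(segments, gps, duration=128):
    """Fraction of the `duration` seconds centred on `gps` covered by `segments`.

    `segments` is a list of [start, stop] GPS pairs, assumed not to overlap
    one another, during which the data-quality flag is on.
    """
    t0 = gps - duration // 2
    t1 = t0 + duration
    covered = 0
    for start, stop in segments:
        overlap = min(stop, t1) - max(start, t0)
        if overlap > 0:
            covered += overlap
    return covered / duration


def fetch_segments(url):
    """Download a Timeline JSON document and return its segment list.

    Assumption: the response is a JSON object with a 'segments' key.
    The caller supplies the URL for the detector, flag and time range.
    """
    with urllib.request.urlopen(url) as response:
        return json.loads(response.read().decode("utf-8"))["segments"]


# Offline example with made-up segments (no network access needed).
example_segments = [[1000, 1050], [1100, 1200]]
example_duty = duty_cycle(example_segments, 1064)
print(example_duty)
```

Separating the download from the arithmetic lets the duty-cycle calculation be checked offline before it is pointed at the real service.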
Thus we can select for further study only those GRBs where there is good data (passes CAT2) from both detectors (H1 and L1).
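One way to express this selection is sketched below. The function `select_events`, its `threshold` of 1.0 (the whole window must pass the flag), and the lookup callable are assumptions made for illustration. In practice the callable would query the Timeline service for each detector.

```python
def select_events(events, get_duty_cycle, detectors=("H1", "L1"), threshold=1.0):
    """Keep the events whose duty cycle meets `threshold` in every detector.

    events:          dict mapping event name -> GPS time
    get_duty_cycle:  callable (detector, gps) -> fraction between 0 and 1
    """
    return {
        name: gps
        for name, gps in events.items()
        if all(get_duty_cycle(det, gps) >= threshold for det in detectors)
    }


# Made-up duty cycles, standing in for real Timeline queries.
fake_duty = {
    ("H1", 100): 1.0, ("L1", 100): 1.0,
    ("H1", 200): 1.0, ("L1", 200): 0.5,
}
selected = select_events(
    {"EVENT_A": 100, "EVENT_B": 200},
    lambda det, gps: fake_duty[(det, gps)],
)
print(selected)
```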
Fetching the Strain Data
For those GRBs where there is good data, we can download the 4096-second HDF5 files of strain data. These will be up to 120 MB in size.
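A download step might look like the sketch below. It rests on several assumptions that should be checked against the actual service: that files start on GPS times that are multiples of 4096, and that a file listing is available as a dictionary with a 'strain' list whose entries carry 'format' and 'url' keys. The helper names and the example.org URLs are hypothetical.

```python
import os
import urllib.request

FILE_DURATION = 4096  # seconds per strain file, as stated in the text


def file_start(gps):
    """GPS start of the 4096-second block containing `gps`.

    Assumption: files are aligned to multiples of 4096 seconds.
    """
    return gps - gps % FILE_DURATION


def hdf5_urls(listing):
    """Extract the HDF5 file URLs from a file-listing dictionary.

    Assumption: entries live under 'strain' with 'format' and 'url' keys.
    """
    return [f["url"] for f in listing.get("strain", []) if f.get("format") == "hdf5"]


def download(url, directory="."):
    """Save `url` into `directory`, skipping files that already exist."""
    path = os.path.join(directory, os.path.basename(url))
    if not os.path.exists(path):
        urllib.request.urlretrieve(url, path)
    return path


# Offline example with a made-up listing (nothing is downloaded here).
listing = {"strain": [
    {"format": "hdf5", "url": "https://example.org/H-H1_EXAMPLE-815411200-4096.hdf5"},
    {"format": "gwf", "url": "https://example.org/H-H1_EXAMPLE-815411200-4096.gwf"},
]}
print(file_start(815412000), hdf5_urls(listing))
```

Skipping files that already exist on disk matters here because each file can be large, and several events may fall within the same 4096-second block.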
Now that we have the data files, we can go back to the earlier tutorials, read in the data, and run a search at the times of the GRBs.
Fetching the Data Catalog
Another way to look at data quality is through the GWOSC data catalog. For each of the 4096-second data files, there are statistics available about the proportion of time (duty cycle) that the various data quality flags are on, similar to the 128-second averages computed above from the Timelines. There are also gross statistics about the strain vector during the time of the file: the minimum, maximum, mean, and standard deviation, as well as the band-limited RMS in three bands: 30-40 Hz, 40-100 Hz, and 100-1000 Hz.
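To make the gross statistics concrete, the sketch below computes the minimum, maximum, mean, and standard deviation of a strain vector with the Python standard library. It is an illustration rather than the catalog's own method: it assumes a population standard deviation, and it leaves out the band-limited RMS, which would need a frequency-domain step.

```python
import statistics


def gross_statistics(strain):
    """Return min, max, mean and (population) standard deviation of `strain`."""
    return {
        "min": min(strain),
        "max": max(strain),
        "mean": statistics.mean(strain),
        "std": statistics.pstdev(strain),
    }


# Tiny made-up vector; real strain values are many orders of magnitude smaller.
stats = gross_statistics([1.0, -1.0, 3.0, -3.0])
print(stats)
```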
There is another tutorial about automatically downloading all the data from a Run, or all the open data from all Events.