Tutorial

Working With Data

Overview

This tutorial introduces some of the basic tools for working with SuperDARN data. To understand how the toolset works, you must first know a little about how UNIX operates.

Most users are familiar with the concept of running a program to perform some action on a file, for instance using gzip to compress a file. However, UNIX also lets you run utilities on data streams. Every program has an input stream and an output stream, called "standard input" and "standard output". Usually standard input is taken from the user's keyboard and standard output is written to the console. When you type a command like "cat README.txt", the command reads the data from the file and then writes it to standard output.

In UNIX you can redirect the standard input and output streams so that they read from and write to files, using the redirection operators "<" and ">". For instance, to display the contents of a file you could use:

>cat < README.txt

The "<" redirection operator reads the file "README.txt" and feeds it to the standard input of cat. Similarly you can write output to a file by using:

>cat README.txt > READMECOPY.txt

The ">" redirection operator writes the standard output of cat to the file "READMECOPY.txt", effectively copying the file "README.txt".

Almost all UNIX utilities can use standard input and output instead of files. The real power of streams comes from the concept of the UNIX pipe. A pipe connects the standard output of one process to the standard input of another and is indicated by the "|" symbol. A pipe lets you string together multiple commands into a single operation:

>who | sort | lpr

The simple example above lists who is logged onto the system, sorts them into alphabetical order and prints the list on the default printer.
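
The three ways of feeding data to a program described above (naming the file, redirecting standard input, and using a pipe) can be compared with a short sketch. The scratch file and its contents are made up for illustration; only standard UNIX utilities are used.

```shell
# Create a small scratch file (hypothetical content, for illustration only).
tmpdir=$(mktemp -d)
printf 'charlie\nalpha\nbravo\n' > "$tmpdir/names.txt"

# Three ways of feeding the same data to sort:
from_file=$(sort "$tmpdir/names.txt")        # sort opens the file itself
from_stdin=$(sort < "$tmpdir/names.txt")     # the shell redirects standard input
from_pipe=$(cat "$tmpdir/names.txt" | sort)  # a pipe connects cat to sort

# All three variables now hold the same sorted list.
printf '%s\n' "$from_pipe"
rm -r "$tmpdir"
```

The SuperDARN tools that read from standard input can be placed in a pipeline in the same way that sort is used here.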

Most of the SuperDARN tools work both on files and on the standard input and output streams.

Filenames and suffixes

It is important to understand that UNIX offers great flexibility in naming files and filename suffixes do not necessarily determine the type of a file. The convention is that certain suffixes are used for certain types of file, for example "gz" indicates that the file has been compressed using gzip, but the user is free to deviate from this convention. Similarly, SuperDARN files have a naming convention, documented here, but the user is free to call a file anything they want.

This UNIX convention means that the utilities cannot infer the type of the file from its name and any information that is required to determine its type must be supplied by the user.
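
A short sketch using gzip illustrates the point. The file names below are hypothetical; the compressed file is deliberately given a name that does not end in "gz".

```shell
tmpdir=$(mktemp -d)
printf 'SuperDARN\n' > "$tmpdir/plain.txt"

# Compress to a file whose name does not follow the "gz" convention.
gzip -c "$tmpdir/plain.txt" > "$tmpdir/data.bin"

# gzip reads the data from standard input, so it never sees the file name;
# the contents, not the suffix, determine what the file is.
restored=$(gzip -dc < "$tmpdir/data.bin")
printf '%s\n' "$restored"
rm -r "$tmpdir"
```

In the same way, a SuperDARN utility reading a stream has no name to inspect, which is why options such as "-new" have to be supplied by the user.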

Old and new data formats

The two principal types of SuperDARN data are the raw ACF data derived from the radar observations and the fitted parameters generated by the FitACF algorithm. In the past the raw ACF data has been stored in dat files and the fitted parameters have been stored in fit files. However, a new self-describing format has been developed: raw ACF data will be stored in rawacf files and fitted parameters will be stored in fitacf files.

Some utilities in the software are only applicable to old format files and some are only applicable to the new. When a utility can be applied to both, the command line option "-new" is used to indicate that the utility is working on a new format file.

Concatenating Files

Individual SuperDARN data files normally contain up to two hours of data, so one of the first jobs that a user will want to do is concatenate these short files to create larger, more useful files.

How this concatenation is done depends on whether the data is in the old or new format.

Old Format Data

Old format fit and dat files contain header information that must be created when the files are concatenated. Consequently, files in the old format require a special tool to concatenate them.

To concatenate dat files, use the tool catraw:

catraw inpfile1 inpfile2.... outfile

The program takes as command line arguments the list of files to concatenate; the final command line argument is the output filename. It's a lot of work to have to type the name of each file on the command line, so you can take advantage of UNIX wildcard filename expansion:

>catraw 20021200.*.kap.dat .... 20021200.kap.dat

Note: Be careful when using wildcards that the pattern doesn't match the output file. If the filenames follow the SuperDARN naming convention, this will not be a problem, but if they do not, place a trailing "C" on the end of the output filename.
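
One way to guard against this is to test the output name against the wildcard pattern before running the concatenation. The following is a minimal sketch; the function name matches_pattern is hypothetical and is not part of the SuperDARN software.

```shell
# Return success (0) if the file name in $1 matches the shell pattern in $2.
# The pattern is left unquoted inside "case" on purpose so that "*" acts as a wildcard.
matches_pattern() {
    case "$1" in
        $2) return 0 ;;
        *)  return 1 ;;
    esac
}

pattern='20021200.*.kap.dat'
outfile='20021200.kap.dat'

if matches_pattern "$outfile" "$pattern"; then
    echo "warning: $outfile matches $pattern; choose a different output name"
else
    echo "ok: $outfile is not matched by $pattern"
fi
```

With the names used in the example above the check reports that the output file is not matched, because the pattern requires an extra field between the date and the station code.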

To concatenate fit files, use the tool catfit:

catfit inpfile1 inpfile2.... outfile

New Format Data

New format fitacf and rawacf files can be concatenated by using the standard UNIX cat command:

cat 20021200.*.kap.fitacf > 20021200.kap.fitacf

Creating Index Files

SuperDARN data files can be very large and to speed up searching through the file for a particular time, you can create an index file.

Old format dat files do not have an associated index file as this feature was never implemented, but new format rawacf files do have indices. You can create an index file for a rawacf file using the utility make_rawinx:

make_rawinx -new [-vb] [inpfile]

Note: For consistency with the rest of the software, make_rawinx still requires you to indicate that this is a new format file by including the command line option "-new". If you leave out this option the program will conclude that you are using a dat file and report an error. (In the future it is possible that indices for dat files will be implemented).

The program will either read from the file specified on the command line or from standard input and write the index to standard output:

>make_rawinx -new 20021200.kap.rawacf > 20021200.kap.rawinx
>cat 20021200.kap.rawacf | make_rawinx -new > 20021200.kap.rawinx

To create an index file for a fit file, use the utility make_fitinx:

make_fitinx [-vb] inpfile outfile

The program will read the fit file from the first file given on the command line and write the index to the second:

make_fitinx 20021200.kap.fit 20021200.kap.inx

You use the same utility to create an index for a fitacf file, but with a slightly different syntax:

make_fitinx -new [-vb] [inpfile]

As with make_rawinx, the program will either read from the file given on the command line or from standard input and write the index to standard output.

>make_fitinx -new 20021200.kap.fitacf > 20021200.kap.fitinx
>cat 20021200.kap.fitacf | make_fitinx -new > 20021200.kap.fitinx

The filename suffix for fit indices is inx; for rawacf and fitacf indices, the suffixes are rawinx and fitinx respectively.
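
The suffix convention can be captured in a small helper. This is only a sketch: the function names index_suffix and index_name are hypothetical, and the helper assumes that the data file follows the naming convention.

```shell
# Map a data file suffix to its index suffix, as listed in the text above.
index_suffix() {
    case "$1" in
        fit)    echo inx ;;
        rawacf) echo rawinx ;;
        fitacf) echo fitinx ;;
        *)      echo "no index suffix known for: $1" >&2; return 1 ;;
    esac
}

# Derive an index file name by swapping the suffix of the data file name.
index_name() {
    suffix=${1##*.}   # text after the last "."
    base=${1%.*}      # everything before the last "."
    new=$(index_suffix "$suffix") || return 1
    echo "$base.$new"
}

index_name 20021200.kap.fitacf
```

Here the call prints 20021200.kap.fitinx. A dat file name is rejected, since dat files have no index.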

Trimming Files

Often you will only be studying a short interval within a data file. It would be useful if you could trim larger files down so that they contain just the interval of interest; this will make the files smaller and easier to work with.

To do this you can use the utilities trim_raw and trim_fit.

These utilities have a number of command line options, but the most important ones specify the interval to extract from the file. You can specify either a start time and an end time, or a start time and an extent. If the file consists of one day of data, then only the UT time must be specified; if the file extends beyond one day, then you may also have to specify a start date and an end date. The start time is specified using the "-st hr:mt" option, where hr is the UT hour and mt is the UT minute; the end time is similarly defined using "-et hr:mt". If you decide to specify an extent of time to extract, use the "-ex hr:mt" option. The start date is specified using the "-sd yyyymmdd" option, where yyyy is the four digit year, mm is the two digit month and dd is the two digit day; similarly, the end date is specified using the "-ed yyyymmdd" option.
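
The relationship between the two ways of giving an interval can be sketched with a little shell arithmetic. This assumes that the extent is simply a duration added to the start time, which the description implies but does not state, and that the interval does not cross midnight. The function names are hypothetical.

```shell
# Convert "hr:mt" to minutes. A leading zero is stripped so "08" is not read as octal.
to_minutes() {
    h=${1%%:*}; m=${1##*:}
    h=${h#0}; m=${m#0}
    echo $(( ${h:-0} * 60 + ${m:-0} ))
}

# End time implied by a start time ($1) and an extent ($2), both given as hr:mt.
# Assumes the interval does not cross midnight.
end_time() {
    total=$(( $(to_minutes "$1") + $(to_minutes "$2") ))
    printf '%d:%02d\n' $(( total / 60 )) $(( total % 60 ))
}

end_time 10:30 0:30
```

The call prints 11:00, so under these assumptions "-st 10:30 -ex 0:30" and "-st 10:30 -et 11:00" would describe the same interval.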

The syntax of trim_raw and trim_fit depends on whether you are working with old or new files.

Old Format Data

The syntax of trim_raw when working with dat files looks like this:

trim_raw [-vb] [-t thr] [-sd yyyymmdd] [-st hr:mt] [-ed yyyymmdd] [-et hr:mt] [-ex hr:mt] inpfile outfile

The "-t thr" option sets the threshold for the ratio of lag-zero power to noise; ranges with a lag-zero power to noise ratio less than this value are not stored. The default value for this ratio is 3. The dat file specified by inpfile is trimmed and written to the file outfile:

>trim_raw -sd 20021220 -st 10:30 -ex 0:30 20021220.kap.dat 20021220.1030.00.kap.dat
>trim_raw -sd 20021220 -st 8:00 -et 10:00 20021220.kap.dat 20021220.0800.00.kap.dat

The syntax of trim_fit when working with fit files looks like this:

trim_fit [-vb] [-sd yyyymmdd] [-st hr:mt] [-ed yyyymmdd] [-et hr:mt] [-ex hr:mt] [-i] inpfile [inpinx] outfile [outinx]

The fit file specified by inpfile is trimmed and written to the file outfile. An optional index file given by inpinx can also be included to speed up the process. If the "-i" option is given, the program will automatically create an index file for the trimmed data and write it to the file outinx:

>trim_fit -sd 20021220 -st 10:30 -ex 0:30 -i 20021220.kap.fit 20021220.kap.inx 20021220.1030.00.kap.fit 20021220.1030.00.kap.inx
>trim_fit -sd 20021220 -st 8:00 -et 10:00 20021220.kap.fit 20021220.0800.00.kap.fit

New Format Data

The syntax of trim_raw when working with rawacf files looks like this:

trim_raw -new [-vb] [-t thr] [-sd yyyymmdd] [-st hr:mt] [-ed yyyymmdd] [-et hr:mt] [-ex hr:mt] [inpfile] [inpinx]

The "-t thr" option sets the threshold for the ratio of lag-zero power to noise; ranges with a lag-zero power to noise ratio less than this value are not stored. The default value for this ratio is 3. The rawacf file is read either from the file specified by inpfile or from standard input and is written to standard output. An optional index file can be included to speed up the process:

>trim_raw -new -sd 20021220 -st 10:30 -ex 0:30 20021220.kap.rawacf 20021220.kap.rawinx > 20021220.1030.00.kap.rawacf
>cat 20021220.kap.rawacf | trim_raw -new -sd 20021220 -st 8:00 -et 10:00 > 20021220.0800.00.kap.rawacf

The syntax of trim_fit when working with fitacf files looks like this:

trim_fit -new [-vb] [-sd yyyymmdd] [-st hr:mt] [-ed yyyymmdd] [-et hr:mt] [-ex hr:mt] [inpfile] [inpinx]

The fitacf is read either from the file specified by inpfile or from standard input and is written to standard output. An optional index file can be included to speed up the process:

>trim_fit -new -sd 20021220 -st 10:30 -ex 0:30 20021220.kap.fitacf 20021220.kap.fitinx > 20021220.1030.00.kap.fitacf
>cat 20021220.kap.fitacf | trim_fit -new -sd 20021220 -st 8:00 -et 10:00 > 20021220.0800.00.kap.fitacf
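
The output names in the trimming examples above all follow one pattern: date, start time as four digits, a field of "00", the station code and the suffix. The sketch below builds such a name. The pattern is inferred from the examples only, the meaning of the "00" field is not stated here, and the function name trimmed_name is hypothetical.

```shell
# Build an output name of the form yyyymmdd.hhmm.00.stn.suffix.
# Arguments: date (yyyymmdd), start time (hr:mt), station code, suffix.
trimmed_name() {
    h=${2%%:*}; m=${2##*:}
    # Strip a leading zero so printf does not treat "08" as an octal number.
    h=${h#0}; m=${m#0}
    printf '%s.%02d%02d.00.%s.%s\n' "$1" "${h:-0}" "${m:-0}" "$3" "$4"
}

trimmed_name 20021220 8:00 kap fitacf
```

The call prints 20021220.0800.00.kap.fitacf, the name used in the last example.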

Creating fit and fitacf files

The program make_fit is used to create fit and fitacf files. The syntax of the command when creating fit files is:

make_fit [-vb] datfile fitfile [inxfile]

The dat file is read from datfile and the fit file is written to fitfile. An index file is created if the optional filename inxfile is included on the command line:

make_fit 20021220.kap.dat 20021220.kap.fit 20021220.kap.inx

The syntax of the command when creating fitacf files is:

make_fit -new [-vb] [inpfile]

The rawacf file is read from either inpfile or from standard input and the fitacf file is written to standard output:

make_fit -new 20021220.kap.rawacf > 20021220.kap.fitacf
cat 20021220.kap.rawacf | make_fit -new > 20021220.kap.fitacf

Note: You can only make a fit file from a dat file, and you can only make a fitacf file from a rawacf file. You can always convert to a different file type using the conversion utilities.
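
The two forms of the command can be chosen from the input file name, provided the file follows the naming convention (the tools themselves cannot infer the type from the name). The sketch below only prints the command it would run, using the two syntaxes given above; the function name make_fit_command is hypothetical.

```shell
# Print (not run) the make_fit command for an input file, based on its suffix.
make_fit_command() {
    case "$1" in
        *.dat)    echo "make_fit $1 ${1%.dat}.fit" ;;
        *.rawacf) echo "make_fit -new $1 > ${1%.rawacf}.fitacf" ;;
        *)        echo "unsupported input: $1" >&2; return 1 ;;
    esac
}

make_fit_command 20021220.kap.rawacf
```

The call prints "make_fit -new 20021220.kap.rawacf > 20021220.kap.fitacf". Any other suffix is rejected, matching the restriction in the note.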

