Difference between revisions of "UWAR:DataFormat"


Latest revision as of 18:24, 12 February 2007


The UWAR group has defined a file format for recording streaming sensor data into structured trace formats for aggregation and processing (both online and offline) of disparate sensors.

Conventions

All numeric values are stored in Little Endian format. As such, the decimal integer 1234567890, which is 0x499602D2 in hex, would be encoded as four sequential bytes:

 D2 02 96 49
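
As a quick check of this convention, here is a minimal Python sketch that packs the example value with the standard `struct` module. The helper name `encode_int_le` is hypothetical, and treating the value as a signed 32-bit integer is an assumption; the format description only says "integer".

```python
import struct

def encode_int_le(value: int) -> bytes:
    """Pack a 32-bit integer in little-endian order (signedness is an assumption)."""
    return struct.pack("<i", value)

encoded = encode_int_le(1234567890)
# Show the bytes in the order they would appear in the file.
print(" ".join(f"{b:02X}" for b in encoded))  # prints: D2 02 96 49
```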

Version 1.0

See Evan's original notes: http://www.cs.washington.edu/homes/evan/uwar_format.txt

Key Points:

  • Sensor traces will be collected as "runs"
    • A run can be thought of as a trace collection session and may typically span the length of a day
    • Each run is stored as a sequence of 30 min .UWAR files
  • The basic file structure will be: a general header, followed by header information for each sensor stream included in the run, followed by data packets from each stream in <stream-type flag><payload> format.
  • In addition to one stream for each sensor, there will be a meta-data stream which contains information about the run and the file, and a synch-history stream that contains information about when and how the device uploaded data.

Stream Header

The data stream is started off with a header section that contains data about the stream as a whole and allows individual data sections for each of the sensor streams present in the complete stream.

Index Length Description
00 - 03 4 The UWAR file id, should always be the four character sequence "UWAR"
04 - 07 4 An integer specifying the data format version (1 for this version of the format).
08 - 11 4 An integer specifying the number of individual sensor stream headers present in the header section.
12 - 15 4 An integer specifying the total number of bytes taken up by the individual sensor stream headers.
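
The 16-byte layout in the table above could be read along the following lines. This is only a sketch: the function name `parse_v1_stream_header` and the dictionary keys are hypothetical, and the three integers are assumed to be unsigned.

```python
import struct

# File id, format version, sensor header count, total sensor header bytes.
V1_HEADER = struct.Struct("<4sIII")

def parse_v1_stream_header(buf: bytes) -> dict:
    """Decode the first 16 bytes of a version 1 trace (unsigned fields assumed)."""
    if len(buf) < V1_HEADER.size:
        raise ValueError("need at least 16 bytes for the stream header")
    magic, version, count, header_bytes = V1_HEADER.unpack_from(buf, 0)
    if magic != b"UWAR":
        raise ValueError("missing UWAR file id")
    return {"version": version, "header_count": count, "header_bytes": header_bytes}

# Made-up header: version 1, two sensor headers occupying 40 bytes in total.
sample = b"UWAR" + struct.pack("<III", 1, 2, 40)
header = parse_v1_stream_header(sample)
print(header)
```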

Individual Sensor Stream Headers

The stream header has specified the number of individual sensor stream headers and their total length in bytes. Immediately following are the individual sensor stream headers themselves. The following documents a single sensor stream header. A parser can count either the number of headers seen or the number of header bytes seen to determine when the end of the sensor stream header section has been reached.

Index Length Description
00 - 03 4 The sensor stream id: a unique identifier assigned for each sensor stream type.
04 - 05 2 A two-byte integer specifying the fixed length of data packets for this sensor stream. If the sensor has variable length packets, this value should be zero (0x00 00). Fixed-length packets allow for a more compact packet format, as described below (note that the fixed length should include the packet timestamp).
06 - 07 2 The sensor stream symbol. This is a unique identifier assigned to each sensor stream that will be used to match subsequent packets to the appropriate stream. These identifiers are arbitrarily assigned by the trace writer and will only be applicable for the given trace, not across traces.
08 - 11 4 An integer specifying the total size of the header, including the 16 bytes of structured sensor stream header data plus the number of bytes of custom header data following the structured portion of the sensor stream header.
12 - 15 4 An integer specifying the version of the sensor stream.
16 - xx variable Raw custom header data, whose length is determined by the header size field mentioned above (be sure to subtract 16 to account for the bytes occupied by the rest of the header).
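
A single version 1 sensor stream header, including its trailing custom data, might be decoded as sketched below. The names `parse_v1_sensor_header` and `SENSOR_HEADER` are hypothetical, the fields are assumed to be unsigned, and the sample values are invented for illustration.

```python
import struct

# Stream id, fixed packet length, stream symbol, total header size, stream version.
SENSOR_HEADER = struct.Struct("<IHHII")

def parse_v1_sensor_header(buf: bytes, offset: int = 0):
    """Return (header dict, offset of the next header)."""
    stream_id, fixed_len, symbol, header_size, version = SENSOR_HEADER.unpack_from(buf, offset)
    if header_size < SENSOR_HEADER.size:
        raise ValueError("header size must cover the 16 structured bytes")
    custom_start = offset + SENSOR_HEADER.size
    custom_end = offset + header_size  # header size counts the structured bytes too
    if custom_end > len(buf):
        raise ValueError("truncated custom header data")
    header = {
        "stream_id": stream_id,
        "fixed_length": fixed_len,  # zero means variable-length packets
        "symbol": symbol,
        "version": version,
        "custom": buf[custom_start:custom_end],
    }
    return header, custom_end

# Invented example: variable-length stream, symbol 1, three bytes of custom data.
raw = struct.pack("<IHHII", 0x12345678, 0, 1, 16 + 3, 1) + b"abc"
parsed, next_offset = parse_v1_sensor_header(raw)
print(parsed["custom"], next_offset)
```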

Sensor Packet Data

The header section is now complete, and we are ready for the actual sensor stream data. Data points are broken up into packets. As mentioned previously, packets can be of either fixed length or variable length, as determined by the sensor stream header. The main difference is that variable length packets have an additional length field, while fixed length packets do not. Packets are simply read until EOF is reached. There are no guarantees about in what order the packets will be written to the stream. A variable length packet is structured as follows:

Index Length Description
00 - 00 1 A single byte indicating the sensor stream symbol of the packet (see previous section). This provides the link between the packet and its sensor stream type.
01 - 04 4 An integer specifying the length of the packet data in bytes.
05 - 08 4 An integer timestamp that currently specifies the number of milliseconds that the host machine running the trace writer has been active (since boot-time). Thus, this number isn't really useful as an absolute measure of time, but more as a relative measure of time.
09 - xx variable The raw packet data. The length of the raw packet data is equal to the total packet data length specified above, minus four bytes for the timestamp.

A fixed-length packet is a little simpler:

Index Length Description
00 - 00 1 A single byte indicating the sensor stream symbol of the packet (see previous section). This provides the link between the packet and its sensor stream type.
01 - 04 4 An integer timestamp that currently specifies the number of milliseconds that the host machine running the trace writer has been active (since boot-time). Thus, this number isn't really useful as an absolute measure of time, but more as a relative measure of time.
05 - xx variable The raw packet data. The length of the raw packet data is equal to the fixed packet data length specified in the header, minus four bytes for the timestamp.
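
Putting the two version 1 packet layouts together, a reader loop might look like the sketch below. The function `read_v1_packets` and the `fixed_lengths` mapping (stream symbol to the fixed length taken from the sensor stream headers, zero for variable-length streams) are hypothetical names, and unsigned fields are assumed.

```python
import struct

def read_v1_packets(buf: bytes, fixed_lengths: dict) -> list:
    """Read (symbol, timestamp, data) tuples until the end of the buffer."""
    packets = []
    offset = 0
    while offset < len(buf):
        symbol = buf[offset]  # one-byte stream symbol
        offset += 1
        length = fixed_lengths[symbol]
        if length == 0:
            # Variable-length stream: an explicit length field comes first.
            (length,) = struct.unpack_from("<I", buf, offset)
            offset += 4
        if length < 4:
            raise ValueError("packet length must include the 4-byte timestamp")
        (timestamp,) = struct.unpack_from("<I", buf, offset)
        offset += 4
        data_len = length - 4  # in version 1 the length includes the timestamp
        data = buf[offset:offset + data_len]
        if len(data) != data_len:
            raise ValueError("truncated packet")
        offset += data_len
        packets.append((symbol, timestamp, data))
    return packets

# Stream 1 has fixed length 6 (timestamp + 2 data bytes); stream 2 is variable.
stream = (bytes([1]) + struct.pack("<I", 100) + b"ab"
          + bytes([2]) + struct.pack("<II", 7, 250) + b"xyz")
packets = read_v1_packets(stream, {1: 6, 2: 0})
print(packets)
```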


Version 2.0

Key Changes:

  • Timestamps are now a permanent part of headers and packets
  • Drop the field specifying the total number of bytes occupied by all the stream headers in the UWAR header section. This info is redundant, because we can already parse the headers using the stream count field. Additionally, it would save us from having to load all the headers into memory for determining their size before writing them out.
  • Convention that when we have a variable-length data field, the field specifying the length of that field should always precede the data and should specify the length of that data and nothing else. This clears up some weirdness in the protocol such as where an individual stream header length is followed by the stream version and then the actual data, and the length refers to the size of the entire header and not just the subsequent data. Same goes for packets. Since every packet is expected to have a timestamp, I would move it before the packet length field and not include it in the packet length calculation.
  • Standardize the storage size of the stream symbol as a byte throughout the protocol
  • Standardized, extensible meta-data encoding scheme

Meta Data

We introduce a general mechanism for encoding meta-data in a sensor stream, generally modeled as an Id-Type-Length-Value quadruplet which we refer to henceforth as an atom. A single atom is structured as follows:

Index Length Description
00 - 03 4 An integer id for the given atom (see the UWAR:DataFormat:MetaDataNameSpace entry for registered ids)
04 - 04 1 A byte identifying the type of the atom.
05 - 08 4 An integer specifying the length of the value section.
09 - xx variable The value data for the atom, where the atom's type determines the structure of the value data.

There are currently a number of supported atom types:

Type (hex) Type (ascii) Description
0x63 'c' Atom container. The value section will be composed of sub-atoms, the total length of which is given by the length field of this parent atom.
0x69 'i' Numeric value. The length of the atom value field specifies the number of bytes of precision for the number, encoded in Little Endian form.
0x73 's' String value. The length of the atom value field specifies the number of bytes of UTF-8 string data.
0x62 'b' Raw bytes. The length of the atom value field specifies the number of bytes of data.

Consider the following examples. If we wished to encode a four byte integer with an atom id 4 and value 1234, we would get the following atom:

   04 00 00 00    69 04 00 00
   00 d2 04 00    00

If we wished to encode the string "México" with an atom id of 9, we would get the following atom:

   09 00 00 00    73 07 00 00
   00 4d c3 a9    78 69 63 6f

If we wish to wrap the previous two atoms into a container atom with id of 12, we get:

   0c 00 00 00    63 1d 00 00
   00 04 00 00    00 69 04 00
   00 00 d2 04    00 00 09 00
   00 00 73 07    00 00 00 4d
   c3 a9 78 69    63 6f

Using nested atoms, we should be able to encode arbitrarily complex structures of meta-data. It is basically a matter of selecting appropriate IDs and types to build the tree of data. This meta-data encoding scheme will be used throughout the actual UWAR data format, as described below.
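
The atom scheme can be sketched in a few lines of Python. Everything named here (`encode_atom`, `int_atom`, `string_atom`, `container_atom`, `decode_atoms`) is hypothetical and not part of any UWAR tool, and numeric values are assumed to be signed. The integer and string atoms it builds use the 'i' (0x69) and 's' (0x73) type bytes from the table above, and the container's length field comes out as 0x1d (13 + 16 bytes of sub-atoms).

```python
import struct

ATOM_HEAD = struct.Struct("<IBI")  # atom id, type byte, value length (9 bytes)

def encode_atom(atom_id: int, atom_type: str, value: bytes) -> bytes:
    return ATOM_HEAD.pack(atom_id, ord(atom_type), len(value)) + value

def int_atom(atom_id: int, number: int, width: int = 4) -> bytes:
    # The value length doubles as the integer's precision in bytes.
    return encode_atom(atom_id, "i", number.to_bytes(width, "little", signed=True))

def string_atom(atom_id: int, text: str) -> bytes:
    return encode_atom(atom_id, "s", text.encode("utf-8"))

def container_atom(atom_id: int, children: list) -> bytes:
    return encode_atom(atom_id, "c", b"".join(children))

def decode_atoms(buf: bytes) -> list:
    """Decode consecutive atoms into (id, type, value); containers recurse."""
    atoms = []
    offset = 0
    while offset < len(buf):
        atom_id, type_byte, length = ATOM_HEAD.unpack_from(buf, offset)
        offset += ATOM_HEAD.size
        raw = buf[offset:offset + length]
        if len(raw) != length:
            raise ValueError("truncated atom value")
        offset += length
        kind = chr(type_byte)
        if kind == "c":
            value = decode_atoms(raw)
        elif kind == "i":
            value = int.from_bytes(raw, "little", signed=True)
        elif kind == "s":
            value = raw.decode("utf-8")
        else:
            value = raw  # 'b' and unknown types are kept as raw bytes
        atoms.append((atom_id, kind, value))
    return atoms

number = int_atom(4, 1234)
text = string_atom(9, "México")
wrapped = container_atom(12, [number, text])
print(wrapped.hex())
print(decode_atoms(wrapped))
```

Decoding is driven entirely by the length field, so a reader can skip atoms whose id or type it does not recognise.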

Main UWAR Header

The data stream is started off with a header section that contains data about the UWAR data stream as a whole, including timing info, global meta-data, and sub-headers describing the individual sensors carried in the full data stream.

Index Length Description
00 - 03 4 The UWAR file id, should always be the four character sequence "UWAR"
04 - 07 4 An integer specifying the data format version (2 for this version of the format).
08 - 15 8 A long timestamp specifying the time when the trace was started, measured in milliseconds since the epoch.
16 - 19 4 An integer specifying the number of meta-data atoms contained in the header. The atoms (if any) immediately follow.
20 - xx variable Custom meta-data written in the atom-encoding scheme mentioned above. This variable-length data region should contain the number of atoms specified by the previous field in sequential form. The region will be empty and have no length if the number of atoms is zero.
[xx+1] - [xx+4] 4 An integer specifying the number of individual sensor stream headers present in the header section. The sensor stream headers follow immediately.


As described above, the header starts off with identifier and timing info for the stream. Next, (optional) meta-data atoms are encoded in the header for any appropriate meta-data for the full stream. Meta-data may include the device id, encoded as a string atom with id 0x64697774 ("twid"). Finally, an integer specifies the number of individual sensors carried in the stream, and a sensor header describing each follows immediately in the data stream, as described below.
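
The main version 2 header can be written and read back as in the following sketch. The helper names `build_v2_header` and `parse_v2_header` are hypothetical, the device id value "device-01" and the start time are invented, the counts are assumed unsigned, and atom values are left as raw bytes rather than interpreted.

```python
import struct

FIXED_PART = struct.Struct("<4sIqI")  # file id, version, start time (ms), atom count
ATOM_HEAD = struct.Struct("<IBI")     # atom id, type byte, value length

def build_v2_header(start_ms: int, atoms: list, stream_count: int) -> bytes:
    """atoms is a list of (id, type character, value bytes) tuples."""
    body = b"".join(ATOM_HEAD.pack(i, ord(t), len(v)) + v for i, t, v in atoms)
    return (FIXED_PART.pack(b"UWAR", 2, start_ms, len(atoms))
            + body + struct.pack("<I", stream_count))

def parse_v2_header(buf: bytes):
    """Return (header dict, offset of the first sensor stream header)."""
    magic, version, start_ms, atom_count = FIXED_PART.unpack_from(buf, 0)
    if magic != b"UWAR" or version != 2:
        raise ValueError("not a version 2 UWAR header")
    offset = FIXED_PART.size  # atoms begin at index 20
    atoms = []
    for _ in range(atom_count):
        atom_id, type_byte, length = ATOM_HEAD.unpack_from(buf, offset)
        offset += ATOM_HEAD.size
        value = buf[offset:offset + length]
        if len(value) != length:
            raise ValueError("truncated atom value")
        atoms.append((atom_id, chr(type_byte), value))
        offset += length
    (stream_count,) = struct.unpack_from("<I", buf, offset)
    offset += 4
    return {"start_ms": start_ms, "atoms": atoms, "stream_count": stream_count}, offset

# Invented example: one "twid" device id atom and three sensor streams.
raw = build_v2_header(1171304640000, [(0x64697774, "s", b"device-01")], 3)
parsed, end = parse_v2_header(raw)
print(parsed, end)
```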

Individual Sensor Stream Headers

The main UWAR header has specified the number of individual sensor stream headers. Immediately following are the headers themselves. The following documents a single sensor stream header.

Index Length Description
00 - 03 4 The sensor stream id: a unique identifier assigned for each sensor stream type.
04 - 07 4 An integer specifying the version of the sensor stream.
08 - 09 2 A two-byte integer specifying the fixed length of data packets for this sensor stream. If the sensor has variable length packets, this value should be zero (0x00 00). Fixed-length packets allow for a more compact packet format, as described below (note that, unlike in version 1.0, the fixed length covers only the raw packet data and does not include the packet timestamp).
10 - 11 2 The sensor stream symbol. This is a unique identifier assigned to each sensor stream that will be used to match subsequent packets to the appropriate stream. These identifiers are arbitrarily assigned by the trace writer and will only be applicable for the given trace, not across traces.
12 - 15 4 An integer specifying the number of meta-data atoms contained in the sensor stream header. The atoms (if any) immediately follow.
16 - xx variable Custom meta-data written in the atom-encoding scheme mentioned above. This variable-length data region should contain the number of atoms specified by the previous field in sequential form. The region will be empty and have no length if the number of atoms is zero.

Sensor Packet Data

The header section is now complete, and we are ready for the actual sensor stream data. Data points are broken up into packets. As mentioned previously, packets can be of either fixed length or variable length, as determined by the sensor stream header. The main difference is that variable length packets have an additional length field, while fixed length packets do not. Packets are simply read until EOF is reached. There are no guarantees about in what order the packets will be written to the stream. A variable length packet is structured as follows:


Index Length Description
00 - 01 2 The sensor stream symbol of the packet (see previous section). This provides the link between the packet and its sensor stream type.
02 - 05 4 An integer timestamp for the packet, specifying the number of milliseconds that have passed since the start of the sensor trace. By adding this timestamp to the timestamp in the trace header, one can determine the absolute time of the packet.
06 - 09 4 An integer specifying the length of the packet data in bytes.
10 - xx variable The raw packet data. The length of the raw packet data is specified by the previous field.


A fixed-length packet is a little simpler:


Index Length Description
00 - 01 2 The sensor stream symbol of the packet (see previous section). This provides the link between the packet and its sensor stream type.
02 - 05 4 An integer timestamp for the packet, specifying the number of milliseconds that have passed since the start of the sensor trace. By adding this timestamp to the timestamp in the trace header, one can determine the absolute time of the packet.
06 - xx variable The raw packet data. The length of the raw packet data is specified by the fixed length field in the sensor header.
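
A reader for both version 2 packet layouts might look like the sketch below, which also converts each relative timestamp to an absolute one by adding the trace start time from the main header. The names `read_v2_packets` and `fixed_lengths` are hypothetical; the two-byte symbol follows the tables above, fields are assumed unsigned, and the fixed length is taken to be the raw data length, as the fixed-length table states.

```python
import struct

def read_v2_packets(buf: bytes, fixed_lengths: dict, trace_start_ms: int) -> list:
    """Read (symbol, absolute time in ms, data) tuples until the end of the buffer.

    fixed_lengths maps a stream symbol to the fixed length from its sensor
    header, with zero marking a variable-length stream.
    """
    packets = []
    offset = 0
    while offset < len(buf):
        symbol, relative_ms = struct.unpack_from("<HI", buf, offset)
        offset += 6
        length = fixed_lengths[symbol]
        if length == 0:
            # Variable-length stream: the data length follows the timestamp.
            (length,) = struct.unpack_from("<I", buf, offset)
            offset += 4
        data = buf[offset:offset + length]
        if len(data) != length:
            raise ValueError("truncated packet")
        offset += length
        packets.append((symbol, trace_start_ms + relative_ms, data))
    return packets

# Stream 1 carries 2 fixed data bytes; stream 2 is variable-length.
stream = (struct.pack("<HI", 1, 500) + b"ab"
          + struct.pack("<HII", 2, 750, 3) + b"xyz")
packets = read_v2_packets(stream, {1: 2, 2: 0}, trace_start_ms=1_000_000)
print(packets)
```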


Clock Sync Data

  • Sensor ID: 0x6b636c63 ("clck")
  • Sensor Version: 0x00000001
  • Sensor Description: Records the difference in timestamps between two devices
  • Fixed length: 16 bytes


Index Length Description
00 - 03 4 Integer specifying the number of seconds since epoch on remote device
04 - 07 4 Integer specifying the number of micro-seconds (us) since epoch on remote device
08 - 11 4 Integer specifying the number of seconds since epoch on local device
12 - 15 4 Integer specifying the number of micro-seconds (us) since epoch on local device
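
As an illustration, the 16-byte clock sync payload could be turned into a clock offset as sketched below. The function name `clock_offset_us` is hypothetical. The sketch assumes that each microsecond field holds the sub-second part that pairs with the preceding seconds field, and that all four fields are unsigned; neither is confirmed by the description above.

```python
import struct

CLOCK_SYNC = struct.Struct("<IIII")  # remote sec, remote usec, local sec, local usec

def clock_offset_us(payload: bytes) -> int:
    """Return remote time minus local time in microseconds (see assumptions above)."""
    if len(payload) != CLOCK_SYNC.size:
        raise ValueError("clock sync payload must be 16 bytes")
    remote_sec, remote_usec, local_sec, local_usec = CLOCK_SYNC.unpack(payload)
    remote = remote_sec * 1_000_000 + remote_usec
    local = local_sec * 1_000_000 + local_usec
    return remote - local

# Invented readings: the remote clock is half a second ahead of the local one.
payload = struct.pack("<IIII", 1171304640, 250_000, 1171304639, 750_000)
print(clock_offset_us(payload))  # prints: 500000
```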


Numeric Data

Talk to bdferris@cs about writing tables of fixed-timestamp-interval ("numf") and variable-timestamp-interval ("numv") numeric values to a trace (or see the source).

MSB Data

  • Sensor ID: 0x2062736d ("msb ")
  • Sensor Version: 0x00020000
  • Sensor Description: Records raw MSB frames as read from the MSB device. Frames (and packets accordingly) are 78 bytes long
  • Fixed length: 78 bytes

GPS Data

  • Sensor ID: 0x20677073 ("gps ")
  • Sensor Version: 0x00020000
  • Sensor Description: Contains readings from a GPS unit. No current support in the UWAR:Tools:IO package, so see UWAR:Tools:TraceWriterLibraryCE source for more info.

LatLon Data

  • Sensor ID: 0x6e6c746c ("ltln")
  • Sensor Version: 0x00020000
  • Sensor Description: Records latitude, longitude, and altitude location data.

The LatLon sensor encodes a coordinate location with latitude, longitude, and altitude. It is simpler than the GPS sensor in that it encodes no other information about the location trace. The reading is encoded as 24 raw bytes, with eight bytes for each of three values in lat-lon-alt order. Each value is typically a double, and is converted to its 8-byte value by first encoding the double as a long value, as described in the Java API for Double. The long value is then encoded as 8 bytes in Little Endian order.
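
Outside Java, the same 24 bytes can be produced by packing IEEE 754 doubles in little-endian order, since that yields the same bits as converting a double to a long and writing the long in little-endian order (NaN payloads aside). The sketch below uses Python's `struct` module; the helper names and the sample coordinates are made up.

```python
import struct

LATLON = struct.Struct("<ddd")  # latitude, longitude, altitude

def encode_latlon(lat: float, lon: float, alt: float) -> bytes:
    return LATLON.pack(lat, lon, alt)

def decode_latlon(payload: bytes) -> tuple:
    if len(payload) != LATLON.size:
        raise ValueError("LatLon reading must be 24 bytes")
    return LATLON.unpack(payload)

# The double 1.0 viewed as a long is 0x3FF0000000000000, so both routes agree.
bits_of_one = struct.unpack("<q", struct.pack("<d", 1.0))[0]

reading = encode_latlon(47.6532, -122.3074, 30.0)  # made-up coordinates
print(len(reading), decode_latlon(reading))
```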

Wifi Data

  • Sensor ID: 0x69666977 ("wifi")
  • Sensor Version: 0x00020000
  • Sensor Description: Records the signal strength of visible Wifi access points

The Wifi Header meta-data can optionally include the following atoms:

  • Wifi Adapter Name : id="wfan" (0x6e616677) type="s" - specifies the String name of the wifi adapter used to take the given readings. Useful for tracking which adapter produced what readings

Each Wifi packet starts with a four-byte integer specifying the number of access points to follow. Each access point is represented as a ten-byte field, where the first six bytes represent the BSSID of the AP in raw hex form and the remaining four bytes represent the RSSI as an integer. Thus, a reading with 4 access points will have a total length of 44 bytes.
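
The Wifi packet layout can be checked with a short sketch. The function name `parse_wifi_packet` is hypothetical, the RSSI is assumed to be a signed integer (signal strengths in dBm are usually negative), and the BSSID and RSSI values are invented.

```python
import struct

def parse_wifi_packet(payload: bytes) -> list:
    """Return a list of (BSSID string, RSSI) pairs from one Wifi packet."""
    (count,) = struct.unpack_from("<I", payload, 0)
    if len(payload) != 4 + 10 * count:
        raise ValueError("payload length does not match access point count")
    readings = []
    for i in range(count):
        offset = 4 + 10 * i
        bssid = payload[offset:offset + 6]
        (rssi,) = struct.unpack_from("<i", payload, offset + 6)
        readings.append((":".join(f"{b:02x}" for b in bssid), rssi))
    return readings

# Four invented access points give a 4 + 4 * 10 = 44 byte packet.
aps = [(bytes.fromhex("001122334455"), -40 - i) for i in range(4)]
payload = struct.pack("<I", len(aps)) + b"".join(
    bssid + struct.pack("<i", rssi) for bssid, rssi in aps)
readings = parse_wifi_packet(payload)
print(len(payload), readings)
```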

Annotation Data

  • Sensor ID: 0x38746e61 ("ant8")
  • Sensor Version: 0x00020000
  • Sensor Description: Designed to encapsulate arbitrary annotation entries for a sensor trace. Ideal for annotating activities like "started walking" or "phone call" or whatever.
  • Variable length

Each annotation packet starts off with a single 'type' byte:

  • '+' : indicates that an activity started
  • '-' : indicates that an activity stopped
  • '=' : indicates that an activity occurred
  • 'm' : indicates a meta-data annotation
  • 's' : indicates a UTF-8 string annotation

Each of the '+', '-' and '=' annotations is followed by an Annotation Label ID: a four-byte identifier (enough to store an integer) for the annotation. The assignment and semantics of the annotation id are up to the user.

The meta-data annotation is followed by a four-byte integer specifying the number of meta-data atoms, which is then followed by the meta-data atoms themselves.

The string annotation is followed by a four-byte integer specifying the length of the UTF-8 encoded string in bytes, which is then followed by the actual bytes of the string.
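
The annotation variants can be told apart by their type byte, as in the sketch below. The function name `parse_annotation` is hypothetical, the label id is read as an unsigned little-endian integer (an assumption, since its meaning is left to the user), and meta-data atoms are returned undecoded.

```python
import struct

def parse_annotation(payload: bytes) -> tuple:
    """Return (type character, decoded body) for one annotation packet payload."""
    kind = chr(payload[0])
    body = payload[1:]
    if kind in ("+", "-", "="):
        (label_id,) = struct.unpack_from("<I", body, 0)
        return kind, label_id
    if kind == "s":
        (length,) = struct.unpack_from("<I", body, 0)
        raw = body[4:4 + length]
        if len(raw) != length:
            raise ValueError("truncated string annotation")
        return kind, raw.decode("utf-8")
    if kind == "m":
        (atom_count,) = struct.unpack_from("<I", body, 0)
        return kind, (atom_count, body[4:])  # the atoms themselves stay raw here
    raise ValueError(f"unknown annotation type {kind!r}")

started = parse_annotation(b"+" + struct.pack("<I", 7))
note = parse_annotation(b"s" + struct.pack("<I", 7) + b"walking")
print(started, note)
```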

Audio Statistics Data

  • Sensor ID: 0x61756473 ("auds")
  • Sensor Version: 0x00020000