Detecting and Correcting Bad Data


James Bretz

Dec 8, 2009, 4:47:22 PM
to ia...@googlegroups.com
Hello,

I've noticed that there are a large number of people lately concerned with
correcting bad data (spikes, etc). It seems to be a hot topic. Anyway, I
just wanted to write some background info about the subject down to help get
everyone on the same page.

When we see spikes or bad data, sometimes it's hard to pinpoint the source
and correct the problem. There can be quite a few areas in the overall test
"pipeline" that could cause this problem; from sensors to collection,
transmission to receiver, calibration and eu conversion, to actual display
and analysis.

For the purposes of this discussion, let's focus in on the actual transmit
and receive portion of this issue. Generally, this situation comes about due
to the fact that we're connected to the test vehicle remotely. The test
vehicle can potentially be many miles away from the receiving station
(or worse, behind an obstruction), and the communication between the
test vehicle and the control room fails at an alarming rate. When the
communication system fails, you can be sure that bad data will follow and
make its way to your screen/analysis. This often results in TestPoints
being reflown, or in
extreme situations, bad decisions being made that lead to bigger trouble.

At this point in time, it is very difficult to eliminate bad data 100% due
to remote transmission. In fact, it's hard to get even close. I'm sorry to
say that it's just an inherent bug if you will (or more like a lack of
foresight) on the part of the telemetry manufacturers.... but there may be a
way in the near future. The protocol that we've chosen to send/receive data
doesn't lend itself very well to detecting and/or correcting when a problem
does occur.

Here is a very general and simplistic overview of the path of the data
between the aircraft and the ground for purposes of this discussion. I'm
leaving out all the details so we can get right to the problem at hand. If
you want something a little more technically grounded, there is a nice intro
here (http://www.l-3com.com/tw/tutorial/telemetry_system_overview.html).


1) Logically, data flows from the aircraft to the ground PCM system "row by
row". Each row of data is referred to as a "minor frame" or
"sub-frame". "N" number of these minor frames make up a "major frame".

2) Each minor frame (row) is bounded by two "bad data" indicators. Usually,
at the start of the minor frame is a complex bit pattern or "sync pattern".
At the end of the minor frame is an item called a sub-frame ID (which is
just an integer counter: 0 at the first subframe, incrementing by 1 each
subframe).

3) The way the system detects bad data is if one of these markers (sync
pattern or subframe id) is bad. This "status" is reflected in IADS as the
DecomStatus parameters (_IadsDecomStatusN_ and their corresponding system
based parameters).
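
To make the marker scheme in points 2 and 3 concrete, here is a minimal sketch in Python. The sync word is a common IRIG-style pattern, but the frame length and major frame depth are placeholder values, not taken from any particular system:

```python
# Minimal sketch of marker-based minor frame validation. The sync word
# is a typical IRIG-style 32-bit pattern; the major frame depth is an
# invented placeholder.

SYNC_PATTERN = 0xFE6B2840      # 32-bit sync pattern (placeholder value)
FRAMES_PER_MAJOR = 8           # major frame depth (placeholder value)

def frame_status(words, expected_sfid):
    """Return True ("good") if both bad-data markers check out."""
    sync_ok = words[0] == SYNC_PATTERN                        # marker 1: sync
    sfid_ok = words[-1] == expected_sfid % FRAMES_PER_MAJOR   # marker 2: SFID
    return sync_ok and sfid_ok
```

Note that nothing between the two markers is ever examined; that is exactly the gap described in point 4 below.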

Now here's the real problem:

4) Data corruption can occur anywhere within a "minor frame" (subframe) but
it will not be detected by the "bad data indicators". Let's say that the
minor frame has 100 words in it. If the first word (sync) and the last word
(subframe id) are good, then the whole minor frame is considered good. That
means the whole entire payload of the minor frame can be absolute garbage
and still be deemed "good". That scenario is not likely, but what is very
likely is that some percentage of the data within the minor frame is
bad..... and if that happens to be your parameter then you're SOL.
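
A self-contained way to see the problem (all values invented): a 100-word minor frame whose sync word and SFID survive transmission passes the marker check even though every payload word is garbage.

```python
# Illustration of point 4: intact markers, corrupted payload, and the
# frame is still reported "good". All word values are invented.

SYNC = 0xFE6B2840

def markers_ok(frame, sfid):
    return frame[0] == SYNC and frame[-1] == sfid

good = [SYNC] + [1234] * 98 + [5]       # clean frame, SFID = 5
bad  = [SYNC] + [0xFFFF] * 98 + [5]     # payload trashed in transit

print(markers_ok(good, 5))  # True
print(markers_ok(bad, 5))   # True -- the corruption goes undetected
```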

5) In addition to these issues, each TPP (PCM processing system) has its
own inherent quirks. Some systems don't show bad data through DecomStatus
until *after* it has occurred.... while others show it before or during.
There are also other issues of alignment and timing between the data and
these indicators. The bottom line is that the error detection is rather
poor across the board.


Having said all of this, we feel that the solution to all these issues is
for the telemetry manufacturers to incorporate a sort of "CRC" code into
each minor frame (http://en.wikipedia.org/wiki/Cyclic_redundancy_check).
With this CRC information, the TPP or Iads could perform a CRC check on all
the incoming data in each minor frame and thus eliminate almost all the bad
data issues due to transmit/receive. There is some discussion going on right
now on this subject... and it might be possible to achieve this with the
current systems... but that is far from certain as of now.
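
As a rough sketch of what such a check could look like on the ground side, using a standard CRC-32 (the frame layout and trailer position here are assumptions for illustration, not an existing telemetry format):

```python
import zlib

# Hedged sketch: if the airborne side appended a CRC-32 over the minor
# frame payload, the ground side could verify every word, not just the
# two markers. Frame layout is assumed, not an existing standard.

def append_crc(payload: bytes) -> bytes:
    crc = zlib.crc32(payload) & 0xFFFFFFFF
    return payload + crc.to_bytes(4, "big")

def check_crc(frame: bytes) -> bool:
    payload, sent = frame[:-4], int.from_bytes(frame[-4:], "big")
    return zlib.crc32(payload) & 0xFFFFFFFF == sent

frame = append_crc(b"\x01\x02" * 50)     # 100-byte payload
print(check_crc(frame))                  # True

corrupted = b"\xff" + frame[1:]          # single-byte hit anywhere in the
print(check_crc(corrupted))              # payload -- False, it is caught
```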

There are other techniques already available on current systems (parity), but
they are very bandwidth costly and thus are rarely used. Even when activated,
parity doesn't provide an acceptable level of protection in my opinion
(http://en.wikipedia.org/wiki/Parity_bit).
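
A quick illustration of why a single parity bit falls short: it catches any odd number of flipped bits but misses any even number, so a two-bit hit sails through. Values below are just for demonstration.

```python
# Even parity catches an odd number of bit flips but misses an even
# number -- a common failure mode in bursty RF dropouts.

def even_parity(word: int) -> int:
    return bin(word).count("1") % 2

word = 0b10110010
p = even_parity(word)

one_flip = word ^ 0b00000001      # 1 bit flipped in transit
two_flips = word ^ 0b00000011     # 2 bits flipped in transit

print(even_parity(one_flip) != p)   # True  -- detected
print(even_parity(two_flips) != p)  # False -- undetected
```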

Of course, the discussion is open to anyone to come up with some substitute
solution. The aim here is to virtually eliminate bad data due to
transmission errors.

Jim


bkelly

Jan 22, 2010, 10:51:37 PM
to IADS
I neglected to include a few links. As usual, Wiki has numerous
articles on this topic.
http://en.wikipedia.org/wiki/Hamming_code
http://en.wikipedia.org/wiki/Berger_code
http://en.wikipedia.org/wiki/Error_detection_and_correction
http://simple.wikipedia.org/wiki/Reed-Solomon_error_correction
http://en.wikipedia.org/wiki/Viterbi_algorithm

I think that error correction more than makes up for its bandwidth
consumption.
I am not qualified, but there are certainly arguments to be made
regarding the span of error correction schemes. Too small and it
consumes too much bandwidth for the extra bits. Too large and a big
hit loses more data.
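
As one concrete data point on that trade-off, a classic Hamming(7,4) code spends 3 parity bits per 4 data bits yet corrects any single-bit error within each 7-bit block. A rough sketch, not a flight-qualified implementation:

```python
# Hamming(7,4): 3 parity bits protect 4 data bits, correcting any
# single-bit error per block -- at the cost of 75% overhead on the data.

def hamming74_encode(d):                 # d = [d1, d2, d3, d4]
    p1 = d[0] ^ d[1] ^ d[3]
    p2 = d[0] ^ d[2] ^ d[3]
    p3 = d[1] ^ d[2] ^ d[3]
    return [p1, p2, d[0], p3, d[1], d[2], d[3]]

def hamming74_decode(c):                 # c = 7-bit codeword (list)
    s1 = c[0] ^ c[2] ^ c[4] ^ c[6]       # syndrome bits locate the
    s2 = c[1] ^ c[2] ^ c[5] ^ c[6]       # flipped position (1-based),
    s3 = c[3] ^ c[4] ^ c[5] ^ c[6]       # 0 means the block is clean
    pos = s1 + 2 * s2 + 4 * s3
    if pos:
        c[pos - 1] ^= 1                  # correct the single flipped bit
    return [c[2], c[4], c[5], c[6]]

code = hamming74_encode([1, 0, 1, 1])
code[4] ^= 1                             # corrupt one bit in transit
print(hamming74_decode(code))  # [1, 0, 1, 1] -- corrected
```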
Bryan

James Bretz

Jan 23, 2010, 1:06:45 AM
to IADS
Thanks Bryan,

I agree. Attempting to correct the data based on one of the schemes
above can be bandwidth intensive. Just being able to detect the error
reliably would be a step in the right direction. We can't even do that right
now.

A simple CRC (http://en.wikipedia.org/wiki/Cyclic_redundancy_check) with an
IEEE recommended 32 bits for a minor frame would help and not take too much
bandwidth.
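
Back-of-the-envelope, assuming the 100-word minor frame from earlier in the thread with 16-bit words (an assumption, not a spec):

```python
# Rough overhead estimate for a 32-bit CRC on an assumed 100-word,
# 16-bit minor frame.
frame_bits = 100 * 16
crc_bits = 32
print(f"{crc_bits / (frame_bits + crc_bits):.1%}")  # 2.0%
```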

As far as correction, once it is detected properly we can use a simple LGV
(last good value) or other scheme. Most of the problem right now is that you
can't rely on the data given... and garbage data slips through frequently.
This wreaks havoc on the engineers in the control room and piles even more
responsibility on their shoulders (as if it wasn't stressful enough).
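
A last-good-value filter is simple enough to sketch once detection works (the function name and sample format below are illustrative, not an IADS API):

```python
# Last-good-value (LGV) substitution: once a sample is flagged bad (by
# CRC or otherwise), hold the previous good value instead of displaying
# garbage. Sample format is invented for illustration.

def lgv_filter(samples):
    """samples: iterable of (value, is_good) pairs."""
    last_good = None
    out = []
    for value, is_good in samples:
        if is_good:
            last_good = value
        out.append(last_good if last_good is not None else value)
    return out

stream = [(10.0, True), (999.0, False), (10.5, True), (888.0, False)]
print(lgv_filter(stream))  # [10.0, 10.0, 10.5, 10.5]
```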

Maybe someday, someone will step up to the plate and give us a reliable
communication mechanism from the ground up, but until then, at minimum, we
should be able to detect an error with a higher level of reliability than we
currently have (which in my opinion is pretty much nothing).

Jim
