The 2017 VAST Challenge MC2 required the analysis of sensor data that detected airborne pollutants on the periphery of a wildlife reserve. The objective was to identify possible sources of the detected pollutants from the spatio-temporal patterns in sensor readings.
This page provides details of the giCentre's solution to the challenge, which won the award 'Comprehensive Mini-Challenge 2 Answer'.
Team Members: Jo Wood
How many hours spent working on submission: Approximately 10 hours to construct software and perform analysis. A further 10 hours assembling the report and video. Additional software written for other VAST Challenges was also used.
A short paper, Visual Analytic Design for Detecting Airborne Pollution Sources, outlines some of the design challenges for this kind of problem, and the video below shows the software we built to help answer the challenge questions.
MC2.1 Characterize the sensors' performance and operation. Are they all working properly at all times?
Can you detect any unexpected behaviors of the sensors through analyzing the readings they capture?
The nine sensors, each measuring four chemical concentrations, are generally functioning continuously during the three month-long sample periods. Readings are logged at hourly intervals, 24 hours per day. Figure 1 shows the sensor readings (9 sensors; 4 chemical detectors each; 3 month-long periods). Unexpected gaps in the logged readings are symbolised as red discs, with red vertical lines to aid accurate time comparison with other features. In this figure and the others below, the vertical scale measures the square root of chemical concentration in parts per million, so it can show variation among the less extreme values in addition to spikes in chemical concentration.
There are two broad patterns to the apparent absences represented by red discs. The first of these shows certain timestamps where missing data points coincide. Figure 2 shows there are 7 points in time where there is largely an absence of recordings: midnight at the start of 2nd April, 6th April, 2nd August, 4th August, 7th August, 2nd December and 7th December. The only readings at these times (circled in Figure 2) were on the 2nd August for Sensor 3 (AGOC-3A and Methylosmolene) and on the 7th December for Sensor 6 (AGOC-3A), Sensor 7 (AGOC-3A and Appluimonia) and Sensor 8 (AGOC-3A and Methylosmolene).
Several of these exceptions to the missing data are revealing because for some sensors they occur at or around comparatively rare spikes in chemical concentrations. Of note is the peak in Methylosmolene in Sensor 3 that surrounds the reading for the 2nd August missing data timestamp (Figure 3). This and the following observations suggest there may be some association between the points of consistently missing data and possible unusual chemical readings.
Midnight on the 7th December invites particular scrutiny as this occurs around the time of major peaks in Appluimonia (sensor 6, Figure 4), Methylosmolene and AGOC-3A (sensors 7 and 8, Figures 5 and 6).
Missing data at these points may be masking other peaks and require further investigation.
The second pattern observable from Figure 1 is a set of apparently missing readings in all sensors (but especially sensors 4, 5 and 6) in Methylosmolene. In every case other than those noted above, these missing values coincide with double readings attributed to AGOC-3A in the same sensor. This is shown in Figure 7, where duplicate readings are symbolised as green discs with vertical lines depicting the exact timestamps where they occur. In every case these are aligned with (i.e. occur at the same time as) missing Methylosmolene values (red discs on the bottom row of each sensor).
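The check described above (gaps in one chemical lining up with double readings in another) can be sketched in a few lines. This is an illustrative sketch only, not the software used for the submission: the function names `audit_hourly_log` and `misattributed`, and the `(sensor, chemical, timestamp)` record layout, are assumptions made for the example.

```python
from collections import Counter
from datetime import timedelta


def audit_hourly_log(records, start, end):
    """Return (missing, duplicated) sets of (sensor, chemical, timestamp).

    records: iterable of (sensor, chemical, timestamp) tuples, one per
             logged reading (hypothetical layout).
    start, end: inclusive datetime bounds of one sample period.
    """
    counts = Counter(records)
    sensors = sorted({s for s, _, _ in counts})
    chemicals = sorted({c for _, c, _ in counts})
    missing, duplicated = set(), set()
    t = start
    while t <= end:
        for s in sensors:
            for c in chemicals:
                n = counts[(s, c, t)]  # Counter gives 0 for absent keys
                if n == 0:
                    missing.add((s, c, t))
                elif n > 1:
                    duplicated.add((s, c, t))
        t += timedelta(hours=1)  # readings are expected hourly
    return missing, duplicated


def misattributed(missing, duplicated,
                  lost="Methylosmolene", doubled="AGOC-3A"):
    """(sensor, timestamp) pairs lacking `lost` but doubling `doubled`."""
    return {(s, t) for s, c, t in missing
            if c == lost and (s, doubled, t) in duplicated}
```

Any missing Methylosmolene entry that is not returned by `misattributed` belongs to the first pattern (the shared missing timestamps) rather than the second.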
Assuming that these duplicate/missing readings are the correct values but have been misallocated to the wrong chemical type, the following procedure was applied: For each sensor, the mean and standard deviation were calculated over the three month period for each chemical type. Labelling each pair of AGOC-3A duplicates as D1 and D2, there are two possible allocations: either
D1 -> AGOC-3A and D2 -> Methylosmolene
D1 -> Methylosmolene and D2 -> AGOC-3A
The z-scores (number of standard deviations away from the mean) for both possible allocations were calculated and the option with the lowest sum of squared z-scores was automatically selected. In other words, each value was allocated to the distribution that it more typically represented.
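The allocation rule can be written compactly. The following is a minimal sketch under stated assumptions rather than the code used in the analysis: the names `summarise` and `allocate_duplicates` are hypothetical, the sample standard deviation is assumed (the text does not say which estimator was used), and ties are broken in favour of the first allocation.

```python
from statistics import mean, stdev


def summarise(values):
    """(mean, standard deviation) for one sensor and one chemical.

    Assumption: sample standard deviation; needs at least two values.
    """
    return mean(values), stdev(values)


def z_score(value, stats):
    """Number of standard deviations `value` lies from the mean."""
    mu, sigma = stats
    return (value - mu) / sigma


def allocate_duplicates(d1, d2, agoc_stats, methyl_stats):
    """Assign a duplicate pair to (AGOC-3A, Methylosmolene).

    Both allocations are scored by their sum of squared z-scores and
    the lower-scoring one is returned as (agoc_value, methyl_value).
    """
    cost_a = z_score(d1, agoc_stats) ** 2 + z_score(d2, methyl_stats) ** 2
    cost_b = z_score(d2, agoc_stats) ** 2 + z_score(d1, methyl_stats) ** 2
    # Assumption: a tie keeps the first allocation (D1 -> AGOC-3A).
    return (d1, d2) if cost_a <= cost_b else (d2, d1)
```

For example, with made-up statistics `agoc_stats = (1.0, 0.5)` and `methyl_stats = (5.0, 1.0)`, the pair `d1 = 4.8`, `d2 = 1.2` is allocated as `(1.2, 4.8)`, because 4.8 is far more typical of the second distribution.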
Figure 8 shows some examples of this allocation. It can be seen that spikes occur in both AGOC-3A and Methylosmolene for these allocated values, so there remains the possibility that the values themselves are incorrect, not simply allocated to the wrong group.
There are a number of related possible explanations for the patterns seen.
- Some error in the sensor readings for AGOC-3A and Methylosmolene results in data being attributed to the wrong chemical type.
- High concentrations of one or more chemicals could be the cause of the error.
- The error above could result in erroneously high readings.
- There could be some deliberate malicious attempt to hide high readings.
Finally, there is a likely problem with Sensor 4 that shows a consistent increase in chemical concentration readings over time.
There is a small possibility that this trend could be triggered by genuine local environmental change, but given this trend is not detected by other nearby sensors, this is regarded as low probability.
MC2.2 Which chemicals are being detected by the sensor group? What patterns of chemical releases do you see, as being reported in the data?
Figure 10 provides an overview of the trends in chemical detection for all sensors. The CUSUM (Cumulative Sum) chart shows the cumulative z-score over time, which allows trends to be detected more easily than in the equivalent raw sensor readings (shown in Figure 1). 'Normal' behaviour was modelled on the mean and variance for week 1 of all sensors, so if chemical levels beyond that week are consistently above or below normal behaviour, the CUSUM line moves above or below the baseline. This shows that in general levels of all chemicals were higher by December than they were in April (bars thicker and above the baseline towards the right of Figure 10).
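As an illustration of the cumulative z-score idea, the sketch below computes a CUSUM series for a single list of hourly readings. It is not the charting code used for Figure 10: the function name `cusum_z` is hypothetical, the baseline is assumed to be the first 168 hourly readings (one week) of the series being processed, and the population standard deviation is assumed.

```python
from statistics import mean, pstdev


def cusum_z(readings, baseline_n=24 * 7):
    """Cumulative sum of z-scores relative to a baseline window.

    readings: chronologically ordered readings for one sensor and chemical.
    baseline_n: number of leading readings that define 'normal' behaviour
                (assumed here to be one week of hourly values).
    """
    baseline = readings[:baseline_n]
    mu = mean(baseline)
    sigma = pstdev(baseline)  # assumption: population estimator
    if sigma == 0:
        raise ValueError("baseline has no variation; z-scores undefined")
    total, series = 0.0, []
    for r in readings:
        total += (r - mu) / sigma  # above-normal values push the line up
        series.append(total)
    return series
```

A series that stays near its baseline produces a line hovering around zero, whereas a sustained rise in either background level or spike frequency produces a steadily climbing line.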
Discounting the apparent increase in the probably erroneous Sensor 4, some of the largest increases in concentration are for Methylosmolene (sensors 2, 5, 7, 8 and 9), Chlorodinine (sensors 1, 5 and 9) and Appluimonia (sensor 5). Sensor 5 shows a general increase in three of the four chemicals, but the fact that AGOC-3A does not appear to increase suggests it does not have the same recording problem exhibited by Sensor 4. However, given the spatial proximity of sensors 4 and 5, both sensors should be checked in order to rule in or out the possibility of serious local contamination.
Figures 11-14 show in more detail the concentration levels of all four chemicals as reported by the set of 9 sensors for the three month periods.
All four chemicals show a typical, largely random, noise component with occasional 'spikes' of much higher concentrations, typically 5-20 standard deviations from background levels. The most extreme spikes occur for AGOC-3A and Methylosmolene (noise showing smaller variation when scaled by maximum peak value in Figures 11 and 14). Appluimonia shows the least spiky distribution (Figure 12).
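One simple way to pick out such spikes is to flag readings whose z-score exceeds a chosen threshold. The sketch below is illustrative only: the function name `find_spikes` and the default threshold of 5 are assumptions, and the background is estimated from the mean and population standard deviation of the whole series, which the spikes themselves will inflate somewhat.

```python
from statistics import mean, pstdev


def find_spikes(readings, threshold=5.0):
    """Indices of readings at least `threshold` standard deviations
    above the series mean (a crude estimate of background level)."""
    mu = mean(readings)
    sigma = pstdev(readings)
    if sigma == 0:
        return []  # a flat series has no spikes
    return [i for i, r in enumerate(readings)
            if (r - mu) / sigma >= threshold]
```

Lowering `threshold` admits more, smaller events; raising it keeps only the most extreme peaks.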
Figures 11-14 also show the anomalous behaviour of Sensor 4 for all chemicals, suggesting at least a large part of the trend in apparently increasing concentrations is erroneous. The fact that Sensor 5 does not show a similar pattern lends further support to the observation above that levels of Appluimonia, Chlorodinine and Methylosmolene in that area are genuinely increasing over time rather than being the product of sensor malfunction.
In addition to Sensor 5, Sensor 9 saw an increase in levels for all chemicals from the end of August (23-29th) and through December. Sensors 5 and 9 are geographically proximal and are the sensors that are closest to the interior of the park. The increase isn't immediately obvious from the raw concentration charts (Figures 11-14), but is revealed by the CUSUM charts (Sensor 9 shown in Figure 15). Up until the 23rd of August, detected levels are reasonably stable, but beyond that period we observe a trend of increasing concentrations due to a combination of increased spike frequency and higher general background levels. This is most strongly evident in Chlorodinine but is also present in the other three chemicals.
MC2.3 Which factories are responsible for which chemical releases? Carefully describe how you determined this using all the data you have available. For the factories you identified, describe any observed patterns of operation revealed in the data.
Spatial analysis of the sensor readings suggests the factories primarily responsible for high concentration chemical releases are as follows:
Kasios Office Furniture: AGOC-3A; Methylosmolene.
Indigo Sol Boards: Appluimonia.
Roadrunner Fitness Electronics: Chlorodinine.
Radiance ColourTek: No detectable environmental pollution
Spatial analysis and visualization were performed largely with a zoomable map view showing the positions of the sensors and factories, along with a timeline view showing a summary of high concentration detection events (see Figure 16). The map view was constructed as part of an integrated spatial analysis and visualization used for all three VAST Mini-challenges.
To detect the spread of airborne pollutants, selected peaks in concentration were displayed on the map view as probability cones based on the measured wind direction and strength at the time of the event. The probability cone shows the most likely source of the detected chemical by tracing a vector back in the opposite direction of the wind, expanding the likely region with distance away from sensor. The threshold that defines a detection event visualized in this way can be changed interactively. An example of events detected at 2pm on August 21st is shown in Figure 17.
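The geometric core of the probability cone can be sketched as a test of whether a candidate source lies within a fixed angle of the upwind direction from a sensor. This is a simplified illustration, not the map software described above. It assumes planar coordinates with x increasing east and y increasing north, a wind direction given as the compass bearing the wind blows from, and a hypothetical fixed half-angle. It ignores wind strength, which the original cones also took into account.

```python
import math


def bearing_deg(from_xy, to_xy):
    """Compass bearing (0 = north, clockwise) from one point to another."""
    dx = to_xy[0] - from_xy[0]  # eastward offset
    dy = to_xy[1] - from_xy[1]  # northward offset
    return math.degrees(math.atan2(dx, dy)) % 360.0


def angular_gap(a, b):
    """Smallest absolute difference between two bearings, in degrees."""
    return abs((a - b + 180.0) % 360.0 - 180.0)


def in_source_cone(sensor_xy, candidate_xy, wind_from_deg,
                   half_angle_deg=15.0):
    """True if the candidate lies inside the upwind cone of the sensor.

    A fixed half-angle makes the cone widen with distance from the
    sensor, mirroring the expanding region of likely sources.
    """
    gap = angular_gap(bearing_deg(sensor_xy, candidate_xy), wind_from_deg)
    return gap <= half_angle_deg
```

With a sensor at `(0, 0)` and a candidate source due west at `(-10, 0)` (made-up coordinates), `in_source_cone` returns `True` for a westerly wind (`wind_from_deg=270`) and `False` for a north-westerly one (`wind_from_deg=315`). Counting how many detection events place each factory inside the cone gives a rough numerical counterpart to the composite map.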
By combining probability cones for all chemical concentration peaks, a composite picture is created (Figure 18) showing a spatial structure to the events when considered by chemical type. This composite suggests Indigo Sol Boards is the likely source for (orange) Appluimonia, Roadrunner for (green) Chlorodinine and possibly Kasios for (pink) Methylosmolene and (blue) AGOC-3A. However, more convincing evidence is provided by filtering by chemical type.
Examining just extreme AGOC-3A detection events (Figure 19), we see the most likely origins are in the region of the Roadrunner Fitness and Kasios Office factories. However, Sensor 6 suggests Roadrunner is an unlikely source, given that very few detection events occurred with the prevailing NW wind (which would have carried the chemical from Roadrunner if it had been the source). In contrast, the AGOC-3A detected by the sensor is carried almost exclusively by westerly winds. With Kasios being almost due west of Sensor 6, this is the most likely origin.
Similar reasoning can be applied to Methylosmolene, originating from the same source (Figure 20).
Once wind direction is taken into account, the distribution of Appluimonia events can be seen as spatially distinct and uniquely focussed around Indigo Sol Boards (Figure 21). Note also that while Sensor 9 provides the primary positive evidence, Sensor 5 also supports this. As noted above, both of these sensors showed an increase in detection levels over time, suggesting the emissions from Indigo Sol Boards have increased since late August.
Finally, evidence for the origin of the Chlorodinine emissions is provided in Figure 22. Sensor 6 is again particularly discriminating, ruling out Kasios as a source and instead providing compelling evidence for Roadrunner, which is located NW of the sensor.