The String Diaries Epub Download

Vida Hubbert

Jun 10, 2024, 6:55:14 PM
to vipaphosfern

Global positioning systems (GPS) are increasingly being used in health research to determine the location of study participants. Combining GPS data with data collected via travel/activity diaries allows researchers to assess where people travel in conjunction with data about trip purpose and accompaniment. However, linking GPS and diary data is problematic, and to date the only method has been to match the two datasets manually, which is time-consuming and unlikely to be practical for larger datasets. This paper assesses the feasibility of a new sequence alignment method of linking GPS and travel diary data in comparison with the manual matching method.


Download ::: https://t.co/3QkR1IqXS8



GPS and travel diary data obtained from a study of children's independent mobility were linked using sequence alignment algorithms to test the proof of concept. Travel diaries were assessed for quality by counting the number of errors and inconsistencies in each participant's set of diaries. The success of the sequence alignment method was compared for higher versus lower quality travel diaries, and for accompanied versus unaccompanied trips. Time taken and percentage of trips matched were compared for the sequence alignment method and the manual method.

The sequence alignment method matched 61.9% of all trips. Higher quality travel diaries were associated with higher match rates in both the sequence alignment and manual matching methods. The sequence alignment method performed almost as well as the manual method and was an order of magnitude faster. However, the sequence alignment method was less successful at fully matching trips and at matching unaccompanied trips.

Global positioning systems (GPS) are increasingly used in health research to determine the location of study participants. Despite the utility of GPS in providing objective information about where participants travel and at what time, it does not reveal the purpose or meaning of their movement: what they were doing, who they were with, or why they were in that location. Therefore, in most health-related studies GPS data is combined with other datasets that provide additional information about the participants, their behaviour and their immediate environment. For example, GPS data is commonly augmented with Geographic Information Systems (GIS) data to identify and characterise locations visited. GPS data can also be linked with data collected using mobile monitors such as accelerometers and air pollution monitors to provide spatio-temporal information about participants and the environment. Data from these objective measures are easily linked using spatial coordinates and timestamps from internal clocks (e.g., [1]).

GPS data can also be combined with information obtained via self-report, which is useful for collecting information such as participant perceptions, trip purpose, and accompaniment (i.e. who is with the participant). However, a problem with combining GPS and self-report data is the difficulty in linking the two datasets. Diaries (travel or activity) are perhaps the most straightforward self-report dataset to link with GPS data because they are a timed sequential list of trips or activities undertaken, which means that it is theoretically possible to link the datasets using timestamps. However, being self-reported, diary data is subject to recall biases, and diary times are unlikely to match the GPS times precisely, both because participants do not recall times accurately and because the internal GPS clock differs from the watches/clocks used by participants. The likely time mismatches mean that it is difficult to use timestamps to automatically link GPS data with diary data, which is a problem for researchers wanting to combine these two datasets.

As far as we can deduce, manual matching works well, but is vulnerable to operator error and subjectivity. Manual matching is also labour intensive, which can make it prohibitively expensive for large studies.

To address the problem of linking GPS and diary data in a large study population and to avoid the issue of inaccurately entered travel diary times, we developed a partially automated method of linking the datasets using sequence alignment algorithms. Sequence alignment is based on the principle of comparing sequences of strings [6]. It was developed in the 1980s for use in the natural sciences to analyse deoxyribonucleic acid (DNA) sequences. Since then its use has expanded to other fields including transportation and urban planning [7, 8], sociology [9, 10], the analysis of sketch maps [11], and tourist behaviour [12]. Sequence alignment methods are computational and, through automation of the process, have the potential to reduce the time taken to link GPS and travel diary data. In addition, automating the linking of GPS and diary data (i.e., writing code) provides the added benefit of documenting the process and making it objective, repeatable, and replicable.

This paper assesses the feasibility of a new method of linking GPS and travel diary data in comparison to the only existing alternative method (i.e. manual matching). This is done in the context of a study of children's independent mobility (IM). We begin with a description of the sequence alignment method. Next, the feasibility and utility of this method are assessed by comparing the time taken and trip match rates of the manual and sequence alignment methods of linking GPS and travel diary data. The effect of higher and lower quality travel diary data on the match rate is also assessed. Finally, the results and the practicalities of using sequence alignment to link GPS and travel diary data are discussed.

We used a subsample of data from Kids in the City, a study of IM in children aged 9-11 years in six neighbourhoods in Auckland, New Zealand [13]. The subsample used in this series of analyses comprises seven-day GPS and travel diary data for 40 children from two schools (School A, School B).

The schools selected were located in neighbourhoods with differing socio-economic status and walkability characteristics (Table 1). Neighbourhood socio-economic status was determined by the school decile rating, a measure of the socio-economic position of households in a school's catchment area derived from New Zealand Census data. Neighbourhood walkability was determined by calculating a walkability index comprising dwelling density, street connectivity, land use mix and retail floor area ratio [14], and this index was applied to the neighbourhoods in which the two schools were located. We purposely selected schools with varying quality travel diary data (as reported by the research assistants responsible for data collection) because we were interested in how travel diary quality might affect the success of the sequence alignment method. Twenty children from each school were randomly selected from the total number of participants at each school.

Data collection occurred between March and June 2011. Children wore QStarz BT-Q1000 or BT-Q1000XT GPS units (Qstarz International Inc., Taiwan) on a belt for seven consecutive days. The units were configured to log data every 10 s. Children completed travel diaries for the same period. These were collected and checked with the children every school day during data collection. Further details on data collection are available in Oliver et al. [13].

The quality of the travel diaries was assessed by counting the number of errors and inconsistencies evident in each child's set of diaries. Indicators of low quality were: missing data, data that had been obviously adjusted or entered by a researcher, and inconsistencies such as a child travelling to a park but not recording a trip back home. The indicators of low quality were summed, resulting in a diary quality score for each child, with higher scores representing lower quality diaries.

The travel diary quality scores ranged from 0 to 25. The 17 children with travel diary quality scores greater than 4 were assigned a quality category of 'lower', while the 23 children with quality scores less than or equal to 4 were assigned a quality category of 'higher'.
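The scoring and categorisation rule above can be sketched in a few lines of Python. This is only an illustration: the `DiaryIssues` class and its field names are hypothetical, and the only value taken from the text is the cut-off of 4.

```python
from dataclasses import dataclass

# Cut-off taken from the text: scores greater than 4 are 'lower' quality.
QUALITY_THRESHOLD = 4


@dataclass
class DiaryIssues:
    """Hypothetical per-child counts of the low-quality indicators."""
    missing_data: int = 0
    researcher_adjusted: int = 0
    inconsistencies: int = 0


def quality_score(issues: DiaryIssues) -> int:
    """Sum the indicators; a higher score means a lower quality diary."""
    return (issues.missing_data
            + issues.researcher_adjusted
            + issues.inconsistencies)


def quality_category(score: int) -> str:
    """Assign 'lower' for scores above the cut-off, otherwise 'higher'."""
    return "lower" if score > QUALITY_THRESHOLD else "higher"
```

For example, a child with two missing entries, one researcher-adjusted entry and one inconsistency scores 4 and falls in the 'higher' category, while one more issue of any kind would move that child to 'lower'.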

The sequence alignment method described below matched GPS and travel diary datasets according to the sequence of trips. Trips were matched based on their origin and destination locations and on the order of trips in a day as established by the travel diary. For example, in Figure 1 there are two 'Home - Friends' trips identified in the GPS dataset for a nominated child, yet only one in the travel diary dataset. Using the sequence alignment method the second GPS 'Home - Friends' trip would be matched with the travel diary 'Home - Friends' trip because it occurred after the 'Home - School' trips. By relying on the relative order of the trips it is not necessary to use the times entered in the travel diaries.
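To make the idea of order-based matching concrete, here is a minimal sketch using a textbook global alignment (Needleman-Wunsch) over trip labels. It is not the authors' script: the function name `align_trips`, the scoring values, and the example day are all assumptions chosen for illustration, and Figure 1 itself is not reproduced here.

```python
def align_trips(gps, diary, match=2, mismatch=-1, gap=-1):
    """Globally align two trip sequences and return matched index pairs.

    Each trip is a label such as 'Home - School'. The scoring values are
    illustrative assumptions, not taken from the paper.
    """
    n, m = len(gps), len(diary)
    # score[i][j] = best score aligning gps[:i] with diary[:j]
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            sub = match if gps[i - 1] == diary[j - 1] else mismatch
            score[i][j] = max(score[i - 1][j - 1] + sub,
                              score[i - 1][j] + gap,
                              score[i][j - 1] + gap)

    # Trace back from the bottom-right cell, keeping only identical trips.
    pairs = []
    i, j = n, m
    while i > 0 and j > 0:
        sub = match if gps[i - 1] == diary[j - 1] else mismatch
        if score[i][j] == score[i - 1][j - 1] + sub:
            if gps[i - 1] == diary[j - 1]:
                pairs.append((i - 1, j - 1))
            i, j = i - 1, j - 1
        elif score[i][j] == score[i - 1][j] + gap:
            i -= 1  # GPS trip with no diary counterpart
        else:
            j -= 1  # diary trip with no GPS counterpart
    pairs.reverse()
    return pairs


# Hypothetical day: two GPS 'Home - Friends' trips, one in the diary.
gps_trips = ["Home - Friends", "Friends - Home",
             "Home - School", "School - Home", "Home - Friends"]
diary_trips = ["Home - School", "School - Home", "Home - Friends"]
matched = align_trips(gps_trips, diary_trips)
```

In this made-up day the diary 'Home - Friends' trip is paired with the second GPS 'Home - Friends' trip (index 4), because that is the one that follows the school trips. No timestamps are consulted.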

The method also matches trip chains with two links. For example, the GPS trips from 'School - Shops' and 'Shops - Home' are matched with the travel diary trip 'School - Home'. The method also distinguishes between full and partial matches. A full match is where both the origin and destination locations match. A partial match is where only one of the origin or destination locations matches.
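The chain and match-type rules can be written down as a small sketch. The `Trip` type and the helper names `collapse_chain` and `classify_match` are hypothetical, and this is one reading of the definitions above rather than the authors' code.

```python
from typing import NamedTuple, Optional


class Trip(NamedTuple):
    origin: str
    destination: str


def collapse_chain(first: Trip, second: Trip) -> Optional[Trip]:
    """Merge two consecutive GPS trips into one if they link up.

    'School - Shops' followed by 'Shops - Home' becomes 'School - Home'.
    Returns None when the trips do not form a two-link chain.
    """
    if first.destination == second.origin:
        return Trip(first.origin, second.destination)
    return None


def classify_match(gps: Trip, diary: Trip) -> str:
    """Return 'full', 'partial', or 'none' for a GPS/diary trip pair."""
    same_origin = gps.origin == diary.origin
    same_destination = gps.destination == diary.destination
    if same_origin and same_destination:
        return "full"
    if same_origin or same_destination:
        return "partial"
    return "none"


chained = collapse_chain(Trip("School", "Shops"), Trip("Shops", "Home"))
```

With these definitions, the chained GPS trip is a full match for the diary trip 'School - Home', whereas a GPS trip 'Home - Park' would only be a partial match for a diary trip 'Home - Friends'.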

GIS software, ArcGIS v.9.3 (ESRI, Redlands), and open source statistical computing software, R (R Foundation for Statistical Computing, Vienna, Austria), were used to implement the sequence alignment method. Custom scripts were written in the Python [15] and R scripting languages. These scripts are available from the lead author upon request.

The implementation process is illustrated in Figure 2. In the absence of established protocols to clean GPS data, we cleaned the data by removing data points with speeds greater than 160 km/h, horizontal dilution of precision (HDOP) values greater than five, and data points with fewer than four visible satellites. HDOP is a factor in determining the horizontal accuracy of the GPS data and relates to how the GPS satellites are positioned in the sky [16].
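The three cleaning rules translate directly into a filter. The sketch below uses the thresholds stated in the text; the `GpsPoint` class and its field names are hypothetical and do not reflect the Qstarz export format or the authors' scripts.

```python
from dataclasses import dataclass

# Thresholds as stated in the text.
MAX_SPEED_KMH = 160.0
MAX_HDOP = 5.0
MIN_SATELLITES = 4


@dataclass
class GpsPoint:
    """Hypothetical record for one logged GPS fix."""
    speed_kmh: float
    hdop: float
    satellites: int


def is_valid(point: GpsPoint) -> bool:
    """Keep a point only if it passes all three cleaning rules."""
    return (point.speed_kmh <= MAX_SPEED_KMH
            and point.hdop <= MAX_HDOP
            and point.satellites >= MIN_SATELLITES)


def clean(points):
    """Return the points that survive cleaning, in their original order."""
    return [p for p in points if is_valid(p)]
```

A point exactly at a threshold (160 km/h, HDOP of 5, or four satellites) is kept, since the text removes only values beyond those limits.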
