The KoBo Experience: or how to avoid Dark Wizards when conducting research in Africa

Neil Hendrick

Nov 28, 2010, 7:10:50 AM
to ict4chw

~Overture or The Introduction~
Normally, I know this list focuses on Ehealth/Mhealth issues, so I hope my contribution here is not too great a departure. I will introduce myself, and tell you something about the research institute I work for, and our research. We focus on human rights, but you will be happy to hear that there is a medical, or at least a mental health element, and that our research from the Central African Republic on exposure to violence and mental health was recently published in the Journal of the American Medical Association. (JAMA, August 4, 2010—Vol 304, No. 5)

Let me introduce myself. I'm Neil Hendrick, and I work for the University of California, Berkeley Human Rights Center. I'm a mobile technology specialist, which is an utterly made up title, but which fits the bill. I develop digital data collection tools and methodology and I apply these tools in the field as a researcher. I get to see both sides of the equation, developing software on one hand, and working with it in real world conditions. 

The Berkeley Human Rights Center does research in post-conflict areas to study attitudes to peace and transitional justice. I am not a principal investigator, we have a couple PhDs who do the big brain thinking, statistical analysis, and reports. The reports are all available at the HRC website, hrc.berkeley.edu, and I urge you to take a look, if only to see a highly polished output from data collection, and some fine examples of drawing conclusions from raw statistics. 

A side note: I am currently in Zorzor, Lofa County, Liberia. I'm in the midst of a very large population survey, 27 days into data collection with two weeks to go. I can discuss with you a little bit about our methodology and the realities of working in the field with technology.

~Sampling~
Zorzor, Voinjama, Kakata, and several hundred other locations in Liberia are the stage on which this drama is set. For those of you who work in research, you may know how sampling methodology works. For us to make a report that explains the situation in Liberia, we have to talk to a lot of people all over the country. You can't talk to everyone (if you could, that would be called a Census) so you just talk to a sample. There is a bunch of math that goes into determining how many people you have to talk to for your sample to be "representative" of the population. Long story short, for this study, that number is about 4200. We want to tell the story, not just of the one-third of the country that lives in the capital city, but also of the rural population, and people from very remote areas, whose opinions and experiences might be very different from people in Monrovia.
To that end, we use data from the government about the population. How many people there are and where they live. We can't know how right this data is, but we can use it for sampling. It's the best data available, so it's what we have to go with. The main ingredient in sampling is randomness, it is the yeast in the brew, the jam in the donut -- the goal is that every person in the country has an equal chance of being selected as a respondent in the survey. Their selection must be both random and systematic. We don't have a list of every person and where they live (there is no postal delivery here), so we have to select based on what the census calls "enumeration areas" or "EAs". Each EA is a geographic area that contains 80-120 persons. In the city, there are many and they are close together, several per block. In the rural areas, several small villages have to be lumped together to form an EA. All the EAs in the country (about 7000) are put in a Bingo hopper and spun wildly by a blind/deaf/mute research assistant until about 250 EAs fall out of the chute. I am kidding about the Bingo hopper, a computer does it (a blind/deaf/mute computer), randomly selecting 250 of these points. They are clustered by county (again, I am glossing over a lot of math). The point is, that I don't make any decisions about where to go and who to talk to, the computer gives me a list and I go to those places. 
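For readers who want to see the "Bingo hopper" in code, here is a minimal sketch of the selection step. It assumes the sampling frame is a plain list of EA identifiers per county and that the 250 EAs are allocated to counties in proportion to how many EAs each county has. That allocation rule, the function name, and the made-up frame below are all assumptions for illustration; the actual study design involves more math than this.

```python
import random


def select_enumeration_areas(eas_by_county, total_to_select, seed=None):
    """Randomly pick EAs, giving each county a share proportional to its EA count.

    Hypothetical helper: proportional allocation is an assumption, not the
    documented design of the survey. Because of rounding, the grand total
    can differ slightly from total_to_select.
    """
    rng = random.Random(seed)  # a fixed seed makes the draw reproducible
    total_eas = sum(len(eas) for eas in eas_by_county.values())
    selected = {}
    for county, eas in eas_by_county.items():
        quota = round(total_to_select * len(eas) / total_eas)
        # Sample without replacement so no EA is picked twice.
        selected[county] = rng.sample(eas, min(quota, len(eas)))
    return selected


# Made-up frame of 7000 EAs split across two counties (the counts are invented).
frame = {
    "Lofa": [f"lofa-{i}" for i in range(3000)],
    "Montserrado": [f"mont-{i}" for i in range(4000)],
}
picked = select_enumeration_areas(frame, 250, seed=1)
print({county: len(eas) for county, eas in picked.items()})
```

The important property is the one described above: nobody on the team chooses where to go, the random draw does.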
Some of these places are walking distance from my apartment in Monrovia. Most of them are far flung settlements in the forests and jungles of Liberia. If the computer tells me to go to Lukamai in Lofa County in the North of Liberia, on the border of Guinea, if it is several days drive, if you have to abandon the car 6km out and cross a bamboo footbridge then hike through the jungle to get there, then that is what I do. The computer has spoken, the people of Lukamai will be heard, and unless great barriers are put before me (oceans of flaming thistles or occupation by rebel troops hostile to science, for example) then I will do everything in my power to get there. In fact, I was in Lukamai on Monday and it was very nice. The elders gave us a traditional welcoming gift of Kola nuts. 
Whatever EA is chosen by the Great and Terrible Gods of Randomness, I do not go there alone. I can't interview 4200 people by myself. There are 5 teams in different parts of Liberia, each team has 8 people on it, equally split by gender so that we can interview the exact same number of men and women. (There is actually an extra person on each team in case someone gets sick or some other small tragedy befalls the team). 44 surveyors doing 4 interviews each every day means we take 176 surveys each day. When we arrive in Lukamai (or any EA) the enumerators have a method for selecting a person to speak to. Remember, the person must be selected in a way that is Random and Systematic. The EA is physically divided into Zones. If there are eight enumerators, we divide it into 4 zones. You do this by drawing a little sketch map of the place and arbitrarily dividing it. A gender matched pair of enumerators (a man and a woman) then proceed to the approximate center of the zone. There, each one of them spins a pen on the ground. Where the pen points, there they walk. The enumerator walks from the center of the zone to the edge of the zone, counting domiciles. The number of houses is divided by the number of interviews to determine the Sampling Interval. If there are 12 houses in a line between the center and edge of the zone, and the enumerator has to do 4 interviews, then the Sampling Interval is Three. (12 / 4 = 3). So, the enumerator goes back through and selects every third house (House, House, SAMPLE, House, House, SAMPLE, House, House, SAMPLE, House, House, SAMPLE). At each Sampled house, the enumerator knocks on the door and asks how many adult same-gender persons are living there. If I am a male enumerator and I find that there are three adult males there, I then randomly select one of those males by asking their first names and choosing the first in the alphabet.
There is no reason why the letter your name starts with should have any effect on your opinions, unless you can make a case for men named Alfred having significantly differing opinions from men named Zackary. There are some more rules for handling empty houses and people who refuse to participate. Of course, you need a person to be there, and you need consent, but I will elide those details in the interest of time. Now you have put an enumerator face to face with a respondent and you can start your survey.  I am only going into such detail so you can see the tremendous lengths that you have to go to to randomly and systematically collect data in a population survey. 
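The house-counting arithmetic and the first-name rule are simple enough to sketch in a few lines. The function names are made up for illustration, and the use of integer division when the house count does not divide evenly is an assumption; the rules for empty houses and refusals mentioned above are not modeled.

```python
def sampling_interval(houses_on_line, interviews_needed):
    """Houses counted along the walk, divided by the interviews to do."""
    if interviews_needed <= 0:
        raise ValueError("interviews_needed must be positive")
    # Integer division is an assumption for counts that do not divide evenly.
    return max(1, houses_on_line // interviews_needed)


def sampled_houses(houses_on_line, interviews_needed):
    """Return the 1-based positions of the houses to sample along the walk."""
    k = sampling_interval(houses_on_line, interviews_needed)
    return list(range(k, houses_on_line + 1, k))[:interviews_needed]


def pick_respondent(first_names):
    """Choose the eligible adult whose first name comes first alphabetically."""
    return min(first_names, key=str.casefold)


print(sampling_interval(12, 4))   # 3
print(sampled_houses(12, 4))      # [3, 6, 9, 12]
print(pick_respondent(["Zackary", "Moses", "Alfred"]))  # Alfred
```

With 12 houses and 4 interviews this reproduces the House, House, SAMPLE pattern: houses 3, 6, 9 and 12.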

~The Survey~
The survey instrument is a series of questions. The survey here in Liberia has about 300 questions, but there is some skip logic so that every respondent does not answer every question. If you ask "Have you ever heard of the Special Court for Sierra Leone?" and the woman says "No" then you can skip further questions about the Court.
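To make the idea of skip logic concrete, here is a small sketch in Python. It is only an illustration of the concept: the question ids, the follow-up question, and the `administer` function are invented, and the real survey expresses this logic in the form definition rather than in Python.

```python
# Each question may carry a "relevant" rule that looks at earlier answers.
QUESTIONS = [
    {"id": "heard_of_court",
     "text": "Have you ever heard of the Special Court for Sierra Leone?"},
    {"id": "court_opinion",  # hypothetical follow-up, not from the real survey
     "text": "What is your opinion of the Court?",
     "relevant": lambda answers: answers.get("heard_of_court") == "yes"},
    {"id": "age", "text": "How old are you?"},
]


def administer(questions, ask):
    """Ask each question in order, skipping those whose rule says not relevant.

    `ask` is any callable that takes a question dict and returns the answer.
    """
    answers = {}
    for question in questions:
        relevant = question.get("relevant")
        if relevant is not None and not relevant(answers):
            continue  # skipped: the enumerator never sees this question
        answers[question["id"]] = ask(question)
    return answers


# Scripted respondent who has never heard of the Court.
scripted = {"heard_of_court": "no", "age": "34"}
result = administer(QUESTIONS, lambda q: scripted[q["id"]])
print(sorted(result))  # ['age', 'heard_of_court']
```

The follow-up about the Court never appears for this respondent, so there is nothing for the enumerator to get wrong.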
The survey does include a wide variety of questions on areas including socio-economics, health, access to police and the courts, traditional justice, drinking water, education, personal history with the war, exposure to violence, depression, and more. It's a monster thing that takes a solid hour to administer. It tries the patience of both enumerators and respondents, but all the questions are important, and the construction of the survey is labored over for months by greater minds than mine. 
It is designed with analysis in mind. The idea is to be able to draw conclusions from the dataset about subsets in the population. We don't just want to know how many people were exposed to violence, we want to know about the relationship between exposure to violence and mental health (See article in JAMA for exhaustive detail). We want to be able to draw conclusions about economic hardship and attitudes toward peace and justice. 
From my personal point of view, I am handed the Sample (a list of enumeration areas) and the completed list of questions (the survey instrument) and my work begins. I do my work, and then I return the final data set. The Big Brains go about the analysis and produce a report, drawing conclusions from the data and even making recommendations. The report is used by NGOs, aid organizations, and the government to understand realities on the ground, to set policy, and to allocate resources. In the next section, I will describe the process of going from a set of questions to a completed dataset. 

~Finally, the Technology Section~
For those of you who have been waiting for me to say the names of software and pieces of hardware, your moment has finally arrived. This is my area of expertise, and it's ironic that my undergraduate degree in anthropology and experience traveling the developing world didn't do me a bit of good until I had expertise gleaned in the IT industry that made me useful as a researcher. I was asked in 2007 if I could figure out a way to move from paper based surveys to digital data collection using smart-phones. I was in the middle of doing a Masters degree in International Development at Tulane when I was sent to Uganda to do the first pilot digital data collection project. It was supposed to be three months in Acoliland, but in fact, I never came home. I never got back to Tulane or New Orleans, I haven't yet finished my degree. I did work in The Hague for the International Criminal Court developing databases, and I came on staff at Berkeley to build on the original pilot project for digital data collection. 
At this point, we have a well polished system that completely bypasses paper. We collect more data faster and at higher quality than anyone could imagine five years ago. The system is based on several open-source software packages, off the shelf hardware, and some custom software that addresses the specific needs of working in remote and harsh environments. I've got years of developing and tweaking and adding things on to this system ahead of me, but it's a mature product already that can be put into practice by any researcher. We are evangelical about helping other academics, NGOs, or anyone get started.

~Hardware~
If you want to collect data digitally, you could use laptops, tablets, smart-phones, semi-smart phones, maybe even something like iTouches or a handheld gaming device. Even pretty dumb phones can run Java these days, so there is the option to collect data with all kinds of things. I settled on Android phones for a couple reasons. First, the name is cool. I always wanted an android, and now I have dozens of the things. Androids are all the same, that is to say that even though I use HTC MyTouch phones, I don't have to make any changes to my software or system to add some Nexus Ones to my fleet. When Android tablets come out, I will incorporate them with the same ease. There are cheaper and cheaper Androids, while iPhones are more and more expensive. If a voice inside your head keeps saying "These are not the droids you're looking for" then move along, there is another model that will better fit your needs. 
For field surveys, the highly portable form factor of a phone is perfect. The enumerator can use one hand to control the survey, reading questions to the respondent from the screen and recording the responses with a finger while using the other hand to gesture. It's easier to maintain eye contact, and it is surprisingly intimate when compared with working on paper. 
As I said, our PDAs are all HTC MyTouch phones (with a few Wildfires thrown in for supervisors) but we don't use the phone functions at all. They are all in airplane mode at all times and synchronization is done using usb cables. For all you tech wizards out there, this seems crippling. Please recall the location of Lukamai, 16 hours drive from the capital, 6 km hike uphill through the jungle to the edge of the country. Do you think they have internet service out there? Connectivity out here is intermittent. It's easier to aggregate the data into a single file and send it for analysis in a daily data dump, or as often as the supervisors can get some GPRS. 
I can't say enough about the reliability and utility of Android phones. For a developer, they are a holy grail, a gizmo that you can hack right down to the bones, but one that is fully functional off the shelf. For my purposes, I have a custom ROM that I install on every Android with all my software and settings just the way I like it. When you have to do the same thing for 50 PDAs, that kind of flexibility is a lifesaver.
I have a ton of other hardware to keep my data synchronized, my PDAs charged, and my data backed up to the cloud. Just briefly, recharging is a pain in a country where there is not a single power plant. All power in Liberia comes from generators and virtually nothing has power 24 hours a day. To compensate, and to take advantage of the tremendous amount of driving I do, I have a 150 watt DC inverter that plugs into the cigarette lighter in my Land Cruiser. I plug a 13 port USB into that, and then I have USB cables for every PDA, so when data collection is done for the day, we get in the truck and I plug the PDAs into the Hub, the Hub into the Inverter, the Inverter into the Cigarette lighter.  By the time we get back to whatever town is the current base of operations, the PDAs are charged and ready to go for tomorrow. 
I have power backups as well. When the car charging doesn't work (I killed the alternator driving into a mud hole the other day and submerging the whole engine) I have Solio chargers. These are solar batteries that can charge in the sun or from USB. Really useful. Also, if there is current in the guest house, I can plug in my hub through my A/C stepdown converter and voltage regulator. Never, ever plug your thousands of dollars worth of gear into filthy generator electricity without a voltage regulator. Finally, I have a 12 volt car battery that I can connect the inverter to and charge from. I can charge 16 PDAs three times over with some power to spare. Eventually, you need a battery charger and some A/C power to top it off.
Every team is equipped just like this. Even so, it takes constant vigilance to make sure your enumerators have nice green batteries when they start a survey. If you doubt any of this, please refer to the above description of the efforts it takes to get an enumerator face to face with a properly selected respondent. 
Besides the other electronic hardware you need, like cameras and laptops and back-up laptops and so on, you also need shovels and machetes and power strips and umbrellas and notebooks and shark repellent. All this is true except for the shark repellent, which I just threw in there for whimsy. The point is, half the job is acting as quartermaster and gear-head.

~Software~
Digital Data Collection is not super complex technologically. There are a lot of more sophisticated things that a PDA can do, but it is important that the application be very user friendly, easy to learn, and extremely reliable. If your sample size is 4200, and you have a software glitch that appears only 1% of the time, you just threw away 42 surveys. I refer you again to the above description of what it takes to get a single survey into your dataset. 
Our data collection software is called KoBo, and it is built on the Open Data Kit. ODK is open source, which means that I can take the source code, modify it as I like, and use it in whatever way is best for my needs. ODK is great software, and the community of developers behind it (some of whom are surely on this list) are top notch. Instead of being alone with my Use Cases and my IDE, I have a whole crew of people who want to help me through my problems because my problems solved are their problems avoided. My software is branded as KoBo (an Acoli word that means "transfer of knowledge") but I return to the ODK code base with every upgrade and improvement. 
The Survey, as I said, is handed to me as a list of questions. It's actually an Excel spreadsheet. I code these questions into XML using the rules for XForms. This involves a lot of effort and some code savvy, but we are developing a Forms Building Engine that gives users a graphical interface to make it easy. (Keep your ears to the ground, I'll be looking for volunteer alpha testers in the next couple of months.) The XML survey is loaded onto the SDcard of the Androids. It's easy, you can make changes and update the thing in a flash.
Once loaded onto the Android, the enumerator opens KoBo and selects the survey. It then presents the questions in the survey one at a time to the enumerator, who asks the respondent and then records the answer. Data can be single select, multi-select, media like images or audio, even GPS. Also, good old fashioned text box input where you type the answer on the virtual keyboard. Rule of thumb: wherever possible, predict all possible answers and present a list to the enumerator to select from. Answers can be constrained so that if you are looking for a number, the system will not let you type any letters, only numbers. You can constrain the values as greater than or less than, like if you want to make sure your respondents are adults, the age must be at least 18; the enumerator will see an error message otherwise and be unable to continue.
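As an illustration of what an answer constraint does, here is a minimal Python sketch of the age check. The function name and the error messages are invented, and it assumes "adult" means 18 or older; in the real system the constraint lives in the form definition, not in code like this.

```python
def validate_age(raw, minimum=18):
    """Return (ok, message) for a typed age answer.

    Hypothetical helper: digits only, and the value must be at least
    `minimum` (18 is an assumed threshold for adulthood).
    """
    if not raw.isdecimal():
        # Rejects letters, signs, spaces and the empty string.
        return False, "Please enter a number using digits only."
    if int(raw) < minimum:
        return False, f"Respondent must be at least {minimum} years old."
    return True, ""


for typed in ["34", "17", "thirty"]:
    print(typed, validate_age(typed))
```

Until the check passes, the enumerator stays on the same question, which is what keeps bad values out of the dataset in the first place.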
Skip logic is built in so that you can control the flow of questions. In paper based surveys, this is one of the most common areas where enumerators make mistakes. KoBo takes those decisions away from the enumerator and presents the next relevant question based on previous answers.   Skip logic, automatic progression, and data constraints go a long way towards getting data that is clean and uniform. There are other safeguards, such as time-stamping the beginning and end of the survey and recording the GPS location of each interview. Knowing where and when each survey occurs allows for error checking based on duration of surveys and time between surveys. GPS stamping allows for error checking based on proper methodology (you can see if enumerators are following the proper methodology for selecting respondents). GPS locations also add a mapping element to data presentation that is guesswork with paper-based surveys. Seeing results on a map is so much more dramatic than a table of numbers, you can never forget that there is an end-user, a consumer who will read your report. Be kind to that reader, give them some pretty maps to look at. 
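The timestamp checks can be sketched roughly as follows. The record layout, the function name, and both thresholds are assumptions made for illustration (the survey takes about an hour, so anything far shorter deserves a second look); this is not the code used in the field.

```python
from datetime import datetime, timedelta


def flag_suspect_interviews(records,
                            min_duration=timedelta(minutes=30),
                            min_gap=timedelta(minutes=5)):
    """Flag interviews that were too short or started too soon after the last.

    `records` is a list of dicts with "enumerator", "start" and "end" keys
    (datetimes). Returns a list of (record_index, reason) tuples.
    Both thresholds are assumed values, not the study's real cut-offs.
    """
    flags = []
    last_end = {}  # enumerator -> end time of their previous interview
    ordered = sorted(enumerate(records), key=lambda pair: pair[1]["start"])
    for index, rec in ordered:
        if rec["end"] - rec["start"] < min_duration:
            flags.append((index, "too short"))
        previous = last_end.get(rec["enumerator"])
        if previous is not None and rec["start"] - previous < min_gap:
            flags.append((index, "too soon after previous interview"))
        last_end[rec["enumerator"]] = rec["end"]
    return flags


day = datetime(2010, 11, 22)
records = [
    {"enumerator": "A", "start": day.replace(hour=9), "end": day.replace(hour=10)},
    {"enumerator": "A", "start": day.replace(hour=10, minute=1),
     "end": day.replace(hour=10, minute=10)},
]
print(flag_suspect_interviews(records))
```

A similar pass over the GPS stamps (for example, the distance between consecutive interviews) would be the natural next check, but it is left out here to keep the sketch short.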

The Data is collected as XML files on the SDcard of the Androids. It can't be analyzed until it can be looked at in a database, a spreadsheet, or a statistical package like SPSS or SAS. To this end, we developed the KoBo Post Processor, a Java program that pulls the output from the SDcards, stores it safely on your hard-drive as a back-up, and then aggregates the data into a CSV output. This kind of data can be imported into any sort of analytical program.
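To give a feel for the aggregation step, here is a small Python sketch that turns a folder of XML files into one CSV. It is not the KoBo Post Processor (which is written in Java) and the function name is made up; it also assumes each file is flat, with one child element per answer and no XML namespaces or nested groups, which real form output may not satisfy.

```python
import csv
import xml.etree.ElementTree as ET
from pathlib import Path


def aggregate_to_csv(xml_dir, csv_path):
    """Combine flat survey XML files into a single CSV; return the row count.

    Assumes every answer is a direct child of the root element. Columns are
    the union of all element names seen, in first-seen order.
    """
    rows, columns = [], []
    for path in sorted(Path(xml_dir).glob("*.xml")):
        root = ET.parse(path).getroot()
        row = {"source_file": path.name}  # keep track of where each row came from
        for child in root:
            row[child.tag] = (child.text or "").strip()
        for key in row:
            if key not in columns:
                columns.append(key)
        rows.append(row)
    with open(csv_path, "w", newline="", encoding="utf-8") as handle:
        # restval fills in blanks for questions skipped in a given survey
        writer = csv.DictWriter(handle, fieldnames=columns, restval="")
        writer.writeheader()
        writer.writerows(rows)
    return len(rows)
```

Calling `aggregate_to_csv("surveys/", "dataset.csv")` (both paths are placeholders) would produce one row per interview, with skipped questions left empty.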

With KoBo and the Post Processor, I can collect data quickly and accurately, and can look at it daily. We catch errors and fix them before they spoil the whole survey, we isolate enumerators who are failing to follow the methodology and either retrain them or replace them. 
~The KoBo Experience~

The experience for the enumerator is an easy to use data collection system. The questions are easy to read and the answers are easy to record. There is no forgetting your place or skipping questions accidentally. There are no worries about legibility. Data entry is synchronous with data collection, saving on time, money, and errors.  
The experience for the respondent is a conversational style of interview, with more eye contact and less watching someone hunched over a clipboard writing who-knows-what. You would think the PDA would be a distraction, but experience in the field suggests otherwise. 
The experience of the field researcher, or my experience anyway, is something that you will not find in the workaday world. I spend weeks in a truck with enumerators, always natives of some distant land. The hours are brutal, the conditions inhumane. I work 16-hour days every day; a Sunday off just means that I don't have to drive around. I still have to spend the day working on the data, running the receipts, buying supplies, training people, or fixing the truck. I am an accountant, a cartographer, a nurse, a kindergarten teacher. I break up fights between enumerators and instigate them in equal measures. You see the sun rise, you see it set. Watch the jungle or the forest or the scrub or the grasslands or swamp or desert flash by the window. There's a waterfall nearby? Well, you don't have time to see it. There's a slum? That, you have time to see, in fact, you will spend all day there. Sunburnt, insect-bitten, sleepless, I survive on power bars and doxycycline. Locked in a weird symbiotic relationship with dozens of enumerators, this isn't a job, it's a murder-suicide pact. Sometimes, after a hard day, they will sing gospel music while we drive. We often start the day with a prayer, which is always begun, then halted while they yell at me to take off my hat, then begun again. They know every tiny town in the country, they have ways of making things happen that appear to be guided by invisible hands. After midnight, arriving in some remote town, and before the trucks have even shut down the engines people are walking up to us with food ready for the whole team; potato greens, pepper soup, rice, and fu-fu. Everyone gathers around and eats out of the same bowls. We are guided to a house where everyone sleeps. No one pays for anything. I don't even ask how it is made to happen.
There is nothing like it, field research. These parts of the world are fading away as development paves the roads and shortens the distances and pulls everyone everywhere closer together. Before I go, I will give you one last example of life in the field, of the things that are fading away, the things our children will never see. 

My intrepid team of enumerators and I went to a place called Yelekor Village. It's only 10 km out of town, but then there is no road in to it, so we had to hike 2.5 km in to the actual village.
I spoke to the Elder of the village, getting permission to be there before setting to work. At this point, I usually do an interview with the Chief or Elders about the community in general. Usually, this does a lot of good diplomatically by getting the local leaders involved without having them interfere with the enumerators.
While I was there waiting, a gaggle of elders came from surrounding places, Yelekor being the center of a network of a dozen little places. The elders started in to a big talky-talk. The dialect there is impenetrable,  so I was relying on one of my enumerators, Cynthia, to handle translation. 
For a while, there was a lot of yelling. This is pretty normal, people here yell a lot, so I didn't worry too much. In fact, Cynthia turned out to be a very able diplomat. However, one elder was vexed about our presence there; he thought we should have announced our presence to him personally by letter in advance. Keep in mind that he lives in a place that isn't on the map and doesn't have an address. Also, there is no mail delivery and most people can't read. He told us that he would not bring us Kola Nuts, which is local code for "you are not welcome here." He stormed off, and the other Elders were apologetic and welcoming after that. They were very positive about our presence there. I texted the other enumerators at that time that there was some tension with the traditionals, and they should be extra nice to anyone who came to talk to them.
Unfortunately, the angry elder was also the Poro Master of the Country Devil and the Clan Zoe. That is to say, he is the leader of the local secret society "Country Devil" and he is the witch doctor for that group of villages, which is called a "Zoe". He went and found my enumerators, who had spread out into the neighboring villages. Those of us in Yelekor were under the protection of the Elders in Yelekor, but the others were not. The Zoe found them and threatened them with a killing from the Country Devil, from which they would never be found, even if we used a satellite. I am quoting. The Zoe was quite clear, he repeated the threat twice. Especially, the part about the satellites. 
Two village tough boys escorted the enumerators back to the  place I was waiting. We left with an escort, one of the good guys from the friendly village, though he abandoned us partway to the road. I never saw everyone walk so fast. The guys who had been personally threatened by the Zoe would not even stop to urinate, when they got to the road, they crossed to the other side of the road to pee. We loaded up and left.
Not a great day for science, thrown out of town by a wizard, my sample truncated by a secret society. These are the lengths you have to go to to get data that is not close to the surface. Tough, but in the end, we will have the best dataset on the Liberian people ever collected. Also, you should hear the stories from the teams working in the SouthEast of the country, they make my troubles sound like a vacation. You can bring all kinds of technology, but it always boils down to people. I guess that's the moral of the story here. Having taken up so much of your time, I'll thank you and invite any questions you may have about KoBo, the Human Rights Center, or what to do in case of magical attacks.

Your man in Monrovia,

Neil Hendrick

David Isaak

Nov 28, 2010, 9:14:30 AM
to ict...@googlegroups.com
We all spent a great deal of time reading...this has to be the best piece I have read in a great while....Informative and hilarious. We need more of this style of info delivery. Was this a copy-and-paste of the JAMA article??

Thanks Neil!


Ime Asangansi

Nov 29, 2010, 12:35:14 AM
to ict...@googlegroups.com
Wow! Great article and story!
Thanks for this Neil!
You have a blog or some pic gallery? :)

Ime

neal lesh

Dec 1, 2010, 9:38:15 AM
to ict...@googlegroups.com

Hi Neil,

 

Thanks so much for this post!  It’s as insightful as it is entertaining, rather glorious really.  Your final assessment that “You can bring all kinds of technology, but it always boils down to people.” applies to most of our work, I’m sure, and it’s really helpful to see such a grounded perspective on it.  A few questions, about both the technology and the people, but as usual, no need to answer them all:

 

Would you want functionality in KoBo to help with the selection of houses, rather than have the enumerators spin the pencil?  I’m thinking the user would tell the system that they are now in the center of the zone and then it would tell them where to walk, and then the user would indicate when they came to a house, and then the system would indicate whether they should do an interview or not.  Perhaps the pencil and counting method is just fine, though.

 

Is it rare for somebody to decline the survey?  Do you have a sense of what the factors are?

 

One question I get asked a lot is how long a survey you can do on a phone.  The presumption is that it would be easier to do very long surveys on paper.  Am I right that that is not your experience?  In general, is there anything you miss from the paper surveys or things you think paper was better at?

 

Is there some functionality on your wish list?  I think there are techies out there willing to work on futuristic stuff if they know what would be useful.

 

We are trying to assess how useful the data we collect from CommCare is for ongoing health surveillance, and I wonder if you might have some guidance for us, which I’m sure would apply to many ict4chw systems.  I know this isn’t what you’re doing but I am hoping there is some overlap.  Our data is collected by community health workers during monthly visits to their clients.  For example, the community health workers ask if anybody in the household has a cough, fever, etc.  The primary purpose is to deliver care, but a useful side effect would be to use the data for household-level health surveillance.  Our general question is how accurate does the data have to be to support health surveillance?  A related specific question is if you have a sense of how much agreement you’d get if different enumerators interviewed the same person (this might be called inter-rater reliability).

 

I was very interested in your description of how you check the data daily to catch errors and identify enumerators who are not following the survey methodology.  What tools do you use to analyze the data? Do you think all the team supervisors do that pretty well and consistently and/or is this something that could be automated?

 

To add to Jon’s point, there seem to be a lot of efforts to create better form authoring tools.   I’d guess it’s no small task to try to coordinate all of them, but perhaps we could at least survey them here, if there’s interest in members of the ict4chw list.  

 

Thanks again, Neil!

 

-n
