Data Entry and Cleaning

I got back from White River to find Tom, who had spent the week (willingly, I will add!) with the data enterers trying to get all the information from the surveys into Excel spreadsheets. There were a couple of errors on our part and many more on the part of the data collectors and enterers, and together they stretched the process across that week and the better part of the next week and a half before it was done to our satisfaction.

First, we had told the data collectors to interview the woman (the wife) in each household. We did this partly because we wanted to track gender, but more importantly because women tend to know more about prices and what the items the household needs to buy actually cost. The wife also tends to have a good sense of how much money is coming in, though clearly not a perfect one, since a couple may hide things from each other.

Needless to say, the data collectors did NOT interview women. I discovered this after noticing that the respondents' ages and education levels seemed inconsistent with the wife being the one answering. I believed, probably accurately, that the wife would usually be younger than the husband, yet the recorded respondent was consistently the older one. Unfortunately, this does not mean they only interviewed the husbands either, because 1) wives are sometimes older and 2) we were sometimes seeing younger respondents. In short, we were going to have a hard time disaggregating along gender lines. We were mad at ourselves for not including a clear question ("What is your gender?") and for not asking the enumerators whether interviewing the wives would be difficult... maybe saying it for the 101st time would have helped? Either way, the data is still useful, but we can't disaggregate it by gender.
That problem, plus some formatting errors and data enterer errors, meant that Tom and I spent the next week going through batches of 25-40 surveys (as many as I could view on my computer screen at one time), looking for mistakes and inconsistencies and then going back to the paper surveys to check. Many were data collector mistakes, many were data enterer mistakes, and some were our own formatting mistakes that confused everyone. We DO have the data clean now and in a STATA file, and we will begin the analysis next week. Once we have population figures for the areas, we can weight the data from the different sites appropriately and the patterns will mean more. We'll both be gone from Mozambique by the time we deliver our final results to TNS, but luckily that work is easily done from afar.
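For readers curious what weighting by site population involves, here is a minimal sketch. It is not our actual analysis (that lives in STATA), and the site names, populations, and income values are hypothetical placeholders. It assumes the simplest design: each respondent's weight is their site's population divided by the number of surveys collected at that site, so a large site with few surveys still counts for its share of people.

```python
from collections import Counter


def population_weights(sites, site_population):
    """Return one weight per respondent: site population / number of surveys from that site.

    sites: list giving the site of each respondent (hypothetical data).
    site_population: dict mapping site -> population (hypothetical figures).
    """
    counts = Counter(sites)
    return [site_population[s] / counts[s] for s in sites]


def weighted_mean(values, weights):
    """Population-weighted average of a survey variable."""
    total_weight = sum(weights)
    return sum(v * w for v, w in zip(values, weights)) / total_weight


# Hypothetical example: two sites with very different populations.
sites = ["Site A", "Site A", "Site B"]
site_population = {"Site A": 1000, "Site B": 9000}
monthly_income = [100.0, 200.0, 50.0]

weights = population_weights(sites, site_population)
print(weights)  # [500.0, 500.0, 9000.0]
print(weighted_mean(monthly_income, weights))  # 60.0
```

The plain average of these three made-up surveys would be about 116.7, while the weighted average is 60.0, pulled toward the larger site's value. That shift is why the population figures matter before reading much into the patterns.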

