By Amanda Nero
Sandhya Kambhampati, most recently a Knight-Mozilla fellow working in Berlin, Germany, helped CAR Conference attendees turn mediocre spreadsheets into ready-to-use data for analysis.
Kambhampati took attendees through the dos and dont’s of formatting data in Excel. The session focused on the top errors she and her colleagues have run across in newsrooms all over the world.
She stressed four things to avoid while structuring your data:
- Vague and nonsensical header rows and file names: It’s hard to perform key functions on rows you name “thing_1” and “thing_2” once you’ve walked away from a spreadsheet and completely forgotten what those indeterminate labels mean. You’ll save yourself time and frustration if you label your header rows something clear and concise the first time around. The same goes for the filename itself.
- Colors as delineations: Using lime green and hot pink cells to define different sections of your data in Excel is untidy, and colors are often less sensible than data dictionaries or flags when it comes to data organization.
- One mega-sheet: This one is simple. Don’t put multiple tables all over one spreadsheet. Branch out and use multiple sheets within the same file.
- Don’t assume stuff: Don’t assume the next person looking at your data file will get what you meant when you used colloquial abbreviations or left rows blank. Be consistent and precise when filling out tables. You, or the next person to look at it, will be grateful.
Four tricks for organization:
- Curate your data dictionary carefully: Avoid the sticky notes all over your desk method. You will lose them. Instead, create a file, either within or separate from your data, for your notes. Include information like data source, explanations of the content within the file, and functions and queries to analyze the data.
- Watch your spacing cautiously: Spaces between words can make data hard to parse or run functions on down the line. Kambhampati suggests stringing multiple words together with underscores or dashes.
- The power of the unique ID: Unique IDs can make joining tables much easier. They’re also useful when data has field names that aren’t necessarily spelled or formatted the same way, but share the same unique ID. If the data doesn’t have an ID already, you can create your own.
- Use flags: Kambhampati suggested using a series of flags (she regularly uses 1’s and 0’s) to delineate one thing in a data set from another, like which rows have been checked and which haven’t. This is the better alternative to using colors to organize.
She said data structuring and organization is a lot of trial and error. Instinctual decisions on how to parse, sort and analyze comes with time, but Kambhampati’s simple set of rules is a good place to start.