Behind the Story: Sweeping FOIAs, document-mining reveal problems with Norway kindergartens
By John Bones, Verdens Gang
It started like an ordinary news story last October. One of our reporters, Frank Haugsbo, made Freedom of Information Act (FOIA) requests to the five biggest cities in Norway for their kindergarten inspection reports. While reading them, he saw a pattern of legal violations.
This gave VG the idea to investigate the whole country. Inspections are carried out by the individual municipalities, so Frank Haugsbo had to make 429 separate FOIA requests, asking for the reports from 2010, 2011 and 2012. One municipality answered in four hours; another took four months.
From text to numbers
When Haugsbo started to receive the reports, some as PDF files, some as Word files and some on paper, he saw that he had to find a way to handle the text data, which we later counted at 31,000 pages. At the beginning of December, two colleagues with data skills were asked to join the project. We had to answer four main questions:
- What kind of database are we going to build?
- Which is the easiest way of reading the documents?
- How are we going to classify the documents?
- Is there a way to convert text to quantitative data?
Several techniques and programs were discussed. Our first idea was to combine ABBYY FineReader, Copernic Desktop Search, Adobe Pro and Google Refine in one way or another. We ran several tests, and after a few days we concluded that this combination would not work the way we wanted.
Then, inspired by the ProPublica project "Free the Files", which uses DocumentCloud, we came up with a solution based on similar techniques.
However, the kindergarten documents posed a few additional difficulties. Each municipality had its own way of conducting inspections, and they used different methods of reporting when and how the various laws were broken. In some reports it was difficult for us to determine whether the inspector had found a violation of law or had merely left a remark.
After some discussion, we came up with the idea of tagging each document with keywords, the same way blog posts are tagged. We also created ten standard tags to classify the violations of law. We tested this technique by reading and tagging 50 documents and made some minor changes. Then the hard work could begin.
We started in the middle of January, with five reporters reading all the reports from 4,484 kindergartens.
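The tagging idea can be illustrated with a minimal sketch. It assumes each report is recorded with a small fixed set of standard tags plus free keywords; the tag names, field names and ID format below are hypothetical, since the article does not list the real ones.

```python
from dataclasses import dataclass, field

# Hypothetical standard tags; the article does not name the real ten.
STANDARD_TAGS = {"security", "hygiene", "staffing"}


@dataclass
class TaggedReport:
    """One inspection report with its tags (field names are assumptions)."""
    kindergarten_id: str
    municipality: str
    year: int
    standard_tags: set = field(default_factory=set)
    free_tags: set = field(default_factory=set)

    def add_tag(self, tag: str) -> None:
        """Normalize case and whitespace, then file the tag as standard or free."""
        cleaned = tag.strip().lower()
        if not cleaned:
            return
        if cleaned in STANDARD_TAGS:
            self.standard_tags.add(cleaned)
        else:
            self.free_tags.add(cleaned)


# Invented example values, for illustration only.
report = TaggedReport("kg-0001", "Oslo", 2012)
for tag in ["Hygiene", " broken fence ", "staffing"]:
    report.add_tag(tag)
```

Keeping standard tags apart from free keywords makes it possible to count the standard violations directly, while the free keywords remain available for later grouping.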
VG journalists created this system, inspired by ProPublica's "Free the Files", to turn their documents into data.
We filled out the forms manually, sitting in the same room so that we could ask each other whenever something was unclear. The close contact between us, and the continuous small talk, helped us pinpoint the most important cases later on.
It took us almost four weeks to read all the documents, and then, in the middle of February, we began to structure and analyze the data. All tags, together with the kindergarten ID, municipality, county and ownership, were stored in a MySQL database. We developed the kindergarten ID by joining two files: a list of all Norwegian kindergartens, which we got from the government, and information from a central business register. This let us geolocate all the kindergartens and build an interactive map for when we were ready to publish the reports.
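Joining two files to build one ID per kindergarten could look roughly like the sketch below. It assumes both files share an organization number column; that key, the column names, the sample rows and the `kg-` ID format are all invented for illustration and are not taken from VG's actual data.

```python
import csv
import io

# Invented sample rows; column names and the shared "org_number" key are assumptions.
GOVERNMENT_LIST = """org_number,name,municipality
100,Solsikken barnehage,Oslo
200,Trollskogen barnehage,Bergen
300,Nordlys barnehage,Tromso
"""
BUSINESS_REGISTER = """org_number,address,ownership
100,Storgata 1,private
200,Fjellveien 5,municipal
"""


def read_rows(text):
    """Parse CSV text into a list of dicts, one per row."""
    return list(csv.DictReader(io.StringIO(text)))


def join_files(gov_rows, register_rows):
    """Attach register details to each listed kindergarten; keep non-matches apart."""
    register = {row["org_number"]: row for row in register_rows}
    matched, unmatched = [], []
    for row in gov_rows:
        extra = register.get(row["org_number"])
        if extra is None:
            unmatched.append(row)
        else:
            kg_id = f"kg-{row['org_number']}"  # hypothetical ID format
            matched.append({**row, **extra, "kindergarten_id": kg_id})
    return matched, unmatched


matched, unmatched = join_files(read_rows(GOVERNMENT_LIST), read_rows(BUSINESS_REGISTER))
```

Rows that find no partner in the register are returned separately, so they can be checked by hand rather than silently dropped.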
With five reporters reading about 4,500 documents from around 400 municipalities, the tagging inevitably varied. In the end we had more than 2,000 different tags, which we manually consolidated into seven main files.
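Turning many free tags into a handful of countable categories can be sketched as follows. The mapping table stands in for the manual consolidation; the tags, category names and sample reports are hypothetical, and the share is computed per kindergarten, not per report.

```python
# Hypothetical hand-maintained mapping; neither tags nor categories are from the article.
TAG_TO_CATEGORY = {
    "broken fence": "security",
    "unlocked gate": "security",
    "dirty kitchen": "hygiene",
    "too few adults": "staffing",
}


def categories_per_kindergarten(tagged_reports):
    """Collect the main categories found for each kindergarten across its reports."""
    by_kindergarten = {}
    unmapped = set()
    for kg_id, tags in tagged_reports:
        found = by_kindergarten.setdefault(kg_id, set())
        for tag in tags:
            key = tag.strip().lower()
            if key in TAG_TO_CATEGORY:
                found.add(TAG_TO_CATEGORY[key])
            else:
                unmapped.add(key)  # left for a human to classify
    return by_kindergarten, unmapped


def share_with_category(by_kindergarten, category):
    """Fraction of kindergartens with at least one tag in the given category."""
    if not by_kindergarten:
        return 0.0
    hits = sum(1 for cats in by_kindergarten.values() if category in cats)
    return hits / len(by_kindergarten)


# Invented sample data: (kindergarten ID, tags from one report).
reports = [
    ("kg-100", ["Broken fence", "dirty kitchen"]),
    ("kg-100", ["unlocked gate"]),
    ("kg-200", []),
    ("kg-300", ["too few adults", "note about parking"]),
    ("kg-400", ["Dirty kitchen"]),
]
by_kindergarten, unmapped = categories_per_kindergarten(reports)
hygiene_share = share_with_category(by_kindergarten, "hygiene")
```

Tags that are missing from the mapping are collected in `unmapped` instead of being guessed at, which keeps the human decision about each new tag explicit.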
When this was done, we saw the main news in the material:
- In total, there were violations of law in about half of the kindergartens.
- Safety hazards in one out of six.
- Critical hygiene violations in one out of five.
- Too few adults in one out of ten.
- One in ten reports was prepared by the kindergarten itself.
- 55 municipalities did not have any inspections at all.
- No difference between private kindergartens and kindergartens owned by the municipality.
We also found that about half of the kindergartens had not been visited by the inspection authorities.
We published the stories in the paper edition (VG), the web edition (VG Nett) and the mobile edition (VG Mobile), as well as on our web-TV channel, VGTV.
We experimented with different front pages for the paper, and instead of making a classic news front, we made this one. The text means: "Mom and dad think I am safe in the kindergarten, but is it true?"
The inside stories were classic tabloid. The title reads "1 of 2 kindergartens were breaking the law", and the picture shows a young girl who died in the kindergarten. Altogether we made 32 print pages, in addition to the Web and web-TV publications.
There is some uncertainty in the material we have published:
- Nobody knows exactly how many kindergartens there are in Norway. When we checked the latest file we received from the government, in February, we found some kindergartens that no longer exist.
- We also know that some kindergartens are not on the list.
- As the inspections are done in quite different ways, we do not know exactly how many times there have been violations of law.
- The conversion from text to numbers was done manually, and wherever human beings are involved, mistakes will be made.
That is why we have not made tables and graphs comparing the different parts of the country. We are confident in the main results, but we think it is risky to publish results for each municipality.
However, and this is one of the major innovations in this project, we made a searchable database containing all inspection reports from the kindergartens. Readers could search the database, and in the first 11 days following publication, the word "Kindergarten" was the most-used search term. To date, we have had more than one million database searches. (Norway has five million inhabitants.)
One of the main goals for this project was to present all our reports in a database where our readers could find their own kindergarten and check the conditions for themselves. We also wanted to create a small community around every kindergarten, where employees and parents could discuss and share their experiences.
The main view for VG's custom database of kindergarten inspection reports.
When presenting 6,000 kindergartens at once, we had to make the database user-friendly and easy for our readers to navigate. To find your kindergarten, you could either use our autocomplete search field or browse every kindergarten in your municipality. The municipalities were listed county by county and shown on a separate map. In the right column we also provided links to articles explaining our method and why we decided to publish the reports.
This is the listing of every kindergarten in the municipality. We also geotagged every kindergarten and displayed them on a map powered by Leaflet and CloudMade.
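An autocomplete lookup over a few thousand names can be approximated with a sorted list and a binary search for the typed prefix. This is only a sketch of the general technique, not a description of how VG's search field was built; the kindergarten names are invented.

```python
import bisect


def build_index(names):
    """Sort names by their lowercase form so that shared prefixes sit together."""
    return sorted((name.lower(), name) for name in names)


def autocomplete(index, prefix, limit=5):
    """Return up to `limit` names that start with the typed prefix."""
    key = prefix.strip().lower()
    if not key:
        return []
    # (key,) sorts before every (lowered, original) pair whose first item is >= key.
    start = bisect.bisect_left(index, (key,))
    results = []
    for lowered, original in index[start:]:
        if not lowered.startswith(key):
            break
        results.append(original)
        if len(results) == limit:
            break
    return results


# Invented names, for illustration only.
index = build_index([
    "Trollskogen barnehage",
    "Solstrand barnehage",
    "Solsikken barnehage",
])
suggestions = autocomplete(index, "sol")
```

Because the list is sorted once up front, each keystroke only needs a binary search plus a short scan over the matching names.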
The listing view for all kindergartens in a municipality.
The view for each kindergarten, with the opportunity for reader comments.
This is how every kindergarten is presented. We listed every kindergarten in Norway, even though we only had reports on half of them. Every kindergarten page had a comment field where our readers could share their experience with that kindergarten. With a total of 6,000 kindergartens in Norway, this meant we also had to moderate 6,000 different debates.
Our comment field is built around a tool called “Protokoll”, developed at VG Nett many years ago. All of the comments are handled by an administration tool that looks like this.
VG managed reader comments using this system administration tool.
Moderating was one of the biggest challenges of this project. In the first week after launch we had up to five moderators approving and rejecting comments. So far we have published more than 4,300 comments in our database. We are also very strict in our moderation, which has left more than 1,500 comments unpublished.
A project like this had never been done in our country before, so we had to invent the methods ourselves. We had some trouble because we did not have all the documents when we started the data work, and we faced new problems when we were doing the GIS work. We were uncertain about the identity and geolocation of 300 kindergartens, so we had to check these one by one through painstaking identification work. If we were to do a job like this again, we would make sure we had access to all the documents before starting to handle the data.
John Bones is a senior staff reporter at Verdens Gang, Norway’s largest daily newspaper.