Reflections on secondary data analysis in the PhD…

You may remember that I have spent the last few months carrying out secondary data analysis of different surveys relating to innovation and related concepts in the workplace. You may also remember that this has not been an easy task.

Initially, I was going to analyse secondary data taken from surveys relating to innovation alone but, as I explained in my previous blog post, this approach did not work. I got to the point (not long after writing that post) where I had had enough, and I questioned the purpose of doing the secondary data analysis at all. To me, it didn’t matter that this was a requirement of my original PhD proposal, or that my PhD sponsors and funders wanted this type of analysis done. It was getting me down, and the only way I could see to fix it was to find a way out of it altogether.

I decided to tell one of my supervisors, as my frustrations had reached the point where they were really affecting me. It turned out they had not realised how it was making me feel, but they did know that I had not had the support network I really needed for the analysis, and that at times I had just had to plod on. We decided to have a supervision meeting to talk about the purpose of my secondary data analysis and to explore the next steps we could take. This turned out to be one of the most productive meetings I have had with my supervisors: from that point on I knew what I was doing, and I made myself a plan. In the meeting we decided that:

  1. I would analyse secondary data from international surveys. This would give information on trends occurring at the macro level (between countries);
  2. This would then lead to further exploration of UK survey data (at the micro level), where I could explore specific contributions to the development of innovation using data collected in the UK;
  3. I could then draw conclusions about what was happening at the international level and how this compares with what is happening within the UK, and finally I would have my European and UK comparison of secondary data.

We also planned the types of exploratory analyses I could do and how to carry them out in the software I was using (the joy of SPSS). And this is how it went…

Accessing data

I used open-access data, which meant I did not need specific permission to use it. My supervisor was great at finding lots of links, and I spent a week searching through them, looking for variables associated with innovation as well as variables that could contribute to its development. I accessed most of my data through Eurostat, which gave me the variables there and then. This meant I did not have to apply to use the data or have my project specifically approved, which eased the tension a little. I also used other websites to find information on things such as education provision, gross domestic product and related variables, something I highly recommend other PhD students do. The more data sources you access (within reason), the more information you may find on variables related to the concepts you are exploring. This reduced my data access anxiety almost immediately, as I was able to build a comprehensive dataset: one containing over 70 variables, I must admit. But it gave me a starting point, and one I was looking forward to exploring. More importantly, I was able to send my list of variables to my supervisor, who advised me on other data I might want to find to fill gaps in the analysis I was about to do.

Preparing the dataset

One thing people don’t realise is that secondary data is not always in the best condition. There are often countries with missing data, countries with incorrect data, and data that makes no sense at all. As part of my data preparation and cleaning, I had to make sure I had a complete dataset (by replacing missing values with means) and remove errors and full stops before putting it into an SPSS file. I then had to make sure I labelled, coded and checked all variables again for errors so the analysis could be performed properly. Some students don’t realise how long this takes, or how precious that time is. But it meant I got to understand all of my variables and where they came from, so I understood what they actually meant!
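To make the mean-replacement step concrete outside SPSS, here is a minimal Python sketch. It assumes, hypothetically, that each variable is a list of values across countries with missing entries stored as `None`; the function name `impute_with_mean` is illustrative only.

```python
from statistics import mean
from typing import List, Optional


def impute_with_mean(values: List[Optional[float]]) -> List[float]:
    """Replace missing entries (None) with the mean of the observed values."""
    observed = [v for v in values if v is not None]
    if not observed:
        raise ValueError("cannot impute: every value is missing")
    fill = mean(observed)
    return [fill if v is None else v for v in values]


# Hypothetical variable measured across five countries, two of them missing.
print(impute_with_mean([2.0, None, 4.0, None, 6.0]))  # → [2.0, 4.0, 4.0, 4.0, 6.0]
```

Mean replacement keeps every country in the dataset, but it reduces each variable's variance and can weaken correlations, so it is worth acknowledging in a write-up.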

The preparation also included clustering countries by their level of innovation. I wanted to compare countries with high, medium and low levels of innovation, so I used hierarchical cluster analysis to group them. The clustering worked, and I was able to plan further analysis from the results.
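For readers curious about what hierarchical cluster analysis actually does, the sketch below is a simplified, hypothetical version in plain Python, not the SPSS procedure used here. It assumes one innovation score per country (the country codes and scores are made up), measures distance as the absolute difference between scores, and uses average linkage: every country starts in its own cluster, and the two closest clusters are merged repeatedly until the requested number remain.

```python
from itertools import combinations
from typing import Dict, List


def average_linkage(a: List[float], b: List[float]) -> float:
    """Mean absolute distance over every pair of points drawn from two clusters."""
    return sum(abs(x - y) for x in a for y in b) / (len(a) * len(b))


def hierarchical_clusters(scores: Dict[str, float], k: int) -> List[List[str]]:
    """Agglomerative clustering: merge the two closest clusters until k remain."""
    if not 1 <= k <= len(scores):
        raise ValueError("k must be between 1 and the number of countries")
    clusters = [[name] for name in scores]  # one cluster per country to start
    while len(clusters) > k:
        # combinations yields index pairs with i < j, so deleting j is safe.
        i, j = min(
            combinations(range(len(clusters)), 2),
            key=lambda pair: average_linkage(
                [scores[n] for n in clusters[pair[0]]],
                [scores[n] for n in clusters[pair[1]]],
            ),
        )
        clusters[i].extend(clusters[j])
        del clusters[j]
    # Order clusters from lowest to highest mean score (low, medium, high).
    return sorted(clusters, key=lambda c: sum(scores[n] for n in c) / len(c))


# Hypothetical innovation scores for six countries.
scores = {"A": 0.10, "B": 0.15, "C": 0.50, "D": 0.55, "E": 0.90, "F": 0.95}
low, medium, high = hierarchical_clusters(scores, k=3)
print(low, medium, high)  # → ['A', 'B'] ['C', 'D'] ['E', 'F']
```

With real data you would normally cluster on several standardised variables and inspect the dendrogram before choosing the number of clusters; fixing `k=3` here simply mirrors the low, medium and high grouping.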

And then the analysis bit…

Firstly, I explored the descriptive statistics. This gave me a picture of innovation and employment trends across the 28 countries of the EU. It also highlighted some interesting trends in how the UK compares with other countries across Europe, which I have found helps to justify the purpose of my PhD (quite good for me).

Secondly, I carried out some correlation analysis, for two main reasons. First, I wanted to see whether there were any relationships between variables that could be explored later; second, I wanted to help justify my choice of the exploratory analysis I would use next. I found some correlations between variables, and these seemed to make sense. When reporting them in my write-up, I found myself working through the patterns in my head to explain what was happening and why. Interpretation is the part of statistics I love, regardless of how significant the results seem to be.
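As a rough illustration of what a correlation analysis computes, this Python sketch calculates Pearson's r for every pair of variables in a small, made-up dataset (the variable names and values are hypothetical). It does not compute significance tests, which SPSS reports alongside each coefficient.

```python
from itertools import combinations
from math import sqrt
from typing import Dict, List


def pearson_r(x: List[float], y: List[float]) -> float:
    """Pearson correlation coefficient between two equal-length variables."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("variables must have equal length and at least 2 values")
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    if sx == 0 or sy == 0:
        raise ValueError("a constant variable has no defined correlation")
    return cov / (sx * sy)


# Hypothetical values for five countries.
data: Dict[str, List[float]] = {
    "innovation_index": [1.0, 2.0, 3.0, 4.0, 5.0],
    "rd_spending": [2.0, 1.0, 4.0, 3.0, 5.0],
    "unemployment": [5.0, 4.0, 3.0, 2.0, 1.0],
}

# Every pairwise correlation, as a quick screen for relationships to explore.
for a, b in combinations(data, 2):
    print(f"{a} vs {b}: r = {pearson_r(data[a], data[b]):.2f}")
```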

Analysis notes

Thirdly, I carried out a Principal Components Analysis (PCA). The aim was to see whether any variables overlapped (measured much the same thing) and whether the variables could be reduced to fewer components. I found this quite hard. I had not run this type of analysis in almost 8 years and could not remember its purpose. But a great book and some statistics notes from my undergraduate and master's years helped me get through what I needed to do. My analysis did not go as planned, but I could still see why: the components were not appropriate, and this gave me a justification for the next steps of my analysis.
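To show what PCA is looking for, here is a deliberately small Python sketch, not the SPSS procedure itself. It builds the correlation matrix of some made-up variables and uses power iteration to find the first principal component, i.e. the largest eigenvalue of that matrix and its eigenvector. The eigenvalue divided by the number of variables is the share of total variance that component explains. A full PCA would extract every component (for example with a proper eigendecomposition), and a common rule of thumb keeps components with eigenvalues above 1.

```python
from math import sqrt
from typing import List, Tuple


def correlation_matrix(columns: List[List[float]]) -> List[List[float]]:
    """Correlation matrix of variables given as equal-length columns."""
    def unit_scale(col: List[float]) -> List[float]:
        # Centre the variable and scale it to unit length, so that the dot
        # product of two scaled variables equals their Pearson correlation.
        m = sum(col) / len(col)
        norm = sqrt(sum((v - m) ** 2 for v in col))
        return [(v - m) / norm for v in col]

    z = [unit_scale(c) for c in columns]
    return [[sum(a * b for a, b in zip(zi, zj)) for zj in z] for zi in z]


def first_component(r: List[List[float]], iterations: int = 500) -> Tuple[float, List[float]]:
    """Largest eigenvalue and its unit eigenvector, found by power iteration."""
    p = len(r)
    # Non-uniform start vector, so we do not start orthogonal to the answer
    # in simple symmetric cases such as two negatively correlated variables.
    v = [float(i + 1) for i in range(p)]
    for _ in range(iterations):
        w = [sum(r[i][j] * v[j] for j in range(p)) for i in range(p)]
        norm = sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    # Rayleigh quotient v'Rv gives the eigenvalue because v has unit length.
    eigenvalue = sum(v[i] * sum(r[i][j] * v[j] for j in range(p)) for i in range(p))
    return eigenvalue, v


# Hypothetical variables measured across five countries (correlation 0.8).
columns = [
    [1.0, 2.0, 3.0, 4.0, 5.0],
    [2.0, 1.0, 4.0, 3.0, 5.0],
]
eigenvalue, vector = first_component(correlation_matrix(columns))
print(f"eigenvalue = {eigenvalue:.2f}, variance explained = {eigenvalue / len(columns):.0%}")
```

For two variables with correlation r, the first eigenvalue is 1 + |r|, so two strongly correlated variables collapse almost entirely onto one component, which is exactly the kind of overlap PCA is meant to reveal.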

Finally, I carried out a series of one-way and two-way analyses of variance (ANOVAs). This happened in three stages:

  1. I explored differences in means between low, medium and high innovative countries on the variables I had categorised as ‘influencers of innovation’;
  2. From stage 1, I then reversed some of the analysis, exploring differences in the means of the influencing factors according to the amount of innovation present in countries. This helped me determine which significant relationships from stage 1 were repeated and which were not significant at all;
  3. I then took all the variables that were significant in stage 2 and entered them into two-way analyses of variance. These determined whether there were any main effects of the independent variables (innovation influencers) on the dependent variable (amount of innovation). At this point I also considered covariates: variables that may influence the relationships between other variables and that you can account for statistically. I entered these into the analysis before getting my results.
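The one-way ANOVA in stage 1 compares group means. As a minimal plain-Python sketch (with made-up values for hypothetical low, medium and high innovation groups), the F statistic is the between-group mean square divided by the within-group mean square. The function returns F and its degrees of freedom only; SPSS also reports the p-value, and if SciPy is available `scipy.stats.f_oneway` returns both. The two-way ANOVAs with covariates in stage 3 are beyond this sketch.

```python
from typing import Dict, List, Tuple


def one_way_anova(groups: Dict[str, List[float]]) -> Tuple[float, int, int]:
    """Return (F, df_between, df_within) for a one-way ANOVA."""
    all_values = [v for g in groups.values() for v in g]
    grand_mean = sum(all_values) / len(all_values)
    k, n = len(groups), len(all_values)
    # Between-group sum of squares: how far each group mean is from the grand mean.
    ss_between = sum(len(g) * (sum(g) / len(g) - grand_mean) ** 2 for g in groups.values())
    # Within-group sum of squares: spread of values around their own group mean.
    ss_within = sum(sum((v - sum(g) / len(g)) ** 2 for v in g) for g in groups.values())
    df_between, df_within = k - 1, n - k
    if df_between < 1 or df_within < 1 or ss_within == 0:
        raise ValueError("need at least two groups and some within-group variation")
    f = (ss_between / df_between) / (ss_within / df_within)
    return f, df_between, df_within


# Hypothetical values of one influencing variable in each innovation cluster.
groups = {"low": [1, 2, 3], "medium": [4, 5, 6], "high": [7, 8, 9]}
print(one_way_anova(groups))  # → (27.0, 2, 6)
```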

It was at this point that I felt my analysis had been a success. I found some patterns in my data worth reporting, and some that are not so helpful. I am confident that this analysis WILL help me begin to answer my research question, and that my next analysis will go into further detail. I am hoping I can explore the UK survey data in as much depth as this, but that will be a challenge for 2017, I think!

Going through the analysis process on my own (with a little help from my wonderful colleagues and supervisors when I got stuck) has given me a sense of achievement. I have progressed through the stages of secondary data analysis on my own (pretty much), tackled the problems I faced and fought hard to win. And winning the fight with my data analysis was key to my PhD success in 2016.

For now, that is all. I am sure that I will write a blog post on my findings at some point in 2017. However, I will leave that until my secondary data analysis is completely done.


3 thoughts on “Reflections on secondary data analysis in the PhD…”

  1. Congrats to you for overcoming that challenge! I find that I have learned the most about statistics when working through secondary data on my own, with very few hints from others.


  2. Pingback: Is the domain of your PhD really that important? – Lyndsey Jenkins
