Secondary data struggles…

My trusty stats tree!

This part of my PhD should have been one of the easiest; so far, however, it has been one of the hardest. I’m doing SECONDARY DATA ANALYSIS, and not for any random reason! I’m analysing secondary data taken from different work and employment surveys so that I can see if there are any trends in training and skills that require further exploration in my case studies. Currently, I am focusing on two main ones from the UK:

Labour Force Survey

Workplace Employment Relations Survey

I thought this part of my PhD would be quite straightforward as I am trained in research methods and statistics. During my Bachelors and Masters degrees, I had substantial research methods training with a lot of statistical elements. In a nutshell, I learned how to explore data (descriptive statistics, central tendency, spread of data), how to represent data (so things like frequencies and charts), how to explore relationships between variables (regressions and so on) and also how to compare means (ANOVA, MANOVA, ANCOVA). This combination of descriptive and inferential statistics means I understand (mostly) what to analyse and can figure out how to analyse it. However, I soon learned that this is not the easiest part of my PhD and may become the hardest.

I have run into a few problems so far. Firstly, I cannot access the data I need easily. I had hoped to get the UK Innovation Survey as this is one of the main areas of my PhD. The survey would tell me about the innovation activity of organisations within the UK. I had hoped to get this so that I could then compare its data with the Community Innovation Survey. This is the European version of the innovation survey, with the UK Innovation Survey being part of the Community Innovation Survey. However, as both of these surveys are restricted in terms of who can have them (like much of the European data), I need to work with my supervisors to get them. This could take a while (up to two months for some!) so my plan of a wonderful UK/European innovation comparison will have to wait, as I can’t write a paper without my data!

4 years of research methods and statistics!

Secondly, some of the data is not all that useful. It’s great having over 80,000 respondents to a survey, but when only 12 people give an answer to the question (or variable) you are looking to analyse, the analysis hits a brick wall. I’m quite glad that I have other variables to explore, but it is making me question how representative secondary data is (in general, not just mine) of the population of the surveyed country… or even whether there’s a point to doing this at all???

There is, I can assure you there is!
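One way to catch the 12-answers problem before it eats an afternoon is to count how many people actually answered each question first. Here is a minimal sketch in plain Python; the rows, the variable names and the cut-off of 30 are all made up for illustration and do not come from any of the surveys mentioned here.

```python
# Assumed cut-off for "enough answers to analyse"; pick your own.
MIN_RESPONSES = 30


def usable_variables(rows, min_responses=MIN_RESPONSES):
    """Count non-missing answers per variable and flag those at or above the cut-off.

    rows is a list of dicts, one per respondent; None marks a skipped question.
    Returns {variable_name: (number_of_answers, is_usable)}.
    """
    counts = {}
    for row in rows:
        for name, value in row.items():
            counts.setdefault(name, 0)
            if value is not None:
                counts[name] += 1
    return {name: (n, n >= min_responses) for name, n in counts.items()}


# Hypothetical data: 100 respondents, only 12 of whom answered "training_days".
rows = (
    [{"age": 30, "training_days": None} for _ in range(88)]
    + [{"age": 41, "training_days": 5} for _ in range(12)]
)

report = usable_variables(rows)
for name, (n, ok) in report.items():
    print(f"{name}: {n} responses, {'usable' if ok else 'too few'}")
```

Running a check like this over every variable of interest gives a quick list of which ones are worth exploring further and which ones will hit the brick wall.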

I’m even encountering problems with deciding what to analyse. I don’t often go into analysis with no clue at all, but this is exactly what I am doing for this data analysis, and the lack of literature at the moment is becoming somewhat problematic. This is not only because I cannot see where my analysis story lies but also because I can’t grasp how the analysis might flow, or what the variables say at a glance. It’s not my data, it’s someone else’s, and this was bound to happen sooner or later; I just wish it had happened sooner.

Normally I would follow a pattern and:

(1) identify a problem for exploration (from the literature generally);

(2) identify hypotheses to test or variables to explore;

(3) explore the data in terms of central tendency and spread (and graphs, oh I love graphs!);

(4) decide on an analysis and justify this (i.e., why am I doing this? What am I trying to find out?);

Data analysis or sun? Data!


(5) carry out some simpler analysis and explore individual relationships among variables (to see if they are ‘worth’ exploring further);

(6) carry out tests for suitability of analysis on that dataset (such as tests for normality in terms of distribution, homogeneity of variances and so on);

(7) carry out the analyses (normally figure out this is wrong and do another type of analysis) – and then do post-hoc comparisons depending on the data and analysis complexity;

(8) see how wonderful the results are, report them and explain them in terms of the literature.
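Steps (3), (6) and (7) are the easiest ones to show in code. Below is a minimal sketch using only Python’s standard library and made-up scores for three hypothetical groups (none of this is real survey data). The variance ratio is a crude rule-of-thumb look at homogeneity of variances rather than a formal test such as Levene’s, and the ANOVA function only returns the F statistic, so a p-value would still need an F table or a stats package.

```python
from statistics import mean, median, stdev, variance

# Made-up scores for three hypothetical groups; not taken from any survey.
groups = {
    "no_training": [2, 3, 4, 3, 3],
    "some_training": [4, 5, 6, 5, 5],
    "lots_of_training": [6, 7, 8, 7, 7],
}


def describe(values):
    """Step (3): central tendency and spread for one group."""
    return {
        "n": len(values),
        "mean": mean(values),
        "median": median(values),
        "sd": stdev(values),
    }


def variance_ratio(samples):
    """Step (6), roughly: largest sample variance divided by the smallest.

    A ratio far above 1 hints that the variances are not homogeneous.
    Fails with a division by zero if any group has no spread at all.
    """
    variances = [variance(s) for s in samples]
    return max(variances) / min(variances)


def one_way_anova_f(samples):
    """Step (7): F statistic for a one-way ANOVA (between / within mean squares)."""
    all_values = [v for s in samples for v in s]
    grand_mean = mean(all_values)
    ss_between = sum(len(s) * (mean(s) - grand_mean) ** 2 for s in samples)
    ss_within = sum((v - mean(s)) ** 2 for s in samples for v in s)
    df_between = len(samples) - 1
    df_within = len(all_values) - len(samples)
    return (ss_between / df_between) / (ss_within / df_within)


samples = list(groups.values())
for name, values in groups.items():
    print(name, describe(values))
print("variance ratio:", variance_ratio(samples))
print("F statistic:", one_way_anova_f(samples))
```

With these toy numbers the group means are 3, 5 and 7 and every group has the same spread, so the variance ratio is 1 and the F statistic is large. Real secondary data is rarely this tidy, which is exactly why steps (5) and (6) come before step (7).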

Even my tweets are showing the slow and frustrating process of exploring data to see what variables are of use and seeing if any patterns emerge *sigh*!

So for my next few weeks I have a plan! I am going to explore the data a little further to see if there are any more in-depth analyses that I can do, and comparisons I can make. Who knows, when I dig, this may lead to something big (or not!). Either way, I will be able to write up my findings in a short report, but whether they warrant anything publishable may not be known until the rest of my data is explored and I get my hands on the surveys I need the most.


