Reflections on secondary data analysis in the PhD…

You may remember that I have been spending the last few months carrying out some secondary data analysis of different surveys relating to innovation and other related concepts in the workplace. However, you may also remember that this has not been an easy task.

Initially, I was going to analyse secondary data taken from surveys relating to innovation alone, but as you can see in my previous blog post, this approach did not work. I got to the point (not long after writing that blog post) where I had had enough, and I questioned the purpose of doing the secondary data analysis at all. To me, it didn't matter that this was a requirement of my original PhD proposal and that my PhD sponsors and funders wanted this type of analysis done. It was getting me down, and the only way I could see to fix it was to find a way out.

I decided to tell one of my supervisors, as my frustrations had reached the point where they were affecting how I felt. It turns out they did not know it was making me feel bad, but they did know I had not had the support network I really needed for the analysis, and that at times I had just had to plod on. We decided to have a supervision meeting to chat about the purpose of my secondary data analysis and to explore the next steps we could take. This turned out to be one of the most productive meetings I have had with my supervisors: from that point on, I knew what I was doing and made myself a plan. In our meeting we decided that:

  1. I was going to analyse some secondary data from international surveys. This would give information on trends occurring at the macro level (between countries);
  2. This would then lead to further exploration of UK survey data (at the micro level), where I could explore specific contributions to the development of innovation from data taken in the UK;
  3. I could then draw conclusions about what was happening at international level and how this compares to countries within the UK, and finally I would have my European and UK comparison of secondary data.

We also formulated a plan of the types of exploratory analyses I could do and how these could be carried out in the program I was using (the joy of SPSS). And this is how it went…

Accessing data

I accessed some open-access data for which I did not need specific permission. My supervisor was great at finding lots of links, and I spent a week searching through these, looking for variables associated with innovation and also variables which could contribute to innovation development. I accessed most of my data through Eurostat, so I had the variables there and then. This meant I did not have to apply to use the data or have my project specifically approved either, which eased the tension a little. I also used other websites to access information on things such as education provision, gross domestic product and related variables, something I highly recommend PhD students do. The more data sources you access (within reason), the more information you may find on variables related to the concepts you are exploring. I found that this reduced my data access anxiety almost immediately, as I was able to create a comprehensive dataset (one containing over 70 variables, I must admit). But this meant I had a starting point, and one I was looking forward to exploring. More importantly, I was able to send my variables to my supervisor, who gave advice on other data I might like to find to help fill the gaps in the analysis I was about to do.

Preparing the dataset

One thing that people don't realise is that secondary data is not always in the best condition. There are often countries with missing data, countries with incorrect data, and data which makes no sense at all. As part of my data preparation and cleaning, I had to make sure I had a full dataset (by replacing missing values with means) and remove errors and full stops before putting it into an SPSS file. I then had to make sure I labelled, coded and checked all variables again for errors so the analysis could be performed properly. Some students don't realise that this takes time, and that this time is precious. But it meant I got to understand all of my variables and where they came from, so I understood what they actually meant!
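For anyone wondering what "replacing missing values with means" actually involves, here is a minimal Python sketch. It assumes each country is stored as a dictionary of variable values, with `None` marking a missing entry; the function name, the variable names (`rd_spend`, `tertiary_edu`) and the numbers are all hypothetical, and this is not how SPSS does it internally.

```python
from statistics import mean


def impute_column_means(rows, columns):
    """Replace None in each listed column with that column's mean over observed values."""
    means = {}
    for col in columns:
        observed = [row[col] for row in rows if row[col] is not None]
        means[col] = mean(observed)
    # Build new rows so the original data is left untouched.
    return [
        {key: (means[key] if key in means and value is None else value)
         for key, value in row.items()}
        for row in rows
    ]


# Hypothetical countries and variables, not real Eurostat figures.
countries = [
    {"country": "A", "rd_spend": 2.0, "tertiary_edu": 40.0},
    {"country": "B", "rd_spend": None, "tertiary_edu": 30.0},
    {"country": "C", "rd_spend": 4.0, "tertiary_edu": None},
]
filled = impute_column_means(countries, ["rd_spend", "tertiary_edu"])
print(filled)
```

One thing worth knowing: mean substitution is simple, but it makes a variable look less spread out than it really is, because every filled-in value sits exactly at the mean.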

The preparation also included clustering my data into levels of innovation within countries. I wanted to make comparisons between high, medium and low levels of innovation so I had to use hierarchical cluster analysis to get this done. I managed to successfully cluster my data and was able to make plans for further analysis from the results I got.
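The idea behind hierarchical (agglomerative) clustering can be sketched in a few lines of Python: start with every country in its own cluster and keep merging the two closest clusters until only three remain, then label them low, medium and high by their average score. This sketch assumes a single hypothetical innovation score per country and uses average linkage with Euclidean distance; SPSS offers several linkage and distance options, so these choices are assumptions for illustration, and the function names are made up.

```python
import math
from itertools import combinations


def average_linkage(a, b, points):
    """Mean pairwise Euclidean distance between the members of clusters a and b."""
    total = sum(math.dist(points[i], points[j]) for i in a for j in b)
    return total / (len(a) * len(b))


def agglomerate(points, n_clusters):
    """Merge the closest pair of clusters until n_clusters remain."""
    clusters = [[i] for i in range(len(points))]
    while len(clusters) > n_clusters:
        i, j = min(
            combinations(range(len(clusters)), 2),
            key=lambda pair: average_linkage(clusters[pair[0]], clusters[pair[1]], points),
        )
        merged = clusters[i] + clusters[j]
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)]
        clusters.append(merged)
    return clusters


def innovation_levels(scores):
    """Label each country low/medium/high by the mean score of its cluster."""
    names = list(scores)
    points = [(scores[name],) for name in names]
    levels = ("low", "medium", "high")
    clusters = agglomerate(points, len(levels))
    clusters.sort(key=lambda c: sum(points[i][0] for i in c) / len(c))
    return {names[i]: level for level, c in zip(levels, clusters) for i in c}


# Hypothetical innovation scores for six made-up countries.
scores = {"A": 0.20, "B": 0.24, "C": 0.50, "D": 0.56, "E": 0.85, "F": 0.92}
print(innovation_levels(scores))
```

With more than one variable per country, each point would simply become a longer tuple, since `math.dist` works in any number of dimensions.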

And then the analysis bit…

Firstly, I carried out some exploration of descriptive statistics. This gave me a picture of innovation and employment trends across 28 countries in the EU. It also highlighted some interesting trends in how the UK compares to countries across Europe, and I have found that this can help justify the purpose of my PhD – quite good for me.
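One simple way to see where the UK sits relative to other countries on a single indicator is a z-score: how many standard deviations the UK's value is above or below the group mean. The sketch below is an illustration, not necessarily the comparison I ran; the country labels and values are hypothetical, and the UK is included in the mean and standard deviation here.

```python
from statistics import mean, stdev


def describe(values):
    """Return the mean and sample standard deviation of a list of numbers."""
    return mean(values), stdev(values)


def z_score(country, data):
    """Standard deviations a country sits above (+) or below (-) the group mean."""
    m, sd = describe(list(data.values()))
    return (data[country] - m) / sd


# Hypothetical values of one indicator for five countries (not real Eurostat figures).
indicator = {"A": 1.0, "B": 2.0, "C": 3.0, "UK": 4.0, "D": 5.0}
print(describe(list(indicator.values())))
print(round(z_score("UK", indicator), 3))
```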

Secondly, I carried out some correlation analysis for two main reasons. Initially, I wanted to see if there were any relationships between variables that could be explored later, and after that I wanted to help justify my choice of the exploratory analysis I would use next. I found that there were some correlations between variables, and these correlations seemed to make sense. When reporting them in my write-up, I found myself working through the patterns in my head to try to explain what was happening and why. Interpretation is the part of statistics that I love, regardless of how significant the results seem to be.
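For reference, Pearson's correlation coefficient, and the t value used to test whether it differs from zero, can be sketched in plain Python as below. The variable names and values are hypothetical. In SPSS the p-value reported for each correlation comes from the t distribution with n - 2 degrees of freedom; the sketch stops at the t value.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def t_statistic(r, n):
    """t value for testing r against zero (n - 2 df); undefined when r is exactly +/-1."""
    return r * math.sqrt((n - 2) / (1 - r * r))


def correlation_matrix(data):
    """Pairwise Pearson r for every pair of variables in a dict of columns."""
    names = list(data)
    return {(a, b): pearson_r(data[a], data[b]) for a in names for b in names}


# Hypothetical columns for five countries.
data = {
    "innovation": [1.0, 2.0, 3.0, 4.0, 5.0],
    "rd_spend": [2.0, 4.0, 6.0, 8.0, 10.0],
    "unemployment": [5.0, 4.0, 3.0, 2.0, 1.0],
}
matrix = correlation_matrix(data)
print(round(matrix[("innovation", "rd_spend")], 3))
print(round(matrix[("innovation", "unemployment")], 3))
```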

Analysis notes

Thirdly, I carried out a Principal Components Analysis (PCA). This was so that I could see whether any variables were effectively repeated (measuring much the same thing) and whether the variables could be narrowed down to fewer components. I found this quite hard. I had not run this type of analysis in almost 8 years and could not recall the point of it. But a great book and some statistics notes from my undergraduate and master's years helped me get through what I needed to do. In my opinion, my analysis did not go as planned, but I was still able to see why: the components were not appropriate, and this helped me justify the next steps of my analysis.
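At its core, PCA on standardised variables works from the eigenvalues of the correlation matrix: each eigenvalue is the amount of variance one component captures. The sketch below finds those eigenvalues with the classic Jacobi rotation method (pure Python, no libraries) and then applies the eigenvalue-greater-than-one rule (the Kaiser criterion), a common rule of thumb for how many components to keep. The correlation matrix is made up, and using the Kaiser criterion here is an assumption for illustration rather than the exact retention rule behind my analysis.

```python
import math


def jacobi_eigenvalues(matrix, tol=1e-12, max_sweeps=100):
    """Eigenvalues of a symmetric matrix via cyclic Jacobi rotations, largest first."""
    a = [row[:] for row in matrix]  # work on a copy
    n = len(a)
    for _ in range(max_sweeps):
        off = sum(a[i][j] ** 2 for i in range(n) for j in range(n) if i != j)
        if off < tol:
            break
        for p in range(n - 1):
            for q in range(p + 1, n):
                if abs(a[p][q]) < 1e-15:
                    continue
                # Rotation angle chosen so that the (p, q) element becomes zero.
                theta = (a[q][q] - a[p][p]) / (2 * a[p][q])
                t = math.copysign(1.0, theta) / (abs(theta) + math.sqrt(theta * theta + 1))
                c = 1 / math.sqrt(t * t + 1)
                s = t * c
                for k in range(n):  # update columns p and q
                    akp, akq = a[k][p], a[k][q]
                    a[k][p] = c * akp - s * akq
                    a[k][q] = s * akp + c * akq
                for k in range(n):  # update rows p and q
                    apk, aqk = a[p][k], a[q][k]
                    a[p][k] = c * apk - s * aqk
                    a[q][k] = s * apk + c * aqk
    return sorted((a[i][i] for i in range(n)), reverse=True)


def summarise_components(corr):
    """Eigenvalues, proportion of variance explained, and count with eigenvalue > 1."""
    eigenvalues = jacobi_eigenvalues(corr)
    total = sum(eigenvalues)  # equals the number of variables for a correlation matrix
    explained = [e / total for e in eigenvalues]
    retained = sum(1 for e in eigenvalues if e > 1.0)
    return eigenvalues, explained, retained


# Hypothetical correlation matrix for three strongly related variables.
corr = [
    [1.0, 0.8, 0.8],
    [0.8, 1.0, 0.8],
    [0.8, 0.8, 1.0],
]
eigenvalues, explained, retained = summarise_components(corr)
print([round(e, 3) for e in eigenvalues], retained)
```

Here one component soaks up most of the variance, which is the pattern you would hope to see when several variables really are measuring the same thing.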

Finally, I carried out a series of one-way and two-way analyses of variance (ANOVAs). This happened in three stages:

  1. I explored differences in means between low, medium and high innovative countries on the variables I had categorised as ‘influencers of innovation’;
  2. From stage 1, I then reversed some of the analysis: I explored the difference in means of the influencing factors on the amount of innovation present in countries. This helped me determine whether the significant relationships in stage 1 were repeated and which ones were not significant at all;
  3. I then took all significant variables from stage 2 and entered them into two-way analyses of variance. This determined whether there were any main effects of the independent variable (innovation influencer) on the dependent variable (amount of innovation). It was at this point that I considered covariates: variables that may influence the relationships between other variables and that you can account for statistically. I entered these into the analysis before getting my results.
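Stages 1 and 2 rest on the one-way ANOVA F statistic, which compares the variation between group means with the variation within groups. Below is a minimal Python sketch of that calculation; the group labels follow the low/medium/high clusters, but the values are made up. It covers only the one-way case: the two-way analyses with covariates in stage 3 are better left to a general linear model routine such as the one in SPSS.

```python
def one_way_anova(groups):
    """F statistic and degrees of freedom for a dict of group name -> list of values."""
    all_values = [v for values in groups.values() for v in values]
    grand_mean = sum(all_values) / len(all_values)
    group_means = {g: sum(vals) / len(vals) for g, vals in groups.items()}

    # Between-groups sum of squares: how far each group mean is from the grand mean.
    ss_between = sum(len(vals) * (group_means[g] - grand_mean) ** 2 for g, vals in groups.items())
    # Within-groups sum of squares: spread of values around their own group mean.
    ss_within = sum((v - group_means[g]) ** 2 for g, vals in groups.items() for v in vals)

    df_between = len(groups) - 1
    df_within = len(all_values) - len(groups)
    f = (ss_between / df_between) / (ss_within / df_within)
    return f, df_between, df_within


# Hypothetical values of one 'influencer' variable, grouped by innovation cluster.
groups = {
    "low": [1.0, 2.0, 3.0],
    "medium": [4.0, 5.0, 6.0],
    "high": [7.0, 8.0, 9.0],
}
print(one_way_anova(groups))
```

The p-value then comes from the F distribution with (df_between, df_within) degrees of freedom, which SPSS reports alongside F.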

It was at this point that I felt my analysis had been a success. I found that there are some patterns in my data worth reporting, and some that are not so helpful. I have found that this analysis WILL help me answer my research question initially, and my next analysis will go into further detail. I am hoping I can explore the UK survey data in as much depth as here, but that will be a challenge for 2017, I think!

However, going through the process of the analysis on my own (with slight help from my wonderful colleagues and supervisors when I got stuck) has given me a sense of achievement. I have managed to progress through the stages of secondary data analysis on my own (pretty much), tackled the problems I faced and fought hard to win. And winning the fight with my data analysis was key to my PhD success in 2016.

For now, that is all. I am sure that I will write a blog post on my findings at some point in 2017. However, I will leave that until my secondary data analysis is completely done.

It’s more than ‘just’ a PhD…

Last week I attended some training designed to highlight the importance of the Researcher Development Framework and how PhD students can use the resources on the Vitae website to support their own development throughout the PhD. This is why it's more than just a PhD.

Now, I was quite skeptical about this training, as I was concerned that I had heard it all before, having attended some similar training events in the not-so-distant past. However, the training got me thinking a little about my own PhD and what I am getting (and hoping to get) out of the whole process.

We take it for granted that we will get a PhD, but some of us actually don't. Some of us leave, some of us fail and some of us struggle through. However, going into a PhD does have one ultimate goal – carrying out a research project and writing it up successfully so that we can graduate with smiles on our faces and be proud of our achievements, right? Sometimes not! For me, getting the PhD is a goal, but I have other ones too. I want to make sure I am confident enough to move on after my PhD, that I don't dwell on things that have brought me down over the past few PhD years and, most importantly, that the outcome of my PhD is worth it. To me, this means I want to use it in my career and make sure I do something worthwhile, so that I know the last three years have been worth it. I know they will have been worth it anyway, but I want to know I'm putting my studies to use once I'm done.

What about what we do after the PhD? Where do we go? Do we stay in academia, or not? Do we go into practice, or do we forget about our research completely, decide we have had enough and take a whole new direction? For you, that is not up to me!

Now, the careers adviser in me is the thing that keeps me going when I think I have not done enough. The careers adviser tells me that I still have two years to go and that I can work towards my end goal without having a final destination (i.e. a job) in mind. However, the careers adviser in me also tells me that it's not the end goal that matters most (although it still does matter); it's more about what you do during your PhD years that will help. So this is where the Researcher Development Framework comes in – it can help you see how you are developing as a researcher rather than how you are not. It can help you point out skills and qualities that you want to work on as well as ones you think you are good at. It can also help you plan ahead and schedule activities to work on the things you want to, and most importantly, it can help you track progress so that you can actually see you are doing something more than 'just a PhD'. It can help you work towards being a confident academic researcher with all the skills you need to survive, skills which employers want to see.

The afternoon of that training was a little weird, and I started thinking about all the stuff I wanted to get out of my PhD that had nothing to do with the research itself. I then took a while to look at the Researcher Development Framework and had a 10-minute reality check that I'm doing okay and that there is still stuff I'd like to work on – and that's okay too!

We got talking in the training about how to 'evidence' our achievements in the RDF, and this is something I am in two minds about. I wholeheartedly agree that things should be 'evidenced' to demonstrate capability in certain domains of the framework; however, I often don't agree with this being recorded in one online place. For example, on day one of my PhD, I was given a Post Graduate Development Record (PDR) by my supervisor, and for me, this is my evidence. I am a very practical person, so when there is the opportunity for me to 'evidence' my achievements physically, I would normally choose that option anyway. My PDR file is full of bits and bobs that I have done – conference presentations, reviews, feedback and so on. For me, that is far better than having nothing at all, and having all of my PhD stuff in one place is quite handy, not only to see feedback on something I did, but also to refer back to when I need to see what I have done.

I know this option is not for everyone, and I do blame my careers and education background for my preference, but there are other ways to record progress, and the RDF planner is one of them. This tool keeps all your stuff in one place, and you can comment on and upload evidence pretty much however you like, but it does come at a price. Your educational institution should have a subscription to the service, and as far as we know, the subscription ends as soon as you leave, but you can still pay as an individual, I think. You can always download your portfolio to keep, and some people may prefer this way of doing things.

When I had my last review, I used my PDR file to see what I had done, as I was feeling like I had not progressed much since the previous one. I actually proved myself very wrong and realised that I had done more than just a PhD, and that these things were helping my PhD too.

I found I had attended a lot of training events, which had helped to develop my knowledge all the way through and also helped me learn new things. I had helped to organise a 2016 doctoral colloquium earlier in the year and also a student conference in the School of Computing. I had also presented my work at Departmental level, Edinburgh Napier University level, Cross-University level and also internationally, so I really don't see now why I felt I had reason to complain. I've also published two articles with colleagues (one journal article and one in conference proceedings) and worked with some wonderful people along the way. For my efforts I managed to win a few awards: best paper, second best poster and also best student presentation, which rounded off my first academic year nicely. Then this year, I took on the role of School of Computing PhD student rep, and then I was chosen to be an SGSSS student rep too. I also started teaching, which I found I love, and have also been interviewed for a Scottish company on my graduate experiences, so the second PhD year has been quite fruitful too.

So for me, reflecting on my progress was a success. Using the RDF as a means for improvement (or development!), I then started to measure my own progress on the points scale of each domain. It helped me to see that, from my initial RD4 review to my RD5 review, I had improved in some areas, and there were some areas I wanted to improve further. It means I have things I would like to work on, things I can say I'm okay with and other things I'm really, really bad at!