Neural network models are regression and classification models, developed in machine learning, that predict outcome variables from a number of input variables. They are generally used for problems with many input variables, where the outcome variables are modelled as complex non-linear functions of all the input variables.
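As a rough illustration of what "a complex non-linear function of the input variables" can look like, the sketch below evaluates a network with a single hidden layer. The function name `forward`, the layer sizes, the weights and the choice of tanh and logistic activations are all hypothetical, picked by hand for this example and not taken from the study.

```python
import math


def forward(x, hidden_weights, hidden_biases, output_weights, output_bias):
    """Return the output of a one-hidden-layer network for input vector x."""
    # Each hidden unit applies tanh to a weighted sum of the inputs.
    hidden = [
        math.tanh(sum(w * xi for w, xi in zip(weights, x)) + bias)
        for weights, bias in zip(hidden_weights, hidden_biases)
    ]
    # The output unit squashes a weighted sum of the hidden units into (0, 1).
    z = sum(w * h for w, h in zip(output_weights, hidden)) + output_bias
    return 1.0 / (1.0 + math.exp(-z))


# Hypothetical weights, chosen by hand purely for illustration.
x = [1.0, 0.0, 1.0]                    # three input variables
hidden_weights = [[0.5, -1.0, 0.25],   # two hidden units, three weights each
                  [-0.75, 0.5, 1.0]]
hidden_biases = [0.0, 0.1]
output_weights = [1.5, -2.0]
output_bias = 0.2

print(forward(x, hidden_weights, hidden_biases, output_weights, output_bias))
```

In practice the weights would be estimated from data rather than set by hand. Removing the hidden layer reduces this to an ordinary logistic regression, which is one way to think about "adding layers of complexity" to a simple model.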
This project will investigate a well-known data set that has previously been analysed using neural network models. The idea is to start with a simple model and add layers of complexity, to see how complex the model needs to be in order to make a good prediction. The data set consists of 20,000 English words with their associated phonetic transcriptions. The aim of the analysis is to build a model that can predict the transcription of a new word.
The starting point is a simple regression model that describes the phonetic transcription in terms of the input letters (for example, using the frequency of each letter). This model can be tested by predicting the phonetic transcription of each word and comparing the prediction with the actual transcription. The student will then try to improve the model’s predictions by adding layers of complexity. Finally, the predictions will be compared with those obtained from a neural network model.
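The fit, predict and compare cycle described above might be sketched as follows. This is not a regression model: it is an even simpler per-letter frequency baseline that maps each letter to the phoneme it most often corresponds to, which gives a reference point that any later model should beat. The sketch assumes the transcriptions are aligned with one phoneme symbol per letter. The words and phoneme symbols in `TOY_TRAIN` are made up for illustration and are not taken from the real data set, and all function names are hypothetical.

```python
from collections import Counter, defaultdict

# Hypothetical toy data: (word, transcription) pairs with one phoneme
# symbol per letter. Invented for illustration, NOT from the real data set.
TOY_TRAIN = [
    ("cat", "k@t"),
    ("cot", "kat"),
    ("cit", "sIt"),
    ("tac", "t@k"),
]


def fit_letter_baseline(pairs):
    """Map each letter to the phoneme it is most frequently aligned with."""
    counts = defaultdict(Counter)
    for word, phonemes in pairs:
        if len(word) != len(phonemes):
            raise ValueError(f"unaligned pair: {word!r} / {phonemes!r}")
        for letter, phoneme in zip(word, phonemes):
            counts[letter][phoneme] += 1
    return {letter: c.most_common(1)[0][0] for letter, c in counts.items()}


def predict(model, word, unknown="?"):
    """Predict a transcription letter by letter; unseen letters give `unknown`."""
    return "".join(model.get(letter, unknown) for letter in word)


def per_letter_accuracy(model, pairs):
    """Proportion of letters whose predicted phoneme matches the actual one."""
    correct = total = 0
    for word, phonemes in pairs:
        predicted = predict(model, word)
        correct += sum(p == t for p, t in zip(predicted, phonemes))
        total += len(phonemes)
    return correct / total


model = fit_letter_baseline(TOY_TRAIN)
print(predict(model, "cit"))
print(per_letter_accuracy(model, TOY_TRAIN))
```

Accuracy is computed here on the training words only to keep the sketch short. A proper analysis would hold out some words for testing, since the aim is to predict the transcription of a new word.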
More information on the study (and the data set) can be found in the UCI Machine Learning Repository.
The project will involve:
- choice of suitable analysis method or methods for the data and question of interest,
- writing about the theory of the method to be used,
- a full statistical analysis, including exploratory analysis and model checking,
- discussion and interpretation of results.