Fall detection

Name: Guillaume De Gani

The purpose of this notebook is to study fall detection using floor sensors and to select the right model to predict whether or not the subject fell, which in the future could help the elderly in their daily lives.

Initializing the Data

As you can see above, the dataset has 87 features and the target feature "Fall". The 87 features are separated into 3 categories, each going from X1 to X29:

We start by initializing the data we need for the different models; to do so, the data is split into two sets, X (the features) and Y (the target "Fall").

After splitting the data into the two sets X & Y, we can apply train_test_split, which is a quick and easy way to split the data into a training set and a testing set so we can properly check the accuracy of our model. In this case we set aside 20% of the data to test on after fitting the various models.
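A minimal sketch of this step; the CSV file name and the random seed are assumptions, not necessarily what this notebook used:

```python
# Minimal sketch of the data initialization and split
# (the file name "falls.csv" and random_state are assumptions).
import pandas as pd
from sklearn.model_selection import train_test_split

df = pd.read_csv("falls.csv")   # assumed file name
X = df.drop(columns=["Fall"])   # the 87 sensor features
Y = df["Fall"]                  # target: 1 = fall, 0 = no fall

# Hold out 20% of the data for testing.
X_train, X_test, Y_train, Y_test = train_test_split(
    X, Y, test_size=0.2, random_state=42
)
```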

The data count is the following:
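A quick way to obtain these counts (sketch, using the Y defined above):

```python
# Class counts and proportions of the target (sketch).
print(Y.value_counts())
print(Y.value_counts(normalize=True))  # ~92.84% no fall vs ~7.16% fall
```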

Before starting the analysis it is important to note that the data is fairly imbalanced, i.e. only 7.16% of the data represents a fall. This means that using accuracy is unwise, since a model could easily reach 93% accuracy by labeling all data as 0. For this reason it is preferable to use the F1 score, which focuses on true positives and avoids this issue.
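To make this concrete, here is a small illustration with hypothetical labels matching the ~7.16% fall rate: a trivial classifier that always predicts "no fall" looks accurate but is useless by F1.

```python
# A classifier that always predicts "no fall" reaches ~93% accuracy
# but an F1 score of 0 (hypothetical labels for illustration only).
import numpy as np
from sklearn.metrics import accuracy_score, f1_score

y_true = np.array([0] * 928 + [1] * 72)           # ~7.2% falls
y_pred = np.zeros_like(y_true)                    # always "no fall"

print(accuracy_score(y_true, y_pred))             # 0.928
print(f1_score(y_true, y_pred, zero_division=0))  # 0.0
```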

Quick overlook of the various models

In this section different models will be tested, and for each classifier a confusion matrix will be plotted. This gives important information, notably the true positive rate (TPR) of the model, which is the most important metric when studying imbalanced data.

Here is a list of the models that were tested and compared:

The first model tested is Random Forest, for which we plot the confusion matrix; this is helpful for seeing how accurate the model is by checking that the diagonal has high values.
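A sketch of this step, assuming default hyperparameters (the notebook's exact settings are not shown here):

```python
# Fit a Random Forest and plot its confusion matrix
# (default hyperparameters are an assumption).
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import ConfusionMatrixDisplay, f1_score, precision_score
import matplotlib.pyplot as plt

rf = RandomForestClassifier(random_state=42)
rf.fit(X_train, Y_train)

ConfusionMatrixDisplay.from_estimator(rf, X_test, Y_test, normalize="true")
plt.show()

y_pred = rf.predict(X_test)
print("F1:", f1_score(Y_test, y_pred))
print("Precision:", precision_score(Y_test, y_pred))
```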

The results for this first model give an F1 score of 93.3% and a precision of 95%. These results are promising given the amount of data used for training.

The next model tested was a support vector machine with three different kernels (a sketch follows after this list):

Once again, all the models have fairly high accuracy and F1 scores; however, this was done without cross validation, so the particular split of the data might be the reason for these high values.
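A sketch of that comparison; the three kernels are assumed to be linear, polynomial, and RBF, since the notebook does not name them here:

```python
# Fit one SVM per kernel and report its F1 score
# (the kernel choice is an assumption).
from sklearn.svm import SVC
from sklearn.metrics import f1_score

for kernel in ("linear", "poly", "rbf"):
    svm = SVC(kernel=kernel)
    svm.fit(X_train, Y_train)
    print(kernel, f1_score(Y_test, svm.predict(X_test)))
```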

The next plots show the confusion matrices for another set of classifiers, once again as a quick sanity check to see whether any of them give absurd results.
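A compact way to run this sanity check over several classifiers at once; the model list below is an assumption, as the notebook does not enumerate it here:

```python
# Plot a normalized confusion matrix per classifier (model list assumed).
from sklearn.neighbors import KNeighborsClassifier
from sklearn.naive_bayes import GaussianNB
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import ConfusionMatrixDisplay
import matplotlib.pyplot as plt

for clf in (KNeighborsClassifier(), GaussianNB(),
            LogisticRegression(max_iter=1000)):
    clf.fit(X_train, Y_train)
    ConfusionMatrixDisplay.from_estimator(clf, X_test, Y_test, normalize="true")
    plt.title(type(clf).__name__)
    plt.show()
```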

Every confusion matrix has a diagonal that is close to one, which indicates that these models are fairly accurate.

Comparing the models

After testing the different models to check whether any of them gave absurd results, it is important to compare them to see which model performs best. To do so, we need different ways to visualize the scoring metrics of the classifiers tested.

Receiver Operating Characteristic curve (ROC)

In this part the ROC curve is used to evaluate the ability of each binary classifier to find true positives. In the bottom left corner of the graphs it is possible to compare the different AUC values; the AUC, or Area Under the Curve, is equivalent to the probability that a randomly chosen positive instance is ranked higher than a randomly chosen negative instance.

The closer it is to 1, the better the model is at classifying the data. In this case every model has an AUC above 0.90, which seems very promising.
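As a sketch, the ROC curves can be overlaid on a single axis (scikit-learn reports the AUC in the legend); `rf` and `svm` refer to the estimators fitted in the earlier sketches:

```python
# Overlay ROC curves for the fitted models on one axis (sketch).
from sklearn.metrics import RocCurveDisplay
import matplotlib.pyplot as plt

fig, ax = plt.subplots()
for name, model in {"Random Forest": rf, "SVM (RBF)": svm}.items():
    RocCurveDisplay.from_estimator(model, X_test, Y_test, name=name, ax=ax)
plt.show()
```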

K-Fold cross validation

Previously the models were evaluated without cross validation; to get more reliable results, it is best practice to measure the different metrics by averaging over a K-fold split. The following data is obtained using 10 folds.
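A sketch of the 10-fold evaluation, shown for a single model; the list of scoring metrics is an assumption:

```python
# Average several metrics over 10 folds (scoring list assumed).
from sklearn.model_selection import cross_validate

cv_scores = cross_validate(
    rf, X, Y, cv=10, scoring=("accuracy", "f1", "precision", "recall")
)
for metric in ("accuracy", "f1", "precision", "recall"):
    print(metric, cv_scores[f"test_{metric}"].mean())
```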

In the table below we can compare different scoring metrics for each model. The table seems to indicate that the best performing model regarding the F1 score is Random Forest. However, the goal of this technology is to detect whether or not a person fell, and it is important to ask ourselves how dangerous it is to miss a fall. If the objective is to minimize that error, then Naive Bayes has the highest fall detection rate.

Another way to visualize this is with bar plots; before plotting the data we sort the classifiers by the mean of their F1 scores.
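A sketch of that plot, assuming a dict `f1_by_model` mapping each classifier's name to its per-fold F1 scores (this name is hypothetical):

```python
# Sort classifiers by mean F1 over the folds and bar-plot them
# (`f1_by_model` is an assumed dict of name -> array of per-fold scores).
import pandas as pd
import matplotlib.pyplot as plt

means = pd.Series({name: scores.mean() for name, scores in f1_by_model.items()})
means.sort_values().plot.bar()
plt.ylabel("Mean F1 score")
plt.tight_layout()
plt.show()
```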

This plot confirms our previous assessment that overall Random Forest seems to perform better than the other models regarding the F1 score, even though most perform in a similar manner.

Conclusion

The data given was fairly imbalanced towards the first label, which represents the patient not falling. This made it challenging to find a model that performed well overall without missing too many falls, which, if applied in real life, could be problematic.

It seems that the best performing model overall is Random Forest, while if the priority is detecting as many falls as possible, Naive Bayes gives good results.

Finally, the question of transfer learning can be raised, since this data was obtained from fairly young test subjects (aged 25 to 45) while the system is meant to be applied to seniors, e.g. above 65. It is likely that their walking and falling patterns differ slightly, in speed for example. And for obvious reasons, gathering data by asking seniors to fall on purpose seems highly unethical.

Another question that can be asked concerns the impact of walking aids such as canes and walkers on the data received by the floor sensors. This could be an important factor to consider, since in the US 16.4% of seniors use a cane and 11.6% use a walker.