Driver fatigue monitoring system based on multi-layer deep learning framework and motion analysis

Abstract: Recent developments in the automobile industry have increased research interest in fatigue driving monitoring. The aim is to develop an effective driver monitoring system that can detect abnormal psychophysical states in time and reduce traffic accidents caused by fatigue driving. Much of the literature now focuses on physiological signals, measuring heart rate variability (HRV) to obtain information about cardiac activity. HRV is also a valid measure of physiological stress, because it provides information about the activity of the cardiovascular system, which is innervated by the autonomic nervous system. This paper aims to reconstruct the photoplethysmography (PPG) signal in a robust manner by extracting facial feature points and analyzing the subtle skin motion induced by blood pressure changes. The experimental results show that the PPG signal detected by the sensor correlates strongly with the PPG signal reconstructed from the facial landmarks.

1 Introduction

Drowsiness is a physiological state characterized by a reduced level of consciousness and difficulty staying awake. In the U.S., the share of fatal crashes caused by drowsy driving is rising significantly, according to the National Safety Council [1]. It is therefore of great importance to develop an effective early warning system that can detect in advance that the driver's physiological condition is not suitable for driving.

Studies have shown that heart rate variability (HRV) is associated with the driver's level of attention [2]. More precisely, HRV is an important indicator of an individual's physiological adaptability and behavioral flexibility. HRV is assessed by measuring blood pressure with the PPG signal. Specifically, the PPG signal consists of peaks in blood vessel volume, each corresponding to a successive cardiac cycle. PPG detection uses an LED light source to illuminate different parts of the skin and a photodiode to measure the intensity of the reflected light [3].

Although physiological signals allow us to monitor drowsiness, recent research has focused on assessing driver fatigue with computer vision techniques [4]. While building a face detection system for an automotive environment is certainly challenging, there are many ways to use cameras to determine the blink rate and thus assess fatigue [5]. Unlike other studies, our method uses computer vision to detect and extract facial landmarks and analyzes the pixel intensity changes in previously recorded video sequences to define a time series for each facial landmark. More specifically, the rationale of our method is to reveal, through video magnification, the subtle facial movements caused by changes in blood pressure. The purpose of this study is to construct a PPG signal from the time series of facial landmarks instead of using sensors.

The rest of the paper is structured as follows: Section 2 presents related research; Section 3 provides an overview of PPG signals and introduces our pipeline based on long short-term memory and convolutional neural networks; Section 4 explains the experimental procedure. Finally, Section 5 discusses the advantages of our method and future research directions.

2 Related research

Most previously published papers use physiological signals to detect driver drowsiness and achieve high detection accuracy. In fact, many studies have shown that driver fatigue monitoring solutions based solely on computer vision are not necessarily effective, especially visual methods that focus on analyzing traffic signs, which often fail when road conditions are poor.

One published study on photoplethysmography (PPG) signal detection [6] achieved good detection results using low-power wireless PPG sensors. In another approach [7], the authors used low- and high-frequency PPG signals detected at the fingers and earlobes to assess fatigue. These studies mainly evaluate HRV by analyzing ECG and PPG signals. However, the cited methods are computationally demanding and require expensive detection equipment to be integrated into the vehicle. Although the integrated sensor is not necessarily a direct measurement tool, the driver still needs to place a hand or another part of the body (such as an earlobe or finger) on the sensor in order to obtain accurate physiological signals, which is a key limitation for in-vehicle use. This paper takes a different approach and proposes an innovative framework. The basic principle is to capture images of the driver's face, collect facial feature points, and reconstruct the PPG signal in order to evaluate the HRV signal and the fatigue level.

3 Background and pipeline scheme

As mentioned before, we propose an innovative method for monitoring driver drowsiness that does not use sensors to acquire PPG signals. Previous research [8] explains how video magnification can reveal motion changes in the human face by magnifying ordinary video images, because blood pressure changes over successive cardiac cycles cause color changes in different parts of the skin. Studies have demonstrated that autonomic nervous system activity modulates certain physiological processes, such as blood pressure and breathing rate. This activity can be measured indirectly by assessing the HRV signal, which changes during periods of physiological stress, extreme fatigue, and drowsiness.

Assessing HRV requires biofeedback tools or software, high-quality sensors to detect ECG signals, and a powerful processor to handle the large amount of data. The ECG signal is the traditional basis for HRV assessment, but it has some drawbacks in practice: although detection quality is good, subtle body movements during data acquisition (sampling) introduce noise and artifacts into the signal. To overcome these problems, the PPG signal has been proposed as a reliable alternative. Because it detects changes in blood volume, PPG can effectively capture subtle skin changes that are difficult to observe with the naked eye. In particular, by analyzing the PPG signal we can trace changes in heart rate over a specific period of time, which shows whether both branches of the autonomic nervous system (parasympathetic and sympathetic) are functioning properly. Generally, a small HRV value indicates nearly constant intervals between heartbeats, while a large HRV value indicates irregular intervals. A very regular heart rhythm with only subtle changes in heart rate can indicate that attention is reduced due to chronic physiological stress. However, there is no single standard HRV value, because HRV values vary from person to person.
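To make the notion of a "small" or "large" HRV value concrete, the following minimal sketch computes two widely used time-domain HRV measures, SDNN and RMSSD, from a list of beat timestamps. The paper does not state which HRV measure it uses, so the choice of these two measures, the function name hrv_metrics, and the example timestamps are assumptions made purely for illustration.

```python
import numpy as np

def hrv_metrics(beat_times_s):
    """Return (mean heart rate in bpm, SDNN in ms, RMSSD in ms).

    beat_times_s: timestamps of successive heartbeats in seconds, for
    example the times of PPG peaks. Illustrative helper, not the
    paper's own code.
    """
    beat_times_s = np.asarray(beat_times_s, dtype=float)
    ibi_ms = np.diff(beat_times_s) * 1000.0  # inter-beat intervals in ms
    if ibi_ms.size < 2:
        raise ValueError("need at least three beats")
    mean_hr_bpm = 60000.0 / ibi_ms.mean()
    sdnn_ms = ibi_ms.std(ddof=1)  # overall spread of the intervals
    rmssd_ms = np.sqrt(np.mean(np.diff(ibi_ms) ** 2))  # beat-to-beat change
    return mean_hr_bpm, sdnn_ms, rmssd_ms

# Hypothetical data: perfectly regular beats, one every 0.8 s.
regular = np.arange(10) * 0.8
# Hypothetical data: the same rhythm with irregular intervals.
irregular = np.cumsum([0.0, 0.80, 0.85, 0.78, 0.90, 0.82, 0.76, 0.88])

for name, beats in (("regular", regular), ("irregular", irregular)):
    hr, sdnn, rmssd = hrv_metrics(beats)
    print(f"{name}: HR={hr:.1f} bpm, SDNN={sdnn:.1f} ms, RMSSD={rmssd:.1f} ms")
```

With the regular beats both measures are essentially zero (constant intervals), whereas the irregular beats yield clearly non-zero values. Since HRV varies from person to person, such values are only meaningful relative to a per-subject baseline.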

With this in mind, we developed a driver drowsiness monitoring system that combines a long short-term memory (LSTM) neural network [9] with a convolutional neural network (CNN) [10]. The pipeline proposed in this paper represents an advance in cardiac motion assessment methods, as it uses a low frame rate (25 fps) camera to detect and extract key feature points in face images and analyzes the pixel changes in each video frame. LSTM, in particular, is a powerful tool for capturing hidden nonlinear correlations in data.

Specifically, the output of the LSTM pipeline is the predicted time series of facial feature points after synthesizing the raw PPG target data detected by the sensor. In addition, the accurate classification of the CNN model indicates that the LSTM predictions are valid and can determine the level of attention of the car driver.

4 Experiments

In total, 71 subjects took part in the evaluation of our LSTM-CNN pipeline. More specifically, the dataset consists of PPG samples from patients/drivers of different gender, age (between 20 and 70 years), and pathology: we collected data not only from healthy subjects but also from patients with high blood pressure, diabetes, etc. To account for the difference between the two states, PPG signal samples were measured separately for each state. Specifically, we simulated two scenarios, full wakefulness and drowsiness, confirmed by synchronized ECG sampling signals, with Beta and Alpha waveforms confirming the state of brain activity during wakefulness and drowsiness, respectively. The simulation interval for each scenario was set to 5 minutes to ensure that the system had sufficient time to complete the initial calibration and continuous learning in real time. At the same time, we used a low frame rate (25 fps) full HD camera to record a video of the driver's face. As mentioned above, we first use the dlib library, whose landmark detector is based on the machine learning algorithm of Kazemi and Sullivan [11], to process the previously recorded video frames and extract facial feature points. We then calculate the pixel intensity associated with each feature point and its change from frame to frame, which defines the time series of facial feature points that is fed into the LSTM neural network.
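The step from landmarks to time series can be sketched as follows. This is a minimal illustration, assuming the landmark coordinates have already been produced by a detector such as dlib and that each landmark's intensity is taken as the mean grey level in a small window around it. The window size, the function name landmark_intensity_series, and the synthetic frames are assumptions; the paper does not specify these details.

```python
import numpy as np

def landmark_intensity_series(frames, landmarks, half_window=2):
    """Build per-landmark intensity time series from a video.

    frames:    array (T, H, W) of grayscale frames.
    landmarks: array (T, K, 2) of integer (x, y) pixel coordinates,
               assumed to come from a landmark detector such as dlib.
    Returns (series, delta): series has shape (T, K) and holds the mean
    intensity around each landmark; delta has shape (T - 1, K) and holds
    the frame-to-frame intensity change.
    """
    frames = np.asarray(frames, dtype=float)
    landmarks = np.asarray(landmarks, dtype=int)
    n_frames, height, width = frames.shape
    n_points = landmarks.shape[1]
    series = np.empty((n_frames, n_points))
    for t in range(n_frames):
        for k in range(n_points):
            # Clamp the landmark to the frame so the window is never empty.
            x = min(max(landmarks[t, k, 0], 0), width - 1)
            y = min(max(landmarks[t, k, 1], 0), height - 1)
            y0, y1 = max(y - half_window, 0), min(y + half_window + 1, height)
            x0, x1 = max(x - half_window, 0), min(x + half_window + 1, width)
            series[t, k] = frames[t, y0:y1, x0:x1].mean()
    return series, np.diff(series, axis=0)

# Synthetic stand-in for a 25 fps recording: the whole frame brightens and
# darkens at 1.2 Hz, mimicking a pulse-induced intensity change.
fps = 25
t = np.arange(50) / fps
pulse = 2.0 * np.sin(2 * np.pi * 1.2 * t)
frames = 100.0 + pulse[:, None, None] * np.ones((1, 48, 64))
# Three hypothetical landmarks that stay fixed over time.
landmarks = np.tile(np.array([[10, 12], [40, 30], [63, 47]]), (len(t), 1, 1))

series, delta = landmark_intensity_series(frames, landmarks)
print(series.shape, delta.shape)
```

Each column of series is one landmark's time series. In a real recording the landmark positions move from frame to frame, which is why the coordinates are indexed by time.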

4.1 CNN pipeline

This section describes the CNN model architecture used in the experiments in more detail. The CNN architecture proposed in this paper provides strong evidence for validating the LSTM predictions. Specifically, our CNN model is able to track and learn the facial expressions of car drivers, which improves drowsiness detection. To train the model, we set the batch size to 32 and the initial learning rate to 0.0001. Furthermore, we used 32 neurons in the hidden layer and 2 output neurons for binary classification.

We are very optimistic about the experimental results, as the accuracy rate reaches 80%.

4.2 Long Short-Term Memory (LSTM) pipeline

Fig. 1. LSTM pipeline

Exploiting the ability of Long Short-Term Memory (LSTM) networks to detect correlations in sequential data (time series), we built an LSTM model with the facial feature point time series as input data and the raw PPG signals as target data, in order to reconstruct the PPG signal (Fig. 1). After rescaling all time series values to the range (0.2, 0.8) with the MinMaxScaler algorithm, we trained the model with the following parameters: 256 neurons, a batch size of 128, an initial learning rate of 0.001, and a dropout rate of 0.2. To evaluate the robustness of the reconstructed PPG signal, we computed the Fourier spectrum of the PPG minima and compared the frequencies of the original PPG minima with those of the reconstructed PPG minima.
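The rescaling and the minima-based frequency comparison can be illustrated with NumPy alone. The sketch below is only one plausible reading of the procedure, since the paper does not spell it out: local minima are marked as an impulse train, the magnitude spectrum of that train is computed with the FFT, and the dominant frequency is searched within an assumed band of 0.5 to 2.0 Hz. All function names, the band limits, and the two synthetic signals are illustrative assumptions, and scale_to_range is a hand-written stand-in for the MinMaxScaler mentioned above.

```python
import numpy as np

def scale_to_range(x, low=0.2, high=0.8):
    """Linearly rescale x so that its minimum maps to low and its maximum to high."""
    x = np.asarray(x, dtype=float)
    span = x.max() - x.min()
    if span == 0:
        return np.full_like(x, (low + high) / 2)
    return low + (x - x.min()) * (high - low) / span

def local_minima(x):
    """Indices of interior samples lower than the left neighbour and not above the right one."""
    x = np.asarray(x, dtype=float)
    is_min = (x[1:-1] < x[:-2]) & (x[1:-1] <= x[2:])
    return np.flatnonzero(is_min) + 1

def minima_spectrum(x, fs):
    """Magnitude spectrum of an impulse train that marks the minima of x."""
    train = np.zeros(len(x))
    train[local_minima(x)] = 1.0
    train -= train.mean()  # remove the DC component
    freqs = np.fft.rfftfreq(len(x), d=1.0 / fs)
    return freqs, np.abs(np.fft.rfft(train))

def dominant_frequency(freqs, mag, band=(0.5, 2.0)):
    """Strongest frequency inside the assumed heart-rate band (in Hz)."""
    mask = (freqs >= band[0]) & (freqs <= band[1])
    return freqs[mask][np.argmax(mag[mask])]

# Synthetic stand-ins: an "original" 1.2 Hz pulse wave sampled at 25 Hz and a
# "reconstructed" version with different amplitude, phase, and shape.
fs = 25
t = np.arange(500) / fs
original = np.sin(2 * np.pi * 1.2 * t)
reconstructed = 0.9 * np.sin(2 * np.pi * 1.2 * t + 0.3) + 0.05 * np.sin(2 * np.pi * 2.4 * t)

scaled_orig = scale_to_range(original)
scaled_rec = scale_to_range(reconstructed)

f_orig = dominant_frequency(*minima_spectrum(scaled_orig, fs))
f_rec = dominant_frequency(*minima_spectrum(scaled_rec, fs))
print(f_orig, f_rec)
```

If the reconstruction preserves the cardiac rhythm, the two dominant frequencies coincide even when amplitude and phase differ. The synthetic signals here are noise-free; real signals would need smoothing before the minima are extracted, because noise creates spurious local minima.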

5 Conclusion

Figure 2. Fast Fourier Transform (FFT) spectrum of the original PPG minimum point (blue) and FFT of the reconstructed PPG minimum point (green).

In conclusion, we present an effective LSTM-CNN-based monitoring system that determines the driver's drowsiness by assessing cardiac activity through PPG signals. Unlike other methods, our method reconstructs the PPG signal from facial landmark data and does not involve sensor systems. As described earlier, we built an LSTM pipeline that uses the facial feature point time series as input data and the PPG detected by sensors as target data, in order to demonstrate the robustness of the reconstructed PPG signal. In addition, we built a CNN model that not only classifies the driver's physiological state but also validates the LSTM predictions. Finally, we computed the Fast Fourier Transform (FFT) spectrum of the original PPG minima and of the reconstructed PPG minima (Fig. 2). The experimental results demonstrate the promise of our method, as we are able to distinguish drowsy subjects from awake subjects with nearly 100% accuracy, which is consistent with the average results achieved by similar pipelines reported in the scientific literature. The authors are currently investigating what improvements an improved PPG sensor [12] and the Stacked-AutoEncoder architecture [13] would bring to the pipeline proposed in this paper.
