Report
Mobile App for Ergonomic Joint Angle Analysis of Real-Time and Recorded Visual Data
Introduction
Repetitive motion is inherent in many physical tasks, for example in food production activities, especially at scale. Over the past decade there has been a steady trend of emergency hospital visits due to musculoskeletal injuries, according to a 2019 report by the U.S. Bureau of Labor Statistics. These injuries can occur in the workplace through sudden, sharp movements of the body or through long-term improper ergonomic practice.
Data Scope
This study primarily aims to distinguish between three critical culinary tasks: chopping, slicing, and sawing. The difference between these tasks, both in performance and in analysis, lies in subtle variations in hand motion and in how the food is held and handled; for example, a potato that is chopped ends up different from one that is sliced. Li et al. (2020) proposed SEE, a proactive, strategy-centric, deep-learning-based ergonomic risk assessment system for risky posture recognition, and used automated deep learning algorithms such as Sequential Convolutional Neural Networks (S-CNNs) for validation. A model that can distinguish between these tasks can help identify ergonomic departures from nominal expectations at a more granular level.
Data Collection
The experiment was performed by 5 participants (3 males and 2 females) between 19 and 24 years old (M = 21.8, SD = 2.28). An initial screening confirmed their eligibility based on age (minimum 18 years old) and physical ability to perform the culinary tasks. The experimental protocol used in this study was approved by Arizona State University (ASU). The tasks were performed using everyday food and culinary equipment found in an average household or restaurant. The participants were given initial instructions about basic expectations and safety guidelines for avoiding injuries with sharp tools, while three regular video cameras (See Figure 1) recorded their task performance.
Preliminary Data Analysis
Pose estimation was carried out using landmark points obtained with OpenCV, from which the locations of the joints of interest, involving upper limb and neck motion, were calculated. Based on the participants' working style and culinary techniques, the joint angle data were automatically processed and compared against the Assessment of Repetitive Tasks (ART) manual tool to rate ergonomic risk (Health and Safety Executive, 2010). This automated pipeline uses software employing computer vision and pose estimation algorithms. Its output was compared against a manual ergonomic assessment by an independent grader, and a correlation of more than 95% between manual and automated grading was observed for three ART-C categories.
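To make the joint-angle step concrete, the Python snippet below is a minimal sketch (not the project's actual code) of how the angle at a joint such as the elbow can be computed from three 2-D landmark coordinates; the landmark-extraction stage is abstracted away, and the pixel coordinates and function name are illustrative assumptions.

import numpy as np

def joint_angle(a, b, c):
    """Return the angle at point b (in degrees) formed by segments b->a and b->c."""
    a, b, c = (np.asarray(p, dtype=float) for p in (a, b, c))
    v1, v2 = a - b, c - b
    cos_theta = np.dot(v1, v2) / (np.linalg.norm(v1) * np.linalg.norm(v2))
    return float(np.degrees(np.arccos(np.clip(cos_theta, -1.0, 1.0))))

# Illustrative shoulder, elbow, and wrist pixel coordinates for one video frame.
shoulder, elbow, wrist = (310, 120), (340, 230), (430, 250)
elbow_angle = joint_angle(shoulder, elbow, wrist)
print(f"Elbow angle: {elbow_angle:.1f} degrees")  # this value feeds the ART risk rating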
S-CNN Model
The tasks were classified into 12 classes (See Figure 2). An S-CNN was trained using the data collected from the participants performing the culinary tasks. It is designed to analyze and extract features from spatial data in images using convolutional layers. The S-CNN was trained on images generated by extracting frames from the video data collected from participants performing the delegated tasks. The S-CNN (See Figure 3) learns to recognize different aspects of the task, such as which hand is dominant, how the food is held before and during the task, and the overall presentation of the finished product. By training an S-CNN on this type of data, we gain insight into how people perform culinary tasks and how different techniques and ingredients are used depending on their expertise and experience. This information is then used to predict ergonomic risk and to classify task videos that the model has not seen during training.
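The report does not specify the exact architecture, so the following Python sketch shows only one plausible implementation of such a sequential CNN; TensorFlow/Keras is assumed from the format of the training log quoted below, and the input resolution, layer sizes, and optimizer are illustrative assumptions rather than the study's actual settings.

import tensorflow as tf
from tensorflow.keras import layers, models

NUM_CLASSES = 12             # task classes as shown in Figure 2
INPUT_SHAPE = (128, 128, 3)  # assumed size of the frames extracted from the videos

model = models.Sequential([
    tf.keras.Input(shape=INPUT_SHAPE),
    layers.Rescaling(1.0 / 255),  # scale pixel values to [0, 1]
    layers.Conv2D(32, 3, activation="relu"),
    layers.MaxPooling2D(),
    layers.Conv2D(64, 3, activation="relu"),
    layers.MaxPooling2D(),
    layers.Conv2D(128, 3, activation="relu"),
    layers.MaxPooling2D(),
    layers.Flatten(),
    layers.Dense(128, activation="relu"),
    layers.Dropout(0.5),
    layers.Dense(NUM_CLASSES, activation="softmax"),
])

model.compile(optimizer="adam",
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])

# Training would use frames extracted from the participant videos, e.g.:
# history = model.fit(train_ds, validation_data=val_ds, epochs=30)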
The overall accuracy of the model converged between 97% and 99% (See Figure 4) for the classification of unseen data. The model was trained over 30 epochs; the output of the final training epoch is shown below as an example:
Epoch 30/30: loss: 0.0258 - accuracy: 0.9931 - val_loss: 0.0026 - val_accuracy: 0.9996
Figure 4: Training and validation (a) loss and (b) accuracy of the model for ambiguous (chopping, slicing, and sawing) tasks.
Ongoing Tasks
The above-mentioned processing pipeline is being ported to the mobile (iOS) environment. An application (See Figure 5) is under development and testing to record and save video data; graphical representation of the data within the app is also in progress.
An external dataset (UCF101) has also been introduced as test data for this model, and results will be produced after retraining the model. According to the University of Central Florida (UCF), UCF101 offers the largest diversity in terms of actions, with large variations in camera motion, object appearance and pose, object scale, viewpoint, cluttered background, and illumination conditions, comprising 13,320 videos from 101 action categories.
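As a rough illustration of how such external videos could be fed to the trained classifier, the Python sketch below samples frames from a video file with OpenCV and aggregates per-frame predictions; the file path, sampling stride, and frame size are illustrative assumptions, not the study's actual settings.

import cv2
import numpy as np

def sample_frames(video_path, stride=10, size=(128, 128)):
    """Read every `stride`-th frame, convert to RGB, resize, and return a batch array."""
    cap = cv2.VideoCapture(video_path)
    frames, index = [], 0
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        if index % stride == 0:
            frame = cv2.cvtColor(frame, cv2.COLOR_BGR2RGB)
            frames.append(cv2.resize(frame, size))
        index += 1
    cap.release()
    return np.stack(frames) if frames else np.empty((0, *size, 3))

# frames = sample_frames("UCF101/v_CuttingInKitchen_g01_c01.avi")  # illustrative path
# probs = model.predict(frames)              # per-frame class probabilities
# video_label = probs.mean(axis=0).argmax()  # average over frames, then take the argmax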
iOS App UI
We opted for a very simple UI for our ART Ergonomics application, so that even a first-time user can understand the purpose of the app as soon as they look at it.
The Home Screen has only four CTAs:
- "Live Posture" lets the user perform posture analysis on the live view from the iPhone camera.
- "Analyze Video(s)" lets the user perform batch ergonomic analysis of recorded videos stored on the phone.
- "Camera", built into the app, makes it easy to record videos by removing the need to close the app, open the phone's camera app, record, and then upload the video for ART analysis.
- "About" provides information such as the purpose of the ART Ergonomics application, the GitHub code, and contact and author details.
Overall, the ART Ergonomics app offers an enhanced user experience compared to other apps on the market.