Computer control with 9 hand gestures using a standard webcam
Abstract
Computer vision, which automates human visual capabilities, has become a promising field with applications across industries, including human-computer interaction that requires no contact with external control systems or peripherals. This paper presents a solution that recognizes hand gestures by analyzing three-dimensional landmarks located at the hand joints. These points are obtained with a standard webcam and serve as input to an artificial neural network that identifies nine different gestures. A network architecture was designed, a proprietary dataset was created, and the network was trained on it. In addition, a data pre-processing stage was implemented to normalize and transform the landmarks, improving the performance of the proposed model. Evaluation of the model showed a 99.87% recognition accuracy across the nine gestures. The model is implemented in an application called "Hand Controller", which allows the keyboard and mouse of a computer to be controlled through hand gestures and movements, achieving high real-time gesture-recognition performance.
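The abstract does not spell out the pipeline, so the following is only a minimal sketch of the kind of approach it describes, under stated assumptions: a hand-landmark detector such as MediaPipe Hands extracts 21 three-dimensional landmarks per hand from webcam frames, the landmarks are normalized (here, hypothetically, translated to the wrist and scaled), and a small dense classifier maps the resulting feature vector to one of nine gestures. The normalization scheme, layer sizes, and training settings below are illustrative assumptions, not the paper's exact configuration.

```python
# Sketch only: landmark extraction, hypothetical normalization, and an assumed
# nine-class dense classifier. Not the paper's exact architecture or settings.
import cv2
import numpy as np
import mediapipe as mp
from tensorflow import keras

def extract_landmarks(frame_bgr, hands):
    """Return a (21, 3) array of 3D hand landmarks, or None if no hand is found."""
    result = hands.process(cv2.cvtColor(frame_bgr, cv2.COLOR_BGR2RGB))
    if not result.multi_hand_landmarks:
        return None
    lm = result.multi_hand_landmarks[0].landmark
    return np.array([[p.x, p.y, p.z] for p in lm], dtype=np.float32)

def normalize(points):
    """Hypothetical pre-processing: translate to the wrist (landmark 0) and rescale."""
    points = points - points[0]
    scale = float(np.max(np.linalg.norm(points, axis=1))) or 1.0
    return (points / scale).reshape(-1)          # 63-dimensional feature vector

# Assumed classifier: a small dense network over the 63 landmark coordinates
# with nine softmax outputs, one per gesture. Layer sizes are illustrative only.
model = keras.Sequential([
    keras.layers.Input(shape=(63,)),
    keras.layers.Dense(128, activation="relu"),
    keras.layers.Dropout(0.2),
    keras.layers.Dense(64, activation="relu"),
    keras.layers.Dense(9, activation="softmax"),
])
model.compile(optimizer="adam",
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])

if __name__ == "__main__":
    cap = cv2.VideoCapture(0)                    # standard webcam
    with mp.solutions.hands.Hands(max_num_hands=1,
                                  min_detection_confidence=0.5) as hands:
        ok, frame = cap.read()
        if ok:
            points = extract_landmarks(frame, hands)
            if points is not None:
                probs = model.predict(normalize(points)[None, :], verbose=0)
                print("predicted gesture class:", int(np.argmax(probs)))
    cap.release()
```

In the application described by the paper, a trained model of this kind would run on every frame and the predicted gesture would be translated into keyboard and mouse actions; the training step itself (on the proprietary dataset of nine gestures) is omitted from this sketch.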