Developing predictive precision medicine models by exploiting real-world data using machine learning methods (Theocharopoulos et al., n.d.)

Computational Medicine encompasses the application of Statistical Machine Learning and Artificial Intelligence methods to several traditional medical approaches, including biochemical testing, which is extremely valuable for both early disease prognosis and long-term individual monitoring because it provides important information about a person’s health status. However, using Statistical Machine Learning and Artificial Intelligence algorithms to analyze biochemical test data from Electronic Health Records requires several preparatory steps, such as data manipulation and standardization. This study presents a novel approach for using Electronic Health Records from large, real-world databases to develop predictive precision medicine models with Artificial Intelligence. To demonstrate the effectiveness of this approach, we compare the performance of various traditional Statistical Machine Learning and Deep Learning algorithms in predicting individuals’ future biochemical test outcomes. Specifically, using data from a large real-world database, we exploit a longitudinal format of the data to predict the future values of 15 biochemical tests and identify individuals at high risk. The proposed approach and the extensive model comparison contribute to the personalized approach that modern medicine aims to achieve.
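
As a rough illustration of what a longitudinal format can look like, the sketch below turns per-visit rows into (past values, next value) pairs that a regression model could be trained on. The record layout, the function name `build_longitudinal_pairs`, the `history` parameter, and the toy values are all assumptions made for this example; the abstract does not describe the study's actual preprocessing.

```python
from collections import defaultdict


def build_longitudinal_pairs(records, history=2):
    """Turn (patient_id, visit_date, value) rows into training pairs.

    Each pair holds the `history` most recent values of one test as input
    and the following value as the prediction target. Hypothetical layout:
    the study's real schema is not given in the abstract.
    """
    by_patient = defaultdict(list)
    for patient_id, visit_date, value in records:
        by_patient[patient_id].append((visit_date, value))

    pairs = []
    for patient_id, visits in by_patient.items():
        visits.sort()  # ISO-formatted dates sort chronologically as strings
        values = [value for _, value in visits]
        # Slide a window over the series; patients with too few visits
        # contribute no pairs.
        for i in range(history, len(values)):
            pairs.append((patient_id, values[i - history:i], values[i]))
    return pairs


# Made-up toy rows for a single, unnamed test.
records = [
    ("p1", "2020-01-10", 92.0),
    ("p1", "2021-01-12", 98.0),
    ("p1", "2022-01-15", 105.0),
    ("p1", "2023-01-11", 110.0),
    ("p2", "2021-03-02", 88.0),
    ("p2", "2022-03-05", 90.0),
]
pairs = build_longitudinal_pairs(records, history=2)
```

Here patient `p1` yields two pairs and `p2`, with only two visits, yields none. In practice one such table would be built per biochemical test.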

TarBase-v9.0 extends experimentally supported miRNA-gene interactions to cell-types and virally encoded miRNAs (Skoufos et al. 2023)

TarBase is a reference database dedicated to producing, curating and delivering high-quality, experimentally supported microRNA (miRNA) targets on protein-coding transcripts. In its latest version (v9.0), it pushes the envelope by introducing virally encoded miRNAs, interactions leading to target-directed miRNA degradation (TDMD) events and the largest collection of miRNA-gene interactions to date in a plethora of experimental settings, tissues and cell-types. It catalogues 6 million entries, comprising 2 million unique miRNA-gene pairs, supported by 37 experimental (high- and low-yield) protocols in 172 tissues and cell-types. Interactions are annotated with rich metadata including information on genes/transcripts, miRNAs, samples, experimental contexts and publications, while millions of miRNA-binding locations are also provided at cell-type resolution. A completely re-designed interface, built with state-of-the-art web technologies, incorporates more features and allows flexible and ingenious use. The new interface supports sophisticated queries with numerous filtering criteria, including cell lines, experimental conditions, cell types, experimental methods, species and/or tissues of interest. Additionally, a plethora of fine-tuning capacities have been integrated into the platform, allowing the returned interactions to be refined by miRNA confidence and expression levels, while unrestricted local retrieval of the offered interactions and metadata is enabled.

Neural Networks Voting for Projection Based Ensemble Classifiers (P. Anagnostou et al. 2023)

Ensemble learning has been proven effective in enhancing classification accuracy by aggregating predictions from multiple base classifiers. This paper introduces a novel approach to augmenting weak projection-based classifiers using a Neural Network within a stacking ensemble framework. The proposed method capitalizes on the diverse strengths of both linear and complex models, harnessing the interpretability of projection-based classifiers while leveraging the pattern recognition capabilities of Neural Networks. We present a comprehensive algorithm involving dataset selection, preprocessing, base model training, meta-feature generation, and Neural Network architecture design and training. Extensive experiments demonstrate the efficiency of our approach on a variety of high-dimensional biomedical datasets. Our results showcase significant accuracy improvements over standalone projection-based classifiers and conventional ensemble methods. We analyze the interpretability of the hybrid ensemble, shedding light on the insights drawn from its Neural Network component. This work not only advances the field of ensemble learning, but also underscores the potential of combining disparate classifier paradigms to achieve superior predictive performance. The code for this study is available at https://github.com/panagiotisanagnostou/NNv-MRPV.
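
The stacking pipeline named in the abstract (base model training, meta-feature generation, meta-learner training) can be sketched roughly as follows. This is not the authors' implementation, which lives in the repository linked above. The weak learners here are random one-dimensional projections with a midpoint threshold, the meta-learner is a single sigmoid unit standing in for the paper's Neural Network, and the data are synthetic Gaussians. Every function name and parameter is a hypothetical choice for this example.

```python
import math
import random

rng = random.Random(0)  # fixed seed so the sketch is reproducible


def project(x, w):
    """Dot product of a sample with a projection (or weight) vector."""
    return sum(xi * wi for xi, wi in zip(x, w))


def fit_projection_classifier(X, y, dim):
    """Weak learner: random 1-D projection with a midpoint threshold."""
    w = [rng.gauss(0, 1) for _ in range(dim)]
    z0 = [project(x, w) for x, t in zip(X, y) if t == 0]
    z1 = [project(x, w) for x, t in zip(X, y) if t == 1]
    m0, m1 = sum(z0) / len(z0), sum(z1) / len(z1)
    sign = 1.0 if m1 >= m0 else -1.0  # orient so class 1 projects higher
    return w, (m0 + m1) / 2.0, sign


def base_predict(model, x):
    w, threshold, sign = model
    return 1 if sign * (project(x, w) - threshold) > 0 else 0


def fit_meta_neuron(M, y, epochs=100, lr=0.1):
    """Single sigmoid unit trained by gradient descent on the log loss.

    Assumption: a one-unit stand-in for a multi-layer meta-learner.
    """
    weights, bias = [0.0] * len(M[0]), 0.0
    for _ in range(epochs):
        for m, t in zip(M, y):
            z = max(-30.0, min(30.0, project(m, weights) + bias))  # avoid overflow
            err = 1.0 / (1.0 + math.exp(-z)) - t
            weights = [wi - lr * err * mi for wi, mi in zip(weights, m)]
            bias -= lr * err
    return weights, bias


def meta_predict(meta, m):
    weights, bias = meta
    return 1 if project(m, weights) + bias > 0 else 0


# Synthetic two-class data: 20 samples per class in 10 dimensions.
dim, n_per_class, n_base = 10, 20, 5
X = [[rng.gauss(c, 1.0) for _ in range(dim)] for c in (0, 1) for _ in range(n_per_class)]
y = [c for c in (0, 1) for _ in range(n_per_class)]

models = [fit_projection_classifier(X, y, dim) for _ in range(n_base)]
# Meta-features: one column of base-classifier votes per weak learner.
M = [[base_predict(model, x) for model in models] for x in X]
meta = fit_meta_neuron(M, y)
preds = [meta_predict(meta, m) for m in M]
```

For brevity the meta-features are computed on the same samples used to fit the base models. A real stacking setup would normally generate them out-of-fold so the meta-learner does not inherit the base models' training-set optimism.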

Two phase cooperative learning for supervised dimensionality reduction (Nellas et al. 2023)

The simultaneous minimization of the reconstruction and classification error is a hard non-convex problem, especially when a non-linear mapping is utilized. To overcome this obstacle, motivated by the widespread success of Cooperative Neural Networks, an innovative supervised dimensionality reduction framework is proposed, based on a cooperative two-phase optimization strategy. Specifically, the proposed framework, which requires minimal parameter adjustment, consists of an autoencoder for dimensionality reduction and a separator network for assessing the separability of the embedding. This scheme results in meaningful and discriminable codes, which are optimized for the classification task and are exploitable by any trainable classifier. The experimental results showed that the proposed methodology achieved competitive results against state-of-the-art methods, while being much more efficient in terms of parameter count. Finally, it was empirically shown that the proposed methodology introduces advanced behavioural explainability, while also being applicable to image generation tasks.

Ensemble Clustering for Boundary Detection in High-Dimensional Data (Panagiotis Anagnostou, Pavlidis, and Tasoulis 2024)

The emergence of novel data collection methods has led to the accumulation of vast amounts of unlabelled data. Discovering well-separated groups of data samples through clustering is a critical but challenging task. In recent years, various techniques to detect isolated and boundary points have been developed. In this work, we propose a clustering methodology that enables us to discover boundary data effectively, discriminating them from outliers. The proposed methodology utilizes a well-established density-based clustering method designed for high-dimensional data to develop a new ensemble scheme. The experimental results demonstrate very good performance, indicating that the approach has the potential to be used in diverse domains.

References

Anagnostou, Panagiotis, Nicos G. Pavlidis, and Sotiris Tasoulis. 2024. “Ensemble Clustering for Boundary Detection in High-Dimensional Data.” In Machine Learning, Optimization, and Data Science, edited by Giuseppe Nicosia, Varun Ojha, Emanuele La Malfa, Gabriele La Malfa, Panos M. Pardalos, and Renato Umeton, 324–33. Cham: Springer Nature Switzerland.

Anagnostou, P., P. Barmpas, S. K. Tasoulis, S. V. Georgakopoulos, and V. P. Plagianakos. 2023. “Neural Networks Voting for Projection Based Ensemble Classifiers.” In 2023 IEEE International Conference on Big Data (BigData), 4567–74. Los Alamitos, CA, USA: IEEE Computer Society. https://doi.org/10.1109/BigData59044.2023.10386944.

Nellas, Ioannis A., Sotiris K. Tasoulis, Spiros V. Georgakopoulos, and Vassilis P. Plagianakos. 2023. “Two Phase Cooperative Learning for Supervised Dimensionality Reduction.” Pattern Recognition 144: 109871. https://doi.org/10.1016/j.patcog.2023.109871.

Skoufos, Giorgos, Panos Kakoulidis, Spyros Tastsoglou, Elissavet Zacharopoulou, Vasiliki Kotsira, Marios Miliotis, Galatea Mavromati, et al. 2023. “TarBase-v9.0 extends experimentally supported miRNA-gene interactions to cell-types and virally encoded miRNAs.” Nucleic Acids Research 52 (D1): D304–10. https://doi.org/10.1093/nar/gkad1071.