
One of the main goals in many recent data analysis projects is the classification of samples or individuals into predefined groups according to the available characteristics. Several approaches have been proposed to deal with this problem. Statistical methods are usually based on the evaluation of a scoring function that needs distributional assumptions, such as Fisher Linear Discriminant Analysis (LDA) or Logistic Regression.
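As a point of reference, the sketch below (an illustration added here, not code from the article) fits these two classical, distribution-based classifiers in R on the Default data from the ISLR package, one of the data sets used later in the paper; the variable names default, balance and income come from that package.

```r
## Illustrative only: the two classical scoring-based classifiers named above,
## fitted on the Default data shipped with the ISLR package.
library(ISLR)   # Default data: default (Yes/No), student, balance, income
library(MASS)   # lda()

lda_fit <- lda(default ~ balance + income, data = Default)   # Fisher LDA
glm_fit <- glm(default ~ balance + income, data = Default,
               family = binomial)                             # Logistic Regression

## Both methods produce a score that is thresholded to assign each individual to a group.
head(predict(lda_fit)$class)
head(ifelse(predict(glm_fit, type = "response") > 0.5, "Yes", "No"))
```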

The high number of variables and the diverse types of distributional assumptions are challenging topics that researchers try to solve with non-distributional approaches. Mathematical programming is a natural way of dealing with the classification problem regardless of distributional assumptions. In this sense, linear programming based methods look for a linear function that separates the classes, avoiding parameter estimation. Support Vector Machine (SVM) is the most popular classification method based on hyperplanes, and it can be extended to nonlinear separating functions such as polynomial or radial kernels. In […] we find a discussion of mathematical optimization techniques proposed for SVM, and […] reviews and compares supervised classification methods related to optimization. These publications and others, such as […], demonstrate the existing interest in addressing the classification problem through mathematical programming. We can also mention the Machine Learning approach, where we find alternative methods such as Decision Trees, CART or Random Forest, and the Neural Networks approach. This approach tries to find a stepwise rule that combines the best ranking variables in a training set, also ignoring distributional assumptions. All these approaches could be considered complementary rather than competitive. Machine learning approaches are useful for classification when dealing with high dimensional data sets, but Logistic Regression or LDA are preferable for interpreting variable influence. SVM is an effective method in different situations. When dealing with small dimensions, the flexibility of the separating function can help to find a perfect separation; however, with high dimensional data over-fitting problems can emerge and, as mentioned in […], there is no need for the additional flexibility that these models give, the linear function being a good option. In this work we propose an efficient alternative to the available classification methods in R without distributional assumptions. We formulate an optimization problem to find a discriminating hyperplane between two data sets that can be used to classify new individuals. The method has also been extended to the case of more than two groups by making pairwise comparisons. In addition, to avoid over-fitting problems due to noisy data or high dimensional data sets, we consider Principal Components Analysis (PCA) to focus on the main sources of variation, avoiding the noise.
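To make the linear programming idea concrete, the following is a minimal sketch (an illustration, not the LPDA formulation proposed in this paper) of finding a separating hyperplane w'x + b with the lpSolve package; the function name lp_hyperplane and the simulated data are introduced only for this example.

```r
## Minimal LP-based separating hyperplane: minimise the misclassification
## slacks e_i subject to y_i * (w'x_i + b) + e_i >= 1.  Since lpSolve assumes
## non-negative decision variables, w and b are split into positive and
## negative parts.
library(lpSolve)

lp_hyperplane <- function(X, y) {             # X: n x p matrix, y in {-1, +1}
  n <- nrow(X); p <- ncol(X)
  obj <- c(rep(0, 2 * p + 2), rep(1, n))      # variables: w+, w-, b+, b-, e
  A   <- cbind(y * X, -y * X, y, -y, diag(n)) # one constraint row per point
  sol <- lp(direction = "min", objective.in = obj,
            const.mat = A, const.dir = rep(">=", n), const.rhs = rep(1, n))
  w <- sol$solution[1:p] - sol$solution[(p + 1):(2 * p)]
  b <- sol$solution[2 * p + 1] - sol$solution[2 * p + 2]
  list(w = w, b = b)                          # classify new x by sign(w'x + b)
}

## Toy usage on simulated data.
set.seed(1)
X <- rbind(matrix(rnorm(40, mean = 0), 20), matrix(rnorm(40, mean = 3), 20))
y <- rep(c(-1, 1), each = 20)
fit <- lp_hyperplane(X, y)
table(y, predicted = sign(X %*% fit$w + fit$b))
```

In the same spirit as the proposal above, X could be replaced by its first few PCA scores (for instance prcomp(X)$x[, 1:k]) so that the hyperplane is sought on the main sources of variation rather than on the noisy original variables.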

Citation: Nueda MJ, Gandía C, Molina MD (2022) LPDA: A new classification method based on linear programming. PLoS ONE 17(7).

Editor: Dartmouth College Geisel School of Medicine, UNITED STATES

Received: Janu…; Accepted: J…; Published: July 7, 2022

Copyright: © 2022 Nueda et al. This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.

Data Availability: Regarding the datasets we have used in the paper to illustrate the method:
- Palmdates data is within the lpda package, available in […], which is flexible for new versions of packages. It was not the intention to publish the data alone, but in the package environment, where the manual explains how to use it. We would also like to include the lpda package in […] as soon as possible.
- Cervical cancer data is an available dataset in GEO, as mentioned in the paper.
- Default data is available in the ISLR package, as mentioned in the paper.

Funding: This research has been partially supported by Generalitat Valenciana, Grant GV/2017/177.

Competing interests: The authors have declared that no competing interests exist.
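For completeness, a brief illustration of how these public data sets can be loaded in R; GEOquery is a standard Bioconductor tool for GEO downloads, not something prescribed by the article, and the GEO accession is not reproduced here.

```r
## Illustrative only: accessing the public data sets mentioned above.
library(ISLR)      # the Default data ships with the ISLR package
data(Default)
str(Default)

## The cervical cancer data can be retrieved from GEO with Bioconductor's
## GEOquery, using the accession reported in the paper, for example:
## library(GEOquery); gse <- getGEO("GSE.....")   # accession omitted here
```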
