LEARNING STYLE DETECTION USING K-MEANS CLUSTERING

Learning theorists have established that learners can be characterized by their distinct learning styles. Investigating learners' learning styles is important in an educational system in order to provide adaptivity and improve the learning experience. Past research has proposed various approaches to detect learning styles. Among unsupervised learning methods, K-means clustering has emerged as a widely used method for finding patterns in data because of its simplicity. This paper evaluates the performance of K-means clustering in automatically detecting learners’ learning styles in an online learning environment. The experimental results indicate differences in learning behaviour, which allows learners to be characterized by learning style.


INTRODUCTION
Rapid advancements in technology have contributed greatly to various sectors, including education. Innovation in technology brings new approaches that influence the learning process and improve the educational system. For instance, many institutions employ electronic learning (e-learning), allowing learners from all around the world to participate in learning irrespective of physical barriers. A learning management system (LMS) is a platform that allows instructors to set up educational courses and supports online learning delivery. Although traditional LMSs support online learning effectively, they are limited in their ability to personalize learning. Personalization is crucial to an LMS because each learner has individual needs and characteristics, such as different prior knowledge, cognitive abilities, learning styles and motivation (Graf, 2007).
A learning style refers to the way people receive, process, evaluate, understand, and utilize information in learning (Battaglia, 2008). Educational theories argue that providing courses that fit the individual characteristics of learners makes learning easier for them and thus increases their learning progress (Graf, 2007). Obtaining information about learners' learning styles can help instructors apply appropriate teaching methods to improve the learning process and experience.
There are two main types of learning style detection methods: (i) static learning style detection, which is based on a static instrument such as a questionnaire, and (ii) automatic learning style detection, which infers preferences and needs from learning behaviour and actions. Although the static detection method is simple, it depends on learners filling in lengthy questionnaires from which learning styles are deduced, which might lead to poor accuracy. A better approach is automatic detection, which requires little effort from learners and captures learning styles from their learning behaviour and actions.
Several recent studies have addressed the problem of identifying learning styles in order to personalize the learning experience. Li and Abdul (2018a) proposed a tree-augmented naive Bayesian method for detecting students' learning styles in an online learning environment; their results show higher detection accuracy than a Bayesian network. Petchboonmee et al. (2015) implemented a decision tree for learning style prediction based on the Kolb learning style model. Cha et al. (2006) employed decision trees and hidden Markov models to detect learning styles according to the Felder-Silverman learning style model (FSLSM). García et al. (2007) observed the behaviour of learners during an online course to show the effectiveness of Bayesian networks for identifying learning styles from student behaviour. Hao et al. (2020) proposed a learning style classifier based on the deep belief network (DBN) for large-scale online education; their approach improves the DBN model and analyses each individual's learning style features to identify and classify students' learning styles. Chang et al. (2009) introduced a method that combines k-nearest neighbour classification with a genetic algorithm to classify and then identify students' learning styles. Graf et al. (2008) developed an automatic approach for identifying learning styles with respect to the FSLSM by inferring learning styles from learners' behaviour during an online course. This paper presents an automatic approach to detecting learning styles using K-means clustering. The detection process starts with setting the variables for the study based on the FSLSM (Felder & Silverman, 1988).

Felder Silverman Learning Style Model
The Felder-Silverman learning style model (FSLSM) (Felder & Silverman, 1988) characterizes learners by values on four dimensions, and each dimension is viewed independently of the others. In this paper, FSLSM is considered for the study because it is a powerful and reliable model for the computer-based analysis of learners' learning styles (Prabhani et al., 2013). The four learning style dimensions are:
i. Procession dimension (active/reflective): Active learners learn best by working actively with the learning material and by trying things out. Furthermore, active learners tend to learn by communicating with others and prefer to learn by working in large groups. In contrast, reflective learners prefer to think about and reflect on the material. They also prefer to work alone or in a small group.
ii. Perception dimension (sensing/intuitive): Learners with a sensing learning style lean towards factual, detailed and concrete learning material. They like to solve practical, real-world problems. In contrast, intuitive learners prefer abstract and theoretical learning material and favour conceptual instances as a source of information.
iii. Input dimension (visual/verbal): This dimension deals with the preferred sensory channel through which environmental inputs are perceived. Visual learners like information presented in a visual form such as pictures, graphs, diagrams and flowcharts. Verbal learners tend to be more interested in material presented in textual form.
iv. Understanding dimension (sequential/global): Sequential learners follow a consecutive reasoning process to understand problems and therefore have a linear learning progress. In contrast, global learners tend to learn randomly and in large leaps. Global learners tend to be more interested in an overview of the learning material, while sequential learners tend to absorb learning material presented in a linear sequence.

FUDMA Journal of Sciences (FJS), ISSN online: 2616-1370, ISSN print: 2645-2944, Vol. 4 No. 3, September 2020

K-Means Clustering
Jain and Dubes (1988) define clustering as the process of classifying objects into subsets that have meaning in the context of a particular problem. K-means clustering (Morissette & Chartier, 2013) is an unsupervised learning algorithm that iteratively assigns each data point to one of k clusters based on feature similarity. Different metrics, such as Euclidean distance and Manhattan distance, can be used to calculate this similarity.
Euclidean distance estimates the distance between a cluster centre $c$ and a data point $x$. It is calculated as follows:

$$ \operatorname{dist}(x, c) = \sqrt{\sum_{i=1}^{d} (x_i - c_i)^2} $$

where $i$ indexes the dimensions of $c$ and $x$, and $d$ is the total number of dimensions.
Given a set of data points $\{x_1, x_2, \dots, x_n\} \subset \mathbb{R}^d$ and a set of $k$ cluster centres $\{c_1, c_2, \dots, c_k\} \subset \mathbb{R}^d$, where $\mathbb{R}^d$ is the data space of $d$ dimensions, the K-means algorithm aims to find a set of cluster centres that solves a minimization problem, namely minimizing the total squared distance between each data point and its nearest centre (Morissette & Chartier, 2013). Table 1 below describes the K-means clustering algorithm (Morissette & Chartier, 2013).
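As a small illustration of the two metrics mentioned above, the following sketch computes the Euclidean and Manhattan distances between one data point and two cluster centres and picks the nearest centre. The function names and the sample values are hypothetical and chosen only for this example; they are not taken from the study's data.

```python
import numpy as np

def euclidean(x, c):
    # Square root of the summed squared differences over all dimensions.
    return float(np.sqrt(np.sum((x - c) ** 2)))

def manhattan(x, c):
    # Sum of absolute differences over all dimensions.
    return float(np.sum(np.abs(x - c)))

def nearest_centre(x, centres, metric=euclidean):
    # Index of the cluster centre with the smallest distance to x.
    distances = [metric(x, c) for c in centres]
    return int(np.argmin(distances))

# Hypothetical two-dimensional data point and cluster centres.
x = np.array([1.0, 2.0])
centres = np.array([[0.0, 0.0], [4.0, 6.0]])

print(euclidean(x, centres[1]))    # 5.0
print(manhattan(x, centres[1]))    # 7.0
print(nearest_centre(x, centres))  # 0
```

The choice of metric can change which centre is considered nearest, which is why the metric is fixed before clustering starts.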

Table 1: K-means Algorithm
Step 1: Choose the number of clusters
Step 2: Choose the metric to use
Step 3: Choose method to pick initial centroids
Step 4: Assign initial centroids
Step 5: While metric(centroids, data points) > threshold
   a. For i <= data points: assign data points to closest cluster according to metric
   b. Recalculate the centroids
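The steps in Table 1 can be sketched in a few lines of NumPy. This is a minimal illustration under stated assumptions, not the implementation used in the study: the Euclidean metric is assumed, initial centroids are drawn at random from the data points, and the loop condition is interpreted as "stop once the centroids move less than a threshold". The function name `kmeans` and the parameters `threshold`, `max_iter` and `seed` are hypothetical.

```python
import numpy as np

def kmeans(points, k, threshold=1e-6, max_iter=100, seed=0):
    rng = np.random.default_rng(seed)
    # Steps 3-4: pick k distinct data points as the initial centroids (assumption).
    centroids = points[rng.choice(len(points), size=k, replace=False)]
    for _ in range(max_iter):
        # Step 5a: Euclidean distance from every point to every centroid,
        # then assign each point to its closest centroid.
        distances = np.linalg.norm(points[:, None, :] - centroids[None, :, :], axis=2)
        labels = distances.argmin(axis=1)
        # Step 5b: recalculate each centroid as the mean of its assigned points;
        # a centroid with no points keeps its previous position.
        new_centroids = np.array([
            points[labels == j].mean(axis=0) if np.any(labels == j) else centroids[j]
            for j in range(k)
        ])
        # Stop when the centroids have (almost) stopped moving.
        shift = np.linalg.norm(new_centroids - centroids)
        centroids = new_centroids
        if shift <= threshold:
            break
    return centroids, labels

# Hypothetical toy data: two well-separated groups of three points each.
points = np.array([[0, 0], [0, 1], [1, 0], [10, 10], [10, 11], [11, 10]], dtype=float)
centroids, labels = kmeans(points, k=2)
print(labels)
print(centroids)
```

Which group receives label 0 depends on the random initialization, so only the grouping itself is meaningful, not the label values.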

METHODOLOGY
This study aims to identify learning styles using an unsupervised learning approach. Automatic learning style detection is used in this paper to avoid the inconvenience of filling out lengthy learning style questionnaires. The variables for the study are generated based on the FSLSM (García et al., 2007; Graf et al., 2008; Prabhani et al., 2013), as shown in Table 2 for the procession dimension, Table 3 for the perception dimension, Table 4 for the input dimension and Table 5 for the understanding dimension. Table 6 describes the different features related to learners' learning behaviour used for this study (Graf, 2007). [...] features of the procession dimension and perception dimension respectively. The values indicate the students' behaviour and interaction with learning objects as explained in Table 6. Prior to applying the K-means algorithm, principal component analysis (PCA), a dimensionality reduction method, is used to reduce the dimension of the datasets. The K-means model is then fitted to the datasets to generate clusters based on the similarity of data instances.

EXPERIMENTAL RESULT AND ANALYSIS
This section analyses the results of the experiment performed using K-means clustering. The main aim of the K-means algorithm is to minimize the within-cluster sum of squares, which is given by

$$ \sum_{i=1}^{n} \min_{\mu_j \in C} \lVert x_i - \mu_j \rVert^2 $$

where $n$ represents the total number of data points, $x_i$ represents a data point, $C$ is the set of clusters and $\mu_j$ represents the mean of the data points in a given cluster.
The experiment first applies principal component analysis (PCA) to reduce the dimensionality of the dataset. PCA transforms the dataset into two principal components, which makes it possible to see observations in the data that can be grouped using K-means. Before K-means is run, the number of clusters is set to 2 and the initial centroids are chosen randomly. After initialization, each data point is assigned to its nearest centroid. New centroids are then computed as the mean of all the data points assigned to each previous centroid, and the assignment and update steps repeat until the centroids stabilize. K-means is fitted on the PCA components and returns a cluster label for each data point.
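The procedure described above (PCA to two components, then K-means with two clusters and random initial centroids) could look roughly as follows with scikit-learn. This is a sketch under stated assumptions: the feature matrix is synthetic and merely stands in for the learner behaviour features, and the parameter values `n_init=10` and `random_state=42` are illustrative choices, not settings reported in the paper.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.decomposition import PCA

rng = np.random.default_rng(42)
# Hypothetical stand-in for the behaviour features: 60 learners x 6 features,
# drawn from two groups with different mean values.
group_a = rng.normal(loc=0.0, scale=1.0, size=(30, 6))
group_b = rng.normal(loc=5.0, scale=1.0, size=(30, 6))
features = np.vstack([group_a, group_b])

# Reduce the dataset to two principal components.
pca = PCA(n_components=2)
components = pca.fit_transform(features)

# Fit K-means with two clusters and randomly chosen initial centroids.
kmeans = KMeans(n_clusters=2, init="random", n_init=10, random_state=42)
labels = kmeans.fit_predict(components)

print(components.shape)     # (60, 2)
print(np.bincount(labels))  # number of learners assigned to each cluster
```

The two columns of `components` can be plotted against each other and coloured by `labels` to obtain a cluster plot similar in spirit to the figures reported for each dimension.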
The results of K-means clustering for the procession dimension are shown in Figure 1, and the clusters for the perception dimension are shown in Figure 2. For the analysis, K-means models are fitted with the number of clusters k ranging from 1 to 4. Figure 3 shows the within-cluster sum of squares, the quantity that K-means minimizes, for both the procession and perception dimensions; it decreases as the number of clusters increases. The clusters are well separated in the procession dimension, whereas for the perception dimension K-means fails to split the data into well-defined clusters. Table 9 displays a summary of the K-means results.

Figure 3: Within-cluster sum of squares for the procession and perception dimensions

The experiment was run on a Linux-based PC with an Intel Core i3-5005U processor (2.00 GHz, 4 logical cores) and 7.7 GB of RAM. Principal component analysis (PCA) and the K-means algorithm were implemented using the scikit-learn machine learning library in Python (https://scikit-learn.org/stable/index.html).
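The sweep over k from 1 to 4 can be sketched with scikit-learn, whose fitted `KMeans` objects expose the sum of squared distances to the nearest centroid as the `inertia_` attribute. The data below is synthetic and hypothetical; it only illustrates how such a curve is produced and does not reproduce the values in Figure 3.

```python
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)
# Hypothetical two-dimensional data (e.g. two PCA components) with two groups.
data = np.vstack([
    rng.normal(loc=(0.0, 0.0), scale=0.5, size=(40, 2)),
    rng.normal(loc=(4.0, 4.0), scale=0.5, size=(40, 2)),
])

# Fit one model per value of k and record the sum of squares it reaches.
inertias = []
for k in range(1, 5):
    model = KMeans(n_clusters=k, n_init=10, random_state=0).fit(data)
    inertias.append(model.inertia_)

for k, value in zip(range(1, 5), inertias):
    print(f"k={k}: sum of squares = {value:.2f}")
```

A sharp drop followed by a flat tail in these values suggests that the data contains well-separated groups, whereas a gradual decline suggests that no value of k yields clearly defined clusters.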

CONCLUSION
This work evaluated the capability of the K-means algorithm to detect patterns and cluster data based on feature similarity. The results indicate differences in learning style for the procession dimension, although K-means clustering fails to separate the data set into well-defined clusters for the perception dimension. The clustering algorithm can serve as a useful tool for monitoring learners' behaviour.
The results of the experiment allow instructors to monitor learning behaviour and to recommend learning material based on learners' preferences. In future work, the investigation will be conducted on a larger data set to validate the results. A recommender system that suggests learning materials to learners will also be considered.