dc.contributor.author | Wambugu, Geoffrey M | |
dc.contributor.author | Onyango, George | |
dc.contributor.author | Kimani, Stephen | |
dc.date.accessioned | 2018-07-23T07:03:26Z | |
dc.date.available | 2018-07-23T07:03:26Z | |
dc.date.issued | 2018 | |
dc.identifier.uri | http://hdl.handle.net/123456789/3596 | |
dc.description.abstract | One challenging issue in the application of Latent Dirichlet Allocation (LDA) is selecting the optimal number of
topics, which depends on both the corpus itself and the user's modeling goals. This paper presents a topic
selection method which models the minimum perplexity against the number of topics for any given dataset. The
research set up scikit-learn and GraphLab in Jupyter Notebook on a Google Cloud Compute Engine custom
machine and then developed Python code to manipulate selected existing datasets. Results indicate that the
graph of perplexity against the number of topics (K) exhibits strong quadratic behaviour around a turning point and
opens upwards. This means that the graph has a minimum perplexity point that optimizes K. The paper presents
a model of the optimum K in an identified interval and guides the calculation of this value of K within three
iterations using a quadratic approach and differential calculus. This improves the inferential speed for the number of
topics and the hyperparameter alpha, thereby enhancing LDA application in big data. | en_US |
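The quadratic step described in the abstract can be illustrated with a minimal Python sketch, which is not the authors' released code: it assumes perplexity has already been measured at three candidate values of K (the K values and perplexities below are hypothetical placeholders, e.g. obtained from scikit-learn's LatentDirichletAllocation.perplexity()), fits perplexity ≈ a*K^2 + b*K + c, and takes the turning point from setting the derivative to zero.

    import numpy as np

    # Hypothetical probe points inside the identified search interval.
    ks = np.array([10, 20, 30])
    perplexities = np.array([950.0, 870.0, 910.0])

    # Fit the upward-opening parabola perplexity(K) = a*K^2 + b*K + c.
    a, b, c = np.polyfit(ks, perplexities, 2)
    assert a > 0, "parabola should open upwards around the turning point"

    # Differential calculus: d/dK (a*K^2 + b*K + c) = 2*a*K + b = 0.
    k_opt = -b / (2 * a)
    print(int(round(k_opt)))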
dc.language.iso | en | en_US |
dc.publisher | International Journal of Advancements in Computing Technology | en_US |
dc.subject | Latent Dirichlet Allocation, Topic Modeling, Topics, Parameters | en_US |
dc.title | Quadratic Approach for Fast Topic Selection in Modelling Big Text Analytics | en_US |
dc.type | Article | en_US |