dc.contributor.author: Wambugu, Geoffrey M
dc.contributor.author: Onyango, George
dc.contributor.author: Kimani, Stephen
dc.date.accessioned: 2018-07-23T07:03:26Z
dc.date.available: 2018-07-23T07:03:26Z
dc.date.issued: 2018
dc.identifier.uri: http://hdl.handle.net/123456789/3596
dc.description.abstract: One challenging issue in applying Latent Dirichlet Allocation (LDA) is selecting the optimal number of topics, which must depend on both the corpus itself and the user's modelling goals. This paper presents a topic selection method that models the minimum perplexity against the number of topics for any given dataset. The research set up scikit-learn and GraphLab on Jupyter Notebook on a custom machine in Google Cloud Compute Engine and then developed Python code to manipulate selected existing datasets. Results indicate that the graph of perplexity against the number of topics (K) shows strongly quadratic behaviour around a turning point and opens upwards, so it has a minimum-perplexity point that optimizes K. The paper presents a model of the optimum K in an identified interval and guides the calculation of this value of K within three iterations using a quadratic approach and differential calculus. This improves the speed of inferring the number of topics and the hyperparameter alpha, thereby enhancing the application of LDA to big data. [en_US]
dc.language.iso: en [en_US]
dc.publisher: International Journal of Advancements in Computing Technology [en_US]
dc.subject: Latent Dirichlet Allocation, Topic Modeling, Topics, Parameters [en_US]
dc.title: Quadratic Approach for Fast Topic Selection in Modelling Big Text Analytics [en_US]
dc.type: Article [en_US]
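
The abstract describes locating the optimum K by treating perplexity as a quadratic function of K and applying differential calculus. The record does not give the paper's actual procedure, so the following is only a minimal sketch of one way to read that idea: fit a parabola through three (K, perplexity) points and take the K where the derivative is zero. The function name estimate_optimal_k and the sample values are hypothetical; the values are generated from a made-up parabola and are not results from the paper.

```python
import numpy as np


def estimate_optimal_k(ks, perplexities):
    """Fit perplexity ~ a*K^2 + b*K + c and return the K at the turning point.

    Hypothetical helper for illustration; not the authors' code.
    """
    a, b, _c = np.polyfit(ks, perplexities, deg=2)
    if a <= 0:
        # The parabola must open upwards for a minimum to exist.
        raise ValueError("fitted parabola does not open upwards")
    # d(perplexity)/dK = 2*a*K + b = 0  =>  K* = -b / (2*a)
    return -b / (2 * a)


# Synthetic data from an assumed parabola 0.5*(K - 22)^2 + 900,
# evaluated at three candidate topic counts.
sample_ks = [10, 20, 30]
sample_perplexities = [0.5 * (k - 22) ** 2 + 900 for k in sample_ks]

k_star = estimate_optimal_k(sample_ks, sample_perplexities)
print(round(k_star))
```

In practice the perplexity values would come from fitted LDA models at each candidate K, and the estimate would be rounded to an integer and checked against neighbouring values of K.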

