https://oldena.lpnu.ua/handle/ntb/45493
Title: | A(n) Assumption in machine learning |
Authors: | Klyushin, Dmitry; Lyashko, Sergey; Zub, Stanislav |
Affiliation: | Taras Shevchenko National University of Kyiv |
Bibliographic description (Ukraine): | Klyushin D. A(n) Assumption in machine learning / Dmitry Klyushin, Sergey Lyashko, Stanislav Zub // Computational Linguistics and Intelligent Systems. — Lviv : Lviv Politechnic Publishing House, 2019. — Vol 2 : Proceedings of the 3rd International conference, COLINS 2019. Workshop, Kharkiv, Ukraine, April 18-19, 2019. — P. 32–38. — (Paper presentations). |
Bibliographic description (International): | Klyushin D. A(n) Assumption in machine learning / Dmitry Klyushin, Sergey Lyashko, Stanislav Zub // Computational Linguistics and Intelligent Systems. — Lviv Politechnic Publishing House, 2019. — Vol 2 : Proceedings of the 3rd International conference, COLINS 2019. Workshop, Kharkiv, Ukraine, April 18-19, 2019. — P. 32–38. — (Paper presentations). |
Is part of: | Computational Linguistics and Intelligent Systems (2), 2019 |
Journal/Collection: | Computational Linguistics and Intelligent Systems |
Volume: | 2 : Proceedings of the 3rd International conference, COLINS 2019. Workshop, Kharkiv, Ukraine, April 18-19, 2019 |
Issue Date: | 18-Apr-2019 |
Publisher: | Lviv Politechnic Publishing House |
Place of the edition/event: | Lviv |
Keywords: | machine learning; sample homogeneity; confidence interval; order statistics; variational series |
Number of pages: | 7 |
Page range: | 32-38 |
Start page: | 32 |
End page: | 38 |
Abstract: | Two-sample tests of homogeneity are commonly used statistical tools in machine learning, for example for estimating corpus homogeneity or testing text authorship. Often they are effective only for sufficiently large samples (n > 100) and have limited applicability when samples are small (n < 30). To handle small samples, resampling methods such as the jackknife and the bootstrap are often used. We propose and investigate a family of homogeneity measures based on the A(n) assumption that are effective for both small and large samples. |
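The abstract does not define the A(n) assumption. In Hill's formulation (Hill, 1968), if n exchangeable observations come from a continuous distribution, the next observation falls between the i-th and j-th order statistics with probability (j - i)/(n + 1). The sketch below checks that probability by simulation. It illustrates the assumption only and is not the authors' homogeneity measure. The function name, the normal distribution, the trial count and the seed are all assumptions chosen for the example.

```python
import random


def interval_hit_rate(n, i, j, trials=20000, seed=0):
    """Estimate P(x_(i) < x_new < x_(j)) for i.i.d. continuous draws.

    i and j are 1-based ranks with 1 <= i < j <= n.
    Hypothetical helper written for this illustration only.
    """
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        # Draw a sample of size n and sort it to obtain the order statistics.
        sample = sorted(rng.gauss(0.0, 1.0) for _ in range(n))
        # Draw one further observation from the same distribution.
        x_new = rng.gauss(0.0, 1.0)
        if sample[i - 1] < x_new < sample[j - 1]:
            hits += 1
    return hits / trials


n, i, j = 10, 3, 8
expected = (j - i) / (n + 1)          # value implied by the A(n) assumption
observed = interval_hit_rate(n, i, j)  # Monte Carlo estimate
print(f"expected {expected:.4f}, observed {observed:.4f}")
```

Because the probability does not depend on the underlying distribution, replacing `rng.gauss` with another continuous generator should give a similar estimate. This distribution-free property is what makes the assumption usable for small samples.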
URI: | https://ena.lpnu.ua/handle/ntb/45493 |
ISSN: | 2523-4013 |
Copyright owner: | © 2019 for the individual papers by the papers’ authors. Copying permitted only for private and academic purposes. This volume is published and copyrighted by its editors. |
References (International): | 1. Granichin, O., Kizhaeva, N., Shalymov, D., Volkovich, Z.: Writing style determination using the KNN text model. In: Proceedings of the 2015 IEEE International Symposium on Intelligent Control, pp. 900–905. IEEE, Sydney (2015). 2. Zenkov, A., Sazanova, L.: A New Stylometry Method Basing on the Numerals Statistic. International Journal of Data Science and Technology 3(2), 16–23 (2017). 3. Kilgarriff, A.: Comparing corpora. International Journal of Corpus Linguistics 6(1), 97–133 (2001). 4. Kilgarriff, A.: Language is never, ever, ever, random. Corpus Linguistics and Linguistic Theory 1(2), 263–276 (2005). 5. Eder, M., Piasecki, M., Walkowiak, T.: An open stylometric system based on multilevel text analysis. Cognitive Studies | Études cognitives 17 (2017). 6. Eder, M., Rybicki, J., Kestemont, M.: Stylometry with R: a package for computational text analysis. R Journal 8(1), 107–121 (2016). 7. Hill, B.: Posterior distribution of percentiles: Bayes’ theorem for sampling from a population. Journal of the American Statistical Association 63(322), 677–691 (1968). 8. Klyushin, D., Petunin, Yu.: A Nonparametric Test for the Equivalence of Populations Based on a Measure of Proximity of Samples. Ukrainian Mathematical Journal 55(2), 181–198 (2003). 9. Pires, A.: Confidence intervals for a binomial proportion: comparison of methods and software evaluation. In: Klinke, S., Ahrend, P., Richter, L. (eds.) Proceedings of the Conference CompStat 2002, Short Communications and Posters (2002). |
Content type: | Article |
Appears in Collections: | Computational linguistics and intelligent systems. – 2019 |