The Institute of Computing and Cybersystems will present a Distinguished Lecture by James C. Bezdek on Friday, January 29, 2021, at 3:00 p.m. via online meeting. Dr. Bezdek will present his lecture, “Streaming Data Analysis: Old Clothes Don’t Fit.”
Bezdek is a visiting research fellow at The University of Melbourne, Australia. His interests include clustering in big data, woodworking, optimization, data visualization, cigars, fishing, anomaly detection, blues music, and poker. He retired in 2007, and will be coming to a university near you soon.
Bezdek received a Ph.D. in Applied Mathematics from Cornell University in 1973. He is a past president of NAFIPS (North American Fuzzy Information Processing Society), IFSA (International Fuzzy Systems Association), and the IEEE CIS (Computational Intelligence Society). He is the founding editor of the international journals Approximate Reasoning and IEEE Transactions on Fuzzy Systems. He is a life fellow of the IEEE and IFSA, and a recipient of the IEEE 3rd Millennium award, the IEEE CIS Fuzzy Systems Pioneer award, and the IEEE Rosenblatt and Kampe de Feriet award.
Streaming Data Analysis: Old Clothes Don’t Fit
This talk concerns models and algorithms that are generally described as “streaming clustering.” Some of the semantics and methods that are used in this field are co-opted from static clustering, but they often don’t serve their purposes very well for streaming data. A review of “state of the art” methods such as sequential k-means, BIRCH, CluStream, DenStream, etc. shows that methods borrowed from classical batch techniques don’t transfer well to the streaming data case. Most of these models fail to acknowledge that the data are seen but once in real streaming analysis (e.g., intrusion detection, quality control). When the data are not saved, batch clustering ideas such as pre-clustering assessment, partitioning, and cluster validity are not relevant. I do not argue that current approaches to streaming clustering are wrong, but that they are described incorrectly. This class of algorithms comprises transitional methods for an intermediate case that lies between static and (near real time) dynamic analysis, which will eventually lead to a new and useful paradigm for this type of computation. I call these methods start and stop streaming data analysis.
Five models are briefly reviewed and illustrated (albeit poorly, with small labeled data sets!). Then I will discuss four new incremental Stream Monitoring Functions and a new approach for visual assessment of streaming data. The conclusions? Useful analysis of real streaming data is in its infancy. We need to carefully define the objectives of streaming analysis, and then choose terminology and methods that suit this evolving paradigm.
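For readers unfamiliar with the single-pass constraint mentioned in the abstract, the following is a minimal sketch of textbook sequential (MacQueen-style) k-means in Python. It is not taken from the lecture or the paper; the function name `sequential_kmeans` and the choice to seed the centroids with the first k points are assumptions made for illustration. Each point updates one centroid and is then discarded, so nothing is available afterwards for batch-style validation.

```python
from typing import Iterable, List, Sequence


def sequential_kmeans(stream: Iterable[Sequence[float]], k: int) -> List[List[float]]:
    """One-pass k-means sketch: every point is seen once and never stored."""
    centroids: List[List[float]] = []
    counts: List[int] = []
    for x in stream:
        if len(centroids) < k:
            # Assumption: the first k points of the stream seed the centroids.
            centroids.append(list(x))
            counts.append(1)
            continue
        # Nearest centroid by squared Euclidean distance.
        j = min(
            range(k),
            key=lambda i: sum((a - c) ** 2 for a, c in zip(x, centroids[i])),
        )
        counts[j] += 1
        # Incremental mean update; the point itself is dropped afterwards.
        centroids[j] = [c + (a - c) / counts[j] for a, c in zip(x, centroids[j])]
    return centroids


# Toy stream consumed through an iterator, so it can only be read once.
toy_stream = iter([(0, 0), (10, 10), (0, 2), (10, 12)])
result = sequential_kmeans(toy_stream, k=2)
print(result)
```

Because the result depends on the arrival order and on the seeding assumption, a different ordering of the same points can yield different centroids, which is one reason batch notions of cluster validity do not carry over directly.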
Bezdek says his views on this topic are a bit controversial. You can read them here:
Bezdek, J. C. and Keller, J. M. (2021). Streaming data analysis: Clustering or Classification?, IEEE Trans. SMC, DOI: 10.1109/TSMC.2020.3035957