
Scalable Spatial Anomaly Detection with Coresets

Speaker: Prof. Jeff M. Phillips (Kahlert School of Computing, University of Utah)

Type of event
    When 29.11.2023
    from 09:00 to 10:00
    Where Institute of Computer Science, Friedrich-Hirzebruch-Allee 8, 53115 Bonn, Room no. 0.016 or via Zoom
    About the speaker: He is currently on sabbatical as a Senior Research Fellow at Leipzig University, and visiting ScaDS.AI and the MPI for Math in the Sciences. At Utah he runs the Data Science academic program, including a bachelor's degree in Data Science that he led the creation of. He is also the Founder of the Utah Center for Data Science (on sabbatical from directing). He conducts research across data science, including in data mining, machine learning, data management, visualization, and statistics, guided by and rooted in computational geometry. He published an undergraduate textbook, "Mathematical Foundations for Data Analysis", with Springer-Nature in 2021.
    Research Talk:

    Title: Scalable Spatial Anomaly Detection with Coresets


    This talk will discuss algorithms for many related tasks that follow from a common approach bridging big data, computational geometry, and statistics. The prototypical task is that of spatial scan statistics: spatial data (a point set) is given, each point with a baseline and a measured value (like population count and disease count), and the goal is to find the hotspot region where the measured value most deviates from the baseline. To avoid overfitting a region, it is typically restricted to a geometric shape (like a disk or rectangle), and one then needs to "scan" all such shapes to find the one that maximizes some statistical discrepancy function. While scanning *all* shapes may seem infeasible, reductions from computational geometry show it is computable, but this computation scales poorly. In this talk we show how coresets (a clever reduction in the size of the data, with approximation guarantees) can be used to make this task practical while retaining statistical power. Our algorithms are implemented, and can scale to 100 million input data points.
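    To make the prototypical task concrete, here is a minimal brute-force sketch (not the speaker's algorithm) of a spatial scan over disks centered at data points. For simplicity it maximizes a linear discrepancy between the measured and baseline fractions inside each disk; real spatial scan statistics typically use a likelihood-ratio statistic such as Kulldorff's. All names and the toy data are illustrative.

    ```python
    def discrepancy(m, b):
        """Simple linear discrepancy between the measured fraction m and
        baseline fraction b inside a region (a proxy for a likelihood-ratio
        statistic such as Kulldorff's)."""
        return abs(m - b)

    def naive_disk_scan(points, measured, baseline, radii):
        """Brute-force spatial scan: for each candidate disk (centered at a
        data point, radius from `radii`), compute the fraction of total
        measured and baseline mass inside, and return the disk maximizing
        the discrepancy.  O(n^2 * |radii|) -- exactly the poor scaling
        that coreset methods are designed to avoid."""
        M, B = sum(measured), sum(baseline)
        best_score, best_disk = 0.0, None
        for cx, cy in points:
            for r in radii:
                m_in = b_in = 0.0
                for (px, py), mv, bv in zip(points, measured, baseline):
                    if (px - cx) ** 2 + (py - cy) ** 2 <= r * r:
                        m_in += mv
                        b_in += bv
                d = discrepancy(m_in / M, b_in / B)
                if d > best_score:
                    best_score, best_disk = d, ((cx, cy), r)
        return best_score, best_disk

    # Toy example: a cluster near the origin with elevated measured counts.
    pts = [(0, 0), (0.1, 0.1), (0.2, 0.0), (2, 2), (2.1, 2.0), (2.0, 2.2)]
    meas = [5, 6, 5, 1, 1, 1]     # e.g. disease counts
    base = [2, 2, 2, 2, 2, 2]     # e.g. population counts
    score, (center, radius) = naive_disk_scan(pts, meas, base, radii=[0.5, 1.0])
    # The scan flags a disk around the origin cluster as the hotspot.
    ```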


    Moreover, we will discuss how this core framework, and its scalable algorithms using coresets, show up in many other places. With shapes as halfspaces, this maps to the classic linear classification problem. When the data are not points but trajectories, one can find anomalous regions determined by the trajectories passing through them. In the active mathematical area of discrepancy minimization, this is how to approximately compute the discrepancy. If the ranges' edges are smoothed as kernels, then this relates to measuring kernel density estimates. And when the ranges are half-closed rectangles, then this generalizes to efficiently computing the multi-dimensional Kolmogorov-Smirnov distance between distributions.
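    As an illustration of the last instance, here is a hedged brute-force sketch of a 2-d Kolmogorov-Smirnov distance over half-closed rectangles anchored at observed points. This is the one-orientation variant (ranges of the form (-inf, x] x (-inf, y]); common multi-dimensional KS definitions also maximize over the other orthant orientations, and coresets (eps-approximations for this range space) reduce the sample size needed for a guaranteed approximation. The function name and data are illustrative.

    ```python
    def ks_distance_2d(sample_a, sample_b):
        """Brute-force 2-d KS distance: maximize, over half-closed ranges
        (-inf, x] x (-inf, y] anchored at observed points, the difference
        in empirical probability mass between the two samples."""
        na, nb = len(sample_a), len(sample_b)
        best = 0.0
        for (ax, ay) in sample_a + sample_b:
            fa = sum(1 for (x, y) in sample_a if x <= ax and y <= ay) / na
            fb = sum(1 for (x, y) in sample_b if x <= ax and y <= ay) / nb
            best = max(best, abs(fa - fb))
        return best

    # Toy example: two well-separated samples are maximally distinguishable.
    a = [(0.0, 0.0), (1.0, 1.0)]
    b = [(2.0, 2.0), (3.0, 3.0)]
    d = ks_distance_2d(a, b)
    ```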

    Zoom link:
    or Location: Institute of Computer Science, Friedrich-Hirzebruch-Allee 8, 53115 Bonn, Room no. 0.016