Tentative Agenda

  • 1:30 – 1:45: Opening remarks (Arpit Gupta, UCSB) [Paper]
  • 1:45 – 2:45: Session 1 “Going beyond the state-of-the-art: A need for better data and better models” (chair: Arpit Gupta, UCSB)
    • Auto-generating efficient data-plane ML pipelines for self-driving networks (Muhammad Shahbaz, Purdue) [Talk]

      Support for Machine Learning (ML) applications in networking has significantly improved over the last decade. The availability of public datasets and programmable switching fabrics (including low-level languages to program them) presents a full stack to the programmer for realizing self-driving networks. However, the diversity of tools involved, coupled with complex optimization tasks of ML model design and hyperparameter tuning while complying with the network constraints (like throughput and latency), puts the onus on the network operator to be an expert in ML, network design, and programmable hardware.

      In this talk, I will present Homunculus, a high-level framework that enables network operators to specify their ML requirements in a declarative rather than imperative way. Homunculus takes the training data and accompanying network and hardware constraints as input and automatically generates and installs a suitable model onto the underlying switching target. It performs model design-space exploration, training, and platform code-generation as compiler stages, leaving network operators to focus on acquiring high-quality network data. Our evaluations of real-world ML applications show that Homunculus’s generated models achieve up to 12% better F1 scores than hand-tuned alternatives while operating within the resource limits of the underlying targets.

    • Unlocking the black box with Trustee: XAI for networking problems (Ronaldo Ferreira, UFMS, Brazil) [Paper, Talk]

      In recent years, machine learning has shown promise in effectively detecting complex patterns in network traffic for various network security and performance problems. However, without understanding how these black-box models are making their decisions, network operators are reluctant to trust and deploy them in their production settings. One key reason for this reluctance is that these models are prone to the problem of underspecification, which refers to determining whether the success of a trained model is due to its ability to encode the essential structure of the underlying system or data or is simply the result of inductive biases the trained model encodes. In this talk, I will present Trustee, an eXplainable-AI (XAI) framework for detecting underspecification issues in learning models for networking problems. Trustee takes a given black-box model and the dataset used to train it as input and outputs a “white-box” model in the form of a high-quality, easy-to-interpret decision tree and an associated trust report. Using published machine-learning models, I will show how practitioners can use Trustee to identify instances of model underspecification.

    • CrystalBox: Future-based explanations for DRL network controllers (Sagar Patel, UC Irvine) [Paper, Talk]

      Lack of explainability is a key factor limiting the practical adoption of high-performance Deep Reinforcement Learning (DRL) controllers. Explainable RL for networking has hitherto used salient input features to interpret a controller’s behavior. However, these feature-based solutions do not completely explain the controller’s decision-making process. Often, operators are interested in understanding the impact of a controller’s actions, which feature-based solutions cannot capture. In this talk, we present future-based explanations as an avenue to fill this gap, and present CrystalBox, a practical framework to explain a controller’s behavior in terms of the future impact on key network performance metrics.

    • Making decisions at data plane speeds (Srinivas Narayana, Rutgers) [Paper, Talk]

      We will discuss some principles for autonomous operation of networks from our prior work on high-speed switch monitoring, end-to-end congestion control, and control loops operating at middleboxes.

    • Discussion
  • 2:45 – 3:00: Coffee break
  • 3:00 – 4:15: Session 2 “Challenges and opportunities” (chair: Ram Durairajan, Univ. Oregon)
    • Tackling deployability challenges in ML-powered networks (Noga Rotman, Hebrew Univ. of Jerusalem, Israel) [Paper, Talk]

      Over the past decade, Machine Learning (ML) has made remarkable progress in various fields such as Computer Vision and Natural Language Processing. Its implementation in communication networks, however, has faced significant obstacles.

      In this talk, I will highlight the specific challenges encountered when deploying ML-based networking solutions in real-world scenarios and review potential approaches to overcome them.

    • Designing traffic monitoring systems for self-driving networks (Chris Misa, Univ. Oregon) [Paper, Talk]

      Traffic monitoring is a critical component of self-driving networks. In particular, any system that seeks to automatically manage a network’s operation must first be equipped with insights about traffic currently flowing through the network. Typically, dedicated traffic monitoring systems deliver such insights in the form of traffic features to high-level human or automated decision makers. Inspired by the exciting capabilities of programmable dataplanes and the persistent challenges of network management, the research community has focused on improving the flexibility and efficiency of traffic monitoring systems for a variety of management tasks. However, a significant gap remains between the traffic monitoring requirements of practical, deployable self-driving networks and the capabilities of current state-of-the-art systems. This talk will provide a brief background of traffic monitoring systems, discuss how their claims and limitations relate to requirements of self-driving networks, and propose several open challenges as exciting starting points for future research. Addressing these challenges requires large-scale efforts in traffic monitoring techniques and self-driving network design, as well as enhanced dialog between researchers in both domains.

    • Towards trustworthy telemetry and QoE measurements (Alan Liu, Boston Univ.) [Paper, Talk]

      With the growing diversity and heterogeneity of the Internet, it is critical to ensure both high performance and high availability of the networks underlying it. Emerging networked applications, such as cloud gaming or augmented reality, are expected to further stress both control systems and network monitoring by requiring real-time responses to rapid changes in traffic workloads in a self-driving manner. Such self-driving network control requires timely, accurate, and trusted information about ongoing activities in the network. This talk discusses the problem of designing trustworthy telemetry and QoE measurements in cloud and edge scenarios and charts paths to its future applications.

    • Perception-driven optimization: A new frontier of data-driven networking (Junchen Jiang, Univ. Chicago) [Paper, Talk]

      Service providers struggle to catch up with the rapid growth in bandwidth and latency demand of Internet videos and other applications. An essential contributor to this resource contention is the assumption that users are equally sensitive to service quality everywhere, so any low-quality incidents must be avoided. However, this assumption is not true. For example, our work and other parallel efforts have shown that more video users can be served with better quality of experience (QoE) if we embrace the fact that the QoE’s sensitivity to video quality varies greatly with the video content. To unleash such benefits, the application systems must be driven by not only system measurement data but also user feedback data that capture users’ perception of service quality. In this short talk, I will highlight some of our recent efforts towards efficient collection of user feedback and enabling perception-driven optimization for Internet applications.

    • Discussion
  • 4:15 – 5:15: Session 3 “Learning from the past to succeed in the future” (chair: Walter Willinger, NIKSUN)
    • Using the COSMOS testbed for measurement-based wireless, optical, edge-cloud, and smart cities research (Gil Zussman, Columbia Univ.) [Talk]

      This talk will describe the beyond-5G COSMOS testbed (www.cosmos-lab.org), which is being deployed in West Harlem (New York City) as part of the NSF Platforms for Advanced Wireless Research (PAWR) program, and briefly review various ongoing experiments in the areas of wireless, optical, edge cloud, and smart cities. COSMOS targets the technology “sweet spot” of ultra-high bandwidth and ultra-low latency, a capability that will enable a broad new class of applications including augmented/virtual reality and cloud-based autonomous vehicles. Naturally, research conducted on the testbed has to take into account real-world constraints and is based on real-world measurements (which in several cases are shared with the community). As such, it provides new insights into the development of algorithms in different layers of the networking protocol stack. We will briefly review measurement-driven research efforts in mmWave wireless, full-duplex wireless, edge-cloud networking for smart city applications, and dynamic optical networking and sensing. We will aim to highlight the potential use of the platform by the community for research in the area of self-driving networks. The COSMOS testbed design and deployment is joint work with the COSMOS team (www.cosmos-lab.org).

    • Tackling data silos with synthetic data (Giulia Fanti, CMU) [Talk]

      Organizations are often unable to share network traces due to regulatory, business, and privacy concerns. The resulting data silos seriously inhibit the development, tuning, testing, and auditing of network algorithms. In this talk, I will discuss the promise and challenges of using synthetic data from deep generative models to share network traces across institutional boundaries. We study key challenges related to the fidelity, privacy, and interpretability of the synthetic data. Doing so involves both system design and addressing fundamental learning challenges for deep generative models. Ultimately, we demonstrate NetShare, a synthetic data generator for network packet header traces; NetShare matches microbenchmark distributions in real data 40% better than baselines, while also enabling synthetic data users to train models for downstream tasks. At the same time, we show that popular approaches for training privacy-preserving models (e.g., differentially-private optimization, pre-training on public data) are ill-suited to our application domain, and highlight the need for new privacy tools.

    • netUnicorn: A data-collection platform to develop generalizable ML models for network problems (Roman Beltiukov, UCSB) [Paper, Talk]

      The remarkable success of machine learning-based solutions for network problems has been impeded by the developed ML models’ inability to maintain efficacy when used in different network environments exhibiting different network behaviors. This issue is commonly referred to as the generalizability problem of ML models. The general ML community has recognized the critical role that training datasets play in this context and has developed various techniques to improve dataset curation to overcome this problem. Unfortunately, these methods are generally ill-suited or even counterproductive in the network domain, where they often result in unrealistic or poor-quality datasets.

      To address this issue, we propose an augmented ML pipeline that leverages explainable ML tools to guide the network data collection in an iterative fashion. To ensure the data’s realism and quality, we require that the new datasets should be endogenously collected in this iterative process, thus advocating for a gradual removal of data-related problems to improve model generalizability. To realize this capability, we develop a data-collection platform, netUnicorn, that takes inspiration from the classic ‘hourglass’ model and is implemented as its ‘thin waist’ to simplify data collection for different learning problems from diverse network environments. The proposed system decouples data-collection intents from the deployment mechanisms and disaggregates these high-level intents into smaller reusable, self-contained tasks. We demonstrate how netUnicorn simplifies collecting data for different learning problems from multiple network environments and how the proposed iterative data collection improves a model’s generalizability.

    • Discussion
  • 5:15 – 5:30: Closing remarks and wrap-up

Note: Each session will feature a number of consecutive lightning talks (10 minutes) that will be followed by a discussion period (open to all workshop participants).