SEElab | research

Current Research

Intelligent Memory Systems

System and Memory Architectures for Deep Analytics

AI Architectures and Optimization Techniques

Fully Homomorphic Encryption

Hyperdimensional Computing (HDC)

Internet of Things Security via Intrusion Detection

Large Language Models (LLMs)

Time-series Analysis

Bioinformatics Acceleration

Past Research

Internet Of Things

Trajectories for Persistent Monitoring
The Internet of Things, Smart Cities, and Wireless Healthcare
Internet of Things Applications to Smart Grid and Green Energy
Calibration Models for Environmental Monitoring
SensorRocks
SHiMmer

Energy-efficient Design and Management

Event-driven Power Management
Energy-efficient software design
Energy-efficient wireless communication
IoT System Characterization and Management
Energy Efficient Ad-hoc Wireless Networks Routing and Scheduling

Efficient Hardware Design

Approximate Computing

Current Research

Intelligent Memory Systems

Processing In Memory
In- and near-memory computing is an emerging paradigm where computation is performed close to or within the memory itself, reducing the data movement bottleneck and improving efficiency for data-intensive tasks. It demonstrated great potential in a broad range of applications and research opportunities exist in big data analysis and system integration.

1. Bioinformatics: Bioinformatics workloads, such as genome sequencing and proteomic analysis, require processing vast datasets characterized by high complexity and volume. We design specialized accelerators to enhance computational efficiency by integrating high-bandwidth memory with specialized compute units.

2. Machine learning: Many machine learning (ML) applications rely on compute-intense matrix-matrix multiplications that can naturally be accelerated using memory systems leveraging emerging memory technologies. Our team is studying how to co-design ML applications and biomedical applications leveraging hyperdimensional computing (HDC) with heterogeneous in-memory architectures, thereby facilitating faster and more scalable bioinformatics computations.

3. Compiler: Our team is developing a novel compiler technique for AI/ML workloads on processing-in-memory (PIM) architectures. In particular, we propose a brand-new problem formulation for workload mapping of emerging AI/ML programs based on mixed-integer linear programming (MILP). We integrate this model into the MLIR toolchain for fast end-to-end efficient deployment on PIM architectures.

4. Fast backend design: While significant amount of previous works have developed and implemented efficient PIM architectures, there have also been works on providing fast and accurate PIM architecture simulators for PIM architecture design space exploration (DSE). To further complement existing tools for PIM design, we are developing a novel and fast PIM backend DSE framework. The framework utilizes large language models (LLM) to generate register-transfer level (RTL) code for the backend flow and a hierarchical place and route (P&R) flow for extremely fast backend exploration.

System and Memory Architectures for Deep Analytics

CXL Systems
There is a lack of sophisticated simulators for CXL (Compute Express Link) heterogeneous systems in the literature. Our lab is conducting the very first comprehensive performance profiling, encompassing memory instructions, accelerators, and system functions, to uncover key differences between CXL systems and traditional memory architectures. Our goal is to offer programmers actionable insights and guidelines for optimizing the performance of CXL heterogeneous systems. Additionally, we are developing an in-house profiling framework to analyze the microarchitecture and latency of CXL devices. Based on real-world performance data and detailed microarchitectural analysis, we aim to develop an advanced heterogeneous CXL system simulator that will aid further research and system design.

Intelligent Scheduling and Memory Management
Our research also focuses on techniques such as intelligent scheduling, dynamic memory migration, and adaptive memory allocation, which are critical for ensuring optimal performance across varying workload demands.

MRAM Technology
Magnetoresistive Random Access Memory (MRAM) is emerging as a strong candidate in the field of non-volatile memory. MRAM stores bits using Magnetic Tunneling Junction (MTJ) cells, offering scalability that allows for optimization in write speed, power consumption, and endurance. This adaptability makes MRAM suitable for a range of applications, from embedded memory to stand-alone solutions. Our team is exploring how MRAM technology can enhance efficiency and performance, particularly in AI-based applications where MRAM’s unique characteristics could offer significant advantages.

AI Architectures and Optimization Techniques

Near-Memory Architecture for Transformer Acceleration
We are developing novel architectures to enable token pruning in memory and scheduling techniques to enable efficient inference of token-pruning transformers.

LLM Fine-Tuning and Co-Optimization
We propose a heterogeneous system for large language model (LLM) fine-tuning, using near-memory accelerators on CXL memory to accelerate critical operators on transformer inference. Additionally, we design algorithm-system co-optimizations to enable higher throughput LLM fine-tuning in GPU-constrained systems. This work integrates well with our broader efforts in multi-objective software-hardware co-optimization.

Multi-objective co-design
The need for multi-objective software-hardware co-optimization arises from the complex trade-offs between performance, power, and area in the design of ML accelerator systems. We are developing efficient multi-objective Bayesian optimization techniques for these accelerators to find the Pareto set for accuracy and power-performance-area (PPA) metrics. This co-optimization is crucial for balancing the competing demands of performance and efficiency in heterogeneous systems and advanced memory architectures.

Fully Homomorphic Encryption (FHE)

Privacy-Preserving Computation
Fully homomorphic encryption (FHE) is an encryption technique that allows computation on encrypted data. As a promising post-quantum cryptography technique, FHE can be utilized for many privacy-preserving applications, including machine learning. However, the size of encrypted data and the computational overhead of FHE are orders of magnitude larger compared to cleartext. Our work focuses on developing algorithmic solutions and hardware acceleration to address the overhead of FHE-based applications.

Algorithmic and Hardware Optimization
We design and find lightweight algorithms and frameworks that help mitigate the overhead for FHE. For example, hyperdimensional computing can be leveraged for privacy-preserving machine learning or federated learning as a more efficient alternative to costly deep neural network algorithms. Additionally, we optimize these algorithms to minimize costly FHE operations like bootstrapping and number theoretic transforms (NTTs). Also, we suggest hardware accelerators for FHE schemes, making FHE-based applications feasible. Notably, we proposed a unified FHE accelerator (UFC) to support the SIMD and logic schemes while considering scheme switching. UFC is the first FHE accelerator that supports multiple schemes. Additionally, memory is the most significant bottleneck in FHE applications that suffer from an explosion of data and computation after encryption. Our lab has proposed several accelerators based on emerging processing in-memory technologies, achieving significant improvements in performance, energy efficiency, and area efficiency over conventional architectures.

Hyperdimensional Computing (HDC)

Ensemble HDC
Ensemble learning, a technique that trains multiple classifiers and then combines their prediction during the inference stage, emerged as a promising approach. However, research on ensemble learning specifically targeting hyperdimensional computing (HDC) remains limited. Our research focuses on developing ensemble-based HDC models that improve accuracy while minimizing memory and computational overhead. We leverage key characteristics of HDC and ensemble learning to transform HDC ensembles into single-instance implementations, eliminating overhead during inference.

Federated HDC
Unsupervised Federated Learning (UFL) has emerged as a promising decentralized machine learning approach, prioritizing privacy by minimizing the reliance on extensive data labeling. Despite its potential, UFL encounters several significant challenges including: (1) the issue of non- independent and identically distributed (non-iid) data across devices, (2) high computational and communication costs at the edge, and (3) susceptibility to communication noise. Traditional UFL methods often depend on deep neural networks (NN), leading to considerable computational and communication overhead. In response, we are developing UFL frameworks grounded in Hyperdimensional Computing (HDC), a brain-inspired computing paradigm characterized by its lightweight training and inference capabilities, compact model sizes, and inherent robustness against communication noise.

Glucose HDC
Continuous glucose monitoring (CGM) applications face three key challenges for individuals with diabetes: 1) accurate, noise-resilient glucose level (GL) predictions to prevent incorrect insulin dosing, 2) online model personalization to address biological diversity among patients, and 3) lightweight models to accommodate low-power wearables. Current methods often rely on computationally intensive deep learning models that do not meet these requirements. To address these issues, our lab developed GlucoseHD, a hyperdimensional computing (HDC)-based approach that delivers accurate GL predictions within limited computing and memory budgets. Utilizing a novel HDC regressor, GlucoseHD achieves state-of-the-art accuracy while minimizing resource use and enabling model personalization for improved patient-specific predictions.

LiDAR HDC
LiDAR segmentation plays a vital role in autonomous vehicle applications, yet existing frameworks demand extensive computational resources, especially when retraining for new environments—making them impractical for on-device learning. Our research leverages the rapid convergence properties of HDC to develop a lightweight, adaptable model for segmentation tasks. This approach enables quick adaptation to new datasets using only a limited number of samples and labels, optimized for systems with constrained resources and low power consumption.

Lifelong HDC
On-device learning is increasingly important for overcoming the slow response times and high communication costs of cloud-based systems. To enhance this capability, we introduce LifeHD, the first lifelong learning system tailored for IoT applications with limited supervision. LifeHD is built on Hyperdimensional Computing (HDC), enabling efficient learning in dynamic environments. Our design utilizes a two-tier associative memory structure to manage high-dimensional, low-precision vectors representing historical patterns as cluster centroids. We also propose two variants of LifeHD to address challenges with scarce labeled data and power constraints. Implemented on off-the-shelf edge platforms, LifeHD has undergone extensive evaluation across three scenarios, demonstrating its effectiveness in resource-constrained environments.

Mathematical Foundations of HDC
Hyperdimensional computing has a rigorous mathematical backing and is closely related to several influential areas in statistics and machine learning. As an effort to expand the capabilities of HD computing, the SEE Lab team is also working towards theoretically founded and practical machine learning approaches. Through the lens of kernel method and density estimation, we are developing novel classification and clustering algorithms based on HD representation, which facilitates future generations of efficiency learning applications and neuromorphic hardware design.

Secure HDC
The vulnerability of HDC to adversarial attacks poses significant risks to the reliability and safety of these systems, especially in safety-critical applications. Adversarial attacks aim to exploit weaknesses in the decision-making processes of these models, often by introducing imperceptible perturbations to input data, resulting in inaccurate outputs. Our research focuses on analyzing the susceptibility of various phases of the HDC algorithm to such errors and proposes effective defense mechanisms to address them. We introduce HDC-specific attacks, demonstrating that they are more effective at fooling HDC models than state-of-the-art attacks. In response, we propose an adaptive adversarial training defense framework, A2HD, designed to protect HD-based intrusion detection against these adversarial threats. A2HD identifies the most effective adversarial attack and incorporates these attack samples into our adaptive training process. Our results show the effectiveness of the proposed defense, improving HD prediction performance over the state-of-the-art adversarial training defense.

Internet of Things Security via Intrusion Detection

Continual Intrusion Detection
Internet of Things (IoT) security poses significant challenges due to increased connectivity and large-scale networks. Intrusion detection systems (IDS) are vital for monitoring system data and alerting to suspicious activities. While machine learning (ML) offers promising solutions for IDS with high accuracy, many ML-IDS overlook two key issues: evolving data streams and a lack of attack labels. As streaming network traffic and cyber-attacks change, the performance of deployed ML models can degrade. To address these challenges, we propose CND-IDS, a continual novelty detection IDS framework that includes (i) a learning-based feature extractor for continuously updating feature representations and (ii) a novelty detector that identifies new cyber-attacks using principal component analysis (PCA) reconstruction.

Secure Intrusion Detection
While machine learning (ML) offers a promising solution for IDS, its vulnerability to adversarial attacks raises concerns about the reliability of these systems. To address these issues, we present a rigorous evaluation framework designed to assess the performance of ML- based IDS against various adversarial attacks in IoT environments. This framework incorporates a wide range of techniques, including white-box, gray-box, and black-box attacks, across four realistic and recent IoT intrusion datasets. In response to this vulnerability, we propose a Robust Layered Defense (ROLDEF) against adversarial attacks. Our approach utilizes a denoising autoencoder (DAE) to detect and eliminate adversarial components from incoming data before passing the purified input to the machine learning model. Testing on realistic IoT intrusion datasets shows that ROLDEF outperforms state-of-the-art defenses. Importantly, this defense is compatible with any underlying ML model, enhancing protection against adversarial threats in IoT systems.

Self-supervised Intrusion Detection
State-of-the-art ML-IDS solutions still rely on labeled attack data to perform well. Self-Supervised Learning (SSL) can identify patterns and anomalies by training on only normal data points with minimal to no supervision. Building on the success of Masked Autoencoders (MAEs) in the image domain, we propose a novel methodology that harnesses their capabilities to extract meaningful features from network intrusion data. By leveraging MAEs, we can generate rich feature representations that enhance the performance of machine learning models in intrusion detection tasks. Our experimental results demonstrate that these regenerated features significantly enhance the classification performance of both traditional and state-of-the-art ML models.

Large Language Models (LLMs)

Efficient LLMs
Despite the high-quality text generation achieved by large language models (LLMs), their growing sizes introduce significant latency challenges. Our research addresses these issues through innovative decoding methods, model and KV cache compression techniques, and system- and hardware- algorithm co-design, all aimed at accelerating LLM inference. These approaches enable faster inference across a wide range of applications, from large-scale server deployments to consumer devices. Additionally, we focus on co-designing LLM serving systems and emerging applications to enhance efficiency, particularly in long context serving, program-centric engine design, and scheduling and memory management.

Sensor Question Answering
Natural language interaction with sensing systems is essential for helping users understand sensor data in their daily lives. However, existing interfaces are limited in their question range and data handling capabilities, requiring substantial resources for training LLMs, making them impractical for edge devices. To address these challenges, we introduce SensorChat, the first edge system enabling real-time multimodal conversations with users. Utilizing a three-stage process—question decomposition, sensor data querying, and answer assembly—SensorChat leverages LLMs for natural interactions while ensuring accurate data extraction. This innovative approach enhances the efficiency of natural language processing on edge devices, improving user experience and enabling practical applications in real-world scenarios.

Time-series Analysis

Anomaly Detection
Time-series anomaly detection (TAD) is critical across various domains, including manufacturing and healthcare. However, many machine learning solutions are benchmarked with misleading evaluation metrics, which obstruct reliable analysis and the development of robust detection methods. We introduce a comprehensive evaluation framework for TAD, encompassing state-of-the-art deep learning (DL) and traditional machine learning (ML) algorithms, relevant baselines, and diverse scoring, thresholding, and evaluation functions. Our analysis reveals that: (i) simple ML algorithms and TAD baselines can perform comparably to advanced DL solutions; (ii) the choice of scoring and thresholding functions significantly affects prediction performance; and (iii) common evaluation metrics focused on post-thresholding outputs can lead to overestimated predictions.

Out-of-Distribution Detection
Time-series out-of-distribution (OOD) detection is essential for maintaining the reliability of machine learning models in dynamic environments. As data sources diversify, models may encounter inputs that significantly deviate from their training distribution, leading to inaccurate predictions and potential safety risks, especially in critical applications like healthcare and finance. Effective OOD detection enables systems to identify and appropriately handle anomalous data, preserving performance and safety. Our approach involves employing advanced techniques such as feature extraction and dimensionality reduction, enabling us to capture the underlying data distribution and identify anomalies. Through rigorous evaluation, we aim to improve the accuracy of OOD detection, ensuring that models can adapt to novel or unexpected inputs without compromising performance.

Bioinformatics Acceleration

Traditional biomedical research tends to focus on analyzing a single type of data at a time. However, diseases such as cancer, diabetes, and neurodegenerative disorders are driven by intricate interactions between multiple biological factors. The future of medicine lies in the rapid analysis of multi-omics data encompassing genomics, transcriptomics, proteomics, and metabolomics—enabling a deeper understanding of these complex diseases. Outpacing Moore’s law, genomic data is doubling every seven months and is expected to surpass YouTube and Twitter by 2025. Similarly, proteomic and metablomic data hosted at UCSD MassiVE database is nearing 700TB and continues to expand daily. Starting with sequencing and alignment, bioinformatics runs a variety of algorithms on big data, from variant calling to classification. Memory/storage and computation requirements of these applications extend from hundreds of CPU hours and gigabytes of memory to millions of CPU hours and petabytes of storage. This tremendous amount of data entails redesigning the entire system stack with the goal of superior architectural solutions for memory/storage systems, e.g., intelligent use of HBM granted by advances in hardware, as well as significantly faster computation platforms for an expedited decision to enable clinical use of technology in real-time, e.g., leveraging emerging memory device such as ReRAM, Phase Change Memory, 3D NAND to achive massive parallelism and superior energy efficiency for full-stack Mass Spectrometry clustering and database search. e.g., to shrink precision microbiome from its three- month per individual to a few hours through hardware-software co-design that supports and optimizes for data-intensive processing. To achieve that, our team jointly put together our experience on microbiome algorithms and innovative hardware design, including PIM, FPGAs, GPUs, and near-data accelerators to develop a full-stack infrastructure, which accelerates the bioinformatics applications to novel hardware in an end-to-end manner. Furthermore, we rethink the design and implementation of novel algorithmic alternatives driven by the new hardware infrastructure, e.g., mapping applications on hyperdimensional computing paradigm which can maximize parallelism and error tolerance, e.g., utilizing state-of-the-art AI models for complex multi-omics data integration.

Case Study: Sequence Alignment using PIM
The global sequence alignment can be formulated as finding the optimal edit operations, including deletion, insertion, and substituting of the base pairs required to transform sequence x to sequence y. The search space for evaluating all possible alignments is exponentially proportional to the length of the sequences and becomes computationally intractable. The algorithmic advances, e.g., Needleman-Wunsch, reduced the search space down to quadratic, but the limited computing resources of the CPU severely restrict the achievable performance. As such, hardware accelerators for alignment have been proposed, but they suffer from limited on- chip memory, costly data movement, and poorly optimized alignment algorithms. We proposed a ReRAM-based PIM accelerator called RAPIDx, which maximizes the performance and its efficiency via software-hardware co-design. The proposed RAPIDx is reconfigurable to serve as a co-processor integrated into the existing genome analysis pipeline to boost sequence alignment or edit distance calculation. Our solution achieves one or two of magnitude speedup over the state-of-the-art CPU and GPU libraries.

Case Study: De novo Assembly using de Bruijn Graphs (DBGs)
DBGs are the core of the so-called de novo assembly that pieces short DNA reads together to construct the original genome. Processing DBGs is extremely challenging because of the massive graph size. Algorithms on DBG require an excessive amount of linked list traversals, pointer chasing, and hash-table lookups, which are inefficient on compute-centric systems. In this project, we seek a novel architecture that utilizes emerging in-storage and in-memory processing technologies to accelerate DBG assemblers to tackle the challenges of parallelism and memory throughput. As the initial milestones, we investigate state-of-the-art DBG-based methods and recognize critical operations in three processing phases: graph construction, graph cleaning, and sequence assembly. Next, we design a software-hardware solution to effectively utilize processing capability in different system hierarchies. We aim to split and distribute the DBG over multiple memory vaults for algorithmic parallelism, where each sub-graph further enjoys accelerated pruning and traversing operations.

Case Study: Mass spectrometry-based proteomics analysis pipeline acceleration
Mass spectrometry (MS) has been a key to proteomics and metabolomics due to its unique ability to identify and analyze protein structures. In a typical MS experiment, the equipment generates a massive amount of spectra data. Spectra analysis pipeline consists of three major blocks: preprocessing, clustering, and searching. Our profiling results of existing tools shows that the data movement and lack of parallelism is the key bottleneck. Our team developed a near- storage computing solution to mitigate data movement bottlenecks during preprocessing. For clustering and searching, we redesign an algorithm to maximize the parallelism for end-to-end runtime. Furthermore, we devised a novel PIM hardware tailored for the proposed algorithm and a scheduling algorithm to maximize the parallelism.

Past Research

Trajectories for Persistent Monitoring

Traditionally, environmental phenomena have been measured using stationary sensors configured into wireless sensor networks or through participatory sensing by user-carried devices. Since the phenomena are typically highly correlated in time and space, each reading from a stationary sensor is less informative than that of a similarly capable sensor on the move. User-carried sensors can take more informative readings, but we have no control over where the sensors travel.

Our work in Trajectories for Persistent Monitoring helps to close this gap by optimizing the path that robotic platforms travel to maximize the information gained from data samples. Multi-objective goals are formed using information gain and additional goals, such as system responsiveness to dynamic points of interest, multi-sensor fusion, and information transfer using cognitive radios. The resulting robots can adapt to dynamic environments to rapidly detect evolving wildfires, support first responders in emergency situations, and collect information to improve air quality models for a region.

Approximate Computing

Today’s computing systems are designed to deliver only exact solutions at a high energy cost, while many of the algorithms that are run on data are at their heart statistical, and thus do not require exact answers. However, the solutions to date are isolated to only a few of the components in the system stack. The real challenge arises when developers want to employ approximation across multiple layers simultaneously. Much of the potential gains are not realized since there are no system-level solutions.
We develop novel architectures with software and hardware support for approximate computing. Hardware components are enhanced with the ability to dynamically adapt approximation at a quantifiable and controllable cost in terms of accuracy. Software services complement hardware to ensure the user’s perception is not compromised while maximizing the energy savings due to approximations. The changes to hardware design include approximation-enabled CPU and GPU. GPU is enhanced with a small associative memory placed close to each stream core. The main idea of the approximation is, instead of accurate computing on existing processing units, to return pre-computed results from the associative memory, not only for perfect matches of operands but also for inexact matches. CPUs are designed with a set of small size associative memories next to each core. We also presented the nearest distance associative memory, which returns the precomputed result with the nearest distance to the input words on each search. The data in associative memory can replace normal CPU execution when a match is close enough. Such inexact matching is subject to a threshold that is set by the software layer.
Approximate computing solutions provide up to 10x improvements to both acceleration and energy-delay-product for various applications including image processing, CUDA benchmarks, and several machine learning algorithms, while incurring acceptable errors. Further, as the maximum allowable error per operation is increased, the performance also increases. For some applications like k-means, 91% of the operations can be approximated using our solution, resulting in only 1.1% classification error compared to one run on exact hardware only. hardware implementation and high computation energy cost are the main bottlenecks of machine learning algorithms in big data domain. We search for alternative architectures to address the computing cost and memory movement issues of traditional cores.

The Internet of Things, Smart Cities, and Wireless Healthcare

In an increasingly informed world, generating and processing information encompasses several computing domains from embedded systems in smart appliances to datacenters powering the cloud. We have worked on efficient distributed data collection and aggregation for processing the data in a hierarchical, context-focused manner. By using hierarchical processing, systems can distill relevant information, increase privacy, and optimize communication energy for Smart Cities, Data Centers, and distributed Smart Grid and Healthcare applications.

Calibration Models for Environmental Monitoring

Sensor nodes at the edge of the Internet of Things often require sensor-specific calibration functions that relate input features to a phenomenon of interest. For example: in air quality sensing, the calibration function transforms input data from onboard sensors to target pollutant concentrations, and for application power prediction, internal performance metrics can be used to predict device power. Edge devices are typically resource constrained, meaning that traditional machine learning models are difficult to fit into the available storage and on-device training can strain available processing capabilities. We seek novel methods of reducing the complexity of training machine learning models on the edge by efficiently reducing training datasets, focusing calibration efforts into important regions using application-specific loss functions, and improving regression methods for resource-constrained devices.

The Internet of Things with Applications to Smart Grid and Green Energy

The emergence of the Internet of Things has resulted in an abundance of data that can help researchers better understand their surroundings and create effective and automated actuation solutions. Our research efforts on this topic target several problems: (1) renewable energy integration and smart grid pricing in large scale systems, (2) individual load energy reduction and automation and (3) improved predictions mechanisms for context-aware energy management that leverage user activity modeling.
We have designed and implemented multiple tools that span from individual device predictors to a comprehensive representation of this vast environment.

Wireless Healthcare

With the proliferation of personal mobile computing via mobile phones and the advent of cheap, small sensors, we propose that a new kind of "citizen infrastructure", can be made pervasive at low cost and high value. Though challenges abound in mobile power management, data security, privacy, inference with commodity sensors, and "polite" user notification, the overriding challenge lies in the integration of the parts into a seamless yet modular whole that can make the most of each piece of the solution at every point in time through dynamic adaptation. Using existing integration methodologies would cause components to hide essential information from each other, limiting optimization possibilities. Emphasizing seamlessness and information sharing, on the other hand, would result in a monolithic solution that could not be modularly configured, adapted, maintained, or upgraded.

IoT System Characterization and Management: from Data Centers to Smart Devices and Sensors

The Internet of Things is a growing network of heterogeneous devices, combining commercial, industrial, residential and cloud-fog computing domains. These devices range from low-power sensors with limited capabilities to multi-core platforms on the high-end. The IoT systems creates both new opportunities and challenges in several different domains. The abundance of data helps researchers to better understand their surroundings and create automated solutions which effectively model and manage diverse constrained resources in IoT devices and networks, including power, performance, thermal, reliability and variability. SEELab's research efforts on this topic target to solve these problems, including renewable energy integration in large scale systems, individual load energy reduction and automation, energy storage, context-aware energy management for smart devices, user activity modeling, smart grid pricing and load integration. To solve these problems, we design and implement multiple tools that not only model and analyze smaller individual pieces but also create a comprehensive representation of this vast environment.

SensorRocks

Long-term research requiring high-resolution sensor data need platforms large enough to house solar panels and batteries. Leveraging a well-defined sensor appliance created using Sensor-Rocks, we develop novel context-aware power management algorithms to maximize network lifetime and provide unprecedented capability on miniaturized platforms.

Energy Efficient Routing and Scheduling For Ad-Hoc Wireless Networks

In large-scale ad hoc wireless networks, data delivery is complicated by the lack of network infrastructure and limited energy resources. We propose a novel scheduling and routing strategy for ad hoc wireless networks which achieves up to 60% power savings while delivering data efficiently. We test our ideas on a heterogeneous wireless sensor network deployed in southern California - HPWREN.

SHiMmer

SHiMmer is a wireless platform that combines active sensing and localized processing with energy harvesting to provide long-lived structural health monitoring. Unlike other sensor networks that periodically monitor a structure and route information to a base station, our device acquires data and processes it locally before communicating with an external device, such as a remote controlled helicopter.

Event-driven Power Management

Power management (PM) algorithms aim at reducing energy consumption at the system-level by selectively placing components into low-power states. Formerly, two classes of heuristic algorithms have been proposed for power management: timeout and predictive. Later, a category of algorithms based on stochastic control was proposed for power management. These algorithms guarantee optimal results as long as the system that is power managed can be modeled well with exponential distributions. Another advantage is that they can meet performance constraints, something that is not possible with heuristics. We show that there is a large mismatch between measurements and simulation results if the exponential distribution is used to model all user request arrivals. We develop two new approaches that better model system behavior for general user request distributions. These approaches are event driven and give optimal results verified by measurements. The first approach is based on renewal theory. This model assumes that the decision to transition to low power state can be made in only one state. Another method we developed is based on the Time-Indexed Semi-Markov Decision Process model (TISMDP). This model allows for transitions into low power states from any state, but it is also more complex than our other approach. The results obtained by renewal model are guaranteed to match results obtained by TISMDP model, as both approaches give globally optimal solutions. We implemented our power management algorithms on two different classes of devices and the measurement results show power savings ranging from a factor of 1.7 up to 5.0 with insignificant variation in performance.

Energy-efficient software design

Time to market of embedded software has become a crucial issue. As a result, embedded software designers often use libraries that have been preoptimized for a given processor to achieve higher code quality. Unfortunately, current software design methodology often leaves high-level arithmetic optimizations and the use of complex library elements up to the designers' ingenuity. We present a tool flow and a methodology that automates the use of complex processor instructions and pre-optimized software library routines using symbolic algebraic techniques. It leverages our profiler that relates energy consumption to the source code and allows designers to quickly obtain energy consumption breakdown by procedures in their source code.

Energy-efficient wireless communication

Today’s wireless networks are highly heterogeneous with diverse range requirements and QoS. Since the battery lifetime is limited, power management of the communication interfaces without any significant degradation in performance has become essential. We show a set of different approaches that efficiently reduce power consumption under different environments and applications.When multiple wireless network interfaces (WNICs) are available, we propose a policy to decides what WNIC to employ for a given application and how to optimize the its usage leading to a large improvement in power savings. In the case of client-server multimedia applications running on wireless portable devices, we can exploit the server knowledge of the workload. We present a client- and a server-PM that by exchanging power control information can achieve more than 67 % with no performance loss. Wireless communication represents a critical aspect also in the design of specific applications such as distributed speech recognition in portable devices. We consider quality-of-service tradeoffs and overall system latency and present a wireless LAN scheduling algorithm to minimize the energy consimption of a distributed speech recognition front-end.