
Current Research



Past Research

Internet of Things

Trajectories for Persistent Monitoring
The Internet of Things, Smart Cities, and Wireless Healthcare
Internet of Things Applications to Smart Grid and Green Energy
Calibration Models for Environmental Monitoring
SensorRocks
SHiMmer

Energy-efficient Design and Management

Event-driven Power Management
Energy-efficient software design
Energy-efficient wireless communication
IoT System Characterization and Management
Energy Efficient Routing and Scheduling for Ad-Hoc Wireless Networks

Efficient Hardware Design

Approximate Computing



Hyperdimensional Computing

Biological brains are capable of performing remarkably sophisticated cognitive tasks using hardware that is robust to noise, runs nearly instantaneously, and requires vanishingly little energy compared to conventional computer architectures. Hyperdimensional computing (HDC) is an emerging field at the intersection of theoretical neuroscience, machine learning, and low-power hardware design that aims to develop a new generation of highly efficient digital devices based on biologically inspired models of data representation and computation. In HDC, the basic units of computation are high-dimensional, distributed representations of data that can be manipulated using simple element-wise operators to carry out cognitive information-processing tasks like learning and memory. Our work on hyperdimensional computing covers the full spectrum from theoretical foundations to implementation in silicon and broadly falls into three main areas:
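The element-wise operators at the heart of HDC can be sketched in a few lines of Python. This is a minimal illustration only, not our hardware implementation; the dimensionality and bipolar (+1/-1) encoding are typical choices rather than fixed requirements:

```python
import random

DIM = 10_000  # typical HDC dimensionality (an assumption for this sketch)

def random_hv(rng):
    """Random bipolar hypervector: every element is +1 or -1."""
    return [rng.choice((-1, 1)) for _ in range(DIM)]

def bind(a, b):
    """Element-wise multiply: the result is dissimilar to both inputs."""
    return [x * y for x, y in zip(a, b)]

def bundle(hvs):
    """Element-wise majority: the result stays similar to every input."""
    return [1 if sum(col) >= 0 else -1 for col in zip(*hvs)]

def similarity(a, b):
    """Normalized dot product in [-1, 1]; near 0 for unrelated hypervectors."""
    return sum(x * y for x, y in zip(a, b)) / DIM

rng = random.Random(0)
a, b, c = (random_hv(rng) for _ in range(3))
s = bundle([a, b])
# The bundle remembers its constituents but not an unrelated vector:
print(similarity(s, a) > 0.3, similarity(s, c) < 0.1)  # → True True
# Binding produces a vector unlike either input (used to associate pairs):
print(similarity(bind(a, b), a) < 0.1)  # → True
```

In high dimensions, random hypervectors are nearly orthogonal with overwhelming probability, which is what makes these simple superposition and association operators robust to noise.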

1. Lifelong and Continual Learning: Biological organisms are typically able to learn continuously over their lifetimes and adapt previously gained knowledge to new settings with limited supervision. By contrast, modern AI algorithms are typically slow to adapt to new settings and often suffer from “catastrophic forgetting” in which previously acquired knowledge is overwritten and lost. As learning moves onto low-power devices deployed in the real world, the need for AI algorithms that can learn continuously and rapidly adapt to changing environments is becoming increasingly pressing. Our work focuses on leveraging the algorithmic capabilities of HDC, along with efficient implementation in hardware, to develop simple learning algorithms that can be deployed on lightweight edge devices while exhibiting adaptability to a dynamic data environment.

2. Theoretical Foundations: Our work on the theoretical foundations of HDC uses techniques from probability and statistical learning to develop an understanding of how choices about the HD architecture and properties of the underlying data affect the capabilities of HD information-processing algorithms. We are interested in answering questions like: What kinds of structure in the input are preserved by the mapping to HD space? What kinds of noise can be tolerated in HD space? How do properties of the underlying data affect the computational complexity of algorithms expressed on HD representations? Our work involves frequent collaboration with researchers in theoretical computer science, cognitive science, and electrical engineering.

3. Hardware and Implementation: HD computing's attractive properties of lightweight computation, high parallelism, and interpretability have promoted its application to various workloads, including classification, clustering, and pattern matching. Recent works from SEELab leverage HD computing to accelerate few-shot learning [1] and mass spectrometry data analysis [2,3], showing orders-of-magnitude efficiency improvements. Our HD-based few-shot classifier [1] demonstrates comparable accuracy and learning speed to state-of-the-art algorithms while delivering even higher computing and energy efficiency. HD computing also provides a fast and efficient solution for large-scale mass spectrometry data analysis. Our open-source library, HyperSpec, utilizes HD computing and GPUs to shorten the current spectrum clustering time from 4 hours to 15 minutes. Moreover, HD computing can be efficiently implemented using emerging processing-in-memory techniques. Our HD-based in-DRAM accelerator for open modification search (OMS) of mass spectrometry shows >100× latency reduction over state-of-the-art GPU baselines.

[1] Weihong Xu, Jaeyoung Kang, and Tajana Šimunić Rosing. “FSL-HD: Accelerating Few-Shot Learning on ReRAM using Hyperdimensional Computing”. Design, Automation and Test in Europe Conference (DATE), 2023.
[2] Weihong Xu, Jaeyoung Kang, Wout Bittremieux, Niema Moshiri, and Tajana Šimunić Rosing. “HyperSpec: Fast Mass Spectra Clustering in Hyperdimensional Space”. Journal of Proteome Research, 2023.
[3] Jaeyoung Kang, Weihong Xu, Wout Bittremieux, and Tajana Šimunić Rosing. “DRAM-based Acceleration of Open Modification Search for Mass Spectrometry-Based Proteomics”. submitted to IEEE Transactions on Emerging Topics in Computing (TETC).
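To make the learning areas above concrete, the sketch below shows why HD classifiers suit continual settings: training on an example is a single vector addition into a class accumulator, and adding a brand-new class never touches existing ones. The class names, noise model, and dimensionality are invented for illustration; this is not the FSL-HD design itself:

```python
import random

DIM = 10_000
rng = random.Random(2)

def random_hv():
    return [rng.choice((-1, 1)) for _ in range(DIM)]

def noisy(hv, p=0.15):
    """Corrupt a hypervector by flipping a fraction p of its elements."""
    return [-x if rng.random() < p else x for x in hv]

def similarity(acc, hv):
    return sum(a * b for a, b in zip(acc, hv))

# One prototype pattern per class stands in for real encoded sensor data.
protos = {"walk": random_hv(), "run": random_hv()}
acc = {}  # class accumulators: the entire learned model

def learn(label, sample):
    """Online update: one vector addition; other classes are untouched."""
    vec = acc.setdefault(label, [0] * DIM)
    for i, x in enumerate(sample):
        vec[i] += x

def classify(sample):
    return max(acc, key=lambda c: similarity(acc[c], sample))

for label in ("walk", "run"):
    for _ in range(5):
        learn(label, noisy(protos[label]))

print(classify(noisy(protos["walk"])))  # → walk

# Learning a new class later never overwrites the existing accumulators:
protos["bike"] = random_hv()
learn("bike", noisy(protos["bike"]))
print(classify(noisy(protos["run"])))   # → run
```

Because every operation is an element-wise addition or dot product, the same model maps naturally onto the in-memory and ReRAM accelerators described above.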


Processing in Memory and Intelligent Memory Systems

Processing In Memory
We live in a world where technological advances are continually creating more data than what we can cope with. With the emergence of the Internet of Things, sensory and embedded devices will generate massive data streams demanding services that pose huge technical challenges due to limited device resources. Even with the evolution of processor technology to serve computationally complex tasks more efficiently, data movement costs between processor and memory are the major bottleneck in the performance of most applications. We seek to perform hardware/software co-design of novel hybrid processing in-memory platforms that accelerate fundamental operations and diverse data analytic procedures using processing in-memory (PIM) technology. PIM enables in-situ operations, thereby reducing the effective memory bandwidth utilization. In the hardware layer, the proposed platform has a hybrid structure comprising two units: PIM-enabled processors and PIM-based accelerators. The PIM-enabled processors enhance traditional processors by supporting fundamental block-parallel operations inside the processor cache structure and associated memory, e.g., addition, multiplication, or bitwise computations. To take full advantage of PIM for popular data processing procedures and machine learning algorithms, we also design specialized PIM-based accelerator blocks. To deliver a seamless experience for application developers, we also develop a software infrastructure that provides abstracted interfaces corresponding to the PIM-enabled memory and accelerators.
Our solutions can process several applications including deep neural network training and inference, graph processing, brain-inspired hyperdimensional computing, multimedia applications, query processing in databases, bioinformatics, and security entirely in-memory. The proposed system, which integrates all optimized software components with improved hardware designs, can bring more than 10x speedup and at least 100-1000x improvement in energy efficiency. Our solutions ensure that the loss of accuracy when running applications with realistic data sets is kept small enough to not be perceivable, thus meeting the user’s quality requirements.
Over the past few years, the group has been actively working to answer the WHY, the WHERE, the WHEN, and the HOW of processing in memory. We are redesigning memory, all the way from systems to architecture down to the low-level circuits, to enable PIM for various applications like machine learning, bioinformatics, data analytics, and graph processing. Recently, we proposed FloatPIM, a highly parallel and flexible architecture that implemented high-precision training and testing of neural networks entirely in memory. The design was flexible enough to support both fixed- and floating-point operations and provided stable training of complex neural networks. Such PIM-based architectures have shown multiple orders of magnitude improvement in performance as well as energy efficiency.
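Functionally, the bulk bitwise operations that PIM-enabled memory supports can be modeled as commands that act on whole rows at once. The sketch below is a software model only: the row names and 64-bit width are illustrative, and real PIM performs these operations inside the memory array rather than on the host:

```python
# Each memory row is modeled as a Python int used as a bit-vector; a single
# "command" combines entire rows at once, mimicking the column-parallel
# operation of a digital PIM array (illustrative model, not a real device).
WIDTH = 64

class PIMArray:
    def __init__(self):
        self.rows = {}

    def write(self, name, value):
        self.rows[name] = value & ((1 << WIDTH) - 1)

    def op(self, fn, dst, *srcs):
        """One in-situ command: combine source rows bitwise, store in dst."""
        self.rows[dst] = fn(*(self.rows[s] for s in srcs)) & ((1 << WIDTH) - 1)

mem = PIMArray()
mem.write("a", 0b1100)
mem.write("b", 0b1010)
mem.op(lambda x, y: x & y, "and_ab", "a", "b")
mem.op(lambda x, y: x ^ y, "xor_ab", "a", "b")
print(bin(mem.rows["and_ab"]), bin(mem.rows["xor_ab"]))  # → 0b1000 0b110
```

The point of the model is that the cost of one `op` is independent of how many columns the rows hold, which is where PIM's parallelism comes from.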

Intelligent Memory Systems
With the emergence of the Internet of Things, sensory and embedded devices generate massive data streams, and, as noted above, data movement costs between processor and memory remain the major performance bottleneck for most applications. We seek to perform hardware/software co-design of novel intelligent memory systems (IMS) that accelerate fundamental operations and diverse data analytic procedures. IMS enables in-situ operations across different levels of the memory stack, reducing effective memory bandwidth utilization and thereby boosting the performance and energy efficiency of memory-bound applications. We explore various types of IMS acceleration, including but not limited to processing in-memory (PIM), processing near-memory (PNM), and in-storage processing (ISP). Each IMS architecture offers specific advantages for computing, so we design architectures that fit different application scenarios by exploiting one or more IMS technologies for both conventional general-purpose systems and domain-specific accelerators. To deliver a seamless experience for application developers, we also develop software infrastructures that provide abstracted interfaces to IMS, including a full-stack simulation infrastructure, compiler-level data layout optimization for deep neural networks, and a compiler/binary-level automatic offloading framework. We also extensively explore cutting-edge materials at the circuit level to push the boundaries of future IMS technologies.
Over the past few years, the group has been actively working to answer the WHY, the WHERE, the WHEN, and the HOW of intelligent memory systems. We are redesigning memory and storage, all the way from systems to architecture down to the low-level circuits, to enable IMS for various applications like machine learning, bioinformatics, data analytics, graph processing, and cryptography. The proposed system, which integrates all optimized software components with improved hardware designs, can bring more than 10x speedup and at least 100-1000x improvement in energy efficiency.


Bioinformatics Acceleration

Outpacing Moore’s law, genomic data is doubling every seven months and is expected to surpass YouTube and Twitter by 2025. Starting with sequencing and alignment, bioinformatics runs a variety of algorithms on big data, from variant calling to classification. The memory/storage and computation requirements of these applications extend from hundreds of CPU hours and gigabytes of memory to millions of CPU hours and petabytes of storage. This tremendous amount of data entails redesigning the entire system stack, with the goal of superior architectural solutions for memory/storage systems (e.g., intelligent use of HBM granted by advances in hardware) as well as significantly faster computation platforms for expedited decisions that enable clinical use of the technology in real time, e.g., shrinking precision microbiome analysis from three months per individual to a few hours through hardware-software co-design that supports and optimizes for data-intensive processing. To achieve this, our team combines experience in microbiome algorithms and innovative hardware design, including PIM, FPGAs, GPUs, and near-data accelerators, to develop a full-stack infrastructure that maps bioinformatics applications onto novel hardware in an end-to-end manner. Furthermore, we rethink the design and implementation of novel algorithmic alternatives driven by the new hardware infrastructure, e.g., mapping applications onto the hyperdimensional computing paradigm, which can maximize parallelism and error tolerance.

Case Study: Sequence Alignment using PIM
Global sequence alignment can be formulated as finding the optimal edit operations, including deletion, insertion, and substitution of base pairs, required to transform sequence x into sequence y. The search space for evaluating all possible alignments grows exponentially with the length of the sequences and becomes computationally intractable. Algorithmic advances, e.g., Needleman-Wunsch, reduced the search space to quadratic, but the limited computing resources of the CPU severely restrict the achievable performance. As such, hardware accelerators for alignment have been proposed, but they suffer from limited on-chip memory, costly data movement, and poorly optimized alignment algorithms. We proposed a ReRAM-based PIM accelerator called RAPIDx, which maximizes performance and efficiency via software-hardware co-design. RAPIDx is reconfigurable to serve as a co-processor integrated into the existing genome analysis pipeline to boost sequence alignment or edit distance calculation. Our solution achieves one to two orders of magnitude speedup over state-of-the-art CPU and GPU libraries.
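The quadratic-time recurrence is compact enough to sketch directly. Below is a plain-Python reference version of Needleman-Wunsch scoring, using an illustrative +1/-1/-1 scoring scheme (not RAPIDx's actual configuration):

```python
def needleman_wunsch(x, y, match=1, mismatch=-1, gap=-1):
    """Global alignment score via the O(len(x)*len(y)) DP recurrence."""
    n, m = len(x), len(y)
    # dp[i][j] = best score aligning the prefix x[:i] with the prefix y[:j]
    dp = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        dp[i][0] = i * gap          # align x[:i] against an empty prefix
    for j in range(1, m + 1):
        dp[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            sub = match if x[i - 1] == y[j - 1] else mismatch
            dp[i][j] = max(dp[i - 1][j - 1] + sub,  # match/substitution
                           dp[i - 1][j] + gap,      # deletion from x
                           dp[i][j - 1] + gap)      # insertion into x
    return dp[n][m]

print(needleman_wunsch("GATTACA", "GCATGCU"))  # → 0 (classic textbook pair)
```

Every cell depends only on its three neighbors, so anti-diagonals of the DP matrix can be computed in parallel; that wavefront parallelism is exactly what PIM-style accelerators exploit.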

Case Study: De novo Assembly using de Bruijn Graphs (DBGs)
DBGs are the core of so-called de novo assembly, which pieces short DNA reads together to construct the original genome. Processing DBGs is extremely challenging because of the massive graph size. Algorithms on DBGs require an excessive amount of linked list traversals, pointer chasing, and hash-table lookups, which are inefficient on compute-centric systems. In this project, we seek a novel architecture that utilizes emerging in-storage and in-memory processing technologies to accelerate DBG assemblers and tackle the challenges of parallelism and memory throughput. As initial milestones, we investigate state-of-the-art DBG-based methods and recognize critical operations in three processing phases: graph construction, graph cleaning, and sequence assembly. Next, we design a software-hardware solution to effectively utilize processing capability in different system hierarchies. We aim to split and distribute the DBG over multiple memory vaults for algorithmic parallelism, where each sub-graph further enjoys accelerated pruning and traversing operations.
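The graph construction phase is easy to illustrate. This toy sketch assumes error-free reads and a single non-branching path; real assemblers must additionally handle branches, tips, and bubbles during the graph cleaning phase:

```python
from collections import defaultdict

def build_dbg(reads, k):
    """De Bruijn graph: node = (k-1)-mer, edge = k-mer linking prefix to suffix."""
    kmers = {read[i:i + k] for read in reads for i in range(len(read) - k + 1)}
    graph = defaultdict(list)
    for kmer in sorted(kmers):
        graph[kmer[:-1]].append(kmer[1:])
    return graph

def walk(graph, start):
    """Greedy walk consuming edges (assumes one simple path, for illustration)."""
    contig, node = start, start
    while graph[node]:
        node = graph[node].pop(0)
        contig += node[-1]   # each step extends the contig by one base
    return contig

reads = ["ACGTC", "GTCAG", "TCAGT"]
g = build_dbg(reads, 3)
print(walk(g, "AC"))  # → ACGTCAGT (the overlapping reads stitched together)
```

Note that every step of the walk is a pointer-chase into the adjacency structure, which is the irregular access pattern that motivates near-memory and in-storage acceleration.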

Case Study: Mass spectrometry-based proteomics analysis pipeline acceleration
Mass spectrometry (MS) has been key to proteomics and metabolomics due to its unique ability to identify and analyze protein structures. In a typical MS experiment, the equipment generates a massive amount of spectra. The spectra analysis pipeline consists of three major blocks: preprocessing, clustering, and searching. Our profiling of existing tools shows that data movement and lack of parallelism are the key bottlenecks. Our team developed a near-storage computing solution to mitigate data movement bottlenecks during preprocessing. For clustering and searching, we redesigned the algorithms to maximize parallelism for end-to-end runtime. Furthermore, we devised novel PIM hardware tailored to the proposed algorithms, along with a scheduling algorithm to maximize parallelism.
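At its core, spectrum clustering compares spectra after discretizing their peaks into fixed-width m/z bins and measuring vector similarity. The bin width, vector length, and peak values below are invented for illustration; production tools also normalize intensities and handle precursor mass tolerances:

```python
import math

def bin_spectrum(peaks, bin_width=1.0, n_bins=100):
    """Map a list of (m/z, intensity) peaks into a fixed-length vector."""
    vec = [0.0] * n_bins
    for mz, intensity in peaks:
        idx = int(mz / bin_width)
        if idx < n_bins:
            vec[idx] += intensity
    return vec

def cosine(a, b):
    """Cosine similarity between two binned spectra."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb)

s1 = bin_spectrum([(10.2, 5.0), (50.7, 3.0), (80.1, 1.0)])
s2 = bin_spectrum([(10.4, 4.8), (50.5, 3.1), (80.3, 0.9)])  # near-duplicate
s3 = bin_spectrum([(22.0, 2.0), (61.0, 7.0)])               # unrelated
print(cosine(s1, s2) > 0.99, cosine(s1, s3) == 0.0)  # → True True
```

Because every spectrum becomes a fixed-length vector and the similarity is a dot product, the comparison step parallelizes naturally, which is what both the GPU and PIM versions of the pipeline exploit.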


Fully Homomorphic Encryption

Fully homomorphic encryption (FHE) is an encryption technique that allows computation over encrypted data. As a promising post-quantum cryptography technique, FHE can be applied to many secure systems, including privacy-preserving machine learning (PPML) and multi-party computation (MPC). However, the size of encrypted data and the computational overhead of FHE are orders of magnitude larger than for plaintext. Our work focuses on algorithmic solutions and hardware acceleration to deal with the overhead of FHE-based applications. First, we try to find lightweight and efficient algorithms to replace expensive recent models; for example, hyperdimensional computing can be leveraged for privacy-preserving machine learning instead of expensive DNN algorithms. Second, we propose hardware accelerators for FHE schemes, making FHE-based applications feasible. Memory is the most significant bottleneck in FHE applications, which suffer from an explosion of data and computation after encryption, so our lab has proposed several accelerators based on emerging processing in-memory technologies, achieving significant performance, energy efficiency, and area efficiency improvements over conventional architectures.
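The idea of computing on ciphertexts can be illustrated with a far simpler (and not fully homomorphic) scheme: unpadded textbook RSA is multiplicatively homomorphic. The tiny parameters below are wildly insecure and purely for illustration; real FHE schemes such as BGV or CKKS support both addition and multiplication and are vastly more complex:

```python
# Toy illustration: for unpadded RSA, enc(a) * enc(b) decrypts to a * b,
# so a product can be computed without ever seeing the plaintexts.
p, q = 61, 53
n = p * q                          # 3233 (toy modulus, never use in practice)
e = 17
d = pow(e, -1, (p - 1) * (q - 1))  # modular inverse (Python 3.8+)

def enc(m):
    return pow(m, e, n)

def dec(c):
    return pow(c, d, n)

a, b = 7, 9
c = (enc(a) * enc(b)) % n          # multiply ciphertexts only
print(dec(c))  # → 63, i.e. a * b, computed entirely over encrypted data
```

The ciphertext arithmetic here is a single modular multiplication; in lattice-based FHE the analogous operation involves large polynomial multiplications and noise management, which is precisely where the memory explosion our accelerators target comes from.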


IoT and Edge Computing

Recent years have witnessed an exponential increase in the pervasive deployment of Internet-of-Things (IoT) devices for real-world applications, for example, smart cities and smart agriculture. Along with recent advancements in lightweight machine learning and powerful platforms (e.g., NVIDIA Jetson Nano), computing at the edge has become the next wave of IoT. However, an open research question remains: how to enable intelligent computing on IoT devices given the heterogeneous nature of IoT infrastructures and their limited resources and energy.

The IoT research in our lab contains the following aspects:

(1) Designing robust, adaptive and lightweight machine learning algorithms for edge devices.
Three major challenges to deploying intelligence in IoT networks come from the nature of the real world: (i) data distributions drift across time, for example, due to sensor aging or dynamic environmental changes (e.g., the seasonal change from spring to winter); (ii) data are heterogeneous across space, for example, spatially distributed sensors may be located in different environmental contexts (e.g., mountains versus the sea); and (iii) supervision and prior knowledge are lacking in the field. We are actively working on designing new machine learning algorithms for edge devices, specializing in self-supervised learning, transfer learning, lifelong learning, multimodal learning, and federated learning. Our goal is to learn adaptively, robustly, and continually on a single edge device, and collaboratively across distributed devices, while preserving the personalized patterns on each device.
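Federated learning's core server step is simple to sketch. Below is a minimal weighted-averaging example with a hypothetical one-feature linear model; real deployments iterate over many rounds and additionally handle stragglers, compression, and privacy:

```python
def local_fit(xs, ys):
    """Least-squares slope/intercept for y ~ w*x + b on one device's data."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    w = (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
         / sum((x - mx) ** 2 for x in xs))
    return w, my - w * mx

def fed_avg(models, weights):
    """Server step: average client models, weighted by local sample counts."""
    total = sum(weights)
    w = sum(m[0] * k for m, k in zip(models, weights)) / total
    b = sum(m[1] * k for m, k in zip(models, weights)) / total
    return w, b

# Two devices observe the same underlying trend y = 2x + 1 with local noise;
# only the fitted (w, b) pairs leave each device, never the raw samples.
dev1 = local_fit([0, 1, 2, 3], [1.0, 3.1, 5.0, 7.1])
dev2 = local_fit([0, 1, 2], [0.9, 2.9, 5.1])
global_model = fed_avg([dev1, dev2], weights=[4, 3])
print(global_model)  # ≈ (2.0, 1.0)
```

Weighting by sample count is the standard FedAvg choice; personalization methods keep part of the model local instead of averaging everything, which is one way to preserve per-device patterns.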

(2) Hardware-software co-design to enable energy-efficient computing.
Computing on embedded systems is subject to variable hardware power and area constraints, as well as a restricted energy supply, all of which largely limit scalability. Hence, the design of edge computing requires consideration of both hardware and software: we optimize the learning algorithms so that hardware execution is more time- and energy-efficient, and we improve the hardware architecture to better host the software algorithms. A major research force in this category is Hyperdimensional Computing. Active research projects include developing new HDC-based algorithms for edge applications and integrating new hardware designs for HDC on embedded systems, aligning with the tinyML direction. Another research direction is optimizing resources and energy across the IoT infrastructure while ensuring sufficient (learning) performance. A typical project is resource allocation and optimization in a federated learning-enabled IoT network, given data, system, and networking heterogeneities.

(3) Connections to real-world applications, for example, Industrial IoT, persistent monitoring with drones, IoT management and reliability, etc.

Industrial IoT: Industry 4.0, or the fourth industrial revolution, is an important milestone for factories and production systems, where smart manufacturing has become an essential component. It leverages the Industrial Internet of Things (IIoT), an adaptation of traditional IoT to production environments focusing on machine-to-machine communication, big data, and machine learning for higher system efficiency and reliability. Due to the data-rich characteristics of these systems, it is crucial to utilize the collected big data. We focus on the data analytics part of IIoT systems, where data collected from smart devices are analyzed. To reach that goal, our lab is working on the following research topics:

Persistent monitoring with drones:
Since environmental phenomena are typically highly correlated in time and space, a sensor on the move can take far more informative readings than a stationary one. Our work in Trajectories for Persistent Monitoring (detailed under Past Research below) optimizes the paths that robotic platforms travel to maximize the information gained from data samples, combining information gain with additional goals such as responsiveness to dynamic points of interest, multi-sensor fusion, and information transfer using cognitive radios. The resulting robots can rapidly detect evolving wildfires, support first responders in emergency situations, and collect information to improve regional air quality models.

IoT management and reliability:
We work on “maintenance preventive” dynamic control strategies for IoT devices to minimize the often-ignored costs of maintenance. Our work has already demonstrated the importance of dynamic reliability management in mobile systems by controlling frequency, voltage, and core allocations while respecting user experience constraints. Our current research goal is to extend this to the whole IoT domain. Initially, we showed that the battery health of IoT devices can be improved with reliability-aware network management. We further propose optimal control strategies for diverse devices by adjusting their sampling rates, communication rates, or frequency and voltage levels. The distributed solutions work towards limiting maintenance costs while keeping data quality within desired levels. We also develop smart path selection and workload offloading algorithms that ensure a balanced distribution of reliability across the network. The combined approach minimizes operational and expected maintenance costs in a distributed and scalable fashion while respecting the user and data quality constraints imposed by the end-to-end IoT applications.

Design of accurate real-time ensemble learning models: IIoT systems consist of a large number of individual systems and components, and finding a single method that works best across these various settings is a difficult task. Instead of a single method, ensemble learning combines multiple algorithms (i.e., base learners) and improves on base learner performance. This results in a more successful data analytics methodology, increasing system robustness and decreasing maintenance costs of IIoT systems. We aim to propose accurate real-time ensemble learning solutions for IIoT analytics.

Design of secure machine learning models: IIoT systems possess numerous security vulnerabilities due to inter-connectivity and limited computational power. An adversary can exploit these vulnerabilities to sabotage communication, prevent asset availability, and corrupt monitoring data, which may have serious financial consequences. Attacks on ML models are one type of cyber-attack drawing considerable attention as these methods become more widely adopted. These attacks are an especially grave threat to IIoT analytics, leading to serious outcomes such as delayed maintenance, undetected failures, or unnecessary replacement of a machine. Our goal is to design secure ML models and novel defense mechanisms that can minimize the impact of adversarial attacks.


Past Research

Trajectories for Persistent Monitoring

Traditionally, environmental phenomena have been measured using stationary sensors configured into wireless sensor networks or through participatory sensing by user-carried devices. Since the phenomena are typically highly correlated in time and space, each reading from a stationary sensor is less informative than that of a similarly capable sensor on the move. User-carried sensors can take more informative readings, but we have no control over where the sensors travel.

Our work in Trajectories for Persistent Monitoring helps to close this gap by optimizing the path that robotic platforms travel to maximize the information gained from data samples. Multi-objective goals are formed using information gain and additional goals, such as system responsiveness to dynamic points of interest, multi-sensor fusion, and information transfer using cognitive radios. The resulting robots can adapt to dynamic environments to rapidly detect evolving wildfires, support first responders in emergency situations, and collect information to improve air quality models for a region.
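A greatly simplified version of the underlying idea, greedily choosing the next waypoint that removes the most remaining uncertainty, can be sketched as follows. The grid, uncertainty values, and one-step greedy horizon are illustrative; our actual planners optimize full trajectories under multi-objective goals:

```python
def plan(uncertainty, start, steps):
    """Greedy informative path: always move to the most uncertain neighbor."""
    rows, cols = len(uncertainty), len(uncertainty[0])
    pos, path, gained = start, [start], 0.0
    grid = [row[:] for row in uncertainty]  # working copy of the field
    for _ in range(steps):
        r, c = pos
        neighbors = [(r + dr, c + dc)
                     for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1))
                     if 0 <= r + dr < rows and 0 <= c + dc < cols]
        pos = max(neighbors, key=lambda rc: grid[rc[0]][rc[1]])
        gained += grid[pos[0]][pos[1]]
        grid[pos[0]][pos[1]] = 0.0   # sampling collapses local uncertainty
        path.append(pos)
    return path, gained

# Toy uncertainty field: higher values mark less-explored regions.
field = [[0.1, 0.2, 0.9],
         [0.1, 0.8, 0.7],
         [0.0, 0.1, 0.3]]
path, gained = plan(field, start=(0, 0), steps=3)
print(path, round(gained, 2))
```

Here the robot is drawn along the high-uncertainty ridge in the corner of the grid; replacing the scalar field with a model-based information gain (e.g., posterior variance reduction) and the one-step horizon with trajectory optimization recovers the shape of the real problem.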

Approximate Computing

Today’s computing systems are designed to deliver only exact solutions at a high energy cost, while many of the algorithms that are run on data are at their heart statistical, and thus do not require exact answers. However, the solutions to date are isolated to only a few of the components in the system stack. The real challenge arises when developers want to employ approximation across multiple layers simultaneously. Much of the potential gains are not realized since there are no system-level solutions.
We develop novel architectures with software and hardware support for approximate computing. Hardware components are enhanced with the ability to dynamically adapt approximation at a quantifiable and controllable cost in accuracy. Software services complement the hardware to ensure the user’s perception is not compromised while maximizing the energy savings due to approximation. The changes to hardware design include approximation-enabled CPUs and GPUs. The GPU is enhanced with a small associative memory placed close to each stream core. The main idea of the approximation is to return pre-computed results from the associative memory, not only for perfect matches of operands but also for inexact matches, instead of computing accurately on the existing processing units. CPUs are designed with a set of small associative memories next to each core. We also presented the nearest-distance associative memory, which on each search returns the precomputed result whose stored input words are nearest to the query. The data in associative memory can replace normal CPU execution when a match is close enough. Such inexact matching is subject to a threshold that is set by the software layer.
Approximate computing solutions provide up to 10x improvements in both acceleration and energy-delay product for various applications, including image processing, CUDA benchmarks, and several machine learning algorithms, while incurring acceptable errors. Further, as the maximum allowable error per operation is increased, the performance also increases. For some applications like k-means, 91% of the operations can be approximated using our solution, resulting in only 1.1% classification error compared to a run on exact hardware only. Hardware implementation and high computation energy cost are the main bottlenecks of machine learning algorithms in the big data domain, so we search for alternative architectures to address the computing cost and memory movement issues of traditional cores.
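In software terms, the nearest-distance matching idea behaves like a memoization cache with a tolerance. The sketch below is functional only; in hardware the search happens inside an associative memory in parallel with execution, and the threshold is the knob the software layer sets per application:

```python
import math

class ApproxCache:
    """Nearest-match result cache: a software model of the associative memory."""

    def __init__(self, threshold):
        self.threshold = threshold  # max tolerated input distance (software-set)
        self.entries = []           # (input, precomputed result) pairs
        self.hits = 0

    def compute(self, x, fn):
        # Find the nearest stored input; reuse its result if close enough.
        if self.entries:
            nearest = min(self.entries, key=lambda e: abs(e[0] - x))
            if abs(nearest[0] - x) <= self.threshold:
                self.hits += 1
                return nearest[1]   # approximate: skip the real computation
        result = fn(x)              # fall back to exact execution
        self.entries.append((x, result))
        return result

cache = ApproxCache(threshold=0.05)
exact = cache.compute(1.00, math.sqrt)   # exact: computes and stores 1.0
approx = cache.compute(1.02, math.sqrt)  # inexact match -> reuses 1.0
print(approx, cache.hits)  # → 1.0 1
```

Raising the threshold trades accuracy for hit rate, which is exactly the error-vs-energy knob described above.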


The Internet of Things, Smart Cities, and Wireless Healthcare

In an increasingly informed world, generating and processing information encompasses several computing domains from embedded systems in smart appliances to datacenters powering the cloud. We have worked on efficient distributed data collection and aggregation for processing the data in a hierarchical, context-focused manner. By using hierarchical processing, systems can distill relevant information, increase privacy, and optimize communication energy for Smart Cities, Data Centers, and distributed Smart Grid and Healthcare applications.


Calibration Models for Environmental Monitoring

Sensor nodes at the edge of the Internet of Things often require sensor-specific calibration functions that relate input features to a phenomenon of interest. For example: in air quality sensing, the calibration function transforms input data from onboard sensors to target pollutant concentrations, and for application power prediction, internal performance metrics can be used to predict device power. Edge devices are typically resource constrained, meaning that traditional machine learning models are difficult to fit into the available storage and on-device training can strain available processing capabilities. We seek novel methods of reducing the complexity of training machine learning models on the edge by efficiently reducing training datasets, focusing calibration efforts into important regions using application-specific loss functions, and improving regression methods for resource-constrained devices.
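For the common case of a roughly linear sensor response, the calibration function reduces to a least-squares fit small enough to train on-device. The raw counts and reference concentrations below are synthetic, purely for illustration; real calibrations typically use several input features and non-linear corrections:

```python
def fit_linear(raw, ref):
    """One-feature least-squares calibration: ref ~ w * raw + b."""
    n = len(raw)
    mx, my = sum(raw) / n, sum(ref) / n
    w = (sum((x - mx) * (y - my) for x, y in zip(raw, ref))
         / sum((x - mx) ** 2 for x in raw))
    b = my - w * mx
    return lambda x: w * x + b

# Co-location data: raw ADC counts vs. reference-instrument concentration.
raw = [100, 150, 200, 250]
ref = [12.0, 17.0, 22.0, 27.0]   # exactly linear here: ref = 0.1 * raw + 2
calibrate = fit_linear(raw, ref)
print(round(calibrate(180), 6))  # → 20.0
```

Because the fitted model is just two parameters, both training and inference fit comfortably within the storage and compute budgets of a constrained edge node; the dataset-reduction and loss-shaping work above targets the cases where the response is not this simple.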


The Internet of Things with Applications to Smart Grid and Green Energy

The emergence of the Internet of Things has resulted in an abundance of data that can help researchers better understand their surroundings and create effective and automated actuation solutions. Our research efforts on this topic target several problems: (1) renewable energy integration and smart grid pricing in large-scale systems, (2) individual load energy reduction and automation, and (3) improved prediction mechanisms for context-aware energy management that leverage user activity modeling.
We have designed and implemented multiple tools that span from individual device predictors to a comprehensive representation of this vast environment.

Wireless Healthcare

With the proliferation of personal mobile computing via mobile phones and the advent of cheap, small sensors, we propose that a new kind of "citizen infrastructure" can be made pervasive at low cost and high value. Though challenges abound in mobile power management, data security, privacy, inference with commodity sensors, and "polite" user notification, the overriding challenge lies in the integration of the parts into a seamless yet modular whole that can make the most of each piece of the solution at every point in time through dynamic adaptation. Using existing integration methodologies would cause components to hide essential information from each other, limiting optimization possibilities. Emphasizing seamlessness and information sharing, on the other hand, would result in a monolithic solution that could not be modularly configured, adapted, maintained, or upgraded.


IoT System Characterization and Management: from Data Centers to Smart Devices and Sensors

The Internet of Things is a growing network of heterogeneous devices, combining commercial, industrial, residential, and cloud-fog computing domains. These devices range from low-power sensors with limited capabilities to multi-core platforms on the high end. IoT systems create both new opportunities and challenges in several different domains. The abundance of data helps researchers to better understand their surroundings and create automated solutions that effectively model and manage the diverse constrained resources in IoT devices and networks, including power, performance, thermal, reliability, and variability. SEELab's research efforts target these problems, including renewable energy integration in large-scale systems, individual load energy reduction and automation, energy storage, context-aware energy management for smart devices, user activity modeling, smart grid pricing, and load integration. To solve these problems, we design and implement multiple tools that not only model and analyze smaller individual pieces but also create a comprehensive representation of this vast environment.


SensorRocks

Long-term research requiring high-resolution sensor data needs platforms large enough to house solar panels and batteries. Leveraging a well-defined sensor appliance built with SensorRocks, we develop novel context-aware power management algorithms that maximize network lifetime and provide unprecedented capability on miniaturized platforms.


Energy Efficient Routing and Scheduling For Ad-Hoc Wireless Networks

In large-scale ad hoc wireless networks, data delivery is complicated by the lack of network infrastructure and limited energy resources. We propose a novel scheduling and routing strategy for ad hoc wireless networks that achieves up to 60% power savings while delivering data efficiently. We test our ideas on HPWREN, a heterogeneous wireless sensor network deployed in southern California.
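The strategy itself is not reproduced here, but the general flavor of energy-aware routing can be sketched as a shortest-path search over per-link transmission-energy costs rather than hop count, so traffic prefers several short, low-power hops over one long, expensive one. The node names and per-link costs below are hypothetical, not HPWREN data:

```python
import heapq

def min_energy_path(links, src, dst):
    """Dijkstra over per-link transmission-energy costs.

    links: dict mapping node -> list of (neighbor, energy_cost) pairs.
    Returns (path, total_energy) from src to dst.
    """
    dist = {src: 0.0}
    prev = {}
    heap = [(0.0, src)]
    while heap:
        d, u = heapq.heappop(heap)
        if u == dst:
            break
        if d > dist.get(u, float("inf")):
            continue  # stale heap entry
        for v, cost in links.get(u, []):
            nd = d + cost
            if nd < dist.get(v, float("inf")):
                dist[v], prev[v] = nd, u
                heapq.heappush(heap, (nd, v))
    # Reconstruct the path by walking predecessors back from dst.
    path, node = [dst], dst
    while node != src:
        node = prev[node]
        path.append(node)
    return list(reversed(path)), dist[dst]

# Hypothetical three-node topology: one long, costly link a->c and
# a cheaper two-hop alternative a->b->c (costs in arbitrary mJ units).
links = {"a": [("b", 1.0), ("c", 4.0)], "b": [("c", 1.0)], "c": []}
path, energy = min_energy_path(links, "a", "c")
```

With these costs the two-hop route wins even though it is longer in hops, which is exactly the tradeoff an energy-aware strategy exploits.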


SHiMmer

SHiMmer is a wireless platform that combines active sensing and localized processing with energy harvesting to provide long-lived structural health monitoring. Unlike other sensor networks that periodically monitor a structure and route information to a base station, our device acquires data and processes it locally before communicating with an external device, such as a remote-controlled helicopter.


Event-driven Power Management

Power management (PM) algorithms aim to reduce system-level energy consumption by selectively placing components into low-power states. Two classes of heuristic algorithms were initially proposed for power management: timeout-based and predictive. Later, a category of algorithms based on stochastic control was introduced. These algorithms guarantee optimal results as long as the power-managed system can be modeled well with exponential distributions, and they can also meet performance constraints, something that is not possible with heuristics.

We show that there is a large mismatch between measurements and simulation results when the exponential distribution is used to model all user request arrivals, and we develop two new event-driven approaches that better model system behavior for general user request distributions and give optimal results verified by measurements. The first approach is based on renewal theory and assumes that the decision to transition to a low-power state can be made in only one state. The second is based on the Time-Indexed Semi-Markov Decision Process (TISMDP) model, which allows transitions into low-power states from any state, but is also more complex than our other approach. The results obtained with the renewal model are guaranteed to match those obtained with the TISMDP model, as both approaches give globally optimal solutions.

We implemented our power management algorithms on two different classes of devices, and the measurement results show power savings ranging from a factor of 1.7 up to 5.0 with insignificant variation in performance.
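The timeout heuristic mentioned above is simple enough to sketch: after a fixed idle interval the device is placed into a low-power state, paying a one-time transition cost. The power numbers and idle-period lengths below are illustrative assumptions, not the measured values from our study:

```python
def timeout_policy_energy(idle_periods, timeout, p_active, p_sleep, e_transition):
    """Energy (J) consumed over a sequence of idle-period lengths (s).

    timeout      -- idle time to wait before sleeping; inf means never sleep
    p_active     -- power draw while awake (W)
    p_sleep      -- power draw in the low-power state (W)
    e_transition -- one-time energy cost of a sleep/wake cycle (J)
    """
    total = 0.0
    for t in idle_periods:
        if t <= timeout:
            total += t * p_active              # timeout never fired
        else:
            total += timeout * p_active        # awake, waiting for timeout
            total += (t - timeout) * p_sleep   # asleep for the remainder
            total += e_transition              # sleep/wake overhead
    return total

# Hypothetical trace of idle periods (seconds) and device parameters.
idle = [0.5, 3.0, 10.0, 0.2, 7.5]
always_on = timeout_policy_energy(idle, float("inf"), 1.0, 0.1, 0.4)
with_timeout = timeout_policy_energy(idle, 2.0, 1.0, 0.1, 0.4)
```

The heuristic's weakness, which motivates the stochastic approaches above, is that a single fixed timeout is only a good choice when the idle-period distribution is well behaved.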


Energy-efficient software design

Time to market of embedded software has become a crucial issue. As a result, embedded software designers often use libraries that have been pre-optimized for a given processor to achieve higher code quality. Unfortunately, current software design methodology often leaves high-level arithmetic optimizations and the use of complex library elements up to the designers' ingenuity. We present a tool flow and a methodology that automate the use of complex processor instructions and pre-optimized software library routines using symbolic algebraic techniques. The flow leverages our profiler, which relates energy consumption to the source code and allows designers to quickly obtain an energy consumption breakdown by procedure.


Energy-efficient wireless communication

Today’s wireless networks are highly heterogeneous, with diverse range and QoS requirements. Since battery lifetime is limited, power management of the communication interfaces without significant performance degradation has become essential. We present a set of approaches that efficiently reduce power consumption across different environments and applications. When multiple wireless network interfaces (WNICs) are available, we propose a policy that decides which WNIC to employ for a given application and how to optimize its usage, leading to a large improvement in power savings. For client-server multimedia applications running on wireless portable devices, we can exploit the server's knowledge of the workload: we present client- and server-side power managers that, by exchanging power control information, achieve more than 67% power savings with no performance loss. Wireless communication is also a critical aspect in the design of specific applications such as distributed speech recognition on portable devices. We consider quality-of-service tradeoffs and overall system latency, and present a wireless LAN scheduling algorithm that minimizes the energy consumption of a distributed speech recognition front-end.
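A minimal sketch of an interface-selection policy of the kind described above: given a transfer size and a latency bound, pick the WNIC with the lowest total energy that still meets the deadline. The interface names, rates, and power figures are hypothetical, not parameters from our study:

```python
def pick_wnic(interfaces, n_bytes, deadline_s):
    """Choose the lowest-energy interface that meets the latency bound.

    interfaces: name -> (throughput B/s, active power W, wakeup energy J)
    Returns the chosen interface name, or None if no interface qualifies.
    """
    best, best_energy = None, float("inf")
    for name, (rate, p_active, e_wake) in interfaces.items():
        t = n_bytes / rate
        if t > deadline_s:
            continue                     # misses the latency bound
        energy = e_wake + t * p_active   # wakeup cost + active transfer
        if energy < best_energy:
            best, best_energy = name, energy
    return best

# Hypothetical WNICs: a fast, power-hungry WLAN radio vs. a slow,
# low-power short-range radio.
wnics = {
    "wlan":      (1_000_000, 1.4, 0.5),
    "lowpower":  (100_000,   0.2, 0.1),
}
small_xfer = pick_wnic(wnics, 50_000, 1.0)    # both meet the deadline
large_xfer = pick_wnic(wnics, 500_000, 1.0)   # only WLAN is fast enough
```

The policy captures the basic tradeoff: small, latency-tolerant transfers favor the low-power radio, while large or urgent transfers justify the WLAN's higher active power.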