|The Internet of Things, Smart Cities, and Wireless Healthcare|
|Internet of Things Applications to Smart Grid and Green Energy|
|Calibration Models for Environmental Monitoring|
|Event-driven Power Management|
|Energy-efficient software design|
|Energy-efficient wireless communication|
|IoT System Characterization and Management|
|Energy Efficient Ad-hoc Wireless Networks Routing and Scheduling|
Neuroscience has proven to be a rich source of inspiration for the machine learning community: from the Perceptron, which introduced a simple and general-purpose learning algorithm for linear classifiers, to convolutional architectures inspired by the visual cortex, to sparse coding and independent component analysis. One of the most consequential discoveries from the neuroscience community, and one that underlies much research at the intersection of neuroscience and machine learning, has been the notion of high-dimensional distributed representations as the fundamental data structure for diverse types of information. In the neuroscience context, these representations are also typically sparse. To give a concrete example, the sensory systems of many organisms have a critical component consisting of a transformation from relatively low-dimensional sensory inputs to much higher-dimensional sparse codes. These latter representations are then used for subsequent tasks such as recall and learning. HD computing builds on this line of research by using high-dimensional randomized data representations as the basic units of computation. Typical values for the dimension of hypervectors are above 5,000. The elements of a hypervector are typically binary (0/1), bipolar (±1), or integer-valued. Arbitrary real numbers are generally avoided for computational reasons but are not inherently unsupported. There are two core operations in HD computing: bundling and binding. Bundling compiles a collection of related objects into a single representation, while binding forms a semantic link between two objects. Because of their high dimensionality, any randomly chosen pair of hypervectors will be nearly orthogonal with high probability. A useful consequence is that bundling can be implemented as a simple sum: for a collection of vectors P, Q, V, their element-wise sum S = P + Q + V is, in expectation, closer to P, Q, and V than to any other randomly chosen vector in the space.
Thus, we can represent sets simply by summing the component vectors. Given HD representations of data, this suggests a simple scheme for classification: take the data points corresponding to a particular class and superimpose them into a single representation for the set. Then, given a new piece of data whose correct class label is unknown, compute its similarity with the hypervector representing each class and return the label corresponding to the most similar one. The process of generating HD representations from low-dimensional data is known as "encoding" and is an active area of research in our group. We have developed novel techniques for encoding continuous data and generating quantized and sparse representations that are still effective for learning.
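These ideas can be sketched in a few lines of Python (the dimension, data, and class labels below are illustrative, not from any real encoding pipeline): random bipolar hypervectors are nearly orthogonal, bundling by element-wise summation keeps the result similar to each component, and classification reduces to nearest-prototype search.

```python
import numpy as np

rng = np.random.default_rng(0)
D = 10_000  # hypervector dimensionality (typically above 5,000)

def random_hv():
    """Random bipolar hypervector with entries in {-1, +1}."""
    return rng.choice([-1, 1], size=D)

def similarity(a, b):
    """Normalized dot product; near 0 for random pairs (near-orthogonality)."""
    return a @ b / D

# Bundling: the element-wise sum S stays close to each component vector.
P, Q, V = random_hv(), random_hv(), random_hv()
S = P + Q + V

# Classification: bundle each class's training vectors into a prototype,
# then return the label of the most similar prototype.
train = {"a": [random_hv() for _ in range(5)],
         "b": [random_hv() for _ in range(5)]}
prototypes = {lbl: np.sum(vs, axis=0) for lbl, vs in train.items()}

def classify(x):
    return max(prototypes, key=lambda lbl: similarity(prototypes[lbl], x))

# Even a noisy copy of a training vector maps back to its class.
query = train["a"][0].copy()
flip = rng.choice(D, size=D // 4, replace=False)
query[flip] *= -1  # flip 25% of the entries
```

Because random pairs have similarity near zero while each bundled component keeps similarity near one, the prototype of class "a" remains the closest match even after a quarter of the query's entries are corrupted.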
We live in a world where technological advances continually create more data than we can cope with. With the emergence of the Internet of Things, sensory and embedded devices will generate massive data streams demanding services that pose huge technical challenges due to limited device resources. Even as processor technology evolves to serve computationally complex tasks more efficiently, the cost of data movement between processor and memory remains the major performance bottleneck for most applications.
We seek to perform hardware/software co-design of novel hybrid processing in-memory platforms that accelerate fundamental operations and diverse data analytics procedures using processing in-memory (PIM) technology. PIM enables in-situ operations, thereby reducing effective memory bandwidth utilization. In the hardware layer, the proposed platform has a hybrid structure comprising two units: PIM-enabled processors and PIM-based accelerators. The PIM-enabled processors enhance traditional processors by supporting fundamental block-parallel operations, e.g., addition, multiplication, or bitwise computations, inside the processor cache structure and associated memory. To take full advantage of PIM for popular data processing procedures and machine learning algorithms, we also design specialized PIM-based accelerator blocks. To deliver a seamless experience for application developers, we also develop a software infrastructure that provides abstracted interfaces to the PIM-enabled memory and accelerators.
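As a purely hypothetical illustration of what such an abstracted interface might look like (the class, method names, and dispatch heuristic below are invented for this sketch and are not the actual API), an operation can transparently take a PIM path for large block-parallel inputs and fall back to the CPU otherwise:

```python
# Hypothetical sketch of an abstracted PIM interface; names are illustrative.
class PIMRuntime:
    def __init__(self, pim_available=True, block_threshold=1024):
        self.pim_available = pim_available
        self.block_threshold = block_threshold  # assumed dispatch heuristic

    def elementwise_add(self, a, b):
        # Large inputs are worth offloading as a block-parallel in-memory op;
        # small ones stay on the CPU to avoid offload overhead.
        if self.pim_available and len(a) >= self.block_threshold:
            return self._offload("add", a, b)
        return [x + y for x, y in zip(a, b)]

    def _offload(self, op, a, b):
        # Stand-in for issuing the operation inside the memory arrays.
        assert op == "add"
        return [x + y for x, y in zip(a, b)]

rt = PIMRuntime()
small = rt.elementwise_add([1, 2, 3], [4, 5, 6])  # CPU path
```

The point of such an interface is that application code calls one function and the runtime decides where the computation executes.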
Our solutions can process several applications entirely in memory, including deep neural network training and inference, graph processing, brain-inspired hyperdimensional computing, multimedia applications, query processing in databases, bioinformatics, and security. The proposed system, which integrates all optimized software components with improved hardware designs, can bring more than 10x speedup and 100-1000x improvement in energy efficiency. Our solutions ensure that the loss of accuracy when running applications on realistic data sets is kept small enough not to be perceptible, thus meeting the user’s quality requirements.
Over the past few years, the group has been actively working to answer the WHY, the WHERE, the WHEN, and the HOW of processing in memory. We are redesigning memory, all the way from systems to architecture down to low-level circuits, to enable PIM for applications such as machine learning, bioinformatics, data analytics, and graph processing. Recently, we proposed FloatPIM, a highly parallel and flexible architecture that implements high-precision training and testing of neural networks entirely in memory. The design is flexible enough to support both fixed- and floating-point operations and provides stable training of complex neural networks. Such PIM-based architectures have shown multiple orders of magnitude of improvement in performance as well as energy efficiency.
Outpacing Moore’s law, genomic data is doubling every seven months and is expected to surpass YouTube and Twitter in volume by 2025. Starting with sequencing and alignment, bioinformatics pipelines run a variety of algorithms on this big data, from variant calling to classification to graph-based analysis, serving objectives such as understanding disease-causing mutations, personalized treatment, and protein-harvesting drug production, to name a few. The memory/storage and computation requirements of these applications extend from hundreds of CPU hours and gigabytes of memory to millions of CPU hours and petabytes of storage. This tremendous amount of data entails redesigning the entire system stack. The goals are superior architectural solutions for memory/storage systems, e.g., intelligent use of the high-bandwidth memory granted by advances in hardware, as well as significantly faster computation platforms for expedited decisions that enable clinical use of the technology in real time, e.g., shrinking precision microbiome analysis from three months per individual to a few hours through hardware-software co-design that supports and optimizes data-intensive processing. To achieve this, we combine our experience with microbiome algorithms and datasets with innovative hardware design, including processing in-memory (PIM) acceleration of alignment, clustering, and classification, as well as high-bandwidth FPGAs, GPUs, and near-data accelerators, to develop a full-stack infrastructure that maps the aforementioned bioinformatics applications to novel hardware in an end-to-end manner. At the same time, we rethink the design and implementation of algorithmic alternatives driven by the new hardware infrastructure, e.g., mapping applications onto the hyperdimensional computing paradigm, which, thanks to its error tolerance, can benefit from novel technologies such as multi-level memory that is well suited to storing DNA data.
The Internet of Things is a growing network of heterogeneous devices, combining commercial, industrial, residential, and cloud-fog computing domains. These devices range from low-power sensors with limited capabilities to multi-core platforms on the high end. Their common property is that they age, degrade, and eventually require maintenance in the form of repair, component replacement, or complete device replacement. In general, power dissipation raises device temperature, which in turn creates thermal stress that dramatically accelerates reliability degradation mechanisms, leading to early failures. To analyze the effects of reliability degradation in IoT networks, we implemented a reliability framework. Utilizing this framework, we are able to explore trade-offs between energy, performance, and reliability. Currently the framework works with established models for servers, gateways, and edge devices, obtained by fitting parameters to our characterization results.
We work on “maintenance-preventive” dynamic control strategies for IoT devices to minimize the often ignored costs of maintenance. Our work has already demonstrated the importance of dynamic reliability management in mobile systems, controlling frequency, voltage, and core allocations while respecting user experience constraints. Our current research goal is to extend this to the whole IoT domain. Initially, we showed that the battery health of IoT devices can be improved with reliability-aware network management. We further propose optimal control strategies for diverse devices by adjusting their sampling rates, communication rates, or frequency and voltage levels. The distributed solutions work toward limiting maintenance costs while keeping data quality within desired levels. We also develop smart path selection and workload offloading algorithms that ensure a balanced distribution of reliability across the network. The combined approach minimizes operational and expected maintenance costs in a distributed and scalable fashion while respecting the user and data quality constraints imposed by end-to-end IoT applications. In the future, we will validate our approaches on a large-scale sensor network testbed as part of the HPWREN setup.
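The flavor of the rate-versus-wear tradeoff can be sketched with a toy model (the wear function, its parameters, and the rate values below are hypothetical stand-ins, not our actual reliability models): the controller picks the highest sampling rate that meets a data-quality floor without exceeding a wear budget.

```python
# Toy sketch of reliability-aware rate control; the model is hypothetical.
def wear_rate(sample_hz, base=1.0, alpha=0.05):
    """Assumed wear model: degradation grows superlinearly with activity,
    a stand-in for temperature-driven reliability stress."""
    return base + alpha * sample_hz ** 1.5

def choose_rate(rates, wear_budget, min_quality_hz):
    """Highest rate meeting the data-quality floor within the wear budget;
    falls back to the quality floor if no rate fits the budget."""
    feasible = [r for r in rates
                if r >= min_quality_hz and wear_rate(r) <= wear_budget]
    return max(feasible) if feasible else min_quality_hz

rate = choose_rate([1, 2, 5, 10, 20], wear_budget=2.0, min_quality_hz=2)
```

With these numbers, 10 Hz and 20 Hz exceed the wear budget, so the controller settles on 5 Hz: the densest sampling the device can sustain without accelerating its own failure.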
Traditionally, environmental phenomena have been measured using stationary sensors configured into wireless sensor networks or through participatory sensing by user-carried devices. Since the phenomena are typically highly correlated in time and space, each reading from a stationary sensor is less informative than that of a similarly capable sensor on the move. User-carried sensors can take more informative readings, but we have no control over where the sensors travel.
Today’s computing systems are designed to deliver only exact solutions at a high energy cost, while many of the algorithms that are run on data are at their heart statistical, and thus do not require exact answers. However, the solutions to date are isolated to only a few of the components in the system stack. The real challenge arises when developers want to employ approximation across multiple layers simultaneously. Much of the potential gains are not realized since there are no system-level solutions.
We develop novel architectures with software and hardware support for approximate computing. Hardware components are enhanced with the ability to dynamically adapt approximation at a quantifiable and controllable cost in accuracy. Software services complement the hardware to ensure the user’s perception is not compromised while maximizing the energy savings due to approximation. The changes to hardware design include approximation-enabled CPUs and GPUs. Each GPU stream core is augmented with a small associative memory placed close to it. The main idea of the approximation is, instead of computing accurately on the existing processing units, to return precomputed results from the associative memory, not only for perfect matches of operands but also for inexact matches. CPUs are likewise designed with a set of small associative memories next to each core. We also presented the nearest-distance associative memory, which on each search returns the precomputed result nearest to the input words. The data in associative memory can replace normal CPU execution when a match is close enough; such inexact matching is subject to a threshold set by the software layer.
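A software analogue of this mechanism can be sketched as follows (a simplified illustration only: the real design operates on hardware associative memories, and the distance metric and threshold here are placeholders). A lookup reuses a stored result when the new operands are close enough; otherwise the exact computation runs and its result is stored.

```python
# Simplified software model of a nearest-distance associative memory.
class ApproxMemory:
    def __init__(self, threshold):
        self.threshold = threshold  # set by the software layer
        self.entries = []           # stored (operands, result) pairs

    def lookup(self, operands):
        """Return the stored result nearest to `operands`, if close enough."""
        best = min(self.entries,
                   key=lambda e: self._dist(e[0], operands),
                   default=None)
        if best is not None and self._dist(best[0], operands) <= self.threshold:
            return best[1]  # approximate hit: reuse precomputed result
        return None

    @staticmethod
    def _dist(a, b):
        return max(abs(x - y) for x, y in zip(a, b))

def approx_op(mem, op, operands):
    hit = mem.lookup(operands)
    if hit is not None:
        return hit              # skip the exact computation entirely
    result = op(*operands)      # miss: fall back to exact execution
    mem.entries.append((operands, result))
    return result

mem = ApproxMemory(threshold=0.05)
mul = lambda a, b: a * b
exact = approx_op(mem, mul, (2.00, 3.00))   # computed exactly, then stored
approx = approx_op(mem, mul, (2.01, 3.02))  # within threshold: reuses 6.0
```

Raising the threshold increases the hit rate, and hence the energy savings, at the cost of larger per-operation error, which is exactly the knob the software layer controls.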
Approximate computing solutions provide up to 10x improvements in both acceleration and energy-delay product for various applications, including image processing, CUDA benchmarks, and several machine learning algorithms, while incurring acceptable errors. Further, as the maximum allowable error per operation is increased, performance also increases. For some applications, such as k-means, 91% of the operations can be approximated using our solution, resulting in only 1.1% classification error compared to a run on exact hardware alone. Hardware implementation and high computation energy cost are the main bottlenecks of machine learning algorithms in the big data domain, so we also search for alternative architectures that address the computing cost and memory movement issues of traditional cores.
In an increasingly informed world, generating and processing information encompasses several computing domains from embedded systems in smart appliances to datacenters powering the cloud. We have worked on efficient distributed data collection and aggregation for processing the data in a hierarchical, context-focused manner. By using hierarchical processing, systems can distill relevant information, increase privacy, and optimize communication energy for Smart Cities, Data Centers, and distributed Smart Grid and Healthcare applications.
Sensor nodes at the edge of the Internet of Things often require sensor-specific calibration functions that relate input features to a phenomenon of interest. For example: in air quality sensing, the calibration function transforms input data from onboard sensors to target pollutant concentrations, and for application power prediction, internal performance metrics can be used to predict device power. Edge devices are typically resource constrained, meaning that traditional machine learning models are difficult to fit into the available storage and on-device training can strain available processing capabilities. We seek novel methods of reducing the complexity of training machine learning models on the edge by efficiently reducing training datasets, focusing calibration efforts into important regions using application-specific loss functions, and improving regression methods for resource-constrained devices.
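As a toy illustration of region-focused calibration (synthetic data and an assumed linear sensor response; not our deployed pipeline), an application-specific loss can be approximated with weighted least squares that emphasizes the concentration range the application cares about most:

```python
import numpy as np

rng = np.random.default_rng(1)

# Synthetic calibration data: a raw sensor output and a co-located
# reference instrument (assumed linear response plus noise).
raw = rng.uniform(0, 100, size=200)               # raw sensor output
truth = 0.8 * raw + 5 + rng.normal(0, 2, 200)     # reference readings

# Up-weight the high-concentration region so the fitted model is most
# accurate where it matters (e.g., pollution-alert thresholds).
weights = np.where(truth > 60, 4.0, 1.0)
X = np.column_stack([raw, np.ones_like(raw)])
w = np.sqrt(weights)
coef, *_ = np.linalg.lstsq(X * w[:, None], truth * w, rcond=None)
slope, intercept = coef

def calibrate(x):
    """Map a raw sensor reading to a calibrated concentration."""
    return slope * x + intercept
```

A closed-form fit like this needs only a handful of multiply-accumulates at inference time, which is what makes it attractive on storage- and compute-constrained edge nodes; dataset reduction would further shrink the 200-sample training set before the solve.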
The emergence of the Internet of Things has resulted in an abundance of data that can help researchers better understand their surroundings and create effective and automated actuation solutions.
Our research efforts on this topic target several problems:
(1) renewable energy integration and smart grid pricing in large scale systems,
(2) individual load energy reduction and automation and
(3) improved prediction mechanisms for context-aware energy management that leverage user activity modeling.
We have designed and implemented multiple tools that span from individual device predictors to a comprehensive representation of this vast environment.
With the proliferation of personal mobile computing via mobile phones and the advent of cheap, small sensors, we propose that a new kind of "citizen infrastructure" can be made pervasive at low cost and high value. Though challenges abound in mobile power management, data security, privacy, inference with commodity sensors, and "polite" user notification, the overriding challenge lies in integrating the parts into a seamless yet modular whole that can make the most of each piece of the solution at every point in time through dynamic adaptation. Using existing integration methodologies would cause components to hide essential information from each other, limiting optimization possibilities. Emphasizing seamlessness and information sharing, on the other hand, would result in a monolithic solution that could not be modularly configured, adapted, maintained, or upgraded.
The Internet of Things is a growing network of heterogeneous devices, combining commercial, industrial, residential, and cloud-fog computing domains. These devices range from low-power sensors with limited capabilities to multi-core platforms on the high end. IoT systems create both new opportunities and challenges across several domains. The abundance of data helps researchers better understand their surroundings and create automated solutions that effectively model and manage the diverse constrained resources in IoT devices and networks, including power, performance, thermal behavior, reliability, and variability. SEELab's research efforts target these problems, including renewable energy integration in large-scale systems, individual load energy reduction and automation, energy storage, context-aware energy management for smart devices, user activity modeling, smart grid pricing, and load integration. To solve these problems, we design and implement multiple tools that not only model and analyze smaller individual pieces but also create a comprehensive representation of this vast environment.
Long-term research requiring high-resolution sensor data needs platforms large enough to house solar panels and batteries. Leveraging a well-defined sensor appliance created using Sensor-Rocks, we develop novel context-aware power management algorithms to maximize network lifetime and provide unprecedented capability on miniaturized platforms.
In large-scale ad hoc wireless networks, data delivery is complicated by the lack of network infrastructure and limited energy resources. We propose a novel scheduling and routing strategy for ad hoc wireless networks which achieves up to 60% power savings while delivering data efficiently. We test our ideas on a heterogeneous wireless sensor network deployed in southern California - HPWREN.
SHiMmer is a wireless platform that combines active sensing and localized processing with energy harvesting to provide long-lived structural health monitoring. Unlike other sensor networks that periodically monitor a structure and route information to a base station, our device acquires data and processes it locally before communicating with an external device, such as a remote controlled helicopter.
Power management (PM) algorithms aim to reduce energy consumption at the system level by selectively placing components into low-power states. Formerly, two classes of heuristic algorithms were proposed for power management: timeout and predictive. Later, a category of algorithms based on stochastic control was proposed. These algorithms guarantee optimal results as long as the power-managed system can be modeled well with exponential distributions. Another advantage is that they can meet performance constraints, something that is not possible with heuristics. We show that there is a large mismatch between measurements and simulation results if the exponential distribution is used to model all user request arrivals. We develop two new approaches that better model system behavior for general user request distributions. These approaches are event-driven and give optimal results verified by measurements. The first approach is based on renewal theory; this model assumes that the decision to transition to the low-power state can be made in only one state. The other method is based on the Time-Indexed Semi-Markov Decision Process (TISMDP) model, which allows transitions into low-power states from any state but is also more complex than our other approach. The results obtained by the renewal model are guaranteed to match those obtained by the TISMDP model, as both approaches give globally optimal solutions. We implemented our power management algorithms on two different classes of devices, and the measurement results show power savings ranging from a factor of 1.7 up to 5.0 with insignificant variation in performance.
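A toy simulation illustrates why the idle-time distribution matters for such policies (the power numbers, wakeup cost, and distributions below are hypothetical, and the policy shown is a simple timeout heuristic rather than our renewal or TISMDP formulations):

```python
import random

random.seed(42)

P_ON, P_SLEEP = 1.0, 0.1   # watts while active-idle vs. asleep (assumed)
E_WAKE = 0.5               # joules per wakeup transition (assumed)

def energy(idle_periods, timeout):
    """Energy of a timeout policy: sleep once idle for `timeout` seconds."""
    total = 0.0
    for t in idle_periods:
        if t <= timeout:
            total += P_ON * t                 # request arrived before timeout
        else:
            total += P_ON * timeout           # waited out the timeout
            total += P_SLEEP * (t - timeout)  # slept for the remainder
            total += E_WAKE                   # paid the wakeup cost
    return total

# Heavy-tailed idle times behave very differently from the exponential
# arrivals assumed by classic stochastic policies.
exp_idle = [random.expovariate(1.0) for _ in range(1000)]
pareto_idle = [random.paretovariate(1.5) for _ in range(1000)]

for name, idle in [("exponential", exp_idle), ("pareto", pareto_idle)]:
    best = min((energy(idle, to), to) for to in [0.1, 0.5, 1, 2, 5])
    print(name, "best timeout:", best[1])
```

A timeout tuned under the exponential assumption can be far from optimal on the heavy-tailed trace, which mirrors the measurement-versus-simulation mismatch described above and motivates modeling general request distributions.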
Time to market of embedded software has become a crucial issue. As a result, embedded software designers often use libraries that have been preoptimized for a given processor to achieve higher code quality. Unfortunately, current software design methodology often leaves high-level arithmetic optimizations and the use of complex library elements up to the designer's ingenuity. We present a tool flow and a methodology that automate the use of complex processor instructions and preoptimized software library routines using symbolic algebraic techniques. The flow leverages our profiler, which relates energy consumption to the source code and allows designers to quickly obtain an energy consumption breakdown by procedure.
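As a small, generic example of the kind of high-level arithmetic optimization such a flow can automate (illustrative only, not the tool's actual output), rewriting a polynomial into Horner form reduces it to a chain of multiply-adds, which maps naturally onto a preoptimized MAC instruction or library routine:

```python
def poly_naive(x, coeffs):
    """Literal evaluation a0 + a1*x + a2*x^2 + ...: each term recomputes
    a power of x, wasting multiplications."""
    return sum(c * x**i for i, c in enumerate(coeffs))

def poly_horner(x, coeffs):
    """Horner form ((an*x + a(n-1))*x + ...)*x + a0: exactly n
    multiply-accumulate steps for n+1 coefficients."""
    acc = 0
    for c in reversed(coeffs):
        acc = acc * x + c
    return acc

coeffs = [5, -2, 3, 1]  # 5 - 2x + 3x^2 + x^3
value = poly_horner(2, coeffs)
```

Both forms are algebraically identical, which is what lets a symbolic-algebra pass perform the rewrite safely and then match each `acc * x + c` step to a complex processor instruction.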
Today’s wireless networks are highly heterogeneous, with diverse range and QoS requirements. Since battery lifetime is limited, power management of the communication interfaces without significant performance degradation has become essential. We present a set of approaches that efficiently reduce power consumption across different environments and applications. When multiple wireless network interfaces (WNICs) are available, we propose a policy that decides which WNIC to employ for a given application and how to optimize its usage, leading to a large improvement in power savings. In the case of client-server multimedia applications running on wireless portable devices, we can exploit the server's knowledge of the workload. We present client- and server-side PM schemes that, by exchanging power control information, achieve more than 67% power savings with no performance loss. Wireless communication is also a critical aspect of the design of specific applications such as distributed speech recognition on portable devices. We consider quality-of-service tradeoffs and overall system latency and present a wireless LAN scheduling algorithm that minimizes the energy consumption of a distributed speech recognition front-end.