When you get your home electricity bill every month, you just get an aggregate measure: you used this many kilowatt-hours, so we will charge you $X. But what if you instead received a detailed receipt that broke down how much energy you spent on each device?
Providing that information can ‘close the loop’ and allow people to better understand and adjust their energy consumption. Studies have shown this kind of feedback can improve user efficiency by 15%, reducing both user costs and environmental impact (see Sections 1 and 2 of the paper “REDD: A Public Data Set for Energy Disaggregation Research” and its references 1 and 13). Also, with a substantial amount of home electrical power coming from coal and, increasingly, controversial shale gas sources, improving efficiency is a big win for everybody.
But how can we provide that detailed electricity usage breakdown? It is not feasible to retrofit every electrical device with monitoring equipment, but a single device on the power mains can measure the whole-home power consumption signal, which is the sum of the power consumed by all devices. The power consumption of each device then needs to be disaggregated from that sum signal.
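To make the "sum signal" concrete, here is a minimal sketch in Python. The two devices, their wattages, and the one-sample-per-minute rate are all made-up assumptions for illustration, not measurements from any real home.

```python
def square_pulses(n, period, on_len, watts):
    """Hypothetical device that draws `watts` for the first `on_len`
    samples of every `period` samples, and nothing otherwise."""
    return [watts if (t % period) < on_len else 0.0 for t in range(n)]

n = 24 * 60  # one sample per minute for 24 hours (assumed sampling rate)

# Assumed refrigerator: a 120 W pulse for 20 minutes out of every hour.
fridge = square_pulses(n, period=60, on_len=20, watts=120.0)

# Assumed lighting: 60 W from 18:00 to 23:00, off otherwise.
lights = [60.0 if 18 * 60 <= t < 23 * 60 else 0.0 for t in range(n)]

# A meter on the mains sees only the sum of the device signals.
aggregate = [f + l for f, l in zip(fridge, lights)]
```

Disaggregation is the reverse direction: given only `aggregate`, recover something close to `fridge` and `lights`.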
The figure in disagg_data.pdf (and aggregate_signal.pdf) shows the power consumption measured for a home over a 24-hour period.
By identifying statistical patterns in the signal, like the refrigerator’s regular pulses, we can hope to disaggregate it into device consumption signals. So our goal is to identify statistical models for power-consuming devices and use those models to disaggregate whole-home signals into their component device signals. This problem is known as power disaggregation or non-invasive load monitoring (NILM), and it has a long history; we didn’t invent it. Our statistical learning methods provide a great way to solve it, as well as many other unrelated problems. NILM is the coolest application so far, though.
We build general, flexible statistical models and corresponding inference algorithms that can efficiently learn diverse and powerful representations for many kinds of data, including representations for electrical device power consumption. In particular, we built the Hierarchical Dirichlet Process Hidden Semi-Markov Model (HDP-HSMM, alphabet soup) to discover patterns in data where duration regularity can play a significant role. Being a Bayesian method, it also models uncertainty directly, and in difficult problems it’s important to estimate how certain or uncertain answers are. In the case of power disaggregation, patterns in duration statistics can be the key to separating a refrigerator’s regular pulses from the switching of lights or the rapid “bang-bang” pulses of a washer/dryer (bang-bang is a technical term!).
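The role of duration statistics can be illustrated with a small Python sketch. It is not the HDP-HSMM or our inference code; it only contrasts the two ways a state's dwell time can be generated. In an ordinary HMM, a state persists through a self-transition at every step, so its dwell time is geometric. In a semi-Markov model, the dwell time is drawn from an explicit distribution. All function names and numbers below are hypothetical.

```python
import random

def sample_states(n, durations, rng):
    """Sample n labels from a two-state chain (0 = off, 1 = on).
    durations[s] maps an rng to a positive integer dwell time in state s."""
    states, s = [], 0
    while len(states) < n:
        states.extend([s] * durations[s](rng))
        s = 1 - s  # with two states, leaving one means entering the other
    return states[:n]

def geometric(mean):
    """Dwell time implied by an HMM self-transition probability of 1 - 1/mean."""
    p = 1.0 / mean
    def draw(rng):
        d = 1
        while rng.random() >= p:
            d += 1
        return d
    return draw

def run_lengths(states):
    """Lengths of consecutive runs of equal labels."""
    runs, count = [], 1
    for prev, cur in zip(states, states[1:]):
        if cur == prev:
            count += 1
        else:
            runs.append(count)
            count = 1
    runs.append(count)
    return runs

# Explicit durations: an assumed fridge that is off 38-42 steps, on 18-22 steps.
regular = {0: lambda r: r.randint(38, 42), 1: lambda r: r.randint(18, 22)}
semi_markov_runs = run_lengths(sample_states(1440, regular, random.Random(0)))

# Geometric durations with the same means (40 off, 20 on), as in a plain HMM.
hmm_like = {0: geometric(40), 1: geometric(20)}
hmm_runs = run_lengths(sample_states(1440, hmm_like, random.Random(0)))
```

Every complete run in `semi_markov_runs` stays inside its narrow band, while `hmm_runs` can range from a single step to well over a hundred, even though the mean dwell times match. That regularity is the kind of cue a duration-aware model can exploit.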
We learned device models by applying an HDP-HSMM to separate data from several devices, and then tested how well it could separate aggregated data across new devices in another home (e.g., a different refrigerator model, different lighting). We compared that to similar models that do not capture the details of the duration regularities (the HDP-HMM) and that do not model the uncertainty in device variation from apartment to apartment (the EM-HMM, which represents a fairly “standard” approach to problems like this one). The HDP-HSMM improved upon the other models’ performance and proved to be a promising approach to the power disaggregation problem.
Other applications for the HDP-HSMM:
The biggest win with the HDP-HSMM and related models is the generality and flexibility of the model and its inference algorithms. Using the same HDP-HSMM model and even much of the same code, we’re automatically discovering patterns in human speech signals and modeling behavioral patterns in laboratory mice.
All our code is free and open-source software, so fork it on GitHub!