This paper investigates adaptive optimal control of a grid-independent photovoltaic system consisting of a collector, storage, and a load. The control algorithm is based on Q-Learning, a model-free reinforcement learning algorithm, which optimizes control performance through exploration. Q-Learning is used in a simulation study to find a policy that performs better than a conventional control strategy with respect to a cost function that places more weight on meeting a critical base load than on serving non-critical loads exceeding the base load.
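To illustrate the approach the abstract describes, the following is a minimal sketch of tabular Q-learning applied to a toy storage-and-load problem. All names, dynamics, and parameters here are illustrative assumptions, not the paper's implementation: the state is a coarsely discretized battery charge level, the actions decide whether to serve a non-critical load in addition to the critical base load, and the cost function penalizes an unmet critical load far more heavily than a shed non-critical load, mirroring the weighting described above.

```python
import random

# Hypothetical, simplified stand-in for the photovoltaic system (illustrative only).
N_LEVELS = 5           # battery charge discretized into 5 levels
ACTIONS = [0, 1]       # 0 = serve base load only, 1 = also serve non-critical load
ALPHA, GAMMA, EPS = 0.1, 0.9, 0.3

def step(level, action, sun):
    """Toy dynamics: insolation charges the battery, loads discharge it.
    Returns (next_level, cost); failing the critical base load costs 10,
    shedding the non-critical load costs only 1."""
    avail = level + sun
    cost = 0.0
    if avail >= 1:
        avail -= 1          # critical base load served
    else:
        cost += 10.0        # critical base load unmet (heavy penalty)
    if action == 1 and avail >= 1:
        avail -= 1          # non-critical load served
    else:
        cost += 1.0         # non-critical load shed (light penalty)
    return min(N_LEVELS - 1, avail), cost

Q = {(s, a): 0.0 for s in range(N_LEVELS) for a in ACTIONS}

random.seed(0)
level = N_LEVELS - 1
for t in range(50000):
    sun = random.choice([0, 1, 2])            # stochastic insolation
    if random.random() < EPS:                 # epsilon-greedy exploration
        a = random.choice(ACTIONS)
    else:
        a = min(ACTIONS, key=lambda x: Q[(level, x)])
    nxt, cost = step(level, a, sun)
    # Q-learning update for cost minimization (min in place of the usual max)
    target = cost + GAMMA * min(Q[(nxt, x)] for x in ACTIONS)
    Q[(level, a)] += ALPHA * (target - Q[(level, a)])
    level = nxt

# Greedy learned policy: at low charge, shed the non-critical load
policy = {s: min(ACTIONS, key=lambda x: Q[(s, x)]) for s in range(N_LEVELS)}
```

Because the update bootstraps from observed transitions only, no model of the insolation or load dynamics is required, which is the sense in which Q-learning is model-free; the asymmetric cost terms are what drive the learned policy to protect the critical base load.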
1. Watkins, C., and Dayan, P., 1992, "Q-Learning," Machine Learning, 8, pp. 279–292.
2. Gullapalli, V., 1990, "A Stochastic Reinforcement Learning Algorithm for Learning Real-Valued Functions," Neural Networks, 3, pp. 671–692.
3. Sutton, R. S., and Barto, A. G., 1998, Reinforcement Learning: An Introduction, MIT Press, Cambridge, MA.
4. Bellman, R. E., 1957, Dynamic Programming, Princeton Univ. Press, Princeton, NJ.
5. Bertsekas, D. P., and Tsitsiklis, J. N., 1996, Neuro-Dynamic Programming, Athena Scientific, Belmont, MA.
6. Cybenko, G., Gray, R., and Moizumi, K., 1995, "Q-Learning: A Tutorial and Extensions," Mathematics of Artificial Neural Networks, Oxford Univ., England, July 1995.
7. Sheppard, M., Oswald, A., Valenzuela, C., Sullivan, G., and Sotudeh, R., 1993, "Reinforcement Learning in Control," 9th Int. Conf. on Mathematical and Computer Modeling, Berkeley, CA, July 1993.
8. Cardinale, J., 1994, "Model of RMSE Photovoltaic Design," Report, Univ. of Colorado at Boulder, Joint Center for Energy Management.
Copyright © 2003 by ASME