This paper investigates adaptive optimal control of a grid-independent photovoltaic system consisting of a collector, storage, and a load. The control algorithm is based on Q-Learning, a model-free reinforcement learning algorithm that optimizes control performance through exploration. In a simulation study, Q-Learning is used to find a policy that performs better than a conventional control strategy with respect to a cost function that places more weight on meeting a critical base load than on serving non-critical loads exceeding that base load.
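For readers unfamiliar with the algorithm named in the abstract, the following is a minimal sketch of tabular Q-Learning with an epsilon-greedy exploration policy and a reward that penalizes unmet critical base load more heavily than unmet non-critical load. All names, discretizations, and weight values are illustrative assumptions, not details taken from the paper.

```python
# Minimal tabular Q-Learning sketch for a storage-dispatch problem of the kind
# described in the abstract. State/action encodings and the penalty weights
# W_CRITICAL / W_NONCRITICAL are assumptions for illustration only.
import random
from collections import defaultdict

ALPHA, GAMMA, EPSILON = 0.1, 0.95, 0.1   # learning rate, discount, exploration rate
W_CRITICAL, W_NONCRITICAL = 10.0, 1.0    # assumed cost weights: unmet base load costs more

Q = defaultdict(float)                   # Q[(state, action)] -> value estimate

def reward(unmet_critical_kwh, unmet_noncritical_kwh):
    """Negative cost: unmet critical base load is penalized more heavily."""
    return -(W_CRITICAL * unmet_critical_kwh + W_NONCRITICAL * unmet_noncritical_kwh)

def choose_action(state, actions):
    """Epsilon-greedy selection over a discrete action set (exploration vs. exploitation)."""
    if random.random() < EPSILON:
        return random.choice(actions)
    return max(actions, key=lambda a: Q[(state, a)])

def q_update(state, action, r, next_state, actions):
    """Standard one-step Q-Learning backup: Q(s,a) += alpha * (r + gamma*max_a' Q(s',a') - Q(s,a))."""
    best_next = max(Q[(next_state, a)] for a in actions)
    Q[(state, action)] += ALPHA * (r + GAMMA * best_next - Q[(state, action)])
```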

1.
Watkins, C., and Dayan, P., 1992, “Q-Learning,” Machine Learning, 8, pp. 279–292.
2.
Gullapalli, V., 1990, “A Stochastic Reinforcement Learning Algorithm for Learning Real-Valued Functions,” Neural Networks, 3, pp. 671–692.
3.
Sutton, R. S., and Barto, A. G., 1998, Reinforcement Learning: An Introduction, MIT Press, Cambridge, MA.
4.
Bellman, R. E., 1957, Dynamic Programming, Princeton Univ. Press, Princeton, NJ.
5.
Bertsekas, D. P., and Tsitsiklis, J. N., 1996, Neuro-Dynamic Programming, Athena Scientific, Belmont, MA.
6.
Cybenko, G., Gray, R., and Moizumi, K., 1995, “Q-Learning: A Tutorial and Extensions,” Mathematics of Artificial Neural Networks, Oxford Univ., England, July 1995.
7.
Sheppard, M., Oswald, A., Valenzuela, C., Sullivan, G., and Sotudeh, R., 1993, “Reinforcement Learning in Control,” 9th Int. Conf. on Mathematical and Computer Modeling, Berkeley, CA, July 1993.
8.
Cardinale, J., 1994, “Model of RMSE Photovoltaic Design,” Report, Univ. of Colorado at Boulder, Joint Center for Energy Management.