Citations of my papers can be found at Google Scholar.


  • H. Busch, H. Essafi, M. Geipel, O. Heyer, T. Kloss, M. Tokic. Event-based temporal synchronization. Patent: EP3521792.
  • S. Depeweg, H. Frank, R. Grothmann, F. Rudolph, V. Sterzing, M. Tokic, S. Vogl. Method for predicting a switching time of a set of signals of signalling facility. Patent: EP3438946.


  • D. Hein, S. Depeweg, M. Tokic, S. Udluft, A. Hentschel, T. A. Runkler, and V. Sterzing. A Benchmark Environment Motivated by Industrial Control Problems. To appear in IEEE Symposium on Adaptive Dynamic Programming and Reinforcement Learning (ADPRL), 2017. [ http ]
  • D. Hein, S. Udluft, M. Tokic, A. Hentschel, T. Runkler, and V. Sterzing. Batch Reinforcement Learning on the Industrial Benchmark: First Experiences. In Proceedings of the IEEE International Joint Conference on Neural Networks (IJCNN 2017), pages 4214-4221. IEEE Press. [ http ]


  • D. Hein, A. Hentschel, V. Sterzing, M. Tokic, and S. Udluft. Introduction to the “Industrial Benchmark”. CoRR, arXiv:1610.03793 [cs.LG], pages 1-11. 2016. [ pdf sourcecode ]


  • W. Hauptmann, A. Hentschel, C. Otte, V. Sterzing, M. Tokic, S. Udluft, and H.-G. Zimmermann. ALICE: Autonomes Lernen in komplexen Umgebungen. Siemens AG, Munich, 2015. [ http ]


  • M. Tokic. Reinforcement Learning mit adaptiver Steuerung von Exploration und Exploitation. PhD thesis, Universität Ulm, Institut für Neuroinformatik, 2013. [ http ]
  • M. Tokic. Reinforcement Learning: Psychologische und neurobiologische Aspekte. Künstliche Intelligenz, 27(3):213-219, 2013. [ pdf ]
  • M. Tokic, F. Schwenker, and G. Palm. Meta-learning of exploration/exploitation parameters with replacing eligibility traces. In Partially Supervised Learning, volume 8183 of Lecture Notes in Artificial Intelligence, pages 68-79. Springer Berlin / Heidelberg, 2013. [ pdf ]


  • P. Ertle, M. Tokic, R. Cubek, H. Voos, and D. Söffker. Towards learning of safety knowledge from human demonstrations. In Proceedings of the IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS 2012), pages 5394-5399, Vilamoura, Algarve, Portugal, 2012. IEEE Press. Nominated (1 of 4) for the “New Technology Foundation Award for Entertainment Robots and Systems”. [ pdf ]
  • M. Tokic, P. Ertle, G. Palm, D. Söffker, and H. Voos. Robust exploration/exploitation trade-offs in safety-critical applications. In Proceedings of the 8th International Symposium on Fault Detection, Supervision and Safety of Technical Processes, pages 660-665, Mexico City, Mexico, Aug. 2012. IFAC. [ pdf ]
  • P. Ertle, M. Tokic, B. Tobias, M. Ebel, H. Voos, and D. Söffker. Conceptual design of a dynamic risk-assessment server for autonomous robots. In Proceedings of the 7th German Conference on Robotics, pages 250-254. VDE Verlag, May 2012. [ pdf ]
  • M. Tokic and G. Palm. Adaptive exploration using stochastic neurons. In A. Villa, W. Duch, P. Érdi, F. Masulli, and G. Palm, editors, Artificial Neural Networks and Machine Learning – ICANN 2012, volume 7553 of Lecture Notes in Computer Science, pages 42-49. Springer Berlin / Heidelberg, 2012. [ pdf ]
  • M. Tokic and G. Palm. Gradient algorithms for exploration/exploitation trade-offs: Global and local variants. In N. Mana, F. Schwenker, and E. Trentin, editors, Artificial Neural Networks in Pattern Recognition, volume 7477 of Lecture Notes in Computer Science, pages 60-71. Springer Berlin / Heidelberg, 2012. [ pdf ]
  • M. Tokic and H. Bou Ammar. Teaching reinforcement learning using a physical robot. In Proceedings of the Workshop on Teaching Machine Learning at the 29th International Conference on Machine Learning, pages 1-4, Edinburgh, UK, 2012. [ pdf ]


  • M. Tokic and G. Palm. Value-difference based exploration: Adaptive control between epsilon-greedy and softmax. In J. Bach and S. Edelkamp, editors, KI 2011: Advances in Artificial Intelligence, volume 7006 of Lecture Notes in Artificial Intelligence, pages 335-346. Springer Berlin / Heidelberg, 2011. The original publication is available at [ pdf ]
  • S. Montresor, J. Kay, M. Tokic, and J. Summerton. Work in progress: Programming in a confined space – a case study in porting modern robot software to an antique platform. In Proceedings of the 41st ASEE/IEEE Frontiers in Education Conference, pages T3H-1-T3H-3, Rapid City, SD, USA, 2011. IEEE Press. [ pdf ]


  • M. Tokic, A. Usadel, J. Fessler, and W. Ertel. On an educational approach to behavior learning for robots. AT&P Journal Plus, 2010(2):103-108, 2010.
  • M. Tokic, A. Usadel, J. Fessler, and W. Ertel. On an educational approach to behavior learning for robots. In Proceedings of the 1st International Conference on Robotics in Education, pages 171-176, Bratislava, Slovak Republic, 2010. Slovak University of Technology in Bratislava. [ pdf ]
  • M. Tokic. Adaptive ε-greedy exploration in reinforcement learning based on value differences. In R. Dillmann, J. Beyerer, U. Hanebeck, and T. Schultz, editors, KI 2010: Advances in Artificial Intelligence, volume 6359 of Lecture Notes in Artificial Intelligence, pages 203-210. Springer Berlin / Heidelberg, 2010. [ pdf ]


  • M. Tokic, J. Fessler, and W. Ertel. The crawler, a class room demonstrator for reinforcement learning. In C. Lane and H. Guesgen, editors, Proceedings of the 22nd International Florida Artificial Intelligence Research Society Conference FLAIRS’09, pages 160-165, Menlo Park, California, USA, 2009. AAAI Press. [ pdf ]
  • W. Ertel, M. Schneider, R. Cubek, and M. Tokic. The Teaching-Box: A universal robot learning framework. In Proceedings of the 14th International Conference on Advanced Robotics ICAR’09, pages 1-6, 2009. [ pdf ]


  • M. Tokic, W. Ertel, H. Radtke, J. Akmal, and W. Krökel. Reinforcement learning on a simple real walking robot. In Proceedings of the 29th Annual German Conference on Artificial Intelligence, pages 1-3, Bremen, Germany, 2006.