Title: Bridge: a new challenge for AI?
Speaker: Dr. Véronique Ventos, Associate Professor at University of Paris-Saclay (France)
Date & Time: 4pm, 12th December 2017
Room: Eng3.24, Engineering Building, QMUL Mile End campus (building 15 on the campus map)
As usual, refreshments will be served before and after the seminar in the hub. Please register, as this helps us arrange the refreshments.
Games have always been an excellent field of experimentation for nascent techniques in computer science and in various areas of Artificial Intelligence (AI), including Machine Learning (ML). Despite their complexity, game problems are much easier to understand and to model than real-life problems. Systems initially designed for games are then used in real applications. In recent decades, the design of champion-level systems dedicated to a single game (game AI) has been regarded as a milestone of computer science and AI.
Go and Poker are the two most recent successes. In May 2017, AlphaGo (DeepMind) defeated the Go world champion Ke Jie 3 to 0. In January 2017, the Poker AI Libratus (Carnegie Mellon University) won a heads-up no-limit Texas hold’em poker event against four of the best professional players.
Such a success has not yet been achieved in another incomplete-information card game, namely Bridge, which therefore remains a challenging problem for AI.
We think that Deep Learning (DL) cannot be the only future of AI. Many fields of Machine Learning, and of AI more generally, can interact with DL. Bridge is a great example of an application that needs more than black-box approaches. The AlphaBridge project is dedicated to designing a Bridge AI that takes up this challenge by using a hybrid framework within Artificial Intelligence.
The first part of the webinar is devoted to presenting the different aspects of bridge and the various challenges inherent to it. In the second part, we will present our work on optimizing the AI Wbridge5 developed by Yves Costel. This work is based on a recent seed methodology (T. Cazenave, J. Liu and O. Teytaud 2015, 2016) that improves the quality of Monte-Carlo simulations and that has been defined and validated in other games. The Wbridge5 version boosted with this method won the World Computer-Bridge Championship twice, in September 2016 and in August 2017. Finally, the last part covers various ongoing works on the design of a hybrid architecture entirely dedicated to bridge, using recent numeric and symbolic Machine Learning modules.
Véronique Ventos received her PhD in Artificial Intelligence (Knowledge Representation and Machine Learning) in 1997.
She has been an Associate Professor at University of Paris-Saclay, France, since 1998. Before joining the A&O group in 2015, which works at the interplay of Machine Learning and Optimization, she worked in the LaHDAK (Large-scale Heterogeneous DAta and Knowledge) group at the Laboratory of Computer Science (LRI).
She started playing bridge in 2004 and is now the 59th-ranked French woman player out of 48644 players.
In 2015, she set up the AlphaBridge project, which combines her two passions. AlphaBridge is dedicated to solving the game of bridge by defining a hybrid architecture that includes recent numeric and symbolic Machine Learning modules.
If you don’t know Bridge and want to know how to learn it: http://www.learn2playbridge.com/
If you want to play Bridge online: https://www.bridgebase.com/
Nick Slaven – Head of Technology at Stainless Games
Jeff Rollason – CEO and Co-Founder of AI Factory
Date & Time: 4-5:30pm, 28 Nov 2017
Room: Eng3.24, Engineering Building, QMUL Mile End campus (building 15 on the campus map)
Refreshments will be served before and after the panel session.
Speaker: Dr. Bruno Bouzy, Associate Professor, Laboratory of Informatics Paris Descartes (LIPADE), Université Paris Descartes, Paris, France.
Time and date: 4pm, 17th October 2017 (Tuesday)
Title: Computer Hanabi: Playing Near-Optimally or Learning by Reinforcement?
Room: Eng3.24, Engineering Building, QMUL
Registration is free and highly appreciated, as it helps us book the refreshments.
Refreshments will be served before and after the seminar (3:30pm and 5:05pm) in the hub of Bancroft Road Teaching Rooms, 10, Godward Square, Queen Mary University of London, London E1 4FZ.
Hanabi is a multi-player cooperative card game in which each player sees the cards of the other players but not their own. The team of players aims to maximize a score. After a brief presentation of the rules of the game, this talk will describe two sets of experiments. The first is an exploitation experiment (how to play as well as possible?) and the second explores some pros and cons of the reinforcement learning approach.
The first part will describe computer players that represent the state of the art in computer Hanabi. In particular, we will describe players using the hat principle and depth-one search. The hat principle is well known in recreational mathematics and gives amazing results on the game of Hanabi, yielding scores that are almost perfect.
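As an illustration of the hat principle mentioned above, here is a minimal sketch in Python. It assumes the usual modular-sum formulation from recreational mathematics: every receiving player has a hidden value (in Hanabi this could be, for example, the index of a recommended action computed by a public rule), and the hinter announces the sum of all those values modulo a fixed number. The function names `encode_hint` and `decode_own_value` are hypothetical and chosen for this sketch; the abstract does not specify the encoding used by the players in the talk.

```python
def encode_hint(values, modulus):
    """Hinter's message: sum of all receivers' hidden values, modulo `modulus`."""
    return sum(values) % modulus


def decode_own_value(hint, visible_values, modulus):
    """A receiver subtracts the values it can see to recover its own hidden value."""
    return (hint - sum(visible_values)) % modulus


# Hypothetical example: four receivers, hidden values in the range 0..7.
hidden = [3, 1, 4, 0]
modulus = 8
hint = encode_hint(hidden, modulus)

# Each receiver sees every hidden value except its own.
recovered = [
    decode_own_value(hint, hidden[:i] + hidden[i + 1:], modulus)
    for i in range(len(hidden))
]
```

A single hint therefore conveys one value to every other player at once, which is why the principle is so efficient when the number of available hints is limited.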
In contrast to these hand-crafted players, the recent trend toward deep learning led us to perform a second set of experiments to build reinforcement learners using neural networks (not necessarily deep) as function approximators. Because Hanabi is an incomplete-information game, the preliminary results with self-play and shallow neural networks show that it is a hard game to tackle with a learning approach. We will present our results and discuss the features of Hanabi that make it a good opportunity to test many reinforcement learning techniques: the number of players, the number of cards per player, the possibility of playing with open cards or not, and the problem of learning a convention. The techniques considered include TD learning or Q learning, the use of a replay memory or not, the number of layers in the network, and tuning considerations for gradient descent.
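To make two of the ingredients named above concrete, the following is a minimal Python sketch of a Q-learning backup and a replay memory. It is a simplification under stated assumptions: it uses a lookup table rather than the neural-network function approximators described in the abstract, and the names `q_update` and `ReplayMemory`, as well as the default values of `alpha` and `gamma`, are hypothetical choices for illustration only.

```python
import random
from collections import defaultdict, deque


def q_update(q, state, action, reward, next_state, next_actions,
             alpha=0.1, gamma=0.9):
    """One tabular Q-learning backup; returns the updated Q(state, action)."""
    # Value of the best action in the next state (0.0 if the state is terminal).
    best_next = max((q[(next_state, a)] for a in next_actions), default=0.0)
    target = reward + gamma * best_next
    q[(state, action)] += alpha * (target - q[(state, action)])
    return q[(state, action)]


class ReplayMemory:
    """Fixed-size store of past transitions, sampled uniformly at random."""

    def __init__(self, capacity):
        self.buffer = deque(maxlen=capacity)  # oldest transitions are dropped first

    def push(self, transition):
        self.buffer.append(transition)

    def sample(self, batch_size, rng=random):
        return rng.sample(list(self.buffer), min(batch_size, len(self.buffer)))


# Hypothetical usage: store one transition, then replay it.
q = defaultdict(float)
memory = ReplayMemory(capacity=1000)
memory.push(("s0", "play", 1.0, "s1", ["play", "discard"]))
for s, a, r, s2, acts in memory.sample(1):
    q_update(q, s, a, r, s2, acts)
```

Replaying stored transitions instead of learning only from the latest one decorrelates successive updates, which is one of the design choices the abstract lists as open to experimentation.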
Born in Paris (France), Bruno Bouzy has been Assistant Professor of Computer Science in the Department of Mathematics and Computer Science at Paris Descartes University since 1997, and a member of the Laboratory of Informatics of PAris DEscartes (LIPADE) since its creation in 2005. His academic degrees include two engineering school diplomas (Ecole Polytechnique, 1984; Ecole Nationale Supérieure des Techniques Avancées, 1986), a Ph.D. in Computer Science (1995) and a Habilitation for Research Supervision in Computer Science (2004). Between 1986 and 1991, he held a consulting engineer position with GSI, a leading software advisory company. Bruno Bouzy is the author of the Go playing program Indigo, which won three bronze medals at the Computer Olympiads: two on the 19×19 board (2004, 2006) and one on the 9×9 board (2005). These achievements resulted from using the Monte-Carlo (MC) approach for the first time in a competitive Go playing program: playing out simulations until the end of the game, and computing action values from the outcomes of the simulations. After these promising results, the Computer Go community adopted the MC approach, and the Monte-Carlo Tree Search (MCTS) framework was created in 2006 and became the standard approach for many games. In 2007, once all Go playing programs had become MCTS-based, Bruno Bouzy took a step back from Computer Go and moved on to other interesting and difficult challenges such as Multi-Agent Learning (2008-2010), the game of Amazons (2004-2010), the Voronoi game (2009-2011), Cooperative Path-Finding (2012-now), the game of Hex (2013), the Rubik’s cube (2014), the weak Schur problem (2014-2015) and the Pancake problem (2015-2016). Today, since incomplete-information games remain hard obstacles for Artificial Intelligence, Bruno Bouzy works on the game of Hanabi, a cooperative card game.
In practice, to obtain the best possible results in all these domains, Bruno Bouzy uses various methods such as Game Theory, Heuristic Search, MCTS, Neural Networks and Reinforcement Learning, as well as domain-dependent tools.