AI-Portal Artificial Intelligence for Enthusiasts
- Written by Administrator
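Let's say we have two agents acting in a world, both choosing from the same set of possible actions (strategies). In each round, each agent picks one action. Each agent then receives a reward that depends on its own action and on the action of the other agent, and the two rewards can differ. We can summarize this in the following table, where, following the usual convention, rows are Agent 1's action, columns are Agent 2's action, and each cell lists Agent 1's reward first and Agent 2's reward second:
|Agent 1 (rows) / Agent 2 (columns)||Action 1||Action 2|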
|Action 1||3, 3||5, 0|
|Action 2||0, 5||5, 5|
For example, if Agent 1 and Agent 2 both pick Action 1, their rewards are 3 and 3, respectively.
From this matrix we can see the following: if both agents choose Action 2, each receives the maximum reward of 5. If both choose Action 1, each receives 3, a good reward but not the best. If they choose differently, one agent receives the maximum reward of 5 while the other receives 0.
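As a minimal sketch, the table can be stored as a lookup from a pair of actions to a pair of rewards. The names `PAYOFFS` and `rewards` are illustrative, and the code assumes rows are Agent 1's action and columns are Agent 2's action.

```python
# Payoff table from the text.
# Key: (Agent 1's action, Agent 2's action), assuming rows belong to Agent 1.
# Value: (Agent 1's reward, Agent 2's reward).
PAYOFFS = {
    (1, 1): (3, 3),
    (1, 2): (5, 0),
    (2, 1): (0, 5),
    (2, 2): (5, 5),
}


def rewards(action_1, action_2):
    """Return (reward for Agent 1, reward for Agent 2) for one round."""
    return PAYOFFS[(action_1, action_2)]


print(rewards(1, 1))  # (3, 3)
```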
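In this situation we call the pair 5,5 a Nash equilibrium because neither agent can get a better reward by changing its strategy on its own while the other agent keeps its choice. With rows read as Agent 1's action, the pair 3,3 is also a Nash equilibrium: an agent that deviates alone sees its reward drop from 3 to 0.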
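The deviation test can be written out directly. The sketch below checks every cell of the table for pure-strategy Nash equilibria; `ACTIONS`, `PAYOFFS`, and `is_nash` are illustrative names, and rows are assumed to be Agent 1's action.

```python
ACTIONS = (1, 2)

# (Agent 1's action, Agent 2's action) -> (Agent 1's reward, Agent 2's reward)
PAYOFFS = {
    (1, 1): (3, 3),
    (1, 2): (5, 0),
    (2, 1): (0, 5),
    (2, 2): (5, 5),
}


def is_nash(a1, a2):
    """True if no agent gains strictly by changing its action alone."""
    r1, r2 = PAYOFFS[(a1, a2)]
    # Agent 1 tries every alternative while Agent 2 keeps a2.
    if any(PAYOFFS[(d, a2)][0] > r1 for d in ACTIONS):
        return False
    # Agent 2 tries every alternative while Agent 1 keeps a1.
    if any(PAYOFFS[(a1, d)][1] > r2 for d in ACTIONS):
        return False
    return True


nash = [(a1, a2) for a1 in ACTIONS for a2 in ACTIONS if is_nash(a1, a2)]
print(nash)  # [(1, 1), (2, 2)]
```

Under this reading of the table, both (Action 1, Action 1) and (Action 2, Action 2) pass the test. The second is only a weak equilibrium, since a lone deviator keeps a reward of 5 rather than losing anything.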
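A situation is Pareto efficient when no other situation makes at least one agent better off without making another agent worse off. The situation 3,3 is therefore not Pareto efficient, because moving to 5,5 improves the reward of both agents. In this table only 5,5 is Pareto efficient: every other situation leaves some agent with less than 5 while 5,5 gives no agent less.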
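Pareto efficiency can be checked mechanically by comparing every outcome against every other one. This is a small sketch using the standard definition: an outcome is dominated if some other outcome gives every agent at least as much and at least one agent strictly more. The helper names are illustrative.

```python
# (Agent 1's action, Agent 2's action) -> (Agent 1's reward, Agent 2's reward)
PAYOFFS = {
    (1, 1): (3, 3),
    (1, 2): (5, 0),
    (2, 1): (0, 5),
    (2, 2): (5, 5),
}


def dominates(p, q):
    """True if reward pair p is at least as good as q for all agents
    and strictly better for at least one."""
    return all(x >= y for x, y in zip(p, q)) and any(x > y for x, y in zip(p, q))


def is_pareto_efficient(outcome):
    """True if no other reward pair in the table dominates this outcome."""
    target = PAYOFFS[outcome]
    return not any(dominates(other, target) for other in PAYOFFS.values())


efficient = [o for o in PAYOFFS if is_pareto_efficient(o)]
print(efficient)  # [(2, 2)]
```

Only (Action 2, Action 2) survives, because its rewards 5,5 dominate each of the other three cells.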
The social welfare of a situation is the sum of the rewards that the agents receive in it. For example, the social welfare of (Action 2, Action 2) is 5 + 5 = 10, which is the largest total reward in the table.
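The sum is easy to compute for all four cells at once. The sketch below uses illustrative names and the same table encoding, (Agent 1's action, Agent 2's action) mapped to the pair of rewards.

```python
# (Agent 1's action, Agent 2's action) -> (Agent 1's reward, Agent 2's reward)
PAYOFFS = {
    (1, 1): (3, 3),
    (1, 2): (5, 0),
    (2, 1): (0, 5),
    (2, 2): (5, 5),
}

# Social welfare of each outcome: the sum of both agents' rewards.
welfare = {outcome: sum(pair) for outcome, pair in PAYOFFS.items()}

# Outcome with the highest total reward.
best = max(welfare, key=welfare.get)

print(welfare)  # {(1, 1): 6, (1, 2): 5, (2, 1): 5, (2, 2): 10}
print(best)     # (2, 2)
```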