Brian Christian & Tom Griffiths: “Algorithms to Live By” | Talks at Google

[Welcome] everybody to one more Authors at Google talk. Today with us, Brian
Christian and Tom Griffiths. Their book, “Algorithms
to Live By,” is now available in all the
fine bookstores in the area and in the US and around
the world and whatnot. Brian Christian is the author
of “The Most Human Human,” a “Wall Street Journal”
bestseller, “New York Times” editor’s choice,
and “New Yorker” favorite book of the year. His writing has appeared in “The
New Yorker,” “The Atlantic,” “Wired,” the “Wall Street
Journal,” “The Guardian,” and “The Paris Review,” as
well as in scientific journals such as “Cognitive Science.” He has been translated
into 11 languages. He lives in San Francisco. Tom Griffiths is a professor
of psychology and cognitive science at UC Berkeley,
where he directs the computational
cognitive science lab. He has published more
than 150 scientific papers on topics ranging from
cognitive psychology to cultural evolution. He has received awards from the
National Science Foundation, the Sloan Foundation,
the American Psychological Association, and the Psychonomic
Society, among others. He lives in Berkeley. So I’m not going to talk
too much about the book because we have
Peter Norvig here, who will explain
to you why this may be the– we call it the gateway
drug to computer science. Come on, Peter.
PETER NORVIG: OK. So you remember Apple
had this campaign that said, think different? And I think we
realize that we all think different in some ways. Right? So we’re all nerds or
whatever you want to call us. And sometimes when you go out
with your friends and family, you realize, oh, I think
differently than them. And I think that’s what
this book is about. Boris talks about it
as a gateway drug, and I think that’s great. That’s a good way to think
about it because we’re coming up now– there’s another
Hour of Code thing, and we’re going to celebrate. And we’re going to try to
teach everybody to code. But learning how to draw
a rectangle on the screen and figuring out that
the third argument is the color and the first two are
the x- and the y-coordinates, that’s not really it. That doesn’t change
the way you think. I mean, that’s a
good skill, to be able to draw rectangles
and circles on the screen. But really what’s important is
being able to model the world. Jeannette Wing talks about it
as computational thinking. And I think there’s a mix
between different types of thinking. There’s this
computational thinking. There’s a mathematical thinking. There’s a statistical thinking. And we all have
some combinations of all of those things. And those skills
together, I think, are more important than
the details of the API for JavaScript objects. And this book really gets at it. It doesn’t teach
you how to code, but it teaches you how
to think in that way. And it gives you examples to
ponder about your everyday life and why it might be
important to think that way and what you can do with it. So tell us all about it.
BRIAN CHRISTIAN: Thank you so much, Boris and Peter, for
the introduction, and thanks to Google for the
invitation to come speak. The talk is “Algorithms
to Live By.” I’m Brian. This is Tom, in case you were
curious which of us is which. We sometimes get that question
if we’ve forgotten to introduce ourselves at the top. And the book opens
with an example that I think will be acutely,
perhaps uncomfortably, familiar to many of us
here in the Bay Area, which is looking for housing. So in a typical consumer
situation, the way you make a choice is you
consider a pool of options. You think hard about which
one you like the best. And then you go with that. But in a sufficiently
crowded real estate market, in a sufficiently competitive
market– which the Bay Area certainly
is– you don’t have the luxury of making the
decision in that way. Rather, the decision
takes the form of evaluating a sequence
of options one at a time, going to a number of
different open houses, and at each point
in time, you must make an irrevocable commitment. You either take the place that
you’re looking at right there on the spot, never knowing what
else might have been out there, or you walk away,
never to return. You have almost no chance
of going back and getting the place. This is certainly a much
more fraught setting for making a decision because
here the critical question that you have to ask yourself
is not which option to pick but how many options
to even consider. Intuitively, we have
this idea that you want to look before you leap. You don’t want to make
a premature choice. You want to get a sense
of what’s out there. But you don’t want
to hold out too long for some kind of perfect
thing that doesn’t exist and let the best
options pass you by. So our intuition
tells us that we have to strike some
kind of balance between getting a feel
for what’s out there and setting a standard and
knowing a good thing when we see it and being
ready to commit. And the story that we get
from mathematics and computer science is that this
notion of balance is, in fact, precisely correct. But our intuition
alone doesn’t tell us what that balance should be. The answer is 37%. If you want the very
best odds of finding the very best
apartment, consider exactly 37% of the pool of
options in front of you. Or alternately, you can think
of it in terms of time– 37% of your time
just calibrating. Leave your checkbook at home. If you’ve given yourself a month
for the search, in this case, that would be 11 days. After that point, be prepared
to immediately commit to the first thing you see
that’s better than what you saw in that first 37%. This is not only the
intuitively-satisfying compromise between
looking and leaping, this is the provably optimal
solution to this problem. So, for example,
if you’re committed to living in one of the painted
ladies here in San Francisco, and they have their open
houses on successive weekends, the algorithm tells you
to look at the first two, and no matter how tempting
they seem, hold tight. And then, starting
with the third one, immediately leap
for the first one better than what you
saw at the first two. Now, more broadly,
this is what’s called an optimal
stopping problem. And this structure
of encountering a series of options
and being forced to make a commitment one way or
another– either you’re all in or you walk away–
some people have argued that this is a structure
that describes not only things like the apartment hunt
or real estate in general. It also, many people have
argued, describes dating. You’re in a relationship,
and you have this decision of when to commit. Have you met enough people
to have a sense of who your best match really is? And so you can do some
back-of-the-envelope math and say things like, OK, the
average expected lifespan of an American is 79. 37% of that gives me 29. And this roughly
divides my romantic life into dating for fun
versus dating seriously to really evaluate for a mate. Of course, as we will see, it
all depends on the assumptions that you’re willing
to make about love.
TOM GRIFFITHS: So
as you’ve seen, simple algorithms can solve
some of the problems which we actually normally think about
not as problems for computers but as problems for people. So there are a set
of problems which we have to solve just as
a consequence of the fact that our lives are lived in
finite space and finite time– so having to do things like try
and figure out how to organize our house or our closet
or our office in a way to make it most efficient
or trying to figure out how to schedule our
time so that we can do the most possible things. And we normally
think about those as being fundamentally
human problems. But really, the argument
that we make in the book is that they’re not. They have analogs
to the problems that computers have to solve. So, you know, your
computer has to figure out how to manage its space–
its space on the hard disk and its space in memory. And it also has
to figure out how to manage its time– what
it’s going to do next, what program it’s going to run. And as a consequence,
computer scientists have put a lot of thought into
coming up with good algorithms for solving those problems. So what we do in
the book is explore how taking this perspective
gives us insight into the problems that
human beings have to solve, in some cases offering us
nice, elegant, simple solutions to these problems,
like the 37% rule, in other cases giving us new
ways of thinking about how those problems might be
described in mathematical terms and given, perhaps, a kind
of quantitative analysis. So we consider a range of
different problems– so the optimal stopping problem,
which you’ve heard a little bit about, as well as a
variety of other contexts in which thinking
algorithmically actually gives us a new insight into
a human sort of problem. And what we’re going to do
today is talk about just a few of these problems. In the book, we
also talk about some of the more general
principles which go into designing
good algorithms or thinking about the
kinds of underlying ideas that are useful
when we’re trying to engage with the world
in this computational way. So the three problems that
we’re going to focus on today are optimal stopping, the
explore or exploit trade-off, and caching. And optimal stopping
you’ve already heard a little bit about. The 37% rule is
just one instance of a strategy which is useful
for solving an optimal stopping problem. It’s the solution
to a problem which is known in the mathematical
literature as the secretary problem. So the canonical set-up
for this is that you’re trying to hire a secretary. You have a pool of applicants. Each applicant comes
in and is interviewed. You don’t have a way of
evaluating the applicants except against one another. So you can only make a
judgement of how good they are based on the applicants
that you’ve seen so far. And for each person
who comes in, you have to make a
decision immediately as to whether you hire that
person or whether you dismiss them, in which case
you’ll never see them again. So this secretary
problem was first presented to the
public in the 1960s in a column that was
written by Martin Gardner. But in the book, we trace the
history of this problem back, and it turns out that
romance was at its core. So what we actually found by
doing some archival research is that the person who claims
to have originated the problem is a mathematician
called Merrill flood. And the story that
he tells about it is that his daughter,
who had just graduated from high school,
was engaging in a relationship which he and his wife
didn’t really approve of with a much older man. And so he wanted to
somehow convince her that this was a bad idea. So being a
mathematician, the way that he approached
this problem was she turned out to be taking the
minutes at a mathematics conference that Flood was
scheduled to present at. And so he went up, and
he presented this problem of a woman who’s entertaining
a series of suitors and faced with the
challenge of deciding which of those suitors’
proposals she should accept. And he didn’t actually
know the solution to the problem at that point. But he was pretty sure that
the number was larger than one. And so he was hoping that
his daughter would sort of take the message as she
was writing it down, and sort of think about how it
applied to her own situation. So as we’ve seen, the
solution to this problem turns out to be 37%. But as Brian was
saying, a lot of that depends on the
assumptions that you make. And there’s actually been
a history of mathematicians seeking love that provide some
cautionary tales and perhaps some insights into variants
on this problem, which are things that are worth
paying attention to. So one kind of
situation that you can encounter when trying
to pursue this approach is rejection. And we actually found a
story of Michael Trick, who’s an operations researcher
at Carnegie Mellon University. And he told the story of
applying the secretary problem in very much the same way
that Brian was describing– calculating the
period over which he thought he’d be searching,
working out what 37% was. And it happened that the age
at which he should switch from having fun to being serious
was precisely his current age. And so he went and
proposed to his girlfriend, who turned him down. So Trick ran into one
of the assumptions that are being made here,
which is that when you make an offer
to somebody, they should be willing to accept it. And only under
those circumstances is the 37% rule valid. If somebody is potentially
able to reject your offer, then it changes the strategy
that you should follow. And in particular, it means
that the period that you spend looking should be reduced. So, for example, if you have a
50% chance of being rejected, then you should look
at the first 25% of your potential
candidates, and then be willing to make an offer
to the first person who’s better than anybody you
saw in that first 25%. Another variant on
the secretary problem is what’s called recall. And so this is having
the opportunity to go back to somebody
who you passed over. So you can imagine your
candidates come in, but rather than dismissing them meaning
that you’ll never see them again, dismissing
them just decreases the chance that you’ll be
able to go back to them. Or in the dating
setting, basically, breaking up with somebody
means that maybe there’s a chance that they’d
take you back later on. So there’s actually
a mathematician who experienced exactly
this phenomenon. This is the astronomer
Johannes Kepler, who after his first wife died went
through an extended period of courting various
women, trying to sort of evaluate and come up
with the person who he thought was going to be the very
best person for him. And so over an
extended period, he ended up interacting
with 11 women. And having sort of explored
those possibilities, he decided, getting to
the end of that process, that number five, who
he’d previously dismissed, was actually the
best one for him. So he went back, and he
made an offer to her. And it turned out she hadn’t
accepted any other proposals in the meantime. And so luckily, they
were able to be married and had a happy
marriage together. So this possibility– that
you can actually go back to somebody, and they
can still say yes– changes where you should
set your threshold again. So in this case, it
makes it something where you should spend a
little longer looking around. So, for example, if
there’s a 50% chance that somebody who
you go back to is going to be willing to
accept your offer anyway, then you should look
at the first 61%, and then use that to set the
standard which you’ll then use for evaluating the remainder. And then if you get to the
very end of this period and haven’t found
anybody, then you go back and make an
offer to the person who was the very best, which
you now know because you’ve explored all of the options. PETER NORVIG: Tom? So if you had prior
knowledge of the distribution that you’re drawing
from, that should also [INAUDIBLE] the period, right? TOM GRIFFITHS: That’s right. So in this case, we’re
assuming that the only way that you can evaluate people is
relative to one another, right? And so that’s
something which then leads to this kind
of strategy, where you have to spend
some time building up an impression of what the
distribution looks like. So basically, pretty
much any variant on this problem
you can imagine has been explored by mathematicians
in the last 50 years. The case that you’re talking
about, where you have what’s called full information–
you actually know what the
distribution is like– is one where the overall strategy
looks quite different. So in that case, what you
should do is set a threshold. And then as you go
through the pool and your sort of remaining
candidates become sparser, that threshold becomes
lower and lower. Some of us might
have experienced this in the context of other
kinds of romantic interactions. But it certainly makes
dire predictions, perhaps, as you age and people
start getting married, that you should be willing
to accept, perhaps, a slightly– set a
slightly lower threshold in terms of the way that
you approach the problem. And there are also
variants which are what are called partial
information versions, where you come in not knowing
what the distribution is but having some information about
what that distribution is. And those turn out to
be some of the sort of most interesting and
mathematically intricate cases. So these examples
illustrate some of the ways in
which the secretary problem can be made
more complicated and can accommodate some of
the other kinds of situations that we might encounter in
realistic romantic scenarios. But this is by no
means the limit of optimal stopping in
terms of its implications for human lives. So basically, any
situation where you have to make a decision
about whether to continue to gather information or
to continue to consider opportunities or to
act on the information that you have so
far is going to be one which is going to fit
this same kind of schema of optimal stopping. So another example is deciding
when to sell an asset. So be it your house
or your company, you have to make a
decision in a situation where you might receive
a series of offers. But in this case,
each of those offers is going to cost
you money, right? You’re going to
be waiting around for somebody to come along
and make another offer. And your goal is to
maximize the amount of money that you get out
of those offers. So this can be formulated as
an optimal stopping problem. You have to decide
for each offer whether you’re going
to take that offer. And there’s a
simple formula that describes the strategy
that you should take for solving this problem. So basically, the strategy
takes the general form of a threshold. You should set a
particular value based on your expectations
of the distribution of offers that you’re going to receive. And then the first offer that
exceeds that value should be the one that you accept. This is more closely
analogous to the situation that you were talking about. So, for example, if you have,
say, a uniform distribution between getting an
offer of $400,000 and an offer of $500,000,
then as a function of the cost per offer, we can
actually work out what this threshold looks like. And so it gives you a very
simple kind of approach that you can take in the context
of trying to decide whether you should evaluate an offer. And it also says that you
should never look back, right? So if you turned down
an offer in the past, that was because the offer
was below your threshold. And as a consequence,
you shouldn’t regret having given up
on that opportunity. You should just
be waiting around for the next offer that
exceeds your threshold. Another example of an
optimal stopping problem, which I think many of
us have encountered, is the problem of figuring out
how to find a parking spot. So in this case, the
kind of ideal scenario that the mathematicians
imagine– although, again, there are lots of
variants– is one where you’re driving
towards a destination, and you kind of have a series
of possibly available parking spots that are along
the side of the road. And your goal is to minimize
the distance that you have to end up walking. So it turns out that the
solution to this problem depends on what’s
called the occupancy rate, the proportion
of those parking spots which are occupied. And so for each occupancy
rate, what we can do is identify at what
point you should switch from driving to
being willing to take the next available parking spot. And so we calculate for you
a nice convenient table. You can cut this out,
stick it on your dashboard. But basically, you can get
some interesting conclusions from this, such as if
10% of the parking spots are free– so a 90%
occupancy rate– then you should start looking
for a parking spot seven spaces away
from your destination. Whereas if 1% of those
parking spots are free, then it’s more like 70. And maybe there’s a
scenario down here which fits to some
parts of San Francisco. So another famous problem which
has been analyzed in this way is one which, unfortunately,
ruins the plot of a lot of heist movies. So this is the
problem of a burglar who has to make a
decision about when to give up a life of crime. So there’s a scene that happens
in a lot of heist movies where the team has to sort
of coax the old thief out of retirement. The thief is kind
of going, well, you know, I kind of
like living in my castle and trying to figure out exactly
what they’re going to do. But to sort of spoil all
of those plots, in fact, there’s an exact
solution, and the thief need only crunch the numbers to
work out what should be done. And if you assume that you have
a thief who for each robbery is going to get, on average,
about the same amount of money and has some probability
that they succeed in executing those robberies,
and if they get caught, they lose everything. So the goal is to maximize
the expected amount of money that you end up with,
then the optimal stopping rule is to stop after
a number of robberies equal to the probability
of success divided by the probability of failure. So if you are a ham-fisted
thief who succeeds about half the time and fails
about half the time, you could pull off one robbery. And then you should just quit
if you got away with it and you’ve been lucky. But if you’re a skilled thief
who succeeds 90% of the time and fails 10% of the time, you
can pull off nine robberies, and then that’s the point at
which you should call it quits and go and live in
the castle and not listen to any of these
guys who come calling. So these sorts of scenarios
might seem a little bit artificial, right? They’re all cases where
you’re kind of forced into a situation where you have
to make a decision about when to stop doing something. But I think there’s
a deeper point here, which is that while we normally
think about decision-making as being something
where we’ve got all the information in front
of us, and it’s just a matter of choosing what
we’re going to pursue, in fact, in most
human decisions, we’re in a scenario which
is much more like one of these optimal
stopping problems, where we have to be
gathering information. And one of the decisions
that we have to make is whether we’ve got
enough information to act. And so I think there’s a
deeper point here about the way that we think about the
structure of human decisions, that often we’re engaging in
exactly this kind of process even though we might
not realize it.
BRIAN CHRISTIAN: The second class of problems that we
consider in the book are ones that we describe as
explore/exploit trade-offs. And this encompasses
a wide range of problems where we get to
make a similar kind of decision over and over and over
in an iterated fashion. And typically,
that takes the form of a tension between doing our
favorite things– the things we know and love– or
trying new things. And this type of a
structure appears in a number of different
domains and in everyday life, restaurants being
the obvious case. Do you go to your
favorite place? Or do you try the new
place that just opened up? It happens in music:
do you listen to your favorite album? Or do you put on the radio? It also describes
our social lives. How much time do you spend with
your close circle of friends, your spouse, your family,
your best friends, versus going out to
lunch with that colleague that you just met and you think
you have something in common and want to kind of foster
a new acquaintanceship? This same trade-off
also occurs in a number of different places in more
societal ways– in business and also in medicine. So in business, this
comes up in the context of ads, which I
hear is something that you guys know a little
bit about here at Google. So when you’re deciding what
ad to run next to a keyword, for example, you’ve got
this tension between there is some ad that historically
has the best track record of getting the clicks. But there’s also this
ever-changing dynamic pool of new ads that are
entering the system that might be worth trying. They might be better. They probably aren’t,
but they could be. In medicine, this
structure describes the way that clinical trials work,
where for any condition, there is some known best
treatment with some known chance of success and
some known drawbacks. And then there’s
a series– there’s kind of a pipeline of these
experimental treatments that, again, might be
better, might be worse. And so we have to trade off
between exploring– that is, spending our energy
gathering new information– and exploiting, which is
leveraging the information that we’ve gathered so far
to get a known good outcome. The canonical
explore/exploit problem that appears in the
computer science literature is known as the
multi-armed bandit problem. And this colorful
name– I’m sure this is familiar to
some of you in the room. But the name comes from the
moniker of the one-armed bandit for a slot machine. So a multi-armed bandit you
can think of as just a roomful of different slot machines. So the basic setup
of the formal problem is this– you walk
into a casino. There’s all sorts of different
slot machines, each of which pays off with some probability. But every machine is different. Some of them are more
lucrative than others. But there’s no way for you
to know that until you just start pulling the
levers and seeing which ones seem more promising. So again, here’s
this case where you have to trade off
between gathering information and leveraging
the information that you have. And so there’s this
question, which vexed an entire generation
of mathematicians, of, OK. Let’s say you walk into the
casino, and you have 100 pulls. You’re there for an afternoon. You have enough
time for 100 pulls. What strategy is going to give
you the highest expected payout before you leave the casino? In fact, for much
of the 20th century, this was considered unsolvable. And during World War II, Allied
mathematicians in Britain joked about dropping the
multi-armed bandit problem over Germany as the
ultimate instrument of intellectual
sabotage to just waste the brainpower of the
German scientists. And I think one of the
simplest explanations of the way in which this problem
can be very tricky to think about is the following choice. So let’s say you’ve played
one machine 15 times. And nine times, it paid out. Six times it did not. Another machine
you’ve played twice. Once it paid out. Once it didn’t. Now, if we just want to very
straightforwardly compute the expected value of
each of these machines, the nine-and-six machine’s
got a payout rate of 60%. The one-and-one machine
has a payout rate of 50%. And so there are these two kind
of competing intuitions here. One is, well, you
should obviously just do the thing with
the better expected value. The other is, well, there’s
a sense in which we just don’t know enough about
the second machine to walk away from it forever. Certainly, it must be worth one
more pull or two more pulls. How do you decide what
that threshold is? And it turns out this is,
in a way, a trick question because it all depends on
something that we haven’t given you in this description
of the problem yet, which is how long you
plan to be in the casino. So this is a concept
that sometimes gets referred to as the horizon,
which we refer to as the interval. And this concept has,
just speaking personally, given me a bit of
an axe to grind with one of my favorite films
from my own childhood, which is the inspirational 1980s Robin
Williams movie, “Dead Poets Society.” It’s one of these
really feel-good movies, and he plays this
inspiring poetry teacher who says to his students
in this rousing monologue, “Seize the day, boys. Make your lives extraordinary.” And going back through the
lens of the multi-armed bandit problem, I can’t help feeling
that Robin Williams is actually giving two conflicting
pieces of advice here. If we’re just trying
to seize the day, we probably want to pull
that nine-six because it’s got the higher expected value. But if we want to make
our life extraordinary, then we should
certainly see if there isn’t some value in trying
these new things because we can always go back. In standard American English,
we have all of these idioms like “eat, drink, and be
merry, for tomorrow we die.” But it feels that we’re missing
the idioms on the explore side of the equation, which are
things like, life is long, so learn a new language and
reach out to that new colleague because who knows what could
blossom over many years’ time. We’re still honing
the messaging, but it does feel like there’s
a gap in the culture that can be filled here. So when you’re working
with a finite interval, the solution to the
multi-armed bandit problem comes from a method
described by Richard Bellman, dynamic programming,
where you basically work backwards and are
able to compute the expected value of every
pull given all of the possible pulls that you could make as a
result of whether that succeeds or fails. And you can actually work
out the expected value all the way back to walking in
the door of the casino, what should you do? And this provides
an exact solution to the multi-armed
bandit problem, but there’s a catch, which is
that it requires that you know exactly how long you’re
going to be there and exactly how many
machines there are. And it also requires doing a
lot of computation up front. But I think the
critical thing is that we’re able to look
at these solutions, these exact solutions, and get
some broader principles out of it. So, for example, the
value of exploration is greatest the minute you
walk through the door for two reasons. The first is that if you think
about it in the restaurant analogy, you just
moved to Mountain View. You go out to eat that night. The first place you
try is guaranteed to be the best
restaurant you’ve ever experienced in Mountain View. The next night, you
try a different place. It has a 50% chance
of being the best restaurant you’ve ever
seen in Mountain View, and so on and so forth. So the likelihood
that something new is better than what
we already know about goes down as we gain experience. The second reason
that exploring is more valuable at the
beginning of an interval is that when we
make that discovery, we have more chances to go back. So discovering a really
amazing restaurant on your last night in town is
actually a little bit tragic. It would have been great
to find that sooner. And so this gives us
this general intuition that as we perceive ourselves
to be on some interval of time, we should kind of
front-load our exploration and weight our
exploitation to the end, when we both have the most
experience with what to exploit and the least time remaining
to discover and enjoy something new, even if we did find it. This is significant, I
think, because it gives us a new way of thinking about
the arc of a human lifespan. And so ideas from the
explore/exploit trade-off are now influencing
psychologists and changing the way we think about
both infancy and old age. So to demonstrate
infancy, we have a picture of a baby eating a power cord. And I think this
demonstrates what a lot of us kind of
culturally intuitively think of as the
irrationality of babies. They’re totally– have the
attention span of a goldfish. They put everything
in their mouth. They’re distracted
really easily. They’re really bad at just
generally doing things. We give the example in the
book– like a baby gazelle is expected to be
able to run away from a wolf within the
first day of being alive. But, you know, it
takes us 18 years before we’re allowed
to get behind the wheel of a car, that kind of thing. And the psychologist
Alison Gopnik uses ideas from the explore or
exploit trade-off as a way of saying, well,
maybe this extended period of dependency is actually
optimal in some sense because if you’re in the
casino, for the first 18 years that you’re in the casino,
someone else is buying your food and paying your rent. And so you don’t need to be
getting those early payouts to buy your lunch. And so you can really use that
period of time to explore, which is exactly what
you should be doing at the beginning of your life. There’s a sense in
which just putting every item in the
house in your mouth at least just once
sort of resembles walking into the casino and
just pulling all of the levers. Similarly, the idea
of exploitation is changing the way we
think about getting older. So here we have a gentleman
who I like to imagine as enjoying the same
lunchtime restaurant that he’s been to
hundreds of times. And he knows exactly
what he’s going to get and exactly what he
likes, and it’s great. There’s a lot of
psychological data that says that as
we go through life, older folks have a smaller
circle of social connections. They spend their time
with fewer people. And there’s one interpretation
of this that just says, well, they’re lonely, or they’re
detached or disinterested, or it’s just kind
of sad to get older. But thinking about it from the
perspective of exploitation gives a totally different story. And this comes up in the work of
the Stanford psychologist Laura Carstensen, who studies aging in
an attempt to sort of overturn some of the prejudices we have. And so, for example,
one of the intuitions you get from the idea of the
explore/exploit trade-off is that towards the
end of your life, you really should
be spending more of your time doing the things
you already know and love both because you’re unlikely
to make a discovery that’s better than the things you
already really care about and also because
there’s less time to enjoy it should you do that. And so it just makes more sense
to spend more of your energy on the things that you
already know and love. And so as a result,
what I think is actually a very encouraging story here
is that as you spend more time in the casino, your average
payouts per unit of time should go up. So there’s a sense in
which we should actually expect to get steadily
happier as we go through life. We are less disappointed
and less stressed out. And her research supports this,
which I think is really lovely. Now, there are many cases
where we don’t necessarily know where we are on
the interval of time, or it doesn’t
necessarily make sense to think of there being
some finite interval. So maybe you move
to Mountain View, and you don’t know
how long you’re going to live in Mountain View. Or if you’re a
company, you imagine yourself being interested in
being around indefinitely. But nonetheless,
there’s still a sense in which you care about the
present more than the future. This framing of the problem
led to a different series of breakthroughs, starting
with an Oxford mathematician named John Gittins,
who was hired by the pharmaceutical
company Unilever to tell them, basically,
how much of their money to invest in R&D. And
he almost accidentally made this enormous
breakthrough in the problem by thinking of it
not as there being some finite interval of
time but as there being some indefinite future that
is geometrically discounted. So I don’t know
how long I’m going to be living in the Bay Area. But a really good meal
next week is maybe only 90% as valuable as a really
good meal this week. Or making X dollars
next quarter is only 90% as valuable as making
those dollars this quarter. And it turns out
that, again, you can get a very precise
solution to this problem. So he explored it in
this business context. But it’s also, I think,
very interesting that it was a pharmaceutical context,
because this also gives us a way of thinking
differently about how a clinical trial should be run. The basic idea behind
Gittins’s breakthrough here, which is called
the Gittins index, is he imagines that– the
word we use is a bribe. So if you think about– there’s
a game show that is on TV called “Deal or No
Deal,” where you have a briefcase that
has somewhere between one and a million dollars in it. And someone calls you
on the phone and says, I will pay you $10,000 not
to open that briefcase. What do you do? This is the basic intuition
behind the Gittins index. So Gittins says,
for every machine that we’ve tried and have
incomplete knowledge of– or maybe we have no
knowledge of whatsoever– there’s some machine with
a guaranteed payout that’s so good, we’ll never try
that machine ever again. Maybe the nine-six versus the
one-one is more of a toss-up. But if it was nine-zero
and the one-one, then there’s a sense
in which maybe it’s just never worth pulling the
one-one machine ever again. And so Gittins comes
up with what he thinks is a nice approximation,
which is just always play the machine with
the highest bribe price. And to his own astonishment,
as well as the field’s, this turns out to be, in fact, not
merely a good approximation but the solution. So this is another case where,
like with the parking problem, we present this
table in the book, and you can cut it out
and take it home with you. And it provides
values for situations where you’re trying to
weigh the value between two different options. And so going back to our
two slot machines, the nine-and-six machine has a
Gittins index of 0.6300. But the one-and-one machine
has a Gittins index of 0.6346. So case closed. Pull the one-one
machine one more time. What I think is kind of
philosophically significant about this is the
zero-zero square has a value of 70.29%, which
means that you should consider something you’ve never tried as
being just as good as something that you know works
70% of the time, even though it only has
an expected value of 50%. And you can see if you follow
the diagonal down to the right, it goes from 70% to
63.46% to 60.10%, 58.09%. And it does indeed converge
on 0.5 as you gain experience. But there’s this boost applied
for not having that experience. So we tongue-in-cheek suggest
that you just print this out and just use it to
decide where to eat. But there are
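
[Illustration added to the transcript; not spoken in the talk.] The table lookup described above can be sketched in a few lines of Python. This is a minimal sketch under stated assumptions: the dictionary holds only the three index values quoted in the talk, written as probabilities, the names `GITTINS_INDEX`, `expected_payoff`, and `pick_machine` are hypothetical, and the expected payoff assumes a uniform prior over each machine's payout rate, which matches the 50% figure given for an untried machine.

```python
# Gittins index values quoted in the talk, keyed by (wins, losses).
# Only the entries that were read out are included; a full table would have more.
GITTINS_INDEX = {
    (0, 0): 0.7029,
    (1, 1): 0.6346,
    (9, 6): 0.6300,
}


def expected_payoff(wins, losses):
    """Expected payout rate, assuming a uniform prior (rule of succession)."""
    return (wins + 1) / (wins + losses + 2)


def pick_machine(records):
    """Return the (wins, losses) record with the highest Gittins index."""
    return max(records, key=lambda record: GITTINS_INDEX[record])
```

Under these assumptions, `pick_machine([(9, 6), (1, 1)])` selects the one-and-one machine, even though `expected_payoff(9, 6)` is 10/17 (about 0.59) against 0.5 for the one-and-one machine. The gap between the two rankings is the boost for inexperience mentioned in the talk.
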
sometimes some problems that come up with this. One is that we don’t always
geometrically discount our payoffs. The other is that
actually computing these values on the fly is kind
of computationally intensive. And so there’s a third way of
thinking about the problem that has come in the wake
of Gittins’s work, and that is the idea
of minimizing regret. So we give a lot of
examples that touch on Google’s work, but the
best illustration of this actually comes from Amazon. So Jeff Bezos talks about being
in this really lucrative hedge fund position and deciding
whether to give up his cushy job to start
an online bookstore. And he approaches it from what
he calls a regret minimization framework. “I knew looking back
I wouldn’t regret it.” In the context of the
multi-armed bandit problem, you can formulate
regret as every time you tried something that wasn’t
the best thing in hindsight. And so you can ask yourself,
how does my regret– what does that look like as I proceed
through my time in the casino? And we have good
news and bad news. We’ll start with the bad news. The bad news is that you will
never stop getting more regrets as you go through life. The good news is that the rate
at which you add new regrets goes down over time. Specifically, if you were
following an optimal algorithm, the rate at which
you add new regrets is logarithmic with
respect to time. And this has led to a series
of breakthroughs by computer scientists looking for simpler
solutions than the Gittins index that still have this
optimality of minimal regret. One of our favorites, which
I think is the most thematic, is called upper
confidence bound. And this says that for every
slot machine– you know, you’ve got some error
bars around what you think the payoff might be. So the expected value would be
in the middle of that range. But there’s some error bars
on either side of that. Upper confidence
bound says, simply, always do the thing with the
highest top of the range. Don’t care about the
actual expected value, and don’t care about
the worst case scenario. Just always do the thing with
the highest top of the range. And I think that’s sort
of a lovely, lyrical note that the math brings us to,
which is that optimism is the best prevention for regret. TOM GRIFFITHS: So
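
[Illustration added to the transcript; not spoken in the talk.] The "highest top of the range" rule can be sketched as follows. The talk does not give a formula for the error bars, so this sketch assumes the UCB1 bonus term, sqrt(2 ln t / n), which is one common choice from the bandit literature. The function names, the `true_probs` parameter, and the way regret is tallied (the gap between the best machine's true payout rate and the chosen machine's) are all assumptions made for illustration.

```python
import math
import random


def ucb_score(wins, pulls, t):
    """Top of the error bar for one machine after t total rounds (UCB1 form)."""
    if pulls == 0:
        return math.inf  # an untried machine is treated as maximally promising
    return wins / pulls + math.sqrt(2 * math.log(t) / pulls)


def run_ucb(true_probs, rounds, seed=0):
    """Play `rounds` pulls, always choosing the machine with the highest score.

    Returns the number of pulls per machine and the accumulated expected regret.
    """
    rng = random.Random(seed)
    wins = [0] * len(true_probs)
    pulls = [0] * len(true_probs)
    best = max(true_probs)
    regret = 0.0
    for t in range(1, rounds + 1):
        arm = max(range(len(true_probs)),
                  key=lambda i: ucb_score(wins[i], pulls[i], t))
        pulls[arm] += 1
        if rng.random() < true_probs[arm]:
            wins[arm] += 1
        regret += best - true_probs[arm]  # zero whenever the best machine is chosen
    return pulls, regret
```

A call such as `run_ucb([0.4, 0.6], rounds=1000)` returns how often each machine was played and the regret accumulated along the way. Because the bonus term shrinks as a machine is pulled more often, a machine that keeps disappointing is tried less and less frequently, which is the behavior the talk describes as regret growing ever more slowly.
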
our third example begins in a different place,
which is in the closet. So I think all of us have
encountered a problem of an overflowing closet. Things need to be
organized, but you also need to make a
decision about what you’re going to get rid of. And in order to
solve this problem, we’d like to be able
to turn to experts. Fortunately, there are experts
on exactly this kind of thing. So we could consult one of
these– Martha Stewart, who says to ask yourself a
series of questions– how long have I had it? Does it still function? Is it a duplicate of
something I already own? And when was the last
time I wore it or used it? And then based on the
answers to these questions, you can make a decision
about whether you should keep that thing or not,
give it away to charity, and, as a consequence, end up
with a more organized closet. So there are a couple of
interesting observations here. The first is that here there
are in fact, multiple questions that you should be answering. And the answers
to these questions could be quite different. And the other is
that, in fact, there’s another group of experts who
have thought about exactly these problems and come up
with slightly different advice. In particular, they discovered
that one of these questions is, in fact, far better
than any of the others. So this other group of experts
don’t think about closets. They think about the
memory of computers. This is the picture of
the Atlas computer, which was a computer which
was built at Manchester University in the 1960s. And Atlas had an
interesting structure, where it had two kinds of memory. It had a drum which could
store information in a way where it was very
slow to access. And then it also
had a set of sort of magnets which could be
used to store information in a format which was
relatively fast to access. And when they first
built the machine, the way that they were using it
was to read off the information that would be needed for a
computation from the drum, and then store it in
the magnetic memory, and then do the operations, and
then write it back to the drum, and then take the next
part of the computation and read off all of the relevant
information from the drum, and then do the operations,
then write it back to the drum. But a mathematician who
was working on Atlas named Maurice Wilkes realized
that there was a better way to solve this problem. He realized that they
could make the whole system work much faster if
they didn’t always take all of the information
out of the fast memory and put it back into
the slow memory, but rather they kept around
the pieces of information which they thought
they were going to need to use again in the future. So the reason why
this speeds things up is that then you
don’t have to spend the extra time reading
those things back off the slow memory. And as a consequence, the
computer runs much faster. So this is an idea which
computer scientists now recognize as caching. So it’s the idea of keeping
the information which you’re most likely to need in the
future in the part of memory which is most easily and
most rapidly accessed. But it brings with
it another kind of algorithmic problem, which
is the problem of figuring out exactly what those
items that you’re likely to need in the future
are going to be or, to put it another way, what it is
that you should throw away. And this is a
problem that’s called the problem of cache eviction. So cache eviction
is something which requires us coming up
with good algorithms for deciding what
we’re not going to need again in the future. And the person who really
made the first breakthrough in thinking about this is
this man, Laszlo Belady, who was working at IBM. So before he worked at IBM,
Belady had grown up in Hungary, and then fled during
the revolution there with only a bag containing
one change of underpants and his thesis paper. And then he ended up
having to then emigrate from Germany, which is
where he moved to, again with very minimal equipment. He just had $100 and his wife. So by the time he’d
reached IBM in the 1960s, he’d built up a
significant amount of experience in deciding
what it was that it made sense to leave behind. So Belady described
the optimal algorithm for solving the problem
of deciding what you should evict from your cache. And this optimal algorithm
is essentially clairvoyance. What you should do is evict
from the cache that piece of information which
you are going to need the furthest into the future. So as long as you can
see into the future as far as you need to go in
order to make that decision, then you can solve this problem. You can sort of do the
best possible solution to this problem that
you can imagine. Unfortunately,
when engineers have tried to implement
this algorithm, they’ve run into problems. And so for mere mortals, we need
to have some different kinds of algorithms. And Belady actually evaluated
three different kinds of algorithms– one where
you just randomly evict items from the cache, one where the
things which were first entered into the cache are the
ones which first leave it, and one where you evict
those things which are least recently used. That is, those items which have
been used the furthest distance into the past are the ones
which leave the cache first. And doing an
empirical evaluation of these different
schemes, he discovered that there was one clear winner,
which is the least recently used algorithm. So basically, the idea
is that the information that you used least
recently is least likely to be the
information which you’re going to need again in the
future as a consequence of a principle that
computer scientists call temporal locality– basically
that if you just touched a piece of information, you’re
going to be likely to need that information again
in the near future because there’s a kind
of correlation over time in the pieces of information
which an algorithm might need to access. So taking this insight, you
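
[Illustration added to the transcript; not spoken in the talk.] The eviction policies compared above can be sketched by counting cache misses on a sequence of requests. This is a minimal sketch, not a reconstruction of Belady's actual experiments: the function names and the sample `TRACE` are hypothetical, and the clairvoyant policy is given the whole request sequence in advance, which a real system would not have.

```python
import random
from collections import OrderedDict

TRACE = [1, 2, 3, 1, 2, 4, 1, 2, 5, 1, 2, 3]  # made-up requests with temporal locality


def misses_lru(trace, capacity):
    cache = OrderedDict()  # keys ordered from least to most recently used
    misses = 0
    for item in trace:
        if item in cache:
            cache.move_to_end(item)  # mark as most recently used
        else:
            misses += 1
            if len(cache) >= capacity:
                cache.popitem(last=False)  # evict the least recently used item
            cache[item] = True
    return misses


def misses_fifo(trace, capacity):
    cache = OrderedDict()  # keys ordered by arrival; hits do not reorder
    misses = 0
    for item in trace:
        if item not in cache:
            misses += 1
            if len(cache) >= capacity:
                cache.popitem(last=False)  # evict the earliest arrival
            cache[item] = True
    return misses


def misses_random(trace, capacity, seed=0):
    rng = random.Random(seed)
    cache = []
    misses = 0
    for item in trace:
        if item not in cache:
            misses += 1
            if len(cache) >= capacity:
                cache.pop(rng.randrange(len(cache)))  # evict an arbitrary item
            cache.append(item)
    return misses


def misses_clairvoyant(trace, capacity):
    cache = set()
    misses = 0
    for i, item in enumerate(trace):
        if item in cache:
            continue
        misses += 1
        if len(cache) >= capacity:
            future = trace[i + 1:]

            def next_use(x):
                # distance to the next request for x, or "never" if none remains
                return future.index(x) if x in future else len(future)

            cache.remove(max(cache, key=next_use))  # evict the one needed last
        cache.add(item)
    return misses
```

On the sample trace with room for three items, least recently used misses 6 times, first-in-first-out misses 8 times, and the clairvoyant policy misses 6 times. A single hand-made trace proves nothing general; it only shows how recency can track what the clairvoyant policy would do when requests repeat over short spans.
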
can build caching systems which work very
efficiently in a variety of different situations. So nowadays, caches are
used all over the place. So if we look on
computers, we find that you’ll see multiple chips
that are dedicated to caching. There are caches that are
built into hard disks. There are other
kinds of caches which are used in servers
for delivering websites to people as quickly and
efficiently as possible. But the one place where these
caching algorithms perhaps haven’t been applied and perhaps
should is back in our closet. And if we look at these
ideas that Martha provides us with how to organize
our possessions, then as we go through
these possibilities, it’s clear that
one of them might be a better recommendation
than the others, which is when was the last time
I wore it or used it, which is actually an
instantiation of the least recently used principle. So next time you’re
thinking about trying to organize your closet,
it might be worth keeping this in mind, that as
long as your possessions obey the same kind of principle
of temporal locality, focusing on those possessions
that were least recently used might be the most
predictive of those things that you’re least likely to
need again in the future. So this kind of principle
isn’t just something which is useful in
thinking about how to organize your closet. It’s also something
that might be useful in thinking about
how to organize your office. And this was a discovery that
was made somewhat accidentally by a Japanese economist
called Yukio Noguchi. So Noguchi is a tax economist. He was constantly
receiving reports and papers and documents
that needed to be filed away. And he was kind of overwhelmed
with all of this information. He didn’t have time
to file it properly, so he came up with
a simple solution, which was just to put all
of those papers into a box. But he didn’t just
dump them into a box. What he did was
actually put them into a box in a
very orderly way. So basically, he had a box
which was sort of horizontally aligned. And he put the information into
the box at one end of the box. So as he’d get some
new papers, he’d put those papers in
at the left-hand side. And as a consequence,
you know, the papers would sort of move down the
box as new things came in. And then he did
something else important, which is that as he used
one of those papers, he’d pull it out of the box. And then when he was
finished using it, he’d put it back in again at
the left-hand side of the box. So this has a clear connection
to least recently used caching, right? Once your box fills up, you
need to get rid of something. And you can get rid
of the things that hit the right end of
the box because those are the things which you
used furthest in the past and, consequently, are least likely
to need in the near future. But this principle
of taking out a file and then putting it back at
the left-hand side of the box also corresponds to
another idea which has shown up in theoretical
computer science. So we can actually show
that this way of organizing his information
is something which is near optimal, or
at least as close to optimal as we’re
likely to be able to get. So the actual data structure
that he had created here is something that a computer
scientist would recognize as a self-organizing list. So basically, in a
self-organizing list, you have a sequence of
pieces of information. And then as you access
those pieces of information, you have the opportunity
to change the order that those pieces of
information appear in. And so this idea of taking the
information that’s accessed and then putting it at the
very front of the list, or at the left-hand
side of Noguchi’s box, is actually something
which turns out to be a very
effective algorithm. So Robert [? Tagin ?]
and Daniel Slater proved in 1985 that moving
the most recent item to the front of the list
is, at worst, twice as bad as clairvoyance. So clairvoyance is the best
that you could possibly do. And it turns out, this
is the only algorithm that comes with a
theoretical guarantee of being at least
close to clairvoyance in multiplicative terms. So if you’re thinking about
implementing the Noguchi filing system in your
office, it might be reassuring to realize that
perhaps you already have. So we normally think
about a messy office as being a bad thing. And in particular, a
giant pile of papers on your desk like
this is something which kind of seems like a
poor method of organizing information. But you can kind
of think about this as taking the Noguchi system,
and then literally turning it on its side. Right? So a big pile of papers is, in
fact, a self-organizing list. And if you’re taking
things out of the pile and then sticking
them back on the top and putting the most
recently used items on the top of the
pile, then you’re implementing exactly this
relatively optimal strategy for organizing the information
that’s contained in that list. So if you’re somebody who
is familiar with these kinds of messes, you’ll
also be reassured by some of the message in
our sorting chapter, where we argue against sorting in
many domestic situations. Thinking about
caching isn’t just useful for thinking about
how to organize information around you. It’s also something which might
give you new insights into how human memory works. So I think there’s an
intuition that a lot of us would have about
memory, which is kind of thinking
about it as something sort of like Noguchi’s box. Right? You’ve got kind of a limited
amount of space in your memory. And so when you
learn something new, you have to put it in there. And when you do that, maybe
it pops something else out. And as a consequence,
you forget that thing. And so forgetting is
just a consequence of kind of hitting a
capacity limit in terms of the amount of information
that we can store. But a cognitive scientist called
John Anderson has actually proposed that there’s a
different way of thinking about how memory works and
thinking that the analogy is less like a box of finite
size and more like a library of infinite size. So if you have a library which
has infinitely many books arrayed along a sort of
linear shelf like this, then the problem
that you have is one not of figuring out
where to fit information but rather one of figuring out
how to organize information. Another good analog of this
is thinking about something like web search, where you’ve
got a whole lot of web pages, and you want to organize
those web pages in such a way that you can find the
things that people are likely to be looking for
with high probability close to the top of the
list that you produce. So from this perspective,
forgetting something isn’t that you’ve sort of
had that thing sort of pop out of your memory but
rather that the time it would take you to find
that thing is greater than the amount of time that
you’re willing to spend, that the way that
information is organized is not sufficiently
good to allow you to identify that item quickly. And so this makes an
interesting prediction, which is that we
should be trying to organize those
items in memory such that the things that we
think we’re most likely to need in the future are going
to be the things which we find easiest to recall. And Anderson actually showed
some evidence for this. So he took some famous data. This is data from
an experiment which was done in the 19th century
by an early psychologist, Ebbinghaus, who
basically taught himself some lists of
nonsense syllables, and then would look at how well
he could recall those lists some number of hours
after he’d memorized them. And so what Anderson
did was look at whether this pattern of
recall, where, basically, you get this kind of rapid
falloff followed by a slow, sort of long tail
could be predicted by looking at how
likely it is that we’re going to encounter particular
pieces of information in our environment. And so what he did was go
to the “New York Times,” look at headlines in
the “New York Times,” and then look at how
likely a word was to appear in the headline
in the “New York Times” as a function of how
long ago it previously appeared in those headlines. So he could look at the
relationship between the number of days that had elapsed,
and then the probability of an item appearing. And he found that this showed
a very similar kind of curve. So this kind of correspondence
between the probability that something is
likely to be needed as a function of how
long ago it was used and the patterns of recall
that we see in human memory provides one suggestive
form of evidence, that one of the things
that our minds are doing is trying to organize
information in such a way that we are keeping
around those things that we are likely to
need again in the future. BRIAN CHRISTIAN:
Thinking computationally about the types of problems that
we encounter in everyday life has payoffs at a number
of different scales. The overarching
argument of the book is there are deep parallels
between the problems that we face in our lives and the
ones that are considered some of the fundamental and canonical
problems in mathematics and computer science. And this is significant
because as a result, there are these simple,
optimal strategies that are directly relevant to those
domains in our own lives. And there’s something
very concrete that we can use and learn from. And even if we’re not literally
going to stop at exactly, you know, 37% and so
forth, having a vocabulary and knowing what
an optimal stopping or an explore/exploit
problem looks like and having a general sense
of how the solution is structured gives us a way to
bolster our own intuitions when we find ourselves
in those situations. And I think most broadly,
a lot of the solutions that we explore
don’t necessarily look like what we think of when
we think of what computers do, which gives us an opportunity
to actually rethink our notion of
rationality itself. So intuitively, we kind of have
this bias that being rational means being exhaustive, exact,
deterministic, considering everything, getting
an exact answer that’s correct 100% of the time. In fact, this is
not what computers do when they’re up against the
hardest classes of problems. This is kind of the
luxury of an easy problem. And up against an
intractable problem, computer scientists turn
to a totally different set of techniques. When we take into account the
cost, the labor of thought itself, the best strategies may
not be to consider everything, to think indefinitely, to
always get the right answer in each situation. We may want to, in fact,
trade off the labor of the computation versus
the quality of the result. And as the book progresses,
especially in the second half, we look at what
these strategies are for dealing with
intractable problems, which most of the ones that we
face in real life are. And that leads to
this conclusion, that what computer
scientists do up against the hardest
classes of problems is they use approximations. They trade off
the costs of error against the costs of delay. They relax some of the
constraints of the problem, and they turn to chance. These aren’t the
concessions that we make when we can’t be rational. These are what being
rational means. We explore this line of
thinking through a number of different domains. We’ve talked about three today. We also look at
sorting algorithms– what do they tell you about
how to arrange your bookshelf and, more importantly,
whether you should? We look at scheduling theory. Every operating
system has a scheduler that tells the CPU what to
be doing when, for how long, and what to do next. So we look at what the
parallels are there for thinking about time
management in our own lives. And in the context
of Bayes’s rule, we think about problems of
predicting the future– how long a process will go on, how
much money something will make based on what it’s made so far. And, on a personal
level, you’ve been dating someone for a couple months. It’s going pretty well so far. Is it premature to
book that weekend place in Tahoe at the
end of the summer? What should you do? And we provide some rational
answers there, as well. And unlike the types
of advice that you might find in, for
example, self-help books, the insights that are derived
from thinking computationally about these problems
are backed by proofs. Thank you so much. [APPLAUSE] BORIS DEBIC: Questions? BRIAN CHRISTIAN: Do you want
to get more in the light? Yeah. AUDIENCE: So how useful is it to
have a proof of an abstraction when real life doesn’t
match the abstraction? TOM GRIFFITHS: Yeah. So I think the
way that we really think about this is
there are definitely simplifications that go into
formulating these problems. But what you get out
of them is if you’re in exactly that
situation, you know exactly what you should do. But more generally,
you get insights about how those solutions change
as a consequence of changing the assumptions. So, for example, I talked
about some of the variants on the secretary problem. And from that, you might
not know exactly what the number is in your scenario. But you can recognize, OK. Well, if it becomes
more permissive, then I should be more
willing to spend longer looking before I start leaping. And if it becomes
less permissive, then I should have a much
more rapid transition from those things. And I think there’s a
related question here, which is if we think that
people should follow algorithms, why don’t we just
make computers that will solve these
problems for people, and then people don’t have
to make any decisions at all? And I think one of
the things that people are really good
at is figuring out how to interpolate between
these possibilities and how to kind of evaluate
some of the fuzziness around the particular
problems that we’re facing. And so we’re really
providing tools that can guide those
human capacities in terms of thinking about solutions
to more realistic problems. AUDIENCE: Very
interesting topic. How long it take you
guys to write this book? BRIAN CHRISTIAN:
There’s a quote that we use at the beginning
of the book, of the chapter on scheduling. We have, the
epigraph says– it’s from Eugene Lawler, who was
a researcher in scheduling theory. And he says, “‘Why don’t we write
a book on scheduling theory?’ I asked. ‘It shouldn’t take much time.’ Book writing, like
war-making, often entails grave miscalculations. 15 years later, scheduling
is still unfinished.” So I think Tom and I had
an analogous experience, where we– I think of the book
as really emerging in a dinner that Tom and I had in
2011, where we– I mean, we’ve known each
other for 11 years. These are shared
obsessions that we’ve talked about for a long time. But we basically
realized that we wanted to write the same book,
what amounted to the same book. And so that was when we decided
to team up and work together. And we had what in hindsight
was a sort of predictably naive sense of like,
oh, it should take about 18 months or something. It’ll be out in 2013. So here we are. And you can tell how
that plan worked out. TOM GRIFFITHS: I
should also point out another terrible
thing that can happen to a book is having
a small child. So my wife and I
had our second child somewhere in that process,
which threw everything off. AUDIENCE: Are there any
rejected chapters in this book? Are there any
areas of life where you looked hard for an analog,
and it didn’t work out? TOM GRIFFITHS: Yeah,
that’s interesting. So our rejected
chapters are more that there are lots and
lots of algorithmic ideas that we would love
to write about, but we just ran out of
space and time to include. So we’ve actually
got a folder which is called Sequel, which
is the only way that we were happy cutting things,
so we could pretend. And so that contains a bunch
of chapters and proto-chapters that explored a lot of
different kinds of algorithms. And in some cases, it was
that there’s algorithms that have already got good press. And in other cases,
it was that we really felt like there was a key
insight that those things could give you about human lives. But either you had to get
too far into the weeds in order to get out what that
insight was or it was something where– like, basically,
the examples we give in the book have been
sort of carefully selected for exactly
the right blend of you need to understand a
certain amount of math in order to get a
practical payoff. And some things just didn’t
reach that threshold. BRIAN CHRISTIAN: We
also had a beta version of a chapter on data structures
that had a lot of great stuff that we still kind of
wish we could have saved. But it just didn’t fit under
the rubric of the book. So that, I guess, lives in
the Sequel folder now, too. AUDIENCE: Do you feel
like you learned anything about the limits or lack thereof
of what machine intelligences could be capable of? It seems like a lot
of these examples right now involve pretty
specific parameters for what the problem has to be,
but also that there’s just a ton of similarity between
how humans actually do things and how computers
end up doing them. BRIAN CHRISTIAN:
This is one where I think Tom can speak as a
machine learning researcher. But I would just
say, as a preamble, I think exactly how
you’ve formulated the question is how
I think of it, which is that a lot of the
algorithms that we discuss involve assumptions about
the distribution of values. The Gittins index, the
way that we present it, assumes a uniform distribution,
or the house-selling problem assumes uniform distribution. And often, the hardest
versions of these problems are what are called partial
information problems, where you have to
be making inferences about what you think
the distribution looks like on the fly based on the
values that you’ve encountered. And those often turn out to
be the intractable problems. And so there is a sense in which
human intelligence involves making all of these
inferential leaps, but also deciding
how to parameterize the problem in the first place. TOM GRIFFITHS: Yeah. I think this also relates
to Peter’s question. So I think as you get closer
to the actual problems that human beings solve,
you sort of start to hit the edges of the theory. Another good example of this
is in the explore or exploit setting, where– so if you
get people and put them in an explore or
exploit scenario, you find pretty consistently
that people deviate from the predictions
that come out of these sort of standard
multi-armed bandit models. And they deviate in a
particular direction, which is that they
tend to over-explore. So kind of at the point where
the algorithm is committed and said, hey, just
keep on pulling that one lever, people
are still like, well, I’m going to go try that one. I’m going to try that one. I’m going to try that
one, and still trying out different options. And so that looks
mysterious until you realize that it’s not that
people are being irrational. It’s that, in fact, the
assumptions of the model are kind of wrong for
a real human situation. So the assumption is that
a slot machine pays off with the same probability. That’s sort of fixed over time. Whereas in a human environment,
the payoff probabilities for different options are
things that change over time. Right? So they might be sort of
slowly drifting around. A particular restaurant is
getting better or worse, or a chef has changed, or
it’s under new management, or– all of those
things are things that over time mean that those
probabilities can be different. And so if you’re in a
situation like that, then what you should
be doing is, in fact, exploring more because you
need to go back and check on the information that you had. But that problem is what’s
called a restless bandit problem, and it’s one
which still presents a challenge for developing
sort of good machine learning methods that can deal with it. So I think there’s plenty
of room to take inspiration from human cognition, as well as
room for making recommendations about humankind. BORIS DEBIC: And with
that, let’s please thank Tom and Brian
for coming to Google. BRIAN CHRISTIAN:
Thank you so much.
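
The contrast drawn in the Q&A between a fixed slot machine and payoffs that drift over time can be illustrated with a toy simulation. This is only a sketch under stated assumptions, not a method from the talk or the book: the two options, their payoff numbers, the sinusoidal drift, and the names `payoff_probs`, `commit_agent`, and `exploring_agent` are all hypothetical. One agent samples briefly and then commits to a single lever; the other is a simple epsilon-greedy agent that keeps exploring and weights recent outcomes more heavily.

```python
import math
import random


def payoff_probs(t, restless, period=2000):
    """Payoff probabilities of two options at time t (illustrative numbers)."""
    if not restless:
        return [0.7, 0.3]  # classic bandit: probabilities fixed over time
    # Restless setting (assumed form): the two options slowly trade places.
    swing = 0.4 * math.sin(2 * math.pi * t / period)
    return [0.5 + swing, 0.5 - swing]


def pull(rng, probs, arm):
    """Return 1 if the chosen option pays off, else 0."""
    return 1 if rng.random() < probs[arm] else 0


def commit_agent(steps, restless, warmup=100, seed=0):
    """Sample both options at random for `warmup` pulls, then stay with the winner."""
    rng = random.Random(seed)
    wins, counts, total, choice = [0, 0], [0, 0], 0, None
    for t in range(steps):
        if t < warmup:
            arm = rng.randrange(2)
        else:
            if choice is None:
                rates = [wins[a] / max(counts[a], 1) for a in range(2)]
                choice = 0 if rates[0] >= rates[1] else 1
            arm = choice
        reward = pull(rng, payoff_probs(t, restless), arm)
        wins[arm] += reward
        counts[arm] += 1
        total += reward
    return total / steps


def exploring_agent(steps, restless, epsilon=0.1, step_size=0.1, seed=0):
    """Epsilon-greedy that never stops exploring and favors recent outcomes."""
    rng = random.Random(seed)
    estimates, total = [0.5, 0.5], 0
    for t in range(steps):
        if rng.random() < epsilon:
            arm = rng.randrange(2)  # go back and check on an option
        else:
            arm = 0 if estimates[0] >= estimates[1] else 1
        reward = pull(rng, payoff_probs(t, restless), arm)
        # Constant step size: old observations fade, so stale beliefs get revised.
        estimates[arm] += step_size * (reward - estimates[arm])
        total += reward
    return total / steps


for restless in (False, True):
    label = "restless" if restless else "fixed"
    print(label,
          "commit:", round(commit_agent(20000, restless), 3),
          "keep exploring:", round(exploring_agent(20000, restless), 3))
```

In this toy setup, committing tends to do well when the payoffs are fixed, while the agent that keeps exploring tends to come out ahead once the payoffs drift. Real restless bandit problems are much harder than this, as noted in the answer above.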


  1. Can this be classified as a religion? Or can religion be classified as an algorithm? This book seems like a very individual-focused religion in that case…

  2. How can you calculate the probability of being caught as a robber if your probability of getting caught is going to be 0 until you're caught? For example, if you have 3 successful robberies, then by looking at previous outcomes (3 out of 3 robberies being successful), you'll conclude that the probability of a successful robbery is 1.0 or 100%.

  3. One problem with implementing optimal stopping in real life might be that, for some problems, it is not worth spending 37% of the time and effort given the payoff, while for other problems the payoff makes it worth going beyond 37%. Or perhaps I haven't understood the underlying mathematical structure properly.

  4. I am wondering how I could use these algorithms to be most efficient and accurate when evaluating students' essays on English literature.

  5. Can you do an algorithm on where ISIS would attack next in Europe so I know which train station to avoid?

  6. 40:00 What about FOMO, or fear of missing out, where instead of dwelling on missed (unseized) opportunities in the past, you dwell on missed (unevaluated) opportunities in the future?

    With the cost to "play the slot machine," or message another potential date online, growing ever smaller in today's more connected society, FOMO seems to be on the rise.

    Susceptibility to regret versus FOMO likely varies person to person depending on wide ranging factors like prior luck, patience, faith in the system, and your personal utility function (i.e. is it best-or-nothing for you, or would you be just as happy with any outcome above a threshold value). I know that I personally am more prone to regret than FOMO, so like Bezos, this is what I tend to minimize for, but the same may not necessarily be good advice for others.

  7. It's surprising to find a so-called scientist getting the theory of Optimal Stopping so fundamentally wrong. At about 6 minutes in he says that optimal stopping suggests spending 37% of your time looking for candidates is the best way to find candidates. What the theory actually says is if you interview n divided by e candidates then select the next one that is better than all those you've interviewed so far, you'll pick the best candidate on 37% of occasions. Where n is the total number of candidates and e is the base of the natural logarithm. Not nearly the same thing as "spending 37% of your time."

    TL;DR: The time spent is variable. The percentage success (37%) is what is constant.

  8. Friend, Front end "exploration" / Back end "exploitation", seems to describe Mormon Batch Dating & single / drawn out Victorian Courting. In both cases presumably without sex.
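
The look-then-leap rule discussed in comments 3 and 7 can be checked with a quick simulation. This is a minimal sketch, assuming candidates arrive in a uniformly random order with no ties and that only picking the single best candidate counts as success; the function names `look_then_leap` and `success_rate` are illustrative.

```python
import math
import random


def look_then_leap(ranks, cutoff):
    """Skip the first `cutoff` candidates, then take the first one who beats them all.

    `ranks` holds distinct scores (higher is better). If nobody beats the
    early group, we are stuck with the last candidate.
    """
    best_seen = max(ranks[:cutoff], default=float("-inf"))
    for score in ranks[cutoff:]:
        if score > best_seen:
            return score
    return ranks[-1]


def success_rate(n, cutoff, trials=10000, seed=0):
    """Fraction of random orderings in which the rule picks the best candidate."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        ranks = list(range(n))
        rng.shuffle(ranks)
        if look_then_leap(ranks, cutoff) == n - 1:
            hits += 1
    return hits / trials


n = 100
cutoff = round(n / math.e)  # look at about n/e candidates before leaping
print("cutoff:", cutoff, "of", n)
print("estimated success rate:", success_rate(n, cutoff))
```

Under these assumptions, n/e is about 37% of the candidates, and the estimated success rate should also land near 1/e, so both the share of candidates skipped and the chance of ending up with the best one come out at roughly 37%.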
