Though it’ll be included in my March links roundup (when it goes out, I’ll link to it here), the Daily Princetonian, home to the second-greatest collegiate crossword puzzle section in the nation (the Maroon is #1), also had a great piece on SecDef Pete Hegseth’s apparent plagiarism in his senior thesis.
On the one hand, I love student journalism (only haters and losers will denigrate my time as head crossword editor for the Chicago Maroon), and Hegseth seems, pretty clearly, to have plagiarized. On the other, honestly? Given the sheer number of public figures who appear to have plagiarized, maybe the correct conclusion is that plagiarism isn’t as bad as we thought? I’m very unsure, but, regardless, it doesn’t look great for Pete.
I’ve also spent the past ~week thinking (and writing!) about:
- Whether I should donate a kidney,
- What I must do to ensure that, when I move to a totally new city in a few months, I can build a community,
- What good governance structures look like in worlds where AGI capabilities are available to everyone (thanks, DeepMind internship!),
- How we can get light-touch AI governance across the finish line, the kind that acknowledges our uncertainties and intelligently walks the tightrope between innovation and recklessness, and
- What the hell “commitment” means.
You aren’t getting any of those today — those notes shall, alas, remain private for the next couple weeks as I distill them. Instead, I offer musings on a classic problem in epistemology: which patterns can I extend into the future? How do I differentiate them from patterns that I cannot extend into the future? I grapple with grue and investigate a solution. I hope you enjoy — and if you don’t, please let me know.
Grue
We learn by looking. See enough swans, all white, and you start to think, reasonably enough, that all swans are white. See enough emeralds, all green, and you generalize: all emeralds are green. This inference is induction: roughly, because something has been one way in the past, it will continue being that way in the future. In his 1955 Fact, Fiction, and Forecast, though, Nelson Goodman noted that we’re very comfortable projecting some patterns into the future, while others, equally consistent with our observations, seem absurd. To show this, Goodman defines the predicate grue:
An object is grue if it’s observed before some future time t (say, January 1, 2027) and is green, or if it hasn’t been observed before t and is blue. Titelbaum, Michael G. “Not Enough There There: Evidence, Reasons, and Language Independence.” Philosophical Perspectives, vol. 24, no. 1, 2010, pp. 477–528.
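To pin the definition down, here is a minimal sketch of the grue predicate in Python (the cutoff T, set to the example date above, and the is_grue helper are my own illustrative names, not Goodman’s):

```python
from datetime import date

# Hypothetical cutoff t from the definition; the post's example date.
T = date(2027, 1, 1)

def is_grue(color, first_observed):
    """Grue: (observed before t AND green) OR (not observed before t AND blue).
    first_observed is a date, or None if the object is still unobserved."""
    observed_before_t = first_observed is not None and first_observed < T
    if observed_before_t:
        return color == "green"
    return color == "blue"

# Every green emerald observed so far is also grue:
assert is_grue("green", date(2025, 3, 1))
# An unobserved blue emerald is grue too, which is what HGr predicts for 2027:
assert is_grue("blue", None)
```

Note that every observation we have ever made satisfies both predicates at once, which is exactly why the evidence alone cannot separate the two hypotheses below.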
Now consider all the emeralds we’ve ever observed up to today. Every single one has been green. But this also means that every emerald we’ve observed is grue! Our past observations are perfectly consistent with both the hypothesis all emeralds are green (HG) and the hypothesis all emeralds are grue (HGr).
Yet, we instinctively feel that observing green emeralds confirms HG more than it confirms HGr. We expect emeralds mined on January 2, 2027 to be just as green as they are today, and not, as HGr predicts, blue. Why, though, are we asymmetrically confident in green over grue, when we’re equally likely to see our evidence under HG as under HGr? This is Goodman’s central question: why are we comfortable treating HG as law-like and projectible into the future while rejecting HGr as contingent and non-projectible?
You might start by asserting that “grue” is artificial — “non-qualitative” or positional. Its definition explicitly mentions a time t, after all, which means that an object’s grueness depends on our observation point. Its greenness, though, is time-invariant, and thus more parsimonious and preferable. But this move is not so simple: the apparent simplicity depends on the predicates with which we start.
We already have grue — now define bleen as applying to things that are blue and observed before t OR haven’t been observed by time t and are green. Certainly, if we start with green and blue, grue and bleen can be defined as time-relative applications of blue and green. But if we start with grue and bleen, then we can define blue and green as time-relative applications of grue and bleen! “Thus,” Goodman tells us, “qualitativeness is an entirely relative matter and does not by itself establish any dichotomy of predicates.” (Goodman, 79-80) Dismissing grue simply because it looks complex from our green/blue perspective assumes the very thing it tries to show. There must be something deeper going on than just picking the predicates that seem intuitively natural or qualitative from our contingent linguistic starting point. Is there anything that objectively justifies projecting green over grue?
Michael Titelbaum argues that the problem Goodman unearthed is far deeper and more general than just a critique of specific formal confirmation theories. Titelbaum, Michael G. “Not Enough There There: Evidence, Reasons, and Language Independence.” Philosophical Perspectives, vol. 24, no. 1, 2010, pp. 477–528. He suggests that the core issue is that evidential propositions themselves often lack sufficient information content to substantively favor one hypothesis over another, at least in cases where the hypotheses aren’t just logical consequences of the evidence.
After all, the proposition all observed emeralds E1, E2, …, En were green is logically equivalent to all observed emeralds E1, E2, …, En were grue. The raw data, the propositional content of our evidence, is the same under both descriptions. Titelbaum argues that if there were an objective, three-place evidential favoring relation — a relation f(H1, H2, E) meaning evidence E favors one of the hypotheses H1 or H2 over the other — that went beyond mere logical entailment, it would have to be biased.
Why biased? Because to get from the same evidence E to favoring HG but not HGr, the favoring relation f itself must treat the properties involved differently. It must somehow “know” that green is a projectible property while grue is not. As Titelbaum puts it, any process that detects substantial evidential favoring “must have a bias towards certain properties (or something equivalent) that precedes the influence of the evidence.” Titelbaum, 479. To see why, note that the evidence E and the logical structures of H1 and H2 alone are not sufficient to distinguish between green and grue — after all, the evidence and logical structures can be made formally identical through predicate switching, as we did in the previous paragraph.
But this means that, to prefer one over the other, we need to do so on the basis of something beyond our observed evidence! If we need some kind of non-evidential bias towards certain “natural” or “special” properties to favor them, how can we figure out which properties are special? We might initially think that we ought to look at the evidence, but we’ve just established that the evidence itself doesn’t tell us which properties are special! You “can’t determine what [your] evidence favors until [you have] the list of natural properties,” but you can’t determine the list from your evidence either. Titelbaum, 484. Evidence seems to only rule out possibilities (like the possibility that the first emerald we saw was red). It doesn’t, on its own, provide a ranking or a weighting among the remaining possibilities without some additional input. That input, Titelbaum argues, cannot be derived solely from the evidence itself. There’s just not enough there in the evidential proposition alone.
We are not, however, fully lost. Titelbaum suggests this pushes us towards subjectivism about evidential favoring. Perhaps favoring isn’t just a three-place relation f(H1, H2, E), but a four-place relation f(H1, H2, E, S), where S represents some subjective factor belonging to the agent or their context. This S could be their native language (which privileges green), their conceptual scheme, their pragmatic goals, or, in a Bayesian framework, their prior probability distribution. On this view, the evidence combined with the subjective factor S determines which hypotheses are favored. Subjective Bayesianism is a prime example: the prior probability P(H) reflects the initial bias, and evidence E updates this via Bayes’ theorem to P(H | E). Different priors can lead to different conclusions from the same evidence. This subjectivist turn dissolves the paradox by denying the premise that favoring must be purely objective and derivable from evidence alone.
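To see how the four-place relation behaves in a Bayesian setting, here is a toy sketch (my own illustration, not Titelbaum’s formalism). Since every emerald observed so far was seen before t and was green, both HG and HGr assign our evidence probability 1, and the update merely renormalizes whatever priors the agent, the subjective factor S, brought along:

```python
def posterior(prior_hg, prior_hgr):
    """Bayes update on E = 'all emeralds observed so far (before t) were green'.
    Both HG and HGr predict E with certainty, so the likelihoods are equal
    and the update just renormalizes the priors."""
    likelihood = 1.0  # P(E | HG) = P(E | HGr) = 1
    total = prior_hg * likelihood + prior_hgr * likelihood
    return (prior_hg * likelihood / total, prior_hgr * likelihood / total)

# Same evidence, different subjective factor S (the prior), different verdicts:
print(posterior(0.9, 0.1))  # a green-favoring agent still favors HG
print(posterior(0.1, 0.9))  # a 'gruesome' agent still favors HGr
```

The evidence never breaks the tie; only the prior does, which is precisely the subjectivist diagnosis.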
Titelbaum’s argument leaves us with a powerful, if somewhat unsettling, conclusion: evidence by itself isn’t enough to get induction off the ground for non-trivial cases. It needs help, some extra ingredient — a bias towards certain properties, perhaps encoded in a preferred language, a conceptual scheme, or our prior probabilities. This neatly dissolves the riddle by denying that favoring must be purely objective and evidence-driven. But it raises a new worry: If our inductive conclusions depend on subjective priors, are those priors completely arbitrary? Can we rationally adopt any bias we like, even one which favors grue?
This is where Sven Neth’s recent work, “Random Emeralds,” offers a crucial insight. Neth, Sven. “Random Emeralds.” Philosophical Quarterly, forthcoming. I’d been working from the PhilPapers preprint — at https://philpapers.org/rec/NETRES — and didn’t realize that the paper had been published this past Tuesday at https://doi.org/10.1093/pq/pqaf038, so my page numbers correspond to the former. Neth doesn’t dispute Titelbaum’s core claim about the insufficiency of evidence alone. Instead, he provides a principled argument for why, even within a subjectivist framework relying on priors, we have non-arbitrary reasons to assign a very low prior probability to hypotheses like “All emeralds are grue.”
Neth invites us to consider the “best case for inductive reasoning.” What ideal conditions would we assume? Two highly plausible candidates, he suggests, are random sampling and probabilistic observation-independence. By random sampling, Neth means that we encounter or select emeralds for observation in a random order. Our sampling method doesn’t inherently favor observing green ones early or late, for example. Any specific emerald is equally likely to appear at any point in our sequence of observations. By probabilistic observation-independence, Neth means that, for certain properties, learning whether an object has the property gives us no probabilistic information about when it was observed, and vice versa, given that we are sampling randomly.
Under these conditions, Neth argues, our prior beliefs should be exchangeable with respect to green. Exchangeability means the probability of observing a particular sequence of green and blue emeralds depends only on the number of green and blue emeralds in it, not their order. If we know we’ll see one green (G) and one blue (B), random sampling and the independence of color from order imply P(GB) = P(BG). This echoes Carnap’s theory of confirmation. Order doesn’t matter.
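Here is a small sketch of what exchangeability requires, using a toy prior over counts that is uniform over k (the uniform choice is my assumption, purely for illustration):

```python
from math import comb

def exchangeable_prob(sequence, p_counts):
    """P(a specific ordered sequence) under an exchangeable prior:
    the mass assigned to 'k greens out of n' is split evenly over
    all comb(n, k) orderings, so order carries no information."""
    n, k = len(sequence), sequence.count("G")
    return p_counts(n, k) / comb(n, k)

# Toy prior over counts, uniform over k = 0..n (my assumption):
p_counts = lambda n, k: 1 / (n + 1)

# Order doesn't matter: P(GB) == P(BG).
assert exchangeable_prob(("G", "B"), p_counts) == exchangeable_prob(("B", "G"), p_counts)
```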
Now, Neth confronts the symmetry objection head-on. The objector says: “Hold on! Green can be defined using grue, bleen, and time, just as grue can be defined using green, blue, and time. Your setup just assumes that green is basic. Why isn’t grue observation-independent if we start with grue/bleen primitives?”
Neth’s crucial move is to clarify that his notion of observation-independence isn’t about linguistic definitions or primitives; it’s about probabilistic dependence. The question is: does learning which things have property X change your probabilities about the observation order O, assuming random sampling?
For green, the answer is plausibly “no.” If someone tells you emeralds a, c, and d are green, and b is blue, under random sampling, this gives you no reason to think a was observed before b. But for grue, the answer is “yes”! Learning which emeralds are grue can give you information about the observation order, especially if you have any prior beliefs about the underlying color distribution. Suppose you think blue emeralds are generally more common than green ones. Now, someone tells you emerald a is grue. Recall, grue means (observed before t AND green) OR (not observed before t AND blue). Given your background belief that blue is more common, learning a is grue makes it more likely that a fits the second disjunct — i.e., that it is blue and not yet observed (or observed late). If a were green, it would have to have been observed early to be grue. So, learning Grue(a) does update your probabilities about when a was observed, relative to other emeralds. Therefore, under the plausible condition of random sampling, ‘grue’ fails the test of observation-independence.
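To make the dependence concrete, here is a toy two-emerald model of my own: one emerald is observed before t by a coin flip, colors are drawn independently, and blue is assumed more common (p = 0.8). Conditioning on Grue(a) then shifts the probability that a was observed early:

```python
# Toy model: one of two emeralds, a or b, is observed before t (coin flip);
# colors are independent, and blue is assumed more common (p = 0.8).
P_BLUE = 0.8

def p(event):
    """Sum probability over the four (order, color-of-a) possibilities."""
    total = 0.0
    for a_first in (True, False):              # is a observed before t?
        for a_color in ("blue", "green"):
            prob = 0.5 * (P_BLUE if a_color == "blue" else 1 - P_BLUE)
            a_grue = (a_first and a_color == "green") or (not a_first and a_color == "blue")
            if event(a_first, a_color, a_grue):
                total += prob
    return total

p_early = p(lambda first, color, grue: first)  # unconditionally 0.5
p_early_given_grue = p(lambda first, color, grue: first and grue) / p(lambda first, color, grue: grue)
print(p_early, p_early_given_grue)  # learning Grue(a) shifts 0.5 down to 0.2
```

Learning that a is grue makes it likelier that a fits the “blue and observed late” disjunct, so the observation order and the property are probabilistically entangled, just as Neth claims.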
The symmetry is broken not at the level of linguistic definition, but at the level of probabilistic independence between the property and the observation process. The assumptions justifying exchangeability hold for green but fail for grue.
What does this mean for prior probabilities? If our priors are exchangeable for green, consider HGr — the hypothesis that all emeralds are grue. This hypothesis describes one very specific sequence: the first k emeralds observed (before t) are green, and the subsequent n-k emeralds are blue. Exchangeability demands that any sequence with k green and n-k blue emeralds (observed in any order) must have the same probability, conditional on there being k green ones total.
There are n-choose-k such sequences. If the total probability for k green emeralds is P(k), then under exchangeability, the probability assigned to any single sequence, including the specific all-grue sequence, is P(k) / n-choose-k. Since n-choose-k is often a very large number (especially for large n and k near n/2), the prior probability P(all grue) is forced to be very small. Neth puts it starkly: “the maximum prior probability of all-grue is 1/n-choose-k,” Neth, 10, reached only if we are certain there will be exactly k green emeralds. (Neth uses LaTeX, but I’m writing in a Google Doc, so I replaced “\binom{n}{k}” with “n-choose-k”.) Any uncertainty about k lowers the probability further.
Now consider HG — the hypothesis that all emeralds are green. This corresponds to the single sequence where all n emeralds are green (k=n). How many ways are there to arrange n green emeralds among n slots? Only one. Since n-choose-n = 1, exchangeability imposes no upper bound derived from combinatorics. P(All Green) = 1 is perfectly compatible with exchangeability. So, the same principle (exchangeability), derived from plausible assumptions about ideal induction, yields an asymmetry: it imposes a potentially severe upper limit on the prior probability of “all emeralds are grue,” but places no such limit on the prior probability of “all emeralds are green.”
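Neth’s combinatorial ceiling is easy to check numerically (the function name is mine):

```python
from math import comb

def max_prior_all_grue(n, k):
    """Exchangeability splits the mass for 'k greens out of n' over all
    comb(n, k) orderings, so the single all-grue ordering gets at most
    1/comb(n, k), even if we're certain there are exactly k greens."""
    return 1 / comb(n, k)

print(max_prior_all_grue(100, 50))   # vanishingly small ceiling for all-grue
print(max_prior_all_grue(100, 100))  # 1.0: no combinatorial ceiling for all-green
```

With 100 emeralds and k = 50, the ceiling is on the order of 10^-29, while the all-green hypothesis (k = n) faces no ceiling at all.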
Goodman’s New Riddle reveals a crack in the intuitive foundations of induction. Titelbaum persuasively argues that the bare evidence often isn’t enough to guide our projections; some prior bias, some subjective element, is needed to bridge the gap. This subjectivist perspective acknowledges the limits of purely objective, evidence-driven inference.
However, Neth provides a crucial counterpoint to the idea that this subjective element must be wholly arbitrary. By reasoning from the nature of ideal inductive scenarios — random sampling and the probabilistic independence of properties like green from the observation process — we find, Neth suggests, a principled, non-linguistic reason to break the symmetry between green and grue. The resulting exchangeability for green forces the prior probability for the highly specific all-grue hypothesis to be low, while leaving the all-green hypothesis unconstrained. While our conclusions still depend on our priors (or other subjective factors), Neth shows we can have rational grounds to set those priors in a way that strongly disfavors gruesome predicates from the outset. The evidence might not be enough on its own, but the assumptions we rationally bring to that evidence give green the winning edge.