Jacob Hilton
Using only a handful of computational operations, we were able to mechanistically estimate the model's accuracy to within one part in 13,000, a precision that would have required tens of thousands of random samples.
So our mechanistic estimate was much more computationally efficient than random sampling.
Moreover, we could easily have produced a much more precise estimate, exact to within floating point error, simply by computing how close the coefficients a_0 and a_1 were in the two yellow regions.
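For intuition on the sampling comparison, here is a minimal sketch (not from the original post) of why resolving an error rate on the order of one part in 13,000 by random sampling takes tens of thousands of samples: you need enough draws to expect to see at least one error.

```python
import math

# Hypothetical figure matching the narration's order of magnitude:
# suppose the model's true error rate is about one part in 13,000.
p = 1 / 13_000

# To have a 95% chance of observing at least one error, we need n samples with
# 1 - (1 - p)^n >= 0.95, i.e. n >= log(0.05) / log(1 - p).
n = math.ceil(math.log(0.05) / math.log(1 - p))
print(n)  # tens of thousands of samples

# The mechanistic estimate instead used only a handful of arithmetic operations.
```

The 95% threshold is an illustrative choice; any reasonable confidence level gives a sample count on the same scale of roughly 1/p.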
Surprise Accounting

As explained here, the total surprise decomposes into the surprise of the explanation plus the surprise given the explanation.
The surprise given the explanation is close to 0 bits, since the calculation was essentially exact.
For the surprise of the explanation, we can walk through the steps we took.
- We standardized the model, which simply replaced the model with an exactly equivalent one. This did not depend on the model's parameters at all, and so incurs no surprise.
- We checked the signs of all 10 of the model's parameters, and whether each of the 4 entries of [complex formula omitted from the narration] was greater than or less than 1 in magnitude, incurring 14 bits of surprise.
- We deduced from this the form of the piecewise linear function delta. This was another step that didn't depend on the model's parameters, and so incurs no surprise.
- We checked which of the two linear coefficients was larger in magnitude in each of the four blue and green regions, incurring 4 bits of surprise.
- We checked that the two linear coefficients were equal in magnitude in each of the two yellow regions, to within one part in [complex formula omitted from the narration], incurring around 22 bits of surprise.
Adding this up, the total surprise is around 40 bits.
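The tally can be sketched as a simple ledger. One caveat: the per-region tolerance below is an assumption on my part, since the exact formula was omitted from the narration; around 22 bits across two yellow regions corresponds to roughly one part in 2^11 per region, because checking equality to within one part in N costs about log2(N) bits.

```python
import math

# Surprise ledger for the explanation, in bits (figures from the narration):
sign_checks = 10         # signs of all 10 parameters: 1 bit each
magnitude_checks = 4     # whether each of 4 entries exceeds 1 in magnitude
larger_coefficient = 4   # which coefficient is larger, in each of 4 regions

# Assumed tolerance (hypothetical): equality to within one part in 2**11 per
# yellow region, which reproduces the narration's "around 22 bits" for 2 regions.
tolerance = 2 ** 11
equality_checks = 2 * math.log2(tolerance)  # 22.0 bits

total = sign_checks + magnitude_checks + larger_coefficient + equality_checks
print(total)  # 40.0 bits of surprise in total
```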
This plausibly matches the number of bits of optimization used to select the model, since it was presumably necessary to optimize the linear coefficients in the yellow regions to be almost equal.
So we can be relatively comfortable in saying that we have achieved a full understanding.