
Re: [Phys-L] graphing



On 3/6/25 09:13, stefan jeglinski wrote:

I'm in general agreement with John Denker - it can't actually matter how you plot; however, there are a couple of points that I think are worth discussing. As was stated, YMMV, and I'm happy to learn of corrections to my misunderstandings.

:-)


1. If your model has an additive constant (e.g., x0 for linear acceleration data starting from rest), how you choose to plot can make it easier or harder to interpret the intercept. It's especially nice for beginning students to have an immediate meaning of the intercept that comes out of the linear regression. In the case of the pendulum example here, it's 6s because the intercept is zero.

2. The most common linear least squares derivation assumes that all of the variance is in the vertical axis, with none in the horizontal. If you want to be pedantic and are using this regression algorithm, then when linearizing you should put the variable with the least variance on the horizontal axis.


Points (1) and (2) are important and true as stated. I would add the
following:

3) Fitting and plotting are two different things. You could fit Y
 versus X and then plot X versus Y if you wanted.

4) It's true that the software does a terrible job with uncertainties
 in the "other" direction, but that's a bug in the software, and it
 doesn't change the essential concepts. In particular, if you do a
 linear fit by hand, using a transparent ruler, you get the same
 answer for the original data and the transposed data. The ruler
 does not care. This applies to point (2) and also to point (1):
 the ruler finds the same intercept either way.
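
Here is a minimal numeric sketch of points (2) and (4) -- made-up
data and plain numpy, not anybody's lab software, with an SVD-based
orthogonal fit standing in for the transparent ruler. The two
ordinary-least-squares slopes (Y on X, and X on Y inverted) disagree
with each other; the orthogonal fit gives the same line no matter
which way the data are transposed.

    import numpy as np

    rng = np.random.default_rng(0)

    # made-up data: y = 2x + 1, with comparable noise in *both* directions
    X = np.linspace(0, 10, 50)
    x = X + rng.normal(0, 0.5, X.size)
    y = 2*X + 1 + rng.normal(0, 0.5, X.size)

    def ols_slope(h, v):
        # ordinary least squares of v on h: all variance assumed vertical
        return np.polyfit(h, v, 1)[0]

    def orthogonal_slope(h, v):
        # "ruler-like" fit: direction of greatest spread of the centered
        # cloud, i.e. the first right-singular vector of the data matrix
        A = np.column_stack([h - h.mean(), v - v.mean()])
        _, _, Vt = np.linalg.svd(A, full_matrices=False)
        dh, dv = Vt[0]
        return dv / dh

    print(ols_slope(x, y), 1/ols_slope(y, x))                 # these differ
    print(orthogonal_slope(x, y), 1/orthogonal_slope(y, x))   # these agree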

In more detail: Compared to the right answer (found using a
ruler), the software is correct to first order and wrong to
second order in small quantities. Specifically: If the noise
in the "other" direction is small, the software's estimate of
the slope will be wrong by an amount that is small squared.
Sometimes you can live with that.
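
To put a number on "small squared" (again a made-up-data sketch,
assuming an ordinary equal-weight least-squares fit): if noise of
standard deviation s is added to the horizontal variable, the fitted
slope is attenuated by roughly a factor of 1/(1 + s^2/Var(X)), so
the fractional error in the slope grows like s^2 rather than s.

    import numpy as np

    rng = np.random.default_rng(1)
    X = np.linspace(0, 10, 10_000)     # "true" horizontal values
    beta = 2.0

    for s in (0.1, 0.2, 0.4, 0.8):     # noise in the "other" (horizontal) direction
        x = X + rng.normal(0, s, X.size)
        y = beta * X                   # keep y exact, to isolate the effect
        m = np.polyfit(x, y, 1)[0]
        # fractional slope error vs. the second-order estimate s^2/Var(X)
        print(s, (beta - m)/beta, s**2 / X.var())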

More generally: The fitting software used at the introductory
level -- and several levels beyond that -- is just terrible. The
issue with "other" axis uncertainty is the tip of the iceberg.
The whole thing is conceptually unsound. It's doing a maximum
likelihood fit, i.e. reporting the single parameter set that makes
the observed data most likely, which is provably *not* what you
should be doing. That's a topic for another day. I
don't know of any good references, although it's been a while
since I went looking.

Here is where I have some questions, and maybe someone here can fill in some missing pieces.

Diagonalizing the covariance matrix finds the eigenvectors/values of the data set. This is a principal components (PCA) calculation (for which the data are typically scaled and centered to zero the intercept first). But the PCA is /maximizing/ the variance along an axis it discovers. This axis is the first PC, the one with the largest eigenvalue. I've never found a proof of it in a stats book, but it appears that the first PC is collinear with the slope of the total least squares regression, which is /minimizing/ a variance. In a handwaving way, I can accept that maximizing one type of variance is the same as minimizing another, but it seems dangerous to generalize this. I'd feel better with a linalg proof of the connection in 2 dimensions rather than always relying on "proof by computation."

Two answers:

Principal components analysis (PCA) or singular-value decomposition
(SVD) finds the eigenvectors and eigenvalues. The eigenvectors are
mutually perpendicular. If you can find them accurately, it does not
matter in which order you find them. You can find the big eigenvalue
first or the small eigenvalue first. So there are lots of things to
worry about, but that's not one of them.

OTOH ... as mentioned above, the thing you are diagonalizing is
the wrong thing! You can easily prove that it finds the maximum
of the likelihood, but that's the wrong thing to be maximizing.
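
That said, here is a sketch of the linear-algebra argument Stefan
asked for (my wording of the standard result; x_i are the centered
data points, u is a unit vector along a candidate line through the
centroid, d_i is the perpendicular distance from x_i to that line,
and C is the covariance matrix of the centered data). By the
Pythagorean theorem,

   \sum_i \|x_i\|^2 = \sum_i (x_i \cdot u)^2 + \sum_i d_i^2

The left side is fixed by the data, so minimizing the perpendicular
residuals \sum_i d_i^2 (the total-least-squares criterion) is exactly
the same problem as maximizing the projected variance
\sum_i (x_i \cdot u)^2, which is proportional to u^T C u (the PCA
criterion). The maximizer of u^T C u over unit vectors is the
eigenvector of C with the largest eigenvalue, so the first PC is
collinear with the total-least-squares line -- in any number of
dimensions, not just two. (The best line does pass through the
centroid, which is why the data are centered first.)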

=========

I realize I have stated a lot of things without explaining them.
Sorry about that. It's not easy to explain. I once explained it
to a guy who had literally written the book on statistical
inference. It took about a year.