
Re: [Phys-L] graphing



I'm in general agreement with John Denker - it can't actually matter how you plot; however there are a couple of points that I think are worth discussing. As was stated, YMMV, and I'm happy to learn of corrections to my misunderstandings.

1. If your model has an additive constant (e.g., x0 for constant-acceleration data starting from rest, x = x0 + ½at²), how you choose to plot can make the intercept easier or harder to interpret. It's especially nice for beginning students when the intercept that comes out of the linear regression has an immediate meaning. In the pendulum example here, it's six of one, half a dozen of the other, because the intercept is zero.
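As a minimal sketch of point 1 (hypothetical, noise-free numbers, plain Python so it runs anywhere): linearizing x = x0 + ½at² by plotting x versus t² makes the fitted intercept equal to x0 and the fitted slope equal to a/2, so both regression outputs have an immediate physical meaning. The helper ols_fit is my own illustration, not any particular library's routine.

```python
def ols_fit(xs, ys):
    """Ordinary least squares for y = b0 + b1*x (all scatter assumed in y)."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return my - slope * mx, slope

# Hypothetical, noise-free data: cart starts from rest at x0 = 0.25 m with a = 0.8 m/s^2.
x0_true, a_true = 0.25, 0.8
times = [0.5, 1.0, 1.5, 2.0, 2.5]
positions = [x0_true + 0.5 * a_true * t**2 for t in times]

# Plot x versus t^2: the intercept is x0 and the slope is a/2.
intercept, slope = ols_fit([t**2 for t in times], positions)
print(f"x0 = {intercept:.3f} m, a = {2 * slope:.3f} m/s^2")
```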

2. The most common derivation of linear least squares assumes that all of the variance is on the vertical axis, with none on the horizontal. If you want to be pedantic and you are using this regression algorithm, then when linearizing you should plot the variable with the smaller variance on the horizontal axis.

When measuring L and T of a pendulum, you can probably measure L quite accurately with little scatter (unless, say, different students measure the same length), whereas T is likely to scatter even if a single observer makes every measurement. On top of that, you're really plotting T², which doubles the relative scatter on the vertical axis.

You could, I suppose, re-derive the regression for the case where all of the scatter is on the horizontal axis, if you were going to insist on which variable gets plotted where.

Or you could compute what I've seen referred to as a "total linear least squares" regression, which allows scatter on both axes. In that case the algorithm minimizes the sum of squared "perpendicular" distances from the points to the line.

For any experiment done even half-well, the slopes from these 3 approaches are close in value, so which one is correct? All 3 have an uncertainty associated with them, so I guess you can say that whether they differ from one another depends on the confidence level you choose. IMO it becomes a religious war as to which is the "better answer."
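To make the three approaches concrete, here is a sketch in plain Python using made-up pendulum numbers (L in m, T² in s²); the function names are hypothetical. It uses the standard closed-form slopes: Sxy/Sxx for y-on-x, Syy/Sxy for x-on-y re-expressed as dy/dx, and the slope of the major eigenvector of the 2×2 scatter matrix for total least squares.

```python
import math

def moments(xs, ys):
    """Return the centered sums Sxx, Syy, Sxy."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    return sxx, syy, sxy

def slope_y_on_x(sxx, syy, sxy):
    # Ordinary least squares: all scatter assumed vertical.
    return sxy / sxx

def slope_x_on_y(sxx, syy, sxy):
    # Regress x on y (all scatter horizontal), then express the result as dy/dx.
    return syy / sxy

def slope_total(sxx, syy, sxy):
    # Total least squares: minimizes the summed squared perpendicular distances.
    # This is the slope of the major eigenvector of [[Sxx, Sxy], [Sxy, Syy]].
    return (syy - sxx + math.sqrt((syy - sxx) ** 2 + 4 * sxy ** 2)) / (2 * sxy)

# Hypothetical pendulum data: length L in m, period squared T^2 in s^2.
lengths = [0.20, 0.40, 0.60, 0.80, 1.00]
periods_sq = [0.83, 1.58, 2.45, 3.17, 4.07]

m = moments(lengths, periods_sq)
for name, fit in [("y on x", slope_y_on_x), ("x on y", slope_x_on_y), ("total", slope_total)]:
    slope = fit(*m)
    print(f"{name:7s} slope = {slope:.4f} s^2/m, g = {4 * math.pi**2 / slope:.3f} m/s^2")
```

For positively correlated data the total least squares slope always lands between the other two, and all three converge as the correlation approaches 1. One caveat: unlike ordinary least squares, total least squares depends on the relative scaling (units) of the two axes, so rescaling one variable changes its answer.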

If any of this so far is wrong, please clarify.

================

Here is where I have some questions, and maybe someone here can fill in some missing pieces.

Diagonalizing the covariance matrix finds the eigenvectors/eigenvalues of the data set. This is a principal component analysis (PCA) calculation (for which the data are typically centered, and often scaled, first, so that the intercept is zero). But PCA /maximizes/ the variance along an axis that it discovers. This axis is the first principal component (PC), the one with the largest eigenvalue. I've never found a proof in a stats book, but it appears that the first PC is parallel to the line from the total linear least squares regression, which /minimizes/ a variance. In a hand-waving way, I can accept that maximizing one kind of variance is the same as minimizing another, but it seems dangerous to generalize this. I'd feel better with a linear-algebra proof of the connection in 2 dimensions rather than always relying on "proof by computation."
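For what it's worth, here is the standard argument as I understand it, sketched in 2-D. It assumes the line passes through the centroid of the data, which is where the best line ends up anyway: for any fixed direction, the offset that minimizes the summed squared perpendicular distances is the one that puts the line through the mean point. The key step is just Pythagoras. If the data are also scaled before the PCA, the same argument holds in the scaled coordinates, but then it's the total least squares line of the scaled data that coincides with the first PC.

```latex
% Center the data: p_i = (x_i - \bar{x},\; y_i - \bar{y}).
% Scatter (unnormalized covariance) matrix:
C = \sum_i p_i p_i^{\mathsf T}
  = \begin{pmatrix} S_{xx} & S_{xy} \\ S_{xy} & S_{yy} \end{pmatrix}
% Candidate line through the centroid, unit direction u and unit normal n:
u = (\cos\theta,\ \sin\theta), \qquad n = (-\sin\theta,\ \cos\theta)
% Pythagoras for each point, summed over all points:
\sum_i \lVert p_i \rVert^2
  = \underbrace{\sum_i (p_i\cdot u)^2}_{u^{\mathsf T} C u}
  + \underbrace{\sum_i (p_i\cdot n)^2}_{n^{\mathsf T} C n}
  = S_{xx} + S_{yy} \quad (\text{independent of } \theta)
% So minimizing the perpendicular residuals (total least squares)
% is the same as maximizing the variance along u (first principal component):
\min_\theta\, n^{\mathsf T} C n \iff \max_\theta\, u^{\mathsf T} C u
% Rayleigh quotient: the maximum over unit u is the largest eigenvalue of C,
% attained when u is the corresponding eigenvector. Explicitly,
% u^T C u = (S_xx+S_yy)/2 + (S_xx-S_yy)/2 cos 2θ + S_xy sin 2θ, and d/dθ = 0 gives
\tan 2\theta = \frac{2 S_{xy}}{S_{xx} - S_{yy}}
% This has two solutions 90 degrees apart: the maximum (first PC, i.e. the
% total least squares line) and the minimum (its normal direction).
```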

PCA is generally used to analyze higher-dimensional data sets, but I was never taught it as a rigorous approach to 2-D data sets (x vs y). Given its connection to eigendecomposition and the importance of that subject in physics, I wonder why PCA is never taught to undergrad physics majors for doing regressions in 2 dimensions. It seems like there would be some useful pedagogy here.

Thoughts?

Stefan Jeglinski



On 3/5/25 1:24 PM, John Denker via Phys-l wrote:
On 3/5/25 04:01, Anthony Lapinski via Phys-l wrote:
Since length is the independent variable, it should go on
the y-axis - plot length vs period squared

YMMV, but from my point of view it could not possibly matter
which quantity goes on which axis.

I realize that other people are of the opinion that life is
simpler for ultra-unsophisticated students if the horizontal
axis is always X and X is always the independent variable
... but that simplification comes at a terrible cost. It
will have to be unlearned, sooner rather than later, and
unlearning is always hard. I'm not persuaded that it helps
anybody to make up "rules" or "traditions" that have no
basis in reality.

=============================

I'm all in favor of keeping things simple when a subject
is being introduced. The first step is the first step. Even
so, you want the first step to be in the right direction,
so here, amongst experts, it's worth discussing which way
the path has to go.

Consider for example a composition of functions: g(f(x)).
The output of f is the input to g, and you can illustrate
this nicely if you rotate the graph of g 90°. Phil Keller
wrote an introductory-level book that does this. Last time
I checked (which was a while ago), compound graphs show
up on the standardized tests.

Here's an even simpler example: Suppose you want to plot
the temperature of the atmosphere (or the speed of sound,
or the wind speed, or anything else) as a function of
height. We're talking about the actual geometrical physical
height, and it makes sense to plot that "vertically" on
the graph. Like this:

https://www.eoas.ubc.ca/courses/atsc113/flying/met_concepts/03-met_concepts/03a-std_atmos/images-03a/std-atmos-temperature-color.png

In this case, the vertical coordinate is clearly not a
function of the other coordinate ... but that's totally
fine.

For a /linear/ relationship, it is super-extra immaterial
which quantity goes on which axis. It's six florins versus
half-a-dozen guilders.

Having made thousands of graphs, I have some pretty strong
opinions about what's important and what's not. Consider
for example the writeup I posted a couple days ago:
https://av8n.com/physics/ne-asteroid-groups.htm#sec-belt
Initially I plotted Q versus q, but I flipped it to become
q versus Q, simply because that fits better on a typical
screen. There was absolutely no change in meaning. Also note
that the second (more complicated) version of the diagram
has *five* different axes on a single two-dimensional plot.
  Q = aphelion
  q = perihelion
  e = eccentricity
  2a = Q+q = major axis
  T = period
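One way to see why five axes can share one 2-D plot: every other quantity in the list is a function of Q and q, so each gets its own family of contours on the (Q, q) plane. A minimal sketch, assuming Q and q in AU and using Kepler's third law in the form T² = a³ (T in years, body mass negligible compared to the Sun); the function name is hypothetical, and this is my reading, not necessarily how the linked diagram was built.

```python
def orbit_from_apsides(Q, q):
    """Given aphelion Q and perihelion q (AU), return (e, 2a in AU, T in years).

    Assumes a heliocentric orbit and Kepler's third law T^2 = a^3.
    """
    major_axis = Q + q            # 2a = Q + q
    e = (Q - q) / (Q + q)         # eccentricity
    a = major_axis / 2            # semi-major axis
    T = a ** 1.5                  # period in years
    return e, major_axis, T

# Sanity check with a circular 1 AU orbit: e = 0, 2a = 2 AU, T = 1 yr.
print(orbit_from_apsides(1.0, 1.0))
```

Lines of constant e are straight lines through the origin, q = Q(1 - e)/(1 + e), while lines of constant 2a, and therefore of constant T, are the straight lines Q + q = const.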

In fluid mechanics it gets even more complicated. At each
point in three dimensions we have /at least/:
  temperature
  pressure
  density
  entropy density
  energy density
  three components of momentum density
  ... et cetera ............

Then there are ternary graphs, which are intrinsically
triangular, giving a two-dimensional representation of
something with three variables and one constraint. For
example:
  chemistry or mineralogy with three concentrations
  electoral politics with three candidates
  ... et cetera .....................
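For concreteness, a minimal sketch of how a ternary point is usually placed: normalize the three amounts so they sum to 1, then take the barycentric combination of the triangle's corners. The corner layout and the function name are assumptions chosen for illustration.

```python
import math

def ternary_to_xy(a, b, c):
    """Map three non-negative amounts to a point in an equilateral triangle.

    Assumed corner layout: pure A at (0, 0), pure B at (1, 0),
    pure C at (1/2, sqrt(3)/2). Normalizing imposes the constraint
    a + b + c = 1, so only the ratios matter.
    """
    total = a + b + c
    # The normalized a is implied by the constraint, so only b and c are needed.
    b, c = b / total, c / total
    x = b + 0.5 * c
    y = (math.sqrt(3) / 2) * c
    return x, y

print(ternary_to_xy(1, 1, 1))   # equal amounts land at the centroid of the triangle
```

In this layout, lines of constant c are horizontal, while lines of constant a and of constant b are tilted at ±60°, so no two of the three contour families are perpendicular.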

At some point you come to the realization that the axes,
strictly speaking, were never important. What matters
are the /contours/ of constant X, constant Y, and on
the ternary graph, constant Z. In special relativity
(not to mention general relativity) and also in
thermodynamics, very commonly the contours aren't
straight and/or aren't mutually perpendicular. This
is discussed in detail here:
https://www.av8n.com/physics/axes.htm
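A small worked example of non-perpendicular contours, assuming a standard Lorentz boost with speed v = βc along x and a spacetime diagram drawn with x horizontal and ct vertical: the contours of constant ct' have slope β and the contours of constant x' have slope 1/β, so on the page they meet at 90° - 2·arctan(β) rather than at a right angle. The function names are hypothetical.

```python
import math

def boosted_contour_slopes(beta):
    """Slopes, on an (x, ct) diagram, of the contours of constant ct' and of
    constant x' for a frame moving at v = beta*c along x (0 < beta < 1)."""
    # ct' = gamma*(ct - beta*x) = const  ->  ct = beta*x + const
    slope_const_tprime = beta
    # x' = gamma*(x - beta*ct) = const   ->  ct = x/beta + const
    slope_const_xprime = 1 / beta
    return slope_const_tprime, slope_const_xprime

def angle_between_deg(m1, m2):
    """Euclidean angle on the page between lines with slopes m1 and m2."""
    return math.degrees(abs(math.atan(m1) - math.atan(m2)))

s_t, s_x = boosted_contour_slopes(0.6)
print(angle_between_deg(s_t, s_x))  # well short of 90 degrees on the page
```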

You may have noticed that every graph I've created in
the last 30+ years shows the contours, not just the
axes. Some plotting packages do this for you by default;
others will do it upon request. It's OK if the contours
lines are thin and unobtrusive, but IMHO the really
ought to be there.
_______________________________________________
Forum for Physics Educators
Phys-l@mail.phys-l.org
https://www.phys-l.org/mailman/listinfo/phys-l