The raw temperature deal
Temperature records are adjusted for changes in measurement techniques over time, but the net effect on global temperatures is small
One long-running climate skeptic claim is that scientists are manipulating temperature records to show more warming, and that the “raw” temperature records show less severe climate change.
Reality is more mundane: we’ve changed the way we’ve measured temperatures a number of times over the past 250 years, scientists are trying to account for these changes, and the corrections we make to the record only have a modest impact on global temperatures.
Let's dive into how ocean temperature measurements have changed over time as an example. Prior to the early-to-mid 1900s, sailors used to toss buckets over the side of wooden ships, pull those buckets up, and stick a thermometer in to measure sea surface temperature. But it turns out that evaporation cools water as the bucket is being pulled up, so the deck height of the ship and whether the bucket was made of wood or canvas could change the resulting temperature measurements by a few tenths of a degree C.
Once wooden ships with sails were replaced by modern vessels, temperature measurements were taken in the engine room intakes (where water is pulled in to cool the engine). These were more accurate than bucket-based measurements but – engine rooms being warm – tended to be slightly warmer than actual sea surface temperatures. In recent decades, ship-based measurements have largely been replaced by autonomous buoys that float around the ocean taking measurements, and send their data up to satellites.
If you ignore changes in instrumentation and just slap everything together into a single record, you end up with a biased result: spurious warming when we switched from buckets to ship engine room measurements in the mid-20th century, and spurious cooling over the last few decades as we transitioned to buoy measurements. It's easy to prove this, as records from just one type of instrument show broad agreement with each other (and other independent data from satellites or Argo floats).
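To make the splicing problem concrete, here is a minimal sketch using made-up numbers. The instrument offsets, the transition years, and the assumed true trend of 0.5C per century are all hypothetical choices for illustration, not values from any real dataset.

```python
import numpy as np

# All numbers below are hypothetical and chosen only for illustration.
years = np.arange(1900, 2021)
true_sst = 0.005 * (years - 1900)   # assumed true anomaly: 0.5 C per century

# Assumed instrument offsets (deg C) relative to the true temperature.
BIAS = {"bucket": -0.3, "engine": 0.2, "buoy": 0.0}

def instrument_for(year):
    """Toy history: buckets to 1939, engine intakes to 1989, buoys after."""
    if year < 1940:
        return "bucket"
    if year < 1990:
        return "engine"
    return "buoy"

offsets = np.array([BIAS[instrument_for(y)] for y in years])

raw = true_sst + offsets   # what a naive merge of all readings would show
adjusted = raw - offsets   # removing the known offsets recovers the truth

def trend_per_century(values, t):
    """Least-squares linear trend in deg C per century."""
    return np.polyfit(t - t.mean(), values, 1)[0] * 100

raw_trend = trend_per_century(raw, years)
adjusted_trend = trend_per_century(adjusted, years)
print(f"raw: {raw_trend:.2f} C/century, adjusted: {adjusted_trend:.2f} C/century")
```

In this toy setup the spliced raw series jumps up at the bucket-to-engine transition and down at the engine-to-buoy transition, and its fitted trend is steeper than the true one. In practice the offsets are not known in advance and have to be estimated, which is where the real difficulty lies.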
For land temperatures there are similar challenges, though they vary a bit more based on specific country practices. In the US, for example, we systematically changed the time at which temperature measurements were taken (when observers reset their min/max thermometers) from late afternoon to early morning in an effort to improve rain gauge measurements. These time-of-observation shifts introduced a bias. When measuring in the afternoon, stations would occasionally double count maximum temperatures (e.g. where a 5 PM reading is also the high for the following 24-hour period); after switching to morning measurements, they would instead occasionally double count lows.
A shift from liquid-in-glass thermometers to electronic thermistors in the 1980s-2000s introduced a cooling bias of about 0.5C in max temperature readings due to differences in instrumentation, which shows up clearly in side-by-side comparisons. It also introduced a slight warming bias (~0.1C to 0.2C) in minimum temperatures, likely associated with a move closer to buildings in some cases for power hookups.
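The double-counting effect can be sketched with a toy simulation. Everything here is an assumption made for illustration: a sinusoidal daily cycle peaking at 15:00, a 3C day-to-day spread in the weather, and a thermometer that records the highest and lowest value between two resets. The size of the effect in this toy world is exaggerated and should not be read as an estimate of the real bias.

```python
import numpy as np

rng = np.random.default_rng(0)
n_days = 2000
hours = np.arange(n_days * 24)

# Toy weather: a day-to-day varying baseline (interpolated between daily
# noon values) plus a fixed diurnal cycle that peaks at 15:00.
noon_hours = np.arange(n_days) * 24 + 12
day_mean = rng.normal(15.0, 3.0, n_days)
baseline = np.interp(hours, noon_hours, day_mean)
temps = baseline + 5.0 * np.cos(2 * np.pi * ((hours % 24) - 15) / 24)

def logged_extremes(temps, reset_hour):
    """Mean max and mean min logged by a min/max thermometer that is
    read and reset once a day at reset_hour."""
    n_windows = (len(temps) - 1 - reset_hour) // 24
    maxes = np.empty(n_windows)
    mins = np.empty(n_windows)
    for k in range(n_windows):
        start = reset_hour + 24 * k
        # The thermometer sees everything from one reset to the next,
        # including the temperature at the moment it was reset.
        window = temps[start:start + 25]
        maxes[k] = window.max()
        mins[k] = window.min()
    return maxes.mean(), mins.mean()

max_pm, min_pm = logged_extremes(temps, reset_hour=17)  # late-afternoon observer
max_am, min_am = logged_extremes(temps, reset_hour=7)   # early-morning observer

print(f"afternoon reset: max {max_pm:.2f}, min {min_pm:.2f}")
print(f"morning reset:   max {max_am:.2f}, min {min_am:.2f}")
```

With an afternoon reset, a hot afternoon's reading carries over and can become the next day's logged maximum; with a morning reset, a cold morning does the same for the minimum. A station that switches from the first practice to the second therefore shows a spurious drop in its raw record, even though the underlying weather statistics have not changed.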
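A side-by-side comparison boils down to differencing paired readings from two instruments exposed to the same weather. The sketch below uses synthetic data with an assumed offset of -0.5C and an assumed 0.3C of independent noise per instrument. It is not a reconstruction of any published field comparison.

```python
import numpy as np

rng = np.random.default_rng(42)
n_days = 365

# Synthetic "true" daily maximum temperatures over one year (deg C).
true_max = 20 + 8 * np.sin(2 * np.pi * np.arange(n_days) / 365)

ASSUMED_BIAS = -0.5  # hypothetical thermistor offset, for illustration only
liquid_in_glass = true_max + rng.normal(0, 0.3, n_days)
thermistor = true_max + ASSUMED_BIAS + rng.normal(0, 0.3, n_days)

# The shared weather signal cancels in the paired difference, leaving
# only the instrument offset plus measurement noise.
diff = thermistor - liquid_in_glass
offset = diff.mean()
std_err = diff.std(ddof=1) / np.sqrt(n_days)

print(f"estimated offset: {offset:.2f} +/- {std_err:.2f} C")
```

Because both instruments see the same weather, the large seasonal swings drop out of the difference, which is why even a modest offset can be pinned down fairly tightly from a year of paired readings.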
Most long-lived stations have also moved once or twice over their lifetimes. For example, many stations were moved from urban rooftops to newly built waste water treatment plants or airports in the 1940s and 1950s, resulting in cooler temperature readings post-move. The environment around stations has also changed as urban areas have expanded and impermeable surface areas have increased. The fundamental challenge is that older weather stations were set up to measure the weather (and its large short-term variations), rather than being optimized to detect more modest but persistent changes of 1C or 2C over century timescales.
Scientists understandably try to correct for changes in measurement practices over time to get a more accurate temperature record. This is done in a number of ways: side-by-side comparisons of different measurement techniques (liquid in glass thermometers vs thermistors), temporal and spatial comparison of records from different measurement approaches (buckets vs ship engine rooms vs buoys), using hourly data to assess the impact of changing observation times, etc.
One of the most effective ways that researchers have found to detect and correct for biases in land stations (which, unlike ocean records, have the advantage of being stationary) is through neighbor comparisons. Climate changes are generally highly correlated across space; it is thermodynamically difficult for one region to warm by 1C over the course of decades while another a few miles away cools by 1C. When one station shows a change that persists over time and is not apparent in the records of any of its neighbors, it is indicative of a localized bias (e.g. a station move or instrument change). These biases, once detected, can be corrected by subtracting the persistent offset relative to neighboring stations (as NOAA does) or by treating everything after a breakpoint as a new station (as Berkeley Earth does). These “pairwise homogenization” algorithms can also be blindly tested on synthetic data where the underlying “truth” is known to ensure they are accurately detecting and correcting breakpoints without introducing their own biases.
There are other ways to assess the effect of changes in measurement techniques over time. You can use independent measurements from satellites. You can set up climate reference networks that are optimally sited and use triple redundant thermometers to detect and correct for any sensor drift. You can compare subsets of stations (e.g. urban vs rural, liquid-in-glass thermometers vs electronic thermistors, etc.) to track biases over time and see how well these are corrected for in adjusted data. Hundreds of studies over the years have examined these issues in detail; their takeaway is that while there are always improvements to be made in how we account for past biases, the temperature record we have today is broadly accurate and well characterized by assessed uncertainties.
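The neighbor-comparison idea can be sketched on synthetic data where the truth is known. This is a deliberately simplified single-breakpoint detector, not NOAA's or Berkeley Earth's actual algorithm. The function name `find_break`, the station counts, the noise levels, and the inserted shift are all assumptions made for illustration.

```python
import numpy as np

rng = np.random.default_rng(1)
n_months = 480

# Shared regional climate signal plus independent local noise per station.
regional = np.cumsum(rng.normal(0.002, 0.05, n_months))
neighbors = regional + rng.normal(0, 0.2, (5, n_months))
target = regional + rng.normal(0, 0.2, n_months)

# Insert a known artificial step (e.g. a station move) into the target.
TRUE_BREAK, TRUE_SHIFT = 300, -0.8
target[TRUE_BREAK:] += TRUE_SHIFT

def find_break(diff, margin=24):
    """Return (index, shift) of the single step that best splits diff,
    scored with a simple t-like statistic."""
    best_idx, best_shift, best_score = None, 0.0, -np.inf
    for i in range(margin, len(diff) - margin):
        before, after = diff[:i], diff[i:]
        shift = after.mean() - before.mean()
        noise = np.sqrt(before.var(ddof=1) / len(before)
                        + after.var(ddof=1) / len(after))
        score = abs(shift) / noise
        if score > best_score:
            best_idx, best_shift, best_score = i, shift, score
    return best_idx, best_shift

# Differencing against the neighbor average removes the shared regional
# signal, so a local step change stands out.
diff = target - neighbors.mean(axis=0)
idx, shift = find_break(diff)

# Remove the persistent offset from everything after the breakpoint.
adjusted = target.copy()
adjusted[idx:] -= shift

print(f"detected break at month {idx}, estimated shift {shift:.2f} C")
```

Because the break and its size were planted by hand, the detector's output can be checked against the known answer, which is the same logic behind blind tests on synthetic data. Real algorithms have to handle multiple breakpoints, gaps, and neighbors that have their own breaks, none of which this sketch attempts.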
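As a rough illustration of why triple redundancy helps, the sketch below simulates three co-located sensors, one of which drifts warm, and flags the odd one out by comparing each sensor to the median of the three. The drift size, noise level, and 0.3C threshold are arbitrary assumptions, and this is not the procedure any particular reference network uses.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 1000

# Synthetic true temperature and three co-located sensors with small noise.
truth = 10 + 5 * np.sin(np.linspace(0, 20, n))
sensors = truth + rng.normal(0, 0.05, (3, n))

# Hypothetical fault: sensor 2 slowly drifts warm by 1 C over the record.
sensors[2] += np.linspace(0, 1.0, n)

# With three sensors, the median stays close to the truth as long as
# at most one of them misbehaves.
median = np.median(sensors, axis=0)

# Flag any sensor whose recent readings sit persistently away from the median.
THRESHOLD = 0.3  # deg C, arbitrary choice for this sketch
recent_deviation = (sensors - median)[:, -100:].mean(axis=1)
drifting = np.flatnonzero(np.abs(recent_deviation) > THRESHOLD)

print("drifting sensors:", drifting.tolist())
```

With only two sensors you could tell that they disagree but not which one is wrong; the third provides the tie-breaker, and the median gives a usable reading in the meantime.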
Raw temperatures actually show more warming
Say you are a climate skeptic who, despite the manifold reasons for accounting for changing measurement techniques over time, dogmatically insists on simply using raw measurements as taken between 1850 and today. You would, at least based on arguments I come across on social media, tend to naively assert that the raw data would show less warming. And, at least for global temperatures, your assumption would be wrong.
Here is what global temperatures look like when only using raw data (dashed line) compared with adjusted land (GHCNv4) and ocean (ERSSTv5) records (solid line). Raw and adjusted temperatures are quite similar prior to 1940, after which the shift to ship engine room intake valves creates a spurious spike in raw temperatures. This is somewhat compensated for by subsequent adjustments to land temperature records, but global raw temperatures continue to show more warming compared to homogenized data through 2024.

If we just look at land temperatures, homogenization adds some warming over time, as it addresses the various biases in land-based stations discussed earlier. But even the raw land data shows a broadly similar story of rapid warming over the past 50 years.

In ocean temperatures we see the opposite, with raw data showing much more warming since preindustrial, largely due to biases from the changes in measurement techniques that occurred around the WW2 era:

Remaining uncertainties are concentrated in the oceans
As I noted earlier, the work of correcting for changes in measurement techniques over time remains an active area of scientific research, and there are a number of research groups around the world working on the problem. These include NOAA, NASA, and Berkeley Earth in the US, the Met Office Hadley Centre in the UK, Copernicus ECMWF in the EU, the Japan Meteorological Agency, and the China Meteorological Administration.
These produce quite similar records over land, where stations do not move and the detection and correction of localized biases is easier, though there are still some notable differences in the early record (pre-1900) that arise from which stations are included and how the gaps between more spatially sparse measurements are filled in.
Over the oceans, however, differences between groups are much larger. The two main ocean temperature records are NOAA’s ERSSTv5 and the Met Office’s HadSST4. These differ much more substantially due to different approaches in correcting for changing measurement techniques, with particularly large differences in the WW2 era and in more recent decades:

If we compare global raw and adjusted data using HadSST4 rather than ERSSTv5 for the oceans, the difference in warming between the two is much smaller, reflecting the smaller difference in historical warming in the ocean between raw and adjusted data in the HadSST4 dataset:

Science is an ongoing process of discovery and self-correction, and it's quite possible that the historical temperature record will continue to evolve over time as we both collect and digitize more historical data and develop better ways to detect and correct for changing measurement techniques and conditions. But however you parse the data – whether you use our best estimate of bias corrections or just throw all the raw data in a pot and stir it around – the results are the same: the world is rapidly warming.
Living is easy(er) with eyes closed (John Lennon). Skeptics reject the evidence because they refuse to accept that our abuse of nature is causing climate change.
In discussing use of surface air temperature to assess global warming, there are these two other issues to consider.
First, ocean heat content change is a more robust way to monitor this warming. This was discussed in
Pielke Sr., R.A., 2003: Heat storage within the Earth system. Bull. Amer. Meteor. Soc., 84, 331-335.
http://pielkeclimatesci.wordpress.com/files/2009/10/r-247.pdf
which was motivated by
Ellis et al. 1978: The annual variation in the global heat balance of the Earth. J. Geophys. Res., 83, 1958-1962
http://pielkeclimatesci.files.wordpress.com/2010/12/ellis-et-al-jgr-1978.pdf
Latest values can be viewed, for example, at
https://www.data.jma.go.jp/kaiyou/english/ohc/ohc_global_en.html
This avoids the complexities that you raise in your post. Of course we need surface air temperature for a wide range of other uses, but global warming is not an optimal use for it.
Second, even using surface air temperatures, if the goal is to assess heating, the component of heat in water vapor must also be included. This was proposed in
Pielke Sr., R.A., C. Davey, and J. Morgan, 2004: Assessing "global warming" with surface heat content. Eos, 85, No. 21, 210-211.
http://pielkeclimatesci.wordpress.com/files/2009/10/r-290.pdf
and assessed recently in
Matthews, T., M. Byrne, R. Horton, C. Murphy, R.A. Pielke Sr., C. Raymond, P. Thorne, and R.L. Wilby, 2022: Latent heat must be visible in climate communications. WIREs Climate Change, https://doi.org/10.1002/wcc.779
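To illustrate the water vapor point, here is a minimal sketch of the standard moist-enthalpy approximation, in which heat content per unit mass of air is taken as cp*T + Lv*q and expressed as an equivalent temperature. The function name and the constant values are assumptions (approximate textbook values), and the cited papers should be consulted for the exact formulation they use.

```python
# Approximate textbook constants, assumed here for illustration.
CP_DRY_AIR = 1005.0      # specific heat of dry air, J/(kg K)
LATENT_HEAT_VAP = 2.5e6  # latent heat of vaporization of water, J/kg

def equivalent_temperature(temp_c, specific_humidity):
    """Air temperature plus the latent-heat term Lv * q / cp (deg C).

    specific_humidity is in kg of water vapor per kg of air.
    """
    return temp_c + LATENT_HEAT_VAP * specific_humidity / CP_DRY_AIR

# Two air masses at the same thermometer reading but different humidity.
dry = equivalent_temperature(30.0, 0.005)
humid = equivalent_temperature(30.0, 0.020)

print(f"dry air:   {dry:.1f} C equivalent")
print(f"humid air: {humid:.1f} C equivalent")
```

Both air masses read 30C on a thermometer, yet under these assumptions the humid one carries far more heat per kilogram, which is the sense in which air temperature alone understates heating when humidity rises.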