Errors in Google Analytics' User Location Data Explained

 

There’s no doubt that data-driven decision-making has exploded as tools in the digital landscape have developed. The caveat is that decisions can only be as good as the data they come from. Due to measures put in place to protect individual user data, location data in Google Analytics is considerably inaccurate at the city level. Google acknowledges that its location data in Analytics isn’t quite accurate only in the far corners of its support pages, so it wouldn’t be surprising if the majority of business owners and advertisers are unaware of how inaccurate this data is. Discovering why the data is so inaccurate and where to find accurate data means going down a very deep rabbit hole. Fortunately, we’ve made that trek and have brought the most important points back to the surface.

How User Location is Determined

Ironically, one of the best ways to make the inaccuracies of user location data in Google Analytics evident is to compare it to Google Ads’ data. The two platforms use different tools to determine user location because of Google’s policy on Personally Identifiable Information (PII). Precise location data, such as latitude/longitude data from GPS, sometimes falls under PII depending on the specific Google product being used. In other words, Google Ads can use precise location data because it publishes its location data in a way that cannot be connected to individual users within the Google Ads platform. Conversely, precise location data would break the PII policy in Google Analytics because its location data can be connected to the actions individual users take.

Google Ads has a variety of tools it can use to determine a user’s physical location. When more than one is available, it will always use the most accurate one to decide whether a user is within the location targeting.

  • IP Address: More on that later!

  • Device location: Device location can be determined in multiple ways

    • GPS – GPS uses satellites to derive longitude and latitude of a device.

    • Wi-Fi – Location is anywhere within the effective access range of the Wi-Fi router. The location of the router is only as accurate as the IP Address.

    • Bluetooth – Short-range beacons placed in fixed locations can be used to determine the location of nearby devices, though this is not a commonly used location identifier.

    • Google Cell Tower – In the absence of GPS and Wi-Fi, Google will use cell tower data to determine the location. The accuracy of cell towers varies. Here is an in-depth explanation of how connections between phones and towers are established and transferred. It is possible for a cell phone to be connected to a tower within our geographical targeting while the user and phone are located outside of it.
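
The "most accurate tool wins" rule described above can be sketched in a few lines of Python. Google does not publish its actual selection logic, so treat this as an illustration only: the `LocationSignal` class, the source names, and the accuracy radii are all hypothetical values we chose for the example.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class LocationSignal:
    """One location estimate. All fields are hypothetical, for illustration."""
    source: str        # e.g. "gps", "wifi", "cell_tower", "ip"
    lat: float
    lon: float
    accuracy_m: float  # assumed uncertainty radius in meters; smaller is better


def most_accurate(signals: list[LocationSignal]) -> Optional[LocationSignal]:
    """Return the signal with the smallest uncertainty radius, or None if empty."""
    return min(signals, key=lambda s: s.accuracy_m, default=None)


# Made-up readings for a single device near San Diego.
signals = [
    LocationSignal("ip", 32.72, -117.16, 25_000.0),
    LocationSignal("cell_tower", 32.75, -117.15, 1_500.0),
    LocationSignal("gps", 32.7157, -117.1611, 10.0),
]

best = most_accurate(signals)
print(best.source if best else "no signal available")
```

With GPS available, the GPS reading is chosen; remove it from the list and the cell tower estimate wins instead. A platform limited to the IP signal has no such fallback chain.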

Analytics uses only the connected IP address to estimate user location, looking it up in geolocation databases. Google specifically acknowledges that using IP addresses this way is “convenient but also [has] a few drawbacks.” Google’s IP data must come from a third-party IP database. This Google support page explains how the IP data Google Ads and Analytics use can be inaccurate. For Google Ads this doesn’t apply to locations derived through GPS, but it applies to all cases in Analytics.

IP addresses are routinely re-assigned, and AdWords updates its IP data regularly to reflect these changes. Third-party tracking providers may update their IP data on a different schedule.

Google’s third-party data source isn’t guaranteed to be 100% accurate either. Misentered data and recently moved Wi-Fi routers can also contribute to the inaccuracy of IP data. This site dedicated to information on IP geolocation asked IP data providers about the accuracy of IP data at various levels. They estimate that countries are 95%-99% accurate, regions (states) are 55%-80% accurate, and cities are 50%-75% accurate.
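
To put those accuracy ranges in perspective, here is a small Python sketch that converts them into a count of sessions that could be mislabeled. The 10,000-session total is an illustrative assumption, not a figure from any account, and the function name is our own.

```python
def misattributed_range(sessions: int, accuracy_low: float, accuracy_high: float) -> tuple[int, int]:
    """Return (fewest, most) sessions that could be mislabeled for an accuracy range."""
    fewest = round(sessions * (1 - accuracy_high))
    most = round(sessions * (1 - accuracy_low))
    return fewest, most


SESSIONS = 10_000  # illustrative monthly session count (assumption)

# Accuracy ranges as quoted by the IP data providers.
ACCURACY = {
    "country": (0.95, 0.99),
    "region": (0.55, 0.80),
    "city": (0.50, 0.75),
}

for level, (low, high) in ACCURACY.items():
    fewest, most = misattributed_range(SESSIONS, low, high)
    print(f"{level}: {fewest:,} to {most:,} sessions could be mislabeled")
```

At the city level, that works out to somewhere between a quarter and half of all sessions under these estimates.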

Comparing the Data

Knowing how Analytics and Google Ads differ when determining user location explains the variation between the two platforms in the following examples. The data here is taken from a Google Analytics account and the connected Google Ads account for June 1st - June 30th. A segment has been applied to the Analytics data that only shows traffic from the connected Google Ads account to ensure the data matches. The Google Ads account uses the advanced targeting setting that only targets people physically located in the targeted area, as opposed to also targeting people who show interest in the area. While clicks and sessions can differ slightly, those minor differences can’t explain the larger discrepancies at more specific location levels. This Google Ads account targets San Diego and the immediately surrounding area, so accurate data would have to align with this area in both platforms.

Country Level

  Google Ads Country Level  - As expected, Google Ads shows 100% of our clicks are from the US.

Analytics Country Level - Analytics shows nearly all sessions are from the US, but 123 were from Mexico and a handful from other countries. Still, this is only about 1% of total sessions.

Region (state) Level

  Google Ads Region Level  - Once again, all Google Ads clicks come from California like we want.

Analytics Region Level - The Analytics sessions for the US are mostly in California, but 1,283 sessions (9% of sessions) are attributed to states or regions outside of California.


City Level

  Google Ads City Level  - All of these cities are relevant to this business. Fantastic!

Analytics City Level - The Analytics data is far off. Of the top ten, Los Angeles, Chicago, and the “(not set)” location in Nevada are all outside of the Google Ads targeting zone that initially captured these sessions. In fact, Analytics claims 20% of the sessions are coming from locations outside our targeting (shown below in a filtered view).

[Image: Analytics city report filtered to sessions attributed to locations outside the Google Ads targeting]

Some of these locations, like Escondido, are only a few miles outside our targeting boundaries, but most others, such as Los Angeles, Dallas, and the (not set) Nevada locations, are more than 100 miles away. All 2,858 of these sessions must come from clicks by users Google Ads determined were located within our location targeting, using a method other than IP geolocation (most probably GPS). Analytics then determined user location using the only method it can: its IP address geolocation. As a result, these sessions were incorrectly labeled as coming from the wrong cities and sometimes the wrong state or country!
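
To run the same check on your own account, the share of sessions attributed to cities outside your targeting can be computed from an exported city report. Here is a minimal Python sketch. The city names and session counts are made up for illustration, the targeted set is an assumption, and the function name is our own.

```python
def outside_target_share(sessions_by_city: dict[str, int], targeted: set[str]) -> float:
    """Fraction of sessions attributed to cities that are not in the targeted set."""
    total = sum(sessions_by_city.values())
    if total == 0:
        return 0.0
    outside = sum(n for city, n in sessions_by_city.items() if city not in targeted)
    return outside / total


# Illustrative report rows (not real account data).
report = {
    "San Diego": 600,
    "Chula Vista": 150,
    "La Mesa": 50,
    "Los Angeles": 120,
    "Dallas": 30,
    "(not set)": 50,
}

# Hypothetical list of cities inside the Google Ads targeting area.
targeted = {"San Diego", "Chula Vista", "La Mesa"}

share = outside_target_share(report, targeted)
print(f"{share:.0%} of sessions are attributed to cities outside the targeting")
```

If the Google Ads campaign only targets people physically in the area, any large share reported here points to geolocation error in Analytics rather than wasted ad spend.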

The Kansas Problem

As if this wasn’t enough to show Google Ads is more reliable than Analytics when reporting on user location, an oddly specific error occurs in Analytics which several people refer to as “The Kansas Problem”. Three miles north of the Kansas-Oklahoma border and about an hour and a half north of Tulsa, Oklahoma lies Coffeyville, Kansas. It has a population of around 10,000, yet it’s very likely that any Google Analytics account with at least a few thousand monthly visitors claims traffic is coming from this small town. This account only had 4 sessions from Coffeyville in June; in April, however, it had 55 sessions from Coffeyville among 3,173 sessions from all channels. Check your account for the last 30 days to see if any of your traffic is being misrepresented as coming from Coffeyville.

Some theorize that Coffeyville’s location near the center of the US makes it the default location for traffic from locations within the US that can’t be more precisely defined, and it seems to be the case for the geolocation company MaxMind. It’d be interesting to see if other countries have their versions of Coffeyville that act as the central city for undefined geolocations. Perhaps Coffeyville is the default location for all of North America or potentially the entire world for some geolocation companies.
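
The default-location theory is easy to picture in code. The Python sketch below shows how a lookup with a country-level fallback would funnel every unresolved US address into one town. It illustrates the theory only and is not how MaxMind or Google actually implement lookups: the table, the fallback mapping, and the `locate` function are hypothetical, and the IP ranges are reserved documentation ranges rather than real assignments.

```python
import ipaddress

# Hypothetical geolocation table. The networks are reserved documentation
# ranges (RFC 5737), not real assignments.
CITY_TABLE = [
    (ipaddress.ip_network("192.0.2.0/24"), ("US", "San Diego")),
    (ipaddress.ip_network("198.51.100.0/24"), ("US", None)),  # country known, city unknown
]

# Assumed default city label for addresses resolved only to a country.
COUNTRY_DEFAULT_CITY = {"US": "Coffeyville"}


def locate(ip: str) -> tuple:
    """Return (country, city) for an IP, falling back to the country default."""
    addr = ipaddress.ip_address(ip)
    for network, (country, city) in CITY_TABLE:
        if addr in network:
            if city is None:
                # No city on record: report the country's default location.
                return country, COUNTRY_DEFAULT_CITY.get(country, "(not set)")
            return country, city
    # Address is not in the table at all.
    return None, "(not set)"


print(locate("192.0.2.10"))    # city on record
print(locate("198.51.100.7"))  # city unknown, so the default is reported
print(locate("203.0.113.5"))   # not in the table
```

Under this model, a small town can collect sessions from all over the country simply because it is the label attached to "somewhere in the US".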

User location data seems straightforward at surface level, but the complex technologies used to gather this information without compromising personally identifiable information make location data unreliable in Google Analytics. Other platforms that gather user location data, such as Google Ads and certain demand-side platforms, are more accurate alternatives. If you’re lost in your location data or other facets of digital marketing, reach out to us at J Miller Marketing. We have the knowledge and problem-solving skills to navigate around the most perplexing issues in the digital landscape. Follow J Miller on Facebook, Twitter, Instagram, and LinkedIn, or contact us directly!

By Steve Ott
Senior Online Media Strategist



Sources:

https://support.google.com/analytics/answer/7686480
https://support.google.com/adwords/answer/2453995?hl=en&ref_topic=3119074
https://www.quora.com/Does-your-phone-use-algorithms-to-decide-which-cell-tower-it-should-connect-to
https://support.google.com/analytics/answer/6160484?hl=en
https://support.google.com/adwords/answer/2453994?hl=en&co=ADWORDS.IsAWNCustomer%3Dfalse
https://www.iplocation.net/geolocation-accuracy
https://support.google.com/analytics/topic/4588493?hl=en&ref_topic=1042504
http://p5k6.github.io/blog/2014/08/09/understanding-your-geoip-data/
https://webmasters.stackexchange.com/questions/109765/why-is-coffeyville-kansas-sending-large-amounts-of-traffic-in-google-analytics?utm_medium=organic&utm_source=google_rich_qa&utm_campaign=google_rich_qa
https://splinternews.com/this-is-the-new-digital-center-of-the-united-states-1793856143