Geocoding Data and Software Sources: A Focus on Address Matching
by Jane Goodman

With the advent of the mobile internet and the concurrent rapid growth of location-based services, geocoding is no longer the exclusive domain of GIS professionals. It's the engine that powers these "killer apps" by matching street addresses, zip codes, parcel numbers and other geographic place identifiers to the longitude/latitude coordinates that define a location. Through address matching, image registration and other geocoding procedures, any information with a "where" component can be converted into a geographic object and displayed on a map, revealing patterns and relationships with other features that are not discernible in the original data.

Address matching, a very common form of geocoding, is the subject of this article. To perform address matching, several things are needed. First, of course, are the addresses or street intersections of interest (accident locations, flood damage claims, shopping mall locations). These may be keyed in one at a time, or stored in a text file, spreadsheet or database. Next is a geographic base file (GBF) that includes address ranges for each street centerline segment. Usually an address range is provided for both the left and right sides of the street. The geocoding is carried out by software that parses the street address (breaking out the address number, street prefix, street name, street suffix, street type, city, state and zip code), finds the matching street segment in the database and interpolates the address location along that segment. If the matched output address meets the U.S. Postal Service's highest standard for address correction, the geocoding software is CASS-certified, and mass mailings produced using it will qualify for special discounts. Once the geographic coordinates of the address are calculated, the longitude and latitude and any other user-requested geocodes, such as census tract, zip code or voting district, are appended to "enrich" the address information. If the geocoding takes place inside a GIS, the resulting point locations can be displayed as a new layer on top of the existing map features.
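The interpolation step described above can be sketched in a few lines of Python. This is a minimal illustration, not any vendor's algorithm: the `Segment` layout, the field names and the sample coordinates are all assumptions made for the example, and the segment is treated as a straight line between its two end points. The side of the street is chosen by checking that the house number falls within a side's range and has the same odd/even parity as that range.

```python
from dataclasses import dataclass


@dataclass
class Segment:
    """One street centerline segment (hypothetical layout for illustration)."""
    name: str
    start: tuple      # (lon, lat) at the low-address end of the segment
    end: tuple        # (lon, lat) at the high-address end of the segment
    left_from: int
    left_to: int
    right_from: int
    right_to: int


def side_of(seg, number):
    """Return 'L' or 'R' if the number fits that side's range and parity, else None."""
    for side, lo, hi in (("L", seg.left_from, seg.left_to),
                         ("R", seg.right_from, seg.right_to)):
        in_range = min(lo, hi) <= number <= max(lo, hi)
        if in_range and number % 2 == lo % 2:
            return side
    return None


def interpolate(seg, number):
    """Estimate (lon, lat, side) for a house number, or None when it does not match."""
    side = side_of(seg, number)
    if side is None:
        return None
    if side == "L":
        lo, hi = seg.left_from, seg.left_to
    else:
        lo, hi = seg.right_from, seg.right_to
    # Fraction of the way along the segment; a one-address range maps to the midpoint.
    t = 0.5 if hi == lo else (number - lo) / (hi - lo)
    lon = seg.start[0] + t * (seg.end[0] - seg.start[0])
    lat = seg.start[1] + t * (seg.end[1] - seg.start[1])
    return lon, lat, side


# Made-up segment: odd numbers 101-199 on the left, even numbers 100-198 on the right.
main_st = Segment("MAIN ST", (-72.0, 43.0), (-72.001, 43.0), 101, 199, 100, 198)
print(interpolate(main_st, 150))   # a point roughly halfway along, on the right side
print(interpolate(main_st, 250))   # None: the number is outside both ranges
```

A production geocoder would also follow the shape points of a curved segment and offset the result from the centerline toward the matched side; both refinements are left out here to keep the sketch short.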

Statistics summarizing the match "hit rate" are displayed after each geocoding run. Problems that frequently prevent matches include street name spelling errors, erroneous zip codes, incomplete addresses, streets with multiple names, addresses too new to be found in the street centerline file, street numbers falling outside the ranges in the centerline file, addresses that consist of rural route and box numbers, and more than one match occurring for a single address. Most packages allow the user to specify a match strategy (from conservative to aggressive) and options for processing when no match is found. They flag each address record with a code indicating which fields could be matched and some measure of the probability that the match was correct. When no match can be made, the zip code + 4 centroid may be used instead to approximate the location. Additionally, "soundex" technology can be used to look for street names that are pronounced similarly to the street name that cannot be located. Non-US addresses often cannot be processed because they require different parsing or include foreign characters. Additional passes may be made after correcting the problem addresses and choosing from possible matches to maximize the accuracy of the results.
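To make the soundex idea concrete, here is a minimal Python sketch of the classic American Soundex code, which reduces a name to its first letter plus three digits so that similar-sounding names share a code. It is offered as an illustration of the technique only; commercial geocoders may use different or more elaborate phonetic matching. The helper `sound_alikes` and the sample street names are hypothetical.

```python
# Consonant groups of the classic American Soundex scheme.
CODES = {}
for letters, digit in (("BFPV", "1"), ("CGJKQSXZ", "2"), ("DT", "3"),
                       ("L", "4"), ("MN", "5"), ("R", "6")):
    for ch in letters:
        CODES[ch] = digit


def soundex(name):
    """Return the 4-character Soundex code of a name, or '' if it has no letters."""
    letters = [c for c in name.upper() if "A" <= c <= "Z"]
    if not letters:
        return ""
    out = letters[0]
    prev = CODES.get(letters[0], "")
    for ch in letters[1:]:
        if ch in "HW":
            continue              # H and W do not separate equal codes
        code = CODES.get(ch, "")  # vowels (and Y) have no code
        if code and code != prev:
            out += code
        prev = code               # a vowel resets prev, so a repeated code counts again
    return (out + "000")[:4]


def sound_alikes(unmatched, known_streets):
    """Hypothetical helper: known street names sharing the unmatched name's code."""
    target = soundex(unmatched)
    return [s for s in known_streets if soundex(s) == target]


print(soundex("Robert"), soundex("Rupert"))                  # R163 R163
print(sound_alikes("Mapel", ["Maple", "Main", "Marble"]))    # ['Maple']
```

Candidates found this way are suggestions rather than confirmed matches, so they would normally be reviewed or scored before being accepted.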

It is important to know the datum of the longitude/latitude information appended to the street address information. Displaying the points on a map using the wrong datum can result in significant offsets in location. In addition, it is important to remember that, over time, address matching may find a different location for the identical street address. This is because the address ranges of street segments, as well as their zip codes, change. When this occurs, interpolation to the street number or assignment to the zip code centroid will yield different results.
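One simple way to notice either problem is to measure how far apart two coordinate pairs for the same address are, for example the results of two geocoding runs made a year apart. The Python sketch below uses the haversine formula on a spherical Earth, which is an approximation that is adequate for spotting offsets of tens of meters. It only measures separation and does not convert between datums; that requires a proper datum transformation. The function names, the tolerance and the sample coordinates are assumptions made for the example.

```python
import math

EARTH_RADIUS_M = 6371008.8  # mean Earth radius in meters (spherical approximation)


def offset_m(lon1, lat1, lon2, lat2):
    """Great-circle distance in meters between two lon/lat points (haversine)."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2)
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


def has_moved(old, new, tolerance_m=10.0):
    """True when two (lon, lat) results for one address differ by more than the tolerance."""
    return offset_m(old[0], old[1], new[0], new[1]) > tolerance_m


# Made-up results for the same address from two geocoding runs.
run_a = (-72.2500, 43.6400)
run_b = (-72.2500, 43.6405)
print(round(offset_m(*run_a, *run_b), 1), has_moved(run_a, run_b))
```

Records flagged this way can then be checked to see whether the cause is a datum mismatch, a changed address range or a fallback to a zip code centroid.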

Data Providers

Free TIGER/Line data files are available from the US Census Bureau. The term TIGER® is an acronym for Topologically Integrated Geographic Encoding and Referencing. This is the name of the system and digital database developed at the Census Bureau to support its mapping needs for the Decennial Census and other Bureau programs. The TIGER/Line files are a digital database of geographic features, such as roads, railroads, rivers, lakes, political boundaries and census statistical boundaries, covering the entire United States. They contain information about these features such as their location in latitude and longitude, the name, the type of feature, address ranges for most streets, the geographic relationship to other features, and other related information. The data files are ASCII text in fixed record format. Single-address range segments are withheld to protect the confidentiality of individual addresses collected by the census in the field. The positional accuracy is similar to that of a 1:100,000 USGS quad map. Address ranges are present for most streets in urban areas, but in rural areas they are sometimes missing.
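Reading a fixed record format means slicing each line at known column positions. The Python sketch below shows the general approach only. The `LAYOUT` table, the field names and the sample records are hypothetical and are not the actual TIGER/Line record layout, which should be taken from the Census Bureau's technical documentation. Blank address fields are returned as `None`, which is how a rural segment with no address range would appear.

```python
# Hypothetical layout: (field name, start column, end column), 0-based, end exclusive.
LAYOUT = (
    ("record_type", 0, 1),
    ("street_name", 1, 31),
    ("left_from", 31, 37),
    ("left_to", 37, 43),
    ("right_from", 43, 49),
    ("right_to", 49, 55),
)
NUMERIC = {"left_from", "left_to", "right_from", "right_to"}


def parse_record(line):
    """Slice one fixed-width line into a dict; blank numeric fields become None."""
    record = {}
    for field, start, end in LAYOUT:
        text = line[start:end].strip()
        if field in NUMERIC:
            record[field] = int(text) if text.isdigit() else None
        else:
            record[field] = text
    return record


def make_line(name, lf="", lt="", rf="", rt=""):
    """Build a sample line in the hypothetical layout (for demonstration only)."""
    ranges = "".join(str(v).rjust(6) for v in (lf, lt, rf, rt))
    return "1" + name.ljust(30) + ranges


urban = parse_record(make_line("MAIN ST", 101, 199, 100, 198))
rural = parse_record(make_line("COUNTY RD 12"))
print(urban["street_name"], urban["left_from"], urban["left_to"])
print(rural["street_name"], rural["left_from"])   # no address range on this segment
```

Segments whose ranges come back as `None` cannot be used for interpolation, so a geocoder has to fall back to another method for addresses on those streets.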

The major commercial data providers for address-matching applications are Geographic Data Technology (GDT), Navigation Technologies (NavTech) and ETAK. These vendors use USGS quad maps, DOQ quads, aerial orthophotography and field data collection, including GPS, to produce the best possible database for address matching and map generation. Additionally, they are CASS compliant and update their data frequently with United States Postal Service Line of Travel files and the latest TIGER/Line data. They claim geocoding hit rates in excess of 95%. These companies offer geoengines used by direction-finder web pages, by delivery services like Federal Express for fleet scheduling and routing, and by desktop business geographic software packages.

Based in Lebanon, New Hampshire, GDT is a developer of map databases that provide the foundation for applications such as site selection, routing packages, environmental mapping and direct marketing. With its street, postal, census and other geographic databases, GDT supplies cartographic data to all major GIS and desktop mapping vendors. Data formats include ARC/INFO®, ArcView® GIS, ASCII, Atlas GIS™, Autodesk MapGuide, GeoMedia™, SDE™ and Tactician®. Try mapping your address online at their demo web page.



ETAK, based in Europe, is a unit of Tele Atlas. It offers EtakMap Premium, which contains the complete nationwide network of roadways, points-of-interest listings, political boundaries and over 100 additional attributes, and Premium with Directions, which adds routing instructions. Data can be purchased in ESRI, MapInfo, text and native formats. Test the address data in EtakMap Premium by trying 100 free geocodes online via the EZ-Locate geocoding service, or download a GeoEngine for a desktop version.

Navigation Technologies (NAVTECH), based in Rosemont, Illinois, with over 90 offices worldwide, produces a detailed digital representation of the road network that enables turn-by-turn, door-to-door route guidance. Every commercially available in-vehicle navigation product in North America, and the majority of products in Europe that offer turn-by-turn route guidance, use a NAVTECH database. Map databases can be ordered for in-vehicle navigation systems for both North America and Europe.



© 2010 Internet Business Systems, Inc.