Enhanced Companies House Data Analysis

Python, Tableau, APIs

Project Overview:

In this project, we used Python and fuzzy data-matching techniques to identify active companies operating in Northumberland County within specific industry (SIC) codes. The objective was to merge publicly available Business Rates data with Companies House data, enriching each business record with accurate industry codes.
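The selection criteria in the objective (active status plus membership of a set of target SIC codes) reduce to a simple filter once each business has been enriched. The sketch below is illustrative only: the `MatchedCompany` record, the `select_target_companies` helper and the example companies and numbers are hypothetical, not the project's actual data model.

```python
from dataclasses import dataclass, field


@dataclass
class MatchedCompany:
    # Hypothetical record: a Business Rates entry enriched with Companies House fields.
    rates_name: str
    company_number: str
    company_status: str
    sic_codes: list[str] = field(default_factory=list)


def select_target_companies(companies, target_sic_codes):
    """Keep companies that are active and have at least one SIC code in the target set."""
    targets = set(target_sic_codes)
    return [
        c for c in companies
        if c.company_status == "active" and targets.intersection(c.sic_codes)
    ]


if __name__ == "__main__":
    # Made-up example records for illustration only.
    companies = [
        MatchedCompany("Example Cafe", "00000001", "active", ["56101"]),
        MatchedCompany("Closed Cafe", "00000002", "dissolved", ["56101"]),
        MatchedCompany("Example Garage", "00000003", "active", ["45200"]),
    ]
    selected = select_target_companies(companies, {"56101"})
    print([c.rates_name for c in selected])  # → ['Example Cafe']
```

Using a set intersection means a company qualifies if any one of its (up to four) registered SIC codes is in the target list.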

Challenges:

The Business Rates data contained no industry (SIC) codes, and because its records were entered manually, they could contain errors. Companies House data does provide SIC codes, but the address it holds often points to a company's headquarters (its registered office) rather than the business location recorded in Business Rates.
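Manually entered names tend to differ from the registered names in capitalization, punctuation, "&" versus "and", and legal suffixes such as "Ltd" versus "Limited". A normalization step before comparison can remove much of that noise. This is a minimal sketch, not the project's actual code; the `normalize_name` function and the `IGNORED_WORDS` list are hypothetical choices.

```python
import re

# Hypothetical set of words to ignore when comparing company names.
IGNORED_WORDS = {"the", "limited", "ltd", "plc", "llp"}


def normalize_name(name: str) -> str:
    """Lower-case, expand '&', replace punctuation with spaces and drop ignored words."""
    name = name.lower().replace("&", " and ")
    name = re.sub(r"[^a-z0-9\s]", " ", name)  # punctuation becomes whitespace
    tokens = [t for t in name.split() if t not in IGNORED_WORDS]
    return " ".join(tokens)


if __name__ == "__main__":
    print(normalize_name("Smith & Sons (Hexham) Ltd."))       # → smith and sons hexham
    print(normalize_name("SMITH AND SONS (HEXHAM) LIMITED"))  # → smith and sons hexham
```

Both spellings reduce to the same string, so an exact or fuzzy comparison afterwards is not distracted by formatting differences.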

Solution:

We used the Companies House API to look up each company name from the Business Rates data and retrieve candidate matches. Because names in the two sources rarely agree exactly, we then applied fuzzy string matching in Python to score how similar each Business Rates name was to each candidate, linking only close matches. The result was a robust list of companies in Northumberland County with their SIC codes, which could then be narrowed to the industries of interest.
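A minimal sketch of the lookup-and-match step, assuming the public Companies House REST API: `/search/companies` to find candidates and `/company/{company_number}` to fetch the profile containing `sic_codes`, with HTTP Basic auth using the API key as the username and a blank password. The write-up does not name the fuzzy-matching library, so this sketch swaps in the standard library's `difflib.SequenceMatcher`; the 0.9 threshold and the helper names (`best_match`, `enrich`) are hypothetical.

```python
import base64
import json
import urllib.parse
import urllib.request
from difflib import SequenceMatcher

API_BASE = "https://api.company-information.service.gov.uk"


def _get_json(path, api_key, params=None):
    """GET a Companies House API resource with Basic auth (key as username, blank password)."""
    url = API_BASE + path
    if params:
        url += "?" + urllib.parse.urlencode(params)
    token = base64.b64encode(f"{api_key}:".encode()).decode()
    request = urllib.request.Request(url, headers={"Authorization": f"Basic {token}"})
    with urllib.request.urlopen(request, timeout=30) as response:
        return json.load(response)


def search_companies(name, api_key, max_results=20):
    """Return candidate companies from the name search (each item has 'title', 'company_number')."""
    data = _get_json("/search/companies", api_key, {"q": name, "items_per_page": max_results})
    return data.get("items", [])


def name_similarity(a, b):
    """Case-insensitive similarity ratio in [0, 1] (difflib stand-in for the fuzzy matcher)."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def best_match(rates_name, candidates, threshold=0.9):
    """Return (candidate, score) for the best-scoring title, or None if below the threshold."""
    scored = [(c, name_similarity(rates_name, c.get("title", ""))) for c in candidates]
    if not scored:
        return None
    candidate, score = max(scored, key=lambda pair: pair[1])
    return (candidate, score) if score >= threshold else None


def enrich(rates_name, api_key, threshold=0.9):
    """Link one Business Rates name to a company profile and return its SIC codes."""
    match = best_match(rates_name, search_companies(rates_name, api_key), threshold)
    if match is None:
        return None
    candidate, score = match
    profile = _get_json(f"/company/{candidate['company_number']}", api_key)
    return {
        "rates_name": rates_name,
        "company_number": candidate["company_number"],
        "company_name": profile.get("company_name"),
        "company_status": profile.get("company_status"),
        "sic_codes": profile.get("sic_codes", []),
        "match_score": round(score, 3),
    }
```

Calling `enrich("Acme Widgets Limited", api_key)` needs a registered Companies House API key and network access; in a full pipeline it would run once per Business Rates record, subject to the API's rate limits. Keeping `best_match` free of network calls makes the threshold easy to tune against a hand-checked sample.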

Outcome:

The Enhanced Business Identification Project resolved the mismatch between the two data sources and delivered a reliable list of companies operating in Northumberland County, each with accurate industry classifications. This gives a clearer picture of the local business landscape and a sound basis for informed decision-making.