• Home
  • Diver BI Group
    • Diver | BI
    • Customer Commendations
    • Videos
    • Case Studies
    • Documents and Brochures
    • Diver BI Group Privacy Policy
  • Articles
    • News
    • Accolades
    • Blog
    • Ask Anthony
  • The Team
    • Key Experts
    • Members
      • Dynamic Business Informatics
      • Envisage
      • Chartwell IT
      • Cloud-Works Solutions Ltd
      • Data Management Business Insight
      • Manitex
      • Equiniti ICS
      • Pharma IT
      • Pinnula
      • Formell
      • IBI
    • Partners
      • Dimensional Insight US
      • Dimensional Insight EU
      • Dimensional Insight Asia
      • DI Panama
      • ID Technology (Brazil)
      • Infotool (Nordic Countries)
      • Pilaro
      • iXias Software – DI France
      • Vision IT
      • KDE Group (Israel)
      • Zight -Partner for Transport
      • 24/7 Uptime
  • Applications
    • Property
      • Property Prices Portal
    • Healthcare
      • DI Surgery Advisor
      • DI Physician Performance Advisor
      • Healthcare
      • Pharma BI Applications of Diver
    • Supply Chain
      • Retail
      • Beverage Alcohol Suppliers
      • Public Transport
      • Beverage Alcohol Distributors
      • Distributor
      • Produce
      • Wholesale
      • DI Program Advisor
      • Manufacturing
      • Autoparts
      • Transport
    • Finance
      • DI GL Advisor
      • Finance
    • Mobile & SaaS
      • Diver BI SaaS
      • BI designed for Tablet
      • SaaS for Sales Gap Analysis
      • DiveTab – Mobile BI
    • Diversity
      • Waste Management
      • Telecom Brokerage
      • Colleges & Universities
      • Economic Trends
      • Maps
      • Parking
      • Smart Meters
      • Carbon Emissions Management
      • Travel & Tourism
  • Workbench
  • Contact
    • Free Consultation with One of Our Experts
    • Add a New Revenue Stream as a Partner
    • 5 Minute Guides

DiverBIGroup

Business Intelligence Experts

Diver BI Group Articles Ask Anthony How do I Incorporate Useful Data from the Web? Part 2

How do I Incorporate Useful Data from the Web? Part 2

27th September 2016

For further information on either part of this blog please get in touch through our contact page or attend our conference on 27th October where we will have Diver experts talking about these techniques.

YY Tsang

In Part 2 here my colleague YY Tsang from DMBI explains his ideas on extracting and incorporating non-tabular web data.

In Part 1 my colleague Paul Duggan at Dynamic Business Informatics has shared his ideas on how to interface with Sharepoint and incorporate some useful tabular web data into a Diveport embedded within Sharepoint.

 

PART 2 – INCORPORATING WEB DATA THAT ARE NOT IN TABULAR FORMAT

In the case described in part 1 of this blog, we were incorporating web data that was already in tabular format and accessible through a single web page – so it was fairly straightforward to integrate using the Diver Integrator.

In this new example we were working on a dashboard for the pharma sector. As the project involved Brand/Generic drug mapping we wanted to add more value to the dashboard reports by incorporating information about what ailments the drugs were actually used for.  This would allow us to carry out some further analysis which could be of use to the pharmacists.

This information was available on a variety of websites but not in tabular format. We wanted to incorporate the Classifications and all of the sub classes of the active ingredients using a list of unique ATC codes to map them.

We had a drugs list file which contained these 7 character codes which are common to the industry. We could see the descriptions of what they were used for at each breakdown level of this code on various websites – but each classification was hidden amongst a lot of other information, and they were all on different pages.

Extraction Software

Using some freely available extraction software, such as the one we used http://www.hexcat.com/deepvacuum/, we were able to set up an iterative process that extracted the information we needed from every page of the website.

extraction software input screen

 

This enabled us to create a file for each page of the website as shown below.

extracted file pages

The next but one image shows what the contents of just one of those files looks like after gathering it using the software – the page for ATC N02AB.

list of ATC files

HTML Content

As you can see, the html code below has a lot of information but it’s not in the format that could be used as a lookup table.

html file content

Using Diver BI’s Visual Integrator

Using Diver BI’s Visual Integrator we created a script that would loop around and collect the information in each of these files, and strip out the important classifications to create tables for each level.

Visual Integrator

This is the same file when viewed through the Visual Integrator’s test feature.

Visual Integrator test input

Reformat Routines to Extract Classifications

In the reformat routines we looped through the html files and extracted the relevant classifications by breaking down the ATC code into its 3 component levels. The highest level was the description for the first 4 characters.

ATC level 1

The next level down was added by including the 5th character of the ATC code.

ATC level 2

The third level included all 7 characters.

ATC level 3

Creation of Lookup Table

From these iterative loops we were able to derive a complete hierarchy and convert it to a lookup table for the drugs in our BI model. This extra information allows the pharma user to investigate, for example, which drugs are being prescribed together for particular diseases or ailments and whether other drugs are also being prescribed for the side effects.

ATC lookup table

These techniques are only shown as examples – you should be able to apply them to most widely available public data sets and incorporate them into your Diver models to provide important insights that do not reside in your own data.

Ask Anthony

  • This field is for validation purposes and should be left unchanged.

Articles

  • News
  • Accolades
  • Blog
  • Ask Anthony
  • Diver | BI
  • Customer Commendations
  • Videos
  • Case Studies
  • Documents and Brochures
  • Diver BI Group Privacy Policy

Dynamic Business Informatics Ltd

Distributors of The Diver Solution & Diver Platform, providing business consultancy services, products and applications derived from the Diver Solution & Diver Platform.
14 Mellifont Avenue
Dun Laoghaire
Co. Dublin

+353 1 2302022

10 Kenyon Street
Nenagh
Co Tipperary

+353 67 43309

Contact Us

We would love to hear from you! Please submit your query and we will get in touch with you shortly.

Email addresses are never sold, rented, or shared. See our Privacy Policy.

  • This field is for validation purposes and should be left unchanged.

Accolades

  • Top Ranks in BARC’s The BI Survey 16
  • Best in Klas again!
  • DI is a Leader in Dresner’s Industry Excellence Awards
  • The World’s No 1 BI Solution Again!
  • Envisage Cloud Wins Sage 200 Award

Copyright © 2018 Diver BI Group · website by cubecolour