
Data Quality: The Start of Better Insights

One of the things that is not talked about enough is data quality. Plenty of people talk about all the great things we can do with data, from automating analysis to running advanced AI/ML algorithms. Whilst all of this is true and the possibilities are boundless, we need clean data to start with. Without clean data you can run into a few problems, namely:

  • Patterns are not clear enough to derive trends or predictions
  • You miss out on inferences that could add value to your business
  • Worse, you draw wrong inferences that could hurt your business
  • You spend lots of time and effort on data that has not been properly cleansed and curated

Of course, we are talking mainly about structured data here. We know that many of our clients also hold unstructured data, where it is harder to achieve consistent data quality. Even so, it can still be measured against the same data quality dimensions that we apply to structured data:

  • Completeness
  • Uniqueness
  • Timeliness
  • Validity
  • Accuracy
  • Consistency
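To make these measures a little more concrete, here is a minimal Python sketch of three of them (completeness, uniqueness and validity) applied to a small in-memory table. The function names, the sample `customers` records and the choice to treat `None` and empty strings as missing are illustrative assumptions; real tools usually define each measure in more detail.

```python
from typing import Any, Callable, Mapping

Record = Mapping[str, Any]


def _present(value: Any) -> bool:
    """Treat None and empty strings as missing (an assumption for this sketch)."""
    return value not in (None, "")


def completeness(records: list[Record], field: str) -> float:
    """Share of records where `field` is present and not missing."""
    if not records:
        return 1.0
    return sum(1 for r in records if _present(r.get(field))) / len(records)


def uniqueness(records: list[Record], field: str) -> float:
    """Distinct non-missing values divided by the number of non-missing values."""
    values = [r[field] for r in records if _present(r.get(field))]
    if not values:
        return 1.0
    return len(set(values)) / len(values)


def validity(records: list[Record], field: str, rule: Callable[[Any], bool]) -> float:
    """Share of non-missing values that satisfy a user-supplied rule."""
    values = [r[field] for r in records if _present(r.get(field))]
    if not values:
        return 1.0
    return sum(1 for v in values if rule(v)) / len(values)


# Hypothetical sample data: one blank email, a repeated id, two impossible ages.
customers = [
    {"id": 1, "email": "a@example.com", "age": 34},
    {"id": 2, "email": "", "age": 51},
    {"id": 2, "email": "c@example.com", "age": -3},
    {"id": 4, "email": "d@example.com", "age": 130},
]

print(completeness(customers, "email"))
print(uniqueness(customers, "id"))
print(validity(customers, "age", lambda age: 0 <= age <= 120))
```

Each function returns a ratio between 0 and 1, so results can be compared across columns and tracked over time. Timeliness, accuracy and consistency usually need a reference point (a timestamp, a trusted source or a second dataset), which is why they are left out of this sketch.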

The key thing to remember is that poor data quality can (and does) cost businesses around the world billions and causes projects to fail. Identifying issues early and getting things right at the start is key to success. This is the guiding principle we use to add value for our clients: provide early indicators so that problems can be remedied at the earliest point in the chain.

PyCell – Smart DQ Tools

At PyCell we are looking at ways to make data quality a core component, so that you can really leverage your data for great insights. In our view, data quality starts from some really simple principles: understanding your datasets and how they change and evolve from day to day. With this in mind, we are working on our Smart DQ toolkit, which gives you the answers you need. We start with the basics:

  • Basic metadata of the data
    • Number of columns and rows
    • Column headers
  • Day-on-day changes
  • File versioning (audit control is key)
  • Duplicate rows
  • Flags for any changes to the data model or structure
  • Any obvious deviations from previously observed trends
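The basic checks above can be sketched with Python's standard library alone. This is not PyCell's Smart DQ implementation: the names `profile`, `compare` and `off_trend`, the 20% row-count tolerance and the three-standard-deviation threshold are hypothetical choices, and the simple z-score test stands in for whatever trend detection a production tool would use. A SHA-256 hash of the file content serves as a lightweight version identifier for audit purposes.

```python
import csv
import hashlib
import io
import statistics
from collections import Counter


def profile(csv_text: str) -> dict:
    """Basic metadata for one daily file: shape, headers, duplicates and a content hash."""
    reader = csv.reader(io.StringIO(csv_text))
    headers = next(reader, [])
    rows = [tuple(row) for row in reader]
    counts = Counter(rows)
    return {
        "headers": headers,
        "n_columns": len(headers),
        "n_rows": len(rows),
        # Every extra copy of a repeated row counts as one duplicate.
        "duplicate_rows": sum(c - 1 for c in counts.values() if c > 1),
        # A content hash gives a simple version identifier for audit trails.
        "sha256": hashlib.sha256(csv_text.encode("utf-8")).hexdigest(),
    }


def compare(yesterday: dict, today: dict, row_change_tolerance: float = 0.2) -> list[str]:
    """Flag structural, volume and version changes between two daily profiles."""
    flags = []
    if today["headers"] != yesterday["headers"]:
        added = set(today["headers"]) - set(yesterday["headers"])
        removed = set(yesterday["headers"]) - set(today["headers"])
        flags.append(f"schema change: added={sorted(added)} removed={sorted(removed)}")
    if yesterday["n_rows"]:
        change = (today["n_rows"] - yesterday["n_rows"]) / yesterday["n_rows"]
        if abs(change) > row_change_tolerance:
            flags.append(f"row count changed by {change:+.0%}")
    if today["duplicate_rows"]:
        flags.append(f"{today['duplicate_rows']} duplicate row(s)")
    if today["sha256"] != yesterday["sha256"]:
        flags.append("new file version")
    return flags


def off_trend(history: list[int], value: int, z_threshold: float = 3.0) -> bool:
    """True if `value` lies more than `z_threshold` standard deviations from the history mean."""
    if len(history) < 2:
        return False
    mean = statistics.mean(history)
    spread = statistics.stdev(history)
    if spread == 0:
        return value != mean
    return abs(value - mean) / spread > z_threshold


# Hypothetical example: two consecutive daily extracts of the same feed.
day1 = "id,name\n1,Ann\n2,Bob\n"
day2 = "id,name,country\n1,Ann,UK\n2,Bob,FR\n2,Bob,FR\n3,Cy,DE\n"
p1, p2 = profile(day1), profile(day2)
print(compare(p1, p2))

# Hypothetical row counts from earlier days, checked against today's volume.
history = [1000, 1020, 990, 1010]
print(off_trend(history, 400))
```

On these two small files, `compare` flags the new `country` column, the doubled row count, the duplicated `Bob` row and the new file version, while `off_trend` reports that a volume of 400 sits far outside the recent history. Keeping the profiles as plain dictionaries makes them easy to store each day, which is what day-on-day comparison and audit trails need.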

We fundamentally believe that you know your data best. We want to give you the tools to understand immediately when something goes wrong. Our Smart Algos give you warnings and automate the most tedious tasks, but we want you to remain in control. Our Smart DQ Dashboards let you do exactly that: they give you a quick view of the data you are using in your Workspaces, along with the DQ score (between 0 and 100) we have calculated for it. This lets you quickly home in on any issues and take whatever remedial action is necessary.
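The post does not say how the DQ score is calculated. Purely as an illustration, a 0 to 100 score could be built as a weighted average of per-dimension scores such as completeness, uniqueness and validity, each expressed as a ratio between 0 and 1. The `dq_score` function, its equal default weights and the clamping of inputs to the range 0 to 1 are all assumptions, not PyCell's method.

```python
from typing import Dict, Optional


def dq_score(
    dimension_scores: Dict[str, float],
    weights: Optional[Dict[str, float]] = None,
) -> float:
    """Combine per-dimension scores in [0, 1] into a single 0-100 score.

    Hypothetical weighted average for illustration only.
    """
    if not dimension_scores:
        raise ValueError("at least one dimension score is required")
    if weights is None:
        # Equal weighting when the caller expresses no preference.
        weights = {name: 1.0 for name in dimension_scores}
    total_weight = sum(weights.get(name, 0.0) for name in dimension_scores)
    if total_weight <= 0:
        raise ValueError("weights must sum to a positive number")
    # Clamp each score to [0, 1] so one bad input cannot push the result out of range.
    weighted = sum(
        min(max(score, 0.0), 1.0) * weights.get(name, 0.0)
        for name, score in dimension_scores.items()
    )
    return round(100 * weighted / total_weight, 1)


scores = {"completeness": 0.75, "uniqueness": 0.75, "validity": 0.5}
print(dq_score(scores))
print(dq_score(scores, weights={"completeness": 3, "uniqueness": 1, "validity": 1}))
```

Weighting lets you say which dimensions matter most for a given dataset: with equal weights these example scores combine to 66.7, and giving completeness three times the weight of the others raises the result to 70.0.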

We are working hard to bring you all of these features and more! Please check back, and sign up for our newsletter to be the first to hear about new features and any great offers we have for you.