Building an MVP is an exciting time for any business. Recently we were approached by an experienced academic at one of the top two UK universities, who works closely with industry and was looking to better understand data in their sector. This is always a tricky problem.
There were two objectives in this case:
- Create an MVP that solves the problem for the academic and industry partner
- Create something that adds enough value that it would warrant further monthly investment from the industry partner
Much of this project is under NDA so we cannot talk about the problem itself but we can talk about the general solution and the approach.
Solving these problems requires a range of skills:
1. First we had to understand and break down the problem. This is no easy feat: it takes hours, or even days, of really pulling the problem apart, perhaps writing some code, investigating technologies, and settling on a general direction and a hierarchy of which parts of the problem are most valuable, so everyone is on the same page.
2. Next we identified the simplest thing we could technically deliver that would add value to the client right away.
3. We then assigned roles and responsibilities.
4. Finally we built the solution, staying in constant communication and making minor adjustments on the fly where needed.
The proposed solution was a kind of analytics engine built on a combination of proprietary and internet-scraped data. Starting this way meant we could build something that would provide insight into the problem while also giving the client a way to do generalised analytics on a previously nonexistent dataset.
We started by choosing two things: the data we needed to analyse, and the technology we were going to build the engine on top of. We went with an AWS architecture, which meant we could scale easily and handle large amounts of data. We linked up a range of microservices to deliver the product.
The AWS pieces
- An append-only flat-file data store on AWS S3, giving us effectively unlimited storage
- A DynamoDB table to support fast queries
- Proprietary code running in Lambda functions and Docker containers, cutting down the need to manage infrastructure
- SQS to manage the disconnected nature of our platform
- A simple website for monitoring, with AWS Elasticsearch doing some of the heavy lifting
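To make the wiring between these pieces concrete, here is a minimal sketch of an ingest path in Python with boto3. The bucket, table, and queue names, the key layout, and the function names are all hypothetical illustrations, not the actual project code: a raw record lands in S3, an index entry pointing at it goes into DynamoDB, and an SQS message tells downstream workers there is work to do.

```python
import json


# Hypothetical resource names for illustration only.
BUCKET = "analytics-raw-data"
TABLE = "analytics-index"
QUEUE_URL = "https://sqs.eu-west-1.amazonaws.com/123456789012/work-queue"


def make_index_item(record_id: str, s3_key: str, source: str) -> dict:
    """Build the DynamoDB item that points at the raw flat file in S3."""
    return {
        "record_id": {"S": record_id},
        "s3_key": {"S": s3_key},
        "source": {"S": source},
    }


def ingest(record_id: str, payload: dict, source: str) -> None:
    """Store the raw record in S3, index it in DynamoDB, queue it for workers."""
    import boto3  # imported lazily so the pure helper above needs no AWS setup

    s3_key = f"raw/{source}/{record_id}.json"
    boto3.client("s3").put_object(
        Bucket=BUCKET, Key=s3_key, Body=json.dumps(payload)
    )
    boto3.client("dynamodb").put_item(
        TableName=TABLE, Item=make_index_item(record_id, s3_key, source)
    )
    boto3.client("sqs").send_message(
        QueueUrl=QUEUE_URL,
        MessageBody=json.dumps({"record_id": record_id, "s3_key": s3_key}),
    )
```

Keeping S3 as the source of truth and DynamoDB as a thin index is a common pattern for this kind of stack: the flat files stay cheap and durable, while the table answers the queries.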
The flow of data
We set up the data such that we could continually run operations on it and add new workers at any time. It meant we could start by scraping and analysing simple flat-file data, and later down the line add in a machine-learning processing node.
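The worker pattern described above can be sketched as a small dispatch loop. Everything here (the step names, the `PROCESSORS` registry, the message shape) is a hypothetical illustration of the idea, not the project's actual code: because each worker is just an entry in a registry keyed by a `step` field on the SQS message, a new node such as an ML scorer can be added later without touching the existing pipeline.

```python
import json

# Registry of processing steps; adding a worker later is just another entry.
PROCESSORS = {}


def processor(name):
    """Decorator that registers a processing function under a step name."""
    def register(fn):
        PROCESSORS[name] = fn
        return fn
    return register


@processor("scrape")
def process_scrape(body: dict) -> dict:
    return {"record_id": body["record_id"], "status": "scraped"}


@processor("ml_score")  # added later, without changing the rest of the pipeline
def process_ml_score(body: dict) -> dict:
    return {"record_id": body["record_id"], "status": "scored"}


def handle_message(raw: str) -> dict:
    """Dispatch one SQS message body to the processor named in its 'step' field."""
    body = json.loads(raw)
    return PROCESSORS[body["step"]](body)


def poll_forever(queue_url: str) -> None:
    """Long-poll SQS and hand each message to handle_message (runs until stopped)."""
    import boto3  # lazy: only needed when actually polling AWS

    sqs = boto3.client("sqs")
    while True:
        resp = sqs.receive_message(
            QueueUrl=queue_url, WaitTimeSeconds=20, MaxNumberOfMessages=10
        )
        for msg in resp.get("Messages", []):
            handle_message(msg["Body"])
            sqs.delete_message(
                QueueUrl=queue_url, ReceiptHandle=msg["ReceiptHandle"]
            )
```

Because the queue decouples producers from consumers, scaling out is just running more copies of the loop, and a failed worker's messages simply become visible again for another worker to pick up.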
The MVP was well received and delivered on its objective. We are already doing more on this project and are looking forward to seeing where it goes.
Coming soon is a tutorial series on how to setup your own analytics engine.