Disease Predictor and Recommender System

Overview: 4-5 minute read

This project is an addition to the previous project: BlockChain Authenticator.

This part of my project uses Machine Learning to predict a disease from the symptoms shown by the patient. I used the SVM, Logistic Regression and Decision Tree algorithms to predict the disease and then developed a voting system which took the diseases predicted by the 3 algorithms and selected the one predicted by the majority. A doctor for the predicted disease is then recommended using a recommender system.

Disease Predictor

I had a data-set which mapped each symptom to a disease along with its TF-IDF (Term Frequency Inverse Document Frequency), covering 321 symptoms and 4219 diseases. To get the symptoms for a particular disease, I looped over all the symptoms looking for the ones mapped to that disease; once all the symptoms for that disease were determined, a new record corresponding to that disease was created in the dataset.

Using the dataset described above, I trained my model with the algorithms discussed above and reached the target accuracy of 75%.

Recommender System

I successfully created a data set of all the doctors in Delhi. The dataset comprises 3000 doctors, their specialities, degrees and addresses. After transforming the dataset into an Excel file, I used a Google Application Program Interface (API) to find the latitude and longitude of each doctor's address so that it could be plotted on a map and so that the distance of the doctor from the user could be determined. A random rating (out of 10) and a random cost per meeting were assigned to each doctor.

The recommendation algorithm filters the results as per the user's requirements. The user can filter the doctors by cost, rating or distance. The resulting doctors are then displayed on a map along with a table showing the information about each doctor.

  • Selected as one of the top 3 projects in the entire university, out of more than 1000.

In depth: 7-12 minute read

Introduction

This project is an addition to the previous project: BlockChain Authenticator.

The biggest concern in this era is trust: no one is willing to trust anyone. Blockchain is the latest technology for establishing trust, and we now bring it to the medical field. EHRs (Electronic Health Records) are the standard way of storing patients' health records these days, but they are not decentralised and can easily be modified or deleted by an authorised person or by a hacker. The basis of this project is to give all the power to the patient by implementing the EHR system on a blockchain, making the records unmodifiable and undeletable while keeping them transparent to whomever the patient chooses to share them with. The other part of our project is based on Machine Learning and focuses on predicting a disease from the symptoms shown by the patient. We will use the SVM, Logistic Regression and Decision Tree algorithms to predict the disease and then develop a voting system which takes the disease predicted by each of the 3 algorithms and selects the majority. A doctor for the predicted disease will then be recommended using a recommender system.

Problem Statement

Data is the one thing everyone in this world is after. The layman is afraid to release his personal details into the world for privacy reasons. He is also very busy and doesn't have enough time to take proper care of his health. Thus, our aim is to develop a system which is secure and which also saves the user time by predicting his disease from his symptoms and recommending the most suitable doctor accordingly.

Disease Predictor

Our Disease Predictor will take a varying number of symptoms as input and predict a disease based on them. It will be based on three machine learning algorithms:

  • Logistic Regression

  • Naive Bayes

  • Decision Tree Classification
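The voting step mentioned in the introduction can be sketched in a few lines. This is a minimal illustration, not the project's actual code: the function name, the example labels and the tie-breaking rule (the first classifier in the list wins when no label has a majority) are all assumptions made for the sketch.

```python
from collections import Counter

def majority_vote(predictions):
    """Return the label predicted by the most classifiers.

    `predictions` holds one predicted label per classifier. If several
    labels share the top count (for example, all three classifiers
    disagree), the earliest one in the list wins. That tie-break is an
    assumption of this sketch, not something stated in the project.
    """
    counts = Counter(predictions)
    top = max(counts.values())
    # Scan in input order so that ties resolve to the first classifier listed.
    for label in predictions:
        if counts[label] == top:
            return label

# Hypothetical outputs of the three classifiers for one patient.
votes = ["influenza", "common cold", "influenza"]
print(majority_vote(votes))
```

With three classifiers, a strict majority exists whenever at least two of them agree; the tie-break only matters when all three predict different diseases.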

Creating the Data-Set

We had a data-set which mapped each symptom to a disease along with its TF-IDF, covering 321 symptoms and 4219 diseases. To get the symptoms for a particular disease, we ran a for loop over all the symptoms, checking which ones were mapped to that disease; once we had found all of them, we made a new dataset which listed the symptoms for each disease.
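The regrouping described above might look like the following sketch. The row layout (symptom, disease, TF-IDF) follows the description, but the sample rows, the values and the function name are hypothetical. A single pass that groups rows by disease gives the same result as looping over all symptoms once per disease.

```python
from collections import defaultdict

# Hypothetical rows in the shape described above: (symptom, disease, tf_idf).
rows = [
    ("fever", "influenza", 0.42),
    ("cough", "influenza", 0.37),
    ("cough", "bronchitis", 0.51),
    ("fatigue", "influenza", 0.12),
]

def symptoms_by_disease(rows):
    """Group symptom names under the disease they are mapped to."""
    grouped = defaultdict(list)
    for symptom, disease, _tf_idf in rows:
        grouped[disease].append(symptom)
    return dict(grouped)

disease_symptoms = symptoms_by_disease(rows)
print(disease_symptoms)
```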

Once we had created our dataset, it was time to convert it into a form which the machine could understand. The new dataset had the diseases as rows and the symptoms as columns. If a particular disease had a particular symptom, the cell was marked as 1, and if not, it was marked as 0. In the end we had a dataset of dimensions 4921 X 321 (diseases X symptoms).
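As an illustration of this 0/1 encoding, here is a small sketch that turns a disease-to-symptoms mapping into a binary matrix. The function name and the toy data are assumptions; sorting the row and column labels is just one convenient way to fix their order.

```python
def build_binary_matrix(disease_symptoms):
    """Return (diseases, symptoms, matrix) where matrix[i][j] is 1 if
    disease i has symptom j and 0 otherwise."""
    diseases = sorted(disease_symptoms)
    symptoms = sorted({s for ss in disease_symptoms.values() for s in ss})
    column = {symptom: j for j, symptom in enumerate(symptoms)}
    matrix = []
    for disease in diseases:
        row = [0] * len(symptoms)
        for symptom in disease_symptoms[disease]:
            row[column[symptom]] = 1
        matrix.append(row)
    return diseases, symptoms, matrix

# Hypothetical toy data: two diseases, three distinct symptoms.
example = {
    "influenza": ["fever", "cough", "fatigue"],
    "bronchitis": ["cough"],
}
diseases, symptoms, matrix = build_binary_matrix(example)
for disease, row in zip(diseases, matrix):
    print(disease, row)
```

Each row of the matrix is then one training example for the classifiers, with the disease name as its label.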

Recommender System

A recommender system will be used to suggest the doctor who is best suited to treat the patient's specific disease.

A recommender system, or recommendation system (sometimes "system" is replaced with a synonym such as "platform" or "engine"), is a subclass of information filtering systems that seeks to predict the "rating" or "preference" a user would give to an item. Recommender systems generally work with two kinds of information. Characteristic information: this is information about items (categories, keywords, etc.) and users (profiles, preferences, etc.) which describes them. User-item interactions: this is information such as likes, number of purchases, ratings, etc. given by each user. Based on this information, we arrive at the first type of recommender system, content-based, which uses characteristic information, and the second type, collaborative filtering, which relies on user-item interactions. The third type is a blend of both: hybrid systems combine the two kinds of information with the aim of avoiding the problems that arise when working with only one kind.

Collaborative Filtering

These systems use several kinds of techniques to take into account the user's interactions with a particular set of items. A matrix can be used to easily visualise the set of interactions, where each entry (i, j) represents the interaction between user i and item j. An interesting way to understand collaborative filtering is to look at it not as a recommendation problem but as a generalisation of classification and regression. In those cases we aim to predict a variable that depends directly on other variables or features, whereas in collaborative filtering distinctions such as feature variables or class variables do not exist.
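To make the interaction matrix concrete, the following sketch builds one from a handful of hypothetical (user, item, rating) triples. The names, the indices and the use of None for "no interaction yet" are assumptions made for illustration only.

```python
def interaction_matrix(ratings, n_users, n_items):
    """Build a dense user-by-item matrix; None marks 'no interaction'."""
    matrix = [[None] * n_items for _ in range(n_users)]
    for user, item, value in ratings:
        matrix[user][item] = value
    return matrix

# Hypothetical (user index, item index, rating) triples.
ratings = [(0, 1, 8), (1, 0, 6), (1, 2, 9)]
m = interaction_matrix(ratings, n_users=2, n_items=3)
for row in m:
    print(row)
```

The empty cells are what a collaborative filtering method would try to predict, and any column can play the role of the target, which is why there is no fixed split into feature and class variables.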

Data Set

We successfully created a data set of all the doctors in Delhi. Our dataset comprises 3000 doctors, their specialities, degrees and addresses. After transforming the data-set into an Excel file, we used a Google Application Program Interface (API) to find the latitude and longitude of each doctor's address so that it could be plotted on the map and so that we could find the distance of the doctor from the user. We also gave each doctor a random rating (out of 10) and a random cost per meeting.
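Once each address has a latitude and longitude, the distance from the user can be computed directly from the two coordinate pairs. The write-up does not say how the distance was calculated, so the haversine (great-circle) formula below is an assumption, as are the function name and the sample coordinates.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius

def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two points given
    in decimal degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))

# Hypothetical user and doctor coordinates.
user = (28.61, 77.21)
doctor = (28.70, 77.10)
print(round(haversine_km(*user, *doctor), 1), "km")
```

This gives the straight-line distance, not the travel distance by road, which is usually good enough for ranking doctors by proximity.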

Implementation

After finishing the dataset, we created a user interface where the user would enter the speciality of the doctor he needed. Our recommendation algorithm would then filter the results as per the user's requirements. The user could choose whether to filter the doctors by cost, rating or distance. The resulting doctors were then displayed on the map along with a table showing the information about each doctor.
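The filtering step could be sketched as follows. This is an assumed reading of the description: doctors are first restricted to the requested speciality and then ordered by the chosen criterion, with lower cost and distance and higher rating ranked first. The field names, the sample records and the `limit` parameter are hypothetical.

```python
def recommend(doctors, speciality, sort_by, limit=5):
    """Return up to `limit` doctors of the given speciality, best first."""
    # Lower is better for cost and distance; higher is better for rating.
    descending = {"cost": False, "distance_km": False, "rating": True}
    if sort_by not in descending:
        raise ValueError(f"unknown criterion: {sort_by}")
    matches = [d for d in doctors if d["speciality"] == speciality]
    matches.sort(key=lambda d: d[sort_by], reverse=descending[sort_by])
    return matches[:limit]

# Hypothetical records; rating is out of 10, as in the dataset description.
doctors = [
    {"name": "Dr. A", "speciality": "cardiologist", "rating": 7,
     "cost": 800, "distance_km": 4.2},
    {"name": "Dr. B", "speciality": "cardiologist", "rating": 9,
     "cost": 1200, "distance_km": 9.5},
    {"name": "Dr. C", "speciality": "dermatologist", "rating": 8,
     "cost": 500, "distance_km": 1.1},
]

for doctor in recommend(doctors, "cardiologist", "rating"):
    print(doctor["name"], doctor["rating"])
```

The returned list can feed both the table and the map markers, since each record carries the doctor's details alongside the value used for ranking.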