Identification of significant features and data mining techniques in predicting heart stroke
Abstract
• Heart disease is one of the most common diseases.
• Because the disease is now so common, we use a set of attributes closely related to heart disease to find a better prediction method, and we apply algorithms to make the prediction.
• Logistic regression is applied to a dataset of risk factors.
• Heart disease is predicted from the dataset attributes.
• The results give the accuracy of the logistic regression algorithm on the dataset we used.
• We also report the coefficients and the confusion matrix for the regression output.
• Logistic regression (LR) is the appropriate regression analysis to conduct when the dependent variable is dichotomous (binary). Like all regression analyses, logistic regression is a predictive analysis.
• Logistic regression is used to describe data and to explain the relationship between one dependent binary variable and one or more nominal, ordinal, interval or ratio-level independent variables.
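The confusion matrix and accuracy mentioned above can be sketched as follows. This is a minimal illustration in plain Python, assuming the two classes are coded 0 and 1; the function names and the sample labels are hypothetical and are not taken from the study's dataset.

```python
def confusion_matrix(y_true, y_pred):
    """Count (tn, fp, fn, tp) for binary labels coded 0/1."""
    tn = fp = fn = tp = 0
    for actual, predicted in zip(y_true, y_pred):
        if actual == 1 and predicted == 1:
            tp += 1          # disease present, predicted present
        elif actual == 1 and predicted == 0:
            fn += 1          # disease present, predicted absent
        elif actual == 0 and predicted == 1:
            fp += 1          # disease absent, predicted present
        else:
            tn += 1          # disease absent, predicted absent
    return tn, fp, fn, tp


def accuracy(y_true, y_pred):
    """Fraction of records whose predicted label matches the actual label."""
    tn, fp, fn, tp = confusion_matrix(y_true, y_pred)
    return (tp + tn) / (tn + fp + fn + tp)


# Made-up labels for illustration only (assumption, not real results).
y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_pred = [1, 0, 0, 1, 0, 1, 1, 0]

print(confusion_matrix(y_true, y_pred))
print(accuracy(y_true, y_pred))
```

Accuracy alone can hide an imbalance between false positives and false negatives, which is why the confusion matrix is reported alongside it.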
DOMAIN INTRODUCTION
Data mining is the computing process of discovering patterns in large datasets involving methods at the intersection of machine learning, statistics and database systems. The overall goal of the data mining process is to extract information from a data set and transform it into an understandable structure for further use. Data mining is the analysis step of the "knowledge discovery in databases" process, or KDD.
Data mining is about finding new information in large amounts of data. The information obtained from data mining is ideally both new and useful.
Data mining is the process of extracting previously unknown information from large databases or data warehouses and using it to make crucial business decisions. Data mining tools find patterns in the data and infer rules from them. The extracted information can be used to form a prediction or classification model, identify relations between database records, or provide a summary of the database(s) being mined. Those patterns and rules can be used to guide decision-making and forecast the effect of those decisions, and data mining can speed analysis by focusing attention on the most important variables.
Data mining is taking off for several reasons: organizations are gathering more data about their businesses, storage costs have dropped enormously, competitive business pressures are growing, organizations want to leverage existing information technology investments, and the cost/performance ratio of computer systems has fallen dramatically. Another reason is the rise of data warehousing. In the past, it was often necessary to gather the data, cleanse it, and merge it. Now, in many cases, the data is already sitting in a data warehouse ready to be used.
There are four basic mining operations supported by numerous mining techniques: predictive model creation supported by supervised induction techniques; link analysis supported by association discovery and sequence discovery techniques; database segmentation supported by clustering techniques; and deviation detection supported by statistical techniques.
In sequences, events are linked over time. Classification is probably the most common data mining activity today. It recognizes patterns that describe the group to which an item belongs. It does this by examining existing items that have already been classified and inferring a set of rules from them. Clustering is related to classification, but differs in that no groups have yet been defined. Using clustering, the data mining tool discovers different groupings within the data. All of these applications may involve predictions. Another application type, forecasting, is a different form of prediction. It estimates the future value of continuous variables based on patterns within the data. A number of tools are used in data mining. These include, but are not limited to, neural networks, decision trees, rule induction, factor analysis, genetic algorithms, and data visualization.
Using the described data mining tools, an organization can access and analyse the 10 percent of its information that is structured. To access the rest, a different technique is required – document mining.
EXISTING SYSTEM
• Mohammed Abdul Khaleel presented a survey of data mining techniques on medical data for finding locally frequent diseases.
• This paper focuses on analyzing the data mining techniques required for medical data mining, especially for discovering locally frequent diseases such as heart ailments, lung cancer, and breast cancer. Data mining is the process of extracting information to find latent patterns.
• Vembandasamy et al. carried out a study to analyze and detect heart disease. The algorithm used was Naive Bayes, which is based on Bayes' theorem and makes a strong assumption that the attributes are independent. The dataset was obtained from a leading diabetic research institute in Chennai, Tamil Nadu, and contains more than 500 patients. The tool used was Weka, and classification was run with a 70% percentage split. The accuracy achieved by Naive Bayes was 86.419%.
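The 70% percentage split mentioned above can be illustrated with a short sketch. This is only an approximation of the idea in plain Python, not Weka's implementation; the function name, the seed, and the sample records are assumptions made for the example.

```python
import random


def percentage_split(records, train_fraction=0.7, seed=0):
    """Shuffle the records and split them into training and test parts.

    train_fraction=0.7 mirrors the 70% split described in the text;
    the fixed seed is an assumption that keeps the split repeatable.
    """
    shuffled = list(records)                 # copy, so the input is not modified
    random.Random(seed).shuffle(shuffled)
    cut = int(round(len(shuffled) * train_fraction))
    return shuffled[:cut], shuffled[cut:]


# Ten placeholder record ids stand in for patient rows (hypothetical data).
records = list(range(10))
train, test = percentage_split(records)
print(len(train), len(test))
```

The classifier is fitted on the training part only, and the reported accuracy is measured on the held-out test part.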
DISADVANTAGES
Ø Incorrect classification results.
Ø Low classification accuracy.
PROPOSED SYSTEM
Ø The proposed model is introduced to overcome the disadvantages that arise in the existing system.
Ø We use logistic regression to predict stroke for the individuals in a dataset.
Ø The coefficients (Beta values b) of the logistic regression algorithm must be estimated from the training data. This is done using maximum-likelihood estimation.
Ø Maximum-likelihood estimation is a common estimation method used by a variety of machine learning algorithms, although it does make assumptions about the distribution of the data.
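The estimation step described above can be sketched as follows. This is a minimal illustration, assuming plain gradient ascent on the log-likelihood as the optimizer (the source does not say which optimizer is used); the function names, learning rate, iteration count, and toy data are all hypothetical.

```python
import math


def sigmoid(z):
    """Logistic function, written to avoid overflow for large |z|."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


def fit_logistic(X, y, lr=0.1, epochs=2000):
    """Estimate coefficients b (b[0] is the intercept) by maximizing the
    log-likelihood with batch gradient ascent."""
    n, n_features = len(X), len(X[0])
    b = [0.0] * (n_features + 1)
    for _ in range(epochs):
        grad = [0.0] * (n_features + 1)
        for xi, yi in zip(X, y):
            p = sigmoid(b[0] + sum(bj * xj for bj, xj in zip(b[1:], xi)))
            err = yi - p                     # gradient of the log-likelihood
            grad[0] += err
            for j, xj in enumerate(xi):
                grad[j + 1] += err * xj
        b = [bj + lr * gj / n for bj, gj in zip(b, grad)]
    return b


def predict(b, xi, threshold=0.5):
    """Return 1 if the estimated probability reaches the threshold, else 0."""
    p = sigmoid(b[0] + sum(bj * xj for bj, xj in zip(b[1:], xi)))
    return 1 if p >= threshold else 0


# Toy data: one made-up risk score per record (assumption, not real data).
X = [[0.5], [1.0], [1.5], [2.0], [3.0], [3.5], [4.0], [4.5]]
y = [0, 0, 0, 1, 0, 1, 1, 1]

b = fit_logistic(X, y)
print(b)
print(predict(b, [0.5]), predict(b, [4.5]))
```

In practice a library routine would normally replace the hand-written loop; the sketch only shows how the coefficients b follow from the training data.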
ADVANTAGES
• High performance.
• Provides accurate prediction results.