Detecting Phishing Pages Using Machine Learning
ABSTRACT
Malicious Web sites drive much of the growth in Internet criminal activity and constrain the development of Web services. As a result, there is strong motivation to develop systematic solutions that stop users from visiting such Web sites. We propose a learning-based approach to classifying Web sites into two classes: legitimate and malicious. Our mechanism analyses only the Uniform Resource Locator (URL) itself, without accessing the content of the Web site. This eliminates the run-time latency of fetching pages and the risk of exposing users to browser-based vulnerabilities. By employing learning algorithms, our scheme achieves better generality and coverage than blacklisting services.
URLs of the websites are separated into two classes:
Legitimate: safe websites with normal services.
Phishing/Malicious: websites that attempt to flood the user with advertising or other sites, or that are created by attackers to disrupt computer operation, gather sensitive information, or gain access to private computer systems.
EXISTING SYSTEM
A poorly structured neural network (NN) model may underfit the training dataset. On the other hand, restructuring the system excessively to fit every single item in the training dataset may cause it to overfit. One possible way to avoid both problems is to restructure the NN model by tuning some parameters, adding new neurons to the hidden layer, or sometimes adding a new layer to the network. An NN with a small number of hidden neurons may not have enough representational power to model the complexity and diversity inherent in the data, whereas a network with too many hidden neurons could overfit the data. At a certain stage, however, the model can no longer be improved, and the structuring process should be terminated. Hence, an acceptable error rate should be specified when creating any NN model, which is itself a problem because it is difficult to determine the acceptable error rate a priori. For instance, the model designer may set the acceptable error rate to an unreachable value, which causes the model to get stuck in a local minimum, or may set it to a value that could still be improved.
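The termination problem described above is often handled with a patience-based early-stopping rule: stop once the acceptable error rate is reached, or once validation error has stopped improving for a few rounds, so that an unreachable target cannot keep the process running indefinitely. This is a minimal sketch of that swapped-in technique, not the existing system's procedure; `should_stop`, `patience`, and `min_delta` are hypothetical names.

```python
def should_stop(val_errors, target_error, patience=3, min_delta=1e-4):
    """Decide whether to stop training or restructuring an NN model (sketch).

    val_errors: validation error recorded after each round (a training
    epoch or a structural change such as adding hidden neurons).
    Stops when the acceptable error rate is reached, OR when the best
    error of the last `patience` rounds is not at least `min_delta` better
    than the best error before them (guards against an unreachable target).
    """
    if not val_errors:
        return False
    # Acceptable error rate reached.
    if val_errors[-1] <= target_error:
        return True
    # Not enough history yet to judge a plateau.
    if len(val_errors) <= patience:
        return False
    best_before = min(val_errors[:-patience])
    recent_best = min(val_errors[-patience:])
    # Plateau: the recent rounds brought no meaningful improvement.
    return best_before - recent_best < min_delta


if __name__ == "__main__":
    history = [0.30, 0.25, 0.22, 0.22, 0.22, 0.22]
    print(should_stop(history, target_error=0.05))  # plateau, so True
```

The plateau check is what keeps an over-ambitious `target_error` from trapping the process; the acceptable-error check still ends training early when the target is reachable.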
DISADVANTAGES
Loading the entire dataset takes time.
The process is not accurate.
Analysis is slow.
PROPOSED SYSTEM
Lexical features are based on the observation that the URLs of many illegal sites look different from those of legitimate sites. Analyzing lexical features lets us capture this property for classification. We first split a URL into its two parts, the host name and the path, and from each we extract a bag of words (strings delimited by ‘/’, ‘?’, ‘.’, ‘=’, ‘-’ and ‘_’).
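The host/path split and bag-of-words extraction above can be sketched with the Python standard library. This is a minimal illustration, not the original implementation: the function name `url_tokens` is hypothetical, the query string is treated as part of the path, and the underscore is assumed as the last delimiter.

```python
import re
from urllib.parse import urlsplit

# Delimiters listed in the text; '_' is assumed for the final one.
_DELIMS = re.compile(r"[/?.=\-_]+")


def url_tokens(url):
    """Return (host_tokens, path_tokens) for a URL (hypothetical helper)."""
    # urlsplit only recognises the host when a scheme is present.
    if "://" not in url:
        url = "http://" + url
    parts = urlsplit(url)
    host = parts.hostname or ""  # lower-cased, port removed
    # Treat the query string as part of the path component.
    path = parts.path + ("?" + parts.query if parts.query else "")
    host_tokens = [t for t in _DELIMS.split(host) if t]
    path_tokens = [t for t in _DELIMS.split(path) if t]
    return host_tokens, path_tokens


if __name__ == "__main__":
    print(url_tokens("http://secure-login.example.com/account/update_info.php?id=42"))
```

For the URL in the usage line, the host yields `secure`, `login`, `example`, `com` and the path yields `account`, `update`, `info`, `php`, `id`, `42`.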
We find that phishing websites tend to have longer URLs, more levels (delimited by dots), more tokens in the domain and path, and longer tokens. In addition, phishing and malware websites may pretend to be benign by containing popular brand names as tokens outside the second-level domain. Phishing and malware websites may also use an IP address directly in place of a domain name to disguise a suspicious URL, which is very rare for benign sites. Finally, phishing URLs are found to contain several suggestive word tokens (confirm, account, banking, secure, ebayisapi, webscr, login, signin); we check for the presence of each of these security-sensitive words and include the resulting binary value in our features.
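The lexical features described above can be gathered into one dictionary per URL. This is a minimal sketch under stated assumptions: `lexical_features` and its key names are hypothetical, the brand list is a tiny made-up example, the second-level domain is taken naively as the label before the last one (multi-part suffixes such as .co.uk are not handled), and the underscore delimiter is assumed.

```python
import ipaddress
import re
from urllib.parse import urlsplit

# Suggestive tokens listed in the text.
SENSITIVE_WORDS = ("confirm", "account", "banking", "secure",
                   "ebayisapi", "webscr", "login", "signin")
# Hypothetical, deliberately tiny brand list for illustration only.
BRANDS = ("paypal", "ebay", "google", "apple")
# Delimiters from the text; '_' is assumed for the last one.
_DELIMS = re.compile(r"[/?.=\-_]+")


def _split(text):
    return [t.lower() for t in _DELIMS.split(text) if t]


def lexical_features(url):
    raw = url
    if "://" not in url:
        url = "http://" + url  # urlsplit only finds the host after a scheme
    parts = urlsplit(url)
    host = parts.hostname or ""
    path = parts.path + ("?" + parts.query if parts.query else "")
    host_tokens, path_tokens = _split(host), _split(path)
    tokens = host_tokens + path_tokens

    # Is the host a literal IPv4/IPv6 address?
    try:
        ipaddress.ip_address(host)
        has_ip = 1
    except ValueError:
        has_ip = 0

    # Naive second-level domain: the label just before the last one.
    labels = host.split(".")
    sld_index = len(labels) - 2 if len(labels) >= 2 and not has_ip else None
    # Tokens from every host label except the SLD, plus the path tokens.
    non_sld_tokens = path_tokens + [
        t for i, lab in enumerate(labels) if i != sld_index for t in _split(lab)
    ]

    features = {
        "url_length": len(raw),
        "host_dots": host.count("."),
        "host_token_count": len(host_tokens),
        "path_token_count": len(path_tokens),
        "longest_token_len": max((len(t) for t in tokens), default=0),
        "has_ip": has_ip,
        "brand_outside_sld": int(any(b in non_sld_tokens for b in BRANDS)),
    }
    # One binary feature per security-sensitive word.
    for word in SENSITIVE_WORDS:
        features[f"has_{word}"] = int(word in tokens)
    return features


if __name__ == "__main__":
    print(lexical_features("http://paypal.com.secure-update.example.net/webscr/login.php"))
```

In the usage URL, `paypal` sits outside the second-level domain `example`, so `brand_outside_sld` is 1, which is exactly the impersonation pattern the paragraph describes.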
Intuitively, malicious sites are generally less popular than benign ones, so site popularity can be considered an important feature. The traffic rank feature is acquired from Alexa.com. Host-based features are based on the observation that malicious sites tend to be registered with less reputable hosting centres or in less reputable regions.
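A popularity feature can be derived by bucketing a traffic rank. This sketch assumes the ternary encoding and the 100,000 cut-off used by the public phishing-websites feature set (1 legitimate, 0 suspicious, -1 phishing); the actual threshold used here is not stated. The rank lookup is abstracted away (Alexa's ranking service has since been retired), and `EXAMPLE_RANKS` and `rank_for` are hypothetical stand-ins with made-up values.

```python
from typing import Optional


def web_traffic_feature(rank: Optional[int], threshold: int = 100_000) -> int:
    """Map a traffic rank to a ternary feature value (assumed encoding).

    1  -> popular site (rank better than the threshold)
    0  -> ranked, but at or beyond the threshold (suspicious)
    -1 -> no rank at all (treated as phishing)
    """
    if rank is None:
        return -1
    return 1 if rank < threshold else 0


# Hypothetical stand-in for a real traffic-rank source; values are made up.
EXAMPLE_RANKS = {"example.com": 1_200, "rarely-visited.example": 850_000}


def rank_for(domain: str) -> Optional[int]:
    return EXAMPLE_RANKS.get(domain)


if __name__ == "__main__":
    for d in ("example.com", "rarely-visited.example", "never-seen.example"):
        print(d, web_traffic_feature(rank_for(d)))
```

Keeping the lookup behind `rank_for` means a different popularity source can be swapped in without touching the bucketing logic.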
ADVANTAGES
All URLs in the dataset are labelled.
Accuracy is high.
We used two supervised learning algorithms, including random forest, trained with the scikit-learn library.
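A minimal training sketch with scikit-learn's `RandomForestClassifier`, assuming each URL has already been turned into a row of numeric feature values and labelled 1 (legitimate) or -1 (phishing). The toy data is randomly generated purely so the example runs; it is not the project's dataset, and the hyperparameters (`n_estimators=100`, a 75/25 stratified split) are illustrative choices. `make_toy_data` and `train_random_forest` are hypothetical helper names.

```python
import random

from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split


def make_toy_data(n_rows=200, n_features=5, seed=0):
    """Synthetic rows with values in {-1, 0, 1} (NOT real data).

    Toy labelling rule: a row is phishing (-1) if either of the first two
    features is -1, otherwise legitimate (1).
    """
    rng = random.Random(seed)
    X, y = [], []
    for _ in range(n_rows):
        row = [rng.choice((-1, 0, 1)) for _ in range(n_features)]
        X.append(row)
        y.append(-1 if -1 in row[:2] else 1)
    return X, y


def train_random_forest(X, y, seed=0):
    """Hold out 25% of rows, fit a random forest, return (model, test accuracy)."""
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.25, random_state=seed, stratify=y
    )
    clf = RandomForestClassifier(n_estimators=100, random_state=seed)
    clf.fit(X_train, y_train)
    return clf, clf.score(X_test, y_test)


if __name__ == "__main__":
    X, y = make_toy_data()
    model, acc = train_random_forest(X, y)
    print(f"held-out accuracy on toy data: {acc:.2f}")
    print(model.predict([[1, 1, 0, -1, 1], [-1, 1, 1, 1, 1]]))
```

The accuracy printed here only reflects the synthetic rule, not the performance of the described system.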
ARCHITECTURE:
FEATURES:
having_IP_Address
URL_Length
Shortining_Service
having_At_Symbol
double_slash_redirecting
Prefix_Suffix
having_Sub_Domain
SSLfinal_State
Domain_registeration_length
Favicon
port
HTTPS_token
Request_URL
URL_of_Anchor
Links_in_tags
SFH
Submitting_to_email
Abnormal_URL
Redirect
on_mouseover
RightClick
popUpWidnow
Iframe
age_of_domain
DNSRecord
web_traffic
Page_Rank
Google_Index
Links_pointing_to_page
Statistical_report
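Several of the listed features can be computed from the URL string alone. The sketch below covers six of them, assuming the ternary encoding (1 legitimate, 0 suspicious, -1 phishing) and the cut-offs commonly used with the public phishing-websites dataset that shares these column names; treat the exact thresholds as assumptions rather than this project's confirmed rules. `url_rule_features` is a hypothetical name, and content-based features such as Favicon, Iframe, or RightClick need the page itself and are not shown.

```python
import ipaddress
from urllib.parse import urlsplit

# Assumed encoding: 1 = legitimate, 0 = suspicious, -1 = phishing.


def _host(url):
    return urlsplit(url if "://" in url else "http://" + url).hostname or ""


def _is_ip(host):
    try:
        ipaddress.ip_address(host)
        return True
    except ValueError:
        return False


def url_rule_features(url):
    host = _host(url)
    n = len(url)
    # Drop a leading "www." before counting sub-domain dots.
    sub_host = host[4:] if host.startswith("www.") else host
    dots = sub_host.count(".")
    return {
        "having_IP_Address": -1 if _is_ip(host) else 1,
        # Length thresholds 54 / 75 are assumed.
        "URL_Length": 1 if n < 54 else (0 if n <= 75 else -1),
        "having_At_Symbol": -1 if "@" in url else 1,
        # A '//' beyond the scheme's own (0-based index > 6) suggests a redirect.
        "double_slash_redirecting": -1 if url.rfind("//") > 6 else 1,
        "Prefix_Suffix": -1 if "-" in host else 1,
        # Simplified: country-code suffixes such as .co.uk are not stripped.
        "having_Sub_Domain": 1 if dots <= 1 else (0 if dots == 2 else -1),
    }


if __name__ == "__main__":
    print(url_rule_features("http://192.168.0.1/secure-login@update//paypal"))
```

The returned keys match the column names in the list, so a row built this way can be merged with the other features before training.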