
Kmeans clustering with fish dataset

Posted by Myla RamReddy
Categories Uncategorized
Date January 17, 2018

# Step 1: Understand data

http://ww2.amstat.org/publications/jse/datasets/fishcatch.txt

https://drive.google.com/open?id=1P2YzTua5ZMEAdxnMbfDwv19VSKY4F8ZI

# Step 2: Load data

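# Import modules
import pandas as pd
import numpy as np

# Read the dataset (fish.csv is expected in the working directory)
df = pd.read_csv("fish.csv")

# Target: species names as a NumPy array
y = df['Species'].values
print(type(y))

# Features: the columns at positions 1 to 6, as a NumPy array
X = df[df.columns[[1, 2, 3, 4, 5, 6]]].values
print(type(X))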
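
Selecting features by position works only as long as the CSV keeps its column order. A minimal sketch of a name-based alternative follows. The stand-in frame and every column name other than `Species` are hypothetical, since the actual headers of `fish.csv` are not shown in this post; it also assumes that every non-species column is a feature.

```python
import pandas as pd

# Hypothetical stand-in for fish.csv: only the 'Species' column name comes
# from the post; the measurement column names and values are made up.
df = pd.DataFrame({
    "Species": ["Bream", "Bream", "Roach", "Pike"],
    "Weight": [242.0, 290.0, 40.0, 200.0],
    "Length1": [23.2, 24.0, 12.9, 30.0],
    "Height": [11.5, 12.5, 3.5, 5.6],
})

# Target labels as a NumPy array
y = df["Species"].to_numpy()

# Every column except the species label becomes a feature
X = df.drop(columns="Species").to_numpy()

# Quick sanity checks before clustering
print(X.shape, y.shape)
print(df["Species"].value_counts())
```

Checking the shapes and the per-species counts at this point makes it easier to interpret the cross-tabulation produced in Step 3.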

# Step 3: Work with StandardScaler and KMeans

# import modules
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.cluster import KMeans

# Create scaler: scaler
scaler = StandardScaler()

# Create KMeans instance: kmeans
kmeans = KMeans(n_clusters=4)

# Create pipeline: pipeline
pipeline = make_pipeline(scaler, kmeans)

# Fit the pipeline to samples
pipeline.fit(X)

# Calculate the cluster labels: labels
labels = pipeline.predict(X)
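
To see what the pipeline does at this point, the sketch below runs the same two steps by hand and compares the results. It uses synthetic data rather than the fish measurements, and fixes `random_state` and `n_init` (not set in the post) only to make the comparison reproducible.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in data (not the fish dataset): two features on very
# different scales.
rng = np.random.default_rng(0)
X = np.column_stack([
    rng.normal(500.0, 200.0, size=60),  # large-scale feature
    rng.normal(10.0, 2.0, size=60),     # small-scale feature
])

# Pipeline: standardize, then cluster
pipeline = make_pipeline(
    StandardScaler(),
    KMeans(n_clusters=4, n_init=10, random_state=0),
)
pipeline.fit(X)
pipeline_labels = pipeline.predict(X)

# The same two steps performed by hand
scaler = StandardScaler()
X_scaled = scaler.fit_transform(X)  # each column now has mean 0, std 1
kmeans = KMeans(n_clusters=4, n_init=10, random_state=0)
kmeans.fit(X_scaled)
manual_labels = kmeans.predict(scaler.transform(X))

# Both routes should assign every sample to the same cluster
print(np.array_equal(pipeline_labels, manual_labels))
```

Standardizing first keeps a large-valued feature from dominating the Euclidean distances that k-means relies on.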

# Create a DataFrame with labels and species as columns: df1
df1 = pd.DataFrame({'labels': labels, 'species': y})

# Create crosstab: ct
ct = pd.crosstab(df1['labels'], df1['species'])

# Display ct
print(ct)
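
One way to read the cross-tabulation is to ask which species dominates each cluster and what fraction of fish fall into their cluster's dominant species (often called purity). The following is a small sketch on hypothetical labels, not results from the fish dataset:

```python
import pandas as pd

# Hypothetical cluster labels and species for eight fish (illustration only)
df1 = pd.DataFrame({
    "labels": [0, 0, 0, 1, 1, 2, 2, 2],
    "species": ["Bream", "Bream", "Roach", "Pike", "Pike",
                "Roach", "Roach", "Smelt"],
})

# Rows are clusters, columns are species, cells are counts
ct = pd.crosstab(df1["labels"], df1["species"])

# Most common species in each cluster
dominant = ct.idxmax(axis=1)

# Purity: share of samples that belong to their cluster's dominant species
purity = ct.max(axis=1).sum() / ct.to_numpy().sum()

print(ct)
print(dominant)
print(f"purity = {purity:.2f}")
```

A purity close to 1 means each cluster is made up almost entirely of one species; lower values mean the clusters mix species.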

