I’m a Data Scientist/Analyst with a background in Statistics and Agronomy. My current work focuses on cleaning data and integrating databases with customers' ERP systems. I am also interested in artificial intelligence, particularly NLP. When I am not working on that, I enjoy riding my bicycle, watching movies, and reading good books.
Provided exploratory data analysis (EDA) services to customers prior to database integration. The highest volume I have handled was 50k packages shipped in one week after EDA. Delivered BI analysis for customers who had questions about discrepancies between their invoices and the shipping charges from the shipping system. Supported UPS's internal and external IT needs throughout Southern Arizona. Work well under tight deadlines and manage time effectively.
Assigned to and supervised the quality and operations departments. Made data-driven decisions and helped start up the company within 2 months of deployment. Hired the entire factory staff and workers with help from the HR and Industrial Engineering departments.
My main goal as a trainee, as set by the CEO, was to acquire industrial certificates such as 5S, TPM, TQM, and Six Sigma, and to perform BI analysis for the operations and QC departments.
Collaborated with PI Dr. Hwu on watermelon and sorghum species identification using genetic markers, and performed design of experiments (DOE) analysis on the results.
Performed data cleaning on a messy bank loan marketing dataset.
Demonstrated methods to reveal missing values and outliers and
how to deal with them. Visualized characteristics of the end results.
• Skills/Tools: Google Colab, Pandas, Matplotlib, Seaborn, Numpy
• Picture from Freepik
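As a rough illustration of the kind of checks described above (not the project's actual code), the sketch below counts missing values, fills them with the column median, and flags outliers with the 1.5 × IQR rule. The toy data, the column names `age` and `balance`, and the helper `iqr_outlier_mask` are all hypothetical; median imputation is only one of several reasonable choices.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data standing in for the bank loan marketing dataset.
df = pd.DataFrame({
    "age": [25, 32, np.nan, 41, 38, 29, 35, 120],
    "balance": [1200.0, 850.0, 430.0, np.nan, 990.0, 1100.0, 760.0, 910.0],
})

# Reveal missing values: count NaNs per column.
missing_counts = df.isna().sum()

# One simple way to deal with them: fill with the column median.
cleaned = df.fillna(df.median(numeric_only=True))


def iqr_outlier_mask(series, k=1.5):
    """Return True for values outside [Q1 - k*IQR, Q3 + k*IQR]."""
    q1, q3 = series.quantile(0.25), series.quantile(0.75)
    iqr = q3 - q1
    return (series < q1 - k * iqr) | (series > q3 + k * iqr)


# Flag suspicious ages; whether to drop, cap, or keep them is a judgment call.
age_outliers = iqr_outlier_mask(cleaned["age"])
print(missing_counts)
print(cleaned[age_outliers])
```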
Performed data cleaning on the Pima Indian diabetes dataset. Some
values in the dataset are invalid for a living test subject,
namely a blood pressure of 0 and a skin thickness of 0. We attempt to explain how
these values arose and replace them with reasonable values. After that we use boxplots
and other visualization tools to review the characteristics of the dataset.
• Skills/Tools: Seaborn, Pandas, IQR, boxplot, univariate and multivariate plots, Z-Score
• Picture from Freepik
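A minimal sketch of one way to handle the impossible zeros mentioned above, assuming they are placeholders for missing measurements: treat them as missing, impute the column median, then compute Z-scores. The sample values and the column names `BloodPressure` and `SkinThickness` are assumptions for illustration, and the project itself may have chosen different replacement values.

```python
import numpy as np
import pandas as pd

# Hypothetical sample rows; column names are assumed, not taken from the project.
df = pd.DataFrame({
    "BloodPressure": [72, 66, 0, 64, 40, 74, 0, 70],
    "SkinThickness": [35, 29, 0, 23, 35, 0, 32, 45],
})
cols = ["BloodPressure", "SkinThickness"]

# A reading of 0 is not physiologically plausible, so treat it as missing.
with_nan = df[cols].replace(0, np.nan)

# Impute each column with its median, computed from the valid readings only.
cleaned = with_nan.fillna(with_nan.median())

# Z-scores show how far each value sits from the column mean, in standard deviations.
z_scores = (cleaned - cleaned.mean()) / cleaned.std()
print(cleaned)
print(z_scores.round(2))
```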
There's a saying that a picture is worth a thousand words. In this section we
use visualization tools to display the data, paint a picture
of it, and let the data tell us the stories of these women.
• Skills/Tools: Tableau, Google Data Studio
• Picture from Freepik
Performed diabetes classification on patients from Pima County.
Utilized several classification algorithms to classify diabetes
patients. Discussed ways of handling a small, imbalanced dataset.
Discussed data leakage and the reasoning behind up/down sampling.
• Skills/Tools: Pandas, Numpy, Matplotlib, Seaborn,
Pandas Profiling, LogisticRegression, RandomForestClassifier,
XGBClassifier, KNeighborsClassifier, GridSearchCV
• Picture from Freepik
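To illustrate the data leakage point above, here is a hedged sketch (synthetic data, hypothetical feature names `glucose` and `bmi`, not the project's code) that splits first and upsamples only the training set. Resampling before the split would let copies of the same row land in both train and test, which inflates the test score.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.utils import resample

# Synthetic, imbalanced stand-in data; not the real diabetes dataset.
rng = np.random.default_rng(0)
n = 200
X = pd.DataFrame({
    "glucose": rng.normal(120, 30, n),
    "bmi": rng.normal(32, 6, n),
})
y = pd.Series((X["glucose"] + rng.normal(0, 20, n) > 140).astype(int), name="outcome")

# Split BEFORE resampling so the test set never sees duplicated training rows.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0
)

# Upsample the minority class within the training set only.
train = X_train.assign(outcome=y_train)
majority_label = train["outcome"].value_counts().idxmax()
majority = train[train["outcome"] == majority_label]
minority = train[train["outcome"] != majority_label]
minority_up = resample(
    minority, replace=True, n_samples=len(majority), random_state=0
)
train_balanced = pd.concat([majority, minority_up])

# Fit on the balanced training data, evaluate on the untouched test set.
model = LogisticRegression(max_iter=1000)
model.fit(train_balanced[["glucose", "bmi"]], train_balanced["outcome"])
test_accuracy = model.score(X_test, y_test)
print(f"Test accuracy: {test_accuracy:.2f}")
```

Downsampling follows the same pattern, with the majority class resampled to the minority size using `replace=False`.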
Stay tuned: I am researching ideas and looking for good material to present.
• Skills/Tools: NA
• Picture from Freepik
GPA: 4.0
GPA: 4.0
GPA: 3.0
Apart from being a data scientist/analyst, I enjoy spending most of my time outdoors. I enjoy riding my bicycle on the Loop (Tucson's bike path), watching movies, and reading good books.
When forced indoors, I spend much of my free time exploring the latest technology advancements in the data science world.