Rafi Rahman
Data Scientist
With a foundation in Mechanical Engineering and experience as a Data Recruitment Consultant, I bring a distinctive blend of skills to data science. I combine a deep understanding of business needs with a passion for using data to drive growth and innovation.
I am dedicated to helping companies extract valuable insights from their data, enabling key stakeholders to make informed decisions that propel success. Whether it's developing predictive models to unravel market dynamics, identifying growth opportunities or optimising resource allocation, I am committed to delivering impactful solutions.
Click a project's icon to open its GitHub repository.
In this project, I focused on identifying the top 10 countries globally with the least healthcare access, using data from GapMinder. The context was framed around the mission of a hypothetical company, "Doctors Go Anywhere," aiming to pinpoint regions that could benefit from enhanced healthcare support.
Not only did I aim to determine these areas, but I also delved into the correlation between the financial investments made by local governments and their respective rankings in healthcare access. Additionally, I investigated how life expectancy correlates with a country's position in these rankings, providing a comprehensive analysis of the complex interplay between healthcare access, government investment, and overall well-being.
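As a rough illustration of the correlation step described above, the sketch below computes a rank correlation between a healthcare-access ranking and two other indicators. The choice of Spearman's coefficient, the helper names (`ranks`, `pearson`, `spearman`), and all of the numbers are assumptions made for illustration; they are not GapMinder data and may not match the method used in the project.

```python
from statistics import mean


def ranks(values):
    """Return 1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = avg_rank
        i = j + 1
    return result


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    return cov / (var_x * var_y) ** 0.5


def spearman(x, y):
    """Spearman rank correlation: Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))


# Made-up values for six hypothetical countries (NOT GapMinder data).
access_rank = [1, 2, 3, 4, 5, 6]             # 1 = least healthcare access
health_spend = [20, 35, 30, 60, 80, 150]     # hypothetical government spend per capita
life_expectancy = [52, 55, 58, 61, 66, 70]   # hypothetical years

rho_spend = spearman(access_rank, health_spend)
rho_life = spearman(access_rank, life_expectancy)
print(f"access rank vs spend: {rho_spend:.3f}")
print(f"access rank vs life expectancy: {rho_life:.3f}")
```

A rank-based coefficient is used here because the quantity of interest is a ranking; a coefficient near 1 or -1 indicates a strong monotonic relationship, not causation.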
In this project, I tackled the challenge of predicting housing sale prices for a new development project set to redesign and modernize Ames. With the goal of comprehensively understanding the cost implications associated with land acquisition and ensuring fair compensation for property owners, I employed regression modelling techniques.
The process began with research into the factors that influence real estate development costs, which underlined the significant investment and planning that large-scale projects require. Using a Kaggle dataset with features such as square footage, year built, and overall quality, I created a baseline Linear Regression model. Pre-processing steps, including polynomial transformation and standard scaling, helped refine the subsequent models. Ultimately, a Ridge regression model with built-in cross-validation (RidgeCV) performed best at predicting housing prices. The project concluded with a predicted cost of $157,135,272.02 and an estimated variation of $17,141,656.37. Recommendations for future enhancements included incorporating more details about the houses, exploring neighbourhood and house style influences, and investigating the impact of remodelling and additional constructions on the model.
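The modelling steps named above (polynomial transformation, standard scaling, RidgeCV) can be sketched as a single scikit-learn pipeline. This is a minimal sketch on synthetic data, not the project's code: the column names, the price formula, the polynomial degree, and the alpha grid are all assumptions chosen for illustration.

```python
import numpy as np
from sklearn.linear_model import RidgeCV
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures, StandardScaler

# Synthetic stand-in for the housing data (assumed, not the Kaggle dataset).
rng = np.random.default_rng(0)
n = 400
sqft = rng.uniform(600, 3500, n)
year_built = rng.integers(1900, 2011, n)
overall_qual = rng.integers(1, 11, n)
X = np.column_stack([sqft, year_built, overall_qual])
price = (
    50 * sqft
    + 300 * (year_built - 1900)
    + 8000 * overall_qual
    + 4 * sqft * overall_qual          # interaction a degree-2 expansion can capture
    + rng.normal(0, 10_000, n)         # noise
)

X_train, X_test, y_train, y_test = train_test_split(
    X, price, test_size=0.25, random_state=0
)

# Expand features, scale them, then fit Ridge with cross-validated alpha.
model = make_pipeline(
    PolynomialFeatures(degree=2, include_bias=False),
    StandardScaler(),
    RidgeCV(alphas=np.logspace(-3, 3, 13)),
)
model.fit(X_train, y_train)

test_r2 = model.score(X_test, y_test)
chosen_alpha = model.named_steps["ridgecv"].alpha_
# Assumption: an aggregate cost figure could be formed by summing predictions.
total_predicted = model.predict(X_test).sum()

print(f"test R^2: {test_r2:.3f}, alpha: {chosen_alpha:.3g}")
print(f"sum of predicted prices: {total_predicted:,.2f}")
```

Putting the transformations inside the pipeline means they are fitted on the training split only, which avoids leaking test-set statistics into the scaler.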
Framed as a hypothetical collaboration with the Scammer Payback YouTube channel, this project addressed online scams using Reddit post data assembled for the scenario. The primary objective was to develop two models. The first categorized posts as 'Scams' or 'Not Scams,' providing a foundational filtering layer. The second extended this by classifying scams into the more specific categories of 'Phishing' or 'Malware,' giving a more complete approach to tackling deceptive tactics online.
The initial filtering model, a baseline Logistic Regression, achieved 99.7% accuracy on the test data when distinguishing 'Scam' from 'Not Scam' posts. For the Phishing and Malware categorization, a comparative analysis showed that a Random Forest model outperformed its counterparts. The two categories are difficult to separate: their terminology overlaps, and the posts are mostly written by people describing their experiences rather than by scammers themselves. Despite these challenges, Model 4 took a balanced approach, avoiding strong bias toward one category over the other. Selecting it reflects the nuanced nature of scam detection and the importance of striking a balance when categorizing deceptive tactics within this hypothetical context.
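The two-stage design described above can be sketched as two independent text classifiers chained by a small function. This is only an illustrative sketch: the TF-IDF vectorizer, the hyperparameters, the `classify` helper, and the toy posts are assumptions, and a dataset this small says nothing about real accuracy.

```python
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# Toy posts written for illustration (NOT the project's Reddit data).
posts = [
    "got an email asking me to verify my bank password through a link",
    "fake login page stole my account credentials after I clicked",
    "text message said my parcel was held and asked for card details",
    "downloaded a free game and now a trojan is encrypting my files",
    "popup told me to install antivirus and it was actually ransomware",
    "cracked software installed a keylogger virus on my laptop",
    "what is the best recipe for sourdough bread at home",
    "looking for hiking trail recommendations near the lake",
    "my cat keeps knocking plants off the shelf any tips",
]
is_scam = ["scam"] * 6 + ["not scam"] * 3
scam_type = ["phishing"] * 3 + ["malware"] * 3   # labels for the six scam posts

# Stage 1: filter scam vs not scam with a Logistic Regression baseline.
stage1 = make_pipeline(TfidfVectorizer(), LogisticRegression(max_iter=1000))
stage1.fit(posts, is_scam)

# Stage 2: trained on scam posts only, separates phishing from malware.
stage2 = make_pipeline(
    TfidfVectorizer(),
    RandomForestClassifier(n_estimators=100, random_state=0),
)
stage2.fit(posts[:6], scam_type)


def classify(post):
    """Return 'not scam', 'phishing', or 'malware' for a single post."""
    if stage1.predict([post])[0] == "not scam":
        return "not scam"
    return stage2.predict([post])[0]


print(classify("an email link asked for my password"))
```

Training the second stage only on posts labelled as scams keeps each model focused on one decision, at the cost of passing any first-stage mistakes downstream.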
In this speculative project, FF Consulting collaborated with the Department of Health and Human Services to analyze the Covid-19 pandemic's impact at state and county levels. Our goal was to enhance pandemic response strategies and formulate a resiliency plan for potential future outbreaks, utilizing machine learning models. Drawing insights from diverse data sources encompassing demographics, health conditions, policies, and socio-economic factors, our models explored the complex influences on Covid-19 outcomes.
Despite the fictional nature of the scenario, the 2023 analysis uncovered intriguing findings, emphasizing the role of inequality in shaping pandemic outcomes. Metrics such as racial demographics, housing problems, and access to clean water demonstrated substantial relationships with Covid-19 deaths. The hypothetical recommendation for future pandemic preparedness centers on addressing socio-economic disparities, fostering trust in public health measures, enhancing healthcare access, and boosting vaccination rates. As part of our imaginary continuous improvement strategy, we suggest further exploration of feature engineering steps and data collection, including combining similar variables and utilizing PCA for dimensionality reduction. Hypothetical future models could delve into time-series analyses and clustering methodologies for a more comprehensive understanding of the evolving pandemic landscape.
In response to the intricate dynamics of the used car market, our project, MarketMaster, aimed to alleviate the challenges faced by individual sellers in determining accurate market values for their vehicles. Leveraging machine learning models, we harnessed data from diverse sources, including Cazoo, BuyaCar, and Motorpoint, to mitigate biases and offer precise price predictions. The initiative was sparked by the significant fluctuations in the UK's second-hand car market, valued at $117.69 billion in 2021 and projected to reach $226.16 billion by 2027.
This machine learning model, anchored by an ExtraTrees ensemble method, demonstrated exceptional accuracy, boasting an impressive R-squared score of 99.21% on the test data. The mean absolute error (MAE) of 136.8079 underscored the model's capability to provide reliable price predictions. By encapsulating our findings in a user-friendly Streamlit web application, we empowered sellers to make informed decisions based on their vehicle's specifications. Looking forward, potential enhancements include expanding data sources for broader market representation, exploring advanced machine learning algorithms, and developing a mobile application for convenient on-the-go access to price predictions.
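For readers unfamiliar with the metrics quoted above, the sketch below fits an ExtraTrees regressor and reports R-squared and mean absolute error on a held-out split. The features (`age`, `mileage`, `engine_size`), the price formula, and the hyperparameters are assumptions on synthetic data; they do not reproduce the MarketMaster dataset or its scores.

```python
import numpy as np
from sklearn.ensemble import ExtraTreesRegressor
from sklearn.metrics import mean_absolute_error, r2_score
from sklearn.model_selection import train_test_split

# Synthetic used-car data (assumed for illustration only).
rng = np.random.default_rng(42)
n = 500
age = rng.uniform(0, 15, n)                                    # years
mileage = np.clip(age * 8000 + rng.normal(0, 5000, n), 0, None)
engine_size = rng.choice([1.0, 1.2, 1.6, 2.0, 3.0], n)         # litres
X = np.column_stack([age, mileage, engine_size])
price = (
    6000
    + 9000 * engine_size
    - 700 * age
    - 0.03 * mileage
    + rng.normal(0, 500, n)                                    # noise
)

X_train, X_test, y_train, y_test = train_test_split(
    X, price, test_size=0.25, random_state=42
)

model = ExtraTreesRegressor(n_estimators=200, random_state=42)
model.fit(X_train, y_train)
predictions = model.predict(X_test)

# R^2: share of price variance explained; MAE: average error in price units.
r2 = r2_score(y_test, predictions)
mae = mean_absolute_error(y_test, predictions)
print(f"R^2: {r2:.4f}, MAE: {mae:.2f}")
```

MAE is expressed in the same units as the target, so it is easiest to interpret alongside the typical price range of the vehicles being valued.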