\documentclass[12pt]{article}
\usepackage{listings}
\usepackage{hyperref}
\usepackage{graphicx}
\title{EARIN Project: Final Report}
\author{Krzysztof Rudnicki \\ Jakub Kliszko}
\begin{document}
\maketitle
\section{Introduction}
The goal of our project was to create an anime recommendation model. \\
Given the name of an anime from the database, the model should output a list of recommended anime.
\section{Used data and algorithms}
\subsection{Data}
We used a different dataset from the one originally specified in the project description. \\
We decided to use the Anime Recommendation Database from Kaggle: \href{https://www.kaggle.com/datasets/hernan4444/anime-recommendation-database-2020}{LINK} \\
The main reasons for choosing this dataset were that it is larger and more recent than the original one, that Kaggle describes it as 100\% usable, and that it still has a decent number of code examples. \\
We are mostly interested in the \texttt{rating\_complete.csv} file, which contains anime ratings from users who completed the anime.
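
The sketch below shows one possible way to turn such ratings into a sparse anime $\times$ user matrix, in which every anime is represented by its vector of user ratings. It assumes that \texttt{rating\_complete.csv} has the columns \texttt{user\_id}, \texttt{anime\_id} and \texttt{rating}, uses SciPy (our assumption), and runs on a small hypothetical sample instead of the real file; the helper \texttt{build\_rating\_matrix} is our own illustration.

\begin{lstlisting}[language=Python]
import csv
import io

from scipy.sparse import csr_matrix

# Hypothetical sample in the assumed rating_complete.csv layout
# (columns user_id, anime_id, rating).
SAMPLE_CSV = """user_id,anime_id,rating
0,430,9
0,1004,5
1,430,8
1,3010,7
2,1004,6
"""


def build_rating_matrix(csv_file):
    """Hypothetical helper: return (matrix, anime_ids, user_ids), one row per anime."""
    rows = list(csv.DictReader(csv_file))
    anime_ids = sorted({int(r["anime_id"]) for r in rows})
    user_ids = sorted({int(r["user_id"]) for r in rows})
    anime_pos = {a: i for i, a in enumerate(anime_ids)}
    user_pos = {u: j for j, u in enumerate(user_ids)}
    data = [float(r["rating"]) for r in rows]
    row_idx = [anime_pos[int(r["anime_id"])] for r in rows]
    col_idx = [user_pos[int(r["user_id"])] for r in rows]
    # Unrated (anime, user) pairs are not stored, which keeps the matrix sparse.
    matrix = csr_matrix((data, (row_idx, col_idx)),
                        shape=(len(anime_ids), len(user_ids)))
    return matrix, anime_ids, user_ids


matrix, anime_ids, user_ids = build_rating_matrix(io.StringIO(SAMPLE_CSV))
print(matrix.shape)  # (3, 3)
print(matrix.toarray())
\end{lstlisting}

The real file could be passed in with \texttt{open} instead of \texttt{io.StringIO}; for a file of that size a chunked reader would likely be more memory-friendly.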
\subsection{Algorithms}
We decided to develop our model using collaborative filtering, which makes personalized recommendations based on the preferences of similar users. \\
We represent the anime dataset as embedding vectors. \\
We use a K-nearest neighbors model and tested it with different metrics, numbers of neighbors and algorithms. \\
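
A minimal sketch of item-based collaborative filtering with K-nearest neighbors is given below. It assumes scikit-learn's \texttt{NearestNeighbors} (the algorithm names used in this report match its options, but the library choice is our assumption) and a hypothetical toy rating matrix. Each anime is a row of user ratings, and the anime whose rows are closest under cosine distance are returned as recommendations. The titles and the helper \texttt{recommend} are illustrative only.

\begin{lstlisting}[language=Python]
import numpy as np
from scipy.sparse import csr_matrix
from sklearn.neighbors import NearestNeighbors

# Hypothetical toy data: rows are anime, columns are users, 0 means "not rated".
TITLES = ["Anime A", "Anime B", "Anime C", "Anime D"]
RATINGS = csr_matrix(np.array([
    [10, 9, 0, 0, 8],
    [9, 10, 0, 0, 7],
    [0, 0, 8, 9, 0],
    [0, 1, 9, 8, 0],
], dtype=float))

# Assumes scikit-learn; cosine distance with brute-force search on sparse rows.
model = NearestNeighbors(metric="cosine", algorithm="brute")
model.fit(RATINGS)


def recommend(title, n_recommendations):
    """Illustrative helper: titles of the anime whose rating rows are closest."""
    query = TITLES.index(title)
    # Ask for one extra neighbor because the query anime is its own nearest neighbor.
    _, indices = model.kneighbors(RATINGS[query],
                                  n_neighbors=n_recommendations + 1)
    neighbors = [i for i in indices[0] if i != query]
    return [TITLES[i] for i in neighbors[:n_recommendations]]


print(recommend("Anime A", 2))  # ['Anime B', 'Anime D']
\end{lstlisting}

Querying with a row of the fitted matrix returns that row as its own nearest neighbor, which is why the sketch requests one extra neighbor and filters the query out rather than simply dropping the first result.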
\subsubsection{Algorithms}
We tested our model with two algorithms:
\begin{enumerate}
\item Brute
\item Auto
\end{enumerate}
Ball Tree and KD Tree do not work on sparse input, and our input is sparse, so we omitted them.
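
The following sketch, again assuming scikit-learn and SciPy, fits \texttt{NearestNeighbors} with the \texttt{brute} and \texttt{auto} algorithm settings on a randomly generated sparse matrix (hypothetical data) and checks that both searches return the same neighbors, since with sparse input the automatic choice resolves to brute force.

\begin{lstlisting}[language=Python]
import numpy as np
from scipy.sparse import random as sparse_random
from sklearn.neighbors import NearestNeighbors

# Hypothetical random sparse "anime x user" matrix (about 5% of entries filled).
X = sparse_random(50, 200, density=0.05, format="csr", random_state=0)

results = {}
for algorithm in ("brute", "auto"):
    model = NearestNeighbors(n_neighbors=5, algorithm=algorithm).fit(X)
    distances, indices = model.kneighbors(X[:3])
    results[algorithm] = (distances, indices)

# With sparse input "auto" resolves to brute force, so both searches agree.
same_distances = np.allclose(results["brute"][0], results["auto"][0])
same_indices = np.array_equal(results["brute"][1], results["auto"][1])
print(same_distances, same_indices)
\end{lstlisting}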
\subsubsection{Neighbor number}
We tested our model with five different numbers of neighbors, where $n$ is the number of samples:
\begin{enumerate}
\item 5 - a popular starting point for small and medium datasets
\item square root of the number of samples, $\sqrt{n}$ - usually helps to balance underfitting and overfitting
\item half of the number of samples, $n/2$ - useful for capturing the overall trend rather than specific nuances
\item logarithm of the number of samples, $\log n$ - used for very large datasets
\item $n-1$ - usually leads to overgeneralization, as all instances except one are used for prediction
\end{enumerate}
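
As a worked example, the five candidate values can be computed from the number of samples $n$. The sketch assumes the natural logarithm and rounding to the nearest integer, neither of which is fixed above, and the function \texttt{candidate\_neighbor\_counts} is hypothetical.

\begin{lstlisting}[language=Python]
import math


def candidate_neighbor_counts(n_samples):
    """Hypothetical helper: the five tested values of k for n_samples rows.

    Natural log and rounding to the nearest integer are assumptions.
    """
    return {
        "fixed": 5,
        "sqrt": max(1, round(math.sqrt(n_samples))),
        "half": n_samples // 2,
        "log": max(1, round(math.log(n_samples))),
        "all_but_one": n_samples - 1,
    }


print(candidate_neighbor_counts(10_000))
# {'fixed': 5, 'sqrt': 100, 'half': 5000, 'log': 9, 'all_but_one': 9999}
\end{lstlisting}

For $n = 10\,000$ the values range from $9$ to $9\,999$, which shows how differently the five settings weight local and global structure.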
\subsubsection{Metrics}
For the brute algorithm we tested all available metrics:
\begin{enumerate}
\item Cityblock
\item Cosine
\item Euclidean
\item l1
\item l2
\item Manhattan
\end{enumerate}
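
Several of these names are aliases for the same distance: cityblock, l1 and Manhattan all compute the sum of absolute differences, while Euclidean and l2 both compute the straight-line distance, so only three distinct distances are actually compared. The sketch below illustrates this, assuming scikit-learn's \texttt{pairwise\_distances} and two hypothetical rating vectors.

\begin{lstlisting}[language=Python]
import numpy as np
from scipy.sparse import csr_matrix
from sklearn.metrics import pairwise_distances

# Two hypothetical rating vectors (rows) over three users.
X = csr_matrix(np.array([[1.0, 0.0, 1.0],
                         [0.0, 1.0, 1.0]]))

METRICS = ["cityblock", "cosine", "euclidean", "l1", "l2", "manhattan"]
# Distance between row 0 and row 1 under each metric name.
distance = {m: pairwise_distances(X, metric=m)[0, 1] for m in METRICS}

for m in METRICS:
    print(f"{m:>10}: {distance[m]:.4f}")
\end{lstlisting}

For these two vectors cityblock, l1 and manhattan all give $2$, euclidean and l2 give $\sqrt{2} \approx 1.4142$, and cosine gives $1 - 1/2 = 0.5$.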
\section{Intermediate results}
\subsection{Results}
\subsection{Insights}
\section{Using program}
\subsection{Arguments}
\subsubsection{Default arguments}
\subsubsection{Reproducing}
\section{Final experimental results}
\subsection{Experiments}
\subsection{Results}
\subsection{Discussion}
\subsection{Comparison}
\section{Challenges}
\subsection{Challenges themselves}
\subsection{Tackling challenges}
\section{Conclusions}
\paragraph{Best algorithm}
\subsection{Solution satisfaction}
\subsection{Potential improvements}
\end{document}