
\documentclass[12pt]{article}
\usepackage{listings}
\usepackage{hyperref}
\usepackage{graphicx}
\title{EARIN Project Final Report}
\author{Krzysztof Rudnicki \\ Jakub Kliszko}
\begin{document}
\maketitle
\section{Introduction}
The goal of our project was to create an anime recommender model. \\
Given the name of an anime from the database, the model should output a list of recommended anime.
\section{Used data and algorithms}
\subsection{Data}
We used a different dataset from the one originally specified in the project description. \\
We decided to use the Anime Recommendation Database from Kaggle: \href{https://www.kaggle.com/datasets/hernan4444/anime-recommendation-database-2020}{LINK} \\
The main reasons for choosing this database were that it is bigger and more recent than the original one, it is described by Kaggle as being 100\% usable, and it still has a decent number of code examples. \\
We are mostly interested in the rating\_complete.csv file, which contains ratings given by users who completed the anime.
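
As an illustration of how such a file could be read, below is a minimal sketch that groups ratings by anime. The column names \texttt{user\_id}, \texttt{anime\_id} and \texttt{rating}, the helper \texttt{load\_ratings} and the sample rows are assumptions made for this example; they are not taken from our program.

```python
import csv
import io
from collections import defaultdict

# Hypothetical sample rows in the assumed layout of rating_complete.csv.
SAMPLE_CSV = """user_id,anime_id,rating
0,430,9
0,1004,5
1,430,8
1,21,10
"""


def load_ratings(handle):
    """Return {anime_id: {user_id: rating}} read from a CSV file-like object."""
    ratings = defaultdict(dict)
    for row in csv.DictReader(handle):
        ratings[int(row["anime_id"])][int(row["user_id"])] = float(row["rating"])
    return dict(ratings)


ratings = load_ratings(io.StringIO(SAMPLE_CSV))
print(ratings[430])  # ratings given to anime 430, keyed by user id
```

Storing only the ratings that exist keeps the representation sparse, since most users rate only a small fraction of all anime.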
\subsection{Algorithms}
We decided to use collaborative filtering to develop our model. It makes personalized recommendations based on the preferences of similar users. \\
We represent each anime in the dataset as an embedding vector. \\
We use a K-nearest neighbors model and tested it with different metrics, neighbor counts and algorithms.
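
The idea can be sketched with a small brute-force nearest neighbor search written in plain Python. This is only an illustration under our own assumptions: each anime is represented by a sparse vector of user ratings, the cosine distance is used, and the names \texttt{cosine\_distance}, \texttt{recommend} and \texttt{toy\_ratings} are hypothetical. It is not the implementation used in our program.

```python
import math


def cosine_distance(a, b):
    """Cosine distance between two sparse vectors stored as {user_id: rating}."""
    dot = sum(value * b[user] for user, value in a.items() if user in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 1.0  # no ratings at all: treat as maximally dissimilar
    return 1.0 - dot / (norm_a * norm_b)


def recommend(ratings, anime, k):
    """Return the k anime whose rating vectors are closest to the given anime."""
    query = ratings[anime]
    distances = [
        (cosine_distance(query, vector), other)
        for other, vector in ratings.items()
        if other != anime
    ]
    distances.sort()  # brute force: every other anime is compared and ranked
    return [other for _, other in distances[:k]]


# Hypothetical toy data: {anime_name: {user_id: rating}}.
toy_ratings = {
    "A": {1: 10, 2: 8},
    "B": {1: 9, 2: 7},
    "C": {3: 10},
    "D": {1: 2, 3: 9},
}
print(recommend(toy_ratings, "A", k=2))
```

Anime rated similarly by the same users end up close to each other, so the nearest neighbors of the entered anime serve as its recommendations.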
\subsubsection{Search algorithms}
We decided to test our model with 2 neighbor search algorithms:
\begin{enumerate}
  \item Brute
  \item Auto
\end{enumerate}
Ball Tree and KD Tree do not work on sparse input, and our input is sparse, so we decided to omit them.

\subsubsection{Neighbor number}
We decided to test our model with 5 different neighbor counts:
\begin{enumerate}
  \item 5 - a popular starting point for small to medium datasets
  \item square root of the number of available samples - usually helps to balance between underfitting and overfitting
  \item half of the available samples - usually useful for checking the overall trend rather than specific nuances
  \item logarithm of the number of available samples - used for very large datasets
  \item $n-1$ neighbors, where $n$ is the number of available samples - usually leads to overgeneralization, as we use all instances except one for prediction
\end{enumerate}
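
The five neighbor counts listed above can be computed as in the sketch below. The function name \texttt{neighbor\_candidates}, the use of the natural logarithm, the rounding to the nearest integer and the clamping to a valid range are assumptions made for this example.

```python
import math


def neighbor_candidates(n_samples):
    """Return the five candidate neighbor counts for n_samples data points."""
    raw = {
        "fixed": 5,
        "sqrt": round(math.sqrt(n_samples)),
        "half": n_samples // 2,
        "log": round(math.log(n_samples)),  # natural logarithm (assumed base)
        "all_but_one": n_samples - 1,
    }
    # Clamp each value so it is at least 1 and at most n_samples - 1.
    return {name: max(1, min(k, n_samples - 1)) for name, k in raw.items()}


print(neighbor_candidates(10000))
```

For a dataset of 10000 samples this gives 5, 100, 5000, 9 and 9999 neighbors respectively.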

\subsubsection{Metrics}
For the brute algorithm we tested all possible metrics:
\begin{enumerate}
  \item Cityblock
  \item Cosine
  \item Euclidean
  \item l1
  \item l2
  \item Manhattan
\end{enumerate}
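
If these metric names follow the convention used by libraries such as scikit-learn, then cityblock, l1 and manhattan all denote the same distance, and so do euclidean and l2, which leaves three distinct distance functions. The sketch below illustrates this on dense vectors; the function names and the \texttt{METRICS} mapping are our own and only serve as an example.

```python
import math


def manhattan(a, b):
    """Sum of absolute coordinate differences (cityblock, l1, manhattan)."""
    return sum(abs(x - y) for x, y in zip(a, b))


def euclidean(a, b):
    """Square root of the sum of squared differences (euclidean, l2)."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def cosine(a, b):
    """One minus the cosine of the angle between two non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)


# Six metric names mapped onto three distinct distance functions.
METRICS = {
    "cityblock": manhattan,
    "l1": manhattan,
    "manhattan": manhattan,
    "euclidean": euclidean,
    "l2": euclidean,
    "cosine": cosine,
}

u, v = [3.0, 0.0], [0.0, 4.0]
for name in sorted(METRICS):
    print(name, METRICS[name](u, v))
```

Under this assumption, results obtained with aliased metric names are expected to be identical, so differences in recommendations can only come from the three underlying distances.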

\section{Intermediate results}
\subsection{Results}
\subsection{Insights}

\section{Using program}
  \subsection{Arguments}
    \subsubsection{Default arguments}
    \subsubsection{Reproducing}

\section{Final experimental results}
\subsection{Experiments}
\subsection{Results}
\subsection{Discussion}
\subsection{Comparison}

\section{Challenges}
\subsection{Challenges themselves}
\subsection{Tackling challenges}

\section{Conclusions}
\paragraph{Best algorithm}
\subsection{Solution satisfaction}
\subsection{Potential improvements}


\end{document}