Deep learning is one of the most interesting innovations of the twenty-first century. Born from the neuroscience studies of the early 1940s and refined over the following decades, it has become the closest thing to an artificial intelligence that humanity has been able to create. Deep learning is based on complex structures called neural networks, which mimic the functioning of the neurons in our brain and are combined with optimization algorithms that allow a machine to learn from the observation of results. This thesis explores one of the many uses of these structures, the study of the voice, focusing on speech recognition methods and analyzing their functioning in detail.