Semiconductor USA

Intern 2020-2021 - Optical character recognition (OCR) based on deep learning AI/Machine Learning Internship

Location: Paris, France
Job Category: Internship

2020-2021 - Optical character recognition (OCR) based on deep learning
AI/Machine Learning Internship - Samsung Strategy and Innovation Center - Paris - 6 months
Keywords: machine learning, optical character recognition, deep learning, embedded devices, optical music recognition
----------------------------------------------------------------
OCR based on deep learning + application to music recognition
----------------------------------------------------------------
Text is everywhere, and when developing new products, being able to easily detect and recognize text in the environment is particularly important.
Numerous technologies exist for building an OCR system: some are fast but not particularly accurate, some are open (free software), and some
are based on machine learning and can be trained on a specific dataset (for a list of available software, see:
https://en.wikipedia.org/wiki/Comparison_of_optical_character_recognition_software).
To build new products, we want to be able to recognize new kinds of text or symbols using in-house datasets. In other words, we are interested
in systems that can easily be retrained on a specific symbol dataset.
The objective of the internship is to develop an OCR system based on deep learning and to compare its performance (accuracy and speed) to that
of other systems.
More precisely, the objectives of the internship are:
- Write a state-of-the-art report on OCR systems, focusing on the limitations of existing systems (pros and cons of each), new trends in the
  scientific literature, and the use of deep learning
- Implement a deep-learning-based OCR system from scratch and train it on a public text dataset
- Compare the results to existing systems
- Apply the same model to music symbols (a sheet music dataset) and build an application that can read sheet music and translate it into a
  MIDI file or sound
References:
[1] https://nanonets.com/blog/attention-ocr-for-text-recogntion/
[2] https://www.reddit.com/r/Python/comments/hdjdq9/easyocr_opensource_ocr_with_40_languages/
[3] https://dropbox.tech/machine-learning/creating-a-modern-ocr-pipeline-using-computer-vision-and-deep-learning
[4] https://en.wikipedia.org/wiki/Optical_music_recognition
[5] http://archives.ismir.net/ismir2017/paper/000034.pdf - Optical music recognition based on neural networks
[6] https://arxiv.org/abs/1805.10548
-------------------------------------------------------
Samsung Strategy and Innovation Center
-------------------------------------------------------
With offices in San Jose (US), Menlo Park (US), New York (US), Paris (France), Tel Aviv (Israel) and Seoul (Korea), the Samsung Strategy and
Innovation Center (SSIC) aims to integrate artificial intelligence into Samsung products intelligently and to promote innovation. Our main lines
of work are automated mobility and the Internet of Things, where we seek out and develop high-impact solutions that revolutionize how products
are used. We are customer-centric and design our technologies to respect privacy. In collaboration with Samsung's business teams, SSIC turns
the latest research innovations into AI-optimized products that quickly reach users.
https://www.samsung.com/us/ssic/