I love Google Photos because you can search for text inside your images. If you remember part of an inspirational quote, you can just type it into the search bar and the image will appear.
I wanted to see if I could do the same for the images on my hard drive, with a web-based interface. It turns out we can!
To store the information we use sqlite3, a small database module included with Python.
To show the information we use Streamlit, a package that turns your scripts into web apps.
Preparation
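First have Python installed. Then install these packages (with pip install …):
- pandas
- pytesseract (read this if you have problems installing it)
- streamlit
sqlite3 is part of Python's standard library, so it does not need to be installed with pip. pytesseract is only a wrapper: the Tesseract OCR engine itself also has to be installed on your system.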
Read the text in the images
Here is the code I used to crawl the images for text. The code still has the option to write a separate .txt file per image, or a single .txt file with all the retrieved text, but by default it stores the information in a sqlite3 database in the start directory.
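As a rough illustration of the crawling step, here is a minimal sketch rather than the exact script. The table name `images`, its columns `path` and `text`, the list of file extensions and the function names are all assumptions made for this example. The OCR function is passed in as a parameter, so the crawl logic can be tried without Tesseract installed.

```python
import os
import sqlite3

# Assumed set of extensions; adjust to the image types on your drive.
IMAGE_EXTENSIONS = {".png", ".jpg", ".jpeg", ".bmp", ".tif", ".tiff"}


def ocr_image(path):
    """Return the text Tesseract finds in one image file."""
    # Imported here so the rest of the module loads without OCR installed.
    import pytesseract
    from PIL import Image

    with Image.open(path) as image:
        return pytesseract.image_to_string(image)


def crawl(start_dir, conn, ocr=ocr_image):
    """Walk start_dir, OCR every image and store (path, text) rows."""
    # Hypothetical schema: one row per image, keyed on the full path.
    conn.execute(
        "CREATE TABLE IF NOT EXISTS images (path TEXT PRIMARY KEY, text TEXT)"
    )
    stored = 0
    for folder, _subfolders, filenames in os.walk(start_dir):
        for filename in filenames:
            extension = os.path.splitext(filename)[1].lower()
            if extension not in IMAGE_EXTENSIONS:
                continue
            path = os.path.join(folder, filename)
            try:
                text = ocr(path)
            except Exception as error:  # unreadable or corrupt image: skip it
                print(f"Skipped {path}: {error}")
                continue
            conn.execute(
                "INSERT OR REPLACE INTO images (path, text) VALUES (?, ?)",
                (path, text),
            )
            conn.commit()  # commit per file so an interrupted run keeps its work
            stored += 1
    return stored
```

To run it for real, you could open a database in the start directory with `conn = sqlite3.connect(os.path.join(start_dir, "ocr.db"))` and call `crawl(start_dir, conn)`; the file name `ocr.db` is just a placeholder.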
Make the interface
Here is the interface, built with Streamlit. You can search for a keyword and for (part of) the directory or filename. The images are only shown if there are fewer than 100 results, because iterrows is very slow.
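The search page could be sketched roughly as follows. This is an illustration under assumptions, not the original interface: it expects a table `images` with columns `path` and `text`, the function names `search` and `render` are made up, and it loops over plain sqlite3 rows instead of a pandas DataFrame with iterrows. What to show when there are too many results (here just the paths) is also a guess.

```python
import sqlite3


def search(conn, keyword, path_part=""):
    """Return (path, text) rows whose text and path contain the given strings."""
    query = (
        "SELECT path, text FROM images "
        "WHERE text LIKE ? AND path LIKE ? ORDER BY path"
    )
    # LIKE is case-insensitive for ASCII letters in SQLite.
    return conn.execute(query, (f"%{keyword}%", f"%{path_part}%")).fetchall()


def render(db_path, max_images=100):
    """Draw the search page; meant to be called from a script run by Streamlit."""
    import streamlit as st  # imported here so search() works without Streamlit

    st.title("Search text in images")
    keyword = st.text_input("Keyword")
    path_part = st.text_input("(Part of) directory or filename")
    if not keyword:
        return

    conn = sqlite3.connect(db_path)
    try:
        rows = search(conn, keyword, path_part)
    finally:
        conn.close()

    st.write(f"{len(rows)} results")
    if len(rows) < max_images:
        # Few enough results: show the images themselves.
        for path, _text in rows:
            st.image(path, caption=path)
    else:
        # Too many results: list only the paths.
        for path, _text in rows:
            st.write(path)
```

Saved as, say, `app.py` with a final line `render("ocr.db")`, this would be started with `streamlit run app.py`. Both file names are placeholders.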
Note that there is also a small method to delete the records of a specific directory from the database.
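Deleting the records of one directory could look something like the sketch below. The table `images`, its `path` column and the function name `delete_directory` are assumptions for the example. It compares a path prefix with `substr` instead of `LIKE`, because underscores and percent signs in folder names would otherwise act as wildcards.

```python
import os
import sqlite3


def delete_directory(conn, directory):
    """Delete all records whose path lies inside directory; return the count."""
    # The trailing separator keeps "photos/2020" from also matching
    # "photos/2020-old".
    prefix = os.path.join(directory, "")
    cursor = conn.execute(
        "DELETE FROM images WHERE substr(path, 1, ?) = ?",
        (len(prefix), prefix),
    )
    conn.commit()
    return cursor.rowcount
```

The directory has to be written the same way as it was stored during the crawl (same separators, same absolute or relative form), otherwise nothing will match.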
To do
The scripts could be a bit more efficient. The biggest disadvantage is that if the crawl doesn’t finish, the script doesn’t remember where it stopped. I could solve this with a lookup, but that would make the script much slower.
The next step is to include PDFs (especially those where the text is embedded as an image) in the process.
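One possible way to keep such a lookup cheap is to read all stored paths into a set once, before the crawl starts, and then check each file against that set in memory. The sketch below assumes a table `images` with a `path` column that already exists; the function name `unprocessed` is made up. How this behaves on a very large collection is untested.

```python
import sqlite3


def unprocessed(conn, paths):
    """Yield only the paths that are not yet stored in the images table."""
    # One query up front; after that each check is a set lookup in memory.
    done = {row[0] for row in conn.execute("SELECT path FROM images")}
    for path in paths:
        if path not in done:
            yield path
```

The crawler would then only OCR the paths this generator yields, so a restarted run skips everything stored before the interruption.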
If this tutorial helped you, feel free to buy me a coffee to support my efforts. Much appreciated!
I am also available for custom-made solutions and scripts!