A simple but useful program written almost entirely by Chat-GPT

The disruption of low code/no code is coming. Soon.

For a long time, I wanted to digitize my heap of business cards and pretty much all the free and paid apps I tried out were a disappointment in one way or another. Digitally connecting with each other via LinkedIn, WhatsApp, and other apps has picked up recently, but exchanging business cards is still commonplace in the Middle East and Africa.

All I needed was a simple personal tool to simplify an otherwise daunting job that’s been haunting me and my desk (with a stack of business cards migrating from one corner to another like a stop-motion movie) for quite some time.

For a techie with an itch, the natural next step is to write a program that does the job for you. Even if building the program takes as long as (or longer than) doing the task by hand, it is worth it.

However, it’s been quite a while since I built a fully functional app that is usable by the masses. My Python is rusty at best and I don’t remember what a Rails or a PHP app looks like. And C# and BASIC are so distant I can hardly recognize them. Sure, I dabble with code from time to time, but it’s almost exclusively for entertainment.

So, I asked the friendly AI assistant, Chat-GPT (version 4), to help me write a simple program. And it was a rewarding experience garnished with some existential dilemmas.

Enter Chat-GPT

I was (and still am at the time of this writing) an absolute beginner in using Chat-GPT or any LLM for that matter.

I figured I should be fairly descriptive and process-oriented in my approach if I want a fairly good result.

This was my initial Prompt:

Can you help me write a code that does the following:
1. runs on my local windows 11 machine 
2. read images of business cards from a local folder
3. uses chat gpt-4 to extract these fields from each image: first name, last name, designation, company, email, phone number, mobile number, website, address.
4. format the fields extracted in step 3 into a data record and adds it as a new row to a CSV file called cards.csv
5. logs the potential errors in the file called log.txt
6. announces when the work is done.

before starting, tell me which language you will use for this task and why.

The initial response from GPT4:

I will use Python for this task because it is a versatile and widely-used programming language that has extensive libraries and tools available for image processing, optical character recognition (OCR), and working with files such as CSV and TXT. Specifically, I’ll use the OpenCV library for reading images, pytesseract for OCR, and the built-in CSV and logging modules for handling CSV files and logging errors, respectively.

Here’s the code for the task:

import cv2
import pytesseract
import os
import csv
import logging
from pytesseract import Output

# Configure the logging
logging.basicConfig(filename='log.txt', level=logging.ERROR)

# Set up pytesseract
pytesseract.pytesseract.tesseract_cmd = r'<path_to_tesseract_executable>'
# Replace <path_to_tesseract_executable> with the actual path, e.g., 'C:\\Program Files\\Tesseract-OCR\\tesseract.exe'

def extract_fields(text):
    fields = ['first name', 'last name', 'designation', 'company', 'email', 'phone number', 'mobile number', 'website', 'address']
    extracted_data = {field: '' for field in fields}

    for line in text.split('\n'):
        for field in fields:
            if field in line.lower():
                extracted_data[field] = line.replace(field, '').strip()
                break

    return extracted_data

def main():
    input_folder = '<path_to_input_folder>'
    # Replace <path_to_input_folder> with the actual path, e.g., 'C:\\Users\\Username\\Desktop\\BusinessCards'
    output_file = 'cards.csv'

    with open(output_file, 'a', newline='', encoding='utf-8') as csvfile:
        fieldnames = ['first name', 'last name', 'designation', 'company', 'email', 'phone number', 'mobile number', 'website', 'address']
        writer = csv.DictWriter(csvfile, fieldnames=fieldnames)
        writer.writeheader()

        for filename in os.listdir(input_folder):
            if filename.lower().endswith(('.png', '.jpg', '.jpeg')):
                try:
                    img = cv2.imread(os.path.join(input_folder, filename))
                    text = pytesseract.image_to_string(img, lang='eng', config='--psm 6', output_type=Output.TEXT)

                    extracted_data = extract_fields(text)
                    writer.writerow(extracted_data)
                except Exception as e:
                    logging.error(f'Error processing {filename}: {e}')

    print('The work is done.')

if __name__ == '__main__':
    main()

Replace <path_to_tesseract_executable> with the actual path to the Tesseract executable (e.g., ‘C:\Program Files\Tesseract-OCR\tesseract.exe’) and <path_to_input_folder> with the path to the folder containing the business card images.

Make sure to install the required libraries (OpenCV and pytesseract) before running the code:

pip install opencv-python
pip install pytesseract

This code will read images of business cards from the specified folder, extract the required fields using pytesseract OCR, format the extracted data into records, and append them to the ‘cards.csv’ file. Errors encountered during processing will be logged to ‘log.txt’. The program will print ‘The work is done.’ when it has finished processing all images in the folder.

The initial code didn’t work and I started asking questions and sharing error codes one after another. Chat-GPT patiently apologized and made corrections.

The bot iterated on the code three or four times, making small changes here and there while explaining what had likely gone wrong and what it suggested I do next. I had to consult the OpenAI documentation once to learn how to use the gpt-3.5-turbo engine before it finally worked and generated somewhat acceptable results.

The program worked as intended, but the results were subpar at first. The main issue was the quality of the prompt I had devised. It all boiled down to how to give the AI a good prompt so that it could generate quality results.

I modified the prompt a few times until the results started to show some promise. I suspect the prompt can still be improved. No wonder prompt engineering has become the hottest skill overnight, with vacancies popping up everywhere like mushrooms.
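As an illustration of one direction such prompt tweaking could take (a sketch only, not what the program in this post does), the prompt could ask for every field in a single request and demand a JSON reply, which is easier to check than free text. The names build_prompt, parse_reply, and FIELDS are hypothetical, and the example leaves out the API call itself so it can run offline:

```python
import json

# Hypothetical field list, mirroring the fields this post extracts.
FIELDS = ['first name', 'last name', 'designation', 'company', 'phone',
          'mobile', 'email', 'website', 'country', 'address']


def build_prompt(ocr_text, fields=FIELDS):
    """Build one prompt that asks for all fields at once as a JSON object."""
    keys = ', '.join(f'"{f}"' for f in fields)
    return (
        "Extract contact details from the business card text below. "
        f"Reply with a single JSON object with exactly these keys: {keys}. "
        'Use "NA" for anything you cannot find, and add no explanation.\n\n'
        f"Text:\n{ocr_text}"
    )


def parse_reply(reply, fields=FIELDS):
    """Turn the model's reply into a row dict, falling back to 'NA' per field."""
    try:
        data = json.loads(reply)
    except json.JSONDecodeError:
        data = {}  # The model ignored the format; treat every field as missing.
    if not isinstance(data, dict):
        data = {}

    row = {}
    for field in fields:
        value = data.get(field)
        text = '' if value is None else str(value).strip()
        row[field] = text or 'NA'
    return row
```

The string returned by build_prompt would be sent as the user message, and parse_reply would turn the answer into a row for the CSV writer. Whether a single request produces better results than one request per field is an open question that would need testing on real cards.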

Here is the final code after my edits:

import cv2
import pytesseract
import os
import csv
import logging
import openai
from tenacity import retry, stop_after_attempt, wait_fixed
from tenacity import RetryError
from datetime import datetime
from PIL import Image

logging.basicConfig(filename='log.txt', level=logging.ERROR)

# In a raw string, backslashes are written once
pytesseract.pytesseract.tesseract_cmd = r'C:\Program Files\Tesseract-OCR\tesseract.exe'
fields = ['first name', 'last name', 'designation', 'company', 'phone', 'mobile', 'email', 'website', 'country', 'address']
openai.api_key = "<YOUR API KEY GOES HERE.>"

def extract_text_from_image(image_path):
    img = cv2.imread(image_path)
    text = pytesseract.image_to_string(img, lang='eng', config='--psm 6')

    return text

@retry(stop=stop_after_attempt(3), wait=wait_fixed(1))
def ask_gpt3(text, prompt):
    completions = openai.ChatCompletion.create(
        model="gpt-3.5-turbo",
        messages=[
            {"role": "system", "content": "You are a helpful assistant."},
            {"role": "user", "content": f"""Extract {prompt} from the following text: {text}, ensuring that you
            return only the extracted data omitting field name, extra explanation or sentence, punctuation, or label, and
            if you couldn't extract the data for any reason, just return 'NA' with no extra wording or explanation.
            for phone numbers and mobile numbers, only pick the first one for each.
            """}
        ],
        max_tokens=50,
        n=1,
        stop=None,
        temperature=0.5,
    )

    message = completions.choices[0]['message']['content']

    return message.strip()

def extract_fields_using_gpt3(text):
    extracted_data = {field: '' for field in fields}

    for field in fields:
        extracted_data[field] = ask_gpt3(text, field)
        # I added this lazy way of formatting the phone and mobile numbers
        if field in ('phone', 'mobile'):
            cleaned_number = ''.join(c for c in extracted_data[field] if c.isdigit())
            # Keep 'NA' when no digits were found instead of writing a bare '+'
            extracted_data[field] = '+' + cleaned_number if cleaned_number else 'NA'

    return extracted_data

def main():
    #change this 
    input_folder = '<PATH TO LOCAL FOLDER CONTAINING THE IMAGES.>'
    output_file = 'cards.csv'

    #I added this 
    print(datetime.now().strftime("%H:%M:%S"))

    with open(output_file, 'a', newline='', encoding='utf-8') as csvfile:
        writer = csv.DictWriter(csvfile, fieldnames=fields)
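        # The file is opened in append mode, so write the header only when it is new or empty
        if csvfile.tell() == 0:
            writer.writeheader()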

        for filename in os.listdir(input_folder):
            if filename.lower().endswith(('.png', '.jpg', '.jpeg')):
                try:
                    image_path = os.path.join(input_folder, filename)
                    with Image.open(image_path) as img:
                        img.verify()
                    text = extract_text_from_image(image_path)
                    extracted_data = extract_fields_using_gpt3(text)
                    writer.writerow(extracted_data)
                    # I added this primitive progress bar
                    print("-", end=" ", flush=True) 
                except Exception as e:
                    error_message = f'Error processing {filename}: {e}'
                    logging.error(error_message)
                    print(error_message, flush=True)
                    if isinstance(e, RetryError):
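                        # e.reraise() would raise again here and stop the loop; report the underlying error instead
                        print("Cause of the RetryError:", e.last_attempt.exception())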
    
    #I added this
    print(datetime.now().strftime("%H:%M:%S"))
    print('The work is done.')

if __name__ == '__main__':
    main()

Well, I think the code speaks for itself. Yes, it’s not the best code you’ve seen and I’m not an expert programmer anymore. But in my humble opinion, it’s breaking new ground in how we build computer programs. If you asked me to do this without any “help”, I would’ve spent at least a day refreshing my rusty Python knowledge, reading a bunch of documentation, and manually debugging and improving the code.

The crucial point here is that the Python code feeds a fairly primitive conversational prompt to the GPT-3.5 model to extract the contact information from a given block of text. And it does so not through regex and other complicated code that I would have had to write, but simply by AI sorcery.
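For contrast, here is a minimal sketch of what the regex route could look like. It is not part of the program above: the patterns, the function name extract_with_regex, and the sample card are all made up for illustration, and the patterns are far too naive for real cards.

```python
import re

# Rough, illustrative patterns; real cards will break them in many ways.
EMAIL_RE = re.compile(r'[\w.+-]+@[\w-]+(?:\.[\w-]+)+')
PHONE_RE = re.compile(r'\+?\d[\d ().-]{6,}\d')
WEBSITE_RE = re.compile(r'(?:https?://|www\.)[^\s,;]+', re.IGNORECASE)


def extract_with_regex(text):
    """Pull out the fields that have a recognizable shape; 'NA' otherwise."""
    def first(pattern):
        match = pattern.search(text)
        return match.group(0).strip() if match else 'NA'

    return {
        'email': first(EMAIL_RE),
        'phone': first(PHONE_RE),
        'website': first(WEBSITE_RE),
    }


# A made-up card, as OCR might return it.
sample = """Jane Doe
Head of Procurement
Acme Trading LLC
Tel: +971 4 123 4567
jane.doe@acme.example
www.acme.example"""

result = extract_with_regex(sample)
```

Emails, phone numbers, and websites have a shape a pattern can latch onto. Names, designations, and company names do not, and that is the part the conversational prompt takes over.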

This is the future, the beginning of a new era of versatile LowCode-NoCode and rapid software development.

A group of engineers at work perhaps in a parallel universe; imagined by Midjourney V.5

What I demonstrated here is just a glimpse of the immense possibilities of building software with AI assistance, something that was a pipe dream for decades, until just a few months ago.

There are thousands of talented people coming up with genius ways to use Chat-GPT in building software every day. All you need to do is to get on Twitter and see for yourself. Just stay away from the crypto-bros turned AI-bros.

Companies too, in tech and other industries, have started incorporating the capabilities of AI, and Chat-GPT in particular, into their existing ecosystem to bring these capabilities to the masses.

One of the key areas will be LCNC (low code/no code) development. So far, leading LCNC platforms let users avoid most or all exposure to code when building business apps. This makes it easier for non-techies to build apps and automate their processes with minimal cost, time, and risk. However, you still need a certain level of IT literacy to do anything meaningful.

In the post-Chat-GPT world, a knowledge worker would simply “talk” to the computer, sharing their vision of what their app should look like and do, and then leave the rest to the machine. Who knows, maybe apps as we know them will cease to exist, and it will just be us talking to the machines, teaching them what we want, and that will be it. Soon, you’ll be reserving a seat for AI in your brainstorming meetings.

Disclosure: No, I did not use ChatGPT or any AI assistant to write this article, except, of course, for the prompts and results, which are posted here verbatim as examples.
