Efficient Data Analysis: How ChatGPT Can Help Extract and Summarize Relevant Information from Customer Reviews
Reviews left by customers on a company’s website can be incredibly valuable for the business. They provide insight into areas that need improvement and can help the company make necessary changes to its service. The text of these reviews is often classified as either positive or negative, a task known as “Sentiment Analysis.” However, as customers leaving a review, we do more than just express whether we had a good or bad experience. We also provide specific feedback on areas of friction, allowing the company to take action to address them.
Manually extracting specific complaints from customer reviews can be difficult, though. An automatic tool could be incredibly valuable in this regard: it could quickly and accurately identify specific complaints and areas of dissatisfaction, giving the company actionable insights that can be used to improve its service.
Why not use ChatGPT to uncover customer complaints and dissatisfaction from reviews left on a website? As an example, I will extract a list of points to improve on a website from the reviews left by its customers. As a use case, I will use www.triphobo.com, a website that lets users plan their trips and has a section for customer reviews.
Data Scraping
Initially, I used ChatGPT to generate code for scraping the website and downloading the customer reviews. I wanted to be as lazy as possible, so I just visited the site and extracted the HTML code of one review, to show ChatGPT the structure it should be working with. I then gave the following prompt:
Make a python code to scrape the reviews from https://www.triphobo.com/tripplanner/reviews?page=1
Each review has this format:
<li id="node-5f040e29bedbd25bf85cf73f" class="js-block-list-item">
<div class="single-testimonial-section">
<div class="clearfix">
<img data-src="" class="user-pics" src="https://ak.jogurucdn.com/resources_version/desktop/img/default/user/t2/default_1_new.jpg" alt="user-avatar" style="transition: opacity 0.3s ease-in 0s; opacity: 1;">
<div class="testimonial-user-name">
<div class="username-rating">
<label>S Gino</label>
<div class="trips-testimonial-rating">
<i class="fa fa-star"></i><i class="fa fa-star"></i><i class="fa fa-star"></i><i class="fa fa-star"></i><i class="fa fa-star-o"></i>
</div>
<div class="trips-testimonial-date">
07-Jul-2020
</div>
</div>
<p class="testimonial">
I totally LOVED TripHobo’s Trip planning tool. I guess this is the best trip planner I have ever used. The ability to connect to hotel aggregators, the auto distance calculation, the budget and pricing capabilities, the drag drop features, etc. all have been remarkable so far. The execution is almost perfect. The site is however slow at times probably because of the various elements involved in trip planning process. But if you can optimize it then nothing like it.
</p>
</div>
</div>
</div>
</li>
The code should loop over the pages, extract the text inside the sections trips-testimonial-date, testimonial and put it in a dataframe
The first version was:
import requests
from bs4 import BeautifulSoup
import pandas as pd

reviews = []
for page in range(1, 10):
    url = f"https://www.triphobo.com/tripplanner/reviews?page={page}"
    response = requests.get(url)
    soup = BeautifulSoup(response.text, "html.parser")
    for review in soup.find_all("li", class_="js-block-list-item"):
        date = review.find("div", class_="trips-testimonial-date").text
        text = review.find("p", class_="testimonial").text
        reviews.append({"date": date, "text": text})
reviews_df = pd.DataFrame(reviews)
print(reviews_df)
After a few iterations to add error handling and new features, plus some manual corrections, this is the code I came up with:
import requests
import random
from bs4 import BeautifulSoup
import pandas as pd
from tqdm import tqdm
import time

def scrape_reviews(pages, sleep_time):
    reviews = []
    headers = {'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/58.0.3029.110 Safari/537.3'}
    for page in tqdm(pages):
        url = f"https://www.triphobo.com/tripplanner/reviews?page={page}"
        try:
            response = requests.get(url, headers=headers)
            response.raise_for_status()
            soup = BeautifulSoup(response.text, "html.parser")
            for review in soup.find_all("li", class_="js-block-list-item"):
                date = review.find("div", class_="trips-testimonial-date").text.strip()
                text = review.find("p", class_="testimonial").text.strip()
                rating_el = review.find("div", class_="trips-testimonial-rating")
                rating = len(rating_el.find_all(class_="fa-star"))
                reviews.append({"date": date, "rating": rating, "text": text, "page": page})
            time.sleep(sleep_time + random.uniform(0, 1))
        except requests.exceptions.HTTPError as errh:
            print("HTTP Error:", errh)
        except requests.exceptions.ConnectionError as errc:
            print("Error Connecting:", errc)
        except requests.exceptions.Timeout as errt:
            print("Timeout Error:", errt)
        except requests.exceptions.RequestException as err:
            print("Something went wrong", err)
    if len(reviews) != 0:
        reviews_df = pd.DataFrame(reviews)
        reviews_df['date'] = pd.to_datetime(reviews_df['date'], format='%d-%b-%Y')
        return reviews_df
    return None
That can be executed like so:
pages = range(1, 94)
reviews_df = scrape_reviews(pages, 2)
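Before launching the full crawl, the two parsing assumptions baked into the scraper can be checked against the sample review without touching the network: the `%d-%b-%Y` date format, and the fact that filled stars carry the class `fa-star` while empty ones carry `fa-star-o`. The snippet below is only a minimal sketch using the standard library. `SAMPLE_HTML` is a trimmed copy of the review shown in the prompt, and the regular expressions are a quick stand-in for BeautifulSoup, not what the scraper itself does.

```python
import re
from datetime import datetime

# Trimmed copy of the sample review used in the prompt.
SAMPLE_HTML = """
<div class="trips-testimonial-rating">
<i class="fa fa-star"></i><i class="fa fa-star"></i><i class="fa fa-star"></i><i class="fa fa-star"></i><i class="fa fa-star-o"></i>
</div>
<div class="trips-testimonial-date">
07-Jul-2020
</div>
"""

# Filled stars use the class "fa-star"; the closing quote in the pattern
# keeps "fa-star-o" (an empty star) from being counted.
filled_stars = len(re.findall(r'class="fa fa-star"', SAMPLE_HTML))

# Pull the date text out of its div and parse it with the scraper's format.
date_match = re.search(r'trips-testimonial-date">\s*(\S+)\s*<', SAMPLE_HTML)
review_date = datetime.strptime(date_match.group(1), "%d-%b-%Y")

print(filled_stars, review_date.date())
```

If the star count or the date comes out wrong here, it will come out wrong for every scraped page, so this is a cheap check to run first.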
Data Summarization
After double-checking the data, I copied the text of the reviews with fewer than 4 stars to the clipboard:
reviews_df.query('rating<4').text.str.replace('\n','').to_clipboard()
I then let ChatGPT extract a list of negative points, using this prompt:
Extract the complaints from this reviews
"Text of the reviews just copied using pandas"
Where “Text of the reviews just copied using pandas” is the result of Ctrl+V.
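Copying and pasting works for a few dozen reviews, but a single chat message can only hold so much text. As a rough sketch, the same prompt could be assembled in code and split into batches. The `build_prompts` helper and its `max_chars` budget are hypothetical: the real limit depends on the model and is not something I measured. The header is a lightly corrected version of the prompt above.

```python
PROMPT_HEADER = "Extract the complaints from these reviews"

def build_prompts(review_texts, max_chars=6000):
    """Group reviews into prompts that stay under max_chars (an assumed budget).

    A single review longer than the budget still gets a prompt of its own.
    """
    prompts, batch, size = [], [], len(PROMPT_HEADER)
    for text in review_texts:
        line = text.replace("\n", " ").strip()
        if not line:
            continue  # skip empty reviews
        # Start a new batch when adding this review would exceed the budget.
        if batch and size + len(line) + 1 > max_chars:
            prompts.append(PROMPT_HEADER + "\n" + "\n".join(batch))
            batch, size = [], len(PROMPT_HEADER)
        batch.append(line)
        size += len(line) + 1  # +1 for the newline that joins the lines
    if batch:
        prompts.append(PROMPT_HEADER + "\n" + "\n".join(batch))
    return prompts
```

It could be called as `build_prompts(reviews_df.query('rating<4').text)`. For this article the low-rated reviews fit in one message, so I simply pasted them by hand.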
And the bot gladly replied:
Complaints:
- Would like to see if you can add hostels for those looking to plan budget trips.
- The new UX (no times on overview, no smarts telling things are overlapping in editable mode etc. etc.)
- The app no longer comes up in the app store and I'm no longer able to edit trips I've already planned. What happened?! This was my favorite trip planning app
- Tried it to create a road trip and all it would do is add airports/flights etc... Not intuitive.
- Really terrible... takes forever to put all the information in, then if you click anywhere else before saving, all of your information is gone and impossible to retrieve. Why not just have it save as you go along?
- I need the option to email in my hotel reservations to have them added to the plan, or add them manually without having to go through the 'booking.com' link. When i search for a hotel then click on the dates to edit, the page stalls and won't let me edit the check in / check out dates. I can; find a way to add the hotel to the plan without booking through the site.
- Can't figure out why it wont allow me to drive further than 50 klm. Also the train estimates are way off. Good list of attractions, but doesn't know when a single attraction is listed by two names and will add both to your trip.
- Amazing idea, but with issues. There were some loading issues -it was slow but I was still managing through. I got really annoyed when I built an itinerary and tried to change the train timings (it was putting me on a midnight train from Kyoto t Tokyo). It would not let me change the timings, that part of the schedule was completely locked. Also, I had difficulty adding my hotel which I had already booked. The planner needs work.
- What I really required was a simple itinerary with suggestions for things to do in a particular city. Although this does the job ok, I do not want the accommodation to be added automatically to the plan. This part is really irritating.
- I liked TripHobo and the kind of algorithm it uses to plan trips based on the user added details. However it suggests some places totally irrelevant and impractical. For example, for a trip in Bhutan it suggests places in Sikkim.
- As an avid traveler, I’ve never been so excited to discover your website. It was like discovering the best tool I had been subconsciously hoping for. Unfortunately my excitement was short-lived because of not being able to plan a trip because the calendar won’t allow for a date around a year in future.
- The idea seems amazing but the site was kind of laggy at a lot of places. Not sure if it was a temporary thingy but you guys shoud try to fix it up.
- The site is however slow at times. That just might be a site bandwidth or server issue i guess.
- At times it loads slow, but being from the IT industry myself, i understand the amount of information you fetch makes it the case. If this can be sloved would be awesome.
- Liked how the schedule was displayed and how easy it was to rearrange cities by dragging and dropping. But if it were possible to drag and drop things on the schedule and have the times updated automatically, a bit like dragging things around on iCal or Google Calendars, it would make the processes much more user-friendly.
- I thought your site was awesome but I would like to be able to customize the ""free time"" so that it doesn't appear as ""free time"" when you print
That is great, but can we be more concise? Let’s try:
Prompt:
Can you be more synthetic?
Reply:
The reviews contain a mix of complaints and compliments. Some users have issues with the tool not being able to plan budget trips, adding hostels, problems with the new UX, app no longer available in the app store, difficulty editing trips, issues with the algorithm suggesting irrelevant and impractical places, site being slow or laggy, difficulty adding already booked hotels, the planner needs work, not being able to customize the "free time" when printing the itinerary, lack of offline availability, and the need for more items in the "add places menu." Some users also suggest improvements such as being able to connect to external sites, having ambassadors from each location, and the ability to drag and drop things on the schedule and have the times updated automatically for a more user-friendly experience.
I prefer lists, so I’d keep the first version.
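Since the list form is the one worth keeping, a natural next step is to turn the reply into a real Python list for further processing (counting, tagging, exporting). The sketch below assumes the reply uses one `- ` bullet per complaint, as in the first answer. `parse_complaints` is a hypothetical helper, and ChatGPT's output format is not guaranteed to stay this regular.

```python
def parse_complaints(reply):
    """Return the bullet items of a ChatGPT reply as a list of strings."""
    complaints = []
    for line in reply.splitlines():
        line = line.strip()
        # Keep only bullet lines; headers such as "Complaints:" are skipped.
        if line.startswith("- "):
            complaints.append(line[2:].strip())
    return complaints

# Two items taken from the reply shown earlier.
reply = """Complaints:
- Would like to see if you can add hostels for those looking to plan budget trips.
- The site is however slow at times. That just might be a site bandwidth or server issue i guess.
"""
complaints = parse_complaints(reply)
print(len(complaints))
```

From here the list can go into a DataFrame column or a spreadsheet like any other structured data.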
ChatGPT made it simple to extract the data from the website and to identify and understand the most pressing issues mentioned by the customers. In general, it is evident that ChatGPT can be a valuable tool for extracting and summarizing data from large amounts of text. Its ability to understand and interpret natural language makes it a powerful tool for data analysis and information extraction, particularly when it comes to extracting unstructured data from text and converting it to a more structured form.