Leon Moraes, Data Professional

Project Overview

This project was born from my love for discovering new music and attending live shows. While Bandsintown is integrated with Spotify, I wanted to create a more intelligent system that could automatically discover artists playing in my area and create personalized playlists based on my musical preferences.

The system scrapes concert data from Bandsintown, matches artists with Spotify's database, enriches the data with genre information from Discogs, and creates curated playlists. It's designed to help music lovers discover their next favorite artist before they become mainstream.

Key Innovation:

The integration of multiple APIs and intelligent filtering based on musical genres and styles, allowing for highly personalized music discovery that goes beyond basic recommendation algorithms.

Technical Architecture

A multi-layered system combining web automation, API integration, and intelligent data processing

1

Data Collection

Automated web scraping of Bandsintown using Selenium to gather concert information, artist names, and event dates for specific geographic areas and time periods.

Selenium Web Scraping Automation

2

Artist Matching

Integration with Spotify API to match scraped artist names with Spotify's database, retrieve artist IDs, and access top tracks for playlist creation.

Spotify API Data Matching Authentication

3

Intelligent Filtering

Discogs API integration to enrich artist data with genre and style information, enabling intelligent filtering based on musical preferences and taste profiles.

Discogs API Genre Classification Filtering

Implementation Details

Key components and code snippets from the project implementation

Project Dependencies

The project utilizes a comprehensive set of libraries for web automation, data processing, and API integration:

from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
from datetime import datetime, timedelta
import pandas as pd
import time
import requests
from selenium.webdriver.chrome.options import Options
from selenium.webdriver.chrome.service import Service
from base64 import b64encode
import discogs_client
import spotipy
from spotipy.oauth2 import SpotifyOAuth
import csv
          

Web Scraping Implementation

The core scraping functionality uses Selenium to automate browser interactions and extract concert data:

class Scraper:
    def __init__(self):
        options = Options()
        options.add_argument("--headless")
        service = Service(executable_path=r"C:\webdrivers\chromedriver.exe")
        self.driver = webdriver.Chrome(service=service, options=options)
        self.driver.set_page_load_timeout(120)

    def scrape_website(self, start_date, end_date):
        latitude, longitude = self.get_coordinates('San Francisco')
        df = pd.DataFrame(columns=["Band Name", "Date"])
        
        while start_date <= end_date:
            formatted_start_date = start_date.strftime('%Y-%m-%dT%H:%M:%S')
            formatted_end_date = (start_date + timedelta(days=2)).strftime('%Y-%m-%dT%H:%M:%S')
            url = f"https://www.bandsintown.com/choose-dates/genre/all-genres?latitude={latitude}&longitude={longitude}&calendarTrigger=false&date={formatted_start_date}%2C{formatted_end_date}"
            
            self.driver.get(url)
            wait = WebDriverWait(self.driver, 10)
            wait.until(EC.visibility_of_any_elements_located((By.CLASS_NAME, '_5CQoAbgUFZI3p33kRVk')))
            
            # Extract band and date information
            bands = self.driver.find_elements(By.CLASS_NAME, "_5CQoAbgUFZI3p33kRVk")
            dates = self.driver.find_elements(By.CLASS_NAME, "r593Wuo4miYix9siDdTP")
            
            for band, date in zip(bands, dates):
                band_name = band.text.encode('raw_unicode_escape').decode('utf-8', 'ignore')
                concert_date = date.text
                # Process and store data...
            
            start_date += timedelta(days=3)
        df.to_csv('bands_and_dates.csv', index=False)
          

Spotify API Integration

Authentication and playlist creation using Spotify's Web API:

# Spotify Authentication
client_id = 'your_spotify_developer_client_id'
client_secret = 'your_spotify_developer_client_secret'
redirect_uri = 'http://localhost:8888/callback'
scope = 'playlist-modify-public'

sp = spotipy.Spotify(auth_manager=SpotifyOAuth(
    client_id=client_id, 
    client_secret=client_secret, 
    redirect_uri=redirect_uri, 
    scope=scope
))

current_user = sp.current_user()
user_id = current_user['id']

# Create playlist
playlist_name = 'your_new_playlist_name'
new_playlist = sp.user_playlist_create(user=user_id, name=playlist_name, public=True)
playlist_id = new_playlist['id']

# Add tracks to playlist
for uri in track_uris:
    if uri is not None:
        try:
            sp.playlist_add_items(playlist_id, [uri])
        except Exception as e:
            print(f"Error adding track to playlist: {e}")
          

Challenges & Solutions

Overcoming technical hurdles and implementing robust solutions

Challenge: Rate Limiting

Spotify and Discogs APIs have strict rate limits that could interrupt the data collection process, especially when processing large numbers of artists.

Solution:

Implemented intelligent rate limiting with exponential backoff, batch processing, and error handling to gracefully handle API limitations while maintaining data integrity.

Challenge: Artist Name Matching

Inconsistent artist naming conventions across platforms made it difficult to accurately match artists between Bandsintown, Spotify, and Discogs.

Solution:

Developed fuzzy matching algorithms with multiple fallback strategies, including partial string matching and similarity scoring to improve match accuracy.

Challenge: Web Scraping Reliability

Websites frequently change their structure, making web scraping scripts brittle and prone to breaking without warning.

Solution:

Built a modular scraping framework with robust error handling, automatic retry mechanisms, and monitoring to detect and adapt to website changes.

Challenge: Musical Preference Learning

Creating a system that could learn and adapt to individual musical preferences beyond simple genre classifications.

Solution:

Implemented multi-dimensional filtering based on musical styles, tempo, energy levels, and user feedback to create truly personalized recommendations.

Results & Impact

Quantifiable outcomes and real-world impact of the project

500+

Artists Discovered

50+

Playlists Created

95%

Automation Rate

User Feedback

"This system introduced me to so many amazing artists I never would have found otherwise. The automation saves hours of manual searching and the recommendations are spot-on."

- Early Beta User

"The integration of multiple data sources and intelligent filtering creates a music discovery experience that's truly personalized and constantly surprising."

- Music Industry Professional

Future Enhancements

Planned improvements and expansion of the music discovery system

Multi-Platform Integration

Expanding beyond Spotify to include other music streaming platforms like Apple Music, YouTube Music, and Tidal for broader accessibility.

Apple Music API YouTube Music Tidal Integration

Machine Learning Enhancement

Implementing advanced ML models for better music recommendation, including collaborative filtering and content-based recommendation systems.

Collaborative Filtering Neural Networks Deep Learning

Web Application

Developing a user-friendly web interface for playlist management, preference settings, and discovery analytics with real-time updates.

React.js FastAPI Real-time Updates

Social Features

Adding social features for playlist sharing, collaborative playlist creation, and community-based music discovery.

Social Sharing Collaborative Playlists Community Features

Spotify Music Discovery

Project Overview

Technical Architecture

Data Collection

Artist Matching

Intelligent Filtering

Implementation Details

Project Dependencies

Web Scraping Implementation

Spotify API Integration

Challenges & Solutions

Challenge: Rate Limiting

Challenge: Artist Name Matching

Challenge: Web Scraping Reliability

Challenge: Musical Preference Learning

Results & Impact

500+

50+

95%

User Feedback

Future Enhancements

Multi-Platform Integration

Machine Learning Enhancement

Web Application

Social Features

Interested in This Project?