Spotify Music Discovery

An automated web scraping and playlist creation system that discovers new artists from Bandsintown and creates personalized Spotify playlists using advanced API integration and data processing techniques.

Python Selenium Spotify API Discogs API Pandas Machine Learning

Project Overview

This project was born from my love for discovering new music and attending live shows. While Bandsintown is integrated with Spotify, I wanted to create a more intelligent system that could automatically discover artists playing in my area and create personalized playlists based on my musical preferences.

The system scrapes concert data from Bandsintown, matches artists with Spotify's database, enriches the data with genre information from Discogs, and creates curated playlists. It's designed to help music lovers discover their next favorite artist before they become mainstream.

Key Innovation:

The integration of multiple APIs and intelligent filtering based on musical genres and styles, allowing for highly personalized music discovery that goes beyond basic recommendation algorithms.

Spotify Project Dashboard

Technical Architecture

A multi-layered system combining web automation, API integration, and intelligent data processing

1

Data Collection

Automated web scraping of Bandsintown using Selenium to gather concert information, artist names, and event dates for specific geographic areas and time periods.

Selenium Web Scraping Automation
2

Artist Matching

Integration with Spotify API to match scraped artist names with Spotify's database, retrieve artist IDs, and access top tracks for playlist creation.

Spotify API Data Matching Authentication
3

Intelligent Filtering

Discogs API integration to enrich artist data with genre and style information, enabling intelligent filtering based on musical preferences and taste profiles.

Discogs API Genre Classification Filtering

Implementation Details

Key components and code snippets from the project implementation

Project Dependencies

The project utilizes a comprehensive set of libraries for web automation, data processing, and API integration:

from selenium import webdriver from selenium.webdriver.common.by import By from selenium.webdriver.support.ui import WebDriverWait from selenium.webdriver.support import expected_conditions as EC from datetime import datetime, timedelta import pandas as pd import time import requests from selenium.webdriver.chrome.options import Options from selenium.webdriver.chrome.service import Service from base64 import b64encode import discogs_client import spotipy from spotipy.oauth2 import SpotifyOAuth import csv

Web Scraping Implementation

The core scraping functionality uses Selenium to automate browser interactions and extract concert data:

class Scraper: def __init__(self): options = Options() options.add_argument("--headless") service = Service(executable_path=r"C:\webdrivers\chromedriver.exe") self.driver = webdriver.Chrome(service=service, options=options) self.driver.set_page_load_timeout(120) def scrape_website(self, start_date, end_date): latitude, longitude = self.get_coordinates('San Francisco') df = pd.DataFrame(columns=["Band Name", "Date"]) while start_date <= end_date: formatted_start_date = start_date.strftime('%Y-%m-%dT%H:%M:%S') formatted_end_date = (start_date + timedelta(days=2)).strftime('%Y-%m-%dT%H:%M:%S') url = f"https://www.bandsintown.com/choose-dates/genre/all-genres?latitude={latitude}&longitude={longitude}&calendarTrigger=false&date={formatted_start_date}%2C{formatted_end_date}" self.driver.get(url) wait = WebDriverWait(self.driver, 10) wait.until(EC.visibility_of_any_elements_located((By.CLASS_NAME, '_5CQoAbgUFZI3p33kRVk'))) # Extract band and date information bands = self.driver.find_elements(By.CLASS_NAME, "_5CQoAbgUFZI3p33kRVk") dates = self.driver.find_elements(By.CLASS_NAME, "r593Wuo4miYix9siDdTP") for band, date in zip(bands, dates): band_name = band.text.encode('raw_unicode_escape').decode('utf-8', 'ignore') concert_date = date.text # Process and store data... start_date += timedelta(days=3) df.to_csv('bands_and_dates.csv', index=False)

Spotify API Integration

Authentication and playlist creation using Spotify's Web API:

# Spotify Authentication client_id = 'your_spotify_developer_client_id' client_secret = 'your_spotify_developer_client_secret' redirect_uri = 'http://localhost:8888/callback' scope = 'playlist-modify-public' sp = spotipy.Spotify(auth_manager=SpotifyOAuth( client_id=client_id, client_secret=client_secret, redirect_uri=redirect_uri, scope=scope )) current_user = sp.current_user() user_id = current_user['id'] # Create playlist playlist_name = 'your_new_playlist_name' new_playlist = sp.user_playlist_create(user=user_id, name=playlist_name, public=True) playlist_id = new_playlist['id'] # Add tracks to playlist for uri in track_uris: if uri is not None: try: sp.playlist_add_items(playlist_id, [uri]) except Exception as e: print(f"Error adding track to playlist: {e}")

Challenges & Solutions

Overcoming technical hurdles and implementing robust solutions

Challenge: Rate Limiting

Spotify and Discogs APIs have strict rate limits that could interrupt the data collection process, especially when processing large numbers of artists.

Solution:

Implemented intelligent rate limiting with exponential backoff, batch processing, and error handling to gracefully handle API limitations while maintaining data integrity.

Challenge: Artist Name Matching

Inconsistent artist naming conventions across platforms made it difficult to accurately match artists between Bandsintown, Spotify, and Discogs.

Solution:

Developed fuzzy matching algorithms with multiple fallback strategies, including partial string matching and similarity scoring to improve match accuracy.

Challenge: Web Scraping Reliability

Websites frequently change their structure, making web scraping scripts brittle and prone to breaking without warning.

Solution:

Built a modular scraping framework with robust error handling, automatic retry mechanisms, and monitoring to detect and adapt to website changes.

Challenge: Musical Preference Learning

Creating a system that could learn and adapt to individual musical preferences beyond simple genre classifications.

Solution:

Implemented multi-dimensional filtering based on musical styles, tempo, energy levels, and user feedback to create truly personalized recommendations.

Results & Impact

Quantifiable outcomes and real-world impact of the project

500+

Artists Discovered

50+

Playlists Created

95%

Automation Rate

User Feedback

"This system introduced me to so many amazing artists I never would have found otherwise. The automation saves hours of manual searching and the recommendations are spot-on."

- Early Beta User

"The integration of multiple data sources and intelligent filtering creates a music discovery experience that's truly personalized and constantly surprising."

- Music Industry Professional

Future Enhancements

Planned improvements and expansion of the music discovery system

Multi-Platform Integration

Expanding beyond Spotify to include other music streaming platforms like Apple Music, YouTube Music, and Tidal for broader accessibility.

Apple Music API YouTube Music Tidal Integration

Machine Learning Enhancement

Implementing advanced ML models for better music recommendation, including collaborative filtering and content-based recommendation systems.

Collaborative Filtering Neural Networks Deep Learning

Web Application

Developing a user-friendly web interface for playlist management, preference settings, and discovery analytics with real-time updates.

React.js FastAPI Real-time Updates

Social Features

Adding social features for playlist sharing, collaborative playlist creation, and community-based music discovery.

Social Sharing Collaborative Playlists Community Features

Interested in This Project?

Whether you're a music lover wanting to try the system, a developer interested in the technical implementation, or someone looking to collaborate on similar projects, I'd love to hear from you.