Programming & Development / April 18, 2025

Scraping Contact Information from Websites Using Java and JSoup

Java, JSoup, Web Scraping, Email Extraction, Phone Scraping, Contact Info Scraper, HTML Parser, Website Automation, Java Web Crawler

To scrape contact information (email addresses, phone numbers, and physical addresses) from a website in Java, you can use a library such as JSoup to parse the HTML and extract the details you need. Here is a basic example:

Step 1: Add JSoup Dependency

If you're using Maven, add JSoup to your pom.xml:

xml

<dependency>
    <groupId>org.jsoup</groupId>
    <artifactId>jsoup</artifactId>
    <version>1.15.4</version>
</dependency>

If you're not using Maven, you can download the JSoup JAR and add it to your project's classpath manually.

Step 2: Java Code to Scrape Emails, Phones, and Addresses

java

import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;
import org.jsoup.select.Elements;

import java.io.IOException;

public class WebScraper {

    public static void main(String[] args) {
        String url = "https://example.com/contact"; // Change this to the actual URL

        try {
            Document doc = Jsoup.connect(url).get();

            // Extract email addresses from mailto: links
            Elements emailLinks = doc.select("a[href^=mailto]");
            for (Element link : emailLinks) {
                // Drop the scheme and any "?subject=..." query string
                String email = link.attr("href").replaceFirst("(?i)^mailto:", "").split("\\?")[0].trim();
                System.out.println("Email: " + email);
            }

            // Extract phone numbers from tel: links
            Elements phoneLinks = doc.select("a[href^=tel]");
            for (Element link : phoneLinks) {
                // Drop the scheme, whatever its letter case
                String phone = link.attr("href").replaceFirst("(?i)^tel:", "").trim();
                System.out.println("Phone: " + phone);
            }

            // Extract physical address (if present in <address> tags)
            Elements addressTags = doc.select("address");
            for (Element address : addressTags) {
                System.out.println("Address: " + address.text());
            }

        } catch (IOException e) {
            e.printStackTrace();
        }
    }
}

Explanation:

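  • Jsoup.connect(url).get() fetches the page and parses its HTML into a Document.
  • The selectors a[href^=mailto] and a[href^=tel] match <a> tags whose href starts with mailto or tel; the email address or phone number is then read from that href.
  • Some websites mark up physical addresses with <address> tags; address.text() returns the text inside each one.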
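A mailto: href can hold more than a single address: several recipients separated by commas, a ?subject=... query string, or percent-encoded characters. The following is a minimal sketch of a helper that handles those cases using only the JDK. The class name MailtoParser and its parse method are hypothetical names chosen for this illustration; they are not part of JSoup.

```java
import java.net.URLDecoder;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper (not part of JSoup): cleans up the value of a mailto: href.
final class MailtoParser {

    // Returns the recipient addresses in a mailto: href, without the scheme,
    // the "?subject=..." query string, or percent-encoding.
    static List<String> parse(String href) {
        List<String> addresses = new ArrayList<>();
        if (href == null || !href.regionMatches(true, 0, "mailto:", 0, 7)) {
            return addresses; // not a mailto: link
        }
        String rest = href.substring(7);
        int query = rest.indexOf('?');
        if (query >= 0) {
            rest = rest.substring(0, query); // drop subject, body, cc, ...
        }
        for (String part : rest.split(",")) {
            String decoded;
            try {
                // Protect a literal '+' because URLDecoder would turn it into a space
                decoded = URLDecoder.decode(part.replace("+", "%2B"), StandardCharsets.UTF_8);
            } catch (IllegalArgumentException e) {
                decoded = part; // malformed percent-encoding: keep the raw text
            }
            decoded = decoded.trim();
            if (!decoded.isEmpty()) {
                addresses.add(decoded);
            }
        }
        return addresses;
    }

    public static void main(String[] args) {
        System.out.println(parse("mailto:info@example.com?subject=Hello"));
        // prints [info@example.com]
        System.out.println(parse("mailto:sales@example.com,support%40example.com"));
        // prints [sales@example.com, support@example.com]
    }
}
```

In the scraper from Step 2, this could be called as MailtoParser.parse(link.attr("href")) inside the email loop, printing each returned address.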

Tip:

If the contact info is loaded via JavaScript, you’ll need a browser automation tool such as Selenium WebDriver instead, because JSoup only parses the HTML returned by the server and does not execute JavaScript.
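Whichever tool fetches the page, email addresses and phone numbers written as plain text rather than as links are not matched by the mailto and tel selectors. One possible fallback is to scan the page text (for example, the string returned by doc.text()) with regular expressions. The sketch below assumes deliberately loose patterns, so it will miss some valid formats and may pick up false positives; the class name ContactTextScanner and its methods are hypothetical names for this illustration.

```java
import java.util.LinkedHashSet;
import java.util.Set;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper: finds contact details in plain text using JDK regex only.
final class ContactTextScanner {

    // Simple email pattern; an assumption for illustration, not a full RFC 5322 validator
    private static final Pattern EMAIL =
            Pattern.compile("[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\\.[A-Za-z]{2,}");

    // Loose phone pattern: optional '+', then at least eight characters that
    // start and end with a digit and contain digits, spaces, dots, dashes or parentheses
    private static final Pattern PHONE =
            Pattern.compile("\\+?\\d[\\d\\s().-]{6,}\\d");

    // Returns the distinct email-like strings in the text, in order of appearance.
    static Set<String> findEmails(String text) {
        return findAll(EMAIL, text);
    }

    // Returns the distinct phone-like strings in the text, in order of appearance.
    static Set<String> findPhones(String text) {
        return findAll(PHONE, text);
    }

    private static Set<String> findAll(Pattern pattern, String text) {
        Set<String> results = new LinkedHashSet<>(); // keeps order, removes duplicates
        Matcher matcher = pattern.matcher(text);
        while (matcher.find()) {
            results.add(matcher.group());
        }
        return results;
    }

    public static void main(String[] args) {
        String text = "Write to info@example.com or call +1 (555) 010-2000.";
        System.out.println(findEmails(text)); // prints [info@example.com]
        System.out.println(findPhones(text)); // prints [+1 (555) 010-2000]
    }
}
```

Because the phone pattern accepts any long run of digits and separators, results are best treated as candidates to review rather than confirmed numbers.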

