Programming & Development / April 19, 2025

Spring Batch Job to Fetch College Names and Websites, and Store Data in Database


Batch jobs often pull data from an external source, enrich it, and persist the result. The example here is a Spring Batch job that fetches college names from an API, retrieves 10 website URLs for each college, and stores the results in a database.

The sections below cover each piece in turn: the job configuration, the ItemReader, ItemProcessor, and ItemWriter, the entity and repository, the RestTemplate bean, and how to launch the job. The code uses the Spring Batch 4.x API (JobBuilderFactory, StepBuilderFactory, and an ItemWriter whose write() method takes a List); Spring Batch 5 changed these.

1. Spring Batch Job Configuration

Start by configuring the job. A Spring Batch job is composed of one or more steps. In a chunk-oriented step, a reader supplies items, a processor transforms them, and a writer persists them. Here's a job with a single step of that kind:

java

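import org.springframework.batch.core.Job;
import org.springframework.batch.core.Step;
import org.springframework.batch.core.configuration.annotation.EnableBatchProcessing;
import org.springframework.batch.core.configuration.annotation.JobBuilderFactory;
import org.springframework.batch.core.configuration.annotation.StepBuilderFactory;
import org.springframework.beans.factory.annotation.Autowired;
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;

// Spring Batch 4.x style: JobBuilderFactory and StepBuilderFactory
// are deprecated as of Spring Batch 5.0.
@Configuration
@EnableBatchProcessing
public class CollegeBatchJobConfig {

    @Autowired
    private JobBuilderFactory jobBuilderFactory;

    @Autowired
    private StepBuilderFactory stepBuilderFactory;

    @Autowired
    private CollegeNameItemReader collegeNameItemReader;

    @Autowired
    private CollegeWebsiteProcessor collegeWebsiteProcessor;

    @Autowired
    private CollegeItemWriter collegeItemWriter;

    // The job consists of a single step.
    @Bean
    public Job collegeDataJob() {
        return jobBuilderFactory.get("collegeDataJob")
                .start(fetchCollegeNamesStep())
                .build();
    }

    // Chunk-oriented step: read and process colleges, then write them 10 at a time.
    @Bean
    public Step fetchCollegeNamesStep() {
        return stepBuilderFactory.get("fetchCollegeNamesStep")
                .<College, College>chunk(10)
                .reader(collegeNameItemReader)
                .processor(collegeWebsiteProcessor)
                .writer(collegeItemWriter)
                .build();
    }
}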

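This configuration defines the job and its single step, which reads college names, processes each one to retrieve its website URLs, and writes the results to the database. The chunk(10) call sets the commit interval: items are handed to the writer in groups of 10, and each group is written in its own transaction. With 100 colleges, that means 10 writes.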
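To make the chunk size concrete, here is a framework-free sketch of what a chunk-oriented step does conceptually. It is a simplified model, not Spring Batch's implementation: `ChunkLoopSketch` and `runStep` are hypothetical names, and transactions, restart metadata, and skip/retry handling are all left out.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.function.Consumer;
import java.util.function.Function;

// Simplified model of a chunk-oriented step (hypothetical names, no Spring involved).
final class ChunkLoopSketch {

    /**
     * Repeatedly reads up to chunkSize items, processes each of them,
     * and passes the processed chunk to the writer.
     * Returns the number of writer calls made.
     */
    static <I, O> int runStep(Iterator<I> source,
                              Function<I, O> processor,
                              Consumer<List<O>> writer,
                              int chunkSize) {
        if (chunkSize <= 0) {
            throw new IllegalArgumentException("chunkSize must be positive");
        }
        int writes = 0;
        while (source.hasNext()) {
            // 1. Read up to chunkSize items.
            List<I> inputs = new ArrayList<>(chunkSize);
            while (inputs.size() < chunkSize && source.hasNext()) {
                inputs.add(source.next());
            }
            // 2. Process every item in the chunk.
            List<O> outputs = new ArrayList<>(inputs.size());
            for (I item : inputs) {
                outputs.add(processor.apply(item));
            }
            // 3. Write the whole chunk in one call.
            writer.accept(outputs);
            writes++;
        }
        return writes;
    }
}
```

Feeding this loop 25 items with a chunk size of 10 produces three writer calls of 10, 10, and 5 items. The last chunk is simply whatever remains when the source runs out.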

2. Implementing the ItemReader to Fetch College Names

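The ItemReader reads data from an external source, in this case an API that returns a list of 100 college names. Spring Batch calls read() repeatedly. Each call returns one item, and a null return value signals that the input is exhausted. The reader below loads the full list on the first call and then hands out one college per call.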

java

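import java.util.ArrayList;
import java.util.List;

import org.springframework.batch.item.ItemReader;
import org.springframework.stereotype.Component;

// Note: as a singleton bean this reader keeps its state between job executions.
// If the job runs more than once per application lifetime, consider @StepScope
// so that each execution gets a fresh instance.
@Component
public class CollegeNameItemReader implements ItemReader<College> {

    private List<College> colleges = null; // loaded lazily on the first read()
    private int nextCollegeIndex;          // position of the next college to return

    @Override
    public College read() throws Exception {
        if (colleges == null) {
            colleges = fetchCollegeNamesFromApi();
        }

        College nextCollege = null;

        if (nextCollegeIndex < colleges.size()) {
            nextCollege = colleges.get(nextCollegeIndex);
            nextCollegeIndex++;
        }

        // Returning null tells Spring Batch that there is no more input.
        return nextCollege;
    }

    private List<College> fetchCollegeNamesFromApi() {
        // Make API call to retrieve 100 college names
        return new ArrayList<>(); // Placeholder for actual API call
    }
}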

Here, fetchCollegeNamesFromApi() is a placeholder for the logic that calls the API and returns the list of colleges. Until it is filled in, it returns an empty list and the job reads nothing.
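The load-once, return-null-at-the-end contract is easy to check in isolation. The sketch below models it without any Spring dependency. `LazyListReader` is a hypothetical name, and the `Supplier` stands in for fetchCollegeNamesFromApi(), whatever that call ends up looking like.

```java
import java.util.List;
import java.util.function.Supplier;

// Framework-free model of the reader's contract (hypothetical class name).
final class LazyListReader<T> {

    private final Supplier<List<T>> loader; // stands in for fetchCollegeNamesFromApi()
    private List<T> items;                  // loaded on the first read() call
    private int nextIndex;                  // position of the next item to return

    LazyListReader(Supplier<List<T>> loader) {
        this.loader = loader;
    }

    /** Returns the next item, or null once every item has been handed out. */
    T read() {
        if (items == null) {
            items = loader.get(); // the source is queried only once
        }
        return nextIndex < items.size() ? items.get(nextIndex++) : null;
    }
}
```

A reader built over a two-element list returns the two elements, then null on every later call, and invokes the loader exactly once. That last property is also why a long-lived instance of such a reader yields nothing on a second run unless its state is reset.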

3. Implementing the ItemProcessor to Fetch Website URLs

The ItemProcessor takes each college and fetches 10 website URLs for it. This requires one additional API call per college, so a run over 100 colleges makes 100 lookup calls.

java

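import java.util.ArrayList;
import java.util.List;

import org.springframework.batch.item.ItemProcessor;
import org.springframework.beans.factory.annotation.Autowired;
import org.springframework.stereotype.Component;
import org.springframework.web.client.RestTemplate;

@Component
public class CollegeWebsiteProcessor implements ItemProcessor<College, College> {

    // Intended for the lookup call in fetchWebsitesForCollege(); unused by the placeholder.
    @Autowired
    private RestTemplate restTemplate;

    @Override
    public College process(College college) throws Exception {
        // Enrich the college with its website URLs and pass it on to the writer.
        List<String> websiteUrls = fetchWebsitesForCollege(college.getName());
        college.setWebsiteUrls(websiteUrls);
        return college;
    }

    private List<String> fetchWebsitesForCollege(String collegeName) {
        // Make API call to retrieve 10 website URLs for the given college name
        return new ArrayList<>(); // Placeholder for actual API call
    }
}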

This processor fetches the website URLs based on the college name and adds them to the college object before passing it on to the writer.
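One detail worth getting right in the lookup call is encoding the college name, since names can contain spaces and characters such as `&`. The sketch below builds the request URI with the JDK only. The endpoint `https://api.example.com/websites` and the `college` and `limit` query parameters are assumptions made for illustration; the article does not name the real API.

```java
import java.net.URI;
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

// Builds the lookup URI for one college (hypothetical endpoint and parameter names).
final class WebsiteLookupUri {

    // Assumed base URL; replace with the real website-search API.
    static final String BASE_URL = "https://api.example.com/websites";

    static URI forCollege(String collegeName, int limit) {
        // Form-encode the name so spaces and '&' cannot break the query string.
        String encoded = URLEncoder.encode(collegeName, StandardCharsets.UTF_8);
        return URI.create(BASE_URL + "?college=" + encoded + "&limit=" + limit);
    }
}
```

For "Texas A&M University" and a limit of 10, this yields `https://api.example.com/websites?college=Texas+A%26M+University&limit=10`. Passing the resulting URI object to one of RestTemplate's URI overloads, rather than passing the string, avoids having the already-encoded value encoded a second time as a URI template.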

4. Implementing the ItemWriter to Save Data to the Database

The ItemWriter is responsible for saving the processed data (college names and their websites) into the database.

java

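import java.util.List;

import org.springframework.batch.item.ItemWriter;
import org.springframework.beans.factory.annotation.Autowired;
import org.springframework.stereotype.Component;

@Component
public class CollegeItemWriter implements ItemWriter<College> {

    @Autowired
    private CollegeRepository collegeRepository;

    // Spring Batch 4.x signature; in Spring Batch 5, write() takes a Chunk instead of a List.
    @Override
    public void write(List<? extends College> colleges) throws Exception {
        // Called once per chunk with all items of that chunk.
        collegeRepository.saveAll(colleges);
    }
}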

The CollegeItemWriter receives one chunk at a time (up to 10 colleges with the configuration above) and saves it, including the website URLs, through the CollegeRepository.

5. Create the Entity and Repository

Next, define the College entity and its repository to store the data in the database.

java

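import java.util.List;
import javax.persistence.ElementCollection;
import javax.persistence.Entity;
import javax.persistence.GeneratedValue;
import javax.persistence.GenerationType;
import javax.persistence.Id;

// javax.persistence matches Spring Boot 2.x; Spring Boot 3 uses jakarta.persistence instead.
@Entity
public class College {

    @Id
    @GeneratedValue(strategy = GenerationType.IDENTITY)
    private Long id;

    private String name;

    @ElementCollection
    private List<String> websiteUrls;

    public Long getId() { return id; }
    public String getName() { return name; }
    public void setName(String name) { this.name = name; }
    public List<String> getWebsiteUrls() { return websiteUrls; }
    public void setWebsiteUrls(List<String> websiteUrls) { this.websiteUrls = websiteUrls; }
}

java

import org.springframework.data.jpa.repository.JpaRepository;

public interface CollegeRepository extends JpaRepository<College, Long> {
}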

6. Configure the RestTemplate Bean

Since you'll be making external API calls, you need to configure a RestTemplate bean for use in your ItemProcessor.

java

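import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;
import org.springframework.web.client.RestTemplate;

@Configuration
public class AppConfig {

    // Default settings; consider configuring connect and read timeouts
    // for calls to external APIs.
    @Bean
    public RestTemplate restTemplate() {
        return new RestTemplate();
    }
}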

7. Running the Spring Batch Job

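Finally, you can set up the job to run either on startup or manually. To trigger it on application startup yourself, you can use the CommandLineRunner interface as shown below. Note that Spring Boot already launches every Job bean at startup by default, so set spring.batch.job.enabled=false if you use your own runner; otherwise the job is launched twice.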

java

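import org.springframework.batch.core.Job;
import org.springframework.batch.core.JobExecution;
import org.springframework.batch.core.JobParameters;
import org.springframework.batch.core.JobParametersBuilder;
import org.springframework.batch.core.launch.JobLauncher;
import org.springframework.beans.factory.annotation.Autowired;
import org.springframework.boot.CommandLineRunner;
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;

@Configuration
public class JobRunner {

    @Autowired
    private JobLauncher jobLauncher;

    @Autowired
    private Job collegeDataJob;

    @Bean
    public CommandLineRunner runJob() {
        return args -> {
            try {
                // A timestamp parameter makes every launch a new job instance.
                JobParameters jobParameters = new JobParametersBuilder()
                        .addLong("startAt", System.currentTimeMillis())
                        .toJobParameters();
                JobExecution execution = jobLauncher.run(collegeDataJob, jobParameters);
                System.out.println("Job Status : " + execution.getStatus());
            } catch (Exception e) {
                // Launch failures are only printed here; use a logger in a real application.
                e.printStackTrace();
            }
        };
    }
}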

Alternatively, you can schedule the job to run at regular intervals using Spring's @Scheduled annotation (which requires @EnableScheduling on a configuration class), or expose an endpoint to trigger the job remotely.
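Whichever trigger you choose, each launch needs its own identifying parameters. Spring Batch identifies a job instance by the job name plus its identifying parameters, and the launcher refuses to rerun an instance that has already completed. That is why the runner above adds a startAt timestamp. The toy model below illustrates the rule; `JobInstanceRegistry` is a hypothetical class written for this explanation, not part of Spring Batch.

```java
import java.util.HashSet;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;

// Toy model of job-instance identity (hypothetical names, not Spring Batch code).
final class JobInstanceRegistry {

    private final Set<String> completed = new HashSet<>();

    /**
     * Records a completed run. Returns false if a run with the same
     * job name and parameters has already completed.
     */
    boolean launch(String jobName, Map<String, Long> parameters) {
        // TreeMap gives a stable key regardless of the map's iteration order.
        String key = jobName + new TreeMap<>(parameters);
        return completed.add(key);
    }
}
```

In this model, launching "collegeDataJob" twice with startAt=1 succeeds the first time and is rejected the second time, while a launch with startAt=2 is accepted as a new instance. A scheduled trigger should therefore build fresh parameters on every run.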

Conclusion

This setup fetches, processes, and stores data in a single Spring Batch job. The ItemReader, ItemProcessor, and ItemWriter keep the API calls and the database writes in separate components, and chunk-oriented processing lets the same structure scale to larger data sets. Spring Batch also provides error handling (skip and retry), transaction management per chunk, and restartability, which makes it a good fit for this kind of job.

