In modern applications, it's common to integrate external data sources and batch processing to fetch, process, and store information efficiently. One example of this is a Spring Batch job that fetches college names from an API, retrieves 10 website URLs for each college, and stores the resulting data in a database. This process can be broken down into several key steps, from configuring Spring Batch to writing the necessary components that perform API calls and database writes.
This article will guide you through the entire process, from creating the batch job configuration to implementing the individual components: ItemReader, ItemProcessor, and ItemWriter.
1. Spring Batch Job Configuration
To begin with, you'll need to configure a Spring Batch job. A Spring Batch job is composed of steps, where each step performs a specific task, such as reading data, processing it, and writing it to a database. Here's an example of how to configure your job with a single step:
java
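import org.springframework.batch.core.Job;
import org.springframework.batch.core.Step;
import org.springframework.batch.core.configuration.annotation.EnableBatchProcessing;
import org.springframework.batch.core.configuration.annotation.JobBuilderFactory;
import org.springframework.batch.core.configuration.annotation.StepBuilderFactory;
import org.springframework.beans.factory.annotation.Autowired;
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;

// Uses the Spring Batch 4.x builder factories (deprecated as of 5.0).
@Configuration
@EnableBatchProcessing
public class CollegeBatchJobConfig {

    @Autowired
    private JobBuilderFactory jobBuilderFactory;

    @Autowired
    private StepBuilderFactory stepBuilderFactory;

    @Autowired
    private CollegeNameItemReader collegeNameItemReader;

    @Autowired
    private CollegeWebsiteProcessor collegeWebsiteProcessor;

    @Autowired
    private CollegeItemWriter collegeItemWriter;

    // The job consists of a single step.
    @Bean
    public Job collegeDataJob() {
        return jobBuilderFactory.get("collegeDataJob")
                .start(fetchCollegeNamesStep())
                .build();
    }

    // Reads, processes, and writes colleges in chunks of 10 items.
    @Bean
    public Step fetchCollegeNamesStep() {
        return stepBuilderFactory.get("fetchCollegeNamesStep")
                .<College, College>chunk(10)
                .reader(collegeNameItemReader)
                .processor(collegeWebsiteProcessor)
                .writer(collegeItemWriter)
                .build();
    }
}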
This configuration sets up the job and the step that fetches college names, processes each one to retrieve website URLs, and writes the results to the database.
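To make the effect of chunk(10) concrete, the following framework-free sketch imitates the read, process, write cycle of a chunk-oriented step. It is a deliberately simplified illustration, not Spring Batch's actual implementation: it ignores transactions, filtering, skips, and restarts, and the class name ChunkLoopSketch and its run method are hypothetical.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.function.Consumer;
import java.util.function.Function;
import java.util.function.Supplier;

// Simplified illustration of chunk-oriented processing (not Spring Batch code).
class ChunkLoopSketch {

    // Reads up to chunkSize items, processes them, writes them as one list,
    // and repeats until the reader returns null. Returns the number of writes.
    static <I, O> int run(Supplier<I> reader,
                          Function<I, O> processor,
                          Consumer<List<O>> writer,
                          int chunkSize) {
        int writes = 0;
        while (true) {
            // Read phase: collect at most chunkSize items.
            List<I> inputs = new ArrayList<>();
            I item;
            while (inputs.size() < chunkSize && (item = reader.get()) != null) {
                inputs.add(item);
            }
            if (inputs.isEmpty()) {
                break; // nothing left to read
            }
            // Process phase: transform each item of the chunk.
            List<O> outputs = new ArrayList<>();
            for (I input : inputs) {
                outputs.add(processor.apply(input));
            }
            // Write phase: the writer receives the whole chunk at once.
            writer.accept(outputs);
            writes++;
            if (inputs.size() < chunkSize) {
                break; // the reader signalled the end of the data
            }
        }
        return writes;
    }

    public static void main(String[] args) {
        // 100 made-up college names, mirroring the article's scenario.
        List<String> names = new ArrayList<>();
        for (int i = 1; i <= 100; i++) {
            names.add("College " + i);
        }
        Iterator<String> it = names.iterator();

        int writes = ChunkLoopSketch.<String, String>run(
                () -> it.hasNext() ? it.next() : null,
                name -> name.toUpperCase(),
                chunk -> System.out.println("Writing " + chunk.size() + " items"),
                10);
        System.out.println("Writer calls: " + writes); // prints "Writer calls: 10"
    }
}
```

With 100 colleges and a chunk size of 10, the writer is invoked 10 times with 10 items each. In the real framework each chunk is additionally wrapped in its own transaction.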
2. Implementing the ItemReader to Fetch College Names
The ItemReader is responsible for reading data from an external source, in this case an API that returns a list of 100 college names. You'll need to implement the read() method to fetch the data. Spring Batch calls read() repeatedly and treats a null return value as the end of the input.
java
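import java.util.ArrayList;
import java.util.List;
import org.springframework.batch.item.ItemReader;
import org.springframework.stereotype.Component;

// Note: as a singleton bean, this reader keeps its state between job
// executions in the same application context. Consider @StepScope if the
// job may run more than once per application run.
@Component
public class CollegeNameItemReader implements ItemReader<College> {

    private List<College> colleges = null;
    private int nextCollegeIndex;

    @Override
    public College read() throws Exception {
        // Load the full list lazily on the first call.
        if (colleges == null) {
            colleges = fetchCollegeNamesFromApi();
        }
        College nextCollege = null;
        if (nextCollegeIndex < colleges.size()) {
            nextCollege = colleges.get(nextCollegeIndex);
            nextCollegeIndex++;
        }
        // Returning null tells Spring Batch that the input is exhausted.
        return nextCollege;
    }

    private List<College> fetchCollegeNamesFromApi() {
        // Make API call to retrieve 100 college names
        return new ArrayList<>(); // Placeholder for actual API call
    }
}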
Here, the fetchCollegeNamesFromApi() method would contain the logic to call the API and get the list of college names.
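As one possible shape for that logic, here is a minimal sketch that uses the JDK's built-in HttpClient so it runs without extra dependencies. Everything specific in it is an assumption made for illustration: the endpoint URL is a placeholder, the class and method names are hypothetical, and the response is assumed to be plain text with one college name per line. A JSON API would instead need a JSON library, or RestTemplate with a suitable response type.

```java
import java.io.IOException;
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.time.Duration;
import java.util.List;
import java.util.stream.Collectors;

// Hypothetical helper; the endpoint and response format are assumptions.
class CollegeNameClientSketch {

    // Placeholder URL: replace with the real API endpoint.
    static final URI NAMES_URI = URI.create("https://api.example.com/colleges");

    // Sends a GET request and returns the names found in the response body.
    static List<String> fetchCollegeNames(HttpClient client)
            throws IOException, InterruptedException {
        HttpRequest request = HttpRequest.newBuilder(NAMES_URI)
                .timeout(Duration.ofSeconds(10)) // fail instead of hanging forever
                .GET()
                .build();
        HttpResponse<String> response =
                client.send(request, HttpResponse.BodyHandlers.ofString());
        if (response.statusCode() != 200) {
            throw new IOException("Unexpected status: " + response.statusCode());
        }
        return parseNames(response.body());
    }

    // Splits the body into trimmed, non-empty lines (assumed: one name per line).
    static List<String> parseNames(String body) {
        return body.lines()
                .map(String::trim)
                .filter(line -> !line.isEmpty())
                .collect(Collectors.toList());
    }
}
```

Inside the reader, each returned name would then be wrapped in a College object, for example by calling setName on a new instance. Throwing on a non-200 status lets the step fail visibly rather than silently processing an empty list.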
3. Implementing the ItemProcessor to Fetch Website URLs
The ItemProcessor takes each college and fetches 10 website URLs for it, which requires one additional API call per college.
java
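import java.util.ArrayList;
import java.util.List;
import org.springframework.batch.item.ItemProcessor;
import org.springframework.beans.factory.annotation.Autowired;
import org.springframework.stereotype.Component;
import org.springframework.web.client.RestTemplate;

@Component
public class CollegeWebsiteProcessor implements ItemProcessor<College, College> {

    // HTTP client for the website lookup (unused by the placeholder below).
    @Autowired
    private RestTemplate restTemplate;

    @Override
    public College process(College college) throws Exception {
        // Enrich the college with its website URLs and pass it to the writer.
        List<String> websiteUrls = fetchWebsitesForCollege(college.getName());
        college.setWebsiteUrls(websiteUrls);
        return college;
    }

    private List<String> fetchWebsitesForCollege(String collegeName) {
        // Make API call to retrieve 10 website URLs for the given college name
        return new ArrayList<>(); // Placeholder for actual API call
    }
}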
This processor fetches the website URLs based on the college name and adds them to the college object before passing it on to the writer.
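Two details of the lookup are easy to get wrong: college names contain spaces and punctuation that must be encoded before they go into a query string, and the API may return more or fewer than 10 URLs. The sketch below shows one way to handle both with plain JDK classes. The endpoint, its college and limit query parameters, and the class name WebsiteLookupSketch are hypothetical.

```java
import java.net.URI;
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;
import java.util.List;
import java.util.stream.Collectors;

// Hypothetical helper; the endpoint and its parameters are assumptions.
class WebsiteLookupSketch {

    static final int MAX_URLS = 10;

    // Builds the lookup URI with the college name safely URL-encoded.
    static URI buildLookupUri(String collegeName) {
        String encoded = URLEncoder.encode(collegeName, StandardCharsets.UTF_8);
        return URI.create("https://api.example.com/websites?college=" + encoded
                + "&limit=" + MAX_URLS);
    }

    // Drops blank and duplicate entries and keeps at most MAX_URLS URLs.
    static List<String> firstTen(List<String> urls) {
        return urls.stream()
                .filter(url -> url != null && !url.isBlank())
                .distinct()
                .limit(MAX_URLS)
                .collect(Collectors.toList());
    }
}
```

For example, buildLookupUri("Boston College") yields https://api.example.com/websites?college=Boston+College&limit=10. Capping the list on the client side keeps the stored data consistent even if the API ignores the limit parameter.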
4. Implementing the ItemWriter to Save Data to the Database
The ItemWriter is responsible for saving the processed data (college names and their websites) into the database. It receives one full chunk of items per call.
java
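import java.util.List;
import org.springframework.batch.item.ItemWriter;
import org.springframework.beans.factory.annotation.Autowired;
import org.springframework.stereotype.Component;

@Component
public class CollegeItemWriter implements ItemWriter<College> {

    @Autowired
    private CollegeRepository collegeRepository;

    // Spring Batch 4.x signature; in 5.x write() takes a Chunk<? extends College>.
    @Override
    public void write(List<? extends College> colleges) throws Exception {
        collegeRepository.saveAll(colleges);
    }
}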
Here, the CollegeItemWriter saves the list of colleges, including their website URLs, to the database using the CollegeRepository.
5. Create the Entity and Repository
Next, define the College entity and its repository to store the data in the database.
java
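// javax.persistence matches Spring Boot 2.x; Boot 3.x uses jakarta.persistence.
import java.util.List;
import javax.persistence.ElementCollection;
import javax.persistence.Entity;
import javax.persistence.GeneratedValue;
import javax.persistence.GenerationType;
import javax.persistence.Id;

@Entity
public class College {

    @Id
    @GeneratedValue(strategy = GenerationType.IDENTITY)
    private Long id;

    private String name;

    // Stored in a separate collection table linked to the college.
    @ElementCollection
    private List<String> websiteUrls;

    // Getters and setters
}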
java
import org.springframework.data.jpa.repository.JpaRepository;

public interface CollegeRepository extends JpaRepository<College, Long> {
}
6. Configure the RestTemplate Bean
Since you'll be making external API calls, you need to configure a RestTemplate bean for use in your ItemProcessor.
java
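import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;
import org.springframework.web.client.RestTemplate;

@Configuration
public class AppConfig {

    // Shared HTTP client, injected into CollegeWebsiteProcessor.
    @Bean
    public RestTemplate restTemplate() {
        return new RestTemplate();
    }
}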
7. Running the Spring Batch Job
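Finally, you can set up the job to run either on startup or manually. To trigger the job on application startup, you can use the CommandLineRunner interface. If you rely on Spring Boot's default batch auto-configuration, note that it already launches Job beans at startup, so set spring.batch.job.enabled=false when you launch the job yourself; otherwise the job runs twice.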
java
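import org.springframework.batch.core.Job;
import org.springframework.batch.core.JobExecution;
import org.springframework.batch.core.JobParameters;
import org.springframework.batch.core.JobParametersBuilder;
import org.springframework.batch.core.launch.JobLauncher;
import org.springframework.beans.factory.annotation.Autowired;
import org.springframework.boot.CommandLineRunner;
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;

@Configuration
public class JobRunner {

    @Autowired
    private JobLauncher jobLauncher;

    @Autowired
    private Job collegeDataJob;

    @Bean
    public CommandLineRunner runJob() {
        return args -> {
            try {
                // A unique parameter makes every launch a new job instance.
                JobParameters jobParameters = new JobParametersBuilder()
                        .addLong("startAt", System.currentTimeMillis())
                        .toJobParameters();
                JobExecution execution = jobLauncher.run(collegeDataJob, jobParameters);
                System.out.println("Job Status : " + execution.getStatus());
            } catch (Exception e) {
                e.printStackTrace();
            }
        };
    }
}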
Alternatively, you can schedule the job to run at regular intervals using Spring's @Scheduled annotation or expose an endpoint to trigger the job remotely.
Conclusion
This setup lets you fetch, process, and store data in a single Spring Batch job. By combining ItemReader, ItemProcessor, and ItemWriter, you can integrate external APIs and handle large volumes of data in fixed-size chunks. Spring Batch also provides features for error handling, transaction management, and performance optimization, which makes it well suited to such use cases.