How to Create a URL Translation and Summarization Shell Script Using Kotlin and LangChain4j

Introduction

  • We are living in the era of LLMs (Large Language Models). Numerous tools and methods that leverage LLMs to enhance productivity are being released. In this post, we will create a script that filters out advertisements from internet articles, translates them into English, and summarizes them, using Kotlin and LangChain4j in a Linux shell environment.

Script Features

  • Accepts an internet URL as an argument.
  • Removes advertisements and unnecessary content from the page, translates the remaining content into English, and outputs it in Markdown format.
  • Finally, outputs a summary of the article.

Prerequisites

  • Your OpenAI API Key
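  • The script in this post passes the key to the .apiKey(...) builder call as a literal placeholder. If you prefer to keep the key out of the source file, a minimal sketch, assuming the key has been exported as the environment variable OPENAI_API_KEY (the variable name is this post's assumption), is to read it at startup and pass the resulting value to .apiKey(...) instead:
// Read the OpenAI API key from the OPENAI_API_KEY environment variable
// and fail fast with a clear message if it has not been set.
val openAiApiKey: String = System.getenv("OPENAI_API_KEY")
    ?: error("Please set the OPENAI_API_KEY environment variable.")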

Installing Kotlin and Glow

  • Install Kotlin using SDKMAN, and Glow (a Markdown console viewer) using Homebrew.
# Install SDKMAN
$ curl -s "https://get.sdkman.io" | bash
$ source "$HOME/.sdkman/bin/sdkman-init.sh"

# Install Kotlin
$ sdk install kotlin

# Install Glow, the Markdown Console Viewer
$ brew install glow

Creating the URL to Text Shell Script

  • Let's create a Kotlin script that can be executed in the Linux shell as follows.
$ nano url2text.main.kts
#!/usr/bin/env kotlin

// Import the LangChain4j library.
@file:DependsOn("dev.langchain4j:langchain4j:0.31.0", "dev.langchain4j:langchain4j-open-ai:0.31.0")

import dev.langchain4j.data.document.Document
import dev.langchain4j.data.document.loader.UrlDocumentLoader
import dev.langchain4j.data.document.parser.TextDocumentParser
import dev.langchain4j.data.document.transformer.HtmlTextExtractor
import dev.langchain4j.data.message.AiMessage
import dev.langchain4j.data.message.SystemMessage
import dev.langchain4j.model.openai.OpenAiChatModel
import dev.langchain4j.model.output.Response
import java.time.Duration
import kotlin.system.exitProcess

// Check if arguments are provided, display usage message and exit if not.
val params = args
if (params.isEmpty()) {
    println("""
Internet URL to Text Tool 0.1 by Tae-hyeong Lee

Usage: url2text.main.kts {url}

Description:
  The 'url2text.main.kts' command fetches the content from a given internet URL, translates it into English, and provides a summary of the translated content.

Example:
  url2text.main.kts https://github.com/langchain-ai/langchain

Note:
  Ensure the URL is a valid internet address for accurate translation and summarization.
    """.trimIndent())

    exitProcess(1)
}

// Convert HTML to text using LangChain4j.
val url = params[0]
val htmlDocument: Document = UrlDocumentLoader.load(url, TextDocumentParser())
val textDocument: Document = HtmlTextExtractor().transform(htmlDocument)

// Create a LangChain4j OpenAI GPT-4o LLM object.
val chatModel: OpenAiChatModel = OpenAiChatModel.builder()
    .apiKey("{your-openai-api-key}")
    .timeout(Duration.ofSeconds(120))
    .modelName("gpt-4o-2024-05-13")
    .temperature(0.3)
    .topP(0.3)
    .build()

// Request the LLM to refine, translate, and summarize the content.
val aiMessage: Response<AiMessage> = chatModel.generate(
    SystemMessage(
        """
You are an assistant that reads internet articles and translates them into English. Please refer to the context below and follow these guidelines:

1. Remove all promotional information, headers, footers, and menu information, and extract only the actual content of the article for accurate translation.
2. Convert the translation result into Markdown format, appropriately judging the title and each paragraph.
3. After the translation, summarize the key points of the entire article in English. Follow the format below.

# Original Article
Original content

# Summary
Summary content

context: ```
${textDocument.text()}
```
    """.trimIndent()
    )
)

// Output the result from the LLM.
println(aiMessage.content().text())
exitProcess(0)
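
  • One optional refinement: very long pages can yield more extracted text than the model's context window comfortably holds. A minimal guard, assuming a purely illustrative limit of 12,000 characters (not a LangChain4j constant), is to truncate the extracted text and interpolate articleText into the system message instead of the full document:
// Optional guard against overly long pages: keep only the first 12,000 characters
// of the extracted text. The limit is an illustrative assumption, not a library value.
val articleText: String = textDocument.text().take(12_000)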

Real World Translation Example

  • Now, let's use the script to translate an actual Japanese newspaper article into English.
# Make the script executable, then run the Japanese newspaper article translation and summarization
$ chmod +x url2text.main.kts
$ ./url2text.main.kts https://news.yahoo.co.jp/articles/f2278281e65df486a61d105184fcd4e977c49601 > article.md

# Output the execution result
$ glow article.md

# Dentsu Research Institute Begins Application of Latest LLM Model "GPT-4o" in Enterprise ChatGPT Solution "Know Narrator"

On June 5th, Dentsu Research Institute announced the application and launch of the latest LLM model "GPT-4o" in their enterprise ChatGPT solution "Know Narrator," developed using Microsoft's Azure OpenAI Service. This new model allows for approximately twice the speed and higher precision in AI generation compared to previous models.

Know Narrator is a solution that builds and promotes the use of ChatGPT environments within enterprises. It has previously incorporated the latest LLM models and developed unique features such as "Know Narrator Insight," which analyzes user chat history to propose more efficient usage methods, "Know Narrator Search," which allows ChatGPT to reference internal documents to generate responses, and "Know Narrator API," which enables integration with other systems via API.

With the application of GPT-4o, the response time from question to answer has been reduced to less than half of the previous model. Additionally, the accuracy of image recognition has improved, making it possible to read handwritten Japanese characters, which was difficult with previous models.

Furthermore, GPT-4o has enhanced multilingual data support, improving the reading and response performance for Japanese documents in Know Narrator Search. Dentsu Research Institute plans to continue expanding the functions of Know Narrator and promoting the application of generative AI in practical business and its dissemination in society.

# Article Summary
Dentsu Research Institute has launched the latest LLM model "GPT-4o" in their enterprise ChatGPT solution "Know Narrator," developed using Microsoft's Azure OpenAI Service. This new model offers twice the speed and higher precision, improved image recognition, and enhanced multilingual support, particularly for Japanese documents.