Introduction
- We are living in the era of the LLM (Large Language Model). Numerous tools and methods that leverage LLMs to enhance productivity are being released. In this post, we will use Kotlin and LangChain4j to create a script, run from a Linux shell, that filters advertisements out of internet articles, translates the articles into English, and summarizes them.
Script Features
- Accepts an internet URL as an argument.
- Removes advertisements and other unnecessary content from the page at that URL, translates the remaining content into English, and outputs it in Markdown format.
- Finally, outputs a summary of the article.
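The first feature takes a URL from the command line, so it may be worth rejecting obviously invalid input before any network call is made. The following is a minimal sketch of such a check. The helper name isHttpUrl is hypothetical and is not part of the script in this post; it only assumes that the tool should accept absolute http and https addresses.

```kotlin
import java.net.URI
import java.net.URISyntaxException

// Hypothetical helper (not in the original script): returns true only for
// absolute http/https URLs that include a host name.
fun isHttpUrl(candidate: String): Boolean =
    try {
        val uri = URI(candidate.trim())
        val scheme = uri.scheme?.lowercase()
        (scheme == "http" || scheme == "https") && !uri.host.isNullOrBlank()
    } catch (e: URISyntaxException) {
        // Malformed input such as text containing spaces ends up here.
        false
    }
```

In a script, a check like this could run right after the argument is read, printing the usage text and exiting when it returns false.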
Prerequisites
Installing Kotlin
- Install Kotlin using SDKMAN.
$ curl -s "https://get.sdkman.io" | bash
$ source "$HOME/.sdkman/bin/sdkman-init.sh"
$ sdk install kotlin
- Install glow, a terminal Markdown renderer, to view the script's output.
$ brew install glow
Creating the URL to Text Shell Script
- Let's create a Kotlin script that can be executed in the Linux Shell as follows.
$ nano url2text.main.kts
#!/usr/bin/env kotlin
@file:DependsOn("dev.langchain4j:langchain4j:0.31.0", "dev.langchain4j:langchain4j-open-ai:0.31.0")
import dev.langchain4j.data.document.Document
import dev.langchain4j.data.document.loader.UrlDocumentLoader
import dev.langchain4j.data.document.parser.TextDocumentParser
import dev.langchain4j.data.document.transformer.HtmlTextExtractor
import dev.langchain4j.data.message.AiMessage
import dev.langchain4j.data.message.SystemMessage
import dev.langchain4j.model.openai.OpenAiChatModel
import dev.langchain4j.model.output.Response
import java.time.Duration
import kotlin.system.exitProcess
val params = args
if (params.isEmpty()) {
    println("""
        Internet URL to Text Tool 0.1 by Tae-hyeong Lee
        Usage: url2text.main.kts {url}
        Description:
        The 'url2text.main.kts' command fetches the content from a given internet URL, translates it into English, and provides a summary of the translated content.
        Example:
        url2text.main.kts https://github.com/langchain-ai/langchain
        Note:
        Ensure the URL is a valid internet address for accurate translation and summarization.
    """.trimIndent())
    exitProcess(1)
}
val url = params[0]
val htmlDocument: Document = UrlDocumentLoader.load(url, TextDocumentParser())
val textDocument: Document = HtmlTextExtractor().transform(htmlDocument)
val chatModel: OpenAiChatModel = OpenAiChatModel.builder()
    .apiKey("{your-openai-api-key}")
    .timeout(Duration.ofSeconds(120))
    .modelName("gpt-4o-2024-05-13")
    .temperature(0.3)
    .topP(0.3)
    .build()
val aiMessage: Response<AiMessage> = chatModel.generate(
    SystemMessage(
        // The prompt lines stay flush left because trimIndent() runs after the page text is interpolated.
        """
You are an assistant that reads internet articles and translates them into English. Please refer to the context below and follow these guidelines:
1. Remove all promotional information, headers, footers, and menu information, and extract only the actual content of the article for accurate translation.
2. Convert the translation result into Markdown format, appropriately judging the title and each paragraph.
3. After the translation, summarize the key points of the entire article in English. Follow the format below.
# Original Article
Original content
# Summary
Summary content
context: \\\
${textDocument.text()}
\\\
""".trimIndent()
        // text() passes only the extracted page text, not the Document's toString() wrapper and metadata.
    )
)
println(aiMessage.content().text())
exitProcess(0)
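The script above hardcodes the API key as a placeholder string. One common alternative is to read the key from an environment variable, so that the script file can be shared without the secret in it. The sketch below illustrates the idea. The helper resolveApiKey and the variable name OPENAI_API_KEY are assumptions made for this example; the original script does not read any environment variable.

```kotlin
// Hypothetical helper (not in the original script): looks up the API key in an
// environment map. OPENAI_API_KEY is an assumed variable name.
// Returns null when the variable is missing or blank.
fun resolveApiKey(env: Map<String, String> = System.getenv()): String? =
    env["OPENAI_API_KEY"]?.trim()?.takeIf { it.isNotEmpty() }
```

With a helper like this, the builder call could become .apiKey(resolveApiKey() ?: error("Set OPENAI_API_KEY first")), which stops the script with a clear message instead of sending a request with an invalid key.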
Real World Translation Example
- Now, let's use the script to translate an actual Japanese news article into English. The script must be executable before it can be run directly.
$ chmod +x url2text.main.kts
$ ./url2text.main.kts https://news.yahoo.co.jp/articles/f2278281e65df486a61d105184fcd4e977c49601 > article.md
$ glow article.md
On June 5th, Dentsu Research Institute announced the application and launch of the latest LLM model "GPT-4o" in their enterprise ChatGPT solution "Know Narrator," developed using Microsoft's Azure OpenAI Service. This new model allows for approximately twice the speed and higher precision in AI generation compared to previous models.
Know Narrator is a solution that builds and promotes the use of ChatGPT environments within enterprises. It has previously incorporated the latest LLM models and developed unique features such as "Know Narrator Insight," which analyzes user chat history to propose more efficient usage methods, "Know Narrator Search," which allows ChatGPT to reference internal documents to generate responses, and "Know Narrator API," which enables integration with other systems via API.
With the application of GPT-4o, the response time from question to answer has been reduced to less than half of the previous model. Additionally, the accuracy of image recognition has improved, making it possible to read handwritten Japanese characters, which was difficult with previous models.
Furthermore, GPT-4o has enhanced multilingual data support, improving the reading and response performance for Japanese documents in Know Narrator Search. Dentsu Research Institute plans to continue expanding the functions of Know Narrator and promoting the application of generative AI in practical business and its dissemination in society.
# Article Summary
Dentsu Research Institute has launched the latest LLM model "GPT-4o" in their enterprise ChatGPT solution "Know Narrator," developed using Microsoft's Azure OpenAI Service. This new model offers twice the speed and higher precision, improved image recognition, and enhanced multilingual support, particularly for Japanese documents.
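Because the prompt asks for a fixed layout that ends with a summary heading, a later step could pull out only the summary from the generated Markdown. The following is a rough sketch under that assumption. The helper extractSummary is hypothetical, and it matches any level-1 heading containing the word "summary", since the model may not reproduce the requested heading exactly (the sample above shows "# Article Summary" rather than "# Summary").

```kotlin
// Hypothetical helper (not in the original script): returns the text under the
// first level-1 heading whose title contains "summary" (case-insensitive),
// up to the next level-1 heading, or null when no such heading exists.
fun extractSummary(markdown: String): String? {
    val lines = markdown.lines()
    val start = lines.indexOfFirst {
        it.startsWith("# ") && it.contains("summary", ignoreCase = true)
    }
    if (start < 0) return null
    val body = lines.drop(start + 1).takeWhile { !it.startsWith("# ") }
    return body.joinToString("\n").trim()
}
```

For example, it could be applied to the contents of article.md to print just the summary paragraph.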