ENHANCING E-COMMERCE DATA ENRICHMENT: A MULTIMODAL APPROACH WITH LARGE LANGUAGE MODELS & RULEBOOK

Saravanan Radhakrishnan; Jerubbaal Luke; Rahul Agarwal; Gargi Lahiri; Karthick Alagappan

doi:10.5455/JCSI.20250217052842

2025, Vol: 2, Issue: 4


Original Article
Online Published: 29 Apr 2025

J Comp Sci Informatics. 2025; 2(3): 148-162


Abstract
Aim/Background: Product attribute value extraction (PAVE) systems have emerged as a powerful tool for automating the extraction and organization of product attributes from diverse data sources. Large Language Models (LLMs) have repeatedly demonstrated significant potential for extracting relevant information and exhibit strong reasoning ability. In this article, we propose a rulebook that helps LLMs extract the correct information while complying with predefined guidelines. We call this technique rulebook-based prompting, and it significantly outperforms zero-shot prompting. The approach has several advantages. First, LLMs do not need to be fine-tuned for every new product; instead, the rulebook can be updated to include new information and guidelines. Second, it reduces manual effort, since around 60 attributes governed by complex rules need to be verified.
Methods: The rulebook is first converted into a vectorized representation using embeddings, which enables efficient semantic search. When an input containing product images and descriptions is submitted, the LLM first identifies the product type. The list of attributes for that product type is then retrieved from the vectorized rulebook. A prompt is generated from the list of attributes and other instructions and passed to the LLM, which finally extracts all the required attribute information in a specified format.
Results: Our experiments demonstrated equivalent performance between Azure OpenAI’s GPT-4o and Gemini 1.5 Flash, which we attribute to their multimodal ability; both outperformed Azure OpenAI’s GPT-3.5 and regex pattern matching. We also show that the rulebook-based prompt design improves model performance, with each LLM achieving an F1-score 10% higher than under zero-shot prompting. Additionally, we report how the LLMs perform with both prompt designs under various conditions.
Conclusion: Using a rulebook to help LLMs create dynamic prompts ensures that all relevant attributes for a specific product are consistently identified and documented, thereby improving the overall quality and reliability of the extraction system. We also suggest using Gemini 1.5 Flash in high-traffic commercial applications where cost is a key factor.
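
The flow described under Methods (vectorize the rulebook, look up the attributes for an identified product type, build a prompt) can be sketched roughly as follows. This is a minimal illustration and not the authors' implementation: the rulebook entries, the attribute names, and the functions `embed`, `lookup_attributes`, and `build_prompt` are all hypothetical, a toy bag-of-words vector stands in for a real embedding model and vector database, and the LLM calls that identify the product type and perform the extraction are left out.

```python
import math
import re
from collections import Counter

# Hypothetical rulebook: product type -> attribute rules (not taken from the article).
RULEBOOK = {
    "t-shirt": {"sleeve_length": "one of: short, long, sleeveless",
                "neck_type": "one of: crew, v-neck, polo"},
    "running shoe": {"closure": "one of: lace-up, slip-on",
                     "sole_material": "free text, single material"},
}

def embed(text):
    """Toy bag-of-words vector; a real system would call an embedding model."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))

def cosine(a, b):
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(a[t] * b[t] for t in a)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

# "Vectorize" the rulebook once so later lookups are similarity searches.
INDEX = [(ptype, embed(ptype + " " + " ".join(rules)))
         for ptype, rules in RULEBOOK.items()]

def lookup_attributes(product_type):
    """Return the closest rulebook entry for a product type named by the LLM."""
    query = embed(product_type)
    best_type, _ = max(INDEX, key=lambda item: cosine(query, item[1]))
    return best_type, RULEBOOK[best_type]

def build_prompt(product_type, description):
    """Assemble a dynamic prompt from the retrieved attribute rules."""
    matched, rules = lookup_attributes(product_type)
    lines = [f"Product type: {matched}",
             f"Description: {description}",
             "Extract the following attributes and answer as JSON:"]
    lines += [f"- {name}: {rule}" for name, rule in rules.items()]
    return "\n".join(lines)
```

Calling `build_prompt("tee shirt", "Blue cotton tee with a V neck")` yields a prompt listing only the t-shirt rules. Adding a new product type then means adding a rulebook entry and re-indexing, with no change to the model itself.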

Key words: Generative AI, Large Language Model (LLM), Vector Database, Product attribute extraction, E-commerce
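
For readers unfamiliar with the F1-score mentioned in the Results, the sketch below shows one common way to compute it for attribute extraction. It is an assumption that predictions are scored as exact-match (attribute, value) pairs; the abstract does not state how the authors match or average their scores, and the sample data is invented for illustration.

```python
def f1_score(predicted, gold):
    """F1 over exact-match (attribute, value) pairs; an assumed scoring scheme."""
    pred_pairs = set(predicted)
    gold_pairs = set(gold)
    true_positives = len(pred_pairs & gold_pairs)
    if true_positives == 0:
        return 0.0
    precision = true_positives / len(pred_pairs)  # share of predictions that are correct
    recall = true_positives / len(gold_pairs)     # share of gold pairs that were found
    return 2 * precision * recall / (precision + recall)

# Invented example: two of three predicted pairs match the gold annotation.
gold = {("color", "blue"), ("neck_type", "v-neck"), ("sleeve_length", "short")}
pred = {("color", "blue"), ("neck_type", "crew"), ("sleeve_length", "short")}
score = f1_score(pred, gold)
```

Here precision and recall are both 2/3, so the F1-score is also 2/3.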


How to Cite this Article

Pubmed Style

Radhakrishnan S, Luke J, Agarwal R, Lahiri G, Alagappan K. ENHANCING E-COMMERCE DATA ENRICHMENT: A MULTIMODAL APPROACH WITH LARGE LANGUAGE MODELS & RULEBOOK. J Comp Sci Informatics. 2025; 2(3): 148-162. doi:10.5455/JCSI.20250217052842

Web Style

Radhakrishnan S, Luke J, Agarwal R, Lahiri G, Alagappan K. ENHANCING E-COMMERCE DATA ENRICHMENT: A MULTIMODAL APPROACH WITH LARGE LANGUAGE MODELS & RULEBOOK. https://www.wisdomgale.com/jcsi/?mno=243170 [Access: May 02, 2025]. doi:10.5455/JCSI.20250217052842

Turabian Style

Radhakrishnan, Saravanan, Jerubbaal Luke, Rahul Agarwal, Gargi Lahiri, and Karthick Alagappan. 2025. ENHANCING E-COMMERCE DATA ENRICHMENT: A MULTIMODAL APPROACH WITH LARGE LANGUAGE MODELS & RULEBOOK. Journal of Computer Sciences and Informatics, 2 (3), 148-162. doi:10.5455/JCSI.20250217052842

