End-to-End

Generate Malay (MS) text from English (EN) knowledge graph triples.

This tutorial is available as an IPython notebook at Malaya-Graph/example/kg-to-text.

This module was trained only on standard language structure, so it is not safe to use on local language structure.

The input must be in English knowledge graph triple format.

[1]:
# !pip3 install -U git+https://github.com/huseinzol05/malaya@5.0 --no-deps
[2]:
import os

os.environ['CUDA_VISIBLE_DEVICES'] = ''
[3]:
import logging

logging.basicConfig(level=logging.INFO)
[4]:
%%time

import malaya_graph
/home/husein/.local/lib/python3.8/site-packages/malaya/tokenizer.py:208: FutureWarning: Possible nested set at position 3372
  self.tok = re.compile(r'({})'.format('|'.join(pipeline)))
/home/husein/.local/lib/python3.8/site-packages/malaya/tokenizer.py:208: FutureWarning: Possible nested set at position 3890
  self.tok = re.compile(r'({})'.format('|'.join(pipeline)))
CPU times: user 3.16 s, sys: 3.53 s, total: 6.69 s
Wall time: 2.17 s

List available HuggingFace model

[5]:
malaya_graph.kg_to_text.available_huggingface()
INFO:malaya_graph.kg_to_text:tested on test set 02 part translated KELM, https://huggingface.co/datasets/mesolitica/translated-KELM
[5]:
Model                                                    Size (MB)  BLEU       SacreBLEU Verbose                                  Suggested length
mesolitica/finetune-ttkg-t5-tiny-standard-bahasa-cased   139        61.067843  86.1/68.4/55.8/45.9 (BP = 0.980 ratio = 0.980 ...  256
mesolitica/finetune-ttkg-t5-small-standard-bahasa-cased  242        61.559203  86.0/68.4/56.1/46.3 (BP = 0.984 ratio = 0.984 ...  256
mesolitica/finetune-ttkg-t5-base-standard-bahasa-cased   892        58.764876  84.5/65.8/53.0/43.1 (BP = 0.984 ratio = 0.985 ...  256
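
In the table above, the largest model does not have the highest BLEU, so size alone is a poor guide. The following is a minimal sketch of picking a model name under a size budget. The `MODELS` dictionary simply copies the Size (MB) and BLEU columns from the table, and `pick_model` is a hypothetical helper for illustration, not part of `malaya_graph`.

```python
from typing import Optional

# Values copied from the table above (Size (MB) and BLEU columns only).
MODELS = {
    'mesolitica/finetune-ttkg-t5-tiny-standard-bahasa-cased': {'size_mb': 139, 'bleu': 61.067843},
    'mesolitica/finetune-ttkg-t5-small-standard-bahasa-cased': {'size_mb': 242, 'bleu': 61.559203},
    'mesolitica/finetune-ttkg-t5-base-standard-bahasa-cased': {'size_mb': 892, 'bleu': 58.764876},
}


def pick_model(max_size_mb: Optional[int] = None) -> str:
    """Return the model name with the highest BLEU within the size budget."""
    candidates = {
        name: info
        for name, info in MODELS.items()
        if max_size_mb is None or info['size_mb'] <= max_size_mb
    }
    if not candidates:
        raise ValueError(f'no model fits within {max_size_mb} MB')
    return max(candidates, key=lambda name: candidates[name]['bleu'])


best_overall = pick_model()
best_under_200mb = pick_model(max_size_mb=200)
```

The returned name can then be passed as the `model` argument of `malaya_graph.kg_to_text.huggingface`.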

Load HuggingFace model

def huggingface(model: str = 'mesolitica/finetune-ttkg-t5-small-standard-bahasa-cased', **kwargs):
    """
    Load a HuggingFace model for knowledge graph to text generation.

    Parameters
    ----------
    model: str, optional (default='mesolitica/finetune-ttkg-t5-small-standard-bahasa-cased')
        Check available models at `malaya_graph.kg_to_text.available_huggingface()`.

    Returns
    -------
    result: malaya_graph.model.text_to_kg.KGtoText
    """
[6]:
ttkg = malaya_graph.text_to_kg.e2e.huggingface()
[7]:
model = malaya_graph.kg_to_text.huggingface()
[11]:
string1 = "Yang Berhormat Dato Sri Haji Mohammad Najib bin Tun Haji Abdul Razak ialah ahli politik Malaysia dan merupakan bekas Perdana Menteri Malaysia ke-6 yang mana beliau menjawat jawatan dari 3 April 2009 hingga 9 Mei 2018. Beliau juga pernah berkhidmat sebagai bekas Menteri Kewangan dan merupakan Ahli Parlimen Pekan Pahang"
string2 = "Pahang ialah negeri yang ketiga terbesar di Malaysia Terletak di lembangan Sungai Pahang yang amat luas negeri Pahang bersempadan dengan Kelantan di utara Perak Selangor serta Negeri Sembilan di barat Johor di selatan dan Terengganu dan Laut China Selatan di timur."
[17]:
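# Sample triples from both strings; top_k=0 disables top-k filtering,
# so tokens are drawn from the full distribution at temperature 0.7.
r = ttkg.generate(
    [string1, string2],
    do_sample=True,
    max_length=256,
    top_k=0,
    temperature=0.7,
)
r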
[17]:
[{'G': <networkx.classes.multidigraph.MultiDiGraph at 0x7fed153ffd00>,
  'triple': [{'head': 'Haji Mohammad Najib',
    'type': 'position held',
    'tail': 'Prime Minister of Malaysia'},
   {'head': 'Pahang', 'type': 'country', 'tail': 'Malaysia'}],
  'rebel': '<triplet> Haji Mohammad Najib <subj> Prime Minister of Malaysia <obj> position held <triplet> Pahang <subj> Malaysia <obj> country'},
 {'G': <networkx.classes.multidigraph.MultiDiGraph at 0x7fed153f8fd0>,
  'triple': [{'head': 'Pahang', 'type': 'country', 'tail': 'Malaysia'},
   {'head': 'Pahang', 'type': 'shares border with', 'tail': 'Perak'},
   {'head': 'Pahang', 'type': 'shares border with', 'tail': 'Johor'},
   {'head': 'Pahang',
    'type': 'located in the administrative territorial entity',
    'tail': 'Malaysia'},
   {'head': 'Pahang',
    'type': 'located in or next to body of water',
    'tail': 'Pahang River'}],
  'rebel': '<triplet> Pahang <subj> Malaysia <obj> country <subj> Perak <obj> shares border with <subj> Johor <obj> shares border with <subj> Malaysia <obj> located in the administrative territorial entity <subj> Pahang River <obj> located in or next to body of water'}]
[18]:
r[0]['triple']
[18]:
[{'head': 'Haji Mohammad Najib',
  'type': 'position held',
  'tail': 'Prime Minister of Malaysia'},
 {'head': 'Pahang', 'type': 'country', 'tail': 'Malaysia'}]
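
The `'rebel'` field in the output above appears to linearize triples as `<triplet> head <subj> tail <obj> type`, with further `<subj> ... <obj> ...` pairs reusing the most recent head. The sketch below parses that string back into `{'head', 'type', 'tail'}` dictionaries. It is inferred only from the two outputs shown here; `parse_rebel` is a hypothetical helper and not the parser used by the library.

```python
import re
from typing import Dict, List, Optional

MARKERS = ('<triplet>', '<subj>', '<obj>')


def parse_rebel(text: str) -> List[Dict[str, str]]:
    """Parse '<triplet> head <subj> tail <obj> type ...' into triple dicts."""
    triples: List[Dict[str, str]] = []
    head: Optional[str] = None
    tail: Optional[str] = None
    marker: Optional[str] = None
    # Split on the markers but keep them, so each text piece knows its role.
    for part in re.split(r'(<triplet>|<subj>|<obj>)', text):
        part = part.strip()
        if part in MARKERS:
            marker = part
        elif not part:
            continue
        elif marker == '<triplet>':
            head, tail = part, None  # a new head starts a new group
        elif marker == '<subj>':
            tail = part
        elif marker == '<obj>' and head is not None and tail is not None:
            triples.append({'head': head, 'type': part, 'tail': tail})
    return triples


rebel = ('<triplet> Haji Mohammad Najib <subj> Prime Minister of Malaysia '
         '<obj> position held <triplet> Pahang <subj> Malaysia <obj> country')
parsed = parse_rebel(rebel)
```

For the first input above, `parsed` matches `r[0]['triple']`.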

Predict

def generate(self, kgs: List[List[Dict]], **kwargs):
    """
    Generate text from a list of knowledge graphs.

    Parameters
    ----------
    kgs: List[List[Dict]]
        list of list of {'head', 'type', 'tail'}
    **kwargs: keyword arguments passed to huggingface `generate` method.
        Read more at https://huggingface.co/docs/transformers/main_classes/text_generation

    Returns
    -------
    result: List[str]
    """
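
Since `generate` takes a list of lists of `{'head', 'type', 'tail'}` dictionaries, the triples do not have to come from the text-to-KG model; they can be written by hand. The following is a minimal sketch that builds such an input and checks its shape first. `validate_kgs` is a hypothetical helper for illustration and is not part of `malaya_graph`.

```python
from typing import Dict, List

REQUIRED_KEYS = {'head', 'type', 'tail'}


def validate_kgs(kgs: List[List[Dict[str, str]]]) -> None:
    """Raise ValueError if any triple lacks 'head', 'type' or 'tail'."""
    for i, kg in enumerate(kgs):
        for j, triple in enumerate(kg):
            missing = REQUIRED_KEYS - set(triple)
            if missing:
                raise ValueError(f'kgs[{i}][{j}] is missing keys: {sorted(missing)}')


# One knowledge graph with two hand-written English triples.
kgs = [
    [
        {'head': 'Pahang', 'type': 'country', 'tail': 'Malaysia'},
        {'head': 'Pahang', 'type': 'shares border with', 'tail': 'Perak'},
    ],
]
validate_kgs(kgs)

# With the model loaded earlier, the call would then be:
# texts = model.generate(kgs, max_length=256)
```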
[23]:
# Sample three candidate texts per knowledge graph (six strings for two graphs).
model.generate(
    [r_['triple'] for r_ in r],
    do_sample=True,
    max_length=100,
    top_k=50,
    top_p=0.95,
    num_return_sequences=3,
)
[23]:
['Beliau merupakan Perdana Menteri Malaysia, Haji Mohammad Najib yang baru dilantik, yang dipilih oleh Majlis Bandaraya Pahang sebagai wakil parti kepada Majlis Bandaraya Pahang dari 2010 hingga 2017.',
 'Haji Mohammad Najib merupakan Perdana Menteri Malaysia ke-12, yang terakhir berkhidmat sebagai Menteri Dalam Negeri Pahang dari 19 Mei 2017 hingga 5 September 2019.',
 'Haji Mohammad Najib merupakan Perdana Menteri Malaysia pada 2016 di bawah pentadbiran Sultan Hussein.',
 'Pahang bersempadan dengan Pahang berhampiran Perak, Perak, Johor, Pahang dan Pahang Sungai Pahang.',
 'Pahang bersempadan dengan Sungai Pahang di selatan dan utara dengan Perak di selatan, Johor di barat laut, dan Perak di barat.',
 'Pahang bersempadan dengan negeri Pahang di barat, Selangor di barat, Perak di timur, dan Johor di barat, Sungai Pahang di barat laut dan Sungai Pahang di barat.']
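
The result is a flat list: with `num_return_sequences=3` and two input graphs, six strings come back, and in the output above the first three belong to the first graph and the last three to the second. Sampled texts can also include details that are not in the triples, such as the dates above, so it helps to review the candidates per input. The sketch below regroups a flat list under the assumption that outputs are ordered input by input; `group_outputs` is a hypothetical helper, not a library function.

```python
from typing import List


def group_outputs(flat: List[str], num_return_sequences: int) -> List[List[str]]:
    """Split a flat list of generations into one sub-list per input graph."""
    n = num_return_sequences
    if n < 1 or len(flat) % n != 0:
        raise ValueError('output length must be a multiple of num_return_sequences')
    return [flat[i:i + n] for i in range(0, len(flat), n)]


# Placeholder strings standing in for the six generated sentences.
flat = ['najib-1', 'najib-2', 'najib-3', 'pahang-1', 'pahang-2', 'pahang-3']
grouped = group_outputs(flat, num_return_sequences=3)
```

Here `grouped[0]` holds the candidates for the first input and `grouped[1]` those for the second.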