Large Language Models (LLMs) have transformed the field of natural language processing (NLP). Because of their extraordinary capacity to write human-like text and perform a range of language-related tasks, these models, which are based on deep learning techniques, have earned considerable interest and acceptance. This field has undergone significant scientific developments in recent years. Researchers all over the world have been developing better and more domain-specific LLMs to meet the needs of various use cases. Related tools such as Jsonformer likewise aim to offer a bulletproof way to generate structured JSON from language models.
Note: As of now, the Guidance Acceleration feature is implemented only for open LLMs. We can soon expect to see it working with closed LLMs as well.
2. Token Healing - This feature attempts to correct tokenization artifacts that commonly occur at the border between the end of a prompt and the start of a group of generated tokens.
For example, if we ask an LLM to auto-complete a URL from the input shown below, it is likely to produce the shown output. Apart from the obvious limitation that the URL might not be valid, note the extra space it creates after `http:`. Such artifacts make it difficult to construct a dependable parsing function and robustly absorb the result into subsequent phases.
Input:
“The link is <a href=http:”

Actual Output:

“The link is <a href=http: /www.google.com/search?q”
Expected Output:
“The link is <a href=http://www.google.com/search?q”

Refer to the token_healing.ipynb jupyter notebook at github.com/guidance-ai/guidance/blob/main/notebooks/token_healing.ipynb for more examples.
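The intuition behind token healing can be sketched with a toy BPE-style vocabulary. The vocabulary and helper function below are purely illustrative, not Guidance's actual implementation: the idea is that the generator rewinds the last prompt token and only allows first generated tokens whose text extends it, so a prompt ending in `http:` can continue as `http://` instead of starting a fresh token with a spurious space.

```python
# Toy illustration of the token-healing idea (hypothetical vocabulary,
# NOT Guidance's API). A BPE-style tokenizer may contain both "http:"
# and "http://" as tokens. If the prompt ends with "http:", naive
# generation starts a brand-new token and often emits a space first.
# Token healing rewinds the trailing prompt token and constrains the
# first generated token to ones that extend it.

VOCAB = ["http:", "http://", " /", "www", ".google", ".com", " the", "link"]

def heal_candidates(prompt_tail: str, vocab):
    """Return vocabulary tokens that strictly extend the trailing prompt token."""
    return [t for t in vocab if t.startswith(prompt_tail) and t != prompt_tail]

# The prompt ends with the token "http:"; healing keeps only tokens
# sharing that prefix, so generation continues with "http://".
print(heal_candidates("http:", VOCAB))  # -> ['http://']
```

With this constraint in place, the decoder can never produce the stray space from the earlier example, because `" /"` does not extend `"http:"`.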
3. Guaranteed Output Structure - Large language models are fantastic at producing useful outputs, but not so much at producing outputs in a specified format (especially open-source ones like LLaMA, GPT-2, and so on). This is frequently an issue when we want to use the output of a language model as input to another system. With its Handlebars-style templating, Guidance guarantees that the output format will be exactly what was asked for.
Let’s now see Guidance in action -
Installation
Installing Guidance is a breeze; just run a pip install:
$ pip install guidance
Assume we are now creating a product description for an e-commerce website. Here is how traditional generation compares with Guidance-driven generation. Feel free to experiment with both of the examples below in a notebook.
Traditional Generation
Input:
Consider you are an e-commerce expert. You need to write a product description for a product to be listed on the e-commerce website. The product description consists of fixed set of fields to be filled in the JSON. The below shown JSON is the structure of the description with constraints for each of the attributes enclosed within < > brackets. Please follow the mentioned constraint and format diligently.
{ prod_id: <numeric value of 5 digits>, prod_name: <name starts with the prefix 'p_'>,
prod_price: <should be an integer between 1 and 16. Should end with
suffix '$'> }
The product description is
Output:
Consider you are an e-commerce expert. You need to write a product description for a product to be listed on the e-commerce website. The product description consists of fixed set of fields to be filled in the JSON. The below shown JSON is the structure of the description with constraints for each of the attributes enclosed within < > brackets. Please follow the mentioned constraint and format diligently. { prod_id: <numeric value of 5 digits>, prod_name: <name starts with the prefix 'p_'>, prod_price: <should be an integer between 1 and
16. Should end with suffix '$'> } The product description is { resentprod_id:
<numeric value of 5 digits>, resentprod_name: <name begins with the prefix 'p_'>, resentprod_price: <should be an integer between 1 and 16. Should end with suffix '$'> } In the above example, the product description has 5 constraint fields and 5 attribute fields. The constraints are as follows: resentprod_id: - value of 5 digits, resentprod_name: - name of the product, resentprod_price: - price of the product, resentprod_price_suffix: - suffix of the product price, resentprod_id: - the product id, resentpro diabetic_id: value of 4 digits, resentprod_ astronomer_id: - value of 4 digits, resentprod_ star_id: - value of 4 digits, resentprod_is_generic: - if the product is generic and not the generic type, resentprod_type: - the type of the product, resentprod_is_generic_type
Here’s the code for the above example with the GPT-2 language model:
```
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2-large")
model = AutoModelForCausalLM.from_pretrained("gpt2-large")

# `Input` is the prompt string shown above
inputs = tokenizer(Input, return_tensors="pt")
tokens = model.generate(
    **inputs,
    max_new_tokens=256,
    temperature=0.7,
    do_sample=True,
)

# decode and print the generated output
print(tokenizer.decode(tokens[0], skip_special_tokens=True))
```
Guidance Generation
Input w/ code:
import guidance

guidance.llm = guidance.llms.Transformers("gpt2-large")
# define the prompt (a raw string, so regex escapes such as \b reach guidance intact)
program = guidance(r"""Consider you are an e-commerce expert. You need to write a product description for a product to be listed on the e-commerce website. The product description consists of a fixed set of fields to be filled in the JSON.
The following is the format
```json {
"prod_id": "{{gen 'id' pattern='[0-9]{5}' stop=','}}",
"prod_name": "{{gen 'name' pattern='p_[A-Za-z]+' stop=','}}",
"prod_price": "{{gen 'price' pattern='\b([1-9]|1[0-6])\b\$' stop=','}}"
}```""")
# execute the prompt
Output = program()
Output:
Consider you are an e-commerce expert. You need to write a product description for a product to be listed on the e-commerce website. The product description consists of a fixed set of fields to be filled in the JSON. The following is the format
```json {
"prod_id": "11231",
"prod_name": "p_pizzas",
"prod_price": "11$"
}```
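Because the structure is guaranteed, downstream code can parse the result without defensive retries. Here is a minimal sketch of such a consumer; the `raw` string stands in for the program's output shown above, and the variable names are illustrative assumptions:

```python
import json
import re

# Stand-in for the JSON text produced by the guidance program above.
raw = '{ "prod_id": "11231", "prod_name": "p_pizzas", "prod_price": "11$" }'

# Parse directly -- no retry loop or fallback parser needed.
product = json.loads(raw)

# Validate each field against the constraints encoded in the prompt's patterns.
assert re.fullmatch(r"[0-9]{5}", product["prod_id"])
assert re.fullmatch(r"p_[A-Za-z]+", product["prod_name"])
assert re.fullmatch(r"([1-9]|1[0-6])\$", product["prod_price"])

print(product["prod_name"])  # -> p_pizzas
```

Contrast this with the traditional generation above, where the same parsing code would fail unpredictably because the model is free to drift from the requested format.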
As seen in the preceding examples, with Guidance we can be certain that the output format will be followed within the given constraints, no matter how many times we execute the identical prompt. This capability makes it an excellent choice for constructing dependable, robust multi-step LLM pipelines.
I hope this overview of Guidance has helped you realize the value it may provide to your daily prompt development cycle. Also, here’s a LinkedIn