Summary
As part of an AI hackathon, I created a simple tool that let me query, in natural language, the public documentation of the company I work for. I mainly used AWS Bedrock, with a Knowledge Base and a single Agent. The results are promising, given that the data was not heavily ‘massaged’ due to time constraints.
Data wrangling
To download all public documentation of Powens, I used wget.
wget -mpEk "https://docs.powens.com/documentation"
I got a mixture of HTML and Markdown files:
find ./ -type f -name "*html" | wc
118 118 6437
find ./ -type f -name "*md" | wc
102 102 5721
I converted the HTML files to text using w3m:
find ./ -type f -name "*html" | xargs -I@ bash -c "w3m -dump -cols 100 @ > @.md"
Then I deleted the HTML files:
find ./ -type f -name "*html" | xargs -I@ bash -c "rm @"
Some Markdown files still contained HTML, so I cleaned them:
less documentation.md
<table data-column-title-hidden data-view="cards"><thead><tr><th></th><th></th><th data-hidden></th><th data-hidden data-card-cover data-type="files"></th><th data-hidden data-card-target data-type="content-ref"></th></tr></thead><tbody><tr><td><mark style="color:blue;"><strong>Quick Start</strong></mark></td><td>Set up your environment and start testing</td><td></td><td></td><td><a href="integration-guides/quick-start">quick-start</a></td></tr><tr><td><mark style="color:blue;"><strong>API Overview</strong></mark></td><td>Understand the big picture</td><td></td><td></td><td><a href="integration-guides/quick-start/api-overview">api-overview</a></td></tr><tr><td><mark style="color:blue;"><strong>Glossary</strong></mark></td><td>Learn the most important concepts</td><td></td><td></td><td><a href="ressources/glossary">glossary</a></td></tr></tbody></table>
sed -E -i 's/<[a-zA-Z0-9/=" .:#;~?%&()-]*>/ /g' ./**/*(.)
I still had a bunch of emojis to get rid of, but sadly learned that sed cannot easily delete emojis. 🙁
So I considered my cleaning “good enough” and moved on.
I put all those files in an S3 bucket.
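The upload can be done from the command line. This is a minimal sketch, assuming the AWS CLI is installed and configured with credentials; the function name `upload_docs` and the bucket name in the usage comment are placeholders, not the ones I used.

```shell
# Upload only the Markdown files of a local directory tree to S3.
# $1: local directory
# $2: destination, e.g. s3://my-docs-bucket/powens (hypothetical bucket name)
# The filters are evaluated in order: exclude everything, then re-include *.md.
upload_docs() {
  aws s3 sync "$1" "$2" --exclude "*" --include "*.md"
}

# Example:
# upload_docs ./docs.powens.com s3://my-docs-bucket/powens
```

Using `sync` rather than a plain copy means a second run only transfers files that changed, which helps when the cleaning step is repeated.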
Knowledge base
Then I went to AWS Bedrock. I created a Knowledge Base and supplied the contents of the S3 bucket.
I chose AWS Titan Text Embeddings V2 to create the embeddings and stored them in AWS's OpenSearch service (the successor to its Elasticsearch offering).
I used the defaults for everything.
AWS let me test the Knowledge Base directly. As the foundation model I chose AWS Nova Lite.
Me: Hello, how can I onboard to powens?
KB: User, to onboard to Powens, you can follow these steps:
1. Create your account by signing up in the administration console
2. Register an organization or join one by filling in some information about your company
3. Register your domain and a client application
4. Register a webhook if needed
5. Integrate the API by following the integration guides
Good response.
Me: What is the API endpoint to create users?
KB: The model cannot find sufficient information to answer the question. However, based on the search results, it is possible to create a user by adding a new connection. The API endpoint for adding a new connection is `https://{domain}.biapi.pro/2.0/connections`
This endpoint requires header authentication with a *users token* and a *user token*
Once the connection is added, a `USER_CREATED` webhook is emitted after a permanent user is created
Wrong answer.
Being able to feed data to an existing model without retraining it is valuable. Still, the answers need to be double-checked against the source.
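One cheap way to double-check an answer is to search the downloaded files for the string you expect. This is a minimal sketch using grep; the helper name `search_docs` is made up for illustration, and the directory in the usage comment assumes wget's default of naming the mirror directory after the host.

```shell
# List every line under a directory that contains a given literal string.
# $1: string to look for
# $2: directory to search (defaults to the current directory)
# -r recurses, -n prints line numbers, -F treats the pattern as plain text, not a regex.
search_docs() {
  grep -rnF -- "$1" "${2:-.}"
}

# Example: does the user-creation endpoint appear anywhere in the mirror?
# search_docs "/2.0/auth/init" ./docs.powens.com
```

If the string shows up in the files but not in the model's answer, the problem is on the retrieval side rather than in the data.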
The documentation does describe how to create a user, yet the model did not find it:
### Create a new user and generate an associated access token
`POST` `https://{domain}.biapi.pro/2.0/auth/init`
Agent
Afterwards I defined an agent. As the foundation model I initially chose Claude 3 Sonnet and later switched to AWS Nova Lite.
I gave the agent the following prompt:
You work for Powens, you provide support and answer questions. You rely mainly on your knowledge base, and don't deviate from it. You provide concise answers. You recognize to the user if you don't have enough information to answer his request.
Then I tested:
ME: How can I create a connection in the powens api?
Agent: To create a connection in the Powens API, you need to let your users add a first connection. The simplest way is to use the Powens Connect webview, a set of web-based endpoints that will allow the user to choose their bank or provider, gather credentials for later sync, and manage consent to share their accounts or documents with your service.**[1]** The key steps to add a new connection are:
1. Create a user in the Powens API. All data is organized and scoped by users, and you will need user-scoped access tokens to interact with the API.
2. Initiate the connection process using the Connect webview, which will guide the user through authenticating with their bank/provider and granting consent.
3. Once the connection is active, Powens will synchronize the user's data and make it available through the API.
Ok, reasonable.
ME: Can you give me the API endpoint to create the connection and the payload?
Agent: To create a new connection in the Powens API, you need to send a POST request to the following endpoint: https://{domain}.biapi.pro/2.0/users/{userId}/connections The request body should contain a ConnectionRequest object with the required parameters. According to the search results, the ConnectionRequest object is documented under the "API endpoints" section.
ME: What does this 'ConnectionRequest' oject contains?
Agent: The search results do not provide details on the specific fields included in the ConnectionRequest object for creating a new connection in the Powens API. The results explain what a connection is and that creating a connection requires the end user to authenticate with the connector, but do not list the fields of the ConnectionRequest object.**[1]**
ME: Don't you have an openapi schema with the payload?
Agent: The search results do not provide the full OpenAPI schema for the ConnectionRequest object used to create a new connection. However, some information about the parameters required for creating a new connection can be found:
- id_connector (integer) - ID of the connector
- connector_uuid (string) - optional UUID of the connector (replaces id_connector)
- source (string) - optional comma-separated list of sources to use for the connection synchronization**[1]**
The search results mention that to create a new connection, "You have to give all needed parameters (use /banks/ID/fields or /providers/ID/fields to get them)." This suggests that additional parameters specific to the bank or provider may be required when creating a new connection, but the full schema is not provided in these search results.**[2]**
Here the agent provided good information, but I needed to twist its arm a bit to get the knowledge I wanted.
Conclusion
LLMs are quite promising for workloads that involve a Knowledge Base and questions on top of it. The agent sometimes deviated or did not find the information I wanted, but this was a quick exercise, and I did not invest much time in tagging the information to make it easier for the LLM to understand.
