{"id":1879,"date":"2025-04-01T17:53:35","date_gmt":"2025-04-01T17:53:35","guid":{"rendered":"https:\/\/rssfeedtelegrambot.bnaya.co.il\/index.php\/2025\/04\/01\/protect-data-privacy-in-amazon-bedrock-with-vault\/"},"modified":"2025-04-01T17:53:35","modified_gmt":"2025-04-01T17:53:35","slug":"protect-data-privacy-in-amazon-bedrock-with-vault","status":"publish","type":"post","link":"https:\/\/rssfeedtelegrambot.bnaya.co.il\/index.php\/2025\/04\/01\/protect-data-privacy-in-amazon-bedrock-with-vault\/","title":{"rendered":"Protect data privacy in Amazon Bedrock with Vault"},"content":{"rendered":"<p>You can use retrieval augmented generation (RAG) to refine and improve the output of a large language model (LLM) without retraining the model. However, many data sources include sensitive information, such as personally identifiable information (PII), that the LLM and its applications should not require or disclose, yet sometimes do. Sensitive information disclosure is one of the risks listed in the OWASP <a href=\"https:\/\/genai.owasp.org\/llm-top-10\/\">2025 Top 10 Risks &amp; Mitigations for LLMs and Gen AI Apps<\/a>. To mitigate this issue, OWASP recommends data sanitization, access control, and encryption.<\/p>\n<p>This post shows how to configure HashiCorp Vault\u2019s <a href=\"https:\/\/developer.hashicorp.com\/vault\/docs\/secrets\/transit\">transit secrets engine<\/a> to encrypt and protect sensitive data before sending it to an <a href=\"https:\/\/aws.amazon.com\/bedrock\/knowledge-bases\/\">Amazon Bedrock Knowledge Base<\/a> created by Terraform. <\/p>\n<p>The sample dataset contains a <a href=\"https:\/\/insideairbnb.com\/get-the-data\/\">list of vacation rentals from Airbnb<\/a>, which includes the names of the hosts. Not all applications require the names of the hosts, so the original application collecting the information encrypts the field using Vault before storing a rental listing in the database. 
As a result, running queries against the Amazon Bedrock Knowledge Base outputs the encrypted name and prevents the leakage of PII.<\/p>\n<h2>Encrypt and upload data<\/h2>\n<p>The demo uses an <a href=\"https:\/\/developer.hashicorp.com\/hcp\/docs\/vault\/what-is-hcp-vault\">HCP Vault<\/a> cluster with the transit secrets engine enabled. The host name for each rental gets encrypted by a key named listings with <a href=\"https:\/\/developer.hashicorp.com\/vault\/docs\/secrets\/transit#convergent-encryption\">convergent encryption<\/a> enabled. Convergent encryption ensures that a plaintext host name results in the same ciphertext. This allows the LLM to analyze each rental listing for similarities between hosts without knowing the actual host name.<\/p>\n<p>resource &#8220;hcp_hvn&#8221; &#8220;rental&#8221; {<br \/>\n hvn_id         = var.name<br \/>\n cloud_provider = &#8220;aws&#8221;<br \/>\n region         = var.region<br \/>\n cidr_block     = var.cidr_block<br \/>\n}<\/p>\n<p>resource &#8220;hcp_vault_cluster&#8221; &#8220;rental&#8221; {<br \/>\n cluster_id      = var.name<br \/>\n hvn_id          = hcp_hvn.rental.hvn_id<br \/>\n tier            = &#8220;plus_small&#8221;<br \/>\n public_endpoint = true<br \/>\n}<\/p>\n<p>resource &#8220;hcp_vault_cluster_admin_token&#8221; &#8220;rental&#8221; {<br \/>\n cluster_id = hcp_vault_cluster.rental.cluster_id<br \/>\n}<\/p>\n<p>resource &#8220;vault_mount&#8221; &#8220;transit_rental&#8221; {<br \/>\n path                      = var.name<br \/>\n type                      = &#8220;transit&#8221;<br \/>\n description               = &#8220;Key ring for rental information&#8221;<br \/>\n default_lease_ttl_seconds = 3600<br \/>\n max_lease_ttl_seconds     = 86400<br \/>\n}<\/p>\n<p>resource &#8220;vault_transit_secret_backend_key&#8221; &#8220;listings&#8221; {<br \/>\n backend               = vault_mount.transit_rental.path<br \/>\n name                  = &#8220;listings&#8221;<br \/>\n derived               = 
true<br \/>\n convergent_encryption = true<br \/>\n deletion_allowed      = true<br \/>\n}<\/p>\n<p>To process a <a href=\"https:\/\/data.insideairbnb.com\/united-states\/ny\/new-york-city\/2025-01-03\/data\/listings.csv.gz\">CSV file<\/a> with a list of vacation rentals in New York City from January 2025, you would encrypt the name of the host as if it is sensitive data. Create a local script that uses the <a href=\"https:\/\/github.com\/hvac\/hvac\">HVAC Python client for HashiCorp Vault<\/a> to access the encryption API endpoint in Vault.<\/p>\n<p>import base64<br \/>\nimport json<br \/>\nimport logging<br \/>\nimport os<\/p>\n<p>import boto3<br \/>\nimport hvac<br \/>\nimport pandas<br \/>\nfrom botocore.exceptions import ClientError<br \/>\nfrom langchain_community.document_loaders import CSVLoader<\/p>\n<p>from vardata import S3_BUCKET_NAME<\/p>\n<p>LISTINGS_FILE = &#8220;.\/data\/raw\/listings.csv&#8221;<br \/>\nENCRYPTED_LISTINGS_FILE = &#8220;.\/data\/listings.csv&#8221;<br \/>\nMOUNT_POINT = &#8220;rentals&#8221;<br \/>\nKEY_NAME = &#8220;listings&#8221;<br \/>\nCONTEXT = json.dumps({&#8220;location&#8221;: &#8220;New York City&#8221;, &#8220;field&#8221;: &#8220;host_name&#8221;})<\/p>\n<p>client = hvac.Client(<br \/>\n   url=os.environ[&#8220;VAULT_ADDR&#8221;],<br \/>\n   token=os.environ[&#8220;VAULT_TOKEN&#8221;],<br \/>\n   namespace=os.getenv(&#8220;VAULT_NAMESPACE&#8221;),<br \/>\n)<\/p>\n<p>def encrypt_payload(payload):<br \/>\n   try:<br \/>\n       encrypt_data_response = client.secrets.transit.encrypt_data(<br \/>\n           mount_point=MOUNT_POINT,<br \/>\n           name=KEY_NAME,<br \/>\n           plaintext=base64.b64encode(payload.encode()).decode(),<br \/>\n           context=base64.b64encode(CONTEXT.encode()).decode(),<br \/>\n       )<br \/>\n       ciphertext = encrypt_data_response[&#8220;data&#8221;][&#8220;ciphertext&#8221;]<br \/>\n       return ciphertext<br \/>\n   except AttributeError:<br \/>\n       return 
&#8220;&#8221;<\/p>\n<p>def encrypt_hostnames():<br \/>\n   dataframe = pandas.read_csv(LISTINGS_FILE)<br \/>\n   dataframe[&#8220;host_name&#8221;] = dataframe[&#8220;host_name&#8221;].apply(lambda x: encrypt_payload(x))<br \/>\n   dataframe.to_csv(ENCRYPTED_LISTINGS_FILE, index=False)<\/p>\n<p># omitted for clarity<\/p>\n<p>This example data would usually exist in a database. A separate application that adds the listing would encrypt and store the data in the database before it gets exported to CSV for reporting. While you could upload the CSV file in its entirety as a document, for this demo, continue processing the rental listings by uploading each record as its own document to debug and improve the LLM\u2019s responses. For more information on how to process semi-structured data for Amazon Bedrock, <a href=\"https:\/\/github.com\/aws-samples\/semi-structured-data-processing-with-amazon-bedrock\">review the AWS samples repository<\/a>.<\/p>\n<p>Each entry in the CSV file gets written as its own document to the S3 bucket. 
This example uses <a href=\"https:\/\/python.langchain.com\/docs\/integrations\/document_loaders\/csv\/\">LangChain<\/a> to convert each CSV record to a text file and upload it to S3.<\/p>\n<p>import base64<br \/>\nimport json<br \/>\nimport logging<br \/>\nimport os<\/p>\n<p>import boto3<br \/>\nimport hvac<br \/>\nimport pandas<br \/>\nfrom botocore.exceptions import ClientError<br \/>\nfrom langchain_community.document_loaders import CSVLoader<\/p>\n<p>from vardata import S3_BUCKET_NAME<\/p>\n<p>LISTINGS_FILE = &#8220;.\/data\/raw\/listings.csv&#8221;<br \/>\nENCRYPTED_LISTINGS_FILE = &#8220;.\/data\/listings.csv&#8221;<br \/>\nMOUNT_POINT = &#8220;rentals&#8221;<br \/>\nKEY_NAME = &#8220;listings&#8221;<\/p>\n<p># omitted for clarity<\/p>\n<p>def create_documents():<br \/>\n   loader = CSVLoader(ENCRYPTED_LISTINGS_FILE)<br \/>\n   data = loader.load()<br \/>\n   return data<\/p>\n<p>def upload_file(body, bucket, key):<br \/>\n   s3_client = boto3.client(&#8220;s3&#8221;)<br \/>\n   try:<br \/>\n       s3_client.put_object(Body=body, Bucket=bucket, Key=key)<br \/>\n   except ClientError as e:<br \/>\n       logging.error(e)<br \/>\n       return False<br \/>\n   return True<\/p>\n<p>def main():<br \/>\n   encrypt_hostnames()<br \/>\n   docs = create_documents()<br \/>\n   for i, doc in enumerate(docs):<br \/>\n       upload_file(doc.page_content, S3_BUCKET_NAME, f&#8221;listings\/{i}&#8221;)<\/p>\n<p>if __name__ == &#8220;__main__&#8221;:<br \/>\n   main()<\/p>\n<p>This script is intended for educational and testing purposes only. In a production use case, you can process <a href=\"https:\/\/docs.aws.amazon.com\/sagemaker\/latest\/dg\/processing-job.html\">data using Amazon SageMaker<\/a>. <\/p>\n<p>The script creates a text file for each listing with the name of the host in ciphertext. 
All other non-sensitive attributes, such as room type and listing ID, remain in plaintext.<\/p>\n<p>id: 1284789<br \/>\nname: Light-filled Brownstone Triplex with Roof Deck<br \/>\nhost_id: 5768571<br \/>\nhost_name: vault:v1:ocqgeegsqMeo2PawbxsK8lQ1Sqp5\/VVGqftD4DUbx1iKMw==<br \/>\nneighbourhood_group: Brooklyn<br \/>\nneighbourhood: Prospect Heights<br \/>\nlatitude: 40.67506<br \/>\nlongitude: -73.96423<br \/>\nroom_type: Entire home\/apt<br \/>\nprice:<br \/>\nminimum_nights: 30<br \/>\nnumber_of_reviews: 4<br \/>\nlast_review: 2023-07-28<br \/>\nreviews_per_month: 0.06<br \/>\ncalculated_host_listings_count: 1<br \/>\navailability_365: 0<br \/>\nnumber_of_reviews_ltm: 0<br \/>\nlicense:<\/p>\n<p>After processing the data, set up an Amazon Bedrock Knowledge Base to ingest the documents from S3 as a data source. If you stored the data in Amazon Aurora, Amazon Redshift, or Glue Data Catalog, you can set up a knowledge base to <a href=\"https:\/\/docs.aws.amazon.com\/bedrock\/latest\/userguide\/knowledge-base-build-structured.html\">ingest from a structured data source<\/a>.<\/p>\n<h2>Set up a vector store<\/h2>\n<p>Amazon Bedrock Knowledge Bases allow you to add proprietary information into applications using RAG. To ingest a data source from S3, Bedrock Knowledge Bases requires the following:<\/p>\n<p>Sufficient IAM policies to connect to S3<br \/>\nA supported <a href=\"https:\/\/docs.aws.amazon.com\/bedrock\/latest\/userguide\/knowledge-base-setup.html\">vector store<\/a> for embeddings.<\/p>\n<p>This demo uses <a href=\"https:\/\/aws.amazon.com\/opensearch-service\/features\/serverless\/\">Amazon OpenSearch Serverless<\/a> to provision an OpenSearch cluster for vector embeddings. The cluster requires additional IAM policies to access collections and indexes that store the vector embeddings.<\/p>\n<p>Below are two security policies: one for the encryption of the collection with embeddings and a second for network access to the collection and dashboard. 
The security policy for network access to the collection and dashboard ensures that Terraform can properly create an index.<\/p>\n<p>resource &#8220;aws_opensearchserverless_security_policy&#8221; &#8220;rentals_encryption&#8221; {<br \/>\n name = var.name<br \/>\n type = &#8220;encryption&#8221;<br \/>\n policy = jsonencode({<br \/>\n   &#8220;Rules&#8221; = [<br \/>\n     {<br \/>\n       &#8220;Resource&#8221; = [<br \/>\n         &#8220;collection\/${var.name}&#8221;<br \/>\n       ],<br \/>\n       &#8220;ResourceType&#8221; = &#8220;collection&#8221;<br \/>\n     },<br \/>\n   ],<br \/>\n   &#8220;AWSOwnedKey&#8221; = true<br \/>\n })<br \/>\n}<\/p>\n<p>resource &#8220;aws_opensearchserverless_security_policy&#8221; &#8220;rentals_network&#8221; {<br \/>\n name = var.name<br \/>\n type = &#8220;network&#8221;<br \/>\n policy = jsonencode([{<br \/>\n   &#8220;Rules&#8221; = [<br \/>\n     {<br \/>\n       &#8220;Resource&#8221; = [<br \/>\n         &#8220;collection\/${var.name}&#8221;<br \/>\n       ],<br \/>\n       &#8220;ResourceType&#8221; = &#8220;collection&#8221;<br \/>\n     },<br \/>\n     {<br \/>\n       &#8220;Resource&#8221; = [<br \/>\n         &#8220;collection\/${var.name}&#8221;<br \/>\n       ],<br \/>\n       &#8220;ResourceType&#8221; = &#8220;dashboard&#8221;<br \/>\n     },<br \/>\n   ],<br \/>\n   &#8220;AllowFromPublic&#8221; = true,<br \/>\n }])<br \/>\n}<\/p>\n<p>The configuration below generates an access policy to allow Amazon Bedrock and your current AWS credentials in Terraform to read and write to the index with the collection of the embeddings.<\/p>\n<p>data &#8220;aws_caller_identity&#8221; &#8220;current&#8221; {}<\/p>\n<p>resource &#8220;aws_opensearchserverless_access_policy&#8221; &#8220;rentals&#8221; {<br \/>\n name        = var.name<br \/>\n type        = &#8220;data&#8221;<br \/>\n description = &#8220;read and write permissions&#8221;<br \/>\n policy = jsonencode([<br \/>\n   {<br \/>\n     Rules = [<br \/>\n       
{<br \/>\n         ResourceType = &#8220;index&#8221;,<br \/>\n         Resource = [<br \/>\n           &#8220;index\/${var.name}\/*&#8221;<br \/>\n         ],<br \/>\n         Permission = [<br \/>\n           &#8220;aoss:*&#8221;<br \/>\n         ]<br \/>\n       },<br \/>\n       {<br \/>\n         ResourceType = &#8220;collection&#8221;,<br \/>\n         Resource = [<br \/>\n           &#8220;collection\/${var.name}&#8221;<br \/>\n         ],<br \/>\n         Permission = [<br \/>\n           &#8220;aoss:*&#8221;<br \/>\n         ]<br \/>\n       }<br \/>\n     ],<br \/>\n     Principal = [<br \/>\n       aws_iam_role.bedrock.arn,<br \/>\n       data.aws_caller_identity.current.arn<br \/>\n     ]<br \/>\n   }<br \/>\n ])<br \/>\n}<\/p>\n<p>After creating these policies, Terraform can now build the collection. For Amazon Bedrock Knowledge Bases, set the type to VECTORSEARCH.<\/p>\n<p>resource &#8220;aws_opensearchserverless_collection&#8221; &#8220;rentals&#8221; {<br \/>\n name = var.name<br \/>\n type = &#8220;VECTORSEARCH&#8221;<\/p>\n<p> depends_on = [<br \/>\n   aws_opensearchserverless_security_policy.rentals_encryption,<br \/>\n   aws_opensearchserverless_security_policy.rentals_network<br \/>\n ]<br \/>\n}<\/p>\n<p>Finally, you need to create an index for the vector embeddings collection in the OpenSearch cluster. Based on <a href=\"https:\/\/docs.aws.amazon.com\/bedrock\/latest\/userguide\/knowledge-base-setup.html\">prerequisites for bringing your own vector store<\/a>, configure an index using the <a href=\"https:\/\/registry.terraform.io\/providers\/opensearch-project\/opensearch\/latest\/docs\">OpenSearch Provider for Terraform<\/a>. 
This example uses <a href=\"https:\/\/aws.amazon.com\/bedrock\/amazon-models\/titan\/\">Amazon Titan<\/a> 2.0 embeddings, which have 1024 dimensions in the vector.<\/p>\n<p>provider &#8220;opensearch&#8221; {<br \/>\n url         = aws_opensearchserverless_collection.rentals.collection_endpoint<br \/>\n healthcheck = false<br \/>\n}<\/p>\n<p>locals {<br \/>\n vector_field   = &#8220;bedrock-knowledge-base-default-vector&#8221;<br \/>\n metadata_field = &#8220;AMAZON_BEDROCK_METADATA&#8221;<br \/>\n text_field     = &#8220;AMAZON_BEDROCK_TEXT_CHUNK&#8221;<br \/>\n}<\/p>\n<p>resource &#8220;opensearch_index&#8221; &#8220;bedrock_knowledge_base&#8221; {<br \/>\n name                           = &#8220;bedrock-knowledge-base-default-index&#8221;<br \/>\n number_of_shards               = &#8220;2&#8221;<br \/>\n number_of_replicas             = &#8220;0&#8221;<br \/>\n index_knn                      = true<br \/>\n index_knn_algo_param_ef_search = &#8220;512&#8221;<br \/>\n mappings                       = &lt;&lt;-EOF<br \/>\n   {<br \/>\n     &#8220;properties&#8221;: {<br \/>\n       &#8220;${local.vector_field}&#8221;: {<br \/>\n         &#8220;type&#8221;: &#8220;knn_vector&#8221;,<br \/>\n         &#8220;dimension&#8221;: 1024,<br \/>\n         &#8220;method&#8221;: {<br \/>\n           &#8220;name&#8221;: &#8220;hnsw&#8221;,<br \/>\n           &#8220;engine&#8221;: &#8220;faiss&#8221;,<br \/>\n           &#8220;parameters&#8221;: {<br \/>\n             &#8220;m&#8221;: 16,<br \/>\n             &#8220;ef_construction&#8221;: 512<br \/>\n           },<br \/>\n           &#8220;space_type&#8221;: &#8220;l2&#8221;<br \/>\n         }<br \/>\n       },<br \/>\n       &#8220;${local.metadata_field}&#8221;: {<br \/>\n         &#8220;type&#8221;: &#8220;text&#8221;,<br \/>\n         &#8220;index&#8221;: &#8220;false&#8221;<br \/>\n       },<br \/>\n       &#8220;${local.text_field}&#8221;: {<br \/>\n         &#8220;type&#8221;: &#8220;text&#8221;,<br \/>\n         
&#8220;index&#8221;: &#8220;true&#8221;<br \/>\n       }<br \/>\n     }<br \/>\n   }<br \/>\n EOF<br \/>\n}<\/p>\n<p>After configuring the vector store, you can create an Amazon Bedrock Knowledge Base for the S3 bucket with rental listings.<\/p>\n<h2>Configure RAG in Amazon Bedrock<\/h2>\n<p>In order to augment the responses from an LLM with rental listing information, you need to create an <a href=\"https:\/\/docs.aws.amazon.com\/bedrock\/latest\/userguide\/kb-how-it-works.html\">Amazon Bedrock Knowledge Base<\/a> to facilitate RAG. The knowledge base requires sufficient IAM access to the S3 bucket with rental listings and the OpenSearch collection. In addition to the data source and collection access, Amazon Bedrock\u2019s IAM role also needs permissions to invoke the model used for embeddings.<\/p>\n<p>resource &#8220;aws_iam_policy&#8221; &#8220;bedrock&#8221; {<br \/>\n name        = &#8220;bedrock-${var.name}-s3&#8221;<br \/>\n path        = &#8220;\/&#8221;<br \/>\n description = &#8220;Allow Bedrock Knowledge Base to access S3 bucket with rentals&#8221;<\/p>\n<p> policy = jsonencode({<br \/>\n   Version = &#8220;2012-10-17&#8221;<br \/>\n   Statement = [<br \/>\n     {<br \/>\n       Action = [<br \/>\n         &#8220;s3:ListBucket&#8221;,<br \/>\n       ],<br \/>\n       Effect   = &#8220;Allow&#8221;,<br \/>\n       Resource = aws_s3_bucket.rentals.arn,<br \/>\n       Condition = {<br \/>\n         StringEquals = {<br \/>\n           &#8220;aws:ResourceAccount&#8221; = data.aws_caller_identity.current.account_id<br \/>\n         }<br \/>\n       }<br \/>\n     },<br \/>\n     {<br \/>\n       Action = [<br \/>\n         &#8220;s3:GetObject&#8221;,<br \/>\n       ],<br \/>\n       Effect   = &#8220;Allow&#8221;,<br \/>\n       Resource = &#8220;${aws_s3_bucket.rentals.arn}\/*&#8221;,<br \/>\n       Condition = {<br \/>\n         StringEquals = {<br \/>\n           &#8220;aws:ResourceAccount&#8221; = data.aws_caller_identity.current.account_id<br \/>\n      
   }<br \/>\n       }<br \/>\n     },<br \/>\n     {<br \/>\n       Action = [<br \/>\n         &#8220;aoss:APIAccessAll&#8221;<br \/>\n       ],<br \/>\n       Effect   = &#8220;Allow&#8221;,<br \/>\n       Resource = aws_opensearchserverless_collection.rentals.arn,<br \/>\n     },<br \/>\n     {<br \/>\n       Action = [<br \/>\n         &#8220;bedrock:InvokeModel&#8221;<br \/>\n       ],<br \/>\n       Effect   = &#8220;Allow&#8221;,<br \/>\n       Resource = data.aws_bedrock_foundation_model.embedding.model_arn,<br \/>\n     }<br \/>\n   ]<br \/>\n })<br \/>\n}<\/p>\n<p>resource &#8220;aws_iam_role&#8221; &#8220;bedrock&#8221; {<br \/>\n name_prefix = &#8220;bedrock-${var.name}-&#8220;<\/p>\n<p> assume_role_policy = jsonencode({<br \/>\n   Version = &#8220;2012-10-17&#8221;<br \/>\n   Statement = [<br \/>\n     {<br \/>\n       Action = &#8220;sts:AssumeRole&#8221;<br \/>\n       Effect = &#8220;Allow&#8221;<br \/>\n       Principal = {<br \/>\n         Service = &#8220;bedrock.amazonaws.com&#8221;<br \/>\n       },<br \/>\n       Condition = {<br \/>\n         StringEquals = {<br \/>\n           &#8220;aws:SourceAccount&#8221; = data.aws_caller_identity.current.account_id<br \/>\n         },<br \/>\n       }<br \/>\n     },<br \/>\n   ]<br \/>\n })<br \/>\n}<\/p>\n<p>resource &#8220;aws_iam_role_policy_attachment&#8221; &#8220;bedrock&#8221; {<br \/>\n role       = aws_iam_role.bedrock.name<br \/>\n policy_arn = aws_iam_policy.bedrock.arn<br \/>\n}<\/p>\n<p>Configure the Amazon Bedrock knowledge base in Terraform. It uses the IAM role with permissions to the S3 bucket, OpenSearch collection, and embedding model. 
It also references the field mappings defined in the OpenSearch index for the vector embeddings.<\/p>\n<p>resource &#8220;aws_bedrockagent_knowledge_base&#8221; &#8220;rentals&#8221; {<br \/>\n name     = var.name<br \/>\n role_arn = aws_iam_role.bedrock.arn<br \/>\n knowledge_base_configuration {<br \/>\n   vector_knowledge_base_configuration {<br \/>\n     embedding_model_arn = data.aws_bedrock_foundation_model.embedding.model_arn<br \/>\n   }<br \/>\n   type = &#8220;VECTOR&#8221;<br \/>\n }<\/p>\n<p> storage_configuration {<br \/>\n   type = &#8220;OPENSEARCH_SERVERLESS&#8221;<br \/>\n   opensearch_serverless_configuration {<br \/>\n     collection_arn    = aws_opensearchserverless_collection.rentals.arn<br \/>\n     vector_index_name = opensearch_index.bedrock_knowledge_base.name<br \/>\n     field_mapping {<br \/>\n       vector_field   = local.vector_field<br \/>\n       text_field     = local.text_field<br \/>\n       metadata_field = local.metadata_field<br \/>\n     }<br \/>\n   }<br \/>\n }<br \/>\n}<\/p>\n<p>Create a data source for the knowledge base linked to the Amazon S3 bucket with rental listings.<\/p>\n<p>resource &#8220;aws_bedrockagent_data_source&#8221; &#8220;listings&#8221; {<br \/>\n knowledge_base_id = aws_bedrockagent_knowledge_base.rentals.id<br \/>\n name              = &#8220;listings&#8221;<br \/>\n data_source_configuration {<br \/>\n   type = &#8220;S3&#8221;<br \/>\n   s3_configuration {<br \/>\n     bucket_arn = aws_s3_bucket.rentals.arn<br \/>\n   }<br \/>\n }<br \/>\n}<\/p>\n<p>After applying this configuration with Terraform, you can test the knowledge base with some requests.<\/p>\n<h2>Test the knowledge base<\/h2>\n<p>When you ask the knowledge base questions about rental listings, you should get responses with additional detail based on the listings.<\/p>\n<p>This example asks Anthropic Claude 3.7 Sonnet how many vacation rentals are in the Box House Hotel. 
The model returned a response indicating that it searched the rental listings and found at least ten of them.<\/p>\n<p>Input:<br \/>\nHow many vacation rentals are in the Box House Hotel?<\/p>\n<p>Output:<br \/>\nBased on the search results, there are at least 10 vacation rentals listed at<br \/>\nthe Box House Hotel in Greenpoint, Brooklyn. These include various room types<br \/>\nsuch as Superior rooms, Lofts with Terraces, Duplex rooms with Terraces, and<br \/>\nmulti-bedroom options. All of these listings share the same host<br \/>\n(host_id: 417504) and are located at approximately the same coordinates in the<br \/>\nGreenpoint neighborhood of Brooklyn.<\/p>\n<p>Loading data into Amazon Bedrock improved the response by adding new information that the model did not see during training. Because you encrypted the names of the hosts to protect them as sensitive data, the model should not return the plaintext names of the hosts. <\/p>\n<p>The example below asks an additional question about the name of the host who owns these vacation rental listings. After some searching, the model responds that the host name is encrypted and that the host has 29 listings in total, all showing the same ciphertext.<\/p>\n<p>Input:<br \/>\nWhat is the name of the host for these vacation rental listings in the Box House Hotel?<\/p>\n<p>Output:<br \/>\nThe host name for Box House Hotel (host_id: 417504) appears to be encrypted in<br \/>\nthe database. 
In the search results, the host_name field for this host is shown<br \/>\nas &#8220;vault:v1:gCMhLikagB3DgHEzpcc0TJbnreYNrS39w9eiHwxUyMOVR5ClyzUaFEnqZC+Ohhk=&#8221;<br \/>\nacross all their listings.<\/p>\n<p>This host appears to manage multiple properties under the Box House Hotel brand in the Greenpoint neighborhood of Brooklyn, with 29 total listings according to the calculated_host_listings_count field.<\/p>\n<p>Convergent encryption ensures that identical host names produce identical ciphertext, which still allows the LLM to analyze the data and identify that the same host offers 29 rental properties. If an LLM application needs to respond with the plaintext name of the host, it can use the Vault transit secrets engine to decrypt the ciphertext before returning the response.<\/p>\n<h2>Learn more<\/h2>\n<p>By encrypting sensitive data before augmenting an LLM with RAG, you can protect access to the data and prevent leakage of sensitive information. In this demo, neither the LLM under test nor other applications need access to the plaintext name of the host by default. They can still analyze and provide other information without leaking the name of the host for each vacation rental listing.<\/p>\n<p>Applications that need access to the plaintext name of the host can implement additional code to decrypt the payload with the Vault <a href=\"https:\/\/developer.hashicorp.com\/vault\/tutorials\/encryption-as-a-service\/eaas-transit\">transit secrets engine<\/a>. Because decryption goes through Vault, only applications authorized to use Vault\u2019s decryption endpoint can reveal sensitive data. 
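<\/p>\n<p>A minimal sketch of that decryption step, assuming the same mount point, key name, and context as the encryption script earlier in this post. The decrypt_host_name helper is hypothetical and not part of the original demo. It calls the decrypt_data method of the HVAC transit API and takes the client as a parameter, so it can be exercised without a running Vault cluster.<\/p>

```python
import base64
import json

MOUNT_POINT = 'rentals'
KEY_NAME = 'listings'
# The key is derived, so decryption needs the same context used to encrypt.
CONTEXT = json.dumps({'location': 'New York City', 'field': 'host_name'})


def decrypt_host_name(client, ciphertext):
    # Hypothetical helper: client is assumed to be an authenticated
    # hvac.Client whose token may decrypt with the listings key.
    response = client.secrets.transit.decrypt_data(
        name=KEY_NAME,
        ciphertext=ciphertext,
        context=base64.b64encode(CONTEXT.encode()).decode(),
        mount_point=MOUNT_POINT,
    )
    # Vault returns the plaintext base64-encoded, mirroring the encrypt call.
    encoded = response['data']['plaintext']
    return base64.b64decode(encoded).decode()
```

<p>An application would pass the host_name ciphertext from a model response to decrypt_host_name before showing it to an authorized user. A token without decrypt permission on the key should be rejected by Vault.<\/p>\n<p>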
Vault offers additional <a href=\"https:\/\/www.hashicorp.com\/blog\/how-to-choose-a-data-protection-method\">advanced data protection techniques<\/a> such as <a href=\"https:\/\/developer.hashicorp.com\/vault\/tutorials\/encryption-as-a-service\/transform\">format-preserving encryption<\/a>, masking, and <a href=\"https:\/\/developer.hashicorp.com\/vault\/tutorials\/encryption-as-a-service\/tokenization\">data tokenization<\/a> using the <a href=\"https:\/\/developer.hashicorp.com\/vault\/docs\/secrets\/transform\">transform secrets engine<\/a>. The transform secrets engine further masks or sanitizes sensitive information while allowing the LLM and its applications to process data.<\/p>\n<p>To learn more about Amazon Bedrock Knowledge Bases, check out the <a href=\"https:\/\/aws.amazon.com\/bedrock\/knowledge-bases\/\">documentation<\/a>. The OWASP <a href=\"https:\/\/genai.owasp.org\/llm-top-10\/\">Top 10 Risks &amp; Mitigations for LLMs and Gen AI Apps<\/a> describes each risk and its mitigations, including recommendations to prevent sensitive data disclosure. 
<\/p>","protected":false},"excerpt":{"rendered":"<p>You can use retrieval augmented generation (RAG) to refine and improve the output of a large language model (LLM) without [&hellip;]<\/p>\n","protected":false},"author":0,"featured_media":0,"comment_status":"","ping_status":"","sticky":false,"template":"","format":"standard","meta":{"site-sidebar-layout":"default","site-content-layout":"","ast-site-content-layout":"default","site-content-style":"default","site-sidebar-style":"default","ast-global-header-display":"","ast-banner-title-visibility":"","ast-main-header-display":"","ast-hfb-above-header-display":"","ast-hfb-below-header-display":"","ast-hfb-mobile-header-display":"","site-post-title":"","ast-breadcrumbs-content":"","ast-featured-img":"","footer-sml-layout":"","ast-disable-related-posts":"","theme-transparent-header-meta":"","adv-header-id-meta":"","stick-header-meta":"","header-above-stick-meta":"","header-main-stick-meta":"","header-below-stick-meta":"","astra-migrate-meta-layouts":"default","ast-page-background-enabled":"default","ast-page-background-meta":{"desktop":{"background-color":"var(--ast-global-color-4)","background-image":"","background-repeat":"repeat","background-position":"center center","background-size":"auto","background-attachment":"scroll","background-type":"","background-media":"","overlay-type":"","overlay-color":"","overlay-opacity":"","overlay-gradient":""},"tablet":{"background-color":"","background-image":"","background-repeat":"repeat","background-position":"center center","background-size":"auto","background-attachment":"scroll","background-type":"","background-media":"","overlay-type":"","overlay-color":"","overlay-opacity":"","overlay-gradient":""},"mobile":{"background-color":"","background-image":"","background-repeat":"repeat","background-position":"center 
center","background-size":"auto","background-attachment":"scroll","background-type":"","background-media":"","overlay-type":"","overlay-color":"","overlay-opacity":"","overlay-gradient":""}},"ast-content-background-meta":{"desktop":{"background-color":"var(--ast-global-color-5)","background-image":"","background-repeat":"repeat","background-position":"center center","background-size":"auto","background-attachment":"scroll","background-type":"","background-media":"","overlay-type":"","overlay-color":"","overlay-opacity":"","overlay-gradient":""},"tablet":{"background-color":"var(--ast-global-color-5)","background-image":"","background-repeat":"repeat","background-position":"center center","background-size":"auto","background-attachment":"scroll","background-type":"","background-media":"","overlay-type":"","overlay-color":"","overlay-opacity":"","overlay-gradient":""},"mobile":{"background-color":"var(--ast-global-color-5)","background-image":"","background-repeat":"repeat","background-position":"center 
center","background-size":"auto","background-attachment":"scroll","background-type":"","background-media":"","overlay-type":"","overlay-color":"","overlay-opacity":"","overlay-gradient":""}},"footnotes":""},"categories":[6],"tags":[],"class_list":["post-1879","post","type-post","status-publish","format-standard","hentry","category-terraform"],"_links":{"self":[{"href":"https:\/\/rssfeedtelegrambot.bnaya.co.il\/index.php\/wp-json\/wp\/v2\/posts\/1879","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/rssfeedtelegrambot.bnaya.co.il\/index.php\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/rssfeedtelegrambot.bnaya.co.il\/index.php\/wp-json\/wp\/v2\/types\/post"}],"replies":[{"embeddable":true,"href":"https:\/\/rssfeedtelegrambot.bnaya.co.il\/index.php\/wp-json\/wp\/v2\/comments?post=1879"}],"version-history":[{"count":0,"href":"https:\/\/rssfeedtelegrambot.bnaya.co.il\/index.php\/wp-json\/wp\/v2\/posts\/1879\/revisions"}],"wp:attachment":[{"href":"https:\/\/rssfeedtelegrambot.bnaya.co.il\/index.php\/wp-json\/wp\/v2\/media?parent=1879"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/rssfeedtelegrambot.bnaya.co.il\/index.php\/wp-json\/wp\/v2\/categories?post=1879"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/rssfeedtelegrambot.bnaya.co.il\/index.php\/wp-json\/wp\/v2\/tags?post=1879"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}