Monitoring Cloud Environments via LLM
TL;DR
- ChatGPT’s analysis of the native AWS CloudTrail logs misunderstood the change and was confused about the meaning of many lines in the log event.
- When provided with the OpsHelm-formatted data, ChatGPT produced a more accurate analysis of the events in question, eliminating all observed errors in the test run.
- Given the enhanced context provided by OpsHelm, the LLM was also able to make assertions about additional relevant data fields that were not accessible in the native event.
- The enhanced reliability of this output and ability to correctly identify key factors for further analysis provides a solid foundation for OpsHelm’s forthcoming retrieval-augmented generation (RAG) AI agent.
Intro
AI usage is constantly expanding in the technology industry today, and many teams are attempting to integrate AI (mostly Large Language Models, or LLMs, such as OpenAI’s ChatGPT, Google’s Gemini, or Meta’s LLaMA) into all aspects of their workflows to augment staff and enable them to be more productive. Security and operations teams are no different, and a number have been using LLMs to parse log and configuration data in an effort to seek out configuration and security issues, anomalies, and all manner of other information.
These efforts might integrate with the LLMs in different ways – some organizations will access third-party models via an API, some will run an off-the-shelf model locally, and still others will train their own models. Regardless of the access method, though, the success of these efforts will largely hinge on the ability to give the model access to data that it can understand and use to achieve the organization’s goals.
At OpsHelm we believe that the data we provide is more readily consumable by LLMs than the raw data provided by your cloud provider (in this example, AWS’s CloudTrail), and we wanted to put that thesis to the test.
A conversation with ChatGPT…
Before we embark on anything, let’s ask ChatGPT for its opinion on which is more useful. For this we simply use the `GPT-4-turbo` model, which is (currently) the default on chatgpt.com. The questions and answers below are shown verbatim and in their complete form, with two exceptions:
- In the original prompt the formats were labelled `1` and `2` to avoid any bias; I have renamed them back to `OpsHelm` and `CloudTrail` for clarity.
- The original prompt also included sample events of each type, which I have removed for brevity.
OpsHelm:
From the perspective of tracking changes in a cloud environment and asking an LLM or ML to process those changes, which format do you prefer?
ChatGPT:
I prefer OpsHelm for tracking changes in a cloud environment when working with an LLM or ML system. Here’s why:
1. Structured Change Tracking (Previous vs. Current State)
- OpsHelm explicitly captures both the previous and current state of the resource, making it much easier for an LLM to determine what changed without additional inference.
- CloudTrail logs an action but does not provide an explicit “before” state, requiring extra correlation with historical logs.
2. Event-Driven Model with Contextual Attribution
- OpsHelm follows an event-driven model (`com.opshelm.asset.delete`), which clearly indicates what happened and why (attribution details).
- CloudTrail is more of an API event log that requires additional parsing and enrichment to infer meaningful change information.
3. Better Normalization Across Cloud Providers
- The `OH__Meta` field in OpsHelm provides normalized metadata (resource type, region, etc.), which makes it easier to integrate changes across different cloud providers.
- CloudTrail is more AWS-specific, making it harder to generalize.
4. More Compact & Relevant Data
- OpsHelm removes unnecessary details like TLS cipher suites and SDK versions, reducing noise when processing changes.
- CloudTrail includes a lot of operational data that is useful for debugging but not necessarily relevant for tracking cloud asset changes.
5. Easier for LLM Processing
- A structured “diff-like” format (like OpsHelm) allows an LLM to answer queries like:
- What changed in this resource?
- Who initiated the change?
- What attributes were affected?
- Was this an expected modification?
- CloudTrail requires extra processing and comparison with historical logs, increasing complexity.
If the goal is ML-driven cloud change tracking, OpsHelm is superior because it aligns well with structured event processing, making it easier to train models, detect anomalies, and automate responses. 🚀
OpsHelm:
Can you provide any insights with regards to which would be better for training a new model?
ChatGPT:
For training a new ML model to track changes in a cloud environment, OpsHelm is a much better choice due to its structured nature and explicit before/after state representation. Here’s why:
1. Clear Input-Output Mapping (Supervised Learning Ready)
- OpsHelm has a natural “diff” structure:
  - The `previous` state acts as input.
  - The `current` state (or `null` for deletions) is the expected output.
  - The event type (`com.opshelm.asset.delete`) provides the label.
- This makes it ideal for supervised learning, where models can be trained to predict what changed, who changed it, and whether the change was expected or anomalous.
- CloudTrail lacks direct before/after state comparisons, making it harder to define an explicit learning target.
2. Easier Feature Engineering & Labeling
- OpsHelm provides a structured schema where key features like `VpcId`, `SubnetId`, `OwnerId`, and `TagSet` can be easily extracted and transformed into training data.
- Since the model learns from structured diffs, it can generalize well to new, unseen changes.
- CloudTrail is more unstructured, requiring log stitching and historical correlation before feature extraction, adding preprocessing overhead.
3. Anomaly Detection & Security Insights
- A model trained on OpsHelm can easily detect unexpected modifications by comparing before/after states.
  - Example: If a `VpcId` suddenly changes, the model can flag it as an unexpected event.
  - Example: If a `TagSet` field is removed, the model can predict whether this is likely an error or misconfiguration.
- CloudTrail mostly captures single-point-in-time events, making it harder to infer whether a change is normal or anomalous.
4. Generalization Across Cloud Providers
- OpsHelm normalizes resources via `OH__Meta`, making it easier to train a cross-cloud model (AWS, GCP, Azure, etc.).
- CloudTrail is AWS-specific, which means retraining is needed for other cloud platforms.
5. Better for LLM Fine-Tuning
- LLMs trained on OpsHelm can:
  - Predict missing context (e.g., if `VpcId` was removed, the model can predict what it might have been).
  - Detect unexpected attribution (e.g., a change by an unknown principal).
  - Generate change summaries (e.g., “Network interface `eni-xyz` was deleted by ECS service”).
- CloudTrail requires additional preprocessing before an LLM can extract meaningful patterns.
Conclusion: OpsHelm is the Best Choice
For training an ML model, OpsHelm is better due to its structured nature, direct change tracking, and ease of feature extraction. CloudTrail is more suitable for forensic analysis or logging, but would require significant preprocessing to be useful for ML.
Thanks ChatGPT, I don’t think that we could have worded it better ourselves!!
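To make that “diff-like” structure concrete before we move on, here is a minimal sketch (in Python, with field names taken from the OpsHelm sample event shown later in this post, and a hypothetical helper name) of how a change event could be flattened into a supervised training example:

```python
from typing import Any


def to_training_example(event: dict[str, Any]) -> dict[str, Any]:
    """Flatten an OpsHelm-style change event into a supervised training triple.

    The event `type` (e.g. com.opshelm.asset.delete) acts as the label, the
    `previous` resource state is the input, and the `current` resource state
    (None for deletions) is the expected output.
    """
    data = event.get("data", {})
    previous = (data.get("previous") or {}).get("resource")
    current = (data.get("current") or {}).get("resource")
    return {
        "label": event.get("type"),   # e.g. "com.opshelm.asset.delete"
        "input": previous,            # before state
        "output": current,            # after state, None when deleted
        "actor": (data.get("current") or {})
                 .get("attribution", {})
                 .get("Principal", {})
                 .get("ARN"),         # who made the change
    }
```

The same triple works for `create` events (no previous state) and `update` events (both states present), which is what makes the format convenient as a learning target.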
Let’s delete an SSO user for science 👩🔬
Testing this thoroughly would mean either fine-tuning or training a model ourselves, something which is beyond the intention of this first foray into testing an LLM with OpsHelm. (We’ll get there later in this blog series!) We can, however, use off-the-shelf LLMs to obtain a rough gauge of whether this thesis holds true.
If anything, using an off-the-shelf model will be biased against OpsHelm. AWS logs have been documented on the internet, discussed in forums, and readily available for years, so these logs and their accompanying detailed analyses will most likely feature somewhere in the training data of most LLMs, whereas OpsHelm’s data format will not.
For this test we created and then deleted a test user in an AWS environment that uses another environment for Single Sign-On (SSO). In this instance, a CloudTrail event is created in the environment that provides the authentication, while OpsHelm creates its own event, which we read from EventStream.
We then pass each event into an LLM with the prompt below (where `{ ... }` represents the JSON object):

Examine this event, and summarize the change that it represents:
{ ... }
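As a rough illustration of how each event can be fed to the model (this is not our exact harness), something like the following would do, assuming the OpenAI Python SDK and a chat-capable model; the function name and model choice are assumptions for the example:

```python
import json

from openai import OpenAI

client = OpenAI()  # expects OPENAI_API_KEY in the environment


def summarize_event(event: dict) -> str:
    """Ask the model to summarize the change a single event represents."""
    prompt = (
        "Examine this event, and summarize the change that it represents:\n"
        + json.dumps(event, indent=2)
    )
    response = client.chat.completions.create(
        model="gpt-4-turbo",  # assumption: any chat-capable model works here
        messages=[{"role": "user", "content": prompt}],
    )
    return response.choices[0].message.content
```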
For the sake of transparency, reproducibility, and to satisfy anyone who is curious, the blocks below can be expanded to see the full prompts of each event type.
Full (Redacted) AWS-native Event Prompt
Examine this event, and summarize the change that it represents: { "eventVersion": "1.10", "userIdentity": { "type": "Unknown", "principalId": "xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx", "accountId": "123456789012", "userName": "user@domain.com", "onBehalfOf": { "userId": "xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx", "identityStoreArn": "arn:aws:identitystore::555666777888:identitystore/d-123456789" }, "credentialId": "xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx" }, "eventTime": "2025-03-26T20:24:50Z", "eventSource": "sso.amazonaws.com", "eventName": "Federate", "awsRegion": "us-west-2", "sourceIPAddress": "192.0.2.47", "userAgent": "Mozilla/5.0 (Macintosh; Intel Mac OS X 10.15; rv:136.0) Gecko/20100101 Firefox/136.0", "requestParameters": null, "responseElements": null, "requestID": "11111111-2222-3333-4444-555555555555", "eventID": "6666666-7777-8888-9999-000000000000", "readOnly": false, "eventType": "AwsServiceEvent", "managementEvent": true, "recipientAccountId": "111222333444", "serviceEventDetails": { "role_name": "UserRole", "account_id": "555666777888" }, "eventCategory": "Management" }
Full (Redacted) OpsHelm Event Prompt
Examine this event, and summarize the change that it represents: { "specversion": "1.0", "id": "17314159265358979323", "source": "/cloudAccounts/xxxxxxxxxxxxxxxxxxxxxxxxxx", "type": "com.opshelm.asset.delete", "subject": "/cloudAccounts/xxxxxxxxxxxxxxxxxxxxxxxxxx/assets/yyyyyyyyyyyyyyyyyyyyyyyyyy", "datacontenttype": "application/json", "time": "2025-03-26T20:09:29.531095387Z", "data": { "previous": { "resource": { "AccessKeys": [], "Arn": "arn:aws:iam::123456789012:user/test-user", "AttachedPolicies": [], "CreateDate": "2025-03-26T20:08:54Z", "Groups": [ { "Arn": "arn:aws:iam::123456789012:group/test-group", "CreateDate": "2024-11-22T17:55:57Z", "GroupId": "XXXXXXXXXXXXXXXXXXX", "GroupName": "test-group", "Path": "/" } ], "InlinePolicies": null, "LoginProfile": null, "MFAActive": false, "MFADevices": [], "OH__Meta": { "arn": "arn:aws:iam::1112223334444:user/test-user", "region": "us-east-1", "resource_type": "AWS::IAM::User" }, "PasswordLastUsed": null, "PasswordSet": false, "Path": "/", "PermissionsBoundary": null, "Tags": [ { "Key": "testtag", "Value": "testtagvalue" } ], "UserId": "XXXXXXXXXXXXXXXXXXXXX", "UserName": "test-user" }, "attribution": { "IP": "192.0.2.47", "Agent": "Console", "Platform": "AWS", "UserAgent": "Mozilla/5.0 (Macintosh; Intel Mac OS X 10.15; rv:136.0) Gecko/20100101 Firefox/136.0", "Principal": { "Type": "assumed_role", "ID": "AWSReservedSSO_SomeRole_1212121212121212/user@domain.com", "ARN": "arn:aws:sts::1212121212121212121:assumed-role/AWSReservedSSO_DevAdmin_1212121212121212/user@domain.com", "RoleHistory": [], "CloudProvider": "aws" }, "Version": 1 } }, "current": { "resource": null, "attribution": { "IP": "192.0.2.47", "Agent": "Console", "Platform": "AWS", "UserAgent": "Mozilla/5.0 (Macintosh; Intel Mac OS X 10.15; rv:136.0) Gecko/20100101 Firefox/136.0", "Principal": { "Type": "assumed_role", "ID": "AWSReservedSSO_DevAdmin_1212121212121212/user@domain.com", "ARN": "arn:aws:sts::1212121212121212121:assumed-role/AWSReservedSSO_DevAdmin_1212121212121212121/user@domain.com", "RoleHistory": [], "CloudProvider": "aws" }, "Version": 1 } } } }
- Any of the shown responses where data is `X`’d out represent places where we have redacted the response, but the LLM responded with the correct data. These do not represent hallucinations or other incorrect data.
- Responses are complete. We have broken up the content into coherent chunks to make it easier to read alongside commentary and to compare the output, but nothing has been removed.
- We passed a variety of different events to the LLM during testing and believe that what follows is a representative result. IAM is several years old and a fairly fundamental component of any cloud infrastructure, so this feels like a “fair” test, rather than biasing towards OpsHelm by choosing a bleeding-edge service likely to fall outside of an LLM’s training data.
What kind of event are we dealing with?
- The LLM correctly identified that the `Federate` events are related to authentication in the CloudTrail-formatted event.
- … however it incorrectly assumed that the event represented authentication rather than the deletion of a user…
- … despite correctly identifying that this is a management/administrative event.
- The correct event type was identified in the OpsHelm event. OpsHelm events are normalized so that `create`, `update`, and `delete` events each have a single type (see the routing sketch after the excerpts below).
ChatGPT (CloudTrail):
- Event Type: `Federate` (Indicates an AWS SSO authentication event)
- Management Event: `true` (This is an administrative action)
ChatGPT (OpsHelm):
- Event Type: `com.opshelm.asset.delete` (Indicates a deletion event)
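Because OpsHelm normalizes every change to a single `create`, `update`, or `delete` type, downstream tooling (or a prompt pre-processor) can route on the event type alone. A minimal sketch, with a hypothetical helper name:

```python
def route_opshelm_event(event: dict) -> str:
    """Dispatch on the normalized OpsHelm event type suffix."""
    event_type = event.get("type", "")      # e.g. "com.opshelm.asset.delete"
    action = event_type.rsplit(".", 1)[-1]  # "create", "update", or "delete"

    if action == "create":
        return "asset created"
    if action == "update":
        return "asset updated"
    if action == "delete":
        return "asset deleted"
    return f"unrecognized event type: {event_type}"
```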
Who was deleted?
- The CloudTrail format did not yield any summarization regarding the deleted user.
- The OpsHelm format allowed the LLM to identify the user, their associated groups, and their tags.
- The OpsHelm format also allowed the LLM to identify that the user had no access keys set up and did not have Multi-Factor Authentication (MFA) enabled. This is because OpsHelm events include both the previous and current states of an asset (see the diff sketch after the excerpts below).
ChatGPT (OpsHelm):
- Deleted Resource:
  - IAM User: `test-user`
  - ARN: `arn:aws:iam::XXXXXXXXXXXX:user/test-user`
  - User ID: `XXXXXXXXXXXXXX`
  - Associated Group: `test-group` (User was part of this IAM group)
  - Tags: `testtag: testkey`
Comparison of `previous` vs `current`:
- Previous State:
  - The IAM user existed, had no access keys, was not MFA-enabled, and was part of `test-group`.
- Current State:
  - The user no longer exists (`resource` is `null`).
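The explicit previous/current structure is what makes this kind of answer cheap to produce. As a minimal sketch, the same facts can be pulled straight out of the OpsHelm deletion event without any LLM involved at all (field names are those from the sample event above; the helper name is hypothetical):

```python
def summarize_deletion(event: dict) -> dict:
    """Pull the key 'who was deleted' facts out of an OpsHelm delete event."""
    previous = event["data"]["previous"]["resource"] or {}
    current = event["data"]["current"]["resource"]  # null/None after deletion

    return {
        "deleted": current is None,
        "user_name": previous.get("UserName"),
        "arn": previous.get("Arn"),
        "groups": [g.get("GroupName") for g in previous.get("Groups", [])],
        "tags": {t["Key"]: t["Value"] for t in previous.get("Tags", [])},
        "had_access_keys": bool(previous.get("AccessKeys")),
        "mfa_enabled": bool(previous.get("MFAActive")),
    }
```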
Who made the change?
- Both responses correctly identified the user who performed the action, the role that the user assumed to perform the action, and the account in which the event occurred.
- Both responses identified the browser via the user agent and the IP address the user originated from, however:
  - For the CloudTrail event, the IP address and UserAgent are misattributed to authentication, presumably because of the earlier mischaracterization of the event as an authentication event.
  - For the OpsHelm event, these are correctly labelled.
- For the OpsHelm event, the `Performed via` context is additionally included, as we provide event enrichment that determines whether changes were made via the console, CLI, orchestration, internal AWS processes, etc.
- Both responses include an ARN in the attribution information. Unfortunately, for the CloudTrail event, ChatGPT confusingly included the Identity Store ARN (which relates to the user that was deleted) as part of the attribution information. In the case of OpsHelm, however, ChatGPT correctly included the ARN of the Principal, the user who performed the action (see the attribution-extraction sketch after the excerpts below).
ChatGPT (CloudTrail):
- User Identity:
  - User: `XXXXXX@opshelm.com`
  - Identity Store ARN: `arn:aws:identitystore::XXXXXXXXXXXX:identitystore/d-123456789`
  - Account ID: `XXXXXXXXXXXX`
- Target Role & Account:
  - Role Assumed: `TestRole`
  - Account: `XXXXXXXXXXXX`
- Source of Authentication:
  - IP Address: `XXX.XXX.XXX.XXX`
  - User Agent: Firefox 136.0 on macOS
  - Event Source: `sso.amazonaws.com` (AWS Single Sign-On)
ChatGPT (OpsHelm):
- Attribution (Who performed the action?):
  - Assumed Role: `AWSReservedSSO_TestRole_XXXXXXXXXXXXXX`
  - Principal ARN: `arn:aws:sts::XXXXXXXXXXXX:assumed-role/AWSReservedSSO_TestRole_XXXXXXXXXXXXXX/XXXXXX@opshelm.com`
  - Performed via: AWS Console
  - Source IP: `XXX.XXX.XXX.XXX`
  - Browser: Firefox 136.0 on macOS
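As noted above, the OpsHelm attribution block carries the acting principal directly, so extracting “who made the change” is a straightforward lookup rather than an inference. A minimal sketch (field names from the sample event above; the helper name is hypothetical):

```python
def who_made_the_change(event: dict) -> dict:
    """Extract the acting principal from an OpsHelm event's attribution block."""
    attribution = event["data"]["current"]["attribution"]
    principal = attribution.get("Principal", {})

    return {
        "principal_arn": principal.get("ARN"),
        "principal_type": principal.get("Type"),   # e.g. "assumed_role"
        "performed_via": attribution.get("Agent"),  # e.g. "Console"
        "source_ip": attribution.get("IP"),
        "user_agent": attribution.get("UserAgent"),
    }
```

The CloudTrail Federate event has no single equivalent field to read, which is part of why the attribution details were mislabelled in that response.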
Summary
- Both prompts produced a summary of what ChatGPT considered the key information for this event overall. These are included for completeness:
ChatGPT (CloudTrail):
This event confirms that `XXXXX@opshelm.com` used AWS IAM Identity Center (SSO) to assume the “XXXXXXXXXXXX” role in AWS account `XXXXXXXXXXXX`, authenticating from IP `XXX.XXX.XXX.XXX` via Firefox.
ChatGPT (OpsHelm):
This event confirms that an IAM user (`test-user`) was deleted from AWS IAM by a user with AWS SSO DevAdmin permissions via the AWS Console.
Conclusion
As we hope you agree, this test seems to indicate that an untrained, off-the-shelf LLM, with a simple one-sentence prompt, was able to make better use of the OpsHelm-formatted event than the raw CloudTrail event.
We believe that it would be relatively simple to widen the gap between the responses if we were to enhance the prompt with information pertaining to the OpsHelm event structure or guide an LLM to pay particular attention to specific fields. (Again, that’s a topic for a future post!)
Of course, we are not saying that the additional information available via the OpsHelm event is impossible to obtain otherwise, whether via pre-processing of the data or via retrieval-augmented generation (RAG) techniques. However, doing so adds substantial complexity to a solution as it pertains to gathering, aggregating, correlating, and redistributing such events. If someone is using data to train a model, or indeed passing it to an existing model, they are most likely looking to reduce labour and increase automation, not the opposite.