Apache logs 2021

Lord777

Types and modules of logs. Apache access log format.

Table of contents
1. Types and modules of logs. Apache access log format
1.1 Types of Apache logs
1.2 Apache log modules
1.3 Mod_log_config module
1.4 Access Log
1.5 How to customize the Apache log format. Custom Log Formats
1.6 BufferedLogs Directive
1.7 CustomLog Directive
1.8 GlobalLog Directive
1.9 LogFormat Directive
1.10 TransferLog Directive
1.11 Apache Log Formats
1.11.1 Common Log Format
1.11.2 Combined Log Format
1.11.3 Multiple Access Logs
1.11.4 Conditional Logs
1.12 Rotating Logs
1.13 Piped Logs
1.14 Virtual Hosts
1.15 Security Issues
2. Format of error logs. Module event log
3. Programs for analyzing Apache logs
4. Forensic logs
5. Additional configurable debug logs. CGI script execution logs

Note: this series of articles covers the log files of the Apache web server: their configuration, format, and directives, as well as programs for analyzing web server logs. The material is presented in detail, both for in-depth study of the topic and for use as a reference. For a more concise overview, see the article "Apache log (logs): how to configure and analyze web server logs".

To effectively manage your web server, you need to get feedback on the activity and performance of the server, as well as any issues that may arise. Apache HTTP Server provides very rich and flexible logging capabilities.

Apache HTTP Server provides many different mechanisms for logging everything that happens to your server, from the initial request and the URL mapping process, to the final resolution of the connection, including any errors that may have occurred in the process. In addition to this, third-party modules can provide logging capabilities or insert records into existing log files, and applications such as CGI programs, PHP scripts, or other handlers can also send messages to the server error log.

The web server logs contain a wealth of useful information. From the access logs you can build a collective portrait of your audience: which countries and cities visitors come from, which operating systems and browsers they use, when they are most active, which sites referred them, which search engines they prefer, and how many pages they view per visit. Logs are equally important for monitoring the health of the web server and its sites: pages that were not found, web server errors, load levels, bot activity, malicious activity, traces of a break-in, and the paths an attacker used.

In general, server access logs should be understood, configured, and used primarily by the webmasters and system administrators who maintain the server. At the same time, an attacker, or someone investigating the consequences of an attack, also needs to understand what exactly is stored in the web server logs: how to benefit from them, how traces might be concealed, and how to analyze access log files when searching for problems, attacks, and traces of a break-in.

The first part is about configuring the log format in the Apache web server. This information is useful even if you do not run sites or maintain a web server: log analysis tools usually need to be told the format of the log being analyzed, and they use the same specifiers as the Apache configuration.

Types of Apache logs
Different types of Apache logs are managed by different web server modules and have different control directives and the ability to specify the format of the log string.

The following types of Apache web server logs are available:
  • Error Log
  • Per-module logging
  • Access Log
  • Additional configurable debug logging
  • Forensic (forensic logs)
  • CGI script execution logs
The following is a brief description of each type of log. The rest of this first part then covers the Access Log in detail; the other log types are covered in subsequent parts.

Error Log
The server error log is the most important log file. This is where Apache httpd will send diagnostic information and record any errors it encounters while processing requests. This is the first place to look when a server startup or server problem occurs, as it often contains details of what went wrong and how to fix it.

Per-module logging
The LogLevel directive allows you to specify the log severity level for each module. Thus, if you are troubleshooting a problem with only one specific module, you can increase its verbosity in the log without getting unnecessary information from other modules that you are not interested in. This is especially useful for modules like mod_proxy or mod_rewrite, where you want to know the details of what the module is trying to do.
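
As a hedged illustration of per-module levels, the fragment below keeps the server at info while raising verbosity only for mod_rewrite and mod_proxy. The specific levels (trace3, debug) are assumptions chosen for the example; the module:level form is the Apache 2.4 syntax.

```apache
# The global level stays at info; only the two modules named here become more verbose.
LogLevel info rewrite:trace3 proxy:debug
```

Once the problem is found, removing the module-specific part returns the error log to its normal size.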

Access Log
The server access log records all requests processed by the server.

Additional configurable debug logging
The LogMessage directive of mod_log_debug causes a custom message to be written to the error log. The message can use variables and functions from the ap_expr syntax. References to HTTP headers do not result in header names being added to the Vary header. Messages are logged at loglevel info.
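
A minimal sketch of such a message, assuming mod_log_debug is loaded and using a hypothetical /foo/ location. Because the message is written at loglevel info, it only shows up if LogLevel is info or more verbose.

```apache
# Hypothetical location; every request under /foo/ adds this line to the error log.
<Location "/foo/">
    LogMessage "/foo/ has been requested"
</Location>
```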

Forensic (forensic logs)
Logging is done before and after the request is processed, so the forensic log contains two log lines for each request. The forensic logger is deliberately strict and does not allow its format to be customized.
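
A short sketch of enabling the forensic log, assuming mod_log_forensic is loaded; the file names and the forensic_common alias are hypothetical. According to the mod_log_forensic documentation, the forensic ID can also be written to the ordinary access log through the %{forensic-id}n specifier, which makes it possible to match entries between the two files.

```apache
# Two lines per request (before and after processing) go to this file.
ForensicLog "logs/forensic_log"

# Optional: repeat the forensic ID in the access log for cross-referencing.
LogFormat "%{forensic-id}n %h %l %u %t \"%r\" %>s %b" forensic_common
CustomLog "logs/access_log" forensic_common
```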

CGI script execution logs
If ScriptLog is not set, no CGI error log is created. If ScriptLog is set, any CGI errors are logged to the file given as its argument.
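
A hedged example of switching the CGI log on, assuming mod_cgi or mod_cgid is in use. The path and the limit values are illustrative only; ScriptLogLength and ScriptLogBuffer are the companion directives that cap the file size and the amount of request body recorded.

```apache
# Hypothetical path, relative to ServerRoot.
ScriptLog "logs/cgi_log"
# Example limits: maximum log size in bytes, and bytes of PUT/POST body to record.
ScriptLogLength 10385760
ScriptLogBuffer 1024
```

This log is meant as a debugging aid for CGI scripts and is usually left off on production servers.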

Apache log modules
Apache has several modules that are responsible for web logs:
  • mod_log_config. Logs the requests made to the server. This is the main module: it is enabled by default and is the one that records information about requests. This part mostly covers this module and its settings. Provides the Access Log.
  • mod_log_debug. Additional configurable debug logging. Has experimental status.
  • mod_log_forensic. Forensic logging of requests made to the server. Provides the forensic log.
  • mod_logio. Logging of the input and output bytes of each request. This module must be enabled in the Apache configuration if you want to log the amount of data sent and/or received. Extends the Access Log format with additional specifiers.
  • Apache Core Features. The Apache HTTP Server core features that are always available. Among other things, the core provides the Error Log and per-module logging.
  • mod_cgi and mod_cgid. Provide the CGI script execution log.
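
To illustrate what mod_logio adds to the list above, here is a sketch of a combined-style format extended with byte counters. The alias combinedio and the file name are assumptions for the example; %I and %O are only available when mod_logio is loaded, hence the IfModule guard.

```apache
<IfModule logio_module>
    # %I = bytes received, %O = bytes sent, both including headers.
    LogFormat "%h %l %u %t \"%r\" %>s %b \"%{Referer}i\" \"%{User-agent}i\" %I %O" combinedio
    CustomLog "logs/access_log" combinedio
</IfModule>
```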
Mod_log_config module
The mod_log_config module provides flexible logging of client requests. Logs are written in a customizable format and can be written directly to a file or to an external program. Conditional logging is provided, that is, individual requests can be included in or excluded from the log based on the characteristics of the request. This module is key to making the Access Log work.

This module supports the following directives:
  • TransferLog to create a log file,
  • LogFormat to set a custom format. This directive is followed by a log format string, as well as a name that can be used as an alias for this string. After setting an alias with this directive, it can be specified in CustomLog.
  • CustomLog to define log file and format in one step. Specifies how the log file is saved (for example, to a file) and the format to use. The format can be either an alias set using the LogFormat directive, or a format string.
  • BufferedLogs. Buffers log entries in memory before writing them to disk.
  • GlobalLog. Sets the file name and format of the log file.
The TransferLog and CustomLog directives can be used multiple times in each server so that each request is logged to multiple files.

Access Log
The server access log records all requests processed by the server. The location and contents of the access log are controlled by the CustomLog directive. The LogFormat directive can be used to simplify the selection of log content. This section describes how to configure the server to write information to the access log.

Of course, keeping information in the access log is just the beginning of log management. The next step is to analyze this information to obtain useful statistics. The analysis of logs in general is not part of the work of the web server itself, but will be discussed in a subsequent article in this series.

Various versions of Apache httpd used different modules and directives to manage the access log, including mod_log_referer, mod_log_agent, and the TransferLog directive. The CustomLog directive now includes the functionality of all the old directives.

The access log format is highly customizable. The format is specified using a format string that looks much like a C-style printf(1) format string. Some examples are provided below. For a complete list of the possible contents of the format string, see the next section, "How to customize the format of Apache access logs. Custom log formats".

That is, from a practical point of view, the Access Log is the same as mod_log_config, since it is this module that provides the Access Log functionality. In addition, the Access Log uses the mod_logio and mod_setenvif modules to extend its functionality. For example, mod_logio lets you log the exact amount of data sent and/or received while handling a request and its response.

Since they are one and the same, the directives for Access Log and mod_log_config are the same. Further information in this section pertains to the Access Log and mod_log_config.



How to customize the format of Apache access logs. Custom log formats
The format argument for the LogFormat and CustomLog directives is a string. This string is used to write each request to the log file. It can contain literal characters, which are copied into the log files as they are, and the C-style control characters "\n" and "\t" to represent newlines and tabs. Literal quotes and backslashes must be escaped with a backslash (\).

The characteristics of the request itself are logged by placing "%" directives in the format string. In the log file, these directives are replaced with the following values:

%% - A literal percent sign.

%a - Client IP address of the request (see also the mod_remoteip module).

%{c}a - The underlying peer IP address of the connection (see mod_remoteip).

%A - Local IP address.

%B - The size of the response in bytes, excluding HTTP headers.

%b - The size of the response in bytes, excluding HTTP headers, in CLF format: when no bytes are sent, it is '-' rather than 0.

%{VARNAME}C - The contents of the cookie VARNAME in the request sent to the server. Only version 0 cookies are fully supported.

%D - Time taken to process the request, in microseconds. See %T for more details.

%{VARNAME}e - The contents of the environment variable VARNAME.

%f - File name.

%h - The name of the remote host. The IP address is logged if HostnameLookups is set to Off, which is the default. If hostnames are logged for only a few hosts, you probably have access control directives that refer to them by name. See the Require host documentation. This format is affected by modifications to the remote hostname by modules such as mod_remoteip.

%{c}h - Like %h, but always reports the hostname of the underlying TCP connection, not any modifications to the remote hostname by modules such as mod_remoteip.

%H - Request protocol.

%{VARNAME}i - The contents of the VARNAME: header line(s) in the request sent to the server. Changes made by other modules (such as mod_headers) affect this. If you need to know what the request header was before most modules could change it, use mod_setenvif to copy the header into an internal environment variable and log that value with the %{VARNAME}e described above. Examples: %{Referer}i (referrer), %{User-agent}i (user agent, browser).

%k - The number of keepalive requests handled on this connection. Useful when KeepAlive is in use: for example, "1" means the first keepalive request after the initial one, "2" means the second, and so on; otherwise, it is always 0 (indicating the initial request).

%l - Remote logname (from identd, if supplied). This returns a dash unless mod_ident is present and IdentityCheck is set to On.

%L - The request log ID from the error log (or "-" if nothing was logged to the error log for this request). Look for the matching error log line to see which request caused which error.

%{c}L - The connection log ID from the error log (or "-" if nothing was logged to the error log for this connection). Look for the matching error log line to see which connection caused which error.

%m - Request method.

%{VARNAME}n - The contents of the note VARNAME from another module.

%{VARNAME}o - The contents of the VARNAME: header line(s) in the response.

%p - The canonical port of the server serving the request.

%{format}p - The canonical port of the server serving the request, or the actual port of the server, or the actual port of the client. Valid formats are canonical, local, or remote.

%P - The process ID of the child that served the request.

%{format}P - The process ID or thread ID of the child that served the request. Valid formats are pid, tid, and hextid. hextid requires APR 1.2.0 or higher.

%q - The query string (prefixed with ? if a query string exists, otherwise an empty string).

%r - First line of the request.

%R - The handler that generates the response (if any).

%s - Status. For requests that were internally redirected, this is the status of the original request. Use %>s for the final status.

%t - Time the request was received, in the format [18/Sep/2011:19:18:28 -0400]. The last number indicates the time zone offset from GMT.

%{format}t - The time, in the form given by format, which should be in extended strftime(3) format (possibly localized). If the format starts with begin: (the default), the time is taken at the beginning of request processing. If it starts with end:, it is the time the log entry is written, close to the end of request processing. In addition to the formats supported by strftime(3), the following format tokens are supported:
sec - number of seconds since the Epoch
msec - number of milliseconds since the Epoch
usec - number of microseconds since the Epoch
msec_frac - millisecond fraction
usec_frac - microsecond fraction
These tokens cannot be combined with each other or with strftime(3) formatting in the same format string. You can use multiple %{format}t tokens instead.

Example: %{%d/%b/%Y %T}t.%{msec_frac}t %{%z}t

%T - Time taken to serve the request, in seconds. The measured time starts when the HTTP server reads the first line of the HTTP request from the host operating system and ends when the last byte of the response is written by the HTTP server to the host operating system.

Measured time does not include any of the following:
  • Time spent on TCP or TLS handshakes.
  • Time before the web server thread can read the first line of the request.
  • Delays in the operating system handing response data to the network.
  • The time it takes for the response to arrive at the client's host.
  • The time taken by the user agent to read and process the response.
%{UNIT}T - Time taken to serve the request, in the unit given by UNIT. Valid units are ms for milliseconds, us for microseconds, and s for seconds. Using s gives the same result as %T without any format; using us gives the same result as %D. Combining %T with a unit is available in 2.4.13 and later.

%u - Remote user if the request was authenticated. May be bogus if the return status (%s) is 401 (Unauthorized).

%U - The requested URL path, not including any query string.

%v - The canonical ServerName of the server serving the request.

%V - Server name according to the UseCanonicalName setting.

%X - The state of the connection when the response is complete:

X = The connection was aborted before the response was completed.

+ = The connection may be kept alive after the response is sent.

- = The connection will be closed after the response is sent.

%I - Bytes received, including request and headers. Cannot be zero. You must enable mod_logio to use this.

%O - Bytes sent, including headers. It can be zero in rare cases, for example, when the request is aborted before a response is sent. You must enable mod_logio to use this.

%S - Bytes transferred (received and sent), including request and headers. Cannot be zero. This is the combination of %I and %O. You must enable mod_logio to use this.

%{VARNAME}^ti - The contents of the VARNAME: trailer line(s) in the request sent to the server.

%{VARNAME}^to - The contents of the VARNAME: trailer line(s) in the response sent from the server.
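
To show how several of the specifiers above combine, here is a hedged sketch of a format aimed at performance troubleshooting. The alias timing and the file name are made up for the example; %{ms}T needs httpd 2.4.13 or later, as noted in the list.

```apache
# %D      = time to process the request, in microseconds
# %{ms}T  = the same duration, in milliseconds
# %X      = connection state when the response completed (X, + or -)
# %k      = number of keepalive requests already handled on this connection
LogFormat "%h %l %u %t \"%r\" %>s %b %D %{ms}T %X %k" timing
CustomLog "logs/timing_log" timing
```

Writing this to a separate file keeps the main access log in a standard format that analysis tools already understand.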

Modifiers
Individual items can be restricted to print only for responses with specific HTTP status codes by placing a comma-separated list of status codes immediately after the "%". The status code list may be preceded by "!" to indicate negation.

The "<" and ">" modifiers choose whether the original or the final request is consulted for requests that have been internally redirected. By default, the %s, %U, %T, %D, and %r directives look at the original request, while all others look at the final request. So, for example, %>s can be used to record the final status of a request, and %<u can be used to record the original authenticated user on a request that was internally redirected to an unauthenticated resource.
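
The modifiers can be sketched as follows. The alias names are hypothetical and exist only for this illustration; when a status-restricted item does not apply to a response, a literal "-" is logged in its place.

```apache
# Log User-agent only for 400 and 501 responses.
LogFormat "%h %>s %400,501{User-agent}i" ua_on_errors

# Log Referer for every response except 200, 304 and 302.
LogFormat "%h %>s %!200,304,302{Referer}i" referer_on_unusual

# Original (%<s) and final (%>s) status of an internally redirected request.
LogFormat "%h %<s %>s \"%r\"" status_both
```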

Format Notes
For security reasons, since version 2.0.46, non-printable and other special characters in %r, %i, and %o are escaped using \xhh sequences, where hh is the hexadecimal representation of the raw byte. The exceptions to this rule are " and \, which are escaped by prepending a backslash, and all whitespace characters, which are written in their C-style notation (\n, \t, etc.). In versions prior to 2.0.46, no escaping was performed on these strings, so you have to be quite careful when working with raw log files from those versions.

Since httpd 2.0, unlike 1.3, the %b and %B format strings do not represent the number of bytes sent to the client, but simply the size in bytes of the HTTP response (which will differ, for example, if the connection is interrupted or if SSL is used). The %O format provided by mod_logio logs the actual number of bytes sent over the network.

Note: mod_cache is implemented as a fast handler, not a standard handler. Therefore, the % R format string will not return any handler information when content caching is enabled.

Note: The "^" at the beginning of three-character formats has no significance in itself, but it must be the first character of any newly added three-character format to avoid potential conflicts with log formats that use literal strings adjacent to a format specifier, such as "%Dus".

Examples

Some commonly used log format strings:
  • Common Log Format (CLF)
Code:
"%h %l %u %t \"%r\" %>s %b"
  • Common Log Format with Virtual Host
Code:
"%v %h %l %u %t \"%r\" %>s %b"
  • NCSA extended/combined log format
Code:
"%h %l %u %t \"%r\" %>s %b \"%{Referer}i\" \"%{User-agent}i\""
  • Referer log format
Code:
"%{Referer}i -> %U"
  • Agent (browser) log format
Code:
"%{User-agent}i"

You can use the %{format}t directive multiple times to build the time format using extended format tokens such as msec_frac:
  • Timestamp including milliseconds:
Code:
"%{%d/%b/%Y %T}t.%{msec_frac}t %{%z}t"

BufferedLogs directive
Description: Buffer log entries in memory before writing them to disk.

Syntax:
Code:
BufferedLogs On|Off
Default value:
Code:
BufferedLogs Off
Context: server config

The BufferedLogs directive causes mod_log_config to store several log entries in memory and write them to disk together, rather than writing each one after its request. On some systems, this can lead to more efficient disk access and therefore higher performance. It can only be set once for the entire server; it cannot be configured per virtual host.

This directive should be used with caution, because a crash can result in the loss of log data.
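
A minimal sketch of where the directive goes, assuming a typical layout with one access log; the file name is illustrative. BufferedLogs sits at the top level of the server configuration, outside any VirtualHost block.

```apache
# Server-wide setting: entries are held in memory and flushed in batches.
BufferedLogs On

LogFormat "%h %l %u %t \"%r\" %>s %b" common
CustomLog "logs/access_log" common
```

With buffering on, the newest entries may not be visible in the file immediately, which matters when watching a log in real time.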

CustomLog directive
Description: Sets the file name and format of the log file.

Syntax:
Code:
CustomLog file|pipe|provider format|nickname [env=[!]environment-variable| expr=expression]
Context: server config, virtual hosts

The CustomLog directive is used to log requests to the server. It specifies a log format and a logging destination, and can optionally make logging conditional on request characteristics, using environment variables or an expression.

The first argument, which specifies the location where the logs will be written, can take one of the following three value types:

file
A file name, relative to ServerRoot.

pipe
The pipe symbol "|" followed by the path to the program that will receive the log entries on its standard input. See the Piped Logs section below for more information.

Security: if the program is used, it will be run as the user who launched httpd. This will be root if the server was started as root; make sure the program is safe.

Note: When entering a file path on non-Unix platforms, be careful to only use forward slashes, even though the platform may allow backslashes. As a general rule, it is recommended to always use forward slashes in configuration files.

provider

Modules that implement ErrorLog providers can also be used as targets for CustomLog. To use an ErrorLog provider as a target, use the "provider:argument" syntax. For example, you can use mod_journald or mod_syslog as the provider:

Code:
# CustomLog logging to journald
CustomLog "journald" "%h %l %u %t \"%r\" %>s %b"

# CustomLog logging to syslog with the "user" facility
CustomLog "syslog:user" "%h %l %u %t \"%r\" %>s %b"
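
For the pipe form described above, a hedged sketch using rotatelogs, the log rotation helper shipped with Apache httpd. Both paths are assumptions and depend on how the server was installed.

```apache
LogFormat "%h %l %u %t \"%r\" %>s %b" common

# Hypothetical paths. Entries are piped to rotatelogs, which starts a new
# file every 86400 seconds (24 hours).
CustomLog "|/usr/local/apache/bin/rotatelogs /var/log/access_log 86400" common
```

As the security note above says, the piped program runs as the user that started httpd, so it should be a program you trust.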

The second argument specifies what will be written to the log file. It can specify either an alias (the nickname in the syntax above) defined by a previous LogFormat directive, or an explicit format string, as described in the section "How to customize the format of Apache access logs. Custom log formats".

For example, the following two sets of directives have exactly the same effect:

1.
Code:
# CustomLog specifying the format alias
LogFormat "%h %l %u %t \"%r\" %>s %b" common
CustomLog "logs/access_log" common

2.
Code:
# CustomLog with explicit format string
CustomLog "logs/access_log" "%h %l %u %t \"%r\" %>s %b"

The third argument is optional and determines whether or not to log a specific request. The condition can be the presence or absence (in the case of an 'env=!name' clause) of a particular variable in the server environment. Alternatively, the condition can be expressed as an arbitrary boolean expression. If the condition is not met, the request is not logged. References to HTTP headers in the expression do not result in header names being added to the Vary header.

Environment variables can be set on a per-request basis using the mod_setenvif and/or mod_rewrite modules. For example, if you want to log requests for all GIF images on your server to a separate log file, but not to your main log, you can use:
Code:
SetEnvIf Request_URI \.gif$ gif-image
CustomLog "gif-requests.log" common env=gif-image
CustomLog "nongif-requests.log" common env=!gif-image

Or, to reproduce the behavior of the old RefererIgnore directive, you can use the following:
Code:
SetEnvIf Referer example\.com localreferer
CustomLog "referer.log" referer env=!localreferer
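
Besides env=, the third argument can carry an expr= condition. The sketch below assumes the common alias and a hypothetical file name; REQUEST_STATUS is a variable of the ap_expr syntax holding the HTTP status of the request.

```apache
LogFormat "%h %l %u %t \"%r\" %>s %b" common

# Log only requests whose status is 400 or higher.
CustomLog "logs/access-errors.log" common "expr=%{REQUEST_STATUS} >= 400"
```

The whole expr=... argument is quoted because the expression contains spaces.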
GlobalLog directive
Description: Sets the file name and format of the log file.

Syntax:
Code:
GlobalLog file|pipe|provider format|nickname [env=[!]environment-variable| expr=expression]
Context: server config

Compatibility: Available in Apache HTTP Server 2.4.19 and later.

The GlobalLog directive defines a log that is common to the configuration of the main server and all configured virtual hosts.

The GlobalLog directive is identical to the CustomLog directive, with the following differences:
  • GlobalLog is not valid in the context of a virtual host.
  • GlobalLog is still used by virtual hosts that define their own CustomLog, unlike a CustomLog specified globally, which such virtual hosts do not use.
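
The difference can be sketched like this, assuming httpd 2.4.19 or later; the host name and file names are placeholders.

```apache
LogFormat "%h %l %u %t \"%r\" %>s %b" common
LogFormat "%h %l %u %t \"%r\" %>s %b \"%{Referer}i\" \"%{User-agent}i\"" combined

# Receives every request, whichever host served it.
GlobalLog "logs/global_access_log" combined

<VirtualHost *:80>
    ServerName www.example.com
    # The per-host log does not stop GlobalLog from also recording the request.
    CustomLog "logs/example_access_log" common
</VirtualHost>
```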

LogFormat directive
Description: Describes the format for use in the log file.

Syntax:
Code:
LogFormat format|alias [alias]

Default value:
Code:
LogFormat "%h %l %u %t \"%r\" %>s %b"
Context: server config, virtual hosts.

This directive defines the format of the access log file.

The LogFormat directive can take one of two forms. In the first form, where only one argument is specified, this directive sets the log format to be used by the logs specified in subsequent TransferLog directives. One argument can specify an explicit format, as discussed in the section on custom log formats above. In addition, it can use an alias to refer to the log format defined in the previous LogFormat directive, as described below.

The second form of the LogFormat directive associates an explicit format with an alias. This alias can then be used in subsequent LogFormat or CustomLog directives rather than repeating the entire format string. The LogFormat directive defining the alias does nothing else, that is, it only defines the alias, does not actually apply the format and does not set it by default. Hence, it will not affect subsequent TransferLog directives. In addition, LogFormat cannot use one alias to define another alias. Note that the alias must not contain percent signs (%).

Example:
Code:
LogFormat "%v %h %l %u %t \"%r\" %>s %b" vhost_common
TransferLog directive
Description: Specifies the location of the log file.

Syntax:
Code:
TransferLog file | pipe
Context: server config, virtual hosts

This directive has the same arguments and effect as the CustomLog directive, except that it does not allow you to specify the log format explicitly or to log requests conditionally. Instead, the log format is determined by the most recently specified LogFormat directive that does not define an alias. The Common Log Format is used if no other format has been specified.

Example:
Code:
LogFormat "%h %l %u %t \"%r\" %>s %b \"%{Referer}i\" \"%{User-agent}i\""
TransferLog "logs/access_log"

Apache log formats
Common Log Format
A typical configuration for an access log might look like the following.
Code:
LogFormat "%h %l %u %t \"%r\" %>s %b" common
CustomLog "logs/access_log" common
This defines the alias common and associates it with a specific log format string. The format string consists of percent directives, each of which tells the server to log a particular piece of information. Literal characters can also be placed in the format string and are copied directly to the log output. The quote character (") must be escaped by placing a backslash in front of it so that it is not interpreted as the end of the format string. The format string can also contain the special control characters "\n" for newlines and "\t" for tabs.

The CustomLog directive sets up a new log file using a specific alias. The file name for the access log is relative to ServerRoot unless it starts with a forward slash.

The above configuration will write log entries in a format known as the Common Log Format (CLF). This standard format can be generated by many different web servers and read by many log analysis programs. The log file entries generated in CLF will look something like this:
Code:
95.152.63.100 - frank [18/Aug/2019:08:58:34 +0300] "GET /ru/?act=myip HTTP/1.1" 200 25858
Each part of this log entry is described below.

95.152.63.100 (%h)

This is the IP address of the client (remote host) that made the request to the server. If HostnameLookups is set to On, the server will try to determine the hostname and write it instead of the IP address. However, this configuration is not recommended because it can significantly slow down the server. Instead, it is best to use a log post processor such as logresolve to resolve hostnames. The IP address specified here is not necessarily the address of the machine the user is on. If a proxy server exists between the user and the server, this address will be the proxy address, not the original machine.

- (%l)

A hyphen in the output indicates that the requested piece of information is not available. In this case, the missing information is the RFC 1413 identity of the client, as determined by identd on the client's machine. This information is highly unreliable and should almost never be used except on tightly controlled internal networks. Apache httpd will not even try to determine this information unless IdentityCheck is set to On.

frank (%u)

This is the identifier of the user requesting the document, as determined by HTTP authentication. The same value is usually provided to CGI scripts in the REMOTE_USER environment variable. If the status code for the request is 401, then this value should not be trusted because the user is not yet authenticated. If the document is not password protected, this part will be "-" like the previous one.

[18/Aug/2019:08:58:34 +0300] (%t)

The time the request was received. The format is:
Code:
[day/month/year:hour:minute:second zone]
day = 2*digit
month = 3*letter
year = 4*digit
hour = 2*digit
minute = 2*digit
second = 2*digit
zone = (`+' | `-') 4*digit

You can display the time in a different format by specifying %{format}t in the log format string, where format is either as in strftime(3) from the C standard library or one of the supported special tokens. For details, see the section "How to customize the format of Apache access logs. Custom log formats".

"GET /ru/?act=myip HTTP/1.1" (\"%r\")

The request line from the client, given in double quotes. The request line contains a lot of useful information. First, the client used the GET method. Second, the client requested the resource /ru/?act=myip. Third, the client used the HTTP/1.1 protocol. It is also possible to log one or more parts of the request line independently. For example, the format string "%m %U%q %H" logs the method, path, query string, and protocol, resulting in exactly the same output as "%r".

200 (%>s)

This is the status code that the server sends back to the client. This information is very valuable because it shows whether the request resulted in a successful response (codes start with 2), a redirect (codes start with 3), an error caused by the client (codes start with 4), or an error on the server (codes start with 5). A complete list of possible status codes can be found in the HTTP specification (RFC2616 section 10).

25858 (%b)

The last part indicates the size of the object returned to the client, not including the response headers. If no content was returned to the client, this value is "-". To log "0" when there is no content, use %B instead.
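
As noted for %r above, the request line can also be logged piece by piece, which some analysis tools find easier to filter on. This is only a sketch; the alias split_request and the file name are invented for the example.

```apache
# Method, path, query string and protocol as separate specifiers.
# %q already carries its leading "?" when a query string exists,
# so there is no space between %U and %q.
LogFormat "%h %t %m %U%q %H %>s %b" split_request
CustomLog "logs/split_request_log" split_request
```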

Combined Log Format
Another commonly used format string is called the Combined Log Format. It can be used as follows.

Code:
LogFormat "%h %l %u %t \"%r\" %>s %b \"%{Referer}i\" \"%{User-agent}i\"" combined
CustomLog "log/access_log" combined

This format is exactly the same as the Common Log Format, with the addition of two more fields. Each of the additional fields uses the percent directive %{header}i, where header can be any HTTP request header. The access log in this format will look like this:
Code:
2a02:2168:a13:430b::1 - - [18/Aug/2019:09:38:53 +0300] "POST /ru/?act=locatepicture HTTP/1.1" 200 25627 "https://suip.biz/ru/?act=locatepicture" "Mozilla/5.0 (Windows NT 6.1; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/76.0.3809.100 Safari/537.36"
Please note that the IP address can also be IPv6 as in the example above.

Additional fields are:
"https://suip.biz/ru/?act=locatepicture" (\"%{Referer}i\")

This is the "Referer" HTTP request header. It gives the site and page that the client reports having come from (this should be the page that links to the requested address, or the page that includes the requested file, for example an image).

"Mozilla/5.0 (Windows NT 6.1; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/76.0.3809.100 Safari/537.36" (\"%{User-agent}i\")

User-Agent HTTP request header. This is the identifying information that the client browser communicates about itself.
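
The field-by-field description above can be turned into a small parser. The following is only a sketch in Python (not part of Apache): the names COMBINED_RE and parse_combined are made up for illustration, and the regular expression assumes the exact Combined Log Format shown above.

```python
import re

# One named group per field of:
# %h %l %u %t "%r" %>s %b "%{Referer}i" "%{User-agent}i"
COMBINED_RE = re.compile(
    r'^(?P<host>\S+) (?P<ident>\S+) (?P<user>\S+) \[(?P<time>[^\]]+)\] '
    r'"(?P<request>(?:[^"\\]|\\.)*)" (?P<status>\d{3}) (?P<size>\d+|-) '
    r'"(?P<referer>(?:[^"\\]|\\.)*)" "(?P<agent>(?:[^"\\]|\\.)*)"$'
)

def parse_combined(line):
    """Return a dict of fields, or None if the line does not match."""
    m = COMBINED_RE.match(line.strip())
    if m is None:
        return None
    fields = m.groupdict()
    fields["status"] = int(fields["status"])
    # %b logs "-" when no body was sent; treat that as zero bytes.
    fields["size"] = 0 if fields["size"] == "-" else int(fields["size"])
    return fields

sample = ('2a02:2168:a13:430b::1 - - [18/Aug/2019:09:38:53 +0300] '
          '"POST /ru/?act=locatepicture HTTP/1.1" 200 25627 '
          '"https://suip.biz/ru/?act=locatepicture" '
          '"Mozilla/5.0 (Windows NT 6.1; Win64; x64) AppleWebKit/537.36 '
          '(KHTML, like Gecko) Chrome/76.0.3809.100 Safari/537.36"')
entry = parse_combined(sample)
print(entry["host"], entry["status"], entry["size"], entry["referer"])
```

Lines in a different format (for example with an extra leading %v field) will simply return None, so the pattern has to be adjusted to whatever LogFormat is actually configured.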

Multiple Access Logs
Multiple access logs can be created simply by specifying multiple CustomLog directives in the configuration file. For example, the following directives will create three access logs. The first contains the basic CLF information, while the second and third contain the referrer and browser information. The last two CustomLog lines show how to simulate the effects of the ReferLog and AgentLog directives.
Code:
LogFormat "%h %l %u %t \"%r\" %>s %b" common
CustomLog "logs/access_log" common
CustomLog "logs/referer_log" "%{Referer}i -> %U"
CustomLog "logs/agent_log" "%{User-agent}i"

This example also shows that there is no need to define an alias using the LogFormat directive. Instead, the log format can be specified directly in the CustomLog directive.

Conditional Logs
There are times when it is convenient to exclude certain entries from the access logs based on the characteristics of the client request. This is easy to do with environment variables. First, you need to set an environment variable to indicate that the request meets certain conditions. This is usually done with SetEnvIf. The env= clause of the CustomLog directive is then used to include or exclude requests in which the environment variable is set. Some examples:
Code:
# Flag requests from the loop-back interface
SetEnvIf Remote_Addr "127\.0\.0\.1" dontlog
# Flag requests for the robots.txt file
SetEnvIf Request_URI "^/robots\.txt$" dontlog
# Log what remains
CustomLog "logs/access_log" common env=!dontlog

As another example, consider writing requests from English-speaking users to one log file and non-English speakers to a different log file.
Code:
SetEnvIf Accept-Language "en" english
CustomLog "logs/english_log" common env=english
CustomLog "logs/non_english_log" common env=!english

In a caching scenario, you may want to know how efficient the cache is. A very simple way to find out would be:
Code:
SetEnv CACHE_MISS 1
LogFormat "%h %l %u %t \"%r\" %>s %b %{CACHE_MISS}e" common-cache
CustomLog "logs/access_log" common-cache
mod_cache will run before mod_env and, if successful, will deliver the content without it. In that case a cache hit will log -, while a cache miss will log 1.

In addition to the env= syntax, LogFormat supports logging values conditionally, depending on the HTTP response code:
Code:
LogFormat "%400,501{User-agent}i" browserlog
LogFormat "%!200,304,302{Referer}i" refererlog

In the first example, the User-agent will be logged if the HTTP status code is 400 or 501. Otherwise, a literal "-" will be logged instead. Likewise, in the second example, the Referer will be logged if the HTTP status code is not 200, 304, or 302. (Note the "!" in front of the status codes.)

While we have just shown that conditional logging is very powerful and flexible, it is not the only way to manage the content of the logs. Log files are more useful when they contain a complete record of server activity. In most cases, it is easier to simply process the complete log files to extract only the data you need from them, or to remove certain information.
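
As an illustration of such post-processing, here is a minimal Python sketch that applies the same two exclusions as the dontlog example (loopback requests and requests for /robots.txt) to an already complete log. The function name keep and the sample lines are invented for the example; the client addresses are documentation addresses, not real data.

```python
import re

# Matches the start of the quoted request line: "METHOD /path ...
REQUEST_RE = re.compile(r'"(?:[A-Z]+) (?P<path>\S+)')

def keep(line):
    """Return False for loopback requests and for requests for /robots.txt."""
    host = line.split(" ", 1)[0]
    if host == "127.0.0.1":
        return False
    m = REQUEST_RE.search(line)
    # Compare the path without its query string.
    if m and m.group("path").split("?", 1)[0] == "/robots.txt":
        return False
    return True

lines = [
    '127.0.0.1 - - [18/Aug/2019:09:38:53 +0300] "GET / HTTP/1.1" 200 512',
    '203.0.113.5 - - [18/Aug/2019:09:38:54 +0300] "GET /robots.txt HTTP/1.1" 200 68',
    '203.0.113.5 - - [18/Aug/2019:09:38:55 +0300] "GET /ru/?act=myip HTTP/1.1" 200 25858',
]
filtered = [line for line in lines if keep(line)]
print(len(filtered), "of", len(lines), "lines kept")
```

The advantage over conditional logging is that the original file stays complete, so a different filter can be applied later without losing anything.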

Rotation of logs
Even on a moderately busy server, the amount of information stored in the log files is very large. The access log file typically grows 1 MB or more for 10,000 requests. Therefore, it is necessary to periodically rotate the log files by moving or deleting existing logs. This cannot be done while the server is running because Apache httpd will continue to write to the old log file as long as it keeps the file open. Instead, the server must be restarted after moving or deleting log files to open the new log files.
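
To get a feeling for these numbers, the figure of about 1 MB per 10,000 requests can be scaled to a given traffic level. The sketch below is plain arithmetic in Python; the traffic of 500,000 requests per day is a hypothetical value chosen for illustration, and the function name is made up.

```python
# Lower bound quoted above: roughly 1 MB of access log per 10,000 requests.
MB_PER_10K_REQUESTS = 1.0

def access_log_growth_mb(requests_per_day, days=1):
    """Estimate access log growth in MB for the given number of days."""
    return requests_per_day / 10_000 * MB_PER_10K_REQUESTS * days

daily = access_log_growth_mb(500_000)        # one day of hypothetical traffic
monthly = access_log_growth_mb(500_000, 30)  # thirty days without rotation
print(daily, monthly)
```

With the Combined Log Format the real figure is usually higher, since the Referer and User-Agent headers add to every line.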

By using a graceful restart, the server can be instructed to open new log files without losing existing or pending connections from clients. However, to do this, the server must continue to write to the old log files while it finishes serving old requests. Therefore, you must wait a while after the restart before doing any processing on the log files. A typical scenario that simply rotates the logs and compresses the old logs to save space:
Code:
mv access_log access_log.old
mv error_log error_log.old
apachectl graceful
sleep 600
gzip access_log.old error_log.old

Another way to perform log rotation is by using piped logs, as described in the next section.

Piped Logs
Apache httpd is capable of writing access and error logs through a pipe to another process, rather than directly to a file. This capability greatly improves the flexibility of logging without adding code to the core server. To write logs to a pipe, simply replace the filename with the pipe character "|" followed by the name of the executable that should receive log entries on its standard input. The server will start the piped-log process when the server starts and will restart it if it crashes while the server is running (this latter feature is why the technique is called "reliable piped logging").

Piped log processes are spawned by the parent Apache httpd process and inherit the user ID of that process. This means that piped log programs usually run as root. Therefore, it is very important that the programs are simple and secure.

One important use of piped logs is to allow log rotation without restarting the server. The Apache HTTP Server includes a simple program called rotatelogs for this purpose. For example, to rotate logs every 24 hours, you can use:

CustomLog "|/usr/local/apache/bin/rotatelogs /var/log/access_log 86400" common
Note that quotes are used to enclose the entire command that will be invoked for the pipe. While these examples are for the access log, the same method can be used for the error log.

As with conditional logging, piped logs are very powerful, but should not be used where a simpler solution such as offline post-processing is available.

By default, the piped log process is spawned without invoking a shell. Use "|$" instead of "|" to spawn it using a shell (usually with /bin/sh -c):

Code:
# Invoke "rotatelogs" using a shell
CustomLog "|$/usr/local/apache/bin/rotatelogs /var/log/access_log 86400" common
This was the default behavior in Apache 2.2. Depending on the specifics of the shell, this can lead to an extra shell process for the lifetime of the log pipe program and to problems with signal handling on restart. For compatibility with Apache 2.2, the notation "||" is also supported and is equivalent to using "|".

Note for Windows: Please note that on Windows you may run into problems when starting many piped logging processes, especially when httpd is running as a service. This is caused by running out of desktop heap space. The desktop heap space given to each service is specified by the third argument of the SharedSection parameter in the HKEY_LOCAL_MACHINE\System\CurrentControlSet\Control\SessionManager\SubSystems\Windows registry value. Change this value with care: the usual caveats for modifying the Windows registry apply, and you can also exhaust the desktop heap pool if the number is set too high.

Virtual Hosts
When starting a server with many virtual hosts, there are several options for handling the log files. First, it is possible to use logs in the same way as on a single-host server. By simply placing the logging directives outside of the <VirtualHost> sections in the main server context, you can log all requests in a single access log and error log. This method does not make it easy to collect statistics on individual virtual hosts.

If a CustomLog or ErrorLog directive is placed in the <VirtualHost> section, all requests or errors for that virtual host will only be written to the specified file. Any virtual host that does not have logging directives will still send its requests to the core server logs. This method is very useful for a small number of virtual hosts, but if the number of hosts is very large, it can be difficult to manage. In addition, it can often create problems with insufficient file descriptors.

There is a very good compromise for the access log. By adding virtual host information to the log format string, you can log all hosts to a single log and then split the log into separate files. For example, consider the following directives.

Code:
LogFormat "%v %l %u %t \"%r\" %>s %b" comonvhost
CustomLog "logs/access_log" comonvhost
%v is used to log the name of the virtual host that is serving the request. A program such as split-logfile can then be used to post-process the access log and split it into one file per virtual host.
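
The idea behind this kind of splitting can be sketched in a few lines of Python. This is not a reimplementation of split-logfile, only an illustration of the principle under the assumption that the first field of every line is the virtual host name written by %v; the function name and the example.org host names are hypothetical.

```python
from collections import defaultdict

def split_by_vhost(lines):
    """Group complete log lines by their first field (the %v value)."""
    per_host = defaultdict(list)
    for line in lines:
        vhost, _, rest = line.partition(" ")
        if not rest:
            continue  # skip empty or malformed lines
        # Host names are case-insensitive, so normalise the key.
        per_host[vhost.lower()].append(line)
    return dict(per_host)

lines = [
    'www.example.org - - [18/Aug/2019:09:38:53 +0300] "GET / HTTP/1.1" 200 512',
    'shop.example.org - - [18/Aug/2019:09:38:54 +0300] "GET /cart HTTP/1.1" 200 1024',
    'WWW.example.org - - [18/Aug/2019:09:38:55 +0300] "GET /about HTTP/1.1" 404 196',
]
groups = split_by_vhost(lines)
for vhost, entries in sorted(groups.items()):
    print(vhost, len(entries))
```

Each group could then be written to its own file, for example one named after the virtual host.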

Security Issues
Anyone who can write to the directory where Apache httpd writes a log file can almost certainly gain access to the uid that the server is started as, which is normally root. DO NOT give people write access to the directory where the logs are stored without being aware of the consequences.

In addition, log files may contain information supplied directly by the client, without escaping. Malicious clients can therefore insert control characters into the log files, so care must be taken when handling raw logs.
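
One cautious way to look at raw log data is to make control characters visible before printing them to a terminal. The Python sketch below shows the idea; the function name sanitize and the sample string are made up for illustration.

```python
def sanitize(text):
    """Replace non-printable characters with visible backslash escapes."""
    return "".join(
        ch if ch.isprintable() else ch.encode("unicode_escape").decode("ascii")
        for ch in text
    )

# A request line carrying terminal escape sequences (ESC is \x1b).
raw = 'GET /\x1b[31mfake\x1b[0m HTTP/1.1'
safe = sanitize(raw)
print(safe)
```

Printed this way, an escape sequence shows up as literal text instead of being interpreted by the terminal.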

Permissions on ServerRoot Directories
In normal operation, Apache is started by the root user and switches to the user specified in the User directive to serve requests. As with any command that root executes, you must make sure that it is protected from modification by non-root users. Not only the files themselves must be writable only by root, but also the directories and the parents of all directories. For example, if you decide to place ServerRoot in /usr/local/apache, it is recommended to create this directory as root using the following commands:
Code:
mkdir /usr/local/apache
cd /usr/local/apache
mkdir bin conf logs
chown 0 . bin conf logs
chgrp 0 . bin conf logs
chmod 755 . bin conf logs

It is assumed that /, /usr, and /usr/local can only be modified by root. When installing the httpd executable, you must make sure that it is protected in the same way:
Code:
cp httpd /usr/local/apache/bin
chown 0 /usr/local/apache/bin/httpd
chgrp 0 /usr/local/apache/bin/httpd
chmod 511 /usr/local/apache/bin/httpd

You can create an htdocs subdirectory that can be modified by other users - since root never executes any files from there and should not create files there.

If you allow non-root users to modify any files that root either executes or writes to, then you open your system to root compromise. For example, someone could replace the httpd binary so that the next time it is started, it executes arbitrary code. If the log directory is writable by a non-root user, someone could replace a log file with a symbolic link to some other system file, and root might then overwrite that file with arbitrary data. If the log files themselves are writable by a non-root user, someone could overwrite the log itself with fake data.
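
The rule "writable only by root, including all parent directories" can be checked automatically. The Python sketch below is one possible way to do it, assuming a Unix system; the function names are invented, and the check is deliberately strict (it also flags directories writable by group 0).

```python
import os
import stat

def is_unsafe(uid, mode):
    """True if the owner is not root (uid 0) or group/others may write."""
    return uid != 0 or bool(mode & (stat.S_IWGRP | stat.S_IWOTH))

def unsafe_components(path):
    """List `path` and each parent directory that fails the check above."""
    problems = []
    current = os.path.abspath(path)
    while True:
        st = os.stat(current)
        if is_unsafe(st.st_uid, st.st_mode):
            problems.append(current)
        parent = os.path.dirname(current)
        if parent == current:  # reached the filesystem root
            break
        current = parent
    return problems

print(is_unsafe(0, 0o755))     # root-owned, mode 755 as in the commands above
print(is_unsafe(0, 0o775))     # group-writable
print(is_unsafe(1000, 0o755))  # owned by a non-root user
```

Calling unsafe_components on the log directory (for example /usr/local/apache/logs, if that is where ServerRoot was placed) returns an empty list when every directory on the way up to / passes the check.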

Format of error logs. Module event log.

Table of contents
1. Types and modules of logs. Apache access log format
2. Format of error logs. Module event log
2.1 Apache error logs
2.2 ErrorLog directive
2.3 ErrorLogFormat directive
2.4 LogLevel directive
2.5 LogLevelOverride directive
2.6 Module event log
3. Programs for analyzing Apache logs
4. Forensic logs
5. Additional configurable debug logs. CGI script execution logs

Apache error logs
The server error log, whose name and location is specified by the ErrorLog directive, is the most important log file. This is where Apache httpd will send diagnostic information and record any errors it encounters while processing requests. This is the first place to look when a server startup or server problem occurs, as it often contains details about what went wrong and how to fix it.

The error log is usually written to a file (typically error_log on Unix systems and error.log on Windows and OS/2). On Unix systems it is also possible for the server to send errors to syslog or to pipe them to a program.

The error log format is determined by the ErrorLogFormat directive, with which you can customize what values are written to the log. If you do not specify it, then the default value is used. A typical log message is as follows:

[Sun Aug 18 12:43:09.867536 2019] [authz_core:error] [pid 30395] [client 144.76.28.10:42847] AH01630: client denied by server configuration: /srv/http/suip/ru/, referer: https://suip.biz/?act=proxy2
The first item in the log entry is the date and time of the message. Next is the module that produced the message (in this case authz_core) and the severity level of that message. This is followed by the process ID and, if appropriate, the ID of the thread in which the condition occurred. Next comes the address of the client that made the request (its IP address and the port number from which the connection was opened). Finally, there is the detailed error message, which in this case indicates that the server refused the client access.
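
The layout just described can be captured with a regular expression. The Python sketch below assumes the default error log format shown in the sample line; the names ERROR_RE and parse_error_line are made up, and a custom ErrorLogFormat would need a different pattern.

```python
import re

# [time] [module:level] [pid N(:tid N)] ([client addr:port]) message
ERROR_RE = re.compile(
    r'^\[(?P<time>[^\]]+)\] '
    r'\[(?P<module>[^:\]]*):(?P<level>[^\]]+)\] '
    r'\[pid (?P<pid>\d+)(?::tid (?P<tid>\d+))?\] '
    r'(?:\[client (?P<client>[^\]]+)\] )?'
    r'(?P<message>.*)$'
)

def parse_error_line(line):
    """Return a dict of fields, or None if the line does not match."""
    m = ERROR_RE.match(line.strip())
    if m is None:
        return None
    fields = m.groupdict()
    if fields["client"]:
        # The port follows the last colon, which also works for IPv6.
        addr, _, port = fields["client"].rpartition(":")
        fields["client_addr"], fields["client_port"] = addr, int(port)
    return fields

sample = ('[Sun Aug 18 12:43:09.867536 2019] [authz_core:error] [pid 30395] '
          '[client 144.76.28.10:42847] AH01630: client denied by server '
          'configuration: /srv/http/suip/ru/, referer: https://suip.biz/?act=proxy2')
err = parse_error_line(sample)
print(err["module"], err["level"], err["pid"], err["client_addr"])
```

The thread ID group stays None for lines like this one, where only the process ID is logged.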

A very large number of different messages can appear in the error log. Most look similar to the example above. The error log will also contain debug information from CGI scripts. Any information written to stderr by the CGI script will be copied directly to the error log.

If you put the %L token in both the error log and the access log, a log entry ID will be generated with which you can match an error log entry to the corresponding access log entry. If mod_unique_id is loaded, its unique request ID will also be used as the log entry ID.

During testing, it is often helpful to constantly monitor the error log for problems. On Unix systems, you can do this using a command like:
Code:
tail -f /path/to/log/errors

For example:
Code:
tail -f /var/log/httpd/error_log
ErrorLog directive
Description: Sets the location where the server will log errors.

Syntax:
Code:
ErrorLog path-to-file|syslog[:[facility][:tag]]

Default value:
Code:
# Unix
ErrorLog logs/error_log
# Windows and OS/2
ErrorLog logs/error.log
Context: server config, virtual hosts.

The ErrorLog directive sets the name of the file into which the server will log any errors it encounters. If the file path is not absolute, it is assumed to be relative to ServerRoot.

Code:
ErrorLog "/var/log/httpd/error_log"
If path-to-file begins with a pipe character "|", it is assumed to be a command to spawn to handle the error log.

ErrorLog "|/usr/local/bin/httpd_errors"
See the Piped Logs section for details.

Using syslog instead of a filename enables logging via syslogd(8) if the system supports it and if mod_syslog is loaded. The default syslog facility is local7, but you can override this using the syslog:facility syntax, where facility can be one of the names usually documented in syslog(1). The facility is effectively global: if it is changed in individual virtual hosts, the last facility specified affects the entire server. The same rules apply to the syslog tag, which by default uses the Apache binary name, httpd in most cases. You can override this with the syslog::tag syntax.

Code:
ErrorLog syslog:user
ErrorLog syslog:user:httpd.srv1
ErrorLog syslog::httpd.srv2

Additional modules can provide their own ErrorLog providers. The syntax is similar to the syslog example above.

SECURITY: See the Security Issues section to find out why your security could be compromised if the directory where the log files are stored is writable by anyone other than the user that starts the server.

Note: When entering a file path on non-Unix platforms, be careful to only use forward slashes, even though the platform may allow backslashes. In general, it is recommended to always use forward slashes in configuration files.

ErrorLogFormat directive
Description: Defines the format for error log entries.

Syntax:
Code:
ErrorLogFormat [connection|request] format
Context: server config, virtual hosts.

ErrorLogFormat allows you to specify what additional information is written to the error log in addition to the actual log message.

Code:
# Simple example
ErrorLogFormat "[%t] [%l] [pid %P] %F: %E: [client %a] %M"

Specifying a connection or request as the first parameter allows additional formats to be specified, resulting in additional information being logged when the first message is logged for a specific connection or request, respectively. This additional information is logged only once per connection / request. If the connection or request is processed without any log message, no additional information is logged either.

It may happen that some format string items do not produce output. For example, the Referer header is only present if the log message is associated with a request and the message is logged at a time when the Referer header has already been read from the client. If no output is produced, the default behavior is to delete everything from the preceding space character to the next space character. This means that the log line is implicitly divided into fields at the transitions from non-whitespace to whitespace. If a format string item produces no output, the whole field is omitted. For example, if the remote address %a in the log format [%t] [%l] [%a] %M is not available, the surrounding brackets are not logged either. Space characters can be escaped with a backslash to prevent them from delimiting a field. The combination "% " (percent and space) is a zero-width field delimiter that produces no output.

The above behavior can be changed by adding modifiers to the format string item. A - (minus) modifier causes a minus to be logged if the respective item produces no output. In once-per-connection/request formats, it is also possible to use the + (plus) modifier. If an item with the plus modifier produces no output, the whole line is omitted.

A number as modifier can be used to assign a log severity level to a format item. The item will only be logged if the severity of the log message is not higher than the specified log severity level. The number can range from 1 (alert) through 4 (warn) and 7 (debug) to 15 (trace8).

For example, this is what happens if you add modifiers to the %{Referer}i token, which logs the Referer request header.

%-{Referer}i - Logs a - if Referer is not set.

%+{Referer}i - Omits the entire line if Referer is not set.

%4{Referer}i - Logs the Referer only if the level number of the log message is 4 (warn) or higher, that is, for messages of severity warn or less severe.

Some format string elements take additional parameters in curly braces.

%% - A literal percent sign.

%a - Client IP address and port of the request (see also the mod_remoteip module).

%{c}a - Underlying peer IP address and port of the connection (see mod_remoteip).

%A - Local IP address and port.

%{NAME}e - Contents of the request environment variable NAME.

%E - APR/OS error status code and string.

%F - Source file name and line number of the log call.

%{NAME}i - Contents of the request header NAME.

%k - Number of keep-alive requests on this connection.

%l - Log level of the message.

%L - Log ID of the request.

%{c}L - Log ID of the connection.

%{C}L - Log ID of the connection if used in connection scope, otherwise empty.

%m - Name of the module logging the message.

%M - The actual log message.

%{NAME}n - Contents of the request note NAME.

%P - Process ID of the current process.

%T - Thread ID of the current thread.

%{g}T - System-unique thread ID of the current thread (the same ID as displayed by, for example, top; currently Linux only).

%t - The current time.

%{u}t - The current time, including microseconds.

%{cu}t - The current time in compact ISO 8601 format, including microseconds.

%v - The canonical ServerName of the current server.

%V - The server name of the server serving the request, according to the UseCanonicalName setting.

\  (backslash and space) - A space that does not delimit a field (a space within a field).

%  (percent and space) - Field delimiter (no output).

The %L log ID format creates a unique identifier for a connection or request. It can be used to determine which log lines belong to the same connection or request, and which request happens on which connection. The %L format string is also available in mod_log_config, which makes it possible to correlate access log entries with error log lines. If mod_unique_id is loaded, its unique ID will be used as the log ID for requests.

Code:
# Example (default format for threaded MPMs)
ErrorLogFormat "[%{u}t] [%-m:%l] [pid %P:tid %T] %7F: %E: [client\ %a] %M% ,\ referer\ %{Referer}i"

This can lead to error messages such as:
Code:
[Thu May 12 08:28:57.652118 2011] [core:error] [pid 8777:tid 4326490112] [client ::1:58619] File does not exist: /usr/local/apache2/htdocs/favicon.ico

Note that, as discussed above, some of the fields are completely omitted as they are undefined.

Code:
# Example (similar to the 2.2.x format)
ErrorLogFormat "[%t] [%l] %7F: %E: [client\ %a] %M% ,\ referer\ %{Referer}i"

Code:
# Advanced example with request/connection log IDs
ErrorLogFormat "[%{uc}t] [%-m:%-l] [R:%L] [C:%{C}L] %7F: %E: %M"
ErrorLogFormat request "[%{uc}t] [R:%L] Request %k on C:%{c}L pid:%P tid:%T"
ErrorLogFormat request "[%{uc}t] [R:%L] UA:'%+{User-Agent}i'"
ErrorLogFormat request "[%{uc}t] [R:%L] Referer:'%+{Referer}i'"
ErrorLogFormat connection "[%{uc}t] [C:%{c}L] local\ %a remote\ %A"
LogLevel directive
Description: Controls the verbosity of the ErrorLog.

Syntax:
Code:
LogLevel [module:]level [module:level] ...

Default value:
Code:
LogLevel warn
Context: server config, virtual hosts, directories

Compatibility: Module-level and directory-level customization is available in Apache HTTP Server 2.3.6 and later.

LogLevel adjusts the verbosity of messages written to the error logs (see the ErrorLog directive). The following levels are available, in decreasing order of importance: emerg, alert, crit, error, warn, notice, info, debug, trace1, trace2, trace3, trace4, trace5, trace6, trace7, trace8.

When a particular level is specified, messages from all levels of higher importance will be reported as well. For example, if you specify
Code:
LogLevel info
then messages with levels notice and warn will also be posted.
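
The threshold rule can be expressed compactly in code. The following Python sketch only models the ordering of the levels; it ignores special cases (such as the handling of notice messages when logging to a regular file), and the names LEVELS and is_logged are invented for the illustration.

```python
# Levels in decreasing order of importance.
LEVELS = ["emerg", "alert", "crit", "error", "warn", "notice", "info", "debug",
          "trace1", "trace2", "trace3", "trace4",
          "trace5", "trace6", "trace7", "trace8"]

def is_logged(message_level, configured_level):
    """True if a message of message_level passes the configured LogLevel."""
    return LEVELS.index(message_level) <= LEVELS.index(configured_level)

print(is_logged("warn", "info"))    # warn is more important than info
print(is_logged("notice", "info"))  # notice is more important than info
print(is_logged("info", "warn"))    # info is below the warn threshold
```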

Please note that 404 (file not found) error messages generated by the web server (core) itself have info status:
Code:
[Mon Aug 19 05:21:07.846623 2019] [core:info] [pid 29057] [client 2604:a880:2:d0::651:5001:50

This means that with the default settings (LogLevel set to warn), requests that end with a 404 status will not appear in the error log! To fix this, set the level to info:

LogLevel info
Note that if the missing file is handled by another module, that module may use its own level. For example, when a PHP script is not found, the php7 module logs the message at the error level, so the entry reaches the error log even with the default settings:

[Mon Aug 19 05:26:02.847140 2019] [php7:error] [pid 29256] [client 115.28.240.215:1920] script '/srv/http/suip/wp-login.php' not found or unable to stat
See the article "Why 404 errors are not saved in the Apache error logs" for details.

Using a level of at least crit is recommended.

For example:

LogLevel notice
Note: When logging to a regular file, messages of the level notice cannot be suppressed and are therefore always logged. However, this does not apply when logging is done using syslog.

Specifying a level without a module name will reset the level for all modules to that level. Specifying a level with a module name will set the level for that module only. As the module specification you can use the module source file name, the module identifier, or the module identifier with the trailing _module omitted. This means that the following three specifications are equivalent:
Code:
LogLevel info ssl:warn
LogLevel info mod_ssl.c:warn
LogLevel info ssl_module:warn

It is also possible to change the level for each directory:
Code:
LogLevel info
<Directory "/usr/local/apache/htdocs/app">
 LogLevel debug
</Directory>

Per-directory log level configuration only affects messages that are logged after the request has been parsed and that are associated with the request. Log messages associated with the server or the connection are not affected. The latter can, however, be influenced by the LogLevelOverride directive.

LogLevelOverride directive
Description: Override ErrorLog verbosity for specific clients.

Syntax:
Code:
LogLevelOverride IP-address[/prefixlen] [module:]level [module:level] ...
Default value: not set.

Context: server config, virtual hosts.

Compatibility: Available in Apache HTTP Server 2.5.0 and later.

LogLevelOverride adjusts the LogLevel for requests coming from certain client IP addresses. This makes it possible to enable verbose logging only for specific test clients. The IP address is checked at a very early stage of connection processing. Therefore, LogLevelOverride allows you to change the log level for things like the SSL handshake, which happen before a LogLevel directive in an <If> container would be evaluated.

LogLevelOverride accepts either a single IP address or a CIDR specification of the form IP-address/prefixlen. For the syntax of the log level specification, see the LogLevel directive.

For requests matching the LogLevelOverride directive, the LogLevel specifications for each directory are ignored.

Examples:
Code:
LogLevelOverride 192.0.2.0/24 ssl:trace6
LogLevelOverride 192.0.2.7 ssl:trace8

LogLevelOverride only affects log messages related to a request or connection. Server-related log messages are not affected.
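
To check in advance which clients a given address or CIDR specification would cover, the standard Python ipaddress module can be used. This is only a sketch of the address matching, not of how Apache itself evaluates the directive; the function name matches is invented, and the addresses come from the documentation range used in the examples above.

```python
import ipaddress

def matches(client_ip, spec):
    """True if client_ip falls inside spec (a single address or a CIDR range)."""
    network = ipaddress.ip_network(spec, strict=False)
    return ipaddress.ip_address(client_ip) in network

print(matches("192.0.2.7", "192.0.2.0/24"))     # inside the /24 range
print(matches("192.0.2.7", "192.0.2.7"))        # single-address specification
print(matches("198.51.100.1", "192.0.2.0/24"))  # outside the range
```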

Module event log
The LogLevel directive allows you to specify a logging severity threshold for each module. Thus, if you are troubleshooting a problem with only one specific module, you can increase the amount of its information in the log without receiving information about other modules that you are not interested in. This is especially useful for modules like mod_proxy or mod_rewrite where you want to know the details of what they are trying to do and what is going on in them.

Do this by specifying the module name in your LogLevel directive:

LogLevel info rewrite:trace5
This sets the main LogLevel to info, but raises it to trace5 for mod_rewrite.

This replaces the per-module logging directives, such as RewriteLog, that were present in earlier versions of the server.

Please note that the information generated by the modules always ends up in the error log, even if it is not, in fact, an error! Also note that some modules will not display any information unless you set the trace level in the range trace1 to trace8.

Programs for analyzing Apache logs.

Content
  1. Programs for analyzing Apache logs
    1.1 Combining Apache logs into one file
    1.2 GoAccess
    1.3 LORG
    1.3.1 How to edit log formats in LORG
    1.4 ARTLAS
    1.5 Analyzing logs using command line tools (Bash)
Consolidating Apache Logs into One File
The current Apache access log is usually stored in a plain text file called access_log, and the error log in a file called error_log. Logs from previous days are usually kept as well, but compressed into archives. They are named access_log.1.gz, access_log.2.gz, and so on.


If you need to analyze the log not only for the last day, but also for the previous ones, then all Apache logs can be combined into one file. This can be done on the command line using command grouping:
Code:
(zcat access_log.*gz && cat access_log) > biglog.txt
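
The same result can be obtained without a shell, which may be convenient on systems where zcat is not available. The Python sketch below assumes the file naming described above (access_log plus access_log.N.gz archives); the function name combine_logs is made up, and the demonstration runs on throwaway files in a temporary directory rather than on real logs.

```python
import glob
import gzip
import os
import shutil
import tempfile

def combine_logs(directory, output_path, name="access_log"):
    """Write all decompressed name.*gz archives, then the current log."""
    archives = sorted(glob.glob(os.path.join(directory, name + ".*gz")))
    with open(output_path, "wb") as out:
        for path in archives:
            with gzip.open(path, "rb") as src:
                shutil.copyfileobj(src, out)
        with open(os.path.join(directory, name), "rb") as src:
            shutil.copyfileobj(src, out)

# Demonstration with two small files created just for this example.
with tempfile.TemporaryDirectory() as tmp:
    with gzip.open(os.path.join(tmp, "access_log.1.gz"), "wb") as f:
        f.write(b"older entry\n")
    with open(os.path.join(tmp, "access_log"), "wb") as f:
        f.write(b"newer entry\n")
    target = os.path.join(tmp, "biglog.txt")
    combine_logs(tmp, target)
    with open(target, "rb") as f:
        combined = f.read()
print(combined.decode())
```

As with the shell command, the archives are processed in name order, so the entries of the combined file are not necessarily in chronological order.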

GoAccess
GoAccess is a powerful Apache log analyzer that creates interactive reports which can be viewed in any browser. It works on both Linux and Windows. It is suitable for general analysis of web server logs, for monitoring activity in real time, and for analyzing specific aspects of activity or specific problems.

Installation methods and even more examples can be found in the detailed description of this program on the page "GoAccess: a program for analyzing web server logs (full documentation, examples)".

The most typical run of the goaccess program to parse log files and generate a report that can be opened in a web browser:
Code:
cat LOG_FILE | goaccess - --log-format=FORMAT --output=FILE.html

The following log formats and values are supported for the --log-format option:
  • COMBINED - combined log format,
  • VCOMBINED - combined log format with a virtual host,
  • COMMON - common log format,
  • VCOMMON - common log format with a virtual host,
  • W3C - W3C extended log format,
  • SQUID - native Squid log format,
  • CLOUDFRONT - Amazon CloudFront Web Distribution,
  • CLOUDSTORAGE - Google Cloud Storage,
  • AWSELB - Amazon Elastic Load Balancing,
  • AWSS3 - Amazon Simple Storage Service (S3)
If you have a custom format that does not match any of the above, you can configure its processing in the configuration file; see the section "How to configure goaccess.conf".

To collect statistics on the countries from which the site was accessed (geolocation), you need to specify the path to a GeoIP database with the --geoip-database option, for example GeoLiteCity.dat or GeoLite2-City.mmdb.

If GeoIP2 is used, you need to download the GeoLite2-City.mmdb or GeoLite2-Country.mmdb database. These databases can be downloaded from the MaxMind.com website. The download is free, but it requires registering on the site to obtain a license key; registration is also free.

Suppose my large combined Apache log is in the file biglog.txt and is in COMBINED format, I want to save the generated report to the file logs_report.html, and I want geolocation based on the GeoLite2-City.mmdb database. The command is then as follows:
Code:
cat biglog.txt | goaccess - --log-format=COMBINED --output=logs_report.html --geoip-database=GeoLite2-City.mmdb

You can open the generated report in any browser:
Code:
firefox logs_report.html

Example of a report:

If you are interested in a detailed description of each item, then see the article "Why and How to Analyze Web Server Logs".

Various output formats can be specified: -o --output=<path/file.[json|csv|html]>:
  • /path/file.csv - Comma Separated Values (CSV)
  • /path/file.json - JSON (JavaScript Object Notation)
  • /path/file.html - HTML
That is, the format is determined by the file extension: you can use any file name, and the extension must be one of the three listed.

When analyzing referrers (referring sites), you can exclude the analyzed site itself, as well as various invalid values. This is done with the --hide-referer option, which can be used multiple times:
Code:
cat biglog.txt | goaccess - --log-format=COMBINED --output=logs_report.html --hide-referer="hackware.ru" --hide-referer="-" --geoip-database=GeoLite2-City.mmdb

If search engines are not of interest among the referring sites, they can also be added to the exclusions:
Code:
cat biglog.txt | goaccess - --log-format=COMBINED --output=logs_report.html --hide-referer="hackware.ru" --hide-referer="-" --hide-referer="*google*" --hide-referer="*yandex*" --geoip-database=GeoLite2-City.mmdb

Another option that improves the readability of the results is -d (long form --with-output-resolver). It enables resolving IP addresses to host names and only works for the HTML and JSON formats.

Note that with the -d option a large number of DNS lookups are performed, so generating the report may be slower.

By the way, you can use the online service GoAccess to analyze web server logs: https://suip.biz/?act=goaccess

This service accepts log files as unpacked text files or in .gz archives.

It says that this is an Apache log analyzer, but in fact, any log format that GoAccess supports is accepted.

LORG
LORG is an Apache log file security analyzer, a tool for advanced security analysis of HTTPD logs. It aims to implement various modern approaches to detecting web application attacks in HTTP traffic logs (such as Apache access_log files), including signature-based, statistical, and machine learning techniques. Detected incidents are then grouped into sessions, which are classified as "manual" or "automated" to determine whether the attacker is a human or a machine. In addition, geolocation and DNSBL lookups can be performed to see whether attacks originate from a specific location or botnet. Attacks can also be quantified in terms of success or failure based on anomalies in the size of HTTP responses.

A detailed description of LORG, a complete list of options and installation instructions can be found on this page: https://kali.tools/?p=4852

Run command:
Code:
./lorg OPTIONS input_file [output_file]

I will use the following options in the command:
  • -i input format. Options: common combined vhost logio cookie
  • -o output format. Options: html json xml csv
  • -u perform URL decode for encoded requests (only affects reports)
  • -g enable geotag
The Apache log file is located at ~/access_log. I want to save the report in the current folder as report.htm, so the command is as follows:
Code:
./lorg -u -i combined -g -o html ~/access_log report.htm

The report can be opened in a web browser:
Code:
firefox report.htm
How to edit log formats in LORG
In fact, the format of my web server log does not match any of the formats LORG offers (common, combined, vhost, logio, cookie). It is very similar to combined, except that the hostname (site domain) is appended at the end of each line. You can edit the supported log formats or add your own. To do this, open the program file in a text editor:
Code:
gedit ./lorg

We find the lines there:
Code:
static $allowed_input_types = array(
 'common' => '%h %l %u %t \"%r\" %>s %b',
 'combined' => '%h %l %u %t \"%r\" %>s %b \"%{Referer}i\" \"%{User-Agent}i\"',
 'vhost' => '%v %h %l %u %t \"%r\" %>s %b \"%{Referer}i\" \"%{User-Agent}i\"',
 'logio' => '%h %l %u %t \"%r\" %>s %b \"%{Referer}i\" \"%{User-Agent}i\" %I %O',
 'cookie' => '%h %l %u %t \"%r\" %>s %b \"%{Referer}i\" \"%{User-Agent}i\" \"%{Cookie}i\"'
);

To these lines I will add a new format for the logs of my hosting provider, Hostland:
Code:
'combined_hostland' => '%h %l %u %t \"%r\" %>s %b \"%{Referer}i\" \"%{User-Agent}i\" %v',

It turned out like this:
Gem3zGUU8ds.jpg


We save and close the file.

I run the command again to analyze the logs, but in this case I specify combined_hostland as the format type:
Code:
./lorg -u -i combined_hostland -g -o html ~/biglog.txt report2.htm

Although there are almost 6 million entries in the biglog.txt file, the analysis was pretty quick.

8R1vaRBNBMg.jpg


When the program terminates, it displays generalized statistics - how many incidents were found and how many users are involved in them:

GM8_7YQg7L0.jpg


Open the generated report:
Code:
firefox report2.htm

Above is a diagram with summarized information:
s0k4MVg-8Jw.jpg


Detailed information can be viewed for each incident:
9iXImJguMmI.jpg


ARTLAS
ARTLAS is a real-time Apache log analyzer. Based on the OWASP Top 10 list of vulnerabilities, this program detects attempts to exploit your web applications and notifies you or your incident response team via Telegram, Zabbix and Syslog/SIEM.

ARTLAS uses regular expressions from the PHP-IDS project to identify exploitation attempts.

For details and installation instructions, see the page: https://kali.tools/?p=4832

Unfortunately, this program is written in Python 2 and has not been updated for a long time.

Analyzing logs with command line tools (Bash)
It is very convenient to use a combination of Linux commands for quick analysis of logs. This will help determine, for example, from which IPs the most requests came.

In the following commands, replace access_log with the name of your Apache log file. You can specify the full path to this file, for example, /var/log/httpd/access_log.

If the file is compressed with gzip, use zcat instead of cat:
Code:
zcat site.ru/logs/access_log.1.gz

If the file is compressed but the command does not use cat, the command can be edited slightly. For example, the following command processes the access_log file:
Code:
awk -F\" '{print $6}' access_log | sort | uniq -c | sort -fr

awk (see Awk Tutorials), like most other programs, can read data from standard input, so the same command can be rewritten as follows:
Code:
cat access_log | awk -F\" '{print $6}' | sort | uniq -c | sort -fr

It now starts with cat, so for compressed files the same snippet can be used like this:
Code:
zcat access_log.1.gz | awk -F\" '{print $6}' | sort | uniq -c | sort -fr
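
When the current log and its rotated archives have to go through the same pipeline, a small wrapper can choose between cat and decompression for each file. This is only a minimal sketch, assuming gzip is installed; the function name logcat is made up for illustration and is not a standard utility:

```shell
# logcat FILE...: print each log file, decompressing the ones ending in .gz.
# "logcat" is a hypothetical helper name, not an existing command.
logcat() {
    for f in "$@"; do
        case "$f" in
            *.gz) gzip -dc "$f" ;;   # same effect as zcat
            *)    cat "$f" ;;
        esac
    done
}
```

It could then be used as: logcat access_log.1.gz access_log | awk -F\" '{print $6}' | sort | uniq -c | sort -fr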

Search by arbitrary string

The simplest example, search among requests by an arbitrary string (IP address, User-Agent, page address, etc.) using grep:
Code:
cat access_log | grep 'STRING'

To find all lines containing a specific response status, for example 403 (access denied):
Code:
cat access_log | grep '403'
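
Keep in mind that grep '403' matches the string anywhere in a line, so it also catches lines where 403 is the response size or part of a URL or IP address. A minimal sketch of an exact match on the status field, assuming the common or combined format where the status is the 9th space-separated field (the function name status_lines is hypothetical):

```shell
# status_lines CODE FILE: print only the lines whose status field equals CODE.
# Assumes the status code is the 9th whitespace-separated field
# (true for the common and combined formats).
status_lines() {
    awk -v code="$1" '$9 == code' "$2"
}
```

For example: status_lines 403 access_log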

List of all user agents, sorted by the number of times they have appeared:
Code:
awk -F\" '{print $6}' access_log | sort | uniq -c | sort -fr

Count of each server response status code:
Code:
awk '{print $9}' access_log | sort | uniq -c | sort

The output shows how many responses of each status your site has returned. A "normal" result is code 200, which means that the page or file was requested and delivered. But many other codes are also possible.

The most common response codes are:
  • 200 - OK
  • 206 - Partial Content
  • 301 - Moved Permanently
  • 302 - Found
  • 304 - Not Modified
  • 401 - Unauthorized (password required)
  • 403 - Forbidden
  • 404 - Not Found
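
When only the overall picture matters, the codes can be grouped by class (2xx success, 3xx redirection, 4xx client error, 5xx server error). A sketch, again assuming the status code is in field 9; status_classes is a hypothetical name:

```shell
# status_classes FILE: count responses per status class (2xx, 3xx, ...).
# Assumes the status code is the 9th whitespace-separated field.
# Lines whose 9th field is not a three-digit code are skipped.
status_classes() {
    awk '$9 ~ /^[1-5][0-9][0-9]$/ { n[substr($9, 1, 1) "xx"]++ }
         END { for (c in n) print c, n[c] }' "$1" | sort
}
```

For example: status_classes access_log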

A 404 error indicates a missing resource. Take a look at the requested URIs that got this error.
Code:
grep "404" access_log | cut -d ' ' -f 7 | sort | uniq -c | sort -nr

Another option for displaying the most frequently not found pages on the site:
Code:
cat access_log | awk '($9 ~ /404/)' | awk '{print $7}' | sort | uniq -c | sort -rn | head -n 25

The IP addresses that made the most requests:
Code:
cat access_log | awk '{print $1}' | sort | uniq -c | sort -rn | head -n 25

Top 25 IP addresses with the most requests showing their country:

Install the required dependencies:
Code:
sudo apt install geoip-bin geoip-database-extra

The command to display the country of the IP addresses that made the most requests to the server:
Code:
cat access_log | awk '{print $1}' | sort | uniq -c | sort -rn | head -n 25 | awk '{printf("%5d\t%-15s\t", $1, $2); system("geoiplookup " $2 " | cut -d \\: -f2")}'

To find sites that hotlink images from my site (for example, when articles are stolen):
Code:
awk -F\" '($2 ~ /\.(jpg|png|gif)/ && $4 !~ /^https:\/\/(|www\.)hackware\.ru/){print $4}' access_log | sort | uniq -c | sort

Remember to edit the domain name in the previous and next commands.

To analyze all archives:
Code:
zcat access_log.*gz | awk -F\" '($2 ~ /\.(jpg|png|gif)/ && $4 !~ /^https:\/\/(|www\.)hackware\.ru/){print $4}' | sort | uniq -c | sort

Empty user agent
An empty user agent usually indicates that the request is coming from an automated script. The following command will display a list of IP addresses for these user agents, and based on it, you can decide what to do with them next - block or allow access:
Code:
awk -F\" '($6 ~ /^-?$/)' access_log | awk '{print $1}' | sort | uniq

Too much load from one source?
When your site is under heavy load, you need to figure out if the load is coming from real users or something else:
  • Setup or system problems
  • A custom app or bot is requesting information from your site too quickly
Displaying IP addresses sorted by the number of requests:
Code:
cat access_log | cut -d ' ' -f 1 | sort | uniq -c | sort -nr

10 most active IPs:
Code:
cat access_log | awk '{print $1;}' | sort | uniq -c | sort -n -r | head -n 10

Traffic in kilobytes by status codes:
Code:
cat access_log | awk '{total[$9] += $10} END {for (x in total) {printf "Status code %3d: %9.2f Kb\n", x, total[x]/1024}}'

10 most popular referrers (don't forget to edit your domain name):
Code:
cat access_log | awk -F\" '{print $4}' | grep -v '^-$' | grep -v 'https://hackware.ru' | sort | uniq -c | sort -rn | head -n 10

10 most popular user agents:
Code:
cat access_log | awk -F\" '{print $6}' | sort | uniq -c | sort -rn | head -n 10

Analysis of IP activity for the last 10,000 site requests.
Code:
tail -10000 access_log | awk '{print $1}' | sort | uniq -c | sort -n

Distribution of user activity over time

Number of requests per day:
Code:
awk '{print $4}' access_log | cut -d: -f1 | uniq -c

Number of requests by hour (specify the day):
Code:
grep "04/Jun" access_log | cut -d[ -f2 | cut -d] -f1 | awk -F: '{print $2":00"}' | sort -n | uniq -c


Number of requests per minute (specify date and time):
Code:
grep "04/Jun/2020:16" access_log | cut -d[ -f2 | cut -d] -f1 | awk -F: '{print $2":"$3}' | sort -nk1 -nk2 | uniq -c | awk '{if ($1 > 10) print $0}'
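
The same per-minute count can be done in a single awk pass without fixing the date and hour in advance. This is a sketch under the assumption that the timestamp is field 4 in the form [04/Jun/2020:16:05:12; the function name busy_minutes and its default threshold of 10 are illustrative:

```shell
# busy_minutes FILE [THRESHOLD]: count requests per minute and print the
# minutes that received more than THRESHOLD requests (default 10).
# Assumes field 4 looks like [04/Jun/2020:16:05:12.
busy_minutes() {
    awk -v min="${2:-10}" '
        {
            split($4, t, ":")                        # t[1]="[04/Jun/2020", t[2]=hour, t[3]=minute
            key = substr(t[1], 2) " " t[2] ":" t[3]  # drop the leading "["
            n[key]++
        }
        END { for (k in n) if (n[k] > min) print n[k], k }
    ' "$1" | sort -k2    # text order; chronological within a single day
}
```

For example, busy_minutes access_log 100 would list only the minutes with more than 100 requests.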


Total unique visitors:
Code:
cat access_log | awk '{print $1}' | sort | uniq -c | wc -l

Unique visitors today:
Code:
cat access_log | grep "$(date '+%d/%b/%Y')" | awk '{print $1}' | sort | uniq -c | wc -l

Unique visitors this month:
Code:
(zcat access_log.*gz && cat access_log) | grep "$(date '+%b/%Y')" | awk '{print $1}' | sort | uniq -c | wc -l

Unique visitors for any date:
Code:
(zcat access_log.*gz && cat access_log) | grep 04/Jun/2020 | awk '{print $1}' | sort | uniq -c | wc -l

Unique visitors per month:
Code:
(zcat access_log.*gz && cat access_log) | grep Jun/2020 | awk '{print $1}' | sort | uniq -c | wc -l
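
The unique-visitor commands above differ only in the date pattern, so they can be folded into one helper. Two details are worth noting: sort -u gives the same count as sort | uniq -c | wc -l, and in the (zcat ... && cat ...) form the current log is skipped whenever zcat fails, for example when there are no archives yet. A minimal sketch, assuming gzip is installed; unique_visitors is a hypothetical name:

```shell
# unique_visitors PATTERN FILE...: count distinct client IPs on lines that
# contain PATTERN (for example 04/Jun/2020 or Jun/2020).
# Plain and .gz files can be mixed in the file list.
unique_visitors() {
    pattern=$1
    shift
    for f in "$@"; do
        case "$f" in
            *.gz) gzip -dc "$f" ;;
            *)    cat "$f" ;;
        esac
    done | grep -F -- "$pattern" | awk '{print $1}' | sort -u | wc -l
}
```

For example: unique_visitors Jun/2020 access_log.*gz access_log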

Popular on the site

Number of requests per visitor IP address, sorted in ascending order:
Code:
cat access_log | awk '{print "requests from " $1}' | sort | uniq -c | sort

Most popular URLs:
Code:
cat access_log | awk '{print $7}' | sort | uniq -c | sort -rn | head -n 25

Monitoring site requests in real time

Monitoring requests in real time:
Code:
tail -f access_log | awk '{printf("%-15s\t%s\t%s\t%s\n", $1, $6, $9, $7)}'

Real-time IP address information:
Code:
tail -f access_log | awk '{"geoiplookup " $1 " | cut -d \\: -f2" | getline geo; printf("%-15s\t%s\t%s\t%-20s\t%s\n", $1, $6, $9, geo, $7);}'

Analysis of IP addresses

List of all unique IP addresses:
Code:
cat access_log | awk '{print $1}' | sort | uniq

Unique IP addresses with date-time stamp:
Code:
cat access_log | awk '{print $1 " " $4}' | sort | uniq

Unique IP addresses and browsers:
Code:
cat access_log | awk '{print $1 " " $12 " " $19}' | sort | uniq

Unique IP addresses and OS:
Code:
cat access_log | awk '{print $1 " " $13}' | sort | uniq
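
The field numbers 12, 13 and 19 in the two commands above point at individual words inside the User-Agent string, so what they print depends on how a particular client formats that string. If the whole User-Agent is more useful, the line can be split on double quotes instead. A sketch assuming the combined format; ip_and_agent is a hypothetical name:

```shell
# ip_and_agent FILE: print unique pairs of client IP and full User-Agent.
# Assumes the combined format: splitting on double quotes makes the
# User-Agent field 6, and the text before the first quote starts with the IP.
ip_and_agent() {
    awk -F'"' '{ split($1, a, " "); print a[1] "\t" $6 }' "$1" | sort -u
}
```

For example: ip_and_agent access_log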

Unique IP addresses, date-time and request method:
Code:
cat access_log | awk '{print $1 " " $4 " " $6}' | sort | uniq

Unique IP addresses, date-time and requested URL:
Code:
cat access_log | awk '{print $1 " " $4 " " $7}' | sort | uniq

Commands for checking logs on shared hosting
A quick display of the number of requests for each site on your Hostland account (useful when you need to find out which site is causing the increased load, which site consumes the most server resources):
Code:
while read -r line; do echo ===============; echo $line; cat ${line}logs/access_log | wc -l; done < <(ls -d */)
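
A variant of the same idea that uses a shell glob instead of parsing the output of ls and prints the busiest site first. It is only a sketch: the SITE/logs/access_log layout is taken from the command above, and the function name site_request_counts is made up for illustration:

```shell
# site_request_counts [DIR]: print "COUNT<TAB>SITE" for every
# SITE/logs/access_log found under DIR (default: current directory),
# sorted so that the site with the most requests comes first.
site_request_counts() {
    for log in "${1:-.}"/*/logs/access_log; do
        [ -f "$log" ] || continue              # the glob matched nothing
        site=${log%/logs/access_log}           # strip the log path suffix
        printf '%s\t%s\n' "$(wc -l < "$log" | tr -d ' ')" "${site##*/}"
    done | sort -rn
}
```

Run it from the directory that contains the site folders, or pass that directory as the first argument.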