<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>blog.easytospell.net</title>
	<atom:link href="http://blog.easytospell.net/feed/" rel="self" type="application/rss+xml" />
	<link>http://blog.easytospell.net</link>
	<description>digging about the linux kernel, bad puns, and various other computer science things</description>
	<lastBuildDate>Mon, 21 Nov 2011 22:16:15 +0000</lastBuildDate>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>http://wordpress.org/?v=3.1.3</generator>
		<item>
		<title>Capital Punishment</title>
		<link>http://blog.easytospell.net/2011/11/21/capital-punishment/</link>
		<comments>http://blog.easytospell.net/2011/11/21/capital-punishment/#comments</comments>
		<pubDate>Mon, 21 Nov 2011 22:16:15 +0000</pubDate>
		<dc:creator>rgm</dc:creator>
				<category><![CDATA[User Land]]></category>
		<category><![CDATA[libc]]></category>
		<category><![CDATA[linux]]></category>
		<category><![CDATA[process creation]]></category>

		<guid isPermaLink="false">http://blog.easytospell.net/?p=125</guid>
		<description><![CDATA[There is no stay on these executions After talking about fork() the next logical step is to continue with exec(). The fork()/exec() pairing, as mentioned in my previous post in this series, is a method by which new processes are started in unix like systems. exec is the stage at which a new program is [...]]]></description>
			<content:encoded><![CDATA[<p><strong>There is no stay on these executions</strong></p>
<p>After talking about <a href="http://blog.easytospell.net/2011/01/18/a-fork-in-the-road/"><code>fork()</code></a> the next logical step is to continue with <code>exec()</code>. The <code>fork()</code>/<code>exec()</code> pairing, as mentioned in my previous post in this series, is the method by which new processes are started in Unix-like systems. <code>exec</code> is the stage at which a new program is loaded from disk to replace the currently running process image. In this post I will cover the user land side of the <code>exec</code> family of functions, which give userspace the interface to start processes other than init. Most of the meat of how <code>exec()</code> works will be covered when I post about the kernel side of things.</p>
<p><strong>Diving into the details</strong></p>
<p>The library function that ultimately makes the magic happen is <code>execve</code>. It is the version of <code>exec</code> that accepts an argument vector (<code>v</code>) and an environment vector (<code>e</code>) to be used by the newly loaded program. This function is a very simple wrapper in front of the <code>sys_execve</code> system call, if you ignore the case of having bounds checking enabled in glibc. The <code>execve</code> function is as straightforward as they come: it simply passes its arguments on as is and makes the system call. In the unlikely case that bounds checking is enabled, it first iterates over both <code>argv</code> and <code>envp</code> to make sure the strings actually terminate within legal bounding limits. The path to the executable is checked as well. The details are unimportant as it is disabled in almost all cases. When the bounds checking support is not compiled in it is a single line function.</p>
<pre class="brush: cpp; title: ;">
int __execve(const char *file, char *const argv[], char *const envp[])
{
  return INLINE_SYSCALL(execve, 3, file, argv, envp);
}
</pre>
<p>Granted, the <code>INLINE_SYSCALL</code> macro does expand to substantially more than 1 line, but that&#8217;s a topic for another day. As you can see <code>execve</code> only takes 3 arguments: a path to the executable (<code>file</code>), a null terminated array of strings to use as program arguments (<code>argv</code>), and a null terminated array of strings to use as the environment (<code>envp</code>). To see this in action let&#8217;s take a look at a very contrived example of what is marginally equivalent to a small portion of what happens when you type &#8220;ls -l .&#8221; at a shell prompt.</p>
<pre class="brush: cpp; title: ;">
char *envp[] = {&quot;FOO=bar&quot;, (char *)0};
char *argv[] = {&quot;/bin/ls&quot;, &quot;-l&quot;, &quot;.&quot;, (char *)0};
execve(&quot;/bin/ls&quot;, argv, envp);
</pre>
<p>The unusual thing about <code>execve</code> is that when it succeeds it doesn&#8217;t return. This makes sense since once <code>execve</code> completes without error the calling process image has been replaced and there isn&#8217;t a valid place to return to. In the error case the function returns -1 and <code>errno</code> is set to give you an idea of what went wrong. Check out the man page <code><a href="http://linux.die.net/man/2/execve">execve(2)</a></code> for a full list of those error numbers and what they mean. So when all is well, execution continues at the entry point of the new program that was loaded into memory instead of returning to the caller.</p>
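<p>To make the &#8220;only returns on failure&#8221; behavior concrete, here is a minimal sketch. The helper name <code>try_exec_missing</code> and the path <code>/nonexistent/program</code> are made up for illustration; the path is simply assumed not to exist on your system.</p>
<pre class="brush: cpp; title: ;">
#include &lt;errno.h&gt;
#include &lt;stdio.h&gt;
#include &lt;string.h&gt;
#include &lt;unistd.h&gt;

/* Hypothetical helper: try to exec a path that is assumed not to exist
 * and return the errno that execve reported. */
int try_exec_missing(void)
{
  char *argv[] = {&quot;/nonexistent/program&quot;, (char *)0};
  char *envp[] = {(char *)0};
  int saved;

  execve(&quot;/nonexistent/program&quot;, argv, envp);

  /* Reaching this line means execve failed. Save errno before
   * calling anything else that might overwrite it. */
  saved = errno;
  fprintf(stderr, &quot;execve failed: %s\n&quot;, strerror(saved));
  return saved;
}
</pre>
<p>Note that there is no need to test the return value of <code>execve</code> itself: any code that runs after the call is, by definition, the error path.</p>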
<p>It is also worth noting that the handling of scripts using the <code>#!</code> mechanism is done entirely in the kernel, with one caveat: if the calling program uses one of the <code>PATH</code> searching wrappers (the <code>exec*p()</code> functions described below), glibc does some magic in certain situations to perform this itself. If a file is passed in that is executable but the kernel fails to find a handler to run it (<code>execve</code> fails with <code>ENOEXEC</code>), these library functions will try again with the file as an argument to <code>/bin/sh</code>. This provides a way to have simple shell scripts without a <code>#!</code> and can also provide lots of confusion.</p>
<p>I could leave it at that, as for the most part that is really all that happens in user space. However there are a plethora of convenience wrapper functions in libc to make calling <code>execve</code> &#8220;less&#8221; of a headache. These include <code>execl</code>, <code>execlp</code>, <code>execle</code>, <code>execv</code>, <code>execvp</code>, and <code>execvpe</code>. They are all built on top of <code>execve</code> and have varying call signatures for various use cases.</p>
<p><strong>Letters make the difference</strong></p>
<p>The various letters after <code>exec</code> all have different meanings and relate to the type and number of arguments accepted and how they are handled.</p>
<p><code>exec<strong>l</strong>*()</code> refers to the fact that the arguments passed to the program being executed are included as individual arguments to the <code>exec<strong>l</strong>*()</code> function. This proves to be the simplest way of doing argument passing if the number of arguments is fixed. Since the arguments are passed directly to the <code>exec<strong>l</strong>*()</code> function you are limited to a fixed number of arguments at compile time.</p>
<pre class="brush: cpp; title: ;">
int execlp(const char *file, const char *arg, ...);
</pre>
<p>Arguments are read from the variable argument list until a <code>NULL</code> pointer is found.</p>
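<p>As a rough illustration of how an <code>execl</code> style function can walk its arguments, here is a small sketch using <code>stdarg.h</code>. The helper <code>count_args</code> is hypothetical and is not how glibc names or structures its code; it only shows the &#8220;read until the null pointer&#8221; idea.</p>
<pre class="brush: cpp; title: ;">
#include &lt;stdarg.h&gt;
#include &lt;stddef.h&gt;

/* Hypothetical helper: count the variadic string arguments up to
 * (but not including) the terminating null pointer. */
int count_args(const char *arg, ...)
{
  va_list ap;
  int n = 0;

  va_start(ap, arg);
  while (arg != NULL) {
    n++;
    arg = va_arg(ap, const char *);  /* fetch the next argument */
  }
  va_end(ap);
  return n;
}
</pre>
<p>This is also why the terminator is written as <code>(char *)0</code> in the examples: in a variadic call a bare <code>0</code> may be passed as an <code>int</code> rather than as a pointer, and the callee has no way to tell the difference.</p>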
<p><code>exec<strong>v</strong>*()</code> means that program arguments are passed via an array of strings in the same manner as <code>execve</code>. The array must be terminated with a <code>NULL</code> pointer as the last element. This allows you to have more flexibility for argument passing in comparison to <code>execl*</code>, at the cost of having to build the array yourself.</p>
<p><code>exec*<strong>e</strong>()</code> means that the environment is passed via an array of strings, also in the same manner as <code>execve</code>. The array must be terminated with a <code>NULL</code> pointer as the last element just as the argument list in <code>execv*()</code>.</p>
<p><code>exec*<strong>p</strong>()</code> functions do path expansion using the <code>PATH</code> environment variable (or the current directory if <code>PATH</code> is not set). Both <code>execvp</code> and <code>execlp</code> are wrappers around <code>execvpe</code>, which does the actual path lookup before calling <code>execve</code>. While these functions do the same thing that a shell like bash does to allow you to type &#8216;<code>ls</code>&#8217; and correctly invoke &#8216;<code>/bin/ls</code>&#8217;, bash doesn&#8217;t actually use any of the <code>exec*<strong>p</strong>()</code> functions and does the path lookup itself.</p>
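<p>The lookup itself can be sketched in a few lines. This is an illustration under my own assumptions, not the glibc code: <code>find_in_path</code> is a hypothetical helper, it probes candidates with <code>access()</code>, whereas a real implementation can simply attempt <code>execve</code> on each candidate, and it ignores corner cases such as a trailing empty entry or a directory that happens to match.</p>
<pre class="brush: cpp; title: ;">
#include &lt;stdio.h&gt;
#include &lt;string.h&gt;
#include &lt;unistd.h&gt;

/* Hypothetical helper: look for `name` in the colon separated list
 * `path`. On success the candidate is written to `out` and 0 is
 * returned; -1 means nothing executable was found. */
int find_in_path(const char *name, const char *path, char *out, size_t outlen)
{
  while (*path != '\0') {
    size_t len = strcspn(path, &quot;:&quot;);  /* length of this entry */

    if (len == 0)                      /* empty entry: current directory */
      snprintf(out, outlen, &quot;./%s&quot;, name);
    else
      snprintf(out, outlen, &quot;%.*s/%s&quot;, (int)len, path, name);

    if (access(out, X_OK) == 0)
      return 0;

    path += len;                       /* move past this entry */
    if (*path == ':')
      path++;                          /* and past the separator */
  }
  return -1;
}
</pre>
<p>Calling it with <code>getenv(&quot;PATH&quot;)</code> as the second argument gives roughly the behavior you see when typing a bare command name at a prompt.</p>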
<p>In the <code>exec</code> functions that do not specify an environment vector the new program inherits the environment of the calling process.</p>
<p><strong>Call signatures</strong></p>
<p>For completeness the following is a list of example calls for each of the <code>execve</code> helper functions:</p>
<p><code>execl</code>:</p>
<pre class="brush: cpp; title: ;">
execl(&quot;/bin/ls&quot;, &quot;/bin/ls&quot;, &quot;-l&quot;, &quot;.&quot;, (char *)0);
</pre>
<p><code>execlp</code>:</p>
<pre class="brush: cpp; title: ;">
execlp(&quot;ls&quot;, &quot;ls&quot;, &quot;-l&quot;, &quot;.&quot;, (char *)0);
</pre>
<p><code>execle</code>:</p>
<pre class="brush: cpp; title: ;">
char *envp[] = {&quot;FOO=bar&quot;, (char *)0};
execle(&quot;/bin/ls&quot;, &quot;/bin/ls&quot;, &quot;-l&quot;, &quot;.&quot;, (char *)0, envp);
</pre>
<p><code>execv</code>:</p>
<pre class="brush: cpp; title: ;">
char *argv[] = {&quot;/bin/ls&quot;, &quot;-l&quot;, &quot;.&quot;, (char *)0};
execv(&quot;/bin/ls&quot;, argv);
</pre>
<p><code>execvp</code>:</p>
<pre class="brush: cpp; title: ;">
char *argv[] = {&quot;ls&quot;, &quot;-l&quot;, &quot;.&quot;, (char *)0};
execvp(&quot;ls&quot;, argv);
</pre>
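<p><code>execvpe</code> (a GNU extension, so <code>_GNU_SOURCE</code> must be defined before including <code>unistd.h</code>):</p>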
<pre class="brush: cpp; title: ;">
char *envp[] = {&quot;FOO=bar&quot;, (char *)0};
char *argv[] = {&quot;ls&quot;, &quot;-l&quot;, &quot;.&quot;, (char *)0};
execvpe(&quot;ls&quot;, argv, envp);
</pre>
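<p>Putting the pieces together, here is a minimal sketch of the <code>fork()</code>/<code>exec()</code> pairing as a shell might use it. The function name <code>run_ls</code> and the exit code 127 for a failed exec are my own choices for the example (127 follows the usual shell convention).</p>
<pre class="brush: cpp; title: ;">
#include &lt;sys/types.h&gt;
#include &lt;sys/wait.h&gt;
#include &lt;unistd.h&gt;

/* Run &quot;ls -l .&quot; in a child process and return its exit status, or -1
 * if the child could not be created or did not exit normally. */
int run_ls(void)
{
  int status;
  pid_t pid = fork();

  if (pid == -1)
    return -1;

  if (pid == 0) {
    /* Child: replace this process image with ls. */
    execlp(&quot;ls&quot;, &quot;ls&quot;, &quot;-l&quot;, &quot;.&quot;, (char *)0);
    /* Only reached if the exec failed. */
    _exit(127);
  }

  /* Parent: wait for the child to finish and collect its status. */
  if (waitpid(pid, &amp;status, 0) == -1)
    return -1;
  return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
</pre>
<p>The child uses <code>_exit</code> rather than <code>exit</code> on the failure path so that it does not flush stdio buffers or run exit handlers it inherited from the parent.</p>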
<p>That concludes this brief description of <code>execve</code> and friends. Stay tuned for further posts that dive into the kernel implementation of this functionality.</p>
]]></content:encoded>
			<wfw:commentRss>http://blog.easytospell.net/2011/11/21/capital-punishment/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>A fork() in the road</title>
		<link>http://blog.easytospell.net/2011/01/18/a-fork-in-the-road/</link>
		<comments>http://blog.easytospell.net/2011/01/18/a-fork-in-the-road/#comments</comments>
		<pubDate>Wed, 19 Jan 2011 02:47:08 +0000</pubDate>
		<dc:creator>rgm</dc:creator>
				<category><![CDATA[User Land]]></category>
		<category><![CDATA[libc]]></category>
		<category><![CDATA[linux]]></category>
		<category><![CDATA[process creation]]></category>

		<guid isPermaLink="false">http://blog.easytospell.net/?p=60</guid>
		<description><![CDATA[Go fork yourself Several months ago a good friend of mine suggested that I write a post about process creation. Initially I planned on writing a single post on fork, clone, exec, and friends however after thinking about the scope of the topic I&#8217;ve decided to break the subject up into several posts. This is [...]]]></description>
			<content:encoded><![CDATA[<p><strong>Go fork yourself</strong></p>
<p>Several months ago a good friend of mine suggested that I write a post about process creation. Initially I planned on writing a single post on <code>fork</code>, <code>clone</code>, <code>exec</code>, and friends however after thinking about the scope of the topic I&#8217;ve decided to break the subject up into several posts. This is the first and will cover the libc magic of the <code>fork</code> function. </p>
<p>For clarity I will be referring to library functions by name (e.g. <code>fork</code>) and system calls as <code>sys_name</code> (e.g. <code>sys_fork</code>). Several of the examples will be making use of the <code>strace</code> and <code>ltrace</code> utilities so if you&#8217;re not familiar with them, now would be a good time to read their man pages.</p>
<p><strong>Back to basics</strong></p>
<p>While I am assuming there is a certain level of existing knowledge of how unix style process creation works, a quick overview seems appropriate. This topic has been covered at length by many an author much more eloquent than I, and if you are new to these ideas I would suggest a more in depth review of the material elsewhere.</p>
<p>In the before time new processes were created by duplicating the calling process in its entirety. That meant that all of the process was copied: the kernel data structures, the page tables, and the memory allocated to the process. I&#8217;m sure you can imagine how slow this would get as processes eat more and more memory that would need to be copied each time a new process was created. This describes a very naive, simplistic version of what happens at a fork, and it hasn&#8217;t actually been implemented that way for a very long time.</p>
<p>The modern version does mostly the same things: it copies the kernel data structures and page tables associated with the process. However the actual memory allocated to the process is not copied. Instead the page table entries for both the parent and the child are marked as copy-on-write (COW). COW provides a mechanism by which all the memory in the parent and child is shared until one of the two writes to a page. When a write occurs a duplicate of that page is created for the writing process (parent or child), allowing a single set of pages to service both processes with minimal copying overhead. This continues until all the pages have been duplicated, one of the processes exits, or exec is called. The details of how COW is implemented are outside the scope of this post but it is an important concept that I will more than likely write another post about in the future. The important thing to take from this is that COW prevents needless data duplication and reduces the overhead of creating a new process to a much more manageable level. Since one of the more common reasons for creating a new process is executing a new program, it makes sense to try to duplicate as little as possible from the parent.</p>
<p>Once a new process has been created by fork what is it to do? Well if you type &#8220;ls&#8221; at your command prompt bash <code>fork</code>s and then <code>exec</code>s ls. This means that the ls image is loaded from disk and replaces the child copy of bash. The <code>fork</code>/<code>exec</code> pattern is very common indeed, but at a functional level it is no different from any other process creation that doesn&#8217;t result in an <code>exec</code>. The only difference is that the logic in the process doing the <code>fork</code> has the child immediately call <code>exec</code>. There is nothing requiring the child to do that, and in many cases it won&#8217;t. The kernel and libc don&#8217;t care what the child does, and while I am going to continue talking about exec, as far as the process creation side of things is concerned the work is complete.</p>
<p>So, <code>exec</code>. While in linux there is no actual &#8220;<code>exec</code>&#8221; library function, conceptually <code>exec</code> loads a new process image from disk to replace the currently running one. Most of the environment from the calling process is preserved after an <code>exec</code>, however a few things get cleaned up: signal handlers are reset to their defaults, memory mappings are unmapped, shared memory segments are detached, etc. Once <code>exec</code> completes, execution begins at the entry point of the loaded program.</p>
<p>You may be wondering: if all processes come from a parent process <code>fork</code>&#8217;ing, where did the first one come from? Let&#8217;s just say that in the beginning there was nothing and the kernel said &#8220;let there be pid 1&#8221; and so it was. Simply put, the kernel creates the first process as part of the initial boot up and from that point onwards all new processes are created with <code>fork</code>.</p>
<p>That concludes a rather brief background on the <code>fork</code>/<code>exec</code> concept; the details of many of the pieces described will be covered in this and subsequent posts in this series.</p>
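<p>COW itself is invisible from user space, but the guarantee it preserves is easy to observe: a write in the child never shows up in the parent. The sketch below demonstrates only that guarantee, not the page level mechanics, and the function name is made up for the example.</p>
<pre class="brush: cpp; title: ;">
#include &lt;stddef.h&gt;
#include &lt;sys/types.h&gt;
#include &lt;sys/wait.h&gt;
#include &lt;unistd.h&gt;

/* Returns the value the parent sees after the child has written to
 * its own copy of the variable, or -1 if fork failed. */
int parent_view_after_child_write(void)
{
  volatile int value = 1;  /* volatile so the child's store is not optimized away */
  pid_t pid = fork();

  if (pid == -1)
    return -1;

  if (pid == 0) {
    value = 2;   /* the child writes; its page is duplicated behind the scenes */
    _exit(0);
  }

  waitpid(pid, NULL, 0);
  return value;  /* the parent's copy is untouched */
}
</pre>
<p>Until the child performs that write, both processes are backed by the same physical page, which is what keeps <code>fork</code> cheap.</p>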
<p><strong>Behind the libc curtain</strong></p>
<p>The description of the libc side of fork will use glibc as the reference implementation. That noted there is quite a lot of linux specific stuff to follow in this section since <code>fork</code> is so tightly wound with the linux threading code.</p>
<p>The first bit of libc magic around fork is that the library wrapper function <code>fork</code> does not actually call <code>sys_fork</code> but instead uses <code>sys_clone</code>. So I could have named this post &#8220;a clone() in the road&#8221; but that doesn&#8217;t have the same ring to it. With the integration of NPTL (the Native POSIX Thread Library) into glibc (this happened in v2.3.2, in case you care) the usage of the <code>sys_fork</code> call on most Linux systems went the way of the dodo.</p>
<p>A very contrived example will show the <code>fork</code> -&gt; <code>sys_clone</code> relationship. </p>
<pre class="brush: bash; title: ;">
[/tmp]: ltrace -e fork sh -c 'ls'
fork()                                           = 2785
&lt;ls output omitted for brevity&gt;
--- SIGCHLD (Child exited) ---
+++ exited (status 0) +++
</pre>
<p>Since <code>ltrace</code> gives us information about library calls this shows the library function <code>fork()</code> being called by &#8216;sh&#8217; to spawn &#8216;ls&#8217;.</p>
<pre class="brush: bash; title: ;">
[/tmp]: strace -e fork sh -c 'ls'
&lt;ls output omitted for brevity&gt;
</pre>
<p>Here with <code>strace</code> we are looking at system calls and can see that <code>sys_fork</code> is not being called.</p>
<pre class="brush: bash; title: ;">
[/tmp]: strace -e clone sh -c 'ls'
clone(child_stack=0, flags=CLONE_CHILD_CLEARTID|CLONE_CHILD_SETTID|SIGCHLD, child_tidptr=0x7fdad87d29d0) = 2789
&lt;ls output omitted for brevity&gt;
--- SIGCHLD (Child exited) @ 0 (0) ---
</pre>
<p>As promised a <code>sys_clone</code> is being done by the library call to <code>fork()</code>. <code>sys_clone</code> accepts a wide range of arguments that allow the child process to share various amounts of data with the parent. That fact is why clone is so important to threading, but that&#8217;s a topic for later. The arguments passed to <code>sys_clone</code> by <code>fork</code> result in the same behavior as when using <code>sys_fork</code> so this was a mostly transparent library change. </p>
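<p>For the curious, glibc also exposes a <code>clone()</code> wrapper directly. The sketch below uses it with only <code>SIGCHLD</code> as flags, which gives a fork-like child with its own copy of memory. It is not what <code>fork</code> does internally (note the extra <code>CLONE_CHILD_*</code> flags in the trace above), and the function names and the 64 KiB stack size are arbitrary choices for the example.</p>
<pre class="brush: cpp; title: ;">
#define _GNU_SOURCE
#include &lt;sched.h&gt;
#include &lt;signal.h&gt;
#include &lt;stdlib.h&gt;
#include &lt;sys/types.h&gt;
#include &lt;sys/wait.h&gt;

#define STACK_SIZE (64 * 1024)

/* Entry point of the child; the return value becomes its exit status. */
static int child_fn(void *arg)
{
  (void)arg;
  return 7;
}

/* Create a fork-like child with clone() and return its exit status. */
int clone_like_fork(void)
{
  int status;
  pid_t pid;
  char *stack = malloc(STACK_SIZE);

  if (stack == NULL)
    return -1;

  /* The stack grows down on most architectures, so pass its top. */
  pid = clone(child_fn, stack + STACK_SIZE, SIGCHLD, NULL);
  if (pid == -1) {
    free(stack);
    return -1;
  }

  if (waitpid(pid, &amp;status, 0) == -1) {
    free(stack);
    return -1;
  }
  free(stack);
  return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
</pre>
<p>Adding flags such as <code>CLONE_VM</code> or <code>CLONE_FILES</code> is what moves the result from &#8220;new process&#8221; towards &#8220;new thread&#8221;.</p>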
<p>The <code>fork</code> wrapper is a bit more involved than one might initially imagine. Without looking into things it might appear that since <code>fork</code> takes no arguments all it would need to do is set up a hard coded set of arguments for <code>sys_clone</code> and trigger the syscall to let the kernel do its business. However that is not the case. Due to complications with threading there was a need for the ability to register handlers to be called before and after a call to <code>fork</code>. <code>pthread_atfork</code> provides this mechanism, which is commonly used by multi-threaded libraries to protect internal state during a fork in a single threaded process making use of said library. The details of why this functionality is important are outside the scope of this article, but needless to say it is important. If you are interested in why, the man page for <code>pthread_atfork</code> is a good start. What is important right now is the fact that these handlers can be registered and they need to be dealt with during the <code>fork</code> process.</p>
<p>The handlers that are registered are stored in a singly linked list which is walked by the <code>fork</code> code. The structure contains function pointers that perform the tasks that the code that registered the fork handler needs done. The structure is defined as follows:</p>
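<p>A minimal sketch of registering handlers looks like the following. The counters and function names are invented for the example, and depending on your glibc version you may need to build with <code>-pthread</code>.</p>
<pre class="brush: cpp; title: ;">
#include &lt;pthread.h&gt;
#include &lt;sys/types.h&gt;
#include &lt;sys/wait.h&gt;
#include &lt;unistd.h&gt;

/* Counters so we can see which handler ran in which process. */
static int prepare_calls, parent_calls, child_calls;

static void on_prepare(void) { prepare_calls++; }  /* parent, before the fork */
static void on_parent(void)  { parent_calls++; }   /* parent, after the fork */
static void on_child(void)   { child_calls++; }    /* child, after the fork */

/* Register the handlers, fork once, and return the child's exit status.
 * The child exits with its own child_calls count. */
int fork_with_handlers(void)
{
  int status;
  pid_t pid;

  if (pthread_atfork(on_prepare, on_parent, on_child) != 0)
    return -1;

  pid = fork();
  if (pid == -1)
    return -1;
  if (pid == 0)
    _exit(child_calls);

  if (waitpid(pid, &amp;status, 0) == -1)
    return -1;
  return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
</pre>
<p>A typical real world use is taking a library&#8217;s internal mutex in the prepare handler and releasing it in both the parent and child handlers, so the child never inherits a lock held by a thread that does not exist in it.</p>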
<pre class="brush: cpp; title: ;">
 struct fork_handler
 {
   struct fork_handler *next;
   void (*prepare_handler) (void);
   void (*parent_handler) (void);
   void (*child_handler) (void);
   void *dso_handle;
   unsigned int refcntr;
   int need_signal;
 };
</pre>
<p>The fields that are most relevant to this topic are the *_handler fields. These are the call-back function pointers mentioned earlier. Their names make it pretty self-evident when they are called, but for due diligence: <code>prepare_handler</code> is called in the parent in preparation for the call to <code>sys_clone</code>, <code>parent_handler</code> is called after the fork in the parent process, and <code>child_handler</code> is called from the child, also after the fork. <code>refcntr</code> is used to prevent the list from being removed after a call to <code>fork</code> has already started.</p>
<p>Once all of the prepare handlers have been dealt with the actual call to <code>sys_clone</code> happens. This is done via a macro <code>ARCH_FORK</code> which on linux ends up calling <code>sys_clone</code>. Once the &#8220;fork&#8221; has happened two different code paths are followed depending on if execution is in the parent or the child. </p>
<p>In the child the first order of business is to reset some libc locks so the child gets fresh lock state. The call-backs registered for the child are then run. In the parent, the call-backs registered for the parent are run. The two paths are mostly the same, with subtle differences in regard to lock state.</p>
<p>The final step is to return the pid value returned by <code>sys_clone</code>. In the child the value will be 0 and in the parent it will be the pid of the new child. This simply makes it easy to detect which process the code continues in and to provide different behavior for each. Often in the child the first thing done is to exec a new program, but that is the topic for the next post in this series.</p>
<p>That concludes the first of the process creation series. While I do intend to start working on the next part after completing this one other posts may wiggle their way in between each new post in this series. See you next time.</p>
]]></content:encoded>
			<wfw:commentRss>http://blog.easytospell.net/2011/01/18/a-fork-in-the-road/feed/</wfw:commentRss>
		<slash:comments>1</slash:comments>
		</item>
		<item>
		<title>NUMA NUMA &#8211; Architectural Overview</title>
		<link>http://blog.easytospell.net/2010/06/04/numa-numa-architectural-overview/</link>
		<comments>http://blog.easytospell.net/2010/06/04/numa-numa-architectural-overview/#comments</comments>
		<pubDate>Sat, 05 Jun 2010 05:10:06 +0000</pubDate>
		<dc:creator>rgm</dc:creator>
				<category><![CDATA[Computer Architecture]]></category>
		<category><![CDATA[Kernel Architecture]]></category>
		<category><![CDATA[computer architecture]]></category>
		<category><![CDATA[kernel]]></category>
		<category><![CDATA[linux]]></category>
		<category><![CDATA[mm]]></category>
		<category><![CDATA[numa]]></category>
		<category><![CDATA[operating system design]]></category>
		<category><![CDATA[smp]]></category>
		<category><![CDATA[uma]]></category>

		<guid isPermaLink="false">http://blog.easytospell.net/?p=19</guid>
		<description><![CDATA[Architectural overview of NUMA No not the silly youtube video, Non-Uniform Memory Access (NUMA) is a design model used in many newer multi-cpu computer systems. To understand NUMA it is best to first understand how things were before its advent. The prototypical multiprocessor computer layout is symmetric multiprocessing (SMP) which uses a uniform memory [...]]]></description>
			<content:encoded><![CDATA[<p><strong>Architectural overview of NUMA</strong></p>
<p>No, not the silly <a title="youtube" href="http://www.youtube.com/watch?v=KmtzQCSh6xk">youtube</a> video. Non-Uniform Memory Access (NUMA) is a design model used in many newer multi-cpu computer systems. To understand NUMA it is best to first understand how things were before its advent. The prototypical multiprocessor computer layout is symmetric multiprocessing (SMP), which uses a uniform memory access (UMA) model. The UMA nature of SMP means that each cpu is connected to a single memory bus. This methodology works well for a relatively small number of cpus, but as the number grows the contention for the single bus grows and cpus start having to wait in line for memory access for unacceptable lengths of time.</p>
<p>Below is a simple diagram of an SMP setup with 2 cpus connected to a single memory bus which is in turn connected to a single bank of memory.</p>
<p><a href="http://blog.easytospell.net/uploads/2010/05/smp.png"><img title="smp" src="http://blog.easytospell.net/uploads/2010/05/smp.png" alt="" width="162" height="200" /></a></p>
<p>Next is a diagram of a NUMA setup with 4 cpus and 2 memory buses with 2 cpus each. Each of the 2 memory buses has its own bank of memory, which in the NUMA nomenclature is referred to as local memory. The memory connected to the other bus is still accessible, however since it is not directly connected there is a performance penalty for fetching information from a &#8220;remote&#8221; memory bank. This concept of local and remote memory is the fundamental principle of the NUMA architecture. Another important term when talking about NUMA is a node; in the diagram each group of memory and cpus connected to the same bus is considered a node.</p>
<p><a href="http://blog.easytospell.net/uploads/2010/05/numa.png"><img title="numa" src="http://blog.easytospell.net/uploads/2010/05/numa.png" alt="" width="321" height="200" /></a></p>
<p>The main consideration when building a NUMA aware system is recognizing that not all memory takes the same amount of time to access. Memory allocations need to take into account which node the requesting process is running in, and when scheduling, avoiding the migration of a process from a cpu in one node to a cpu in a different node is important for providing the best performance.</p>
<p>Since cpus, in addition to the main memory, also include their own on chip caches, another complication surfaces: keeping all the caches consistent. This issue is referred to as cache coherence and affects all multi-cpu configurations that have a shared memory resource. Virtually all NUMA setups you will see in the wild are cache coherent NUMA (ccNUMA) and I am not going to talk about non coherent setups.</p>
<p>Yet another complication with NUMA comes from the fact that accessing remote memory takes longer, which can lead to problems with locking mechanisms. Taking the above diagram as an example, if there is a lock structure in memory local to cpu0/1, a situation where the remote cpus are unable to take hold of the lock can occur. Say cpu0 is holding a spinlock, cpu2 then requests the lock and starts spinning, then cpu1 requests the lock and spins. Once cpu0 releases the lock, due to the delay in accessing remote memory cpu2&#8217;s request will likely have been beaten out by cpu1.</p>
<p>Overall NUMA allows an operating system that correctly accounts for the quirks of the design to scale well beyond the limitations of SMP. This is mostly due to reduced contention for a single memory bus, which on SMP systems reduces the performance gained for each cpu added. The way most of us will see NUMA implemented is on dual or quad socket server motherboards with multi-core cpus. These setups often have a bank of memory for each socket, meaning between 2 and 6 cpus per node.</p>
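<p>To see why nothing protects the remote cpu here, consider a bare bones test-and-set spinlock. This is a userspace sketch using C11 atomics with names I made up (<code>tas_lock</code> and friends); it is not the kernel&#8217;s spinlock implementation.</p>
<pre class="brush: cpp; title: ;">
#include &lt;stdatomic.h&gt;

/* Minimal test-and-set lock: a single flag, no queue, no fairness. */
typedef struct {
  atomic_flag locked;
} tas_lock;

/* Try once; returns 1 if the lock was acquired, 0 if it was held. */
static int tas_trylock(tas_lock *l)
{
  return !atomic_flag_test_and_set_explicit(&amp;l-&gt;locked, memory_order_acquire);
}

/* Spin until the lock is acquired. */
static void tas_acquire(tas_lock *l)
{
  while (!tas_trylock(l))
    ;  /* every waiter hammers the same memory location */
}

static void tas_release(tas_lock *l)
{
  atomic_flag_clear_explicit(&amp;l-&gt;locked, memory_order_release);
}
</pre>
<p>Because there is no ordering among the waiters, whichever cpu manages to complete its test-and-set first after the release wins, and a cpu that is local to the lock&#8217;s memory has the shorter path.</p>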
<p>This concludes the architectural overview. A future article will cover how linux takes advantage of NUMA hardware.</p>
]]></content:encoded>
			<wfw:commentRss>http://blog.easytospell.net/2010/06/04/numa-numa-architectural-overview/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>SLAB Cache Organization</title>
		<link>http://blog.easytospell.net/2010/05/21/slab-cache-organization/</link>
		<comments>http://blog.easytospell.net/2010/05/21/slab-cache-organization/#comments</comments>
		<pubDate>Sat, 22 May 2010 06:38:45 +0000</pubDate>
		<dc:creator>rgm</dc:creator>
				<category><![CDATA[Kernel Architecture]]></category>
		<category><![CDATA[kernel]]></category>
		<category><![CDATA[linux]]></category>
		<category><![CDATA[mm]]></category>

		<guid isPermaLink="false">http://blog.easytospell.net/?p=7</guid>
		<description><![CDATA[Slab based memory allocation is a mechanism to provide efficient allocations for commonly used data structures that has been implemented in a variety of UNIX derived operating systems.  Linux is among the *NIXs that uses the slab allocation method and actually provides a few variations. I will be covering just the basic slab system however [...]]]></description>
			<content:encoded><![CDATA[<p>Slab based memory allocation is a mechanism to provide efficient allocations for commonly used data structures that has been implemented in a variety of UNIX derived operating systems.  Linux is among the *NIXs that uses the slab allocation method and actually provides a few variations. I will be covering just the basic slab system however there are others including the slub system which is a variation intended to improve performance and reduce metadata overhead.</p>
<p><strong>To begin with; what is a slab allocator?</strong></p>
<p>Conceptually slab allocation is very simple: set aside some pages in memory designated for providing allocations of a specific size. The size of an object within a slab is usually based on the size of a commonly used kernel data structure. A few common examples of structs that are allocated from a slab cache are inodes, dentries, buffer_heads, and many more. The primary advantage of this methodology is a significant reduction in fragmentation of allocated memory, as well as a reduction in the complexity of finding an available chunk of memory to satisfy the request.</p>
<p><strong>Organization of the slab cache</strong></p>
<p><a href="http://blog.easytospell.net/uploads/2010/05/slab.png"><img class="alignnone size-full wp-image-12" title="Organization of slab structures" src="http://blog.easytospell.net/uploads/2010/05/slab.png" alt="" width="511" height="181" /></a></p>
<p>(hurray for dia&#8230; starting to get the hang of that app)</p>
<p>Starting at the top is <em>cache_chain</em>, a linked list containing all of the caches currently in existence. Each entry is a <em>kmem_cache</em> struct that organizes a cache for one size of object. Each <em>kmem_cache</em> contains a list of slabs for each NUMA node, defined as an array of <em>kmem_list3</em> structs. <em>kmem_list3</em> contains 3 (could you guess?) lists of slabs for the cache: <em>slabs_partial</em>, <em>slabs_full</em>, and <em>slabs_free</em>. Each list is used to make decisions when servicing requests for a new object. <em>slabs_partial</em> is the list of slabs that are (wait for it) partially full and is the first place to look when allocating a new object. If there are no more free objects in a slab it gets moved to <em>slabs_full</em>, and if <em>slabs_partial</em> is empty <em>slabs_free</em> is checked for an available empty slab to be used.</p>
<p>Each <em>struct slab</em> within the 3 lists is a group of contiguous pages (quite often 1) and is the unit by which a cache can be grown or shrunk. The process of keeping track of what objects within a slab are in use will be the subject of a future post. Each slab contains a different number of objects depending on the size of the object and the number of pages in each slab.</p>
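<p>The decision logic described above can be modeled in a few lines of userspace C. This is a simplified sketch: the list names mirror the ones above, but the struct layouts, the <code>inuse</code>/<code>total</code> fields, and the two helper functions are my own invention and not the kernel definitions.</p>
<pre class="brush: cpp; title: ;">
#include &lt;stddef.h&gt;

/* Simplified model of a slab: just enough to track how full it is. */
struct slab {
  struct slab *next;
  unsigned int inuse;   /* objects currently allocated from this slab */
  unsigned int total;   /* objects the slab can hold */
};

/* Simplified model of the three per-node lists. */
struct slab_lists {
  struct slab *slabs_partial;
  struct slab *slabs_full;
  struct slab *slabs_free;
};

/* Pick the slab the next allocation should come from: a partial slab
 * first, then a free one. NULL means the cache has to grow. */
struct slab *pick_slab(struct slab_lists *l)
{
  if (l-&gt;slabs_partial != NULL)
    return l-&gt;slabs_partial;
  return l-&gt;slabs_free;
}

/* Decide which list a slab belongs on given its current usage. */
struct slab **list_for(struct slab_lists *l, const struct slab *s)
{
  if (s-&gt;inuse == 0)
    return &amp;l-&gt;slabs_free;
  if (s-&gt;inuse == s-&gt;total)
    return &amp;l-&gt;slabs_full;
  return &amp;l-&gt;slabs_partial;
}
</pre>
<p>Preferring partial slabs keeps allocations packed together, which leaves completely empty slabs available to be handed back to the system.</p>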
<h4>Creating and Destroying slabs</h4>
<p>As objects are allocated and deallocated the number of slabs in <em>slabs_free</em> will change. When there are no available slabs in <em>slabs_free</em> a new slab must be allocated, which is done by the <em>cache_grow()</em> function. <em>cache_grow()</em> allocates (indirectly, through the page allocator) enough pages for the given slab from the NUMA node for the corresponding <em>kmem_list3</em> struct, sets up the struct <em>slab</em> and attaches it to the <em>slabs_free</em> list. The slab system sets up a workqueue on each cpu to shrink caches by calling the <em>cache_reap()</em> function. <em>cache_reap()</em> walks down the <em>cache_chain</em> list and attempts to free pages associated with slabs in the various <em>slabs_free</em> lists. This process happens every few seconds and is designed to keep the <em>slabs_free</em> lists from holding onto pages for too long. Another time that caches are &#8220;reaped&#8221; is when kswapd is attempting to free up some memory because the overall system memory is getting low.</p>
<h4>Final Thoughts</h4>
<p>This has been a relatively light overview of the slab system and while there is a lot left out it serves as a good jumping off point for further articles on the subject. To poke around the slab cache on your running system you can cat the proc knob <em>/proc/slabinfo</em> which contains a list of all of the existing caches and a number of interesting statistics about each one.</p>
<p>I hope you have found this useful and if you have any comments, questions, hate mail, do let me know.</p>
]]></content:encoded>
			<wfw:commentRss>http://blog.easytospell.net/2010/05/21/slab-cache-organization/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>digging back into computer science</title>
		<link>http://blog.easytospell.net/2010/05/17/digging-back-into-computer-science/</link>
		<comments>http://blog.easytospell.net/2010/05/17/digging-back-into-computer-science/#comments</comments>
		<pubDate>Tue, 18 May 2010 00:05:18 +0000</pubDate>
		<dc:creator>rgm</dc:creator>
				<category><![CDATA[Uncategorized]]></category>

		<guid isPermaLink="false">http://blog.easytospell.net/?p=4</guid>
		<description><![CDATA[it has been several years since i spent many brain cycles on what i would consider computer science. sure i&#8217;ve written some tools and started the process of learning the oddities of python, but nothing of real consequence. soon i will be starting a new job that will require me to get back into the [...]]]></description>
			<content:encoded><![CDATA[<p>it has been several years since i spent many brain cycles on what i would consider computer science. sure i&#8217;ve written some tools and started the process of learning the oddities of python, but nothing of real consequence. soon i will be starting a new job that will require me to get back into the cs mind set and as such this blog will serve as a repository of my studies.</p>
<p>a combination of topics related to linux kernel architecture and general computer science.</p>
]]></content:encoded>
			<wfw:commentRss>http://blog.easytospell.net/2010/05/17/digging-back-into-computer-science/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
	</channel>
</rss>

