PHP
Last updated: Fri, 30 Jan 2009


strip_tags

(PHP 4, PHP 5)

strip_tags - Strip HTML and PHP tags from a string

Description

string strip_tags ( string $str [, string $allowable_tags ] )

This function removes all HTML and PHP tags from the given string (str). It uses the same tag-stripping algorithm as fgetss().

Parameters

str

The input string.

allowable_tags

The optional second parameter specifies tags which should not be stripped.

Note: HTML comments and PHP tags are also stripped. This is hardcoded and cannot be changed with allowable_tags.

Return Values

Returns the stripped string.

Changelog

Version Description
5.0.0 strip_tags() is now binary safe.
4.3.0 HTML comments are now also stripped.
4.0.0 The allowable_tags parameter was added.

Example #1 strip_tags() example

<?php
$text = '<p>Test paragraph.</p><!-- Comment --> <a href="#fragment">Other text</a>';
echo strip_tags($text);
echo "\n";

// Allow <p> and <a>
echo strip_tags($text, '<p><a>');
?>

The above example will output:

Test paragraph. Other text
<p>Test paragraph.</p> <a href="#fragment">Other text</a>

Notes

Warning

Because strip_tags() does not validate the HTML, partial or broken tags can cause more text/data to be removed than expected.
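A small illustration of this warning, as a sketch: the two input strings are made up for the example. A '<' that is directly followed by a non-space character is treated as the start of a tag, so the text up to the next '>' (or up to the end of the string) is dropped.

```php
<?php
// Illustrative inputs only; they are not taken from the manual.
$comparison   = 'Use it when 1 <5 and 7> 6 both hold.';
$unterminated = 'Price <b>low</b> and <unfinished text';

// "<5 and 7>" looks like a tag to strip_tags(), so it is removed.
$strippedComparison = strip_tags($comparison);

// The unclosed "<unfinished text" swallows the rest of the string.
$strippedUnterminated = strip_tags($unterminated);

var_dump($strippedComparison, $strippedUnterminated);
```

If the text may legitimately contain '<' characters, escaping it with htmlspecialchars() instead of stripping it avoids this kind of silent loss.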

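Warning

This function does not modify any attributes on the tags that you allow with allowable_tags. This includes the style and onmouseover attributes, which a malicious user may abuse when posting text that will be shown to other users.

See Also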



User Contributed Notes
strip_tags
Leendert W
27-Jan-2009 12:51
This may also be a useful function for someone.

function removeUnsafeAttributesAndGivenTags($input, $validTags = '')
{
    $regex = '#\s*<(/?\w+)\s+(?:on\w+\s*=\s*(["\'\s])?.+?\(\1?.+?\1?\);?\1?|style=["\'].+?["\'])\s*>#is';
    return preg_replace($regex, '<${1}>',strip_tags($input, $validTags));
}
phzzyzhou at gmail dot com
17-Jan-2009 12:01
strip_tags() treats a bare '<' as the start of a tag and strips it together with the text that follows it, like this:

<?php
$str = <<<EOF
123 < 456
<a>link</a>
bbb
EOF;
echo strip_tags($str);
?>

will output:
123

---------------------------------
The following function repairs this:

<?php
function will_strip_tags($str) {
    do {
        $count = 0;
        $str = preg_replace('/(<)([^>]*?<)/' , '&lt;$2' , $str , -1 , $count);
    } while ($count > 0);
    $str = strip_tags($str);
    $str = str_replace('>' , '&gt;' , $str);
    return $str;
}

echo will_strip_tags($str);
?>

will output:
123 &lt; 456
link
bbb
tleblan at pricegrabber dot com
14-Jan-2009 05:03
I think it is worth mentioning that if some tags are allowed using the second parameter, this function cannot strip attributes from the allowed tags, and hence should not be relied on as a defense against XSS vulnerabilities.

JavaScript can still be executed in two ways:
- by inserting attributes that typically accept javascript
  >> onClick="alert('XSS');"
- by using styles
  >> style="width:expression(alert('XSS'));" (works on IE7 and probably other versions)
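As a minimal sketch of the point above (the input string is invented for illustration), the allowed tag comes back with its event handler intact:

```php
<?php
// Illustrative input: an allowed <a> tag carrying an event handler.
$input = '<a href="#" onclick="alert(\'XSS\');">click</a><script>evil()</script>';

// The <script> tags are removed, but the allowed <a> tag is kept
// as written, including its onclick attribute.
$output = strip_tags($input, '<a>');

var_dump($output);
var_dump(strpos($output, 'onclick') !== false);
```

Any attribute filtering therefore has to happen in a separate step after strip_tags().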
matt at lvi dot org
02-Jan-2009 07:56
I have a correction for mdw252's whitelist-based stripAttributes function.

Using that function, if a user sends one of the following strings to the server, you get undesired output of varying severity:
1. XSS vulnerabilities:
  <div onload..;,;.."xss.attack();">, <a href="javascript:xss.attack();">, <div onclick=";xss.attack();">
2. Some characters break the HTML:
  <div style="border:1px solid blue;">

I believe that the function below takes care of those issues and is a little more flexible by allowing some parameters.

By default, the function will strip all attributes.  So, you could sanitize a string this way:
$string=stripAttributes($string);

To achieve the same as mdw252's function, which is to allow the id and class attributes as well as href on links, you would write:
$allowable = array('class','id');
$exceptions = array('a'=>'href');
$string=stripAttributes($string,$allowable,$exceptions);

Or, if you wanted to
  - allow the "class" and "style" attributes generally,
  - the "align" attribute only on table cells, and
  - specify possible values for the align attribute,
you would:

$allowable = array('class','style');
$exceptions = array('table'=>'width','td'=>'align');
$values = array('align'=>array('left','center','right'));
$string=stripAttributes($string,$allowable,$exceptions,$values);

<?php
   
function stripAttributes($string, $allowable = NULL, $exceptions = NULL, $values = NULL, $nohrefevents = true, $crs = NULL) {

    $string = str_replace('..;,;..', '=', $string);
    if (!$crs)
        $crs = 'a-zA-Z0-9 \>\<\-:;\(\)\.\,\/=\&';
    $string = preg_replace('/[^'.$crs.'\'"]/i', '', $string);
    $string = preg_replace('/(<.*) (.*=)(['.$crs.']*) (.*>)/', '${1} ${2}"${3}" ${4}', $string);
    if ($nohrefevents)
        $string = preg_replace('/(<a .* )href="(javascript:.*>)/', '${1}onclick=${2}', $string);

    // generally allowed attributes
    if (is_array($allowable)) {
        foreach ($allowable as $allowed)
            $string = preg_replace('/(<.* )'.$allowed.'=(.*>)/', '${1}'.$allowed.'..;,;..${2}', $string);
    }

    // tag by tag exceptions
    if (is_array($exceptions)) {
        foreach ($exceptions as $tag => $attribute) {
            $string = preg_replace('/(<'.$tag.' ?.* )'.$attribute.'=(.*>)/', '${1}'.$attribute.'..;,;..${2}', $string);
        }
    }

    // specified attribute values
    if (is_array($values)) {
        foreach ($values as $attribute => $value) {
            if (is_array($value)) {
                foreach ($value as $val)
                    while (preg_match('/(<.*) '.$attribute.'=(\'|")'.$val.'(\'|".*>)/', $string))
                        $string = preg_replace('/(<.*) '.$attribute.'=(\'|")'.$val.'(\'|".*>)/', '${1} '.$attribute.'..;,;..${2}'.$val.'${3}', $string);
            }
        }
    }

    while (preg_match('/(<.*) .*=(\'|")(['.$crs.']*)(\'|")(.*>)/', $string))
        $string = preg_replace('/(<.*) .*=(\'|")(['.$crs.']*)(\'|")(.*>)/', '${1}${5}', $string);
    $string = str_replace('..;,;..', '=', $string);

    return $string;
}
?>
mariusz.tarnaski at wp dot pl
12-Nov-2008 06:05
Hi. I made a function that removes the HTML tags along with their contents:

Function:
<?php
function strip_tags_content($text, $tags = '', $invert = FALSE) {

  preg_match_all('/<(.+?)[\s]*\/?[\s]*>/si', trim($tags), $tags);
  $tags = array_unique($tags[1]);

  if (is_array($tags) AND count($tags) > 0) {
    if ($invert == FALSE) {
      return preg_replace('@<(?!(?:'. implode('|', $tags) .')\b)(\w+)\b.*?>.*?</\1>@si', '', $text);
    }
    else {
      return preg_replace('@<('. implode('|', $tags) .')\b.*?>.*?</\1>@si', '', $text);
    }
  }
  elseif ($invert == FALSE) {
    return preg_replace('@<(\w+)\b.*?>.*?</\1>@si', '', $text);
  }
  return $text;
}
?>

Sample text:
$text = '<b>sample</b> text with <div>tags</div>';

Result for strip_tags($text):
sample text with tags

Result for strip_tags_content($text):
 text with

Result for strip_tags_content($text, '<b>'):
<b>sample</b> text with

Result for strip_tags_content($text, '<b>', TRUE):
 text with <div>tags</div>

I hope someone finds this useful :) A detailed explanation for Polish PHP programmers is at http://www.tarnaski.eu/blog/rozszerzone-strip_tags/
lucky760 at VideoSift dot com
20-Oct-2008 08:21
It's come to my attention that PHP's strip_tags has been doing something funky to some video embed codes that our members submit. I'm not sure the exact situation, but whenever there is a <param> tag that is very long, strip_tags() will completely remove the tag even though it's specified as an allowable tag.

Here's an example of the existing problem:
<?php
// a single very long <param> tag
$html =<<<EOF
<param name="flashVars" value="skin=http%3A//cdn-i.dmdentertainm
...
[snip]...
vie%20of%20All-Time"/>
EOF;
echo strip_tags($html, '<param>');
// this outputs an empty string
?>

This is the function I built to fix and extend the functionality of strip_tags(). The args are:
- $i_html - the HTML string to be parsed
- $i_allowedtags - an array of allowed tag names
- $i_trimtext - whether or not to strip all text outside of the allowed tags

<?php

function real_strip_tags($i_html, $i_allowedtags = array(), $i_trimtext = FALSE) {
  if (!is_array($i_allowedtags))
    $i_allowedtags = !empty($i_allowedtags) ? array($i_allowedtags) : array();
  $tags = implode('|', $i_allowedtags);

  if (empty($tags))
    $tags = '[a-z]+';

  preg_match_all('@</?\s*(' . $tags . ')(\s+[a-z_]+=(\'[^\']+\'|"[^"]+"))*\s*/?>@i', $i_html, $matches);

  $full_tags = $matches[0];
  $tag_names = $matches[1];

  foreach ($full_tags as $i => $full_tag) {
    if (!in_array($tag_names[$i], $i_allowedtags))
      if ($i_trimtext)
        unset($full_tags[$i]);
      else
        $i_html = str_replace($full_tag, '', $i_html);
  }

  return $i_trimtext ? implode('', $full_tags) : $i_html;
}
?>

And here's an example with a block of full video embed code containing <object>, <embed>, and <param> tags plus some extraneous HTML:

<?php
$html = <<<EOF
<em><div><object type="application/x-shock
...
[snip]...
me.html">Wal-Mart Makes The Worst Movie of All-Time</a> -- powered by whatever</div></em>
EOF;
$good_html = real_strip_tags($html, array('object', 'embed', 'param'), TRUE);

?>

Now $good_html contains only the specified tags and none of the "powered by" type text. I hope someone finds this as useful as I needed it to be. :)
southsentry at yahoo dot com
25-Sep-2008 07:15
I was looking for a simple way to ban html from review posts, and the like. I have seen a few classes to do it. This line, while it doesn't strip the post, effectively blocks people from posting html in review and other forms.

<?php
if (strlen(strip_tags($review)) < strlen($review)) {
    return false;
}
?>

If you also want to stop tricksters who use & (HTML entities) to build links, include this:

<?php
if (strlen(strip_tags($review)) < strlen($review)) {
    return false;
} elseif (strpos($review, "&") !== false) {
    return 5;
}
?>

I hope this helps someone out!
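The length comparison above can be wrapped in a small reusable check. This is only a sketch: the function name contains_markup() is made up for the example, and the check inherits every limitation of strip_tags() itself.

```php
<?php
// Hypothetical helper name; wraps the length comparison from the note above.
function contains_markup($text)
{
    // If stripping tags shortens the string, something tag-like was present.
    return strlen(strip_tags($text)) < strlen($text);
}

$plain  = contains_markup('Great product, would buy again.');
$markup = contains_markup('Nice <a href="http://example.com/">site</a>');

var_dump($plain, $markup);
```

A form handler could reject the submission whenever the helper returns true, rather than silently altering the text.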
valentin -DOT- moreira -AT- atapear.com
15-Sep-2008 06:43
The following function has a small error:

<?
function html2txt($document){
$search = array('@<script[^>]*?>.*?</script>@si',  // Strip out javascript
               '@<[\/\!]*?[^<>]*?>@si',            // Strip out HTML tags
               '@<style[^>]*?>.*?</style>@siU',    // Strip style tags properly
               '@<![\s\S]*?--[ \t\n\r]*>@'         // Strip multi-line comments including CDATA
);
$text = preg_replace($search, '', $document);
return $text;
}
?>

It's a great function and works well, but it does not remove inline <style> code: the generic tag pattern runs first and removes the <style> and </style> tags themselves, so the style pattern no longer matches and the CSS stays in the text.

The function only works fully when the order of the regular expressions is changed to this:

<?
function html2txt($document){
$search = array('@<script[^>]*?>.*?</script>@si',  // Strip out javascript
               '@<style[^>]*?>.*?</style>@siU',    // Strip style tags properly
               '@<[\/\!]*?[^<>]*?>@si',            // Strip out HTML tags
               '@<![\s\S]*?--[ \t\n\r]*>@'         // Strip multi-line comments including CDATA
);
$text = preg_replace($search, '', $document);
return $text;
}
?>
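The effect of the pattern order can be checked with a short sketch. The helper name strip_with_patterns() and the sample HTML are assumptions made for this example; the two patterns are the ones used in html2txt().

```php
<?php
// Hypothetical helper: applies the given patterns in order, as html2txt() does.
function strip_with_patterns(array $patterns, $document)
{
    return preg_replace($patterns, '', $document);
}

$tagPattern   = '@<[\/\!]*?[^<>]*?>@si';          // generic HTML tags
$stylePattern = '@<style[^>]*?>.*?</style>@siU';  // whole <style> blocks

$html = '<style>p{color:red}</style><p>Hi</p>';

// Tags first: <style> and </style> vanish, so the CSS text is left behind.
$tagsFirst = strip_with_patterns(array($tagPattern, $stylePattern), $html);

// Style first: the whole block, CSS included, is removed before the tags.
$styleFirst = strip_with_patterns(array($stylePattern, $tagPattern), $html);

var_dump($tagsFirst, $styleFirst);
```

preg_replace() applies an array of patterns one after another to the running result, which is why the more specific pattern has to come before the generic one.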
mdw252 at psu dot edu
13-Sep-2008 09:58
I think I may have come up with a pretty simple white-list based approach to attribute management. It's kind of hack-ish, but it's been pretty resilient with everything I've thrown at it. Check it out:

function stripAttributes($string)
{
        $string=preg_replace('/(<a.*)href=(.*>)/', '${1}href..;,;..${2}', $string);
        $string=preg_replace('/(<.*)id=(.*>)/', '${1}id..;,;..${2}', $string);
        $string=preg_replace('/(<.*)class=(.*>)/', '${1}class..;,;..${2}', $string);
        while(preg_match('/(<.*) .*=(\'|"|\w)\w*(\'|"|\w)(.*>)/', $string)) $string=preg_replace('/(<.*) .*=(\'|"|\w)\w*(\'|"|\w)(.*>)/', '${1}${4}', $string);
        $string=str_replace('..;,;..', '=', $string);
        return $string;
}

As you can see, I have it set to only allow href (in <a> tags), id, and class attributes, and everything else will be deleted. It should be pretty self-explanatory to customize it to your own purposes.
Liam Morland
24-Aug-2008 03:58
Here is a suggestion for getting rid of attributes: After you run your HTML through strip_tags(), use the DOM interface to parse the HTML. Recursively walk through the DOM tree and remove any unwanted attributes. Serialize the DOM back to the HTML string.

Don't make the default permit mistake: Make a list of the attributes you want to ALLOW and remove any others, rather than removing a specific list, which may be missing something important.
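The approach described above might look roughly like the following sketch. The function name, the default whitelists, and the wrapper <div> are assumptions made for this example, and it relies on the DOM extension being available. It does not validate attribute values (an href may still contain a javascript: URL), and loadHTML() may mangle non-ASCII input unless an encoding is declared.

```php
<?php
// Sketch only: $allowedTags and $allowedAttributes are example whitelists.
function strip_attributes_dom($html, $allowedTags = '<a><p><b>', array $allowedAttributes = array('href'))
{
    // Wrap in a <div> so the fragment has a single known container.
    // strip_tags() removes any user-supplied <div>, so ours is the only one.
    $wrapped = '<div>' . strip_tags($html, $allowedTags) . '</div>';

    $doc = new DOMDocument();
    $previous = libxml_use_internal_errors(true); // silence warnings on sloppy markup
    $doc->loadHTML($wrapped);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    foreach ($doc->getElementsByTagName('*') as $element) {
        // Collect names first; removing entries while iterating the live map is unsafe.
        $names = array();
        foreach ($element->attributes as $attribute) {
            $names[] = $attribute->nodeName;
        }
        foreach ($names as $name) {
            if (!in_array(strtolower($name), $allowedAttributes, true)) {
                $element->removeAttribute($name);
            }
        }
    }

    // Serialize only the children of the wrapper <div>.
    $container = $doc->getElementsByTagName('div')->item(0);
    $result = '';
    foreach ($container->childNodes as $child) {
        $result .= $doc->saveHTML($child);
    }
    return $result;
}

$clean = strip_attributes_dom('<a href="http://example.com/" onclick="alert(1)" style="color:red">link</a>');
var_dump($clean);
```

Because the whitelist names what is kept, an attribute nobody thought of is removed by default instead of slipping through.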
Logic
16-Jun-2008 01:58
Remember sometimes with regex it's easier to list what you want to keep instead of everything you do not want. Not to mention this makes it easier on the server. With html attributes there may only be a select few you would want to keep.

Rather than an array that says
$trash = array('a','b','d','f','g');

You could use
$keep = array('c','e');

Simply remember your ^not operator in your final regex.
razonklnbd at hotmail dot com
10-Jun-2008 12:45
When I used strip_tags() on a full HTML page, it removed the tags but kept the text of the page header. I needed to drop all of the header text as well. This function works as follows:
1. Check whether the string contains a <body> tag.
2. If it does, keep only the body content and discard everything else, such as CSS and JS in the header.
3. Run strip_tags() on the result.

It's a small but handy function, so I'd like to share it.

Function Definition:

function extractBodyText($p_str, $p_allowedtag=NULL){
    $fstr=(preg_match('/<body[^>]*>(.*?)<\/body>/si', $p_str, $regs)?$regs[1]:$p_str);
    $rtrn=(isset($p_allowedtag)?strip_tags($fstr, $p_allowedtag):strip_tags($fstr));
    return $rtrn;
}

Example:

$str01='
     <dd>

      <p class="para">
       You can use the optional second parameter to specify tags which should
       not be stripped.
      </p>
      <blockquote><p><b class="note">Note</b>:
      
        HTML comments and PHP tags are also stripped. This is hardcoded and
        can not be changed with <i><tt class="parameter">allowable_tags</tt></i>.
       <br />
      </p></blockquote>

     </dd>
';
$str='<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN" "http://www.w3.org/TR/html4/loose.dtd">
<html>
<head>
<title>Untitled Document</title>
<meta http-equiv="Content-Type" content="text/html; charset=iso-8859-1">
</head>

<body>
'.$str01.'
</body>
</html>
';
echo('<div>'.extractBodyText($str, '<p>').'</div>');
echo('<div>'.extractBodyText($str01).'</div>');
echo('<div>'.strip_tags($str, '<p>').'</div>');

Result:
<div>

    

      <p class="para">
       You can use the optional second parameter to specify tags which should
       not be stripped.
      </p>
      <p>Note:
      
        HTML comments and PHP tags are also stripped. This is hardcoded and
        can not be changed with allowable_tags.
      
      </p>

    

</div><div>
    

     
       You can use the optional second parameter to specify tags which should
       not be stripped.
     
      Note:
      
        HTML comments and PHP tags are also stripped. This is hardcoded and
        can not be changed with allowable_tags.
      
     

    
</div><div>

Untitled Document

    

      <p class="para">
       You can use the optional second parameter to specify tags which should
       not be stripped.
      </p>
      <p>Note:
      
        HTML comments and PHP tags are also stripped. This is hardcoded and
        can not be changed with allowable_tags.
      
      </p>

    

</div>
ZlobnyNigga
21-May-2008 07:14
Attempts to write a strip_tags_attributes function look like an endless loop: find vulnerabilities in the function, patch them, find more vulnerabilities, patch again...
I decided to use the HTML_Safe package from http://pear.php.net/package/HTML_Safe
It works fine but, of course, it is slower than the functions below. You decide =)
Massoud Abbagash
08-May-2008 01:56
Danno, your script has a flaw.

Try this :

<?php

function strip_tags_keep_links($sSource)
    {
        return preg_replace('/<(.*?)>/ie', "'<' . preg_replace(array('/javascript:[^\"\']*/i', '/\b((?![hH][rR][eE][fF]\b)\w+)[ \\t\\n]*=[ \\t\\n]*[\"\'][^\"\']*[\"\']/i', '/\s+/'), array('', '', ' '), stripslashes('\\1')) . '>'", strip_tags($sSource,'<a>'));
    }

$source = "<a href=javascript:alert('doesn\'t&nbsp;work!') title=\"move your mouse here\" href=http://www.a_web_site.org onmouseover\n=\nalert(\"doesn\'t&nbsp;work!\")  onmouseover='alert(\"doesn\'t&nbsp;work!\")' alt=\"move your mouse here\" > test</a>";

$result=strip_tags_keep_links($source);

echo($result);

?>
Massoud Abbagash
07-May-2008 01:26
There is still a flaw in your function.
The [onmouseover] handler in the sample below survives even after being processed by [strip_tags_attributes].

<?php

function strip_tags_attributes($sSource, $aAllowedTags = array(), $aDisabledAttributes = array('onabort', 'onactivate', 'onafterprint', 'onafterupdate', 'onbeforeactivate', 'onbeforecopy', 'onbeforecut', 'onbeforedeactivate', 'onbeforeeditfocus', 'onbeforepaste', 'onbeforeprint', 'onbeforeunload', 'onbeforeupdate', 'onblur', 'onbounce', 'oncellchange', 'onchange', 'onclick', 'oncontextmenu', 'oncontrolselect', 'oncopy', 'oncut', 'ondataavaible', 'ondatasetchanged', 'ondatasetcomplete', 'ondblclick', 'ondeactivate', 'ondrag', 'ondragdrop', 'ondragend', 'ondragenter', 'ondragleave', 'ondragover', 'ondragstart', 'ondrop', 'onerror', 'onerrorupdate', 'onfilterupdate', 'onfinish', 'onfocus', 'onfocusin', 'onfocusout', 'onhelp', 'onkeydown', 'onkeypress', 'onkeyup', 'onlayoutcomplete', 'onload', 'onlosecapture', 'onmousedown', 'onmouseenter', 'onmouseleave', 'onmousemove', 'onmoveout', 'onmouseover', 'onmouseup', 'onmousewheel', 'onmove', 'onmoveend', 'onmovestart', 'onpaste', 'onpropertychange', 'onreadystatechange', 'onreset', 'onresize', 'onresizeend', 'onresizestart', 'onrowexit', 'onrowsdelete', 'onrowsinserted', 'onscroll', 'onselect', 'onselectionchange', 'onselectstart', 'onstart', 'onstop', 'onsubmit', 'onunload'))
    {
        if (empty($aDisabledAttributes)) return strip_tags($sSource, implode('', $aAllowedTags));

        return preg_replace('/<(.*?)>/ie', "'<' . preg_replace(array('/javascript:[^\"\']*/i', '/(" . implode('|', $aDisabledAttributes) . ")[ \\t\\n]*=[ \\t\\n]*[\"\'][^\"\']*[\"\']/i', '/\s+/'), array('', '', ' '), stripslashes('\\1')) . '>'", strip_tags($sSource, implode('', $aAllowedTags)));
    }

$source="<big onmouseover=alert('Hello!')>Move your mouse here (this doesn't work with [ strip_tags_attributes ])</big>";
$striped_source=strip_tags_attributes($source,array('<big>'));
echo($striped_source);

?>

Now, this is my correction:

<?php

function strip_tags_attributes($sSource, $aAllowedTags = array(), $aDisabledAttributes = array('onabort', 'onactivate', 'onafterprint', 'onafterupdate', 'onbeforeactivate', 'onbeforecopy', 'onbeforecut', 'onbeforedeactivate', 'onbeforeeditfocus', 'onbeforepaste', 'onbeforeprint', 'onbeforeunload', 'onbeforeupdate', 'onblur', 'onbounce', 'oncellchange', 'onchange', 'onclick', 'oncontextmenu', 'oncontrolselect', 'oncopy', 'oncut', 'ondataavaible', 'ondatasetchanged', 'ondatasetcomplete', 'ondblclick', 'ondeactivate', 'ondrag', 'ondragdrop', 'ondragend', 'ondragenter', 'ondragleave', 'ondragover', 'ondragstart', 'ondrop', 'onerror', 'onerrorupdate', 'onfilterupdate', 'onfinish', 'onfocus', 'onfocusin', 'onfocusout', 'onhelp', 'onkeydown', 'onkeypress', 'onkeyup', 'onlayoutcomplete', 'onload', 'onlosecapture', 'onmousedown', 'onmouseenter', 'onmouseleave', 'onmousemove', 'onmoveout', 'onmouseover', 'onmouseup', 'onmousewheel', 'onmove', 'onmoveend', 'onmovestart', 'onpaste', 'onpropertychange', 'onreadystatechange', 'onreset', 'onresize', 'onresizeend', 'onresizestart', 'onrowexit', 'onrowsdelete', 'onrowsinserted', 'onscroll', 'onselect', 'onselectionchange', 'onselectstart', 'onstart', 'onstop', 'onsubmit', 'onunload'))
    {
        if (empty($aDisabledAttributes)) return strip_tags($sSource, implode('', $aAllowedTags));

        return preg_replace('/\s(' . implode('|', $aDisabledAttributes) . ').*?([\s\>])/', '\\2', preg_replace('/<(.*?)>/ie', "'<' . preg_replace(array('/javascript:[^\"\']*/i', '/(" . implode('|', $aDisabledAttributes) . ")[ \\t\\n]*=[ \\t\\n]*[\"\'][^\"\']*[\"\']/i', '/\s+/'), array('', '', ' '), stripslashes('\\1')) . '>'", strip_tags($sSource, implode('', $aAllowedTags))) );
    }

$source="<big onmouseover=alert('Hello!')>Move your mouse here (this work with [ strip_tags_attributes corrected ])</big>";
$striped_source=strip_tags_attributes($source,array('<big>'));
echo($striped_source);

?>
bluej100@gmail
02-May-2008 10:31
Allowing user HTML while preventing XSS is non-trivial. Don't just try to hack together a regexp for it; at very least, check your solution against all of the ha.ckers.org exploit examples:

http://ha.ckers.org/xss.html

Really, though, you should be using a solid library that recognizes tags, attributes, and styles from a whitelist and rebuilds the markup from scratch. HTMLPurifier has a "linkify" option that does what you're looking for.

http://htmlpurifier.org
LK
20-Apr-2008 02:30
Concerning all of the notes about which attributes to include in strip_tags_attributes(), the latest of which is by Kalle Sommer Nielsen:
Correct me if I'm wrong, but isn't it a lot easier to simply reject any attribute that starts with "on"? Thus, the whole array of various javascript attributes could be replaced with "on\w+".
I am not aware of any non-javascript attributes that start with these two letters, and if there were, it would be easier to make an exception for them than for the countless JS attributes.
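A rough sketch of that idea follows. The function name strip_event_attributes() is made up for the example, and the regular expressions are assumptions rather than a vetted filter: they do not touch javascript: URLs or style attributes and should not be treated as a complete XSS defense.

```php
<?php
// Hypothetical helper: removes every attribute whose name starts with "on".
function strip_event_attributes($html)
{
    // Only look inside things that resemble tags, so ordinary prose is left alone.
    return preg_replace_callback('/<[a-z][^>]*>/i', function ($match) {
        // Attribute name starting with "on", optional spaces around "=",
        // then a double-quoted, single-quoted, or unquoted value.
        $pattern = '/\s+on\w+\s*=\s*(?:"[^"]*"|\'[^\']*\'|[^\s>]+)/i';
        return preg_replace($pattern, '', $match[0]);
    }, $html);
}

$unquoted = strip_event_attributes("<big onmouseover=alert('Hello!')>Move</big>");
$spaced   = strip_event_attributes("<p onmouseover = 'alert(1);'>123</p>");
$prose    = strip_event_attributes('turn once = twice');

var_dump($unquoted, $spaced, $prose);
```

Matching on the "on" prefix means newly introduced event attributes are covered without maintaining a list of names.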
Danno
08-Apr-2008 11:20
Hi everyone,

I came across this thread looking for a way to strip out all tags except links, leaving only the HREF attribute. I took what you guys have worked on and made it allow only the HREF attribute. This way, even if the spec changes, you can be sure not to let any javascript sneak in; who knows what the future will bring :P . I think it's pretty tight, so take a look at it and modify it if you see any holes.

<?php

function strip_tags_keep_links($sSource)
    {
        return preg_replace('/<(.*?)>/ie', "'<' . preg_replace(array('/javascript:[^\"\']*/i', '/\b((?![hH][rR][eE][fF]\b)\w+)[ \\t\\n]*=[ \\t\\n]*[\"\'][^\"\']*[\"\']/i', '/\s+/'), array('', '', ' '), stripslashes('\\1')) . '>'", strip_tags($sSource,'<a>'));
    }

?>
Kalle Sommer Nielsen
31-Mar-2008 01:05
This adds a lot of missing JavaScript events to the strip_tags_attributes() function from the entries below.

Props to MSDN for lots of them ;)

<?php
function strip_tags_attributes($sSource, $aAllowedTags = array(), $aDisabledAttributes = array('onabort', 'onactivate', 'onafterprint', 'onafterupdate', 'onbeforeactivate', 'onbeforecopy', 'onbeforecut', 'onbeforedeactivate', 'onbeforeeditfocus', 'onbeforepaste', 'onbeforeprint', 'onbeforeunload', 'onbeforeupdate', 'onblur', 'onbounce', 'oncellchange', 'onchange', 'onclick', 'oncontextmenu', 'oncontrolselect', 'oncopy', 'oncut', 'ondataavaible', 'ondatasetchanged', 'ondatasetcomplete', 'ondblclick', 'ondeactivate', 'ondrag', 'ondragdrop', 'ondragend', 'ondragenter', 'ondragleave', 'ondragover', 'ondragstart', 'ondrop', 'onerror', 'onerrorupdate', 'onfilterupdate', 'onfinish', 'onfocus', 'onfocusin', 'onfocusout', 'onhelp', 'onkeydown', 'onkeypress', 'onkeyup', 'onlayoutcomplete', 'onload', 'onlosecapture', 'onmousedown', 'onmouseenter', 'onmouseleave', 'onmousemove', 'onmoveout', 'onmouseover', 'onmouseup', 'onmousewheel', 'onmove', 'onmoveend', 'onmovestart', 'onpaste', 'onpropertychange', 'onreadystatechange', 'onreset', 'onresize', 'onresizeend', 'onresizestart', 'onrowexit', 'onrowsdelete', 'onrowsinserted', 'onscroll', 'onselect', 'onselectionchange', 'onselectstart', 'onstart', 'onstop', 'onsubmit', 'onunload'))
    {
        if (empty($aDisabledAttributes)) return strip_tags($sSource, implode('', $aAllowedTags));

        return preg_replace('/<(.*?)>/ie', "'<' . preg_replace(array('/javascript:[^\"\']*/i', '/(" . implode('|', $aDisabledAttributes) . ")[ \\t\\n]*=[ \\t\\n]*[\"\'][^\"\']*[\"\']/i', '/\s+/'), array('', '', ' '), stripslashes('\\1')) . '>'", strip_tags($sSource, implode('', $aAllowedTags)));
    }
?>
sych
13-Mar-2008 05:26
brian, this solution is not good, because there will always be events that you forget. For example, with this code you are vulnerable to the "onMouseEnter" attribute and tons of others that actually exist in the JavaScript specs.
brian at diamondsea dot com
03-Mar-2008 09:47
An update to agolna's update of sbritton's function:

Adds additional javascript events to the aDisabledAttributes array.

<?php
function strip_tags_attributes($sSource, $aAllowedTags = array(), $aDisabledAttributes = array('onabort', 'onblur', 'onchange', 'onclick', 'ondblclick', 'onerror', 'onfocus', 'onkeydown', 'onkeyup', 'onload', 'onmousedown', 'onmousemove', 'onmouseover', 'onmouseup', 'onreset', 'onresize', 'onselect', 'onsubmit', 'onunload'))
    {
        if (empty($aDisabledAttributes)) return strip_tags($sSource, implode('', $aAllowedTags));

        return preg_replace('/<(.*?)>/ie', "'<' . preg_replace(array('/javascript:[^\"\']*/i', '/(" . implode('|', $aDisabledAttributes) . ")[ \\t\\n]*=[ \\t\\n]*[\"\'][^\"\']*[\"\']/i', '/\s+/'), array('', '', ' '), stripslashes('\\1')) . '>'", strip_tags($sSource, implode('', $aAllowedTags)));
    }
?>
agolna at gmail dot com
29-Feb-2008 05:37
An update to sbritton's function:

If there is whitespace between the attribute name and the = sign, the attribute bypasses the regex.  This update fixes that.

<?php
function strip_tags_attributes($sSource, $aAllowedTags = array(), $aDisabledAttributes = array('onclick', 'ondblclick', 'onkeydown', 'onkeypress', 'onkeyup', 'onload', 'onmousedown', 'onmousemove', 'onmouseout', 'onmouseover', 'onmouseup', 'onunload'))
    {
        if (empty($aDisabledAttributes)) return strip_tags($sSource, implode('', $aAllowedTags));

        return preg_replace('/<(.*?)>/ie', "'<' . preg_replace(array('/javascript:[^\"\']*/i', '/(" . implode('|', $aDisabledAttributes) . ")[ \\t\\n]*=[ \\t\\n]*[\"\'][^\"\']*[\"\']/i', '/\s+/'), array('', '', ' '), stripslashes('\\1')) . '>'", strip_tags($sSource, implode('', $aAllowedTags)));
    }
?>
ZlobnyNigga
21-Feb-2008 06:22
sbritton's function is not so good: it misses attributes written with whitespace around the equals sign.
<?php
$str = "<p onmouseover = 'alert(1);'>123</p>";
echo strip_tags_attributes($str);
?>
sbritton
04-Feb-2008 08:35
The function below corrects a typo in y5's function to strip tags and attributes - it also adds lithium1330's recommended 's' parameter:

<?php
   
function strip_tags_attributes($sSource, $aAllowedTags = array(), $aDisabledAttributes = array('onclick', 'ondblclick', 'onkeydown', 'onkeypress', 'onkeyup', 'onload', 'onmousedown', 'onmousemove', 'onmouseout', 'onmouseover', 'onmouseup', 'onunload'))
    {
        if (empty($aDisabledAttributes)) return strip_tags($sSource, implode('', $aAllowedTags));

        return preg_replace('/<(.*?)>/ie', "'<' . preg_replace(array('/javascript:[^\"\']*/i', '/(" . implode('|', $aDisabledAttributes) . ")=[\"\'][^\"\']*[\"\']/i', '/\s+/'), array('', '', ' '), stripslashes('\\1')) . '>'", strip_tags($sSource, implode('', $aAllowedTags)));
    }
?>
lithium1330[(at)]msn.com
25-Jan-2008 02:02
Please note: in the code given by y5, Tony Freeman, tREXX [www.trexx.ch] and maybe others, you need to add the "s" modifier at the end of the preg_replace() regex (/ies) in order to strip attributes that have a line break before them; otherwise those attributes won't be stripped.
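The effect of the "s" modifier can be seen in isolation with a short sketch (the tag string is invented for the example, and the "e" modifier is left out because it is not needed to show the point):

```php
<?php
// Illustrative tag whose attribute starts on a new line.
$tag = "<p\nonclick=\"alert(1)\">";

// Without "s", the dot does not match the newline, so the tag is never captured.
$withoutS = preg_match('/<(.*?)>/i', $tag);

// With "s", the dot also matches newlines and the whole tag is captured.
$withS = preg_match('/<(.*?)>/is', $tag);

var_dump($withoutS, $withS);
```

A tag that is never captured is never passed to the inner attribute filter, which is how the line break lets the attribute through.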
bstrick at gmail dot com
15-Jan-2008 07:52
This will strip all PHP and HTML out of a file.  Leaves only plain txt.

// Open the search file
$file = fopen($filename, 'r');
               
// Get rid of all PHP code.       
$search = array('/<\?((?!\?>).)*\?>/s');
       
$text = fread($file, filesize($filename));

$new = strip_tags(preg_replace($search, '', $text));

echo $new;

fclose($file);

- Strick
y5
15-Jan-2008 06:59
An improved version of tREXX and Tony Freeman's code, this keeps the code clean while removing unwanted attributes, including the javascript: protocol. Unlike the built-in strip_tags() function, this takes an array for allowed tags, rather than a string. For example: array('<a>', '<object>');

I don't understand why the built-in function uses a string.. oh well =)

<?php
   
function strip_tags_attributes($sSource, $aAllowedTags = array(), $aDisabledAttributes = array('onclick', 'ondblclick', 'onkeydown', 'onkeypress', 'onkeyup', 'onload', 'onmousedown', 'onmousemove', 'onmouseout', 'onmouseover', 'onmouseup', 'onunload'))
    {
        if (empty($aDisabledEvents)) return strip_tags($sSource, implode('', $aAllowedTags));

        return preg_replace('/<(.*?)>/ie', "'<' . preg_replace(array('/javascript:[^\"\']*/i', '/(" . implode('|', $aDisabledAttributes) . ")=[\"\'][^\"\']*[\"\']/i', '/\s+/'), array('', '', ' '), stripslashes('\\1')) . '>'", strip_tags($sSource, implode('', $aAllowedTags)));
    }
?>
Matthieu Larcher
27-Jun-2007 06:44
I noticed some problems with the strip_selected_tags() function below: sometimes big chunks of content were suppressed...
Here is a modified version that should run better.

<?php
function strip_selected_tags($text, $tags = array())
{
    $args = func_get_args();
    $text = array_shift($args);
    $tags = func_num_args() > 2 ? array_diff($args,array($text))  : (array)$tags;
    foreach ($tags as $tag){
        while(preg_match('/<'.$tag.'(|\W[^>]*)>(.*)<\/'. $tag .'>/iusU', $text, $found)){
            $text = str_replace($found[0],$found[2],$text);
        }
    }

    return preg_replace('/(<('.join('|',$tags).')(|\W.*)\/>)/iusU', '', $text);
}

?>
birwin at suddensales dot com
23-Jun-2007 10:18
This is an upgrade to the illegal characters script by robt. This script handles the input even if one or all of the fields contain arrays. Of course, another loop could be added to handle arrays nested within arrays, but if you are savvy enough to be using nested arrays, you don't need me to rewrite the program.
<?
function screenForm($ary_check_for_html)
{
    // check array - reject if any content contains HTML.
    foreach($ary_check_for_html as $field_value)
    {
        if(is_array($field_value))
        {
            foreach($field_value as $field_array) // if the field value is an array, step through it
            {
                $stripped = strip_tags($field_array);
                if($field_array!=$stripped)
                {
                    // something in the field value was HTML
                    return false;
                }
            }
        }else{
            $stripped = strip_tags($field_value);
            if($field_value!=$stripped)
            {
                // something in the field value was HTML
                return false;
            }
        }
    }
    return true;
}
?>
geersc at hotmail dot com
12-May-2007 01:13
Hi,

I made the following adjustments to the "stripeentag()" function listed here.

Improvements are always welcome.

Regards,

Chris

<?php

function strip_attributes($msg, $tag, $attr, $suffix = "")
{
  $lengthfirst = 0;
  while (strstr(substr($msg, $lengthfirst), "<$tag ") != "")
  {
    $tag_start = $lengthfirst + strpos(substr($msg, $lengthfirst), "<$tag ");

    $partafterwith = substr($msg, $tag_start);

    $img = substr($partafterwith, 0, strpos($partafterwith, ">") + 1);
    $img = str_replace(" =", "=", $img);

    $out = "<$tag";
    for($i=0; $i < count($attr); $i++)
    {
      if (empty($attr[$i])) {
        continue;
      }
      $long_val =
        (strpos($img, " ", strpos($img, $attr[$i] . "=")) === FALSE) ?
        strpos($img, ">", strpos($img, $attr[$i] . "=")) - (strpos($img, $attr[$i] . "=") + strlen($attr[$i]) + 1) :
        strpos($img, " ", strpos($img, $attr[$i] . "=")) - (strpos($img, $attr[$i] . "=") + strlen($attr[$i]) + 1);
      $val = substr($img, strpos($img, $attr[$i] . "=" ) + strlen($attr[$i]) + 1, $long_val);
      if (!empty($val)) {
        $out .= " " . $attr[$i] . "=" . $val;
      }
    }
    if (!empty($suffix)) {
      $out .= " " . $suffix;
    }

    $out .= ">";
    $partafter = substr($partafterwith, strpos($partafterwith,">") + 1);
    $msg = substr($msg, 0, $tag_start). $out. $partafter;
    $lengthfirst = $tag_start + 3;
  }
  return $msg;
}

?>
lucky760 at yahoo dot com
23-Feb-2007 06:52
I needed a way to allow hyperlinks as the only HTML tags in user comments. This is easy enough to accomplish, but I also needed a way to convert full URLs into hyperlinks, and this complicated things a bit.

The functions below are not very elegant, but do the job. Function strip_tags_except() works similarly to the strip_selected_tags() function defined a few times on this page, but instead of allowing the user to specify the tags to strip, she can specify the tags to allow and strip all others. The third parameter, $strip, when TRUE removes "<" and ">" from the string and when FALSE converts them to "&lt;" and "&gt;" respectively.

Function url_to_link() simply converts full URLs into an equivalent hyperlink taking into consideration that users may end a URL with a character that's not actually part of the address.

When using both, url_to_link() should be called before strip_tags_except(). Here's an example as we are using it on http://www.VideoSift.com:
<?php
$summary = url_to_link($summary);
$summary = strip_tags_except($summary, array('a'), FALSE);
?>
Here are the function definitions:
<?php
function strip_tags_except($text, $allowed_tags, $strip=TRUE) {
  if (!is_array($allowed_tags))
    return $text;

  if (!count($allowed_tags))
    return $text;

  $open = $strip ? '' : '&lt;';
  $close = $strip ? '' : '&gt;';

  preg_match_all('!<\s*(/)?\s*([a-zA-Z]+)[^>]*>!',
    $text, $all_tags);
  array_shift($all_tags);
  $slashes = $all_tags[0];
  $all_tags = $all_tags[1];
  foreach ($all_tags as $i => $tag) {
    if (in_array($tag, $allowed_tags))
      continue;
    $text =
      preg_replace('!<(\s*' . $slashes[$i] . '\s*' .
        $tag . '[^>]*)>!', $open . '$1' . $close,
        $text);
  }

  return $text;
}

function url_to_link($text) {
  $text =
    preg_replace('!(^|([^\'"]\s*))' .
      '([hf][tps]{2,4}:\/\/[^\s<>"\'()]{4,})!mi',
      '$2<a href="$3">$3</a>', $text);
  $text =
    preg_replace('!<a href="([^"]+)[\.:,\]]">!',
    '<a href="$1">', $text);
  $text = preg_replace('!([\.:,\]])</a>!', '</a>$1',
    $text);
  return $text;
}
?>
rodt
16-Jan-2007 05:46
I have used this function successfully to prevent bots inserting HTML to web forms. Put the fields' contents into an array, then feed array to this function as an argument. Returns false if HTML is included; true if there is no HTML in any of the array's values. Hope it's helpful to someone.

    /*
    Checks that there is no HTML in any of the provided fields.

     $ary_check_for_html = Array to check for HTML content.
    */
    function screenForm($ary_check_for_html){
        // check array - reject if any content contains HTML.
        foreach($ary_check_for_html as $field_value) {
            $stripped = strip_tags($field_value);

            if($field_value!=$stripped) { // something in the field value was HTML
                return false;
            }
        }

        return true;
    }
uersoy at tnn dot net
26-Dec-2006 03:18
admin at automapit dot com's function is great. It cleans everything I don't need :). But there is a small problem: the strip-style-tags line should come before the strip-HTML-tags line. Otherwise the HTML-tags pattern removes the <style> and </style> tags first, and the CSS between them stays behind as text.

<?php
function html2txt($document){
$search = array('@<script[^>]*?>.*?</script>@si',  // Strip out javascript
               '@<style[^>]*?>.*?</style>@si',     // Strip style tags properly
               '@<[\/\!]*?[^<>]*?>@si',            // Strip out HTML tags
               '@<![\s\S]*?--[ \t\n\r]*>@'         // Strip multi-line comments including CDATA
);
$text = preg_replace($search, '', $document);
return $text;
}
?>
bermi ferrer
27-Nov-2006 11:40
Here is a faster and tested version of strip_selected_tags.

Previous example had a small bug that has been fixed now.

<?php

    function strip_selected_tags($text, $tags = array())
    {
        $args = func_get_args();
        $text = array_shift($args);
        $tags = func_num_args() > 2 ? array_diff($args,array($text))  : (array)$tags;
        foreach ($tags as $tag){
            if(preg_match_all( '/<'.$tag.'[^>]*>([^<]*)<\/'.$tag.'>/iu', $text, $found) ){
                $text = str_replace($found[0],$found[1],$text);
            }
        }

        return preg_replace( '/(<('.join('|',$tags).')(\\n|\\r|.)*\/>)/iu', '', $text);
    }

?>
bermi ferrer at (google it yourself :P )
24-Nov-2006 07:08
This is Salavert's function, improved with suggestions from this page, as refactored for the Akelos Framework (http://www.akelos.org) by Jose Salavert.

Please note that the "u" modifier needs to be lowercase. This function also removes self-closing tags (XHTML <br /> <hr />) and works if the text contains line breaks.

<?php

function strip_selected_tags($text, $tags = array())
{
    $args = func_get_args();
    $text = array_shift($args);
    $tags = func_num_args() > 2 ? array_diff($args,array($text))  : (array)$tags;
    foreach ($tags as $tag){
        if(preg_match_all('/<'.$tag.'[^>]*>((\\n|\\r|.)*)<\/'. $tag .'>/iu', $text, $found)){
            $text = str_replace($found[0],$found[1],$text);
        }
    }

    return preg_replace('/(<('.join('|',$tags).')(\\n|\\r|.)*\/>)/iu', '', $text);
}

?>
David
05-Nov-2006 09:29
<?php

    /**
     * strip_selected_tags ( string str [, string strip_tags[, strip_content flag]] )
     * ---------------------------------------------------------------------
     * Like strip_tags() but inverse; the strip_tags tags will be stripped, not kept.
     * strip_tags: string with tags to strip, ex: "<a><p><quote>" etc.
     * strip_content flag: TRUE will also strip everything between open and closed tag
     */
    function strip_selected_tags($str, $tags = "", $stripContent = false)
    {
        preg_match_all("/<([^>]+)>/i",$tags,$allTags,PREG_PATTERN_ORDER);
        foreach ($allTags[1] as $tag){
            if ($stripContent) {
                $str = preg_replace("/<".$tag."[^>]*>.*<\/".$tag.">/iU","",$str);
            }
            $str = preg_replace("/<\/?".$tag."[^>]*>/iU","",$str);
        }
        return $str;
    }

?>
anonymous
01-Nov-2006 02:52
A different approach to cleaning up HTML would be to first escape all unsafe characters:
& to &amp;
< to &lt;
> to &gt;
then to unescape matching pairs of tags back (e.g. "&lt;b&gt;hello&lt;/b&gt;" => "<b>hello</b>"), if it is identified safe.

This backwards-approach should be safer because if a tag is not identified correctly, it is, at the end, in an escaped state.

So if a user enters invalid html, or tags that are unsupported or unwanted, they are shown in plain text, and not stripped away. This is good, because the characters "<" and ">" might have been used in a different way (e.g. to make a text arrow: "a <=> b").
This is the case in most forums (apart from the fact that they use "[tag]"-tags instead of "<tag>"-tags)
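
A minimal sketch of the escape-first approach described above, assuming that only attribute-free tags are to be re-enabled. The function name escape_then_allow() and its default tag list are made up for illustration; they are not part of PHP.

```php
<?php
// Hypothetical helper: escape everything, then restore a few bare tags.
function escape_then_allow($text, $allowed = array('b', 'i', 'em', 'strong'))
{
    // Step 1: escape &, <, > and quotes so that nothing is live markup.
    $safe = htmlspecialchars($text, ENT_QUOTES, 'UTF-8');

    foreach ($allowed as $tag) {
        $t = preg_quote($tag, '#');
        // Step 2: un-escape only matching pairs of attribute-free tags.
        $safe = preg_replace(
            '#&lt;' . $t . '&gt;(.*?)&lt;/' . $t . '&gt;#is',
            '<' . $tag . '>$1</' . $tag . '>',
            $safe
        );
    }
    return $safe;
}

echo escape_then_allow('a <=> b, <b>hello</b> <script>x</script>');
// prints: a &lt;=&gt; b, <b>hello</b> &lt;script&gt;x&lt;/script&gt;
```

Anything the pattern does not recognise, including tags with attributes and unbalanced tags, simply stays escaped and is shown as plain text.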
pierresyraud at hotmail dot com
05-Oct-2006 10:43
The inverse function: it strips all the text and keeps only the HTML tags!

function strip_text($a){
$i=-1;$n='';$ok=1;
while(isset($a[++$i])){
    if($ok&&$a[$i]!='<'){continue;}
    elseif($a[$i]=='>'){$ok=1;$n.='>';continue;}
    elseif($a[$i]=='<'){$ok=0;}
    if(!$ok){$n.=$a[$i];}}
  return $n;}
magdolen at elepha dot info
01-Oct-2006 05:24
I edited the strip_selected_tags function that salavert created so that it also strips single (self-closing) tags (XHTML only).

Here it is, together with metric's modification:

function strip_selected_tags($text, $tags = array()) {
    $args = func_get_args();
    // metric edit
    $text = preg_replace("/\r\n|\n|\r/","",array_shift($args));
    $tags = func_num_args() > 2 ? array_diff($args,array($text))  : (array)$tags;
   
    foreach ($tags as $tag){
        if(preg_match_all('/<'.$tag.'[^>]*>(.*)<\/'.$tag.'>/iU', $text, $found)){
            $text = str_replace($found[0],$found[1],$text);
        }
        // hrax edit
        if(preg_match_all('/<'.$tag.'.*\/>/iU', $text, $found)){
            $text = str_replace($found[0], "", $text);
        }
    }
   
    return $text;
}
jausions at php dot net
19-Sep-2006 09:57
To sanitize any user input, you should also consider PEAR's HTML_Safe package.

http://pear.php.net/package/HTML_Safe
bfmaster_duran at yahoo dot com dot br
14-Sep-2006 04:32
I wrote this function, using regular expressions, to remove some style properties from tags. It is based on other examples here ;D
<?php
function removeAttributes($htmlText)
{
    $stripAttrib = "'\\s(class)=\"(.*?)\"'i"; //remove classes from html tags;
    $htmlText = preg_replace($stripAttrib, '', $htmlText);
    $stripAttrib = "/(font\-size|color|font\-family|line\-height):\\s".
                   "(\\d+(\\x2E\\d+\\w+|\\W)|\\w+)(;|)(\\s|)/i";
    //remove font-size,color,font-family,line-height from style attributes in the text;
    $htmlText = preg_replace($stripAttrib, '', $htmlText);
    $htmlText = str_replace(" style=\"\"", '', $htmlText); //remove style attributes left empty by the preg_replace above (style="");
    return $htmlText;
}
function removeEvilTagsCallback($matches)
{
    // $matches[1] is everything between "<" and ">"
    return '<'.removeAttributes($matches[1]).'>';
}
function removeEvilTags($source)
{
    return preg_replace_callback('/<(.*?)>/i', 'removeEvilTagsCallback', $source);
}
?>

Usage:
<?php

$text = '<p style="line-height: 150%; font-weight: bold" class="MsoNormal"><span style="font-size: 10.5pt; line-height: 150%; font-family: Verdana">Com o compromisso de pioneirismo e aprimoramento, caracter&iacute;sticas da Oftalmocl&iacute;nica, novos equipamentos foram adquiridos para exames e diagn&oacute;sticos ainda mais precisos:</span></p>'; //This text is in Brazilian Portuguese ;D

echo htmlentities(removeEvilTags($text))."\r\n";

//This displays: <p style="font-weight: bold"><span>Com o compromisso de pioneirismo e aprimoramento, caracter&iacute;sticas da Oftalmocl&iacute;nica, novos equipamentos foram adquiridos para exames e diagn&oacute;sticos ainda mais precisos:</span></p>

?>

W0oT ! This is fantastic !

If you find an error, please report it to me by e-mail ;D

(Y)
metric at 152 dot org
10-Aug-2006 09:46
I tried using the strip_selected_tags function that salavert created. It works really well for single-line text, but if the text contains hard returns it can't find the closing tag.

I altered the line where it shifts the text into a variable so that it first removes the line breaks of any OS:
$text = preg_replace("/\r\n|\n|\r/","",array_shift($args));
admin at automapit dot com
09-Aug-2006 08:01
<?php
function html2txt($document){
$search = array('@<script[^>]*?>.*?</script>@si',  // Strip out javascript
               '@<[\/\!]*?[^<>]*?>@si',            // Strip out HTML tags
               '@<style[^>]*?>.*?</style>@si',     // Strip style tags properly
               '@<![\s\S]*?--[ \t\n\r]*>@'         // Strip multi-line comments including CDATA
);
$text = preg_replace($search, '', $document);
return $text;
}
?>

This function turns HTML into text: it strips tags, comments spanning multiple lines (including CDATA), and anything else that gets in its way.

It's a frankenstein function I made from bits picked up on my travels through the web, thanks to the many who have unwittingly contributed!
elgios at gmail dot com
05-Aug-2006 10:33
I think that the new function works, but it doesn't remove PHP tags, only HTML!

<?php
function theRealStripTags2($string)
{
    $tam = strlen($string);
    // $tam holds the number of characters in the string

    $newstring = "";
    // $newstring will be returned

    $tag = 0;
    /* if $tag = 0 => copy the character from $string to $newstring
       if $tag > 0 => don't copy. One or more '<' were found and we need
       to find the '>'. If we found 3 '<' we need to find all 3 '>'
    */

    /* I am a C programmer. Walking through a string is natural for me and more efficient
    */
    for ($i = 0; $i < $tam; $i++) {
        // On '<', increment $tag and continue without copying
        if ($string[$i] == '<') {
            $tag++;
            continue;
        }

        // On '>', decrement $tag and continue
        if ($string[$i] == '>') {
            if ($tag) {
                $tag--;
            }
            /* $tag never becomes negative. If the string is "<b>test</b>>"
               (an error, of course) $tag stops at 0
            */
            continue;
        }

        // if $tag is 0, we can copy
        if ($tag == 0) {
            $newstring .= $string[$i]; // simple copy, one character only
        }
    }
    return $newstring;
}

echo theRealStripTags2("<tag>test</tag>");
// prints "test"

?>
elgios at gmail dot com
04-Aug-2006 06:24
I think that the new function works.

function theRealStripTags2($string)
{

    $tam=strlen($string);
    // $tam holds the number of characters in the string

    $newstring="";
    // $newstring will be returned

    $tag=0;
    /* tag = 0 => copy the character from string to newstring
       tag > 0 => don't copy. One or more '<' were found and
          we need to find the '>'. If we find 3 '<' we need to find
          all 3 '>'
    */

    /* I am a C programmer. Seeking in a string is natural for me
        and more efficient

        Problem: copying a string to another string is more
        efficient but uses more memory!!!
    */
    for ($i=0; $i < $tam; $i++){

        /* On '<', $tag++ and continue without copying */
        if ($string[$i] == '<'){
            $tag++;
            continue;
        }

        /* On '>', decrease $tag and continue */
        if ($string[$i] == '>'){
            if ($tag){
                $tag--;
            }
        /* $tag never becomes negative. If the string is "<b>test</b>>" (an error, of course)
            $tag stops at 0
        */
            continue;
        }

        /* if $tag is 0, we can copy */
        if ($tag == 0){
            $newstring .= $string[$i]; // simple copy, one character only
        }
    }
        return $newstring;
}
Sébastien
24-May-2006 06:22
Hmm, it seems that your function "theRealStripTags" won't behave correctly in some cases, for example:

<?php
theRealStripTags("<!-- I want to put a <div>tag</div> -->");
theRealStripTags("<!-- Or a carrot > -->");
theRealStripTags("<![CDATA[what about this! It's to protect from HTML characters like <tag>, > and so on in XML, no?]]> -->");
?>
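
One possible way around the comment cases, as a rough sketch: delete whole HTML comments with a regular expression before handing the text to a character-walking stripper. remove_html_comments() is a hypothetical helper name, and CDATA sections are deliberately left alone here.

```php
<?php
// Hypothetical helper: drop <!-- ... --> blocks as whole units.
function remove_html_comments($html)
{
    // The "s" modifier lets "." span line breaks; ".*?" stops at the first "-->".
    return preg_replace('/<!--.*?-->/s', '', $html);
}

echo remove_html_comments('before<!-- Or a carrot > -->after');
// prints: beforeafter
```

Because the comment is removed in one piece, the ">" and "<div>" inside it never reach the tag counter.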
xyexz at yahoo dot com
09-May-2006 06:41
I have found that this function sometimes removes only the first angle bracket of a tag and leaves the rest of the tag in the string, which obviously isn't what I'm looking for.

EX:
<?php

//Returns "tag>test/tag>"
echo strip_tags("<tag>test</tag>");

?>

I'm running strip_tags on a string that I import from XML, so perhaps that has something to do with it, but if you've run into the same issue, I've written a function to fix it once and for all!

<?php

function theRealStripTags($string)
{
    // while there are tags left to remove, find the first opening angle bracket
    while (($currentBeg = strpos($string, '<')) !== false)
    {
        // find the closing angle bracket that follows it
        $currentEnd = strpos($string, '>', $currentBeg);
        if ($currentEnd === false) {
            break; // unterminated tag: leave the rest untouched
        }

        // keep the string before the opening bracket
        $tmpStringBeg = substr($string, 0, $currentBeg);

        // keep the string after the closing bracket
        $tmpStringEnd = substr($string, $currentEnd + 1);

        // cut the tag from the string
        $string = $tmpStringBeg.$tmpStringEnd;
    }

    return $string;
}

//Prints "test"
echo theRealStripTags('<tag>test</tag>');

?>
soapergem at gmail dot com
28-Apr-2006 07:21
In my prior comment I made a mistake that needs correcting. Please change the forward slashes that begin and terminate my regular expression to a different character, like the at-sign (@), for instance. Here's what it should read:

$regex  = '@</?\w+((\s+\w+(\s*=\s*';
$regex .= '(?:".*?"|\'.*?\'|[^\'">\s]+))?)+';
$regex .= '\s*|\s*)/?>@i';

(There were forward-slashes embedded in the regular expression itself, so using them to begin and terminate the expression would have caused a parse error.)
JeremysFilms.com
07-Apr-2006 11:57
A simple little function for blocking tags by replacing the '<' and '>' characters with their HTML entities.  Good for simple posting systems that you don't want to have a chance of stripping non-HTML tags, or just want everything to show literally without any security issues:

<?php

function block_tags($string){
    $replaced_string = str_replace('<','&lt;',$string);
    $replaced_string = str_replace('>','&gt;',$replaced_string);
    return $replaced_string;
}

echo block_tags('<b>HEY</b>'); //Prints &lt;b&gt;HEY&lt;/b&gt;

?>
cesar at nixar dot org
07-Mar-2006 09:44
Here is a recursive function for strip_tags, like the one shown on the stripslashes manual page.

<?php
function strip_tags_deep($value)
{
  return is_array($value) ?
    array_map('strip_tags_deep', $value) :
    strip_tags($value);
}

// Example
$array = array('<b>Foo</b>', '<i>Bar</i>', array('<b>Foo</b>', '<i>Bar</i>'));
$array = strip_tags_deep($array);

// Output
print_r($array);
?>
debug at jay dot net
24-Feb-2006 12:24
If you wish to steal quotes:
$quote=explode( "\n",
str_replace(array('document.writeln(\'','\')',';'),'',
strip_tags(
file_get_contents('http://www.quotationspage.com/data/1mqotd.js')
)
)
);
use $quote[2] & $quote[3]
It gives you a quote a day
balluche AROBASE free.fr
18-Feb-2006 12:16
//balluche:22/01/04:Remove even bad tags
function strip_bad_tags($html)
{
    $s = preg_replace ("@</?[^>]*>*@", "", $html);
    return $s;
}
salavert at~ akelos
13-Feb-2006 12:21
<?php
    /**
    * Works like PHP function strip_tags, but it only removes selected tags.
    * Example:
    *     strip_selected_tags('<b>Person:</b> <strong>Salavert</strong>', 'strong') => <b>Person:</b> Salavert
    */

    function strip_selected_tags($text, $tags = array())
    {
        $args = func_get_args();
        $text = array_shift($args);
        $tags = func_num_args() > 2 ? array_diff($args,array($text))  : (array)$tags;
        foreach ($tags as $tag){
            if(preg_match_all('/<'.$tag.'[^>]*>(.*)<\/'.$tag.'>/iU', $text, $found)){
                $text = str_replace($found[0],$found[1],$text);
            }
        }

        return $text;
    }

?>

Hope you find it useful,

Jose Salavert
webmaster at tmproductionz dot com
02-Feb-2006 05:28
<?php

function remove_tag ( $tag , $data ) {

    while ( ( $it = stripos ( $data , "<" . $tag ) ) !== false ) {

        $end = stripos ( $data , "</" . $tag . ">" , $it ) ;

        if ( $end === false ) {
            break ; // no closing tag: stop instead of looping forever
        }

        $it2   = $end + strlen ( $tag ) + 3 ;

        $temp  = substr ( $data , 0    , $it  ) ;

        $temp2 = substr ( $data , $it2 ) ;

        $data = $temp . $temp2 ;

    }

    return $data ;

}

?>

This code removes every occurrence of the specified tag (and only that tag), including everything between its opening and closing tag, from a given haystack.
lucahomer at hotmail dot com
30-Jan-2006 03:42
I think the regular expression posted in an earlier note (function.strip-tags.php#51383) is not correct:

<?php
$disalowedtags = array("font");

foreach ($_GET as $varname)
foreach ($disalowedtags as $tag)
if (eregi("<[^>]*".$tag."*\"?[^>]*>", $varname)) // <--- this line
die("stop that");

?>

This expression also matches links like this:
<a href=font.php>test</a>
because the word "font" appears between "<" and ">".

I changed the regular expression to this:
-----------------------------------------------------
if (eregi("(<|</)".$tag."*\"?[^>]*>", $varname))
-----------------------------------------------------

bye

Luca
Nyks
11-Oct-2005 11:39
Note for BRYN at drumdatabse dot com (http://www.php.net/manual/fr/function.strip-tags.php#52085) :

I've changed your script to support more cases.
- The first WHILE loop repeats the second WHILE loop, in order to strip the HTML tags that were cut in two by the substr() function (and so were not recognized by the strip_tags() function).
- There are no more bugs with substr($textstring,0,1024): when the WHILE loop ran for the second, third, fourth... time and the length of $textstring was smaller than 1024, it returned an error.

<?php
function strip_tags_in_big_string($textstring){
    while ($textstring != strip_tags($textstring)) {
        $safetext = ''; // start each pass with an empty result
        while (strlen($textstring) != 0) {
            if (strlen($textstring) > 1024) {
                $otherlen = 1024;
            } else {
                $otherlen = strlen($textstring);
            }
            $temptext = strip_tags(substr($textstring,0,$otherlen));
            $safetext .= $temptext;
            $textstring = substr_replace($textstring,'',0,$otherlen);
        }
        $textstring = $safetext;
    }
    return $textstring;
}
?>
info at christopher-kunz dot de
29-Aug-2005 04:34
Please note that the function supplied by daneel at neezine dot net is not a good way of avoiding XSS attacks. A string like
<font size=">>" <script>alert("foo")</script> face="tahoma" color="#DD0000">salut</font>
will be sanitized to
<font>>" <script>alert("foo")</script> face="tahoma" color="#DD0000">salut</font>
which is a pretty good XSS.

If you are in need of XSS cleaning, you might want to consider the Pixel-Apes XSS cleaner: http://pixel-apes.com/safehtml
daneel at neezine dot net
23-Aug-2005 03:08
Remove all attributes from a tag except the ones specified. This is a correction of the cool routine from joris878 (which didn't seem to work as posted), plus an example.
When is PHP going to support this natively?
Sorry for my English. I hope everybody understands.

<?php
function stripeentag($msg,$tag,$attr) {
  $lengthfirst = 0;
  while (strstr(substr($msg,$lengthfirst),"<$tag ")!="")
  {
    $imgstart = $lengthfirst + strpos(substr($msg,$lengthfirst), "<$tag ");
    $partafterwith = substr($msg,$imgstart);
    $img = substr($partafterwith,0,strpos($partafterwith,">")+1);
    $img = str_replace(" =","=",$img);
    $out = "<$tag";

    for($i=0; $i <= (count($attr) - 1 );$i++)
    {
      $long_val = strpos($img," ",strpos($img,$attr[$i]."=")) - (strpos($img,$attr[$i]."=") + strlen($attr[$i]) + 1) ;
      $val = substr($img, strpos($img,$attr[$i]."=") + strlen($attr[$i]) + 1,$long_val);
      // append the attribute only if a value was found
      if(strlen($val)>0) $out .= " ".$attr[$i]."=".$val;
    }

    $out .= ">";
    $partafter = substr($partafterwith,strpos($partafterwith,">")+1);
    $msg = substr($msg,0,$imgstart).$out.$partafter;
    $lengthfirst = $imgstart+3;
  }
  return $msg;
}

$message = "<font size=\"10\" face=\"tahoma\" color=\"#DD0000\" >salut</font>" ;

//we want only the "color" attribute
$message = stripeentag($message,"font",array("color"));

echo $message ;
?>
10-Aug-2005 10:08
<?php
/**removes specified tags from the text, where each tag requires a
 *closing tag; if the latter
 *is not found then everything after will be removed
 *typical usage:
 *some html text, array('script','body','html') - all lower case*/
function removeTags($text,$tags_array){
    $length = strlen($text);
    $pos = 0;
    $tags_array = array_flip($tags_array);
    while ($pos < $length && ($pos = strpos($text,'<',$pos)) !== false){
        $dlm_pos = strpos($text,' ',$pos);
        $dlm2_pos = strpos($text,'>',$pos);
        //no '>' after this '<': nothing more to parse
        if ($dlm2_pos === false) break;
        if ($dlm_pos === false || $dlm_pos > $dlm2_pos) $dlm_pos = $dlm2_pos;
        $which_tag = strtolower(substr($text,$pos+1,$dlm_pos-($pos+1)));
        $tag_length = strlen($which_tag);
        if (!isset($tags_array[$which_tag])){
            //if no tag matches found
            ++$pos;
            continue;
        }
        //find the end
        $sec_tag = '</'.$which_tag.'>';
        $sec_pos = stripos($text,$sec_tag,$pos+$tag_length);
        //remove everything after if end of the tag not found
        if ($sec_pos === false) $sec_pos = $length-strlen($sec_tag);
        $rmv_length = $sec_pos-$pos+strlen($sec_tag);
        $text = substr_replace($text,'',$pos,$rmv_length);
        //update length
        $length = $length - $rmv_length;
        $pos++;
    }
    return $text;
}
?>
erwin at spammij dot nl
08-Jul-2005 06:13
If you want to disable HTML altogether, you can easily replace all instances of < and >, which stops any HTML code from working.
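
A short sketch of that replacement using the built-in htmlspecialchars(), which also escapes "&" and quotes. The wrapper name show_literally() is only an illustration.

```php
<?php
// Hypothetical wrapper name; htmlspecialchars() itself is a PHP built-in.
function show_literally($text)
{
    // Converts &, <, > and both quote styles to entities.
    return htmlspecialchars($text, ENT_QUOTES, 'UTF-8');
}

echo show_literally('<b>HEY</b> & more');
// prints: &lt;b&gt;HEY&lt;/b&gt; &amp; more
```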
php at scowen dot com
07-Jun-2005 10:50
I have had a similar problem to kangaroo232002 at yahoo dot co dot uk when stripping tags from html containing javascript. The javascript can obviously contain '>' and '<' as comparison operators which are seen by strip_tags() as html tags - leading to undesired results.

To christianbecke at web dot de - this can be third-party html, so although perhaps not always 'correct', that's how it is!
anonymous
27-May-2005 10:45
Someone can still use attributes, such as CSS, in the tags you allow.
For example, if you strip all tags except <b>, a user can still write <b style="color: red; font-size: 45pt">Hello</b>, which might be undesired.

Maybe BBCode would be an option.
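
A rough sketch of one way to handle this, assuming you want to keep the allowed tags but none of their attributes. strip_tags_and_attributes() is a made-up name, and the pattern is deliberately simple: an attribute value that itself contains ">" will confuse it, so treat this as an illustration rather than a complete sanitizer.

```php
<?php
// Hypothetical helper: keep the allowed tags but drop every attribute.
function strip_tags_and_attributes($html, $allowable_tags)
{
    $html = strip_tags($html, $allowable_tags);
    // Rewrite "<tag anything>" to "<tag>"; closing tags are left alone.
    return preg_replace('#<([a-z][a-z0-9]*)\b[^>]*>#i', '<$1>', $html);
}

echo strip_tags_and_attributes('<b style="color: red; font-size: 45pt">Hello</b> <i>x</i>', '<b>');
// prints: <b>Hello</b> x
```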
bazzy
23-Apr-2005 03:09
I think bryn and john780 are missing the point - eric at direnetworks wasn't suggesting there is an overall string limit of 1024 characters but rather that actual tags over 1024 characters long (eg, in his case it sounds like a really long encrypted <a href> tag) will fail to be stripped.

The functions to slowly pass strings through strip_tags 1024 characters at a time aren't necessary and are actually counter productive (since if a tag spans the break point, ie it is opened before the 1024 characters and closed after the 1024 characters then only the opening tag is removed which leaves a mess of text up to the closing tag).

Only mentioning this as I spent ages working out a better way to deal with this character spanning before I actually went back and read eric's post and realised the subsequent posts were misleading - hopefully it'll save others the same headaches :)
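
To illustrate the break-point problem, here is a small sketch with a hypothetical strip_tags_chunked() helper and a chunk size of 8 instead of 1024, so that the effect shows up on a short string.

```php
<?php
// Hypothetical helper mirroring the chunked approach discussed above.
function strip_tags_chunked($text, $size)
{
    $out = '';
    foreach (str_split($text, $size) as $chunk) {
        // Each chunk is stripped in isolation, so a tag cut in two
        // is never seen as one complete tag.
        $out .= strip_tags($chunk);
    }
    return $out;
}

$html = 'abc<a href="x">link</a>';
var_dump(strip_tags($html));             // whole string at once: "abclink"
var_dump(strip_tags_chunked($html, 8));  // chunked: remains of the split tag are left behind
```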
bryn -at- drumdatabase dot net
21-Apr-2005 12:38
Further to john780's idea for a solution to the 1024 character limit of strip_tags - it's a good one, but I think the ltrim function isn't the one for the job? I wrote this simple function to get around the limit (I'm a newbie, so there may be some problem / better way of doing it!):

<?php
function strip_tags_in_big_string($textstring){
    $safetext = '';
    while (strlen($textstring) != 0)
        {
        $temptext = strip_tags(substr($textstring,0,1024));
        $safetext .= $temptext;
        $textstring = substr_replace($textstring,'',0,1024);
        }
    return $safetext;
}
?>

Hope someone finds it useful.
cz188658 at tiscali dot cz
07-Apr-2005 11:21
If you want to keep XHTML tags like <br /> (self-closing tags), you must include the tag <br> in the allowable_tags parameter.
Jiri
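
A quick check of how allowable_tags treats self-closing tags, as a minimal example (the input string is made up):

```php
<?php
// "<br>" in allowable_tags also covers the self-closing "<br />" form.
echo strip_tags('one<br />two<hr />three', '<br>');
// prints: one<br />twothree
```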
php at arzynik dot com
29-Mar-2005 03:04
Instead of removing tags that you don't want, sometimes you might want to just stop them from doing anything.

<?php
$disalowedtags = array("script",
                       "object",
                       "iframe",
                       "image",
                       "applet",
                       "meta",
                       "form",
                       "onmouseover",
                       "onmouseout");

foreach ($_GET as $varname)
foreach ($disalowedtags as $tag)
if (eregi("<[^>]*".$tag."*\"?[^>]*>", $varname))
die("stop that");

foreach ($_POST as $varname)
foreach ($disalowedtags as $tag)
if (eregi("<[^>]*".$tag."*\"?[^>]*>", $varname))
die("stop that");

?>
christianbecke at web dot de
16-Feb-2005 04:34
to kangaroo232002 at yahoo dot co dot uk:

As far as I understand, what you report is not a bug in strip_tags(), but a bug in your HTML.
You should use alt='Go &gt;' instead of alt='Go >'.

I suppose your HTML displays all right in browsers, but that does not mean it's correct. It just shows that browsers are more forgiving than strip_tags() about characters that are not properly escaped as entities.
kangaroo232002 at yahoo dot co dot uk
03-Feb-2005 03:23
After wondering why the following was indexed in my trawler despite stripping all text in tags (and punctuation) " valign left align middle border 0 src go gif name search1 onclick search", please take a quick look at what produced it: <DIV style="position: absolute; TOP:22%; LEFT:68%;"><input type="image" alt="Go >" valign="left" align="middle" border=0 src="go.gif" name="search1" onClick="search()"></div>...

Looking at this closely, you can see that although the 'Go >' text (with the right-facing chevron) is enclosed in quotation marks, strip_tags() still assumes that the chevron ends the input tag, and treats everything after it as text. I'm not sure whether this has been fixed in later versions; I'm using v4.3.3...

good hunting.
jon780 -at- gmail.com
03-Feb-2005 07:18
To eric at direnetworks dot com regarding the 1024 character limit:

You could simply ltrim() the first 1024 characters, run them through strip_tags(), add them to a new string, and remove them from the first.

Perform this in a loop which continued until the original string was of 0 length.
dumb at coder dot com
17-Jan-2005 02:22
/*
15Jan05

Within <textarea>, Browsers auto render & display certain "HTML Entities" and "HTML Entity Codes" as characters:
&lt; shows as <    --    &amp; shows as &    --    etc.

Browsers also automatically change any "HTML Entity Codes" entered in a <textarea> into the resulting display characters BEFORE UPLOADING.  There's no way to change this, which makes it difficult to edit HTML in a <textarea>.

"HTML Entity Codes" (i.e., use of &#60; to represent "<", &#38; to represent "&", &#160; to represent "&nbsp;") can be used instead.  Therefore, we need to "HTML-Entitize" the data for display, which changes the raw/displayed characters into their HTML Entity Code equivalents before they are shown in a <textarea>.

how would I get a textarea to contain "&lt;" as a literal string of characters and not have it display a "<"
&amp;lt; is indeed the correct way of doing that. And if you wanted to display that, you'd need to use &amp;amp;lt;'. That's just how HTML entities work.

htmlspecialchars() is a subset of htmlentities().
The reverse (i.e., changing HTML entity codes into displayed characters) is done with html_entity_decode().

google on ns_quotehtml and see http://aolserver.com/docs/tcl/ns_quotehtml.html
see also http://www.htmlhelp.com/reference/html40/entities/
*/
eric at direnetworks dot com
21-Dec-2004 04:36
The strip_tags() function in both PHP 4.3.8 and 5.0.2 (probably many more, but these are the only two versions I tested) has a maximum tag length of 1024.  If you try to process a tag over this limit, strip_tags() will not return that line (as if it were an illegal tag).  I noticed this problem while trying to parse a PayPal encrypted link button (<input type="hidden" name="encrypted" value="encryptedtext">, with <input> as an allowed tag), which is 2702 characters long.  I can't really think of any workaround other than parsing each tag to work out its length and only sending it to strip_tags() if it's under 1024, but at that point I might as well be stripping the tags myself.
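As a rough sketch of the "figure out the length of each tag" idea, something like the following could at least report which tags exceed the limit. The function name findLongTags() is made up here, the 1024 default simply mirrors the figure reported above, and the pattern is naive (a '>' inside an attribute value ends the match early).

```php
<?php
// Hypothetical helper: returns every tag in $html longer than $limit characters.
function findLongTags($html, $limit = 1024)
{
    // Naive tag matcher: everything from "<" up to the next ">".
    preg_match_all('/<[^>]*>/', $html, $matches);

    $long = array();
    foreach ($matches[0] as $tag) {
        if (strlen($tag) > $limit) {
            $long[] = $tag;
        }
    }
    return $long;
}

$html = '<input type="hidden" name="encrypted" value="' . str_repeat('x', 2000) . '"><b>ok</b>';
echo count(findLongTags($html)), "\n"; // prints: 1
```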
ashley at norris dot org dot au
01-Nov-2004 05:11
leathargy at hotmail dot com wrote:

"it seems we're all overlooking a few things:
1) if we replace "</ta</tableble>" by removing </table, we're not better off..."

I beat this by using ($input contains the data):

<?php
while ($input != strip_tags($input)) {
    $input = strip_tags($input);
}
?>

This iteratively strips tags until all tags have gone :)
@dada
29-Sep-2004 03:41
If you only want the text within the tags, you can use this function:

function showtextintags($text)
{
    // Replace <script> blocks, including their contents, with the placeholder "dada"
    $text = preg_replace("/(\<script)(.*?)(script>)/si", "dada", $text);
    $text = strip_tags($text);
    // Neutralize any comment openers that are still left
    $text = str_replace("<!--", "&lt;!--", $text);
    // Keep only the text between a leftover "<" and a closing "-->"
    $text = preg_replace("/(\<)(.*?)(--\>)/mi", "\\2", $text);

    return $text;
}

It will show all the text without tags and (!!!) without JavaScript.
Anonymous User
22-Aug-2004 07:24
Be aware that tags constitute visual whitespace, so stripping may leave the resulting text looking misjoined.

For example,

"<strong>This is a bit of text</strong><p />Followed by this bit"

are visually separate paragraphs, but simply stripping the tags will result in

"This is a bit of textFollowed by this bit"

which may not be what you want, e.g. if you are creating an excerpt for an RSS description field.

The workaround is to force whitespace prior to stripping, using something like this:

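      $text = getTheText();
      // pad every tag with spaces so adjacent text cannot run together
      $text = preg_replace('/</',' <',$text);
      $text = preg_replace('/>/','> ',$text);
      $desc = html_entity_decode(strip_tags($text));
      $desc = preg_replace('/[\n\r\t]/',' ',$desc);
      // collapse runs of two or more spaces into a single space
      $desc = preg_replace('/  +/',' ',$desc);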
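For what it's worth, the same workaround can be wrapped in a small self-contained function. This is only a sketch: the name stripTagsSpaced() is invented, and it collapses all whitespace with a single \s+ pattern rather than the two separate steps used above.

```php
<?php
// Hypothetical helper: strips tags but keeps the words on either side of a tag apart.
function stripTagsSpaced($html)
{
    // Pad every tag with spaces so adjacent text cannot run together.
    $html = str_replace(array('<', '>'), array(' <', '> '), $html);
    $text = html_entity_decode(strip_tags($html));
    // Collapse all runs of whitespace into single spaces and trim the ends.
    return trim(preg_replace('/\s+/', ' ', $text));
}

echo stripTagsSpaced('<strong>This is a bit of text</strong><p />Followed by this bit'), "\n";
// prints: This is a bit of text Followed by this bit
```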
Isaac Schlueter php at isaacschlueter dot com
17-Aug-2004 05:32
steven --at-- acko --dot-- net pointed out that you can't make strip_tags() allow comments.  With this function, you can.  Just pass <!--> as one of the allowed tags.  Easy as pie: just pull them out, strip, and then put them back.

<?php
function strip_tags_c($string, $allowed_tags = '')
{
    $allow_comments = ( strpos($allowed_tags, '<!-->') !== false );
    if( $allow_comments )
    {
        $string = str_replace(array('<!--', '-->'), array('&lt;!--', '--&gt;'), $string);
        $allowed_tags = str_replace('<!-->', '', $allowed_tags);
    }
    $string = strip_tags( $string, $allowed_tags );
    if( $allow_comments ) $string = str_replace(array('&lt;!--', '--&gt;'), array('<!--', '-->'), $string);
    return $string;
}
?>
Isaac Schlueter php at isaacschlueter dot com
16-Aug-2004 09:16
I am creating a rendering plugin for a CMS (http://b2evolution.net) that wraps certain bits of text in acronym tags.  The problem is that if you have something like this:
<a href="http://www.php.net" title="PHP is cool!">PHP</a>

then the plugin will mangle it into:

<a href="http://www.<acronym title="PHP: Hypertext Preprocessor">php</acronym>.net" title="<acronym title="PHP: Hypertext Preprocessor">PHP</acronym> is cool!">PHP</a>

This function will strip out tags that occur within other tags.  It's not super-useful in tons of situations, but it was an interesting puzzle.  I had started out using preg_replace, but it got ridiculously complicated when there were line breaks and multiple instances in the same tag.

The CMS does its XHTML validation before the content gets to the plugin, so we can be pretty sure that the content is well-formed, except for the tags inside of other tags.

<?php
if( !function_exists( 'antiTagInTag' ) )
{
    // $content is the string to be anti-tagintagged, and $format sets the format of the internals.
    function antiTagInTag( $content = '', $format = 'htmlhead' )
    {
        if( !function_exists( 'format_to_output' ) )
        {   // Use the external function if it exists, or fall back on just strip_tags.
            function format_to_output($content, $format)
            {
                return strip_tags($content);
            }
        }
        // Initialise the lists so str_replace() below also works when no tag is found.
        $tags = array();
        $newtags = array();
        $tagend = -1;
        for( $tagstart = strpos( $content, '<', $tagend + 1 ) ; $tagstart !== false && $tagstart < strlen( $content ); $tagstart = strpos( $content, '<', $tagend ) )
        {
            // got the start of a tag.  Now find the proper end!
            $walker = $tagstart + 1;
            $open = 1;
            while( $open != 0 && $walker < strlen( $content ) )
            {
                $nextopen = strpos( $content, '<', $walker );
                $nextclose = strpos( $content, '>', $walker );
                if( $nextclose === false )
                {   // ERROR! Open waka without close waka!
                    // echo '<code>Error in antiTagInTag - malformed tag!</code> ';
                    return $content;
                }
                if( $nextopen === false || $nextopen > $nextclose )
                {   // No more opens, but there was a close; or, a close happens before the next open.
                    // walker goes to the close+1, and open decrements
                    $open --;
                    $walker = $nextclose + 1;
                }
                elseif( $nextopen < $nextclose )
                {   // an open before the next close
                    $open ++;
                    $walker = $nextopen + 1;
                }
            }
            $tagend = $walker;
            if( $tagend > strlen( $content ) )
                $tagend = strlen( $content );
            else
            {
                $tagend --;
                $tagstart ++;
            }
            // $tag is everything between the outer '<' and its matching '>'
            $tag = substr( $content, $tagstart, $tagend - $tagstart );
            $tags[] = '<' . $tag . '>';
            $newtag = format_to_output( $tag, $format );
            $newtags[] = '<' . $newtag . '>';
        }

        $content = str_replace($tags, $newtags, $content);
        return $content;
    }
}
?>
Tony Freeman
20-Nov-2003 12:45
This is a slightly altered version of tREXX's code.  The difference is that this one simply removes the unwanted attributes (rather than flagging them as forbidden).

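function removeEvilAttributes($tagSource)
{
        // Matches style="..." and class="..." (double-quoted values only)
        $stripAttrib = "' (style|class)=\"(.*?)\"'i";
        return preg_replace($stripAttrib, '', $tagSource);
}

// Callback for preg_replace_callback(): $matches[1] is the text between < and >
function removeEvilAttributesCallback($matches)
{
        return '<' . removeEvilAttributes($matches[1]) . '>';
}

function removeEvilTags($source)
{
    $allowedTags='<a><br><b><h1><h2><h3><h4><i>' .
             '<img><li><ol><p><strong><table>' .
             '<tr><td><th><u><ul>';
    $source = strip_tags($source, $allowedTags);
    // A callback avoids the /e modifier, which evaluates the replacement as PHP code
    // (and is the reason the earlier version needed stripslashes()).
    return preg_replace_callback('/<(.*?)>/i', 'removeEvilAttributesCallback', $source);
}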

$text = '<p style="Normal">Saluton el <a href="#?"
 class="xsarial">Esperanto-lando</a><img src="my.jpg"
 alt="Saluton" width=100 height=100></p>';

$text = removeEvilTags($text);

var_dump($text);
leathargy at hotmail dot com
26-Oct-2003 08:15
It seems we're all overlooking a few things:
1) If we handle "</ta</tableble>" by removing "</table", we're no better off. Try a char-by-char comparison and replace stuff with *s, because then this example would become "</ta******ble>", which is not problematic; also, with a char-by-char approach, you can skip whitespace and kill stuff like "< table>"... just make sure <&bkspTable> doesn't work...
2) No browser treats { as < [as far as I know].
3) Because of statement 2, we can do:
$remove=array("<?","<","?>",">");
$change=array("{[pre]}","{[","{/pre}","]}");
$repairSeek = array("{[pre]}", "{/pre}","{[b]}","{[/b]}","{[br]}");
// and so forth...

$repairChange = array("<pre>","</pre>","<b>","</b>","<br>");
// and so forth...

$maltags=array("{[","]}");
$nontags=array("{","}");
$unclean=...;//get variable from somewhere...
$unclean=str_replace($remove,$change,$unclean);
$unclean=str_replace($repairSeek, $repairChange, $unclean);
$clean=str_replace($maltags, $nontags, $unclean);

////end example....
4) We can further improve the above by using explode (for our ease):
function purifyText($unclean, $fixme)
{
$remove=array();
$remove=explode("\n",$fixme['remove']);
//... and so forth for each of the above arrays...
// or you could just pass the arrays..., or a giant string
//put above here...
return $clean;
}//done
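Put together, the idea in point 3 might look roughly like the sketch below. The function name purifyTextSketch() is made up, the whitelist is limited to the few tags listed above, and the "{/pre}" repair entry is an assumption about what the example intended.

```php
<?php
// Hypothetical, self-contained version of the bracket-swapping idea.
function purifyTextSketch($unclean)
{
    // Step 1: turn every angle bracket into a harmless brace sequence.
    $remove = array('<?', '<', '?>', '>');
    $change = array('{[pre]}', '{[', '{/pre}', ']}');
    // Step 2: turn the whitelisted sequences back into real tags.
    $repairSeek   = array('{[pre]}', '{/pre}', '{[b]}', '{[/b]}', '{[br]}');
    $repairChange = array('<pre>', '</pre>', '<b>', '</b>', '<br>');
    // Step 3: simplify whatever is left to plain braces.
    $maltags = array('{[', ']}');
    $nontags = array('{', '}');

    $text = str_replace($remove, $change, $unclean);
    $text = str_replace($repairSeek, $repairChange, $text);
    return str_replace($maltags, $nontags, $text);
}

echo purifyTextSketch('<b>bold</b> <script>x</script>'), "\n";
// prints: <b>bold</b> {script}x{/script}
```

Only exact matches such as "<b>" are restored, so a tag carrying attributes (for example an onclick handler) stays in its harmless brace form.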
dougal at gunters dot org
10-Sep-2003 11:03
strip_tags() appears to become nauseated at the sight of a <!DOCTYPE> declaration (at least in PHP 4.3.1). You might want to do something like:

$html = str_replace('<!DOCTYPE','<DOCTYPE',$html);

before processing with strip_tags().
Chuck
21-Mar-2003 02:01
Caution: HTML created by Word may contain the sequence
'<?xml...'

Apparently strip_tags() treats this like <?php and removes the remainder of the input string: not just the XML tag, but all input that follows.
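One possible workaround, sketched under the assumption that the "<?xml" construct contains no ">" before its own end, is to cut those constructs out before calling strip_tags(). The helper name removeXmlDeclarations() is invented for this example.

```php
<?php
// Hypothetical helper: removes anything that starts with "<?xml" up to the
// next closing angle bracket, before the text reaches strip_tags().
function removeXmlDeclarations($html)
{
    return preg_replace('/<\?xml[^>]*>/i', '', $html);
}

$html = '<?xml:namespace prefix = o /><p>Hello</p>';
echo strip_tags(removeXmlDeclarations($html)), "\n"; // prints: Hello
```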
guy at datalink dot SPAMMENOT dot net dot au
15-Mar-2002 08:19
Strip tags will NOT remove HTML entities such as &nbsp;
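If plain text is what you're after, the entities can be decoded in a second step. A small sketch, assuming UTF-8 text and that non-breaking spaces should end up as ordinary spaces:

```php
<?php
$html = '<p>Fish&nbsp;&amp;&nbsp;chips</p>';

// strip_tags() removes the <p> tags but leaves the entities untouched.
$stripped = strip_tags($html);
echo $stripped, "\n"; // prints: Fish&nbsp;&amp;&nbsp;chips

// Decode the entities afterwards; &nbsp; becomes U+00A0 (bytes C2 A0 in UTF-8),
// so swap it for a plain space.
$decoded = html_entity_decode($stripped, ENT_QUOTES, 'UTF-8');
$plain = str_replace("\xC2\xA0", ' ', $decoded);
echo $plain, "\n"; // prints: Fish & chips
```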
chrisj at thecyberpunk dot com
18-Dec-2001 10:57
strip_tags() doesn't recognize that CSS within <style> tags is not document text. To fix this, do something similar to the following:

$htmlstring = preg_replace("'<style[^>]*>.*</style>'siU",'',$htmlstring);
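A quick before-and-after comparison, using a made-up sample string, shows the difference:

```php
<?php
$htmlstring = '<style type="text/css">p { color: red; }</style><p>Hello</p>';

// Without the preg_replace() the CSS rules survive as plain text.
echo strip_tags($htmlstring), "\n"; // prints: p { color: red; }Hello

// Remove the <style> block and its contents first, then strip the remaining tags.
$withoutCss = preg_replace("'<style[^>]*>.*</style>'siU", '', $htmlstring);
echo strip_tags($withoutCss), "\n"; // prints: Hello
```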
